
Why do I get the error malloc(): invalid size (unsorted)?

全能资料库, 2023-08-09 (original post)

Question: Why do I get the error malloc(): invalid size (unsorted)?

I have web crawler code at https://github.com/JamesRead5737/webcrawler/blob/master/crawler.c that produces some strange errors I cannot explain. Most commonly it aborts with the error malloc(): invalid size (unsorted).

The backtrace shows:

(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff760e859 in __GI_abort () at abort.c:79
#2  0x00007ffff76793ee in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff77a3285 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff768147c in malloc_printerr (str=str@entry=0x7ffff77a5a50 "malloc(): invalid size (unsorted)") at malloc.c:5347
#4  0x00007ffff7684234 in _int_malloc (av=av@entry=0x7ffff77d4b80 <main_arena>, bytes=bytes@entry=8200) at malloc.c:3736
#5  0x00007ffff7686419 in __GI___libc_malloc (bytes=8200) at malloc.c:3066
#6  0x00005555555578b3 in html_link_find (url=0x55555d0f8b08 "https://www.android.com/intl/en_us//security-center/", 
    html=0x55555f9e6c00 "<!DOCTYPE html>\n<html lang=\"en\" dir=\"ltr\">\n  <head>\n    <meta charset=\"utf-8\">\n    <title>Android Security Center</title>\n    <meta content=\"initial-scale=1, minimum-scale=1, width=device-width\" name="...) at crawler.c:455
#7  0x0000555555557d70 in html_parse (url=0x55555d0f8b08 "https://www.android.com/intl/en_us//security-center/", 
    html=0x55555f9e6c00 "<!DOCTYPE html>\n<html lang=\"en\" dir=\"ltr\">\n  <head>\n    <meta charset=\"utf-8\">\n    <title>Android Security Center</title>\n    <meta content=\"initial-scale=1, minimum-scale=1, width=device-width\" name="...) at crawler.c:536
#8  0x00005555555582cc in check_multi_info (g=0x7ffffffe0970) at crawler.c:678
#9  0x00005555555583db in event_cb (g=0x7ffffffe0970, fd=1164, revents=1) at crawler.c:706
#10 0x0000555555559829 in crawler_init () at crawler.c:1154
#11 0x0000555555559ae9 in main (argc=1, argv=0x7fffffffe018) at crawler.c:1207

This leads me to the line of code that reads sql_current->next = (SqlNode *)malloc(sizeof(SqlNode)); which, as far as I can tell, should not cause any error.

Googling the error suggests that the problem could be anywhere in the code and entirely unrelated to the line in question. Is that correct?

The MySQL database is set up as follows:

USE crawl;
CREATE TABLE IF NOT EXISTS `crawled` (`id` int NOT NULL AUTO_INCREMENT, `url` varchar(768) DEFAULT NULL, `title` varchar(768) DEFAULT NULL, `date` varchar(128) DEFAULT NULL, `links` int DEFAULT 0, `backlinks` int DEFAULT 0, `frontier` int DEFAULT 1, PRIMARY KEY (`id`), UNIQUE KEY `url` (`url`), KEY `title` (`title`), KEY `frontier` (`frontier`)) ENGINE=InnoDB AUTO_INCREMENT=1;
CREATE TABLE IF NOT EXISTS `emails` (`email` varchar(2084) NOT NULL, `id` int NOT NULL AUTO_INCREMENT, PRIMARY KEY (`id`), UNIQUE KEY `email` (`email`)) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;
INSERT INTO crawled (url) VALUES ('http://www.bing.com'),('http://www.yahoo.com'),('http://www.google.com');

If the line in question is fine, how can I find the real problem?

Edit

Here is some valgrind output:

==318618== Memcheck, a memory error detector
==318618== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==318618== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==318618== Command: ./a.out
==318618== Parent PID: 2591
==318618== 
==318618== Warning: ignored attempt to set SIGKILL handler in sigaction();
==318618==          the SIGKILL signal is uncatchable
==318618== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==318618==    at 0x515AACE: epoll_ctl (syscall-template.S:78)
==318618==    by 0x10D052: crawler_init (crawler.c:987)
==318618==    by 0x10DAE8: main (crawler.c:1207)
==318618==  Address 0x1ffefe28ac is on thread 1's stack
==318618==  in frame #1, created by crawler_init (crawler.c:956)
==318618== 
==318618== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==318618==    at 0x515AACE: epoll_ctl (syscall-template.S:78)
==318618==    by 0x10C61C: setsock (crawler.c:769)
==318618==    by 0x10C6BC: addsock (crawler.c:782)
==318618==    by 0x10CE70: sock_cb (crawler.c:921)
==318618==    by 0x48B70B1: singlesocket (multi.c:2593)
==318618==    by 0x48B7878: multi_socket (multi.c:2839)
==318618==    by 0x48B8053: curl_multi_socket_action (multi.c:2956)
==318618==    by 0x10C4E8: timer_cb (crawler.c:741)
==318618==    by 0x10D7C7: crawler_init (crawler.c:1152)
==318618==    by 0x10DAE8: main (crawler.c:1207)
==318618==  Address 0x1ffefe2584 is on thread 1's stack
==318618==  in frame #1, created by setsock (crawler.c:749)
==318618== 
==318618== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==318618==    at 0x515AACE: epoll_ctl (syscall-template.S:78)
==318618==    by 0x10C61C: setsock (crawler.c:769)
==318618==    by 0x10CE8F: sock_cb (crawler.c:923)
==318618==    by 0x48B70B1: singlesocket (multi.c:2593)
==318618==    by 0x48B7878: multi_socket (multi.c:2839)
==318618==    by 0x48B8053: curl_multi_socket_action (multi.c:2956)
==318618==    by 0x10C3BA: event_cb (crawler.c:703)
==318618==    by 0x10D828: crawler_init (crawler.c:1154)
==318618==    by 0x10DAE8: main (crawler.c:1207)
==318618==  Address 0x1ffefe25a4 is on thread 1's stack
==318618==  in frame #1, created by setsock (crawler.c:749)
==318618== 
==318618== 
==318618== HEAP SUMMARY:
==318618==     in use at exit: 149,695,831 bytes in 27,400 blocks
==318618==   total heap usage: 2,198,504 allocs, 2,171,104 frees, 3,507,931,785 bytes allocated
==318618== 
==318618== LEAK SUMMARY:
==318618==    definitely lost: 1,889,627 bytes in 9,067 blocks
==318618==    indirectly lost: 0 bytes in 0 blocks
==318618==      possibly lost: 2,137,116 bytes in 27 blocks
==318618==    still reachable: 145,669,088 bytes in 18,306 blocks
==318618==         suppressed: 0 bytes in 0 blocks
==318618== Rerun with --leak-check=full to see details of leaked memory
==318618== 
==318618== Use --track-origins=yes to see where uninitialised values come from
==318618== ERROR SUMMARY: 295 errors from 3 contexts (suppressed: 0 from 0)
==318618== 
==318618== 1 errors in context 1 of 3:
==318618== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==318618==    at 0x515AACE: epoll_ctl (syscall-template.S:78)
==318618==    by 0x10D052: crawler_init (crawler.c:987)
==318618==    by 0x10DAE8: main (crawler.c:1207)
==318618==  Address 0x1ffefe28ac is on thread 1's stack
==318618==  in frame #1, created by crawler_init (crawler.c:956)
==318618== 
==318618== 
==318618== 79 errors in context 2 of 3:
==318618== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==318618==    at 0x515AACE: epoll_ctl (syscall-template.S:78)
==318618==    by 0x10C61C: setsock (crawler.c:769)
==318618==    by 0x10CE8F: sock_cb (crawler.c:923)
==318618==    by 0x48B70B1: singlesocket (multi.c:2593)
==318618==    by 0x48B7878: multi_socket (multi.c:2839)
==318618==    by 0x48B8053: curl_multi_socket_action (multi.c:2956)
==318618==    by 0x10C3BA: event_cb (crawler.c:703)
==318618==    by 0x10D828: crawler_init (crawler.c:1154)
==318618==    by 0x10DAE8: main (crawler.c:1207)
==318618==  Address 0x1ffefe25a4 is on thread 1's stack
==318618==  in frame #1, created by setsock (crawler.c:749)
==318618== 
==318618== 
==318618== 215 errors in context 3 of 3:
==318618== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==318618==    at 0x515AACE: epoll_ctl (syscall-template.S:78)
==318618==    by 0x10C61C: setsock (crawler.c:769)
==318618==    by 0x10C6BC: addsock (crawler.c:782)
==318618==    by 0x10CE70: sock_cb (crawler.c:921)
==318618==    by 0x48B70B1: singlesocket (multi.c:2593)
==318618==    by 0x48B7878: multi_socket (multi.c:2839)
==318618==    by 0x48B8053: curl_multi_socket_action (multi.c:2956)
==318618==    by 0x10C4E8: timer_cb (crawler.c:741)
==318618==    by 0x10D7C7: crawler_init (crawler.c:1152)
==318618==    by 0x10DAE8: main (crawler.c:1207)
==318618==  Address 0x1ffefe2584 is on thread 1's stack
==318618==  in frame #1, created by setsock (crawler.c:749)
==318618== 
==318618== ERROR SUMMARY: 295 errors from 3 contexts (suppressed: 0 from 0)

Edit

Here is some valgrind output from a run that crashed:

==319842== Memcheck, a memory error detector
==319842== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==319842== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==319842== Command: ./a.out
==319842== Parent PID: 2591
==319842== 
==319842== Warning: ignored attempt to set SIGKILL handler in sigaction();
==319842==          the SIGKILL signal is uncatchable
==319842== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==319842==    at 0x515AACE: epoll_ctl (syscall-template.S:78)
==319842==    by 0x10D052: crawler_init (crawler.c:987)
==319842==    by 0x10DAE8: main (crawler.c:1207)
==319842==  Address 0x1ffefe28ac is on thread 1's stack
==319842==  in frame #1, created by crawler_init (crawler.c:956)
==319842== 
==319842== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==319842==    at 0x515AACE: epoll_ctl (syscall-template.S:78)
==319842==    by 0x10C61C: setsock (crawler.c:769)
==319842==    by 0x10C6BC: addsock (crawler.c:782)
==319842==    by 0x10CE70: sock_cb (crawler.c:921)
==319842==    by 0x48B70B1: singlesocket (multi.c:2593)
==319842==    by 0x48B7878: multi_socket (multi.c:2839)
==319842==    by 0x48B8053: curl_multi_socket_action (multi.c:2956)
==319842==    by 0x10C4E8: timer_cb (crawler.c:741)
==319842==    by 0x10D7C7: crawler_init (crawler.c:1152)
==319842==    by 0x10DAE8: main (crawler.c:1207)
==319842==  Address 0x1ffefe2584 is on thread 1's stack
==319842==  in frame #1, created by setsock (crawler.c:749)
==319842== 
==319842== Syscall param epoll_ctl(event) points to uninitialised byte(s)
==319842==    at 0x515AACE: epoll_ctl (syscall-template.S:78)
==319842==    by 0x10C61C: setsock (crawler.c:769)
==319842==    by 0x10CE8F: sock_cb (crawler.c:923)
==319842==    by 0x48B70B1: singlesocket (multi.c:2593)
==319842==    by 0x48B7878: multi_socket (multi.c:2839)
==319842==    by 0x48B8053: curl_multi_socket_action (multi.c:2956)
==319842==    by 0x10C3BA: event_cb (crawler.c:703)
==319842==    by 0x10D828: crawler_init (crawler.c:1154)
==319842==    by 0x10DAE8: main (crawler.c:1207)
==319842==  Address 0x1ffefe25a4 is on thread 1's stack
==319842==  in frame #1, created by setsock (crawler.c:749)
==319842== 
==319842== Invalid write of size 1
==319842==    at 0x48436E4: mempcpy (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==319842==    by 0x50CD1D8: _IO_default_xsputn (genops.c:386)
==319842==    by 0x50CD1D8: _IO_default_xsputn (genops.c:370)
==319842==    by 0x50B227B: __vfprintf_internal (vfprintf-internal.c:1688)
==319842==    by 0x50C0278: __vsprintf_internal (iovsprintf.c:95)
==319842==    by 0x509D047: sprintf (sprintf.c:30)
==319842==    by 0x10B88F: html_link_find (crawler.c:452)
==319842==    by 0x10BD6F: html_parse (crawler.c:536)
==319842==    by 0x10C2CB: check_multi_info (crawler.c:678)
==319842==    by 0x10C3DA: event_cb (crawler.c:706)
==319842==    by 0x10D828: crawler_init (crawler.c:1154)
==319842==    by 0x10DAE8: main (crawler.c:1207)
==319842==  Address 0xf107d18 is 0 bytes after a block of size 8,200 alloc'd
==319842==    at 0x483B7F3: malloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==319842==    by 0x10B736: html_link_find (crawler.c:440)
==319842==    by 0x10BD6F: html_parse (crawler.c:536)
==319842==    by 0x10C2CB: check_multi_info (crawler.c:678)
==319842==    by 0x10C3DA: event_cb (crawler.c:706)
==319842==    by 0x10D828: crawler_init (crawler.c:1154)
==319842==    by 0x10DAE8: main (crawler.c:1207)
==319842== 

valgrind: m_mallocfree.c:305 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 8272, hi = 3625731377157460067.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.


host stacktrace:
==319842==    at 0x58046FFA: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux)
==319842==    by 0x58047127: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux)
==319842==    by 0x580472CB: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux)
==319842==    by 0x580514B4: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux)
==319842==    by 0x5803DE9A: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux)
==319842==    by 0x5803CD9F: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux)
==319842==    by 0x58041F04: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux)
==319842==    by 0x5803C1D8: ??? (in /usr/lib/x86_64-linux-gnu/valgrind/memcheck-amd64-linux)
==319842==    by 0x1002EC6B18: ???
==319842==    by 0x1002CADF2F: ???
==319842==    by 0x1002CADF17: ???
==319842==    by 0x1002CADF2F: ???
==319842==    by 0x1002CADF3F: ???

sched status:
  running_tid=1

Thread 1: status = VgTs_Runnable (lwpid 319842)
==319842==    at 0x50CD2B4: _IO_default_xsputn (genops.c:394)
==319842==    by 0x50CD2B4: _IO_default_xsputn (genops.c:370)
==319842==    by 0x50B2165: __vfprintf_internal (vfprintf-internal.c:1719)
==319842==    by 0x50C0278: __vsprintf_internal (iovsprintf.c:95)
==319842==    by 0x509D047: sprintf (sprintf.c:30)
==319842==    by 0x10B88F: html_link_find (crawler.c:452)
==319842==    by 0x10BD6F: html_parse (crawler.c:536)
==319842==    by 0x10C2CB: check_multi_info (crawler.c:678)
==319842==    by 0x10C3DA: event_cb (crawler.c:706)
==319842==    by 0x10D828: crawler_init (crawler.c:1154)
==319842==    by 0x10DAE8: main (crawler.c:1207)
client stack range: [0x1FFEFDB000 0x1FFF000FFF] client SP: 0x1FFEFDB600
valgrind stack range: [0x1002BAE000 0x1002CADFFF] top usage: 10344 of 1048576


Note: see also the FAQ in the source distribution.
It contains workarounds to several common problems.
In particular, if Valgrind aborted or crashed after
identifying problems in your program, there's a good chance
that fixing those problems will prevent Valgrind aborting or
crashing, especially if it happened in m_mallocfree.c.

If that doesn't help, please report this bug to: www.valgrind.org

In the bug report, send all the above text, the valgrind
version, and what OS and version you are using.  Thanks.

Answer

Transferring comments to an answer.

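It appears that html_link_find() allocates memory at line 440 and then calls sprintf() at line 452 to format something, and that call writes past the end of the allocated memory. Valgrind reports an invalid write of size 1 at an address 0 bytes after the 8,200-byte block, and even a one-byte overrun like that is enough to kill malloc(). This also answers the question about the unrelated line: the overrun corrupts the heap's bookkeeping, and the damage is only noticed by a later malloc() call, here the one at line 455.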
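To make the off-by-one concrete, here is a minimal sketch of how to check whether a formatted string fits a buffer before writing it. The question does not show the format string used at crawler.c:452, so the "%s" below and the helper names bytes_needed() and would_overflow() are assumptions for illustration only. The point it shows is that a string of exactly 8,200 characters needs 8,201 bytes, because sprintf() also writes the terminating '\0'.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: number of bytes sprintf("%s", s) would write,
 * i.e. the characters of s plus the terminating '\0'. */
static size_t bytes_needed(const char *s)
{
    assert(s != NULL);
    /* With a NULL buffer and size 0, snprintf() writes nothing and
     * returns the length the output would have, excluding the '\0'. */
    int n = snprintf(NULL, 0, "%s", s);
    return n < 0 ? 0 : (size_t)n + 1;
}

/* Hypothetical helper: returns 1 if formatting s into a buffer of
 * cap bytes would write past its end, 0 otherwise. */
static int would_overflow(const char *s, size_t cap)
{
    return bytes_needed(s) > cap;
}
```

Calling would_overflow() with an 8,200-character string and a capacity of 8,200 reports an overflow, while a capacity of 8,201 does not; that single extra byte matches the size-1 invalid write in the valgrind output.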

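A partial fix is to use snprintf() instead of sprintf(), but you also need to test the return value so that truncated data is detected rather than silently used. It is also possible that some data that should be null-terminated is not, which could be contributing to the overflow.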
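The snprintf() advice can be sketched as a small wrapper that treats truncation as an error. This is not the crawler's code: format_checked() is a hypothetical helper, and how the real code should react to a failure (skip the link, grow the buffer, and so on) is left open.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: format into dst, which holds cap bytes.
 * Returns 0 on success, or -1 if an encoding error occurred or the
 * output (including the '\0') would not fit in cap bytes. */
static int format_checked(char *dst, size_t cap, const char *fmt, ...)
{
    assert(dst != NULL && cap > 0);

    va_list ap;
    va_start(ap, fmt);
    /* vsnprintf() never writes more than cap bytes and returns the
     * length the full output would have had, excluding the '\0'. */
    int n = vsnprintf(dst, cap, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= cap) {
        dst[0] = '\0';   /* do not leave a truncated string behind */
        return -1;
    }
    return 0;
}
```

If the source data might not be null-terminated, passing an explicit length with a precision, as in format_checked(buf, sizeof buf, "%.*s", (int)len, src), bounds how many bytes are read from src, so the formatter does not run off the end of the source looking for a '\0'.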
