wget 1.13.4 crashed with SIGSEGV in malloc_consolidate()
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
wget | Unknown | Unknown | |
wget (Ubuntu) | New | Undecided | Unassigned |
Bug Description
wget on Precise (wget v1.13.4) crashes with a segfault when mirroring sites. THIS IS A REGRESSION: the same behavior does not occur with wget v1.12 as shipped on Ubuntu Lucid.
The system under test is a fully up-to-date Ubuntu Precise x64 (Server); the comparison machine, which does not segfault, is a fully up-to-date Ubuntu Lucid x64 (Server).
Sample output:
<pre>
me@box:/tmp$ wget -m --delete-after http://
[several pages of OK output redacted]
Served from: www.[redacted] @ 2012-07-07 11:56:50 -->] done.
2012-07-07 13:56:50 ERROR 404: Not Found.
Dequeuing http://
Queue count 144, maxcount 149.
--2012-07-07 13:56:50-- http://
Disabling further reuse of socket 3.
Closed fd 3
Found www.[redacted] in host_name_
Segmentation fault
</pre>
Running wget directly on the page that appears to have been in progress when the segfault occurred does NOT result in another segfault:
<pre>
root@www:/tmp# wget --delete-after http://
--2012-07-07 14:00:03-- http://
Resolving www.[redacted] (www.[redacted])... 173.193.169.42
Connecting to www.[redacted] (www.[redacted]
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: `index.html'
[ <=> ] 84,289 --.-K/s in 0s
2012-07-07 14:00:07 (161 MB/s) - `index.html' saved [84289]
Removing index.html.
</pre>
summary:
- segfault in wget 1.13.4
+ wget 1.13.4 crashed with SIGSEGV in malloc_consolidate()
Ah HA! I don't know about the bug in wget, but I found the oddness in the site being crawled which was *causing* wget to trip *its* bug:
<script src=”http://ajax.googleapis.com/ajax/libs/jquery/1.5/jquery.min.js”></script>
Took forever to spot this: somebody put "pretty quotes" in a CSS file in a WordPress theme. Browsers and wget alike don't recognize pretty quotes as quotes for coding purposes, so the quote characters become part of the attribute value, the value is treated as a relative link, and you end up trying to fetch a really, really broken URL:
http://www.[redacted]/%E2%80%9Dhttp:/ajax.googleapis.com/ajax/libs/jquery/1.5/jquery.min.js%E2%80%9D
Most browsers just try to get that URL, fail at it, and move on with life. The new version of wget in 12.04, however, actually *segfaults* when it encounters it, which of course only happens when recursion is turned on. If it helps: you really do ONLY get this in recursion. An attempt to fetch the botched URL manually, either in its percent-encoded form or with the pretty quotes typed directly at the shell, results in the expected 404, not a segfault.
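For anyone hunting the same kind of breakage in their own theme files, here is a minimal Python sketch of the two checks described above: finding attribute values that open with a pretty quote, and seeing how such a quote turns into `%E2%80%9D` once percent-encoded as UTF-8. The helper names `find_curly_quoted_attributes` and `percent_encode` are made up for illustration and have nothing to do with wget's own code; this only helps locate the trigger, it does not explain or fix the crash.

```python
import re
from urllib.parse import quote

# Typographic ("pretty") quotes: U+2018, U+2019, U+201C, U+201D.
# HTML parsers do not treat these as attribute delimiters.
CURLY_QUOTES = "\u2018\u2019\u201c\u201d"

# An "=" followed (optionally after whitespace) by a curly quote,
# as in: src=”http://...”
CURLY_ATTR = re.compile(r"=\s*[" + CURLY_QUOTES + r"]")


def find_curly_quoted_attributes(html):
    """Return the lines of `html` where an attribute value opens with a curly quote."""
    return [line for line in html.splitlines() if CURLY_ATTR.search(line)]


def percent_encode(value):
    """Percent-encode `value` as UTF-8, leaving '/' and ':' untouched."""
    return quote(value, safe="/:")


if __name__ == "__main__":
    # Hypothetical sample markup; example.invalid is a placeholder host.
    sample = (
        "<script src=\u201dhttp://example.invalid/x.js\u201d></script>\n"
        '<script src="http://example.invalid/y.js"></script>'
    )
    for line in find_curly_quoted_attributes(sample):
        print("suspicious:", line)
    # U+201D is the bytes E2 80 9D in UTF-8, hence the %E2%80%9D in the broken URL.
    print(percent_encode("\u201d"))
```

Running the helper over a theme's template files (reading each file as UTF-8) should flag any tag like the `<script>` line quoted above, so the quotes can be replaced with plain ASCII `"`.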