Comment 1 for bug 1022124

Jim Salter (jrssnet) wrote : Re: segfault in wget 1.13.4

Ah HA! I don't know about the bug in wget, but I found the oddness in the site being crawled which was *causing* wget to trip *its* bug:

<script src=”http://ajax.googleapis.com/ajax/libs/jquery/1.5/jquery.min.js”></script>

Took forever to spot this: somebody put "pretty quotes" (curly typographic quotes) in a CSS file in a WordPress theme. Neither browsers nor wget recognize pretty quotes as attribute delimiters, so the quotes become part of the attribute value, the value gets treated as a relative path on the site, and you end up trying to fetch a really, really broken URL:

http://www.[redacted]/%E2%80%9Dhttp:/ajax.googleapis.com/ajax/libs/jquery/1.5/jquery.min.js%E2%80%9D
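For anyone hunting the same problem in their own theme files, here is a rough Python sketch (not anything wget itself does). It shows that %E2%80%9D is just the percent-encoded UTF-8 form of the right curly quote (U+201D), and it scans markup for attribute values that start with a curly quote. The helper name find_curly_quoted_attrs and the regex are my own assumptions for illustration, and the regex is deliberately crude.

```python
import re
from urllib.parse import quote, unquote

# Curly ("pretty") double and single quotes: U+201C, U+201D, U+2018, U+2019.
CURLY_QUOTES = "\u201c\u201d\u2018\u2019"

# Hypothetical, deliberately crude pattern: an attribute name, '=', then a
# value that begins with a curly quote and runs to the next whitespace or '>'.
ATTR_WITH_CURLY_QUOTE = re.compile(r"(\w+)=([" + CURLY_QUOTES + r"][^\s>]*)")


def find_curly_quoted_attrs(html):
    """Return (attribute, raw value) pairs whose value starts with a curly quote."""
    return ATTR_WITH_CURLY_QUOTE.findall(html)


# The offending tag from the crawled site, with the curly quotes intact.
sample = (
    "<script src=\u201dhttp://ajax.googleapis.com/ajax/libs/jquery/1.5/"
    "jquery.min.js\u201d></script>"
)

hits = find_curly_quoted_attrs(sample)

# Percent-encoding the right curly quote gives the sequence seen in the URL.
encoded_quote = quote("\u201d")
decoded_quote = unquote("%E2%80%9D")

if __name__ == "__main__":
    for attr, value in hits:
        print(f"suspicious attribute {attr!r}: {value}")
    print("encoded right curly quote:", encoded_quote)
```

Running something like this over the theme's templates should flag the bad tag without waiting for a crawler to trip over it.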

Most browsers just try to get that URL, fail at it, and move on with life, but the new version of wget in 12.04 actually *segfaults* when it encounters that, which of course it only will if recursion is turned on. If it helps: you really do ONLY get this in recursion. An attempt to fetch the botched URL manually, either using the percent-encoded form (%E2%80%9D) or using the pretty quotes directly at the shell, results in the expected 404, not a segfault.