Ah HA! I don't know about the bug in wget, but I found the oddness in the site being crawled which was *causing* wget to trip *its* bug:
<script src=”http://ajax.googleapis.com/ajax/libs/jquery/1.5/jquery.min.js”></script>
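A quick way to see what a parser makes of that tag is Python's standard `html.parser`. This is a sketch using Python's parser, not wget's own code, and `SrcCollector` is a hypothetical helper name. In HTML only ASCII `"` and `'` open a quoted attribute value, so a value that starts with `”` is read as an unquoted value running up to the `>`, with the curly quotes included.

```python
from html.parser import HTMLParser

# The tag from the theme, with the curly closing quotes (U+201D) intact.
TAG = ('<script src=\u201dhttp://ajax.googleapis.com/ajax/libs/jquery/'
       '1.5/jquery.min.js\u201d></script>')


class SrcCollector(HTMLParser):
    """Hypothetical helper: records the raw value of every src attribute."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "src":
                self.srcs.append(value)


parser = SrcCollector()
parser.feed(TAG)
parser.close()

src = parser.srcs[0]
print(repr(src))
# The curly quotes are not delimiters, so they end up inside the value.
print(src.startswith("\u201d"), src.endswith("\u201d"))  # True True
```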
Took forever to spot this: somebody put "pretty quotes" (curly ” characters) in a CSS file in a WordPress theme. Browsers, wget included, don't recognize pretty quotes as quotes for coding purposes, so the quotes become part of the URL, which then gets treated as a path relative to the page, and you end up trying to fetch a really, really broken URL:
http://www.[redacted]/%E2%80%9Dhttp:/ajax.googleapis.com/ajax/libs/jquery/1.5/jquery.min.js%E2%80%9D
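The mangled URL can be reproduced with Python's `urllib.parse`, as a sketch under stated assumptions: `http://www.example.com/` stands in for the redacted site, and `resolve_like_a_crawler` is a hypothetical helper, not wget's actual code path. Because the value starts with `”` rather than `http:`, it has no recognizable scheme and is resolved as a path relative to the page; percent-encoding the UTF-8 bytes of `”` (U+201D, bytes E2 80 9D) then produces the `%E2%80%9D` seen above.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit

# Assumption: stand-in for the redacted site; the page is taken to be at "/".
BASE = "http://www.example.com/"

# The src value as an HTML parser sees it: curly quotes included, so it
# no longer begins with a scheme like "http:".
RAW_SRC = "\u201dhttp://ajax.googleapis.com/ajax/libs/jquery/1.5/jquery.min.js\u201d"


def resolve_like_a_crawler(base: str, ref: str) -> str:
    """Hypothetical helper: resolve ref against base, then percent-encode the path."""
    # With no valid scheme, ref is joined onto the base as a relative path.
    joined = urljoin(base, ref)
    parts = urlsplit(joined)
    # quote() encodes non-ASCII characters as UTF-8 %XX escapes;
    # '/' and ':' are kept literal so the output matches the logged URL.
    path = quote(parts.path, safe="/:")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


print(resolve_like_a_crawler(BASE, RAW_SRC))
# http://www.example.com/%E2%80%9Dhttp:/ajax.googleapis.com/ajax/libs/jquery/1.5/jquery.min.js%E2%80%9D
```

Python's `urljoin` also drops the empty path segment between `http:` and `ajax`, which happens to match the single slash in the logged URL; whether wget collapses it for the same reason is not something this sketch establishes.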
Most browsers just try to get that URL, fail, and move on with life, but the new version of wget in 12.04 actually *segfaults* when it encounters it, which of course only happens with recursion turned on. If it helps: you really do ONLY get this in recursion. Fetching the botched URL manually, either with the percent-encoded escapes (%E2%80%9D) or with the pretty quotes typed directly at the shell, results in the expected 404, not a segfault.