Comment 9 for bug 1022124

Jason Conti (jconti) wrote :

The code in iri.c that the valgrind log points to (which is also the only major change to iri.c since lucid) comes from commit 2f6aa1d7417df1dfc58597777686fbd77179b9fd:

diff --git a/src/iri.c b/src/iri.c
index 08cfde4..9b16639 100644
--- a/src/iri.c
+++ b/src/iri.c
@@ -264,6 +264,21 @@ remote_to_utf8 (struct iri *i, const char *str, const char **new)
   if (!i->uri_encoding)
     return false;

+  /* When `i->uri_encoding' == "UTF-8" there is nothing to convert.  But we must
+     test for non-ASCII symbols for correct hostname processing in `idn_encode'
+     function. */
+  if (!strcmp (i->uri_encoding, "UTF-8"))
+    {
+      int i, len = strlen (str);
+      for (i = 0; i < len; i++)
+        if ((unsigned char) str[i] >= (unsigned char) '\200')
+          {
+            *new = strdup (str);
+            return true;
+          }
+      return false;
+    }
+
   cd = iconv_open ("UTF-8", i->uri_encoding);
   if (cd == (iconv_t)(-1))
     return false;

It might be worth checking whether dropping that patch and rebuilding wget fixes the issue. The patch was only added to avoid converting to UTF-8 twice, and it seems kind of unsafe. If dropping it does help, it would be worth filing a bug upstream so they can work out a proper fix.
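
To poke at that branch outside of wget (for example under valgrind), here is a minimal standalone sketch of the same check. The names has_non_ascii and utf8_passthrough are made up for illustration and do not exist in wget. It also differs from the patch in one respect: it sets the output pointer to NULL whenever it returns false. That is only an assumption about what a more defensive version could look like, not a confirmed diagnosis of what the valgrind log shows.

```c
#define _POSIX_C_SOURCE 200809L  /* for strdup */
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of wget: true if STR contains any byte
   outside the 7-bit ASCII range (same test as the patch, '\200' == 0x80). */
static bool
has_non_ascii (const char *str)
{
  assert (str != NULL);
  for (; *str != '\0'; str++)
    if ((unsigned char) *str >= 0x80)
      return true;
  return false;
}

/* Hypothetical stand-in for the UTF-8 branch of the patch.  On true,
   *OUT is a heap copy that the caller must free.  On false, *OUT is
   NULL, so the caller never sees an unset pointer (an assumption of
   this sketch; the patch leaves *new untouched in that case). */
static bool
utf8_passthrough (const char *str, char **out)
{
  *out = NULL;
  if (!has_non_ascii (str))
    return false;
  *out = strdup (str);
  return *out != NULL;
}
```

A caller would pass the URL string and the address of a char pointer, then free the result only when the function returns true.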