Comment 18 for bug 326718

floid (jkanowitz) wrote :

Figures, I sit down to finally debug this and find that glibc 2.9-0ubuntu12 was released yesterday; it contains a patch which appears to... well, at least patch the issue. Lookups, even with apt, appear to be working with this version!

So is the fix appropriate? From the changelog:

glibc (2.9-0ubuntu12) jaunty; urgency=low

  * debian/patches/all/fedora-nss_dns-gethostbyname4-disable.diff: Patch
    from Fedora 2.9-3 to temporarily disable _nss_dns_gethostbyname4_r,
    which caused problems for systems with broken IPv6 connectivity
    (LP: #313218, https://bugzilla.redhat.com/show_bug.cgi?id=459756).

I am having trouble finding this particular .diff (where am I supposed to look?), but I assume it is substantially similar to:

http://pasky.or.cz/~pasky/dev/glibc/glibc-2.10-dns-no-gethostbyname4.diff

found via Google.

...so apparently parallel lookups were codified as _nss_dns_gethostbyname4_r. Fair enough.

That version bears the following comment: "This should work in theory, but it turns out that many cheap DSL modems and similar devices have buggy DNS servers - if the AAAA query arrives too quickly after the A query, the server will generate only a single reply with the A query id but returning an error for the AAAA query; we get stuck waiting for the second reply."
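To make that failure mode concrete, here is a rough sketch of a resolver that tracks two outstanding queries by ID. This is not glibc's code; parse_header and still_outstanding are hypothetical names of my own, and the only thing assumed is the standard 12-byte DNS header layout (ID in the first two bytes, RCODE in the low four bits of the flags word).

```python
import struct


def parse_header(packet):
    """Return (query_id, rcode) from the 12-byte DNS header of a message."""
    if len(packet) < 12:
        raise ValueError("DNS message shorter than its 12-byte header")
    query_id, flags = struct.unpack("!HH", packet[:4])
    return query_id, flags & 0x000F  # RCODE is the low 4 bits of the flags


def still_outstanding(sent_ids, replies):
    """Return the set of query IDs that no reply has answered yet.

    Hypothetical helper: a resolver waiting on both an A and an AAAA
    reply keeps waiting (until its timeout) while this set is non-empty.
    """
    pending = set(sent_ids)
    for packet in replies:
        query_id, _rcode = parse_header(packet)
        pending.discard(query_id)
    return pending


def fake_reply(query_id, rcode=0):
    """Build a header-only reply for illustration (QR and RA bits set)."""
    return struct.pack("!HHHHHH", query_id, 0x8080 | rcode, 0, 0, 0, 0)
```

With a well-behaved server, both IDs get answered and the set comes back empty. With the buggy behavior the comment describes, only the A query's ID is ever seen, so the AAAA ID stays in the set and the caller sits there until it times out.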

This blames "cheap DSL modems and similar devices," but if I understand my own dumps (see: tcpdump_snippet - search race.txt for example) with the "broken" resolver, that was not the case for my configuration. Separate queries were issued from the same source port but with different IDs, and the nameserver properly responded to both. Then, for inexplicable reasons, the resolver reissued the same queries a second time (reusing the same IDs, but from a new source port), before blindly trying a third set of requests with the search domain appended. Rather than "getting stuck waiting," it rapidly repeated itself.
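For anyone who wants to reproduce the "same source port, different IDs" pattern against their own nameserver and watch it in tcpdump, something like the following should do. It is only a sketch of the wire format, not what glibc does internally; build_query and send_pair are names I made up, and the server address is whatever you pass in.

```python
import random
import struct

QTYPE_A = 1      # IPv4 address record
QTYPE_AAAA = 28  # IPv6 address record


def build_query(name, qtype, query_id):
    """Build a minimal DNS query: 12-byte header plus one question."""
    # Flags 0x0100 = recursion desired; QDCOUNT=1, other counts zero.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    question = b""
    for label in name.rstrip(".").split("."):
        encoded = label.encode("ascii")
        if not 0 < len(encoded) <= 63:
            raise ValueError("each DNS label must be 1..63 bytes")
        question += bytes([len(encoded)]) + encoded
    question += b"\x00" + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    return header + question


def send_pair(sock, server, name):
    """Send A and AAAA queries back to back from one UDP socket.

    Using a single socket means both queries share a source port;
    the two IDs are drawn so that they always differ.
    """
    id_a, id_aaaa = random.sample(range(0x10000), 2)
    sock.sendto(build_query(name, QTYPE_A, id_a), (server, 53))
    sock.sendto(build_query(name, QTYPE_AAAA, id_aaaa), (server, 53))
    return id_a, id_aaaa
```

Pass send_pair a socket created with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) and the address of the nameserver in question, then read replies with recvfrom and compare their first two bytes against the returned IDs.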

Clearly something was screwy with the resolver algorithm, rather than the particular DNS server, unless I have overlooked some subtle noncompliance in the responses. If I get a chance, I hope to explore the "_nss_dns_gethostbyname4_r" behavior in greater depth and come to an absolute and reasoned conclusion. :}

To reiterate, though: In the meantime, this patch, reverting to the earlier strategies, does seem to "fix everything."