[regression] all network apps / browsers suffer from multi-second delays by default due to IPv6 DNS lookups

Bug #417757 reported by camper365
This bug affects 219 people
Affects                 Status        Importance  Assigned to     Milestone
eglibc (Ubuntu)         Invalid       High        Matthias Klose
eglibc (Ubuntu Karmic)  Fix Released  High        Unassigned
eglibc (Ubuntu Lucid)   Won't Fix     High        Matthias Klose
glibc (Fedora)          Fix Released  High

Bug Description

In Karmic, DNS lookups take a very long time with some routers, because glibc's DNS resolver tries to do IPv6 (AAAA) lookups even if there are no (non-loopback) IPv6 interfaces configured. Routers which do not respond to this cause the lookup to take 20 seconds (until the IPv6 query times out).

*** PLEASE DO NOT COMMENT ON THIS BUG unless you have something constructive to say. Everything that can be said has already been said, and if you comment, you are just adding noise. Please let those that actually know what they are doing concentrate on fixing this bug from now on. ***

If disabling IPv6 or using good DNS servers such as OpenDNS fixes the problem, you are not dealing with this bug; please refrain from complaining here in that case.

Revision history for this message
camper365 (camper365) wrote :
Revision history for this message
Micah Gersten (micahg) wrote :

Thank you for reporting this to Ubuntu. Could you please see if you have the same trouble with other browsers such as epiphany-webkit or midori?

Changed in firefox-3.5 (Ubuntu):
status: New → Incomplete
Revision history for this message
camper365 (camper365) wrote :

Yes, it does apply to other browsers.

Revision history for this message
Micah Gersten (micahg) wrote :

This would appear to be more than a Firefox problem since other browsers are involved. I'm removing the Firefox 3.5 package from the bug and asking for reassignment to the appropriate package.

tags: added: needs-reassignment
affects: firefox-3.5 (Ubuntu) → ubuntu
Changed in ubuntu:
status: Incomplete → New
summary: - Firefox is slow by default due to IPv6 DNS lookups
+ Browsers are slow by default due to IPv6 DNS lookups
Revision history for this message
Martin Olsson (mnemo) wrote : Re: Browsers are slow by default due to IPv6 DNS lookups

I've been struggling with this bug as well. For me it started with updates I installed on 3rd September, even though I had no problems like this in Karmic earlier (at that point I was running updates from about two weeks back). It affects all network apps (not just browsers). I originally filed a ticket with my ISP because I thought their DNS servers were slow.

Revision history for this message
Martin Olsson (mnemo) wrote :

What I was seeing was 20-40 second page loads for certain webpages; when I set network.dns.disableIPv6 to true, most pages load in 1-3 seconds.

Martin Olsson (mnemo)
summary: - Browsers are slow by default due to IPv6 DNS lookups
+ [karmic regression] all network apps / browsers suffer from multi-second
+ delays by default due to IPv6 DNS lookups
affects: ubuntu → linux (Ubuntu)
Revision history for this message
Jeroen Massar (massar) wrote : Re: [karmic regression] all network apps / browsers suffer from multi-second delays by default due to IPv6 DNS lookups

This is a problem with the DNS resolver.

This problem will occur for any DNS request which the DNS resolver does not support.
The proper solution is to fix the DNS resolver.

What happens:
 - The program is IPv6 enabled.
 - When it looks up a hostname, getaddrinfo() first asks for an AAAA record.
 - The DNS resolver sees the request for the AAAA record, goes "uhmmm, I dunno what this is, let's throw it away".
 - The DNS client (getaddrinfo() in libc) waits for a response..... and has to time out, as there is no response. (THIS IS THE DELAY)
 - With no records received yet, getaddrinfo() then issues the A record request. This works.
 - The program gets the A records and uses those.

This does NOT only affect IPv6 (AAAA) records, it also affects any other DNS record that the resolver does not support.
Generally these resolvers are embedded into the "NAT boxes" that consumers have.

Working solution, as we are on Linux anyway: don't use the DNS resolver in the NAT box; instead, install e.g. pdns-recursor and use that.

Of course that does not fix the broken box, which might be the NAT box or the resolvers at the ISP.
Some other people start using OpenDNS because those "work" (but that is not really true either: https://lists.dns-oarc.net/pipermail/dns-operations/2009-July/004217.html)

Note that the DNS queries go over IPv4 (transport), there is no IPv6 _connectivity_ involved here.
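The sequence above can be sketched as a small timing simulation. This is purely illustrative: the 5-second per-attempt timeout and two attempts are assumed values in the spirit of classic resolver defaults, and `broken_resolver` is a made-up stand-in for a NAT box that silently drops AAAA queries.

```python
# Illustrative simulation: why a resolver that silently drops AAAA
# queries adds a fixed multi-second delay to every hostname lookup.
# TIMEOUT and ATTEMPTS are assumptions (classic resolver defaults),
# not an exact transcription of glibc's retry logic.

TIMEOUT = 5    # seconds waited per unanswered attempt (assumed)
ATTEMPTS = 2   # attempts per record type before giving up (assumed)

def broken_resolver(record_type):
    """A NAT-box resolver that answers A queries but drops AAAA ones."""
    if record_type == "A":
        return "203.0.113.10"  # placeholder address (documentation range)
    return None                # AAAA query silently discarded, no reply

def lookup(hostname, resolver):
    """Sequential AAAA-then-A lookup, as described in the steps above."""
    elapsed = 0
    answers = []
    for record_type in ("AAAA", "A"):
        for _ in range(ATTEMPTS):
            reply = resolver(record_type)
            if reply is not None:
                answers.append((record_type, reply))
                break
            elapsed += TIMEOUT  # no reply: the client waits out the timeout
    return answers, elapsed

answers, wasted = lookup("www.example.com", broken_resolver)
print(answers)  # [('A', '203.0.113.10')]
print(wasted)   # 10 -- seconds spent before the A answer is ever used
```

With a second nameserver configured, the wasted time doubles, which is in the ballpark of the 20-second delays reported in the description.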

Revision history for this message
In , clodoaldo.pinto.neto (clodoaldo.pinto.neto-redhat-bugs) wrote :

(In reply to comment #44)

I opened another bug as this one is not the one I'm experiencing:

https://bugzilla.redhat.com/show_bug.cgi?id=520304

Revision history for this message
Markus Thielmann (thielmann) wrote :

I'm not convinced that this is a resolver bug. I'm running an IPv6-enabled system (AICCU tunnel with sixxs.net), so all IPv6 requests are answered by an IPv6-enabled DNS server, and I'm still experiencing the same problems. In addition, this bug was introduced by Karmic and didn't happen before.

Revision history for this message
In , matzilla (matzilla-redhat-bugs) wrote :

Created attachment 362819
Capture of DNS traffic during wget

Capture of DNS traffic while running wget against lwn.net on Fedora 11 + normal updates.
The provider is Orange/France Telecom; the DNS server looks like it's the IP of the Livebox (there are several million users behind a Livebox, I think).

wget takes a little more than 10s:
real 0m10.922s
user 0m0.005s
sys 0m0.013s

The DNS capture shows:
Fedora sends two requests, one A and one AAAA.
The answer for the A query is given at once.
After 5s, a retry of the A query (why? we have already received the result...).
It is answered with an A record immediately.
Then a retry of the AAAA query; this time it comes after the A answer.

After 10s there is no DNS traffic, but wget decides to fetch the page (it was stuck at "resolving host" before).

After 15s and 20s, the DNS server times out the AAAA query.

This looks timing related and dependent on the behaviour of the DNS server, as the same PC behaves differently when connecting through another provider.
Since the DNS server answers immediately with a good IP, why is Linux waiting so long?

So my questions:
Is Linux correctly getting the first A answer, or is it rejecting it (believing it to be an answer not matching the AAAA query)?
What timings are used?
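The 5-second gaps in the capture line up with classic resolv.conf defaults (timeout:5, attempts:2). Here is a sketch of the worst-case arithmetic under that assumption (real implementations may also lengthen the timeout between retry rounds, which this ignores):

```python
# Worst-case wait before giving up on a record type that is never
# answered, assuming classic resolv.conf defaults (timeout:5, attempts:2).
# These values are assumptions; check `man 5 resolv.conf` on your system.

def worst_case_delay(timeout=5, attempts=2, nameservers=1):
    """Each attempt is sent to each configured nameserver in turn,
    and each unanswered query is waited out for `timeout` seconds."""
    return timeout * attempts * nameservers

print(worst_case_delay())               # 10 -- one nameserver
print(worst_case_delay(nameservers=2))  # 20 -- matches the bug description
```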

Revision history for this message
In , matzilla (matzilla-redhat-bugs) wrote :

I'm using glibc 2.10.1-5
Disabling IPv6 in Firefox's about:config also works around the problem (I didn't try it globally, as this should work by default).

Revision history for this message
Bernard Bou (bbou) wrote :

The 5 second lag occurs with the Livebox (used by Orange, 12 million broadband internet customers in Europe). This had better be fixed, unless you want a number of users to tweak their config files to either disable IPv6 or bypass the box-based DNS server, which is not something anybody enjoys doing.

Revision history for this message
max123 (maxrest) wrote :

I also suffer from this problem; it _is_ the DNS resolver, as Jeroen analysed. There should really be a fix for the Karmic release.

Revision history for this message
Jeroen Massar (massar) wrote :

@ Markus's #8 comment: as I mentioned, "Note that the DNS queries go over IPv4 (transport), there is no IPv6 _connectivity_ involved here."

You also state "so all IPv6 requests are answered by an IPv6 enabled DNS server"; well, unless you configured IPv6 DNS resolver addresses in your /etc/resolv.conf, queries will still go over IPv4 (transport), even though they are AAAA queries. AICCU only provides IPv6 connectivity (transport); it does not configure DNS resolvers.

@ Bernard's #9 comment: most likely your Livebox contains one of these broken DNS resolvers; it happens a lot that CPEs have this issue. Try the command below to check. Configuring resolv.conf with OpenDNS or other working DNS servers (e.g. the ones of your ISP directly, instead of the Livebox) might solve your problem. Please also realize that this problem ALSO occurs on platforms other than Linux, e.g. Windows, which is what the majority of people are using; what to use is a choice of the user after all...

To verify this, do a:
for i in `cat /etc/resolv.conf | grep ^nameserver | cut -f2 -d' '`; do dig @$i www.microsoft.com AAAA; done

This should return quite quickly, even though no AAAA records for www.microsoft.com exist yet. If you have a broken resolver somewhere along the way, these requests won't return quickly (unless they are cached locally or on-path as negative answers).
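The nameserver extraction in that one-liner can also be scripted; below is a sketch in Python, where the resolv.conf contents shown are a made-up example (on a real system you would read /etc/resolv.conf and then time a `dig @server www.microsoft.com AAAA` against each address):

```python
# Extract nameserver addresses from resolv.conf text, equivalent to
# `grep ^nameserver | cut -f2 -d' '` in the shell one-liner above.

def nameservers(resolv_conf_text):
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        # a nameserver line looks like: "nameserver 192.0.2.1"
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers

# Made-up example contents; on a real system read /etc/resolv.conf instead.
example = """\
# generated by NetworkManager
search example.net
nameserver 192.0.2.1
nameserver 192.0.2.2
"""
print(nameservers(example))  # ['192.0.2.1', '192.0.2.2']
```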

Revision history for this message
Stephen Hall (stephen-richard-hall) wrote :

I have had the same problem, affecting all network activities; it is particularly a problem when performing upgrades via aptitude. The problem was completely solved by specifying OpenDNS as my DNS servers. I do not have this problem when running Jaunty, XP or Vista.

Revision history for this message
David Solbach (d-vidsolbach) wrote :

Just updated to Karmic and experienced the same problem (1-3 second delays on DNS lookup).
Switching off IPv6 DNS support in Firefox "fixes" the problem.

To do that, open up Mozilla or Firefox and type 'about:config' in the address bar.
Scroll down to "network.dns.disableIPv6"; it defaults to a value of false, change it to true.

hope that helps.

Revision history for this message
camper365 (camper365) wrote :

That's a "fix" in Firefox, but the problem still exists in every other network app (Evolution, aptitude, etc.),
so web browsing (which is what most users are doing anyway) runs at normal speed but everything else behaves as if you were on dial-up.
After the final release, if the bug isn't fixed for every app, people might start complaining to their ISP or even drop Ubuntu (or not upgrade to Karmic).

Revision history for this message
Pconfig (thomas9999) wrote :

I also notice the same problem in Kubuntu. I remember this happening before on my upgrade to Intrepid.

Revision history for this message
Brian Pitts (bpitts) wrote :

This is still present in today's build. I don't understand why this isn't prioritized as release critical, since it makes web browsing and other network-related tasks unbearably slow.

Revision history for this message
max123 (maxrest) wrote :

Right, and I also stress the incredible delay in Thunderbird, network apps on the shell, update-manager and everything else network-related, due to IPv6 lookups despite, again, not having an IPv6 address!

At the university, where I do get IPv6, everything works as usual, but at home with IPv4 it's hardly usable.

Please investigate this bug; I make myself available for testing on this topic.

regards, max

Revision history for this message
Pconfig (thomas9999) wrote :

Temporary workaround can be found here:

http://ubuntuforums.org/archive/index.php/t-1281820.html

This proves that it has something to do with DNS resolving.

Revision history for this message
Jeroen Massar (massar) wrote :

For everybody not reading the other comments, #7 actually explains what goes on....

Yes, indeed, probably the best solution is to just install a local DNS resolver (pdns-recursor), which queries the roots/gTLDs etc. itself. This is not very friendly to the general Internet, but heck, with the largest DNS servers using short TTLs and geography-based answers, it might not matter too much.

Thus kids, "apt-get install pdns-recursor" and edit your /etc/resolv.conf to point to 127.0.0.1 when you get hit by this issue.

Revision history for this message
Pconfig (thomas9999) wrote :

I really think the OpenDNS workaround is better for the time being, but neither solution is good enough. You can't tell your grandmother to edit some config files because her internet is slow.

Revision history for this message
camper365 (camper365) wrote :

I agree that this bug should be considered release critical, even if the fix is just applying the workaround. What could be added to NetworkManager is a feature where, on connect, it tries to obtain an IPv6-capable DNS server; if it succeeds, it uses the network's DNS resolver, and if it fails, it uses pdns-recursor. (I just don't think that would work for this release; maybe in Lucid.)

Revision history for this message
Zack Evans (zevans23) wrote :

I have had a privoxy go-slow, several seconds on every lookup, since installing the Karmic beta. I hadn't really noticed a problem in any other app, but web browsing did sometimes feel sluggish.

In a brainwave just now I have tried disabling ipv6 (using grub method) and now privoxy is working beautifully. I have also noticed that web browsing feels snappier generally, so I think this was slowing -all- of my apps down by a large enough fraction for me to feel the difference now.

Just to reiterate: it's repeatable for me with privoxy. IPV6 on - privoxy massive latency. IPV6 off - privoxy works fine.

I have a Draytek, so blaming the router isn't practical; these have a MASSIVE installed base. Whether it's strictly the router's fault or not, it would not be very Ubuntu of Ubuntu to get all academically correct about it; we need some sort of workaround that can be achieved by clicking buttons.

To be honest, only advanced users would want IPv6 anyway, so why not have it off by default and make it very easy to switch on?

Revision history for this message
Zack Evans (zevans23) wrote :

Should also say quite happy to test any other proposed workaround.

Revision history for this message
camper365 (camper365) wrote :

I have found that when I ping a site by hostname (for example, www.google.com) it takes a while, but if I ping the IP address (63.251.179.13) the lag is gone.

Revision history for this message
Ragnarel (ragnarel) wrote : Re: [Bug 417757] Re: [karmic regression] all network apps / browsers suffer from multi-second delays by default due to IPv6 DNS lookups

Why do I have no delays with a wireless connection, but do have them with a wired one?

Revision history for this message
Jeroen Massar (massar) wrote : Re: [karmic regression] all network apps / browsers suffer from multi-second delays by default due to IPv6 DNS lookups

@ Pconfig / #20

> You can't tell your grandmother to edit some config files because her internet is slow

Does your grandmother use Ubuntu then? If so, then just help her out in fixing the issue :)

@ Zack Evans / #23

> I have a Draytek so blaming the router isn't practical - these have a MASSIVE installed base

This problem is also in effect when the user runs Windows with IPv6 enabled. The problem lies in the DNS resolver (which might not be the NAT box, what you call the "router", but might even be at your ISP), and thus you can avoid the problem by not using the DNS resolver in the NAT box. You might of course also try to upgrade your router; maybe they fixed the problem (you upgrade your Ubuntu and other things too because they have issues, so try that).

> To be honest, only the advanced users would want IPv6 anyway, so why not have it off by default and make it very easy to switch on?

Because in a few years or so you will have to enable IPv6, as there won't be any new hosts with IPv4 addresses. As such, better to bite the bullet today and fix those IPv6 issues than to wait till you really need it.

@ camper365 / #24

Yes, that is correct: when you ping www.google.com it has to look up the hostname in DNS, while if you ping the address, it doesn't. DNS resolving (i.e. figuring out which address belongs to the requested hostname) is where the problem lies. See the hints about OpenDNS or pdns-recursor to solve it.

@ Ragnarel / #25

as per comment #11 try a:
  for i in `cat /etc/resolv.conf | grep ^nameserver | cut -f2 -d' '`; do dig @$i www.microsoft.com AAAA; done
when connected to wireless and when not. Or, for that matter, just check whether you are using the same nameservers on wireless and wired; if they differ, you already have a small part of the answer.

Revision history for this message
Martin Olsson (mnemo) wrote :

Since we're running out of time, maybe we can just ship "network.dns.disableIPv6==true" as the Firefox default? I'd love a real fix for this bug but the RC is coming up very very soon now.

Revision history for this message
Markus Thielmann (thielmann) wrote :

Is it possible that some patch changed the usage order of the nameservers from /etc/resolv.conf?

My router delivers a "dead" nameserver via DHCP [1], which was never a problem, since Ubuntu used to query the first (local) nameserver. The local nameserver resolves any given request without a problem [2]. If I remove the dead nameserver from resolv.conf, I no longer have any problems resolving DNS queries.

So it *might* be a solution to just change the usage order of the DNS servers to solve this "bug". Please note that a lot of users never experienced this problem before Karmic, so it might be hard to blame their hardware, even if that might be technically true... :-)

[1] It's a SE515, which delivers 217.237.151.97, despite any configuration.
[2] dig @192.168.1.1 www.microsoft.com AAAA without any noticeable delay

Revision history for this message
Micah Gersten (micahg) wrote :

There's at least enough information here to confirm the issue. I'll see if I can get someone to look at it.

Changed in network-manager (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Revision history for this message
DodgeV83 (spamfrelow) wrote :

This 100% fixed my problem!

1. In /etc/dhcp3/dhclient.conf add the following line:

prepend domain-name-servers 208.67.222.222,208.67.220.220;

2. In /etc/nsswitch.conf edit this line

hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4

to this

hosts: files dns

I'm not sure which one of these did it to be honest, but it's fixed!
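For context on the second edit above: the hosts line in /etc/nsswitch.conf is a chain of sources tried left to right, and the `[NOTFOUND=return]` action can stop the chain before `dns` is ever reached. Below is a rough simulation of that chain logic (heavily simplified: real NSS distinguishes more status codes, and which status mdns4_minimal actually returns for a given name is not modeled here, so the inputs are hypothetical):

```python
# Simplified model of how the nsswitch.conf "hosts" line is walked.
# Real NSS has more statuses (SUCCESS, NOTFOUND, UNAVAIL, TRYAGAIN);
# this sketch keeps just enough to show what [NOTFOUND=return] does.

def resolve(hosts_line, sources):
    """hosts_line: list of tokens, e.g. ["files", "dns"].
    sources: dict mapping source name to its simulated result
    ("found", "notfound", or "unavail")."""
    last_status = "notfound"
    for token in hosts_line:
        if token.startswith("["):
            # The action fires on the status of the *previous* source.
            if token == "[NOTFOUND=return]" and last_status == "notfound":
                return "lookup failed (chain stopped early)"
            continue
        last_status = sources.get(token, "unavail")
        if last_status == "found":
            return f"answer from {token}"
    return "lookup failed"

# Hypothetically, if mdns4_minimal reported NOTFOUND, dns would never be asked:
default_line = ["files", "mdns4_minimal", "[NOTFOUND=return]", "dns", "mdns4"]
print(resolve(default_line, {"files": "notfound",
                             "mdns4_minimal": "notfound",
                             "dns": "found"}))
# With the simplified "hosts: files dns" line, the dns source is reached:
print(resolve(["files", "dns"], {"files": "notfound", "dns": "found"}))
```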

Revision history for this message
Darren Worrall (dazworrall) wrote :

My results are a little different. At the moment I'm using a Draytek router, and am indeed suffering slow resolution in all my apps. Running the snippet above:

for i in `cat /etc/resolv.conf | grep ^nameserver | cut -f2 -d' '`; do dig @$i www.microsoft.com AAAA; done

is very quick, though. I consistently have slow resolution when running updates, yet the same command against archive.ubuntu.com is also very quick.

I'm sure the router has something to do with it; my router at home doesn't give me any trouble at all. Yet querying directly like this is reproducibly fast, while querying indirectly through update-manager is reproducibly slow.

Revision history for this message
csulok (shikakaa) wrote :

What ultimately AND universally fixed/worked around the problem for me was the following:

edit /etc/sysctl.conf and add the following to the bottom:

#Disable IPv6
net.ipv6.conf.all.disable_ipv6=1

Revision history for this message
Jeroen Massar (massar) wrote :

@ csulok / #32

What that does is avoid fixing the problem. You disable IPv6, and thus glibc plays smart and does not resolve AAAA records anymore.

Your DNS resolver, though, is still broken. You might not notice it now, but if, for instance, DNSSEC gets turned on next year, you will run into it again... (and you will probably just disable DNSSEC...)

Revision history for this message
Ragnarel (ragnarel) wrote : Re: [Bug 417757] Re: [karmic regression] all network apps / browsers suffer from multi-second delays by default due to IPv6 DNS lookups

csulok, your trick didn't solve my problem.


Revision history for this message
Antonio Roberts (hellocatfood) wrote : Re: [karmic regression] all network apps / browsers suffer from multi-second delays by default due to IPv6 DNS lookups

I'm experiencing these problems on my Dell Studio 1555 laptop with Karmic Beta. I hope it gets fixed soon!

Revision history for this message
Nech (gerard-guadall) wrote :

I have two network interfaces:
07:02.0 Network controller: Broadcom Corporation BCM4318 [AirForce One 54g] 802.11g Wireless LAN Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 02)

Things work better for me over wifi than over the wired connection.

Revision history for this message
Jeroen Massar (massar) wrote :

@ Nech / #36

> I work better using wifi than using wired connection.

So, like I ask everybody else, check to see if there is a huge latency time difference when doing:

for i in `cat /etc/resolv.conf | grep ^nameserver | cut -f2 -d' '`; do dig @$i www.microsoft.com AAAA; done

over wired and over wireless. Quicker, maybe, is to check whether you get a different set of DNS servers when connected over wired vs wireless (just check whether /etc/resolv.conf changes).

Revision history for this message
Nech (gerard-guadall) wrote :

I think the problem is not DNS. Actually, when I visit different websites in a short period of time, everything gets saturated: some websites don't load, and others take up to 2 or 3 minutes to do so. I tried it using Google Chromium as well, and the result was the same. It is a new installation of Karmic, and the upgrade just happened.

Results of wireless
-----------------------
; <<>> DiG 9.6.1-P1 <<>> @80.58.0.33 www.microsoft.com AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49350
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;www.microsoft.com. IN AAAA

;; ANSWER SECTION:
www.microsoft.com. 2921 IN CNAME toggle.www.ms.akadns.net.
toggle.www.ms.akadns.net. 247 IN CNAME g.www.ms.akadns.net.
g.www.ms.akadns.net. 265 IN CNAME lb1.www.ms.akadns.net.

;; AUTHORITY SECTION:
akadns.net. 90 IN SOA internal.akadns.net. hostmaster.akamai.com. 1256289512 90000 90000 90000 180

;; Query time: 104 msec
;; SERVER: 80.58.0.33#53(80.58.0.33)
;; WHEN: Fri Oct 23 11:20:03 2009
;; MSG SIZE rcvd: 170

wired
-------
; <<>> DiG 9.6.1-P1 <<>> @80.58.0.33 www.microsoft.com AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1196
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;www.microsoft.com. IN AAAA

;; ANSWER SECTION:
www.microsoft.com. 2805 IN CNAME toggle.www.ms.akadns.net.
toggle.www.ms.akadns.net. 140 IN CNAME g.www.ms.akadns.net.
g.www.ms.akadns.net. 144 IN CNAME lb1.www.ms.akadns.net.

;; AUTHORITY SECTION:
akadns.net. 91 IN SOA internal.akadns.net. hostmaster.akamai.com. 1256289631 90000 90000 90000 180

;; Query time: 102 msec
;; SERVER: 80.58.0.33#53(80.58.0.33)
;; WHEN: Fri Oct 23 11:22:00 2009
;; MSG SIZE rcvd: 170

Revision history for this message
Jeroen Massar (massar) wrote :

@ Nech / #38

As you have the same DNS server for both wired and wireless, most likely _your problem_ is not a DNS issue* like the ones others show here.

* = unless an upstream of your DNS server has the "drop unknown DNS records" problem and your resolver caches the negative answer correctly, which will cause any subsequent query, like the ones above, to be quick again.

To solve your problem, I guess you'll have to take a peek with Wireshark...

Revision history for this message
Zack Evans (zevans23) wrote :

My problem goes away if I disable IPv6. If I boot with IPv6 enabled, so that I have the problem, DNS lookups from the command line still happen quickly.

for i in `cat /etc/resolv.conf | grep ^nameserver | cut -f2 -d' '`; do dig @$i www.microsoft.com AAAA; done

is practically instant. (I have also tried it with some other hostnames to check that caching is not hiding the problem.)

I should add I did not have this problem in Jaunty, and no equipment has changed, only the upgrade to Karmic.

So, as I type, with IPv6 enabled, Privoxy is grinding, everything else seems OK.

If I reboot with IPv6 off, Privoxy and everything else will be OK. DNS AAAA lookups seem OK whether enabled or disabled.

So is there some other subtle interaction between Privoxy and IPv6?

camper365 (camper365)
Changed in linux (Ubuntu):
status: New → Confirmed
Micah Gersten (micahg)
tags: added: metabug
Changed in linux (Ubuntu):
assignee: nobody → IPv6 Task Force (ipv6)
Changed in network-manager (Ubuntu):
assignee: nobody → IPv6 Task Force (ipv6)
Changed in network-manager (Ubuntu):
assignee: IPv6 Task Force (ipv6) → nobody
Changed in linux (Ubuntu):
assignee: IPv6 Task Force (ipv6) → nobody
Changed in linux (Ubuntu):
assignee: nobody → IPv6 Task Force (ipv6)
Changed in network-manager (Ubuntu):
assignee: nobody → IPv6 Task Force (ipv6)
Martin Olsson (mnemo)
Changed in linux (Ubuntu):
assignee: IPv6 Task Force (ipv6) → nobody
Changed in network-manager (Ubuntu):
assignee: IPv6 Task Force (ipv6) → nobody
Martin Pitt (pitti)
Changed in linux (Ubuntu Lucid):
importance: Undecided → High
Changed in linux (Ubuntu Karmic):
importance: Undecided → High
tags: added: regression-release
removed: needs-reassignment
Changed in linux (Ubuntu Karmic):
milestone: none → karmic-updates
Martin Pitt (pitti)
Changed in network-manager (Ubuntu Karmic):
status: New → Invalid
Changed in network-manager (Ubuntu Lucid):
status: Confirmed → Invalid
Martin Pitt (pitti)
affects: linux (Ubuntu Lucid) → glibc (Ubuntu Lucid)
Changed in glibc (Ubuntu Lucid):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
status: Confirmed → Triaged
description: updated
Changed in glibc (Ubuntu Karmic):
status: New → Triaged
Changed in glibc (Ubuntu Lucid):
assignee: Canonical Foundations Team (canonical-foundations) → Matthias Klose (doko)
jordan.sc (jordanjsc)
Changed in glibc (Ubuntu Karmic):
status: Triaged → Fix Committed
Micah Gersten (micahg)
Changed in glibc (Ubuntu Karmic):
status: Fix Committed → Triaged
Nech (gerard-guadall)
Changed in glibc (Ubuntu Karmic):
status: Triaged → In Progress
status: In Progress → Confirmed
finno (finnegan)
Changed in glibc (Ubuntu Lucid):
status: Triaged → Invalid
Martin Pitt (pitti)
Changed in glibc (Ubuntu Lucid):
status: Invalid → Confirmed
Changed in glibc (Fedora):
status: Unknown → Confirmed
Carropa (carropa)
Changed in glibc (Ubuntu Karmic):
status: Confirmed → Fix Released
status: Fix Released → Confirmed
description: updated
Revision history for this message
In , jclere (jclere-redhat-bugs) wrote :

I have changed the DNS proxy option to off in my router (Netopia-3000); that fixes the problems on my boxes (F11 and F12).

Matthias Klose (doko)
Changed in glibc (Ubuntu Lucid):
status: Confirmed → Fix Released
Changed in glibc (Ubuntu Karmic):
status: Confirmed → In Progress
Martin Pitt (pitti)
tags: added: verification-needed
affects: glibc (Ubuntu Karmic) → eglibc (Ubuntu Karmic)
Changed in eglibc (Ubuntu Karmic):
status: In Progress → Fix Committed
Martin Pitt (pitti)
tags: added: verification-done
removed: verification-needed
Martin Pitt (pitti)
Changed in eglibc (Ubuntu Lucid):
status: Fix Released → Confirmed
leucomax (w-smetanig)
Changed in eglibc (Ubuntu Karmic):
status: Fix Committed → Fix Released
Martin Pitt (pitti)
Changed in eglibc (Ubuntu Karmic):
status: Fix Released → Fix Committed
Changed in eglibc (Ubuntu Karmic):
status: Fix Committed → Fix Released
Eshant (guptaeshant)
Changed in eglibc (Ubuntu Karmic):
status: Fix Released → Fix Committed
status: Fix Committed → Fix Released
Changed in eglibc (Ubuntu Lucid):
status: Confirmed → Fix Released
Martin Pitt (pitti)
Changed in eglibc (Ubuntu Lucid):
status: Fix Released → Triaged
Changed in eglibc (Ubuntu Karmic):
status: Fix Released → Invalid
Micah Gersten (micahg)
Changed in eglibc (Ubuntu Karmic):
status: Invalid → Fix Released
Changed in eglibc (Ubuntu Karmic):
status: Fix Released → Fix Committed
Steve Langasek (vorlon)
Changed in eglibc (Ubuntu Karmic):
status: Fix Committed → Fix Released
Changed in eglibc (Ubuntu Karmic):
status: Fix Released → Incomplete
status: Incomplete → In Progress
Martin Pitt (pitti)
Changed in eglibc (Ubuntu Karmic):
status: In Progress → Fix Released
Changed in eglibc (Ubuntu Karmic):
status: Fix Released → In Progress
Martin Pitt (pitti)
Changed in eglibc (Ubuntu Karmic):
status: In Progress → Fix Released
description: updated
Emmet Hikory (persia)
tags: added: ipv6
description: updated
Revision history for this message
In , cdahlin (cdahlin-redhat-bugs) wrote :

This is reliable. Some programs have perfect DNS, some don't work at all.

[sadmac@foucault coding]$ ping edge.launchpad.net
PING edge.launchpad.net (91.189.89.225) 56(84) bytes of data.
64 bytes from wildcard-launchpad-net.banana.canonical.com (91.189.89.225): icmp_seq=1 ttl=42 time=101 ms
64 bytes from wildcard-launchpad-net.banana.canonical.com (91.189.89.225): icmp_seq=2 ttl=42 time=102 ms
64 bytes from wildcard-launchpad-net.banana.canonical.com (91.189.89.225): icmp_seq=3 ttl=42 time=102 ms
64 bytes from wildcard-launchpad-net.banana.canonical.com (91.189.89.225): icmp_seq=4 ttl=42 time=102 ms
^C
--- edge.launchpad.net ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3816ms
rtt min/avg/max/mdev = 101.601/101.983/102.238/0.235 ms
[sadmac@foucault coding]$ bzr co lp:libnih libnih-error
bzr: ERROR: Connection error: Could not resolve 'edge.launchpad.net' [Errno -2] Name or service not known
[sadmac@foucault coding]$

Revision history for this message
In , cdahlin (cdahlin-redhat-bugs) wrote :

The above is on the latest F12.

steve (swchoi-choi)
Changed in eglibc (Ubuntu Lucid):
status: Triaged → Confirmed
Steve Langasek (vorlon)
Changed in eglibc (Ubuntu Lucid):
status: Confirmed → Triaged
Johan (deberghes-johan)
summary: - [karmic regression] all network apps / browsers suffer from multi-second
- delays by default due to IPv6 DNS lookups
+ [regression] all network apps / browsers suffer from multi-second delays
+ by default due to IPv6 DNS lookups
Revision history for this message
In , triage (triage-redhat-bugs) wrote :

This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 11 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Revision history for this message
In , tytus64 (tytus64-redhat-bugs) wrote :

Just like Casey I am experiencing the same problem on F12.

Clipper:~ $ ping peach.mycompany.com
PING peach.mycompany.com (10.26.1.61) 56(84) bytes of data.
64 bytes from peach.mycompany.com (10.26.1.61): icmp_seq=1 ttl=63 time=0.267 ms
64 bytes from peach.mycompany.com (10.26.1.61): icmp_seq=2 ttl=63 time=0.311 ms
^C
--- peach.mycompany.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1371ms
rtt min/avg/max/mdev = 0.267/0.289/0.311/0.022 ms
Clipper:~ $ ssh peach.mycompany.com
ssh: Could not resolve hostname peach.mycompany.com: No address associated with hostname

Here is the output from Wireshark when ssh command is issued:

24921 4305.188943 10.4.1.236 10.1.1.151 DNS Standard query A peach.mycompany.com
24922 4305.188983 10.4.1.236 10.1.1.151 DNS Standard query AAAA peach.mycompany.com
24923 4305.189460 10.1.1.151 10.4.1.236 DNS Standard query response A 10.26.1.61
24924 4305.189475 10.1.1.151 10.4.1.236 DNS Standard query response

The only way for me to fix this problem is to put the following line in /etc/hosts
10.26.1.61 peach

I should mention that this problem is _not_ unique to this network - the F12 machine is a laptop and I can see this problem at work as well as at home. It does not happen all the time, but often enough to be annoying. I'm not sure what the trigger is.

Clipper:~ $ rpm -qa |grep glibc
glibc-2.11.1-4.i686
glibc-2.11.1-4.x86_64
glibc-headers-2.11.1-4.x86_64
glibc-common-2.11.1-4.x86_64
glibc-devel-2.11.1-4.x86_64
glibc-debuginfo-2.11.1-1.x86_64

Let me know if I can help in any way to debug it.

Revision history for this message
In , triage (triage-redhat-bugs) wrote :

Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :

There is no question that the underlying problem here is defective DNS resolvers that choke on perfectly legitimate AAAA queries. That said, there are a couple of issues present in software shipped by Fedora that cause the problem to manifest itself as slowdowns noticeable by end users:

1) When called with the AI_ADDRCONFIG flag, libc's getaddrinfo() function does not disregard link-local IPv6 addresses when determining whether or not the local host has usable IPv6 connectivity. Since every IPv6-capable OS will have link-local IPv6 addresses assigned to all interfaces - regardless of any external connectivity being available or not - this essentially makes AI_ADDRCONFIG on Linux useless for the purpose of suppressing AAAA queries when they're not useful.

I've submitted a bug to the GNU libc upstream about this issue at <http://sourceware.org/bugzilla/show_bug.cgi?id=12377>.

getaddrinfo() on other operating systems (such as Apple Mac OS X and Microsoft Windows) does disregard link-local IPv6 addresses when called with AI_ADDRCONFIG, which is why the problem appears to affect GNU/Linux distributions more than other operating systems.

2) Many applications do not set the AI_ADDRCONFIG flag when calling getaddrinfo(). This includes, notably, Mozilla Firefox. However, a patch to correct this has recently been committed to the mozilla-central development repo and will likely be part of Firefox 4.0 beta 11 (hopefully also 3.6.15), see <https://bugzilla.mozilla.org/show_bug.cgi?id=614526>. Microsoft Windows enables the use of AI_ADDRCONFIG as the system-wide default, as far as I know, which explains why it is able to cope better with those broken middleware boxes. Mac OS X does not set AI_ADDRCONFIG by default; however, it has an extremely short timeout waiting for AAAA responses after the A response has been answered (around 125ms), which in turn hides the problem from most end users. Additionally, most major browsers (except Firefox) do set AI_ADDRCONFIG explicitly, which suppresses the problematic AAAA queries in the first place.

So what Fedora could do to avoid this problem is 1) to develop and include a patch to glibc that makes getaddrinfo() ignore link-local addresses for AI_ADDRCONFIG purposes, and 2) to back-port the NSPR patch already committed to mozilla-central to the version of Firefox shipped (or wait until Mozilla releases a new version with the patch already included).

Tore
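As an editor's illustration of the calling convention Tore describes (a minimal sketch, not part of either attached patch; the loopback literal is a placeholder chosen so the example needs no external DNS):

```c
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/* Resolve a name the way a well-behaved dual-stack client should:
 * AF_UNSPEC so both families are considered, plus AI_ADDRCONFIG so
 * the resolver only asks for address families actually configured on
 * the local host. Whether a lone link-local IPv6 address counts as
 * "configured" is exactly the bug discussed in this thread. */
static int resolve(const char *host)
{
    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_ADDRCONFIG;

    int err = getaddrinfo(host, NULL, &hints, &res);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(err));
        return -1;
    }
    for (p = res; p != NULL; p = p->ai_next) {
        char buf[INET6_ADDRSTRLEN];
        const void *addr = (p->ai_family == AF_INET)
            ? (const void *)&((struct sockaddr_in *)p->ai_addr)->sin_addr
            : (const void *)&((struct sockaddr_in6 *)p->ai_addr)->sin6_addr;
        printf("%s %s\n",
               p->ai_family == AF_INET ? "A   " : "AAAA",
               inet_ntop(p->ai_family, addr, buf, sizeof buf));
    }
    freeaddrinfo(res);
    return 0;
}

int main(void)
{
    return resolve("127.0.0.1") == 0 ? 0 : 1;
}
```

On an IPv4-only host with a correctly behaving AI_ADDRCONFIG, a call like this should produce only A results and no AAAA query on the wire.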

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :

Created attachment 478268
Solution 1/2: Make getaddrinfo()+AI_ADDRCONFIG ignore link-locals

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :

Created attachment 478270
Solution 2/2: Make Mozilla Firefox use AI_ADDRCONFIG when calling getaddrinfo()

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :

The two patches I've just attached solve this problem for most users:

The first makes getaddrinfo() ignore link-local addresses when called with the AI_ADDRCONFIG flag set. This makes getaddrinfo() avoid querying for AAAAs when the host has no IPv6 connectivity, provided that the AI_ADDRCONFIG flag is set. This brings glibc's getaddrinfo() behaviour in line with Mac OS X and Windows.

The second makes Mozilla Firefox use AI_ADDRCONFIG when calling getaddrinfo(). Note that the Mozilla release drivers have already approved this patch for inclusion on the 3.6.x branch, and it has already been committed to Firefox 4.0 (it's included in beta11).

Please apply.

(Of course, there might be applications other than Mozilla Firefox that do not set AI_ADDRCONFIG, which would require similar patches. However, Mozilla Firefox is the obvious one and likely the source of most user complaints.)

Tore

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :

Okay, so this is still a problem. What happens is:

1) the user enters some host name into his web browser or other application of choice running on a machine connected to an IPv4-only Ethernet network
2) the application kicks off a getaddrinfo() call for the host name, using AF_UNSPEC and AI_ADDRCONFIG
3) getaddrinfo() transmits both IN A (IPv4) and IN AAAA (IPv6) DNS queries to the upstream resolver
4) The upstream resolver, which is typically some cheapo home gateway or something, doesn't understand the IN AAAA queries and either doesn't respond to them at all or screws them up somehow
5) getaddrinfo() doesn't get a valid answer for the IN AAAA queries (valid answer could include NXDOMAIN or NODATA status codes), retransmits them, sits around waiting
6) user wonders why the web page or whatever takes "forever" to load, goes to submit/comment on bugs such as this one
7) getaddrinfo() finally times out the IN AAAA queries, returns IPv4 results to the application
8) lather rinse repeat

AI_ADDRCONFIG *should* have solved this issue by suppressing IN AAAA queries from IPv4-only machines. However, the auto-configured IPv6 link-local addresses on all Ethernet interfaces cause getaddrinfo() to consider the machine IPv6-capable, so it no longer suppresses IN AAAA queries. More info here:

https://fedoraproject.org/wiki/Networking/NameResolution/ADDRCONFIG#Problem_2:_IN_AAAA_DNS_query_suppression_from_Ethernet-connected_IPv4-only_hosts

Revision history for this message
In , psimerda (psimerda-redhat-bugs) wrote :

I see no reason why this shouldn't be fixed. We are working on solutions, all information here:

https://fedoraproject.org/wiki/Networking/NameResolution/ADDRCONFIG

Related fedora feature page:

https://fedoraproject.org/wiki/Features/DualstackNetworking

Adding to the 'dualstack' tracker bug and modifying the summary.

Revision history for this message
In , psimerda (psimerda-redhat-bugs) wrote :

*** Bug 697149 has been marked as a duplicate of this bug. ***

Revision history for this message
In , psimerda (psimerda-redhat-bugs) wrote :

*** Bug 459756 has been marked as a duplicate of this bug. ***

Revision history for this message
In , fedora-admin-xmlrpc (fedora-admin-xmlrpc-redhat-bugs) wrote :

This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.

Revision history for this message
In , bcotton (bcotton-redhat-bugs) wrote :

This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Revision history for this message
In , markzzzsmith (markzzzsmith-redhat-bugs) wrote :

I disagree with making getaddrinfo() consider a host with only IPv6 link-local addresses to not have IPv6 connectivity. It does, it has IPv6 connectivity to all other IPv6 hosts on the directly attached links which also all have link-local addresses. This is actually the point of hosts automatically configuring link-local addresses on all interfaces all the time, and the IPv6 Addressing Architecture specifying that all interfaces must have link-local addresses - so that they can at a minimum always reach their on-link neighbors via link-local addressing. This is also why protocols such as IPv6 neighbor discovery, Multicast Listener Discovery and routing protocols such as OSPF use link-local addresses as source and/or destination addresses to reach their neighbors.

Combine autoconfigured IPv6 link-local addresses with a service discovery protocol such as Multicast DNS or SSDP, and you have Zero Configuration networking without any user intervention. Compare that with IPv4, where support for 169.254.0.0/16 is patchy, because it is done in userspace via DHCPv4. IPv6 will universally and reliably provide zero configuration networking.

I think it is reasonable to make hosts more robust against broken devices in the network, but ignoring IPv6 link-local connectivity and then suppressing AAAA queries is not the solution. Happy Eyeballs (RFC6555) and IPv6 source and destination address selection (RFC6724) are.

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :

(In reply to comment #64)
> I disagree with making getaddrinfo() consider a host with only IPv6
> link-local addresses to not have IPv6 connectivity.

That's not what this bug report is about. It's about suppressing DNS "IN AAAA" queries if the host only has link-local addresses and AI_ADDRCONFIG is supplied.

> Combine autoconfigured IPv6 link-local addresses with a service discovery
> protocol such as Multicast DNS or SSDP,

This bug report is specifically about DNS. It's not about MDNS, SSDP, /etc/hosts, or any other NSS backends.

> I think it is reasonable to hosts more robust against broken devices in the
> network, but ignoring IPv6 link-local connectivity and then suppressing AAAA
> queries is not the solution.

How is it useful for a host with only link-local addresses to perform "IN AAAA" DNS queries? Keep in mind that in order to use a link-local address, you need to supply an interface scope (e.g., "fe80::1%eth0"), and that DNS cannot supply this information.

Tore
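The scope requirement Tore mentions can be demonstrated directly (editor's sketch; "lo" is used because it always exists on Linux, so no network or DNS is involved): getaddrinfo() accepts a link-local literal only together with a zone, and returns the interface index in sin6_scope_id - precisely the piece of information an AAAA record cannot carry.

```c
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <net/if.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Parse an IPv6 link-local literal with an explicit zone (e.g. "%lo").
 * The zone is exactly what DNS has no way to carry: an AAAA record can
 * hold fe80::1, but never the "%eth0" part needed to actually use it.
 * Returns the kernel interface index found in sin6_scope_id, or 0 on
 * failure. */
static unsigned int scope_of(const char *literal)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET6;
    hints.ai_flags = AI_NUMERICHOST;   /* literal only, no DNS lookup */

    if (getaddrinfo(literal, NULL, &hints, &res) != 0)
        return 0;
    unsigned int scope =
        ((struct sockaddr_in6 *)res->ai_addr)->sin6_scope_id;
    freeaddrinfo(res);
    return scope;
}

int main(void)
{
    printf("scope_id of fe80::1%%lo = %u\n", scope_of("fe80::1%lo"));
    return 0;
}
```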

Revision history for this message
In , markzzzsmith (markzzzsmith-redhat-bugs) wrote :
Download full text (3.9 KiB)

(In reply to comment #65)
> (In reply to comment #64)
> > I disagree with making getaddrinfo() consider a host with only IPv6
> > link-local addresses to not have IPv6 connectivity.
>
> That's not what this bug report is about. It's about suppressing DNS "IN
> AAAA" queries if the host only has link-local addresses and AI_ADDRCONFIG is
> supplied.
>

I understand that.

The AI_ADDRCONFIG flag does not preclude link-local addresses. From RFC3493:

" If the AI_ADDRCONFIG flag is specified, IPv4 addresses shall be
   returned only if an IPv4 address is configured on the local system,
   and IPv6 addresses shall be returned only if an IPv6 address is
   configured on the local system. The loopback address is not
   considered for this case as valid as a configured address."

Note that loopback addresses are explicitly excluded, so the designers clearly did think about which address types to exclude.

> > Combine autoconfigured IPv6 link-local addresses with a service discovery
> > protocol such as Multicast DNS or SSDP,
>
> This bug report is specifically about DNS. It's not about MDNS, SSDP,
> /etc/hosts, or any other NSS backends.
>

getaddrinfo() is not a DNS protocol specific API call, and is used in front of all those NSS backends, so that applications don't have to be exposed to how the address information was determined. For example, I run MDNS at home, and when I enable it, all of my IPv6 applications automatically work with it.

Here is what RFC3493 describes it as:

6.1 Protocol-Independent Nodename and Service Name Translation

   Nodename-to-address translation is done in a protocol-independent
   fashion using the getaddrinfo() function.

getaddrinfo() can return all the information necessary to use a link-local address i.e. both the address, and in the interface index, via the sin6_scope_id field of the sockaddr_in6 structure that is returned via the ai_addr field.

By making AI_ADDRCONFIG ignore link-local addresses, the getaddrinfo() call becomes broken for NSS backends that can provide both the link-local address and the corresponding interface index, such as MDNS or any other future ones.

Perhaps the DNS NSS backend could provide it, by returning the interface index of the interface it received the response on if a link-local address is returned. On the common single-homed host, this is likely to be the correct interface index for the link-local address.

> > I think it is reasonable to hosts more robust against broken devices in the
> > network, but ignoring IPv6 link-local connectivity and then suppressing AAAA
> > queries is not the solution.
>
> How is it useful for a host with only link-local addresses to perforn "IN
> AAAA" DNS queries? Keep in mind that in order to use a link-local address,
> you need to supply an interface scope (e.g., "fe80::1%eth0"), and that DNS
> cannot supply this information.
>

Something else outside of DNS could provide the interface information, and the application combines the two. Specifying a hostname (perhaps in /etc/hosts) and an interface will be much simpler than specifying literal link-local addresses, because getaddrinfo() won't look up IPv6 addresses when the host only has link-local addresses.

T...

Read more...

Revision history for this message
In , horsley1953 (horsley1953-redhat-bugs) wrote :

I thought the obvious problem with this was addressed way up near the top in comment 20 when someone pointed out that using the same port for the IPv4 and IPv6 queries gave firewalls fits. Ulrich Drepper had one of his standard "purity over practicality" tantrums and refused to change it to use two different ports to accommodate dumb firewalls, but since Ulrich is gone now, perhaps saner heads could revisit that? (And perhaps revisit all other bug fixes rejected over the years by Ulrich? :-).

Revision history for this message
In , psimerda (psimerda-redhat-bugs) wrote :
Download full text (5.7 KiB)

(In reply to comment #66)
> The AI_ADDRCONFIG flag does not preclude link-local addresses. From RFC3493:
>
>
> " If the AI_ADDRCONFIG flag is specified, IPv4 addresses shall be
> returned only if an IPv4 address is configured on the local system,
> and IPv6 addresses shall be returned only if an IPv6 address is
> configured on the local system. The loopback address is not
> considered for this case as valid as a configured address."
>
> Note that loopback addresses are, so the designers specifically thought
> about exclusion of addresses types.

We are not discussing the designers' virtues but the technical issues. The RFC is (1) INFORMATIONAL and (2) wrong. For more detailed information, see:

https://fedoraproject.org/wiki/Networking/NameResolution/ADDRCONFIG

(any comments of technical value welcome)

> For example, I run MDNS at home,
> and when I enable it, all of my IPv6 applications automatically work with it.

This is not true. Just try mDNS with link-local addresses (which you mentioned) and you will realize that this feature is absent with the current glibc and nss-mdns.

> getaddrinfo() can return all the information necessary to use a link-local
> address i.e. both the address, and in the interface index, via the
> sin6_scope_id field of the sockaddr_in6 structure that is returned via the
> ai_addr field.

getaddrinfo() can, while the NSS backends cannot. Therefore currently getaddrinfo() would only return scope_id for IPv6 literals, not mDNS nor any similar protocol.

> By making AI_ADDRCONFIG ignore link-local addresses, the getaddrinfo() call
> becomes broken for NSS backends

Currently false. You can't break a feature that is absent.

> Perhaps the DNS NSS backend could provide it,

You don't need scope_id for global addresses and you don't need this information with DNS responses at all.

> Something else outside of DNS could provide the interface information, and
> the application combines them.

I don't see the need for that. DNS returns global addresses. Global addresses don't need scope_id.

> Specifying a hostname (perhaps in /etc/hosts)

Not sure whether /etc/hosts can be used to provide scope_id.

> and an interface will be much simpler

There is currently no standard way to do that. And I don't think it is valuable enough to seek standardization for that.

> than specifying literal link-local
> addresses because getaddrinfo() won't lookup IPv6 addresses when the host
> only has link-local addresses.

There's a much easier solution. Just don't apply the same rules to mDNS you apply to DNS. I believe all of this is already described in:

https://fedoraproject.org/wiki/Networking/NameResolution/ADDRCONFIG

> The "Happy Eyeballs" technique (RFC6555) wasn't just intended to be applied
> to web browsers, according to this draft from Fred Baker:
>
> "Happier Eyeballs"
> https://www.ietf.org/id/draft-baker-happier-eyeballs-00.txt
>
> and could probably be applied to the DNS "application". For example, off the
> top of my head:
>
> 1) issue a standard DNS query including both A and AAAA queries.

In the case described by this bug report, there's no need to query AAAA as global routing is not available anywa...

Read more...

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :
Download full text (3.6 KiB)

Pavel already responded to most of your points, so I'll try to avoid just repeating them.

(In reply to comment #66)

> getaddrinfo() is not a DNS protocol specific API call, and is used in front
> of all those NSS backends, so that applications don't have to be exposed to
> how the address information was determined. For example, I run MDNS at home,
> and when I enable it, all of my IPv6 applications automatically work with it.

Well, again, this isn't about MDNS. Nobody is suggesting to make the MDNS backend ignore link-locals if called from getaddrinfo() w/AI_ADDRCONFIG. This bug is specifically about the DNS backend's behaviour; MDNS is out of scope.

> By making AI_ADDRCONFIG ignore link-local addresses, the getaddrinfo() call
> becomes broken for NSS backends that can provide both the link-local address
> and the corresponding interface index, such as MDNS or any other future ones.

See above, this is about the DNS backend *only*.

> Perhaps the DNS NSS backend could provide it, by returning the interface
> index of the interface it received the response on if a link-local address
> is returned. On the common single-homed host, this is likely to be the
> correct interface index for the link-local address.

This is a flawed assumption, even in the single-homed host case. One obvious example: If you run a local caching resolver (which I believe NetworkManager has native support for doing these days), you'll end up with all the returned link-local addresses being scoped to the "lo" interface, which is probably not what you want.

> Something else outside of DNS could provide the interface information, and
> the application combines them. Specifying a hostname (perhaps in /etc/hosts)
> and an interface will be much simpler than specifying literal link-local
> addresses because getaddrinfo() won't lookup IPv6 addresses when the host
> only has link-local addresses.

It would appear to me that the proper thing for such an application to do is to simply not use AI_ADDRCONFIG. However, does such an application actually exist, or are you inventing it just to support your position?

> 1) issue a standard DNS query including both A and AAAA queries.

Not possible. There is no single DNS query that requests both A and AAAA responses. (If, by any chance, you want to say "ANY" right now - don't, it doesn't do what you think it does.)

> 2) if no response is received after 400ms (roughly half way around the
> world), issue two individual queries, one for an A and one for an AAAA.

Two individual queries is what's being done today, and that's the only thing you can do. Also, 400ms, or even multiple seconds, is too short a timeout: the major part of a DNS lookup isn't the single RTT to the resolver listed in /etc/resolv.conf, it's waiting for that resolver to actually find the record in question. This is the sum of all RTTs to all the authoritative name servers in the delegation chain, potentially including timeouts and retransmits at some of the steps.

The only way to get Happy Eyeballs-ish behaviour using getaddrinfo() is if you run an IPv4-only thread with getaddrinfo(AF_INET)->connect(AF_INET), and a similar one for AF_INET6. You can't do it wit...

Read more...
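The thread-per-family approach described above can be sketched as follows (editor's illustration; the loopback literals are placeholders so the sketch is self-contained, and a real Happy Eyeballs client would go on to race connect() attempts on the two result lists):

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>

/* One resolver thread per address family: an AF_INET thread and an
 * AF_INET6 thread each run their own getaddrinfo(), so a slow or
 * dead lookup in one family never blocks the other. */
struct lookup {
    int family;
    const char *host;
    int err;
    struct addrinfo *res;
};

static void *worker(void *arg)
{
    struct lookup *l = arg;
    struct addrinfo hints;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = l->family;
    hints.ai_socktype = SOCK_STREAM;
    l->err = getaddrinfo(l->host, "80", &hints, &l->res);
    return NULL;
}

int main(void)
{
    struct lookup v4 = { AF_INET,  "127.0.0.1", 0, NULL };
    struct lookup v6 = { AF_INET6, "::1",       0, NULL };
    pthread_t t4, t6;

    pthread_create(&t4, NULL, worker, &v4);
    pthread_create(&t6, NULL, worker, &v6);
    pthread_join(t4, NULL);
    pthread_join(t6, NULL);

    /* A real client would now race connect() on both result lists
     * and keep whichever connection completes first. */
    printf("IPv4 lookup: %s\n", v4.err ? gai_strerror(v4.err) : "ok");
    printf("IPv6 lookup: %s\n", v6.err ? gai_strerror(v6.err) : "ok");

    if (v4.res) freeaddrinfo(v4.res);
    if (v6.res) freeaddrinfo(v6.res);
    return v4.err && v6.err ? 1 : 0;
}
```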

Revision history for this message
In , markzzzsmith (markzzzsmith-redhat-bugs) wrote :
Download full text (6.3 KiB)

(In reply to comment #69)
> Pavel already responded to most of your points, so I'll try to avoid just
> repeating his points.)
>
> (In reply to comment #66)
>
> > getaddrinfo() is not a DNS protocol specific API call, and is used in front
> > of all those NSS backends, so that applications don't have to be exposed to
> > how the address information was determined. For example, I run MDNS at home,
> > and when I enable it, all of my IPv6 applications automatically work with it.
>
> Well, again, this isn't about MDNS. Nobody is suggesting to make the MDNS
> backend ignore link-locals if called from getaddrinfo() w/AI_ADDRCONFIG.
> This bug is specifically about the DNS backend's behaviour; MDNS is out of
> scope.
>

There were no qualifiers on your described behaviour. You may have been talking about DNS, but the description of the change of behaviour to AI_ADDRCONFIG did not specify that it was limited to the DNS backend.

> > By making AI_ADDRCONFIG ignore link-local addresses, the getaddrinfo() call
> > becomes broken for NSS backends that can provide both the link-local address
> > and the corresponding interface index, such as MDNS or any other future ones.
>
> See above, this is about the DNS backend *only*.
>

Again, you had no qualifiers.

> > Perhaps the DNS NSS backend could provide it, by returning the interface
> > index of the interface it received the response on if a link-local address
> > is returned. On the common single-homed host, this is likely to be the
> > correct interface index for the link-local address.
>
> This is a flawed assumption, even in the single-homed host case. One obvious
> example: If you run a local caching resolver (which I believe NetworkManager
> has native support for doing these days), you'll end up with all the
> returned link-local addresses being scoped to the "lo" interface, which is
> probably not what you want.
>

Then the cache is broken. It should be caching all the information that would be returned in the sockaddr structure returned to getaddrinfo(), not just the returned IP addresses.

> > Something else outside of DNS could provide the interface information, and
> > the application combines them. Specifying a hostname (perhaps in /etc/hosts)
> > and an interface will be much simpler than specifying literal link-local
> > addresses because getaddrinfo() won't lookup IPv6 addresses when the host
> > only has link-local addresses.
>
> It would appear to me that the proper thing for such an application to do is
> to simply not use AI_ADDRCONFIG. However, does such an application actually
> exist, or are you inventing it just to support your position?
>

I don't know if an application like this exists, but I doubt you know absolutely that it doesn't exist. Where are the restrictions that say such an application can't exist? The definition of AI_ADDRCONFIG didn't prohibit them, or even make recommendations against them.

You're asserting that link-locals aren't being stored in DNS. How do you know that? Have you queried all the DNS space in the world?

There have been no prohibitions on link-local addresses being put in DNS, and now you are creating one, and are asserting that as you've...

Read more...

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :
Download full text (4.2 KiB)

(In reply to comment #70)

> There were no qualifiers on your described behaviour. You may have been
> talking about DNS, but the description of the change of behaviour to
> AI_ADDRCONFIG did not specify that it was limited to the DNS backend.

The title of this bug is:

«getaddrinfo() with AI_ADDRCONFIG doesn't suppress ****AAAA DNS queries**** on IPv4-only networks»

(emphasis mine)

If that's not a crystal clear qualifier, I don't know what is.

> Then the cache is broken. It should be caching all the information that
> would be returned in the sockaddr structure returned to getaddrinfo(), not
> just the returned IP addresses.

glibc speaks to a caching resolver (e.g. dnsmasq) on 127.0.0.1 or ::1, using the regular DNS protocol. The DNS protocol has no means of communicating an interface scope ID. So how exactly would this work?

> You're asserting that link-locals aren't being stored in DNS. How do you
> know that? Have you queried all the DNS space in the world?

No, but I am asserting that storing link-locals in DNS is completely pointless, as it cannot possibly work, because there is no way the DNS protocol can communicate a scope ID to the querier.

> There has been no prohibitions on link-local addresses being put in DNS, and
> now you are creating one, and are asserting that as you've never seen reason
> to do so, nobody else has either.
>
> Here is a realistic scenario where link-local addresses would usefully be
> stored in DNS.
>
> An organisation may want to create organisation wide unique static
> link-local addresses, assigning them to their routers' interfaces. This
> would make the link-local addresses independent of the MAC addresses of the
> routers interfaces, and would also make the use of link-local addresses as
> e.g., static route next hops, simpler and less error prone because there are
> no intentional duplicates. e.g., their first router's first configured
> interface would have fe80::1, their e.g. 10th router's first configured
> interface might be fe80::15, depending on how many interfaces the other
> routers have.
>
> To document the static link local addresses the following sorts of DNS
> records are created (using router10, interface eth0 as an example)
>
> eth0.rtr10.example.com. IN AAAA fe80::15
> eth0.rtr10.example.com. IN TXT "Ethernet 0 on Router 10, MAC addr
> 02:00:00:00:00:01"
>
> 5.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa.
> IN PTR eth0.rtr10.example.com.

Red herring. Making getaddrinfo() with AI_ADDRCONFIG suppress IN AAAA queries has nothing to do with this use of DNS as essentially a documentation tool, and would not "prohibit" it in any way.

What you can't do, though, is e.g. "ssh eth0.rtr10.example.com" - regardless of presence of IPv4 addresses on the host, IPv6 addresses on the host, and whether or not the ssh application uses AI_ADDRCONFIG.

> The happy eyeball behaviour would be within the DNS backend, hidden from the
> application behind the getaddrinfo(,AI_ADDRCONFIG) call.

Well, you can say that this behaviour is already present within getaddrinfo(). If called without AI_ADDRCONFIG (or with on a host that has the required global addresses configur...

Read more...

Revision history for this message
In , markzzzsmith (markzzzsmith-redhat-bugs) wrote :
Download full text (7.0 KiB)

(In reply to comment #71)
> (In reply to comment #70)
>
> > There were no qualifiers on your described behaviour. You may have been
> > talking about DNS, but the description of the change of behaviour to
> > AI_ADDRCONFIG did not specify that it was limited to the DNS backend.
>
> The title of this bug is:
>
> «getaddrinfo() with AI_ADDRCONFIG doesn't suppress ****AAAA DNS queries****
> on IPv4-only networks»
>
> (emphasis mine)
>
> If that's not a crystal clear qualifier, I don't know what is.
>

You continue to miss the point. *Your* proposal on how to *fix* the problem was to change the behaviour of AI_ADDRCONFIG, regardless of the backend, because you didn't *specify* what backend your solution applied to.

Even then, the title is not actually saying what the problem is. The actual problem is CPE or DNS servers that did not handle IPv6 AAAA queries correctly.

> > Then the cache is broken. It should be caching all the information that
> > would be returned in the sockaddr structure returned to getaddrinfo(), not
> > just the returned IP addresses.
>
> glibc speaks to a caching resolver (e.g. dnsmasq) on 127.0.0.1 or ::1, using
> the regular DNS protocol.

It can also speak to a caching resolver directly, as it does with nscd. That is the cache that I thought you were talking about, because it is part of glibc.

> The DNS protocol has no means of communicating an
> interface scope ID. So how exactly would this work?
>
> > You're asserting that link-locals aren't being stored in DNS. How do you
> > know that? Have you queried all the DNS space in the world?
>
> No, but I am asserting that storing link-locals in DNS is completely
> pointless,

Well, as you saw, I pointed out a valid and reasonable use for storing link-locals in DNS, so it isn't completely pointless.

> as it cannot possibly work, because there is no way the DNS
> protocol can communicate a scope ID to the querier.
>

It doesn't need to; that information can be gleaned from some other source, such as a command-line option or a configuration file. In this instance, DNS is still useful because it provides a much simpler and easier-to-type name for an IPv6 link-local address, even though it isn't by itself enough information to use the returned address.

> > There has been no prohibitions on link-local addresses being put in DNS, and
> > now you are creating one, and are asserting that as you've never seen reason
> > to do so, nobody else has either.
> >
> > Here is a realistic scenario where link-local addresses would usefully be
> > stored in DNS.
> >
> > An organisation may want to create organisation wide unique static
> > link-local addresses, assigning them to their routers' interfaces. This
> > would make the link-local addresses independent of the MAC addresses of the
> > routers interfaces, and would also make the use of link-local addresses as
> > e.g., static route next hops, simpler and less error prone because there are
> > no intentional duplicates. e.g., their first router's first configured
> > interface would have fe80::1, their e.g. 10th router's first configured
> > interface might be fe80::15, depending on how many int...

Read more...

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :
Download full text (7.9 KiB)

(In reply to comment #72)

> You continue to miss the point. *Your* proposal on how to *fix* the problem
> was to change the behaviour of AI_ADDRCONFIG, regardless of the backend,
> because you didn't *specify* what backend your solution applied to.

My very first response to you in this thread, in comment #65, began like this:

«That's not what this bug report is about. It's about suppressing DNS "IN AAAA" queries [...]»

So how you can claim that I am not clear about talking specifically about DNS is beyond me, to be honest. It should in any case be clear by now, I hope.

> Even then, the title is not actually saying what the problem is. The actual
> problem is CPE or DNS servers that did not handle IPv6 AAAA queries
> correctly.

Such CPEs and DNS servers are buggy, true. However, it is in our users' best interest to help them avoid tickling these bugs, because it leads to crappy user experiences and bug reports with a huge number of subscribers:

https://bugzilla.redhat.com/show_bug.cgi?id=459756
https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/417757

It sucks extra because this is perceived to be a Linux-specific problem. MS Windows and Apple Mac OS X do interpret the AI_ADDRCONFIG flag in the proposed way (i.e., they will suppress IN AAAA queries if the host only has link-local addresses configured). (I haven't verified that this behaviour is still in place in the latest versions of those operating systems, though.)

> > glibc speaks to a caching resolver (e.g. dnsmasq) on 127.0.0.1 or ::1, using
> > the regular DNS protocol.
>
> It can also speak to a caching resolver directly, as it does with nscd. That
> is the cache that I though you were talking about, because it is part of
> glibc.

NSCD is *not* a resolver. NSCD knows nothing of AAAA queries or the DNS protocol at all. The only thing NSCD can do is to cache results that came from NSS backends, such as - you guessed it - DNS.

> Well, as you saw, I pointed out a valid and reasonable use for storing
> link-locals in DNS, so it isn't completely pointless.

This is still a red herring. If you use DNS as a documentation tool like you've outlined, there's no reason why you'd use AI_ADDRCONFIG when extracting the records. Otherwise the "documentation" would look different when read on a computer with no IPv6 addresses (not even link-locals), or on a Mac/Windows computer with IPv6 link-locals, than it would when read on a machine with global IPv6 (or a Linux machine with IPv6 link-locals). I find it far-fetched that anyone would use getaddrinfo() for "reading" such "DNS documentation" to begin with, as you cannot retrieve your TXT records with it, for example.

> > as it cannot possibly work, because there is no way the DNS
> > protocol can communicate a scope ID to the querier.
>
> It doesn't need to, that information can be gleaned from some other source,
> such as a command line option, or a configuration file.

FWIW, it simply doesn't work in a fully updated Fedora 18:

tore@wrath:~$ host ll.fud.no
ll.fud.no has IPv6 address fe80::230:1bff:febc:7f23
tore@wrath:~$ ssh ll.fud.no%eth0
ssh: Could not resolve hostname ll.fud.no%eth0: Name or service not known
tore@wrath:~$ ssh f...

Read more...

Revision history for this message
In , psimerda (psimerda-redhat-bugs) wrote :
Download full text (4.7 KiB)

(In reply to comment #73)
> My very first response to you in this thread, in comment #65, began like
> this:
>
> «That's not what this bug report is about. It's about suppressing DNS "IN
> AAAA" queries [...]»
>
> So how you can claim that I am not clear about talking specifically about
> DNS is beyond me, to be honest. It should in any case be clear by now, I
> hope.

I think it is crystal clear and if anyone wants to have a good summary, there's still:

https://fedoraproject.org/wiki/Networking/NameResolution/ADDRCONFIG

> > Even then, the title is not actually saying what the problem is. The actual
> > problem is CPE or DNS servers that did not handle IPv6 AAAA queries
> > correctly.
>
> Such CPEs and DNS servers are buggy, true. However, it is in our users' best
> interest to help them avoid tickling these bugs, because it leads to crappy
> user experiences and bug reports with a huge number of subscribers:

+1

> It sucks extra because this is perceived to be a Linux-specific problem.
> MS Windows and Apple Mac OS X do interpret the AI_ADDRCONFIG flag in the
> proposed way (i.e., they will suppress IN AAAA queries if the host only has
> link-local addresses configured).

Please keep being specific about whether you're talking about DNS or about name resolution in general. I don't think the Apple folks would break their link-local name resolution deliberately.

> NSCD is *not* a resolver.

+1

Using NSCD with network name resolution and AI_ADDRCONFIG sounds dangerous to me.

> > Well, as you saw, I pointed out a valid and reasonable use for storing
> > link-locals in DNS, so it isn't completely pointless.

This is irrelevant to the problem in this bug report, as NSS backends currently don't convey scope_id at all.

With that in mind, I think we should stop polluting this bug report with link-local in DNS as it's irrelevant in the current situation. Start a new bug report and link it from here, if you're still interested, and describe your use case there.

> The standard CLI frontend for getaddrinfo(), "getent ahosts", *doesn't* use
> AI_ADDRCONFIG, for a very good reason.

+1

According to my own micro-research, AI_ADDRCONFIG is only good for one specific purpose, which is a loop over getaddrinfo() results with a connect() in each step.

https://fedoraproject.org/wiki/Networking/NameResolution#Connecting_to_services_using_getaddrinfo.28.29
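That getaddrinfo()+connect() loop can be sketched as follows (the function name `connect_first` is illustrative and error handling is trimmed): each returned address is tried in order, so an address family the host cannot use would only waste a connection attempt, which is exactly what AI_ADDRCONFIG avoids.

```c
/* Sketch: the getaddrinfo()+connect() loop that AI_ADDRCONFIG is meant
 * for. Each returned address is tried in order until one connect()
 * succeeds. Illustrative code, not a hardened client. */
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect to host:port over TCP; return a connected fd, or -1. */
int connect_first(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_ADDRCONFIG; /* skip families this host lacks */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    int fd = -1;
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;          /* connected */
        close(fd);
        fd = -1;            /* try the next address */
    }
    freeaddrinfo(res);
    return fd;
}
```

This is the canonical consumer of AI_ADDRCONFIG: the caller doesn't care what is stored in DNS, only which destination it can reach.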

> AI_ADDRCONFIG is counter-productive if your ultimate goal is to learn what
> address records are in DNS, because it would arbitrarily hide records from
> you depending on the machine you're running it on - IN AAAA records would be
> hidden on IPv4-only machines, and IN A records would be hidden on IPv6-only
> machines.

+1

> AI_ADDRCONFIG is only useful if the main goal isn't to dump the addresses,
> but to actually get an address that in turn will be used as a destination
> for communication with. As in, when getaddrinfo() is just a mandatory step
> towards the ultimate goal making a connect() somewhere.

Exactly. Any other discussions than those related to getaddrinfo()+connect() should be kept off this bug report.

> tore@wrath:~$ host ll.fud.no
> ll.fud.no has IPv6 address fe80::230:1bff:febc:7f23
> tore@wrath:~$ ssh ll...

Read more...

no longer affects: network-manager (Ubuntu)
no longer affects: network-manager (Ubuntu Karmic)
no longer affects: network-manager (Ubuntu Lucid)
Revision history for this message
In , bcotton (bcotton-redhat-bugs) wrote :

This message is a notice that Fedora 19 is now at end of life. Fedora
has stopped maintaining and issuing updates for Fedora 19. It is
Fedora's policy to close all bug reports from releases that are no
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 19 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged change the 'version' to a later Fedora
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Revision history for this message
In , oliver.henshaw (oliver.henshaw-redhat-bugs) wrote :

"options single-request-reopen" in /etc/resolv.conf seems to be an effective workaround for a broken DNS resolver, even for applications that don't use AI_ADDRCONFIG.

Maybe this behaviour could be chosen unconditionally on nodes where the only IPv6 addresses are link-local? Or is this best discussed in another bug?
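For reference, the workaround mentioned above is a one-line resolver option; a sketch of the resulting /etc/resolv.conf (the nameserver address shown is purely illustrative):

```
# /etc/resolv.conf  (nameserver address below is illustrative)
nameserver 192.0.2.1
# By default glibc sends the A and AAAA queries from the same socket;
# single-request-reopen makes it close that socket and retry from a new
# one when the paired queries are mishandled, working around broken
# middleboxes that return only one of the two replies.
options single-request-reopen
```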

Revision history for this message
In , horsley1953 (horsley1953-redhat-bugs) wrote :

(In reply to Phil Oester from comment #9)
> But the question remains, WHY did the behavior change? Originally, glibc
> DID use unique ports for the AAAA and A queries. From a "predictability"
> perspective, that is a more secure approach, no? Similar to how ISNs are
> now randomized in TCP.
>
> It seems many people's problems would be solved by going back to the
> (arguably more secure) method of using distinct ports for the A and AAAA
> queries.

Since Ulrich is no longer around to defend to the death indefensible decisions, maybe it is time to just go ahead and put back the separate ports, the elimination of which caused all the problems in the first place.

Revision history for this message
In , codonell (codonell-redhat-bugs) wrote :

(In reply to Tom Horsley from comment #77)
> (In reply to Phil Oester from comment #9)
> > But the question remains, WHY did the behavior change? Originally, glibc
> > DID use unique ports for the AAAA and A queries. From a "predictability"
> > perspective, that is a more secure approach, no? Similar to how ISNs are
> > now randomized in TCP.
> >
> > It seems many people's problems would be solved by going back to the
> > (arguably more secure) method of using distinct ports for the A and AAAA
> > queries.
>
> Since Ulrich is no longer around to defend to the death indefensible
> decisions, maybe it is time to just go ahead and put back the separate
> ports, the elimination of which caused all the problems in the first place.

The glibc community is consensus driven. Someone needs to write up a plan and drive it forward. The glibc team can do this, but this particular issue is lower on the overall priority list for stub resolver fixes. Principally we have no way to test this easily, so we're trying to build out our testing infrastructure to get coverage. In the past this was all tested by hand, and we can see how badly that turned out.

Revision history for this message
In , oliver.henshaw (oliver.henshaw-redhat-bugs) wrote :

Testing on the F21 live image, I don't have a problem.

Probably this is https://sourceware.org/git/?p=glibc.git;a=commit;h=16b293a7a6f65d8ff348a603d19e8fd4372fa3a9 in glibc 2.20. I wonder if this resolves all broken DNS resolver issues - is there anyone on F21 who still has problems with bad routers and AAAA DNS queries in getaddrinfo()?

Rolf Leggewie (r0lf)
Changed in eglibc (Ubuntu Lucid):
status: Triaged → Won't Fix
Revision history for this message
In , jkurik (jkurik-redhat-bugs) wrote :

This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could also affect pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

Martin Pitt (pitti)
Changed in eglibc (Ubuntu):
status: Triaged → Invalid
Revision history for this message
In , bcotton (bcotton-redhat-bugs) wrote :

This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 23 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version prior to this bug being closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Revision history for this message
In , oliver.henshaw (oliver.henshaw-redhat-bugs) wrote :

Seems like no-one has reported a problem since the F21 release.

Revision history for this message
In , fweimer (fweimer-redhat-bugs) wrote :

(In reply to Oliver Henshaw from comment #79)
> Testing on the F21 live image, I don't have a problem.
>
> Probably this is
> https://sourceware.org/git/?p=glibc.git;a=commit;
> h=16b293a7a6f65d8ff348a603d19e8fd4372fa3a9 in glibc 2.20. I wonder if this
> resolves all broken DNS resolver issues - is there anyone on F21 who still
> has problems with bad routers and AAAA DNS queries in getaddrinfo()?

Agreed. We have not seen further reports of the issue, so closing this bug.

Revision history for this message
In , cpanceac (cpanceac-redhat-bugs) wrote :

However, I've sometimes seen a web page take many seconds to load; since I haven't investigated the problem, the root cause may be completely different.

Changed in glibc (Fedora):
importance: Unknown → High
status: Confirmed → Fix Released