[regression] all network apps / browsers suffer from multi-second delays by default due to IPv6 DNS lookups

Bug #417757 reported by camper365
This bug affects 219 people
Affects                 Status        Importance  Assigned to     Milestone
eglibc (Ubuntu)         Invalid       High        Matthias Klose
eglibc (Ubuntu Karmic)  Fix Released  High        Unassigned
eglibc (Ubuntu Lucid)   Won't Fix     High        Matthias Klose
glibc (Fedora)          Fix Released  High

Bug Description

In Karmic, DNS lookups take a very long time with some routers, because glibc's DNS resolver tries to do IPv6 (AAAA) lookups even if there are no (non-loopback) IPv6 interfaces configured. Routers which do not respond to this cause the lookup to take 20 seconds (until the IPv6 query times out).

*** PLEASE DO NOT COMMENT ON THIS BUG unless you have something constructive to say. Everything that can be said has already been said, and if you comment, you are just adding noise. Please let those that actually know what they are doing concentrate on fixing this bug from now on. ***

If disabling IPv6 or using good DNS servers such as OpenDNS fixes the problem, you are not dealing with this bug; please refrain from complaining here in that case.

Revision history for this message
camper365 (camper365) wrote :
Revision history for this message
Micah Gersten (micahg) wrote :

Thank you for reporting this to Ubuntu. Could you please see if you have the same trouble with other browsers such as epiphany-webkit or midori?

Changed in firefox-3.5 (Ubuntu):
status: New → Incomplete
Revision history for this message
camper365 (camper365) wrote :

Yes, it does apply to other browsers.

Revision history for this message
Micah Gersten (micahg) wrote :

This would appear to be more than a Firefox problem since other browsers are involved. I'm removing the Firefox 3.5 package from the bug and asking for reassignment to the appropriate package.

tags: added: needs-reassignment
affects: firefox-3.5 (Ubuntu) → ubuntu
Changed in ubuntu:
status: Incomplete → New
summary: - Firefox is slow by default due to IPv6 DNS lookups
+ Browsers are slow by default due to IPv6 DNS lookups
Revision history for this message
Martin Olsson (mnemo) wrote : Re: Browsers are slow by default due to IPv6 DNS lookups

I've been struggling with this bug as well. For me it started with updates I installed on 3rd September, even though I had no problems like this in Karmic earlier (at that point I was running updates from about two weeks back). It affects all network apps (not just browsers). I originally filed a ticket with my ISP because I thought their DNS servers were slow.

Revision history for this message
Martin Olsson (mnemo) wrote :

What I was seeing was 20-40 second page loads for certain webpages; when I set network.dns.disableIPv6 to true, most pages load in 1-3 seconds.

Martin Olsson (mnemo)
summary: - Browsers are slow by default due to IPv6 DNS lookups
+ [karmic regression] all network apps / browsers suffer from multi-second
+ delays by default due to IPv6 DNS lookups
affects: ubuntu → linux (Ubuntu)
Revision history for this message
Jeroen Massar (massar) wrote : Re: [karmic regression] all network apps / browsers suffer from multi-second delays by default due to IPv6 DNS lookups

This is a problem with the DNS resolver.

This problem will occur for any DNS request which the DNS resolver does not support.
The proper solution is to fix the DNS resolver.

What happens:
 - The program is IPv6 enabled.
 - When it looks up a hostname, getaddrinfo() first asks for an AAAA record.
 - The DNS resolver sees the request for the AAAA record, goes "uhmmm, I dunno what this is, let's throw it away".
 - The DNS client (getaddrinfo() in libc) waits for a response..... and has to time out, as there is no response. (THIS IS THE DELAY)
 - With no records received yet, getaddrinfo() then issues the A record request. This works.
 - The program gets the A records and uses those.

This does NOT only affect IPv6 (AAAA) records, it also affects any other DNS record that the resolver does not support.
Generally these resolvers are embedded into the "NAT boxes" that consumers have.

Working solution, as we are on Linux anyway: don't use the DNS resolver in the NAT box; instead, install e.g. pdns-recursor and use that.

Of course that does not fix the broken box, which might be the NAT box or the resolvers at the ISP.
Some other people start using OpenDNS because those "work" (but that is not really true either: https://lists.dns-oarc.net/pipermail/dns-operations/2009-July/004217.html)

Note that the DNS queries go over IPv4 (transport), there is no IPv6 _connectivity_ involved here.
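The sequence above can be sketched as a small timing simulation. This is purely illustrative: the 5-second per-attempt timeout and two attempts are assumed values in the spirit of classic resolver defaults, and `broken_resolver` is a made-up stand-in for a NAT box that silently drops AAAA queries.

```python
# Illustrative simulation: why a resolver that silently drops AAAA
# queries adds a fixed multi-second delay to every hostname lookup.
# TIMEOUT and ATTEMPTS are assumptions (classic resolver defaults),
# not an exact transcription of glibc's retry logic.

TIMEOUT = 5    # seconds waited per unanswered attempt (assumed)
ATTEMPTS = 2   # attempts per record type before giving up (assumed)

def broken_resolver(record_type):
    """A NAT-box resolver that answers A queries but drops AAAA ones."""
    if record_type == "A":
        return "203.0.113.10"  # placeholder address (documentation range)
    return None                # AAAA query silently discarded, no reply

def lookup(hostname, resolver):
    """Sequential AAAA-then-A lookup, as described in the steps above."""
    elapsed = 0
    answers = []
    for record_type in ("AAAA", "A"):
        for _ in range(ATTEMPTS):
            reply = resolver(record_type)
            if reply is not None:
                answers.append((record_type, reply))
                break
            elapsed += TIMEOUT  # no reply: the client waits out the timeout
    return answers, elapsed

answers, wasted = lookup("www.example.com", broken_resolver)
print(answers)  # [('A', '203.0.113.10')]
print(wasted)   # 10 -- seconds spent before the A answer is ever used
```

With a second nameserver configured, the wasted time doubles, which is in the ballpark of the 20-second delays reported in the description.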

Revision history for this message
In , clodoaldo.pinto.neto (clodoaldo.pinto.neto-redhat-bugs) wrote :

(In reply to comment #44)

I opened another bug as this one is not the one I'm experiencing:

https://bugzilla.redhat.com/show_bug.cgi?id=520304

Revision history for this message
Markus Thielmann (thielmann) wrote :

I'm not convinced that this is a resolver bug. I'm running an IPv6-enabled system (AICCU tunnel with sixxs.net), so all IPv6 requests are answered by an IPv6-enabled DNS server, and I'm still experiencing the same problems. In addition, this bug was introduced by Karmic and didn't happen before.

Revision history for this message
In , matzilla (matzilla-redhat-bugs) wrote :

Created attachment 362819
Capture of DNS traffic during wget

Capture of DNS traffic while running wget against lwn.net on Fedora 11 + normal updates.
The provider is Orange/France Telecom; the DNS server looks like it's the IP of the Livebox (there are several million users behind a Livebox, I think).

wget takes a little more than 10s:
real 0m10.922s
user 0m0.005s
sys 0m0.013s

The DNS capture shows:
Fedora sends two requests, one A and one AAAA.
The answer for the A query is given at once.
After 5s, a retry of the A query (why? we have already received the result...).
It is answered with an A record immediately.
Then a retry of the AAAA query; this time it comes after the A answer.

After 10s there is no DNS traffic, but wget decides to fetch the page (it was stuck at "resolving host" before).

After 15s and 20s, the DNS server times out the AAAA query.

This looks timing related and dependent on the behaviour of the DNS server, as the same PC behaves differently when connecting through another provider.
Since the DNS server answers immediately with a good IP, why is Linux waiting so long?

So my questions:
Is Linux correctly getting the first A answer, or is it rejecting it (believing it to be an answer not matching the AAAA query)?
What timings are used?
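The 5-second gaps in the capture line up with classic resolv.conf defaults (timeout:5, attempts:2). Here is a sketch of the worst-case arithmetic under that assumption (real implementations may also lengthen the timeout between retry rounds, which this ignores):

```python
# Worst-case wait before giving up on a record type that is never
# answered, assuming classic resolv.conf defaults (timeout:5, attempts:2).
# These values are assumptions; check `man 5 resolv.conf` on your system.

def worst_case_delay(timeout=5, attempts=2, nameservers=1):
    """Each attempt is sent to each configured nameserver in turn,
    and each unanswered query is waited out for `timeout` seconds."""
    return timeout * attempts * nameservers

print(worst_case_delay())               # 10 -- one nameserver
print(worst_case_delay(nameservers=2))  # 20 -- matches the bug description
```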

Revision history for this message
In , matzilla (matzilla-redhat-bugs) wrote :

I'm using glibc 2.10.1-5
Disabling IPv6 in Firefox's about:config also works around the problem (I didn't try it globally, as this should work by default).

Revision history for this message
Bernard Bou (bbou) wrote :

The 5 second lag occurs with the Livebox (used by Orange, 12 million broadband internet customers in Europe). This had better be fixed, unless you want a number of users to tweak their config files to either disable IPv6 or bypass the box-based DNS server, which is not something anybody enjoys doing.

Revision history for this message
max123 (maxrest) wrote :

I also suffer from this problem; it _is_ the DNS resolver, as Jeroen analysed. There should really be a fix for the Karmic release.

Revision history for this message
Jeroen Massar (massar) wrote :

@ Markus's #8 comment: as I mentioned, "Note that the DNS queries go over IPv4 (transport), there is no IPv6 _connectivity_ involved here."

You also state "so all IPv6 requests are answered by an IPv6 enabled DNS server"; well, unless you configured IPv6 DNS resolver addresses in your /etc/resolv.conf, queries will still go over IPv4 (transport), even though they are AAAA queries. AICCU only provides IPv6 connectivity (transport); it does not configure DNS resolvers.

@ Bernard's #9 comment: most likely your Livebox contains one of these broken DNS resolvers; it happens a lot that CPEs have this issue. Try the command below to check. Configuring resolv.conf with OpenDNS or other working DNS servers (e.g. the ones of your ISP directly, instead of the Livebox) might solve your problem. Please also realize that this problem ALSO occurs on platforms other than Linux, e.g. Windows, which is what the majority of people are using; what to use is a choice of the user after all...

To verify this, do a:
for i in `cat /etc/resolv.conf | grep ^nameserver | cut -f2 -d' '`; do dig @$i www.microsoft.com AAAA; done

This should return quite quickly, even though no AAAA records for www.microsoft.com exist yet. If you have a broken resolver somewhere along the way, these requests won't return quickly (unless they are cached locally or on-path as negative answers).
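The nameserver extraction in that one-liner can also be scripted; below is a sketch in Python, where the resolv.conf contents shown are a made-up example (on a real system you would read /etc/resolv.conf and then time a `dig @server www.microsoft.com AAAA` against each address):

```python
# Extract nameserver addresses from resolv.conf text, equivalent to
# `grep ^nameserver | cut -f2 -d' '` in the shell one-liner above.

def nameservers(resolv_conf_text):
    servers = []
    for line in resolv_conf_text.splitlines():
        parts = line.split()
        # a nameserver line looks like: "nameserver 192.0.2.1"
        if len(parts) >= 2 and parts[0] == "nameserver":
            servers.append(parts[1])
    return servers

# Made-up example contents; on a real system read /etc/resolv.conf instead.
example = """\
# generated by NetworkManager
search example.net
nameserver 192.0.2.1
nameserver 192.0.2.2
"""
print(nameservers(example))  # ['192.0.2.1', '192.0.2.2']
```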

Revision history for this message
Stephen Hall (stephen-richard-hall) wrote :

I have had the same problem, affecting all network activities; it is particularly a problem when performing upgrades via aptitude. The problem was completely solved by specifying OpenDNS as my DNS servers. I do not have this problem when running Jaunty, XP or Vista.

Revision history for this message
David Solbach (d-vidsolbach) wrote :

Just updated to Karmic and experienced the same problem (1-3 second delays on DNS lookup).
Switching off IPv6 DNS support in Firefox "fixes" the problem.

To do that, open up Mozilla or Firefox and type 'about:config' in the address bar.
Scroll down to "network.dns.disableIPv6"; it defaults to a value of false, change it to true.

hope that helps.

Revision history for this message
camper365 (camper365) wrote :

That's a "fix" in Firefox, but the problem still exists in every other network app (Evolution, aptitude, etc.),
so web browsing (which is what most users are doing anyway) runs at normal speed but everything else behaves as if you were on dial-up.
After the final release, if the bug isn't fixed for every app, people might start complaining to their ISP or even drop Ubuntu (or not upgrade to Karmic).

Revision history for this message
Pconfig (thomas9999) wrote :

I also notice the same problem in Kubuntu. I remember this happening before on my upgrade to Intrepid.

Revision history for this message
Brian Pitts (bpitts) wrote :

This is still present in today's build. I don't understand why this isn't prioritized as release critical, since it makes web browsing and other network-related tasks unbearably slow.

Revision history for this message
max123 (maxrest) wrote :

Right, and I also stress the incredible delay in Thunderbird, network apps on the shell, update-manager and everything else network-related, due to IPv6 lookups despite, again, not having an IPv6 address!

At the university, where I do get IPv6, everything works as usual, but at home with IPv4 it's hardly usable.

Please investigate this bug; I make myself available for testing on this topic.

regards, max

Revision history for this message
Pconfig (thomas9999) wrote :

Temporary workaround can be found here:

http://ubuntuforums.org/archive/index.php/t-1281820.html

This proves that it has something to do with DNS resolving.

Revision history for this message
Jeroen Massar (massar) wrote :

For everybody not reading the other comments, #7 actually explains what goes on....

Yes, indeed, probably the best solution is to just install a local DNS resolver (pdns-recursor), which queries the roots/gTLDs etc. itself. This is not very friendly to the general Internet, but heck, with the largest DNS servers using short TTLs and geography-based answers, it might not matter too much.

Thus kids, "apt-get install pdns-recursor" and edit your /etc/resolv.conf to point to 127.0.0.1 when you get hit by this issue.

Revision history for this message
Pconfig (thomas9999) wrote :

I really think the OpenDNS workaround is better for the time being, but neither solution is good enough. You can't tell your grandmother to edit some config files because her internet is slow.

Revision history for this message
camper365 (camper365) wrote :

I agree that this bug should be considered release critical, even if the fix is just applying the workaround. What could be added to NetworkManager is a feature where, on connect, it tries to obtain an IPv6-capable DNS server; if it succeeds, it uses the network's DNS resolver, and if it fails, it uses pdns-recursor. (I just don't think that would work for this release; maybe in Lucid.)

Revision history for this message
Zack Evans (zevans23) wrote :

I have had a privoxy go-slow, several seconds on every lookup, since installing the Karmic beta. I hadn't really noticed a problem in any other app, but web browsing did sometimes feel sluggish.

In a brainwave just now I have tried disabling ipv6 (using grub method) and now privoxy is working beautifully. I have also noticed that web browsing feels snappier generally, so I think this was slowing -all- of my apps down by a large enough fraction for me to feel the difference now.

Just to reiterate: it's repeatable for me with privoxy. IPV6 on - privoxy massive latency. IPV6 off - privoxy works fine.

I have a Draytek, so blaming the router isn't practical; these have a MASSIVE installed base. Whether it's strictly the router's fault or not, it would not be very Ubuntu of Ubuntu to get all academically correct about it; we need some sort of workaround that can be achieved by clicking buttons.

To be honest, only advanced users would want IPv6 anyway, so why not have it off by default and make it very easy to switch on?

Revision history for this message
Zack Evans (zevans23) wrote :

Should also say quite happy to test any other proposed workaround.

Revision history for this message
camper365 (camper365) wrote :

I have found that when I ping a site by hostname (for example, www.google.com) it takes a while, but if I ping the IP address (63.251.179.13) the lag is gone.

Revision history for this message
Ragnarel (ragnarel) wrote : Re: [Bug 417757] Re: [karmic regression] all network apps / browsers suffer from multi-second delays by default due to IPv6 DNS lookups

Why do I have no delays with a wireless connection, but do have them with a wired one?

Revision history for this message
Jeroen Massar (massar) wrote : Re: [karmic regression] all network apps / browsers suffer from multi-second delays by default due to IPv6 DNS lookups

@ Pconfig / #20

> You can't tell your grandmother to edit some config files because her internet is slow

Does your grandmother use Ubuntu then? If so, then just help her out in fixing the issue :)

@ Zack Evans / #23

> I have a Draytek so blaming the router isn't practical - these have a MASSIVE installed base

This problem is also in effect when the user runs Windows with IPv6 enabled. The problem lies in the DNS resolver (which might not be the NAT box, what you call the "router", but might even be at your ISP), and thus you can avoid the problem by not using the DNS resolver in the NAT box. You might of course also try to upgrade your router; maybe they fixed the problem (you upgrade your Ubuntu and other things too because they have issues, so try that).

> To be honest, only the advanced users would want IPv6 anyway, so why not have it off by default and make it very easy to switch on?

Because in a few years or so you will have to enable IPv6, as there won't be any new hosts with IPv4 addresses. As such, better to bite the bullet today and fix those IPv6 issues than to wait till you really need it.

@ camper365 / #24

Yes, that is correct: when you ping www.google.com it has to look up the hostname in DNS, while if you ping the address, it doesn't. DNS resolving (i.e. figuring out which address belongs to the requested hostname) is where the problem lies. See the hints about OpenDNS or pdns-recursor to solve it.

@ Ragnarel / #25

as per comment #11 try a:
  for i in `cat /etc/resolv.conf | grep ^nameserver | cut -f2 -d' '`; do dig @$i www.microsoft.com AAAA; done
when connected to wireless and when not. Or, for that matter, just check whether you are using the same nameservers on wireless and wired; if they differ, you already have a small part of the answer.

Revision history for this message
Martin Olsson (mnemo) wrote :

Since we're running out of time, maybe we can just ship "network.dns.disableIPv6==true" as the Firefox default? I'd love a real fix for this bug but the RC is coming up very very soon now.

Revision history for this message
Markus Thielmann (thielmann) wrote :

Is it possible that some patch changed the usage order of the nameservers from /etc/resolv.conf?

My router delivers a "dead" nameserver via DHCP [1], which was never a problem, since Ubuntu used to query the first (local) nameserver. The local nameserver resolves any given request without a problem [2]. If I remove the dead nameserver from resolv.conf, I no longer have any problems resolving DNS queries.

So it *might* be a solution to just change the usage order of the DNS servers to solve this "bug". Please note that a lot of users never experienced this problem before Karmic, so it might be hard to blame their hardware, even if that might be technically true... :-)

[1] It's a SE515, which delivers 217.237.151.97, despite any configuration.
[2] dig @192.168.1.1 www.microsoft.com AAAA without any noticeable delay

Revision history for this message
Micah Gersten (micahg) wrote :

There's at least enough information here to confirm the issue. I'll see if I can get someone to look at it.

Changed in network-manager (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Revision history for this message
DodgeV83 (spamfrelow) wrote :

This 100% fixed my problem!

1. In /etc/dhcp3/dhclient.conf add the following line:

prepend domain-name-servers 208.67.222.222,208.67.220.220;

2. In /etc/nsswitch.conf edit this line

hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4

to this

hosts: files dns

I'm not sure which one of these did it to be honest, but it's fixed!
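For context on the second edit above: the hosts line in /etc/nsswitch.conf is a chain of sources tried left to right, and the `[NOTFOUND=return]` action can stop the chain before `dns` is ever reached. Below is a rough simulation of that chain logic (heavily simplified: real NSS distinguishes more status codes, and which status mdns4_minimal actually returns for a given name is not modeled here, so the inputs are hypothetical):

```python
# Simplified model of how the nsswitch.conf "hosts" line is walked.
# Real NSS has more statuses (SUCCESS, NOTFOUND, UNAVAIL, TRYAGAIN);
# this sketch keeps just enough to show what [NOTFOUND=return] does.

def resolve(hosts_line, sources):
    """hosts_line: list of tokens, e.g. ["files", "dns"].
    sources: dict mapping source name to its simulated result
    ("found", "notfound", or "unavail")."""
    last_status = "notfound"
    for token in hosts_line:
        if token.startswith("["):
            # The action fires on the status of the *previous* source.
            if token == "[NOTFOUND=return]" and last_status == "notfound":
                return "lookup failed (chain stopped early)"
            continue
        last_status = sources.get(token, "unavail")
        if last_status == "found":
            return f"answer from {token}"
    return "lookup failed"

# Hypothetically, if mdns4_minimal reported NOTFOUND, dns would never be asked:
default_line = ["files", "mdns4_minimal", "[NOTFOUND=return]", "dns", "mdns4"]
print(resolve(default_line, {"files": "notfound",
                             "mdns4_minimal": "notfound",
                             "dns": "found"}))
# With the simplified "hosts: files dns" line, the dns source is reached:
print(resolve(["files", "dns"], {"files": "notfound", "dns": "found"}))
```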

Revision history for this message
Darren Worrall (dazworrall) wrote :

My results are a little different. At the moment I'm using a Draytek router, and am indeed suffering slow resolution in all my apps. Running the snippet above:

for i in `cat /etc/resolv.conf | grep ^nameserver | cut -f2 -d' '`; do dig @$i www.microsoft.com AAAA; done

is very quick, though. I consistently have slow resolution when running updates, yet the same command against archive.ubuntu.com is also very quick.

I'm sure the router has something to do with it; my router at home doesn't give me any trouble at all. Yet querying directly like this is reproducibly fast, while querying indirectly through update-manager is reproducibly slow.

Revision history for this message
csulok (shikakaa) wrote :

What ultimately AND universally fixed/worked around the problem for me was the following:

edit /etc/sysctl.conf and add the following to the bottom:

#Disable IPv6
net.ipv6.conf.all.disable_ipv6=1

Revision history for this message
Jeroen Massar (massar) wrote :

@ csulok / #32

What that does is avoid fixing the problem. You disable IPv6, and thus glibc plays smart and does not resolve AAAA records anymore.

Your DNS resolver, though, is still broken. You might not notice it now, but if, for instance, DNSSEC gets turned on next year, you will run into it again... (and you will probably just disable DNSSEC...)

Revision history for this message
Ragnarel (ragnarel) wrote : Re: [Bug 417757] Re: [karmic regression] all network apps / browsers suffer from multi-second delays by default due to IPv6 DNS lookups

csulok, your trick didn't solve my problem.


Revision history for this message
Antonio Roberts (hellocatfood) wrote : Re: [karmic regression] all network apps / browsers suffer from multi-second delays by default due to IPv6 DNS lookups

I'm experiencing these problems on my Dell Studio 1555 laptop with Karmic Beta. I hope it gets fixed soon!

Revision history for this message
Nech (gerard-guadall) wrote :

I have two network interfaces:
07:02.0 Network controller: Broadcom Corporation BCM4318 [AirForce One 54g] 802.11g Wireless LAN Controller (rev 02)
00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 02)

Things work better for me over wifi than over the wired connection.

Revision history for this message
Jeroen Massar (massar) wrote :

@ Nech / #36

> I work better using wifi than using wired connection.

So, like I ask everybody else, check to see if there is a huge latency time difference when doing:

for i in `cat /etc/resolv.conf | grep ^nameserver | cut -f2 -d' '`; do dig @$i www.microsoft.com AAAA; done

over wired and over wireless. Quicker, maybe, is to check whether you get a different set of DNS servers when connected over wired vs wireless (just check whether /etc/resolv.conf changes).

Revision history for this message
Nech (gerard-guadall) wrote :

I think the problem is not DNS. Actually, when I visit different websites in a short period of time, everything gets saturated: some websites don't load, and others take up to 2 or 3 minutes to do so. I tried it using Google Chromium as well, and the result was the same. It is a new installation of Karmic, and the upgrade just happened.

Results of wireless
-----------------------
; <<>> DiG 9.6.1-P1 <<>> @80.58.0.33 www.microsoft.com AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49350
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;www.microsoft.com. IN AAAA

;; ANSWER SECTION:
www.microsoft.com. 2921 IN CNAME toggle.www.ms.akadns.net.
toggle.www.ms.akadns.net. 247 IN CNAME g.www.ms.akadns.net.
g.www.ms.akadns.net. 265 IN CNAME lb1.www.ms.akadns.net.

;; AUTHORITY SECTION:
akadns.net. 90 IN SOA internal.akadns.net. hostmaster.akamai.com. 1256289512 90000 90000 90000 180

;; Query time: 104 msec
;; SERVER: 80.58.0.33#53(80.58.0.33)
;; WHEN: Fri Oct 23 11:20:03 2009
;; MSG SIZE rcvd: 170

wired
-------
; <<>> DiG 9.6.1-P1 <<>> @80.58.0.33 www.microsoft.com AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 1196
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;www.microsoft.com. IN AAAA

;; ANSWER SECTION:
www.microsoft.com. 2805 IN CNAME toggle.www.ms.akadns.net.
toggle.www.ms.akadns.net. 140 IN CNAME g.www.ms.akadns.net.
g.www.ms.akadns.net. 144 IN CNAME lb1.www.ms.akadns.net.

;; AUTHORITY SECTION:
akadns.net. 91 IN SOA internal.akadns.net. hostmaster.akamai.com. 1256289631 90000 90000 90000 180

;; Query time: 102 msec
;; SERVER: 80.58.0.33#53(80.58.0.33)
;; WHEN: Fri Oct 23 11:22:00 2009
;; MSG SIZE rcvd: 170

Revision history for this message
Jeroen Massar (massar) wrote :

@ Nech / #38

As you have the same DNS server for both wired and wireless, most likely _your problem_ is not a DNS issue* like the ones others show here.

* = unless an upstream of your DNS server has the "drop unknown DNS records" problem and your resolver caches the negative answer correctly, which will cause any subsequent query, like the ones above, to be quick again.

To solve your problem, I guess you'll have to take a peek with Wireshark...

Revision history for this message
Zack Evans (zevans23) wrote :

My problem goes away if I disable IPv6. If I boot with IPv6 enabled, so that I have the problem, DNS lookups from the command line still happen quickly.

for i in `cat /etc/resolv.conf | grep ^nameserver | cut -f2 -d' '`; do dig @$i www.microsoft.com AAAA; done

is practically instant. (I have also tried it with some other hostnames to check that caching is not hiding the problem.)

I should add I did not have this problem in Jaunty, and no equipment has changed, only the upgrade to Karmic.

So, as I type, with IPv6 enabled, Privoxy is grinding, everything else seems OK.

If I reboot with IPv6 off, Privoxy and everything else will be OK. DNS AAAA lookups seem OK whether enabled or disabled.

So is there some other subtle interaction between Privoxy and IPv6?

camper365 (camper365)
Changed in linux (Ubuntu):
status: New → Confirmed
Micah Gersten (micahg)
tags: added: metabug
Changed in linux (Ubuntu):
assignee: nobody → IPv6 Task Force (ipv6)
Changed in network-manager (Ubuntu):
assignee: nobody → IPv6 Task Force (ipv6)
Changed in network-manager (Ubuntu):
assignee: IPv6 Task Force (ipv6) → nobody
Changed in linux (Ubuntu):
assignee: IPv6 Task Force (ipv6) → nobody
Changed in linux (Ubuntu):
assignee: nobody → IPv6 Task Force (ipv6)
Changed in network-manager (Ubuntu):
assignee: nobody → IPv6 Task Force (ipv6)
Martin Olsson (mnemo)
Changed in linux (Ubuntu):
assignee: IPv6 Task Force (ipv6) → nobody
Changed in network-manager (Ubuntu):
assignee: IPv6 Task Force (ipv6) → nobody
Martin Pitt (pitti)
Changed in linux (Ubuntu Lucid):
importance: Undecided → High
Changed in linux (Ubuntu Karmic):
importance: Undecided → High
tags: added: regression-release
removed: needs-reassignment
Changed in linux (Ubuntu Karmic):
milestone: none → karmic-updates
Martin Pitt (pitti)
Changed in network-manager (Ubuntu Karmic):
status: New → Invalid
Changed in network-manager (Ubuntu Lucid):
status: Confirmed → Invalid
Martin Pitt (pitti)
affects: linux (Ubuntu Lucid) → glibc (Ubuntu Lucid)
Changed in glibc (Ubuntu Lucid):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
status: Confirmed → Triaged
description: updated
Changed in glibc (Ubuntu Karmic):
status: New → Triaged
Changed in glibc (Ubuntu Lucid):
assignee: Canonical Foundations Team (canonical-foundations) → Matthias Klose (doko)
jordan.sc (jordanjsc)
Changed in glibc (Ubuntu Karmic):
status: Triaged → Fix Committed
Micah Gersten (micahg)
Changed in glibc (Ubuntu Karmic):
status: Fix Committed → Triaged
Nech (gerard-guadall)
Changed in glibc (Ubuntu Karmic):
status: Triaged → In Progress
status: In Progress → Confirmed
finno (finnegan)
Changed in glibc (Ubuntu Lucid):
status: Triaged → Invalid
Martin Pitt (pitti)
Changed in glibc (Ubuntu Lucid):
status: Invalid → Confirmed
Changed in glibc (Fedora):
status: Unknown → Confirmed
Carropa (carropa)
Changed in glibc (Ubuntu Karmic):
status: Confirmed → Fix Released
status: Fix Released → Confirmed
description: updated
Revision history for this message
In , jclere (jclere-redhat-bugs) wrote :

I have changed the DNS proxy option to off in my router (Netopia-3000); that fixes the problems on my boxes (F11 and F12).

Matthias Klose (doko)
Changed in glibc (Ubuntu Lucid):
status: Confirmed → Fix Released
Changed in glibc (Ubuntu Karmic):
status: Confirmed → In Progress
Martin Pitt (pitti)
tags: added: verification-needed
affects: glibc (Ubuntu Karmic) → eglibc (Ubuntu Karmic)
Changed in eglibc (Ubuntu Karmic):
status: In Progress → Fix Committed
Martin Pitt (pitti)
tags: added: verification-done
removed: verification-needed
Martin Pitt (pitti)
Changed in eglibc (Ubuntu Lucid):
status: Fix Released → Confirmed
leucomax (w-smetanig)
Changed in eglibc (Ubuntu Karmic):
status: Fix Committed → Fix Released
Martin Pitt (pitti)
Changed in eglibc (Ubuntu Karmic):
status: Fix Released → Fix Committed
Changed in eglibc (Ubuntu Karmic):
status: Fix Committed → Fix Released
Eshant (guptaeshant)
Changed in eglibc (Ubuntu Karmic):
status: Fix Released → Fix Committed
status: Fix Committed → Fix Released
Changed in eglibc (Ubuntu Lucid):
status: Confirmed → Fix Released
Martin Pitt (pitti)
Changed in eglibc (Ubuntu Lucid):
status: Fix Released → Triaged
Changed in eglibc (Ubuntu Karmic):
status: Fix Released → Invalid
Micah Gersten (micahg)
Changed in eglibc (Ubuntu Karmic):
status: Invalid → Fix Released
Changed in eglibc (Ubuntu Karmic):
status: Fix Released → Fix Committed
Steve Langasek (vorlon)
Changed in eglibc (Ubuntu Karmic):
status: Fix Committed → Fix Released
Changed in eglibc (Ubuntu Karmic):
status: Fix Released → Incomplete
status: Incomplete → In Progress
Martin Pitt (pitti)
Changed in eglibc (Ubuntu Karmic):
status: In Progress → Fix Released
Changed in eglibc (Ubuntu Karmic):
status: Fix Released → In Progress
Martin Pitt (pitti)
Changed in eglibc (Ubuntu Karmic):
status: In Progress → Fix Released
description: updated
Emmet Hikory (persia)
tags: added: ipv6
description: updated
Revision history for this message
In , cdahlin (cdahlin-redhat-bugs) wrote :

This is reliable. Some programs have perfect DNS, some don't work at all.

[sadmac@foucault coding]$ ping edge.launchpad.net
PING edge.launchpad.net (91.189.89.225) 56(84) bytes of data.
64 bytes from wildcard-launchpad-net.banana.canonical.com (91.189.89.225): icmp_seq=1 ttl=42 time=101 ms
64 bytes from wildcard-launchpad-net.banana.canonical.com (91.189.89.225): icmp_seq=2 ttl=42 time=102 ms
64 bytes from wildcard-launchpad-net.banana.canonical.com (91.189.89.225): icmp_seq=3 ttl=42 time=102 ms
64 bytes from wildcard-launchpad-net.banana.canonical.com (91.189.89.225): icmp_seq=4 ttl=42 time=102 ms
^C
--- edge.launchpad.net ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3816ms
rtt min/avg/max/mdev = 101.601/101.983/102.238/0.235 ms
[sadmac@foucault coding]$ bzr co lp:libnih libnih-error
bzr: ERROR: Connection error: Could not resolve 'edge.launchpad.net' [Errno -2] Name or service not known
[sadmac@foucault coding]$

Revision history for this message
In , cdahlin (cdahlin-redhat-bugs) wrote :

The above is on the latest F12.

steve (swchoi-choi)
Changed in eglibc (Ubuntu Lucid):
status: Triaged → Confirmed
Steve Langasek (vorlon)
Changed in eglibc (Ubuntu Lucid):
status: Confirmed → Triaged
Johan (deberghes-johan)
summary: - [karmic regression] all network apps / browsers suffer from multi-second
- delays by default due to IPv6 DNS lookups
+ [regression] all network apps / browsers suffer from multi-second delays
+ by default due to IPv6 DNS lookups
Revision history for this message
In , triage (triage-redhat-bugs) wrote :

This message is a reminder that Fedora 11 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 11. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora
'version' of '11'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version prior to Fedora 11's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that
we may not be able to fix it before Fedora 11 is end of life. If you
would still like to see this bug fixed and are able to reproduce it
against a later version of Fedora please change the 'version' of this
bug to the applicable version. If you are unable to change the version,
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Revision history for this message
In , tytus64 (tytus64-redhat-bugs) wrote :

Just like Casey I am experiencing the same problem on F12.

Clipper:~ $ ping peach.mycompany.com
PING peach.mycompany.com (10.26.1.61) 56(84) bytes of data.
64 bytes from peach.mycompany.com (10.26.1.61): icmp_seq=1 ttl=63 time=0.267 ms
64 bytes from peach.mycompany.com (10.26.1.61): icmp_seq=2 ttl=63 time=0.311 ms
^C
--- peach.mycompany.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1371ms
rtt min/avg/max/mdev = 0.267/0.289/0.311/0.022 ms
Clipper:~ $ ssh peach.mycompany.com
ssh: Could not resolve hostname peach.mycompany.com: No address associated with hostname

Here is the output from Wireshark when ssh command is issued:

24921 4305.188943 10.4.1.236 10.1.1.151 DNS Standard query A peach.mycompany.com
24922 4305.188983 10.4.1.236 10.1.1.151 DNS Standard query AAAA peach.mycompany.com
24923 4305.189460 10.1.1.151 10.4.1.236 DNS Standard query response A 10.26.1.61
24924 4305.189475 10.1.1.151 10.4.1.236 DNS Standard query response

The only way for me to fix this problem is to put the following line in /etc/hosts
10.26.1.61 peach

I should mention that this problem is _not_ unique to this network - the F12 machine is a laptop and I can see this problem at work as well as at home. It does not happen all the time, but often enough to be annoying. I'm not sure what the trigger is.

Clipper:~ $ rpm -qa |grep glibc
glibc-2.11.1-4.i686
glibc-2.11.1-4.x86_64
glibc-headers-2.11.1-4.x86_64
glibc-common-2.11.1-4.x86_64
glibc-devel-2.11.1-4.x86_64
glibc-debuginfo-2.11.1-1.x86_64

Let me know if I can help in any way to debug it.

Revision history for this message
In , triage (triage-redhat-bugs) wrote :

Fedora 11 changed to end-of-life (EOL) status on 2010-06-25. Fedora 11 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :

There is no question that the underlying problem here is defective DNS resolvers that choke on perfectly legitimate AAAA queries. That said, there are a couple of issues present in software shipped by Fedora that cause the problem to manifest itself as slowdowns noticeable by end users:

1) When called with the AI_ADDRCONFIG flag, libc's getaddrinfo() function does not disregard link-local IPv6 addresses when determining whether or not the local host has usable IPv6 connectivity. Since every IPv6-capable OS will have link-local IPv6 addresses assigned to all interfaces - regardless of any external connectivity being available or not - this essentially makes AI_ADDRCONFIG on Linux useless for the purpose of suppressing AAAA queries when they're not useful.

I've submitted a bug to the GNU libc upstream about this issue at <http://sourceware.org/bugzilla/show_bug.cgi?id=12377>.

getaddrinfo() on other operating systems (such as Apple Mac OS X and Microsoft Windows) does disregard link-local IPv6 addresses when called with AI_ADDRCONFIG, which is why the problem appears to affect GNU/Linux distributions more than other operating systems.

2) Many applications do not set the AI_ADDRCONFIG flag when calling getaddrinfo(). This includes, notably, Mozilla Firefox. However, a patch to correct this has recently been committed to the mozilla-central development repo and will likely be part of Firefox 4.0 beta 11 (hopefully also 3.6.15), see <https://bugzilla.mozilla.org/show_bug.cgi?id=614526>. Microsoft Windows enables the use of AI_ADDRCONFIG as the system-wide default, as far as I know, which explains why it is able to cope better with those broken middleware boxes. Mac OS X does not set AI_ADDRCONFIG by default; however, it has an extremely short timeout waiting for AAAA responses after the A response has been answered (around 125ms), which in turn hides the problem from most end users. Additionally, most major browsers (except Firefox) do set AI_ADDRCONFIG explicitly, which suppresses the problematic AAAA queries in the first place.

So what Fedora could do to avoid this problem is 1) to develop and include a patch to glibc that makes getaddrinfo() ignore link-local addresses for AI_ADDRCONFIG purposes, and 2) to back-port the NSPR patch already committed to mozilla-central to the version of Firefox shipped (or wait until Mozilla releases a new version with the patch already included).

Tore
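As an editor's illustration of the calling convention Tore describes (a minimal sketch, not part of either attached patch; the loopback literal is a placeholder chosen so the example needs no external DNS):

```c
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/* Resolve a name the way a well-behaved dual-stack client should:
 * AF_UNSPEC so both families are considered, plus AI_ADDRCONFIG so
 * the resolver only asks for address families actually configured on
 * the local host. Whether a lone link-local IPv6 address counts as
 * "configured" is exactly the bug discussed in this thread. */
static int resolve(const char *host)
{
    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_ADDRCONFIG;

    int err = getaddrinfo(host, NULL, &hints, &res);
    if (err != 0) {
        fprintf(stderr, "getaddrinfo(%s): %s\n", host, gai_strerror(err));
        return -1;
    }
    for (p = res; p != NULL; p = p->ai_next) {
        char buf[INET6_ADDRSTRLEN];
        const void *addr = (p->ai_family == AF_INET)
            ? (const void *)&((struct sockaddr_in *)p->ai_addr)->sin_addr
            : (const void *)&((struct sockaddr_in6 *)p->ai_addr)->sin6_addr;
        printf("%s %s\n",
               p->ai_family == AF_INET ? "A   " : "AAAA",
               inet_ntop(p->ai_family, addr, buf, sizeof buf));
    }
    freeaddrinfo(res);
    return 0;
}

int main(void)
{
    return resolve("127.0.0.1") == 0 ? 0 : 1;
}
```

On an IPv4-only host with a correctly behaving AI_ADDRCONFIG, a call like this should produce only A results and no AAAA query on the wire.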

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :

Created attachment 478268
Solution 1/2: Make getaddrinfo()+AI_ADDRCONFIG ignore link-locals

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :

Created attachment 478270
Solution 2/2: Make Mozilla Firefox use AI_ADDRCONFIG when calling getaddrinfo()

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :

The two patches I've just attached solve this problem for most users:

The first makes getaddrinfo() ignore link-local addresses when called with the AI_ADDRCONFIG flag set. This makes getaddrinfo() avoid querying for AAAAs when the host has no IPv6 connectivity, provided that the AI_ADDRCONFIG flag is set. This brings glibc's getaddrinfo() behaviour in line with Mac OS X and Windows.

The second makes Mozilla Firefox use AI_ADDRCONFIG when calling getaddrinfo(). Note that the Mozilla release drivers have already approved this patch for inclusion on the 3.6.x branch, and it has already been committed to Firefox 4.0 (it's included in beta11).

Please apply.

(Of course, there might be applications other than Mozilla Firefox that do not set AI_ADDRCONFIG, which would require similar patches. However, Mozilla Firefox is the obvious one and likely the source of most user complaints.)

Tore

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :

Okay, so this is still a problem. What happens is:

1) the user enters some host name into his web browser or other application of choice running on a machine connected to an IPv4-only Ethernet network
2) the application kicks off a getaddrinfo() call for the host name, using AF_UNSPEC and AI_ADDRCONFIG
3) getaddrinfo() transmits both IN A (IPv4) and IN AAAA (IPv6) DNS queries to the upstream resolver
4) The upstream resolver, which is typically some cheapo home gateway or something, doesn't understand the IN AAAA queries and either doesn't respond to them at all or screws them up somehow
5) getaddrinfo() doesn't get a valid answer for the IN AAAA queries (valid answer could include NXDOMAIN or NODATA status codes), retransmits them, sits around waiting
6) user wonders why the web page or whatever takes "forever" to load, goes to submit/comment on bugs such as this one
7) getaddrinfo() finally times out the IN AAAA queries, returns IPv4 results to the application
8) lather rinse repeat

AI_ADDRCONFIG *should* have solved this issue by suppressing IN AAAA queries from IPv4-only machines. However, the auto-configured IPv6 link-local addresses on all Ethernet interfaces cause getaddrinfo() to consider the machine IPv6-capable, so it no longer suppresses IN AAAA queries. More info here:

https://fedoraproject.org/wiki/Networking/NameResolution/ADDRCONFIG#Problem_2:_IN_AAAA_DNS_query_suppression_from_Ethernet-connected_IPv4-only_hosts

Revision history for this message
In , psimerda (psimerda-redhat-bugs) wrote :

I see no reason why this shouldn't be fixed. We are working on solutions, all information here:

https://fedoraproject.org/wiki/Networking/NameResolution/ADDRCONFIG

Related fedora feature page:

https://fedoraproject.org/wiki/Features/DualstackNetworking

Adding to the 'dualstack' tracker bug and modifying the summary.

Revision history for this message
In , psimerda (psimerda-redhat-bugs) wrote :

*** Bug 697149 has been marked as a duplicate of this bug. ***

Revision history for this message
In , psimerda (psimerda-redhat-bugs) wrote :

*** Bug 459756 has been marked as a duplicate of this bug. ***

Revision history for this message
In , fedora-admin-xmlrpc (fedora-admin-xmlrpc-redhat-bugs) wrote :

This package has changed ownership in the Fedora Package Database. Reassigning to the new owner of this component.

Revision history for this message
In , bcotton (bcotton-redhat-bugs) wrote :

This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Revision history for this message
In , markzzzsmith (markzzzsmith-redhat-bugs) wrote :

I disagree with making getaddrinfo() consider a host with only IPv6 link-local addresses to not have IPv6 connectivity. It does, it has IPv6 connectivity to all other IPv6 hosts on the directly attached links which also all have link-local addresses. This is actually the point of hosts automatically configuring link-local addresses on all interfaces all the time, and the IPv6 Addressing Architecture specifying that all interfaces must have link-local addresses - so that they can at a minimum always reach their on-link neighbors via link-local addressing. This is also why protocols such as IPv6 neighbor discovery, Multicast Listener Discovery and routing protocols such as OSPF use link-local addresses as source and/or destination addresses to reach their neighbors.

Combine autoconfigured IPv6 link-local addresses with a service discovery protocol such as Multicast DNS or SSDP, and you have Zero Configuration networking without any user intervention. Compare that with IPv4, where support for 169.254.0.0/16 is patchy, because it is done in userspace via DHCPv4. IPv6 will universally and reliably provide zero configuration networking.

I think it is reasonable to make hosts more robust against broken devices in the network, but ignoring IPv6 link-local connectivity and then suppressing AAAA queries is not the solution. Happy Eyeballs (RFC6555) and IPv6 source and destination address selection (RFC6724) are.

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :

(In reply to comment #64)
> I disagree with making getaddrinfo() consider a host with only IPv6
> link-local addresses to not have IPv6 connectivity.

That's not what this bug report is about. It's about suppressing DNS "IN AAAA" queries if the host only has link-local addresses and AI_ADDRCONFIG is supplied.

> Combine autoconfigured IPv6 link-local addresses with a service discovery
> protocol such as Multicast DNS or SSDP,

This bug report is specifically about DNS. It's not about MDNS, SSDP, /etc/hosts, or any other NSS backends.

> I think it is reasonable to hosts more robust against broken devices in the
> network, but ignoring IPv6 link-local connectivity and then suppressing AAAA
> queries is not the solution.

How is it useful for a host with only link-local addresses to perform "IN AAAA" DNS queries? Keep in mind that in order to use a link-local address, you need to supply an interface scope (e.g., "fe80::1%eth0"), and that DNS cannot supply this information.

Tore
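The scope requirement Tore mentions can be demonstrated directly (editor's sketch; "lo" is used because it always exists on Linux, so no network or DNS is involved): getaddrinfo() accepts a link-local literal only together with a zone, and returns the interface index in sin6_scope_id - precisely the piece of information an AAAA record cannot carry.

```c
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <net/if.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Parse an IPv6 link-local literal with an explicit zone (e.g. "%lo").
 * The zone is exactly what DNS has no way to carry: an AAAA record can
 * hold fe80::1, but never the "%eth0" part needed to actually use it.
 * Returns the kernel interface index found in sin6_scope_id, or 0 on
 * failure. */
static unsigned int scope_of(const char *literal)
{
    struct addrinfo hints, *res;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET6;
    hints.ai_flags = AI_NUMERICHOST;   /* literal only, no DNS lookup */

    if (getaddrinfo(literal, NULL, &hints, &res) != 0)
        return 0;
    unsigned int scope =
        ((struct sockaddr_in6 *)res->ai_addr)->sin6_scope_id;
    freeaddrinfo(res);
    return scope;
}

int main(void)
{
    printf("scope_id of fe80::1%%lo = %u\n", scope_of("fe80::1%lo"));
    return 0;
}
```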

Revision history for this message
In , markzzzsmith (markzzzsmith-redhat-bugs) wrote :
Download full text (3.9 KiB)

(In reply to comment #65)
> (In reply to comment #64)
> > I disagree with making getaddrinfo() consider a host with only IPv6
> > link-local addresses to not have IPv6 connectivity.
>
> That's not what this bug report is about. It's about suppressing DNS "IN
> AAAA" queries if the host only has link-local addresses and AI_ADDRCONFIG is
> supplied.
>

I understand that.

The AI_ADDRCONFIG flag does not preclude link-local addresses. From RFC3493:

" If the AI_ADDRCONFIG flag is specified, IPv4 addresses shall be
   returned only if an IPv4 address is configured on the local system,
   and IPv6 addresses shall be returned only if an IPv6 address is
   configured on the local system. The loopback address is not
   considered for this case as valid as a configured address."

Note that loopback addresses are explicitly excluded, so the designers clearly did think about which address types to exclude.

> > Combine autoconfigured IPv6 link-local addresses with a service discovery
> > protocol such as Multicast DNS or SSDP,
>
> This bug report is specifically about DNS. It's not about MDNS, SSDP,
> /etc/hosts, or any other NSS backends.
>

getaddrinfo() is not a DNS protocol specific API call, and is used in front of all those NSS backends, so that applications don't have to be exposed to how the address information was determined. For example, I run MDNS at home, and when I enable it, all of my IPv6 applications automatically work with it.

Here is what RFC3493 describes it as:

6.1 Protocol-Independent Nodename and Service Name Translation

   Nodename-to-address translation is done in a protocol-independent
   fashion using the getaddrinfo() function.

getaddrinfo() can return all the information necessary to use a link-local address i.e. both the address, and in the interface index, via the sin6_scope_id field of the sockaddr_in6 structure that is returned via the ai_addr field.

By making AI_ADDRCONFIG ignore link-local addresses, the getaddrinfo() call becomes broken for NSS backends that can provide both the link-local address and the corresponding interface index, such as MDNS or any other future ones.

Perhaps the DNS NSS backend could provide it, by returning the interface index of the interface it received the response on if a link-local address is returned. On the common single-homed host, this is likely to be the correct interface index for the link-local address.

> > I think it is reasonable to hosts more robust against broken devices in the
> > network, but ignoring IPv6 link-local connectivity and then suppressing AAAA
> > queries is not the solution.
>
> How is it useful for a host with only link-local addresses to perforn "IN
> AAAA" DNS queries? Keep in mind that in order to use a link-local address,
> you need to supply an interface scope (e.g., "fe80::1%eth0"), and that DNS
> cannot supply this information.
>

Something else outside of DNS could provide the interface information, and the application combines the two. Specifying a hostname (perhaps in /etc/hosts) and an interface will be much simpler than specifying literal link-local addresses, because getaddrinfo() won't look up IPv6 addresses when the host only has link-local addresses.

T...

Read more...

Revision history for this message
In , horsley1953 (horsley1953-redhat-bugs) wrote :

I thought the obvious problem with this was addressed way up near the top in comment 20 when someone pointed out that using the same port for the IPv4 and IPv6 queries gave firewalls fits. Ulrich Drepper had one of his standard "purity over practicality" tantrums and refused to change it to use two different ports to accommodate dumb firewalls, but since Ulrich is gone now, perhaps saner heads could revisit that? (And perhaps revisit all other bug fixes rejected over the years by Ulrich? :-).

Revision history for this message
In , psimerda (psimerda-redhat-bugs) wrote :
Download full text (5.7 KiB)

(In reply to comment #66)
> The AI_ADDRCONFIG flag does not preclude link-local addresses. From RFC3493:
>
>
> " If the AI_ADDRCONFIG flag is specified, IPv4 addresses shall be
> returned only if an IPv4 address is configured on the local system,
> and IPv6 addresses shall be returned only if an IPv6 address is
> configured on the local system. The loopback address is not
> considered for this case as valid as a configured address."
>
> Note that loopback addresses are, so the designers specifically thought
> about exclusion of addresses types.

We are not discussing the designers' virtues but the technical issues. The RFC is (1) INFORMATIONAL and (2) wrong. For more detailed information, see:

https://fedoraproject.org/wiki/Networking/NameResolution/ADDRCONFIG

(any comments of technical value welcome)

> For example, I run MDNS at home,
> and when I enable it, all of my IPv6 applications automatically work with it.

This is not true. Just try mDNS with link-local addresses (which you mentioned) and you will realize that this feature is absent with the current glibc and nss-mdns.

> getaddrinfo() can return all the information necessary to use a link-local
> address i.e. both the address, and in the interface index, via the
> sin6_scope_id field of the sockaddr_in6 structure that is returned via the
> ai_addr field.

getaddrinfo() can, while the NSS backends cannot. Therefore currently getaddrinfo() would only return scope_id for IPv6 literals, not mDNS nor any similar protocol.

> By making AI_ADDRCONFIG ignore link-local addresses, the getaddrinfo() call
> becomes broken for NSS backends

Currently false. You can't break a feature that is absent.

> Perhaps the DNS NSS backend could provide it,

You don't need scope_id for global addresses and you don't need this information with DNS responses at all.

> Something else outside of DNS could provide the interface information, and
> the application combines them.

I don't see the need for that. DNS returns global addresses. Global addresses don't need scope_id.

> Specifying a hostname (perhaps in /etc/hosts)

Not sure whether /etc/hosts can be used to provide scope_id.

> and an interface will be much simpler

There is currently no standard way to do that. And I don't think it is valuable enough to seek standardization for that.

> than specifying literal link-local
> addresses because getaddrinfo() won't lookup IPv6 addresses when the host
> only has link-local addresses.

There's a much easier solution. Just don't apply the same rules to mDNS you apply to DNS. I believe all of this is already described in:

https://fedoraproject.org/wiki/Networking/NameResolution/ADDRCONFIG

> The "Happy Eyeballs" technique (RFC6555) wasn't just intended to be applied
> to web browsers, according to this draft from Fred Baker:
>
> "Happier Eyeballs"
> https://www.ietf.org/id/draft-baker-happier-eyeballs-00.txt
>
> and could probably be applied to the DNS "application". For example, off the
> top of my head:
>
> 1) issue a standard DNS query including both A and AAAA queries.

In the case described by this bug report, there's no need to query AAAA as global routing is not available anywa...

Read more...

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :
Download full text (3.6 KiB)

Pavel already responded to most of your points, so I'll try to avoid just repeating them.

(In reply to comment #66)

> getaddrinfo() is not a DNS protocol specific API call, and is used in front
> of all those NSS backends, so that applications don't have to be exposed to
> how the address information was determined. For example, I run MDNS at home,
> and when I enable it, all of my IPv6 applications automatically work with it.

Well, again, this isn't about MDNS. Nobody is suggesting to make the MDNS backend ignore link-locals if called from getaddrinfo() w/AI_ADDRCONFIG. This bug is specifically about the DNS backend's behaviour; MDNS is out of scope.

> By making AI_ADDRCONFIG ignore link-local addresses, the getaddrinfo() call
> becomes broken for NSS backends that can provide both the link-local address
> and the corresponding interface index, such as MDNS or any other future ones.

See above, this is about the DNS backend *only*.

> Perhaps the DNS NSS backend could provide it, by returning the interface
> index of the interface it received the response on if a link-local address
> is returned. On the common single-homed host, this is likely to be the
> correct interface index for the link-local address.

This is a flawed assumption, even in the single-homed host case. One obvious example: If you run a local caching resolver (which I believe NetworkManager has native support for doing these days), you'll end up with all the returned link-local addresses being scoped to the "lo" interface, which is probably not what you want.

> Something else outside of DNS could provide the interface information, and
> the application combines them. Specifying a hostname (perhaps in /etc/hosts)
> and an interface will be much simpler than specifying literal link-local
> addresses because getaddrinfo() won't lookup IPv6 addresses when the host
> only has link-local addresses.

It would appear to me that the proper thing for such an application to do is to simply not use AI_ADDRCONFIG. However, does such an application actually exist, or are you inventing it just to support your position?

> 1) issue a standard DNS query including both A and AAAA queries.

Not possible. There is no single DNS query that requests both A and AAAA responses. (If, by any chance, you want to say "ANY" right now - don't, it doesn't do what you think it does.)

> 2) if no response is received after 400ms (roughly half way around the
> world), issue two individual queries, one for an A and one for an AAAA.

Two individual queries is what's being done today, and that's the only thing you can do. Also, 400ms, or even multiple seconds, is too short a timeout: the major part of a DNS lookup isn't the single RTT to the resolver listed in /etc/resolv.conf, it's waiting for that resolver to actually find the record in question. This is the sum of all RTTs to all the authoritative name servers in the delegation chain, potentially including timeouts and retransmits at some of the steps.

The only way to get Happy Eyeballs-ish behaviour using getaddrinfo() is if you run an IPv4-only thread with getaddrinfo(AF_INET)->connect(AF_INET), and a similar one for AF_INET6. You can't do it wit...

Read more...
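The thread-per-family approach described above can be sketched as follows (editor's illustration; the loopback literals are placeholders so the sketch is self-contained, and a real Happy Eyeballs client would go on to race connect() attempts on the two result lists):

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <netdb.h>
#include <sys/socket.h>

/* One resolver thread per address family: an AF_INET thread and an
 * AF_INET6 thread each run their own getaddrinfo(), so a slow or
 * dead lookup in one family never blocks the other. */
struct lookup {
    int family;
    const char *host;
    int err;
    struct addrinfo *res;
};

static void *worker(void *arg)
{
    struct lookup *l = arg;
    struct addrinfo hints;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = l->family;
    hints.ai_socktype = SOCK_STREAM;
    l->err = getaddrinfo(l->host, "80", &hints, &l->res);
    return NULL;
}

int main(void)
{
    struct lookup v4 = { AF_INET,  "127.0.0.1", 0, NULL };
    struct lookup v6 = { AF_INET6, "::1",       0, NULL };
    pthread_t t4, t6;

    pthread_create(&t4, NULL, worker, &v4);
    pthread_create(&t6, NULL, worker, &v6);
    pthread_join(t4, NULL);
    pthread_join(t6, NULL);

    /* A real client would now race connect() on both result lists
     * and keep whichever connection completes first. */
    printf("IPv4 lookup: %s\n", v4.err ? gai_strerror(v4.err) : "ok");
    printf("IPv6 lookup: %s\n", v6.err ? gai_strerror(v6.err) : "ok");

    if (v4.res) freeaddrinfo(v4.res);
    if (v6.res) freeaddrinfo(v6.res);
    return v4.err && v6.err ? 1 : 0;
}
```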

Revision history for this message
In , markzzzsmith (markzzzsmith-redhat-bugs) wrote :
Download full text (6.3 KiB)

(In reply to comment #69)
> Pavel already responded to most of your points, so I'll try to avoid just
> repeating his points.)
>
> (In reply to comment #66)
>
> > getaddrinfo() is not a DNS protocol specific API call, and is used in front
> > of all those NSS backends, so that applications don't have to be exposed to
> > how the address information was determined. For example, I run MDNS at home,
> > and when I enable it, all of my IPv6 applications automatically work with it.
>
> Well, again, this isn't about MDNS. Nobody is suggesting to make the MDNS
> backend ignore link-locals if called from getaddrinfo() w/AI_ADDRCONFIG.
> This bug is specifically about the DNS backend's behaviour; MDNS is out of
> scope.
>

There were no qualifiers on your described behaviour. You may have been talking about DNS, but the description of the change of behaviour to AI_ADDRCONFIG did not specify that it was limited to the DNS backend.

> > By making AI_ADDRCONFIG ignore link-local addresses, the getaddrinfo() call
> > becomes broken for NSS backends that can provide both the link-local address
> > and the corresponding interface index, such as MDNS or any other future ones.
>
> See above, this is about the DNS backend *only*.
>

Again, you had no qualifiers.

> > Perhaps the DNS NSS backend could provide it, by returning the interface
> > index of the interface it received the response on if a link-local address
> > is returned. On the common single-homed host, this is likely to be the
> > correct interface index for the link-local address.
>
> This is a flawed assumption, even in the single-homed host case. One obvious
> example: If you run a local caching resolver (which I believe NetworkManager
> has native support for doing these days), you'll end up with all the
> returned link-local addresses being scoped to the "lo" interface, which is
> probably not what you want.
>

Then the cache is broken. It should be caching all the information that would be returned in the sockaddr structure returned to getaddrinfo(), not just the returned IP addresses.

> > Something else outside of DNS could provide the interface information, and
> > the application combines them. Specifying a hostname (perhaps in /etc/hosts)
> > and an interface will be much simpler than specifying literal link-local
> > addresses because getaddrinfo() won't lookup IPv6 addresses when the host
> > only has link-local addresses.
>
> It would appear to me that the proper thing for such an application to do is
> to simply not use AI_ADDRCONFIG. However, does such an application actually
> exist, or are you inventing it just to support your position?
>

I don't know if an application like this exists, but I doubt you know absolutely that it doesn't exist. Where are the restrictions that say such an application can't exist? The definition of AI_ADDRCONFIG didn't prohibit them, or even make recommendations against them.

You're asserting that link-locals aren't being stored in DNS. How do you know that? Have you queried all the DNS space in the world?

There have been no prohibitions on link-local addresses being put in DNS, and now you are creating one, and are asserting that as you've...

Read more...

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :
Download full text (4.2 KiB)

(In reply to comment #70)

> There were no qualifiers on your described behaviour. You may have been
> talking about DNS, but the description of the change of behaviour to
> AI_ADDRCONFIG did not specify that it was limited to the DNS backend.

The title of this bug is:

«getaddrinfo() with AI_ADDRCONFIG doesn't suppress ****AAAA DNS queries**** on IPv4-only networks»

(emphasis mine)

If that's not a crystal clear qualifier, I don't know what is.

> Then the cache is broken. It should be caching all the information that
> would be returned in the sockaddr structure returned to getaddrinfo(), not
> just the returned IP addresses.

glibc speaks to a caching resolver (e.g. dnsmasq) on 127.0.0.1 or ::1, using the regular DNS protocol. The DNS protocol has no means of communicating an interface scope ID. So how exactly would this work?

> You're asserting that link-locals aren't being stored in DNS. How do you
> know that? Have you queried all the DNS space in the world?

No, but I am asserting that storing link-locals in DNS is completely pointless, as it cannot possibly work, because there is no way the DNS protocol can communicate a scope ID to the querier.

> There has been no prohibitions on link-local addresses being put in DNS, and
> now you are creating one, and are asserting that as you've never seen reason
> to do so, nobody else has either.
>
> Here is a realistic scenario where link-local addresses would usefully be
> stored in DNS.
>
> An organisation may want to create organisation wide unique static
> link-local addresses, assigning them to their routers' interfaces. This
> would make the link-local addresses independent of the MAC addresses of the
> routers interfaces, and would also make the use of link-local addresses as
> e.g., static route next hops, simpler and less error prone because there are
> no intentional duplicates. e.g., their first router's first configured
> interface would have fe80::1, their e.g. 10th router's first configured
> interface might be fe80::15, depending on how many interfaces the other
> routers have.
>
> To document the static link local addresses the following sorts of DNS
> records are created (using router10, interface eth0 as an example)
>
> eth0.rtr10.example.com. IN AAAA fe80::15
> eth0.rtr10.example.com. IN TXT "Ethernet 0 on Router 10, MAC addr
> 02:00:00:00:00:01"
>
> 5.1.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.8.e.f.ip6.arpa.
> IN PTR eth0.rtr10.example.com.

Red herring. Making getaddrinfo() with AI_ADDRCONFIG suppress IN AAAA queries has nothing to do with this use of DNS as essentially a documentation tool, and would not "prohibit" it in any way.

What you can't do, though, is e.g. "ssh eth0.rtr10.example.com" - regardless of presence of IPv4 addresses on the host, IPv6 addresses on the host, and whether or not the ssh application uses AI_ADDRCONFIG.

> The happy eyeball behaviour would be within the DNS backend, hidden from the
> application behind the getaddrinfo(,AI_ADDRCONFIG) call.

Well, you can say that this behaviour is already present within getaddrinfo(). If called without AI_ADDRCONFIG (or with on a host that has the required global addresses configur...

Read more...

Revision history for this message
In , markzzzsmith (markzzzsmith-redhat-bugs) wrote :
Download full text (7.0 KiB)

(In reply to comment #71)
> (In reply to comment #70)
>
> > There were no qualifiers on your described behaviour. You may have been
> > talking about DNS, but the description of the change of behaviour to
> > AI_ADDRCONFIG did not specify that it was limited to the DNS backend.
>
> The title of this bug is:
>
> «getaddrinfo() with AI_ADDRCONFIG doesn't suppress ****AAAA DNS queries****
> on IPv4-only networks»
>
> (emphasis mine)
>
> If that's not a crystal clear qualifier, I don't know what is.
>

You continue to miss the point. *Your* proposal on how to *fix* the problem was to change the behaviour of AI_ADDRCONFIG, regardless of the backend, because you didn't *specify* what backend your solution applied to.

Even then, the title is not actually saying what the problem is. The actual problem is CPE or DNS servers that did not handle IPv6 AAAA queries correctly.

> > Then the cache is broken. It should be caching all the information that
> > would be returned in the sockaddr structure returned to getaddrinfo(), not
> > just the returned IP addresses.
>
> glibc speaks to a caching resolver (e.g. dnsmasq) on 127.0.0.1 or ::1, using
> the regular DNS protocol.

It can also speak to a caching resolver directly, as it does with nscd. That is the cache that I thought you were talking about, because it is part of glibc.

> The DNS protocol has no means of communicating an
> interface scope ID. So how exactly would this work?
>
> > You're asserting that link-locals aren't being stored in DNS. How do you
> > know that? Have you queried all the DNS space in the world?
>
> No, but I am asserting that storing link-locals in DNS is completely
> pointless,

Well, as you saw, I pointed out a valid and reasonable use for storing link-locals in DNS, so it isn't completely pointless.

> as it cannot possibly work, because there is no way the DNS
> protocol can communicate a scope ID to the querier.
>

It doesn't need to; that information can be gleaned from some other source, such as a command-line option or a configuration file. In this instance, DNS is still useful because it provides a much simpler and easier-to-type name for an IPv6 link-local address, even though it isn't by itself enough information to use the returned address.

> > There has been no prohibitions on link-local addresses being put in DNS, and
> > now you are creating one, and are asserting that as you've never seen reason
> > to do so, nobody else has either.
> >
> > Here is a realistic scenario where link-local addresses would usefully be
> > stored in DNS.
> >
> > An organisation may want to create organisation wide unique static
> > link-local addresses, assigning them to their routers' interfaces. This
> > would make the link-local addresses independent of the MAC addresses of the
> > routers interfaces, and would also make the use of link-local addresses as
> > e.g., static route next hops, simpler and less error prone because there are
> > no intentional duplicates. e.g., their first router's first configured
> > interface would have fe80::1, their e.g. 10th router's first configured
> > interface might be fe80::15, depending on how many int...

Read more...

Revision history for this message
In , tore (tore-redhat-bugs-1) wrote :
Download full text (7.9 KiB)

(In reply to comment #72)

> You continue to miss the point. *Your* proposal on how to *fix* the problem
> was to change the behaviour of AI_ADDRCONFIG, regardless of the backend,
> because you didn't *specify* what backend your solution applied to.

My very first response to you in this thread, in comment #65, began like this:

«That's not what this bug report is about. It's about suppressing DNS "IN AAAA" queries [...]»

So how you can claim that I am not clear about talking specifically about DNS is beyond me, to be honest. It should in any case be clear by now, I hope.

> Even then, the title is not actually saying what the problem is. The actual
> problem is CPE or DNS servers that did not handle IPv6 AAAA queries
> correctly.

Such CPEs and DNS servers are buggy, true. However, it is in our users' best interest to help them avoid tickling these bugs, because it leads to crappy user experiences and bug reports with a huge number of subscribers:

https://bugzilla.redhat.com/show_bug.cgi?id=459756
https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/417757

It sucks extra because this is perceived to be a Linux-specific problem. MS Windows and Apple Mac OS X do interpret the AI_ADDRCONFIG flag in the proposed way (i.e., they will suppress IN AAAA queries if the host only has link-local addresses configured). (I haven't verified that this behaviour is still in place in the latest versions of those operating systems, though.)

> > glibc speaks to a caching resolver (e.g. dnsmasq) on 127.0.0.1 or ::1, using
> > the regular DNS protocol.
>
> It can also speak to a caching resolver directly, as it does with nscd. That
> is the cache that I though you were talking about, because it is part of
> glibc.

NSCD is *not* a resolver. NSCD knows nothing of AAAA queries or the DNS protocol at all. The only thing NSCD can do is to cache results that came from NSS backends, such as - you guessed it - DNS.

> Well, as you saw, I pointed out a valid and reasonable use for storing
> link-locals in DNS, so it isn't completely pointless.

This is still a red herring. If you use DNS as a documentation tool like you've outlined, there's no reason why you'd use AI_ADDRCONFIG when extracting the records. Otherwise the "documentation" would look different when read on a computer with no IPv6 addresses (not even link-locals), or on a Mac/Windows computer with IPv6 link-locals, than it would when read on a machine with global IPv6 (or a Linux machine with IPv6 link-locals). I find it far-fetched that anyone would use getaddrinfo() for "reading" such "DNS documentation" to begin with, as you cannot retrieve your TXT records with it, for example.

> > as it cannot possibly work, because there is no way the DNS
> > protocol can communicate a scope ID to the querier.
>
> It doesn't need to, that information can be gleaned from some other source,
> such as a command line option, or a configuration file.

FWIW, it simply doesn't work in a fully updated Fedora 18:

tore@wrath:~$ host ll.fud.no
ll.fud.no has IPv6 address fe80::230:1bff:febc:7f23
tore@wrath:~$ ssh ll.fud.no%eth0
ssh: Could not resolve hostname ll.fud.no%eth0: Name or service not known
tore@wrath:~$ ssh f...

Read more...

Revision history for this message
In , psimerda (psimerda-redhat-bugs) wrote :
Download full text (4.7 KiB)

(In reply to comment #73)
> My very first response to you in this thread, in comment #65, began like
> this:
>
> «That's not what this bug report is about. It's about suppressing DNS "IN
> AAAA" queries [...]»
>
> So how you can claim that I am not clear about talking specifically about
> DNS is beyond me, to be honest. It should in any case be clear by now, I
> hope.

I think it is crystal clear and if anyone wants to have a good summary, there's still:

https://fedoraproject.org/wiki/Networking/NameResolution/ADDRCONFIG

> > Even then, the title is not actually saying what the problem is. The actual
> > problem is CPE or DNS servers that did not handle IPv6 AAAA queries
> > correctly.
>
> Such CPEs and DNS servers are buggy, true. However, it is in our users' best
> interest to help them avoid tickling these bugs, because it leads to crappy
> user experiences and bug reports with a huge number of subscribers:

+1

> It sucks extra because this is perceived to be a Linux-specific problem.
> MS Windows and Apple Mac OS X do interpret the AI_ADDRCONFIG flag in the
> proposed way (i.e., they will suppress IN AAAA queries if the host only has
> link-local addresses configured).

Please keep being specific about whether you're talking about DNS or about name resolution in general. I don't think the Apple folks would break their link-local name resolution deliberately.

> NSCD is *not* a resolver.

+1

Using NSCD with network name resolution and AI_ADDRCONFIG sounds dangerous to me.

> > Well, as you saw, I pointed out a valid and reasonable use for storing
> > link-locals in DNS, so it isn't completely pointless.

This is irrelevant to the problem in this bug report, as NSS backends currently don't convey scope_id at all.

With that in mind, I think we should stop polluting this bug report with link-local in DNS as it's irrelevant in the current situation. Start a new bug report and link it from here, if you're still interested, and describe your use case there.

> The standard CLI frontend for getaddrinfo(), "getent ahosts", *doesn't* use
> AI_ADDRCONFIG, for a very good reason.

+1

According to my own micro-research, AI_ADDRCONFIG is only good for one specific purpose, which is a loop over getaddrinfo() results with a connect() in each step.

https://fedoraproject.org/wiki/Networking/NameResolution#Connecting_to_services_using_getaddrinfo.28.29
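That getaddrinfo()+connect() loop can be sketched as follows (the function name `connect_first` is illustrative and error handling is trimmed): each returned address is tried in order, so an address family the host cannot use would only waste a connection attempt, which is exactly what AI_ADDRCONFIG avoids.

```c
/* Sketch: the getaddrinfo()+connect() loop that AI_ADDRCONFIG is meant
 * for. Each returned address is tried in order until one connect()
 * succeeds. Illustrative code, not a hardened client. */
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Connect to host:port over TCP; return a connected fd, or -1. */
int connect_first(const char *host, const char *port)
{
    struct addrinfo hints, *res, *p;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_ADDRCONFIG; /* skip families this host lacks */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    int fd = -1;
    for (p = res; p != NULL; p = p->ai_next) {
        fd = socket(p->ai_family, p->ai_socktype, p->ai_protocol);
        if (fd < 0)
            continue;
        if (connect(fd, p->ai_addr, p->ai_addrlen) == 0)
            break;          /* connected */
        close(fd);
        fd = -1;            /* try the next address */
    }
    freeaddrinfo(res);
    return fd;
}
```

This is the canonical consumer of AI_ADDRCONFIG: the caller doesn't care what is stored in DNS, only which destination it can reach.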

> AI_ADDRCONFIG is counter-productive if your ultimate goal is to learn what
> address records are in DNS, because it would arbitrarily hide records from
> you depending on the machine you're running it on - IN AAAA records would be
> hidden on IPv4-only machines, and IN A records would be hidden on IPv6-only
> machines.

+1

> AI_ADDRCONFIG is only useful if the main goal isn't to dump the addresses,
> but to actually get an address that in turn will be used as a destination
> for communication with. As in, when getaddrinfo() is just a mandatory step
> towards the ultimate goal making a connect() somewhere.

Exactly. Any other discussions than those related to getaddrinfo()+connect() should be kept off this bug report.

> tore@wrath:~$ host ll.fud.no
> ll.fud.no has IPv6 address fe80::230:1bff:febc:7f23
> tore@wrath:~$ ssh ll...

Read more...

no longer affects: network-manager (Ubuntu)
no longer affects: network-manager (Ubuntu Karmic)
no longer affects: network-manager (Ubuntu Lucid)
Revision history for this message
In , bcotton (bcotton-redhat-bugs) wrote :

This message is a notice that Fedora 19 is now at end of life. Fedora
has stopped maintaining and issuing updates for Fedora 19. It is
Fedora's policy to close all bug reports from releases that are no
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 19 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged change the 'version' to a later Fedora
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Revision history for this message
In , oliver.henshaw (oliver.henshaw-redhat-bugs) wrote :

"options single-request-reopen" in /etc/resolv.conf seems to be an effective workaround for a broken DNS resolver, even for applications that don't use AI_ADDRCONFIG.

Maybe this behaviour could be chosen unconditionally on nodes where the only IPv6 addresses are link-local? Or is this best discussed in another bug?
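For reference, the workaround mentioned above is a one-line resolver option; a sketch of the resulting /etc/resolv.conf (the nameserver address shown is purely illustrative):

```
# /etc/resolv.conf  (nameserver address below is illustrative)
nameserver 192.0.2.1
# By default glibc sends the A and AAAA queries from the same socket;
# single-request-reopen makes it close that socket and retry from a new
# one when the paired queries are mishandled, working around broken
# middleboxes that return only one of the two replies.
options single-request-reopen
```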

Revision history for this message
In , horsley1953 (horsley1953-redhat-bugs) wrote :

(In reply to Phil Oester from comment #9)
> But the question remains, WHY did the behavior change? Originally, glibc
> DID use unique ports for the AAAA and A queries. From a "predictability"
> perspective, that is a more secure approach, no? Similar to how ISNs are
> now randomized in TCP.
>
> It seems many people's problems would be solved by going back to the
> (arguably more secure) method of using distinct ports for the A and AAAA
> queries.

Since Ulrich is no longer around to defend to the death indefensible decisions, maybe it is time to just go ahead and put back the separate ports, the elimination of which caused all the problems in the first place.

Revision history for this message
In , codonell (codonell-redhat-bugs) wrote :

(In reply to Tom Horsley from comment #77)
> (In reply to Phil Oester from comment #9)
> > But the question remains, WHY did the behavior change? Originally, glibc
> > DID use unique ports for the AAAA and A queries. From a "predictability"
> > perspective, that is a more secure approach, no? Similar to how ISNs are
> > now randomized in TCP.
> >
> > It seems many people's problems would be solved by going back to the
> > (arguably more secure) method of using distinct ports for the A and AAAA
> > queries.
>
> Since Ulrich is no longer around to defend to the death indefensible
> decisions, maybe it is time to just go ahead and put back the separate
> ports, the elimination of which caused all the problems in the first place.

The glibc community is consensus driven. Someone needs to write up a plan and drive it forward. The glibc team can do this, but this particular issue is lower on the overall priority list for stub resolver fixes. Principally we have no way to test this easily, so we're trying to build out our testing infrastructure to get coverage. In the past this was all tested by hand, and we can see how badly that turned out.

Revision history for this message
In , oliver.henshaw (oliver.henshaw-redhat-bugs) wrote :

Testing on the F21 live image, I don't have a problem.

Probably this is https://sourceware.org/git/?p=glibc.git;a=commit;h=16b293a7a6f65d8ff348a603d19e8fd4372fa3a9 in glibc 2.20. I wonder if this resolves all broken DNS resolver issues - is there anyone on F21 who still has problems with bad routers and AAAA DNS queries in getaddrinfo()?

Rolf Leggewie (r0lf)
Changed in eglibc (Ubuntu Lucid):
status: Triaged → Won't Fix
Revision history for this message
In , jkurik (jkurik-redhat-bugs) wrote :

This bug appears to have been reported against 'rawhide' during the Fedora 23 development cycle.
Changing version to '23'.

(As we did not run this process for some time, it could also affect pre-Fedora 23 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 23 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora23

Martin Pitt (pitti)
Changed in eglibc (Ubuntu):
status: Triaged → Invalid
Revision history for this message
In , bcotton (bcotton-redhat-bugs) wrote :

This message is a reminder that Fedora 23 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 23. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as EOL if it remains open with a Fedora 'version'
of '23'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version'
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not
able to fix it before Fedora 23 is end of life. If you would still like
to see this bug fixed and are able to reproduce it against a later version
of Fedora, you are encouraged to change the 'version' to a later Fedora
version prior to this bug being closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's
lifetime, sometimes those efforts are overtaken by events. Often a
more recent Fedora release includes newer upstream software that fixes
bugs or makes them obsolete.

Revision history for this message
In , oliver.henshaw (oliver.henshaw-redhat-bugs) wrote :

Seems like no-one has reported a problem since the F21 release.

Revision history for this message
In , fweimer (fweimer-redhat-bugs) wrote :

(In reply to Oliver Henshaw from comment #79)
> Testing on the F21 live image, I don't have a problem.
>
> Probably this is
> https://sourceware.org/git/?p=glibc.git;a=commit;
> h=16b293a7a6f65d8ff348a603d19e8fd4372fa3a9 in glibc 2.20. I wonder if this
> resolves all broken DNS resolver issues - is there anyone on F21 who still
> has problems with bad routers and AAAA DNS queries in getaddrinfo()?

Agreed. We have not seen further reports of the issue, so closing this bug.

Revision history for this message
In , cpanceac (cpanceac-redhat-bugs) wrote :

However, I've sometimes seen a web page take many seconds to load; since I haven't investigated the problem, the root cause may be completely different.

Changed in glibc (Fedora):
importance: Unknown → High
status: Confirmed → Fix Released