Ubuntu

mdns listed in nsswitch.conf causes excessive time for dns lookups

Reported by Sam Williams on 2007-03-22
This bug affects 44 people
Affects: avahi (Ubuntu)
  Importance: Undecided
  Assigned to: Unassigned
  Nominated for Karmic by Bas van den Dikkenberg
  Nominated for Lucid by Herakleitoszefesu
  Nominated for Maverick by Bas van den Dikkenberg
Affects: nss-mdns (Debian)
  Status: Fix Released
  Importance: Unknown
Affects: nss-mdns (Ubuntu)
  Importance: Medium
  Assigned to: Unassigned
  Nominated for Karmic by Bas van den Dikkenberg
  Nominated for Lucid by Herakleitoszefesu
  Nominated for Maverick by Bas van den Dikkenberg

Bug Description

Binary package hint: avahi-daemon

I encountered this problem on a machine that is integrated into our work network. I performed a dist-upgrade to Feisty on my desktop and all went well. I've noticed recently that any DNS-based work seems to take significantly longer than normal.

My system is getting dns information on our company internal systems from two dns servers. Previously, if I tried to establish an ssh connection with another system I could generally expect the connection in under 3 secs.

After the dist-upgrade the time went from under 3 seconds to approximately 25 seconds. After searching around the system I found an entry in /etc/nsswitch.conf that caused me a little concern. The line in question is:

   hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4

I looked around a bit and it seems that the references to mdns are really talking about communication with the Avahi mDNS/DNS-SD daemon. Since this looks to be a part of a zeroconf configuration I wasn't expecting too much in my current environment, as we really only have three Macs.

What concerned me is the idea that if we hit files with no answer there is a delay while we hit the other options until we hit dns, which is where the information I seek existed.

For an experiment I tried two separate tests. The first changed the line to look like:

    hosts: files dns mdns4_minimal [NOTFOUND=return] mdns

The change should have improved the time, but I was still looking at approximately 23 seconds to return a command prompt on the destination machine.

Finally, I changed the entry to simply:

    hosts: files dns

After this change I was again receiving the destination command prompt in under 3 seconds. I don't know if simply changing the file will correct the problem long-term or not. It seems to help me, and might be the way to go for most Ubuntu users.
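The three hosts lines tried above differ only in the order of the sources and in the bracketed action. As a rough illustration, here is a minimal Python sketch that splits such a line into its ordered sources and the actions attached to them. `parse_hosts_line` is a hypothetical helper written for this thread, not part of glibc, avahi, or nss-mdns.

```python
import re


def parse_hosts_line(line):
    """Split an nsswitch.conf 'hosts:' line into (source, actions) pairs.

    Sources are returned in lookup order. A bracketed item such as
    [NOTFOUND=return] is attached to the source that precedes it.
    """
    database, _, rest = line.partition(":")
    if database.strip() != "hosts":
        raise ValueError("not a hosts line: %r" % line)

    entries = []
    # Tokens are either a bracketed action list or a bare source name.
    for token in re.findall(r"\[[^\]]*\]|\S+", rest):
        if token.startswith("["):
            if not entries:
                raise ValueError("action without a preceding source")
            for item in token[1:-1].split():
                status, _, action = item.partition("=")
                entries[-1][1][status.upper()] = action.lower()
        else:
            entries.append((token, {}))
    return entries
```

For the Feisty default line this yields files, mdns4_minimal (with NOTFOUND mapped to return), dns, and mdns4, in that order.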

ProblemType: Bug
Architecture: i386
Date: Thu Mar 22 18:10:54 2007
DistroRelease: Ubuntu 7.04
Uname: Linux samdesk 2.6.20-12-generic #2 SMP Wed Mar 21 20:55:46 UTC 2007 i686 GNU/Linux

Martin Pitt (pitti) wrote :

I was just pointed to http://lists.freedesktop.org/archives/avahi/2007-March/001007.html

If you change nsswitch.conf to

  hosts: files mdns4_minimal [NOTFOUND=return] dns

do the timeouts disappear then?

Changed in avahi:
assignee: nobody → pitti
importance: Undecided → Medium
status: Unconfirmed → Needs Info

Matthew Nuzum (newz) wrote :

I'm experiencing this too. Martin, if I use that setting you suggested, my delay improves. The ssh timeout went from about 15s to 5s. If I remove mdns from the line entirely it's 3s.

However, without mdns I can't ping .local hostnames.

My original line looked like:
   hosts: files mdns dns

Now it looks like
  hosts: files mdns4_minimal [NOTFOUND=return] dns

I used avahi in Edgy and mdns configured on all my PCs.

Sam Williams (sam-williams) wrote :

Actually, I detailed the results of that test in the initial post. I found no relief with the line containing any references to mdns. The only way I got my time down to a tolerable amount was to eliminate all references to mdns and have only files and dns on the line. Then my DNS response returned to within 3 seconds.

David N. Welton (davidnwelton) wrote :

I was going crazy trying to figure out why my "network" was so slow. This suggestion:

    hosts: files mdns4_minimal [NOTFOUND=return] dns

improved things a great deal. It's a bit disconcerting that this has found its way into the production release, though, as tracking it down was not a simple matter.

Trent Lloyd (lathiat) wrote :

Apparently the usual cause for this is having no reverse DNS set up.

Alexander Menk (alex-menk) wrote :

I have this problem too. It's very annoying. I got my IP from an "AVM Fritz Box WLAN" router. The AVM guys do good work and are not to blame that they don't offer DNS lookup .. I think ..
Perhaps this problem will occur in very many home-network use cases, which is exactly the audience Ubuntu is mainly targeting. There must be a solution very soon.

Alexander Menk (alex-menk) wrote :

The problem occurs when making ssh connections. According to strace it tries to do a reverse lookup of the host I connect to. That host is a dialup host which cannot be reverse-resolved (which is weird, but despite this it all worked fine with Edgy..)

When changing the file to hosts: files dns it works.

It seems to me that DNS just says "no I don't know this host" instead of letting a timeout pass .. mh .. basically Ubuntu now behaves like Windows networking when opening Explorer and it's looking for files ;-)

I'm beginning to understand the problem .. Perhaps that is a conceptual problem of zeroconf-things?

Perhaps an enduser-option for disabling avahi would do? But that's ugly..

Any other ideas ?

Adam Porter (alphapapa) wrote :

FYI, this also may be a problem in Debian. I checked /etc/nsswitch.conf on my testing/unstable system and it was set to the same as is default in Feisty. I have wondered for a long time why my DNS lookups take so long sometimes. I removed "mdns4"; hopefully that will help.

I think the problem is that Ubuntu lacks a 'network' strategy completely.
There is no network-team or something of that kind.

What they need to focus on:
  - fairness scheduling (i.e. torrents should wait for firefox' http-requests)
  - zeroconf (i.e. automatic file-sharing with other locally connected computers, including windoz boxes)
  - speed (they should be checking this stuff and making it as concurrent as possible: i.e. timeouts should only affect non-working connections)

But without a team and some expert on this stuff, we're going to keep getting stuff like this.
Likewise, there was a bug about x-server connecting slowly on the localhost.
There has been a bug about fairness scheduling, like forever.
When selecting a folder to share with samba, it is not shared.
We are still forced to manually edit the samba.conf. The graphical 'shared folder' thingie does not set it up correctly.

All these things together seem to be a case of lack of focus. The issues are fragmented.
Are there any ubuntu community members that want to put this on the table (perhaps in a specification): the need for a network-team ?

stu_edgar (stu-edgar) wrote :

Can someone please confirm that the solution
hosts: files mdns4_minimal [NOTFOUND=return] dns
is the best one.
My network performance in Feisty is still a little off and below the same set-up in Dapper
Thanks
Stu

stu_edgar (stu-edgar) wrote :

Here is my network use while downloading in Firefox

For downloading a single file, it isn't necessary to continually do
DNS lookups, only at the beginning. If that screenshot represents one
file being downloaded, then it's probably a different problem.

stu_edgar (stu-edgar) wrote :

Thanks a lot Adam.
When I was looking at fixing this (before I found this thread)
I changed my hosts file from:
127.0.0.1 localhost
127.0.1.1 <my machine name>
to:
127.0.0.1 localhost <my machine name>
127.0.1.1 <my machine name>
Would that possibly create the timeout error? Otherwise I'm pretty standard Feisty over a wireless network to a router.
I've changed DNS to openddns.com and no differences noted.
Any help troubleshooting this issue would be appreciated - should I strace a wget or something?
Thanks
Stuart

Jaya (jayachandranm) wrote :

This solution worked for me as well. I was experiencing slow pings, email connections, etc. after installing Feisty. Editing /etc/nsswitch.conf with,
 hosts: files dns

seems to have resolved all these issues.

I did not notice any slow down with Edgy. I checked /etc/nsswitch.conf in my Edgy, strangely it looks same as Feisty with,
  hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4

Any explanation on why this worked fine in Edgy and not in Feisty?

Thanks for the help,
Jaya

stu_edgar (stu-edgar) wrote :

try downgrading avahi daemon to edgy one - helped for me
Stu

Ohad Lutzky (lutzky) wrote :

The problem is that DNS often responds faster than mDNS. However, if a DNS server is *not* present, everything would slow down greatly :/

Andreas Gustafsson (gson) wrote :

"Me too". I just spent half a day tracking down why my Django development server
running under Feisty was taking 20 seconds to serve a simple web page to another
machine on the same LAN, and finally tracked it down to this issue. Specifically,
the client machine was on a private network (IP address 10.0.1.1) and the local
DNS server was configured with an empty 10.in-addr.arpa zone,
causing it to return an NXDOMAIN response for the reverse mapping.

The Django server was calling gethostbyaddr() on the client address (10.0.1.1)
for each HTTP request, and each gethostbyaddr() call took about five
seconds to complete. On my other, non-Ubuntu machines, it would return almost
instantly (within a few milliseconds).

When I changed the "hosts" line in /etc/nsswitch.conf to

   hosts: files dns

the problem went away - the page that took 20 seconds to load now loads
in a small fraction of a second.

It's not reasonable for the lack of a reverse mapping in the DNS to cause a
long delay. On other operating systems, it simply causes the gethostbyaddr()
call to quickly return with h_errno=HOST_NOT_FOUND, and Ubuntu should
behave the same way.

Long delays in gethostbyname() and gethostbyaddr() are of course to be
expected if the DNS server is not responding, but that is not the case
here - the server is responding quickly, but with an NXDOMAIN, indicating
that no reverse mapping exists.

Andreas Gustafsson (gson) wrote :

I'm attaching a small C program to demonstrate the issue. It attempts
to reverse map the address 255.0.0.0, which is within a reserved range
and therefore does not have a reverse mapping.

On my Feisty system, it takes about five seconds to run (with the
unmodified /etc/nsswitch.conf). On my other systems, it runs in
a small fraction of a second.
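The attached C program is not reproduced in this thread. For readers who want to try something similar, here is a minimal Python sketch of the same idea: time one reverse lookup and report how long it took. It is not Andreas's program; `timed_reverse_lookup` and its `resolver` parameter are hypothetical names, and the parameter exists only so the timing logic can be exercised without touching the network.

```python
import socket
import time


def timed_reverse_lookup(address, resolver=socket.gethostbyaddr):
    """Reverse-map one address and return (hostname_or_None, seconds).

    socket.gethostbyaddr goes through the system resolver, so on Linux it
    follows the 'hosts:' line in /etc/nsswitch.conf.
    """
    start = time.monotonic()
    try:
        hostname = resolver(address)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError;
        # a missing reverse mapping ends up here.
        hostname = None
    elapsed = time.monotonic() - start
    return hostname, elapsed
```

Calling `timed_reverse_lookup("255.0.0.0")` before and after editing /etc/nsswitch.conf should show whether the delay described above is present on a given system.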

Trent Lloyd (lathiat) wrote :

This is actually due to nss-mdns not avahi directly

WOW! The real cause is finally known. Let's hope for a quick bug fix and a backport to Feisty.

Changed in nss-mdns:
status: Unknown → Unconfirmed

Sam Williams (sam-williams) wrote :

excellent news!!

Ryan Pavlik (abiryan) wrote :

I seem to be having this problem but there is no package nss-mdns - I know I had to change the nsswitch.conf to attempt to run Sugar at one time, what package can I reconfigure to restore default settings?

Trent Lloyd (lathiat) wrote :

The binary package is called libnss-mdns

liquidweaver (joshuaweaver) wrote :

I was able to fix this in feisty by replacing mdns4_minimal with mdns4.

Before:
hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
After:
hosts: files mdns4 [NOTFOUND=return] dns

Works like a charm, and I can still resolve hosts using mdns.

Aaron C. de Bruyn (darkpixel) wrote :

At a client site using a few Windows 2003 Servers, I would connect to an Ubuntu vmware image that had DNS of lamp.custname.local regularly with Gutsy beta.

During the beta I was out here quite a bit, but about a week before Gutsy launched until today I hadn't been out here.
Now when I plug my laptop into the network and try to ssh or ping lamp.custname.local I get nothing.

If I do an nslookup it returns the correct IP from the DNS server.

Doing a bit of stracing and debugging led me to nsswitch.conf and this bug.

I changed nsswitch.conf like so:
aaron@chrysalis:/etc$ diff nsswitch.conf nsswitch.conf.old
11c11
< hosts: files dns mdns4
---
> hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4

I can resolve stuff correctly out of DNS, but I can't resolve avahi stuff--which I think is to be expected removing mdns4_minimal--but then what is the mdns4 line for?

Aaron C. de Bruyn (darkpixel) wrote :

I should also mention I looked at the freedesktop.org link Martin posted and that didn't resolve the issue.
And I should clarify that the domain I mentioned (custname.local) is not an avahi-configured domain; it is the name we assigned to Active Directory, and the only DNS server on the network is integrated into Active Directory. So Windows boxes on the network get entries like winbox.custname.local automagically, but the lamp vmware image had to be tossed into DNS manually. There are a few devices using avahi on the network: my laptop, a few printers, and a NAS that advertises its admin front-end.

Jan Claeys (janc) wrote :

Aaron, .local is the proposed reserved top-level domain (TLD) for link-local addresses, to be queried by mDNS (multicast DNS) instead of "normal" DNS. AFAIK currently this is a draft which is scheduled to become an IETF RFC: http://files.multicastdns.org/draft-cheshire-dnsext-multicastdns.txt

Either don't use Avahi (which implements mDNS/DNS-SD), or use another TLD for your vmware instances. But remember that any Apple computer will have Bonjour (the original implementation of mDNS/DNS-SD) running, causing the same problems as long as you use .local for other purposes.

(Another solution might be to announce the vmware instances through mDNS of course.)
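To make the .local clash concrete, here is a small Python sketch of the name test involved. It assumes the behaviour described in this thread, namely that the mdns modules claim every name whose top-level label is "local"; `is_mdns_name` is a hypothetical helper, not the actual check inside nss-mdns, which may differ between versions.

```python
def is_mdns_name(hostname):
    """Return True if the name sits under the .local top-level label.

    Assumption: any such name is handed to mDNS rather than unicast DNS
    when an mdns source appears first in the 'hosts:' line.
    """
    labels = hostname.rstrip(".").lower().split(".")
    return len(labels) >= 2 and labels[-1] == "local"
```

Under that assumption, a name like lamp.custname.local is claimed by mdns4_minimal, and with [NOTFOUND=return] in place a miss there ends the lookup before the unicast DNS server is ever asked.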

Aaron C. de Bruyn (darkpixel) wrote :

Thanks for the info Jan.

So in my case, the issue I am seeing may be due to the fact that avahi thinks it is supposed to resolve .local addresses. If the internal network were named something different, this issue would potentially go away?

Jan Claeys (janc) wrote :

Aaron: exactly, and yes, using another domain should fix your problem

Aaron C. de Bruyn (darkpixel) wrote :

I'm torn.
As much as I despise Microsoft, I am forced (at work) to touch quite a few Windows domains. This means I either have to disable avahi on my laptop and remove mdns from nsswitch, or not be able to resolve computers on almost every client network we support.

Even microsoft suggests in their documentation to use .local for AD/DNS integration.
http://support.microsoft.com/kb/324753

How hard would it be to get everyone on board with using something like .auto for avahi?
I'm guessing there are quite a few admins out there with Windows domains named similarly that will have these issues.

After setting up avahi in Edgy, a delay getting the password prompt was introduced for ssh logins of about 7 secs.

Changing the hosts: line in Edgy (6.10) from:-
   hosts: files dns mdns
to:-
   hosts: files mdns4 [NOTFOUND=return] dns
per liquidweaver's suggestion, reduced the delay to under 1 second.

Thanks. For me changing,

hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4

to,

hosts: files dns

removed the delay. I just followed the forum, still do not know the
technical details.

Jaya

ed_p (edpizzi) wrote :

After some stracing and tcpdumping, it looks like the changed behavior here is that when mdns gets an NXDomain response, it retries up to 5 seconds, then reports a "timeout" to the requesting client, rather than immediately reporting that the record doesn't exist.

Is there a reason why requests that get NXDomain responses are retried? I can't think of a situation where that would be what you'd want, but maybe I'm missing something.

Trace excerpts are below.

Disabling mdns, request is over in 13 ms, and we do not retry (stracing sshd, following forks):

[pid 7728] 02:10:20.474565 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4
[pid 7728] 02:10:20.474649 connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("68.87.76.178")}, 28) = 0
[pid 7728] 02:10:20.474745 fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR)
[pid 7728] 02:10:20.474806 fcntl64(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 7728] 02:10:20.474870 gettimeofday({1199614220, 474899}, NULL) = 0
[pid 7728] 02:10:20.474943 poll([{fd=4, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
[pid 7728] 02:10:20.475030 send(4, "\214\222\1\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7"..., 42, MSG_NOSIGNAL) = 42
[pid 7728] 02:10:20.475158 poll([{fd=4, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
[pid 7728] 02:10:20.488319 ioctl(4, FIONREAD, [42]) = 0
[pid 7728] 02:10:20.488425 recvfrom(4, "\214\222\201\203\0\1\0\0\0\0\0\0\0011\0010\003168\0031"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("68.87.76.178")}, [16]) = 42
[pid 7728] 02:10:20.488640 close(4) = 0

There are no further communications with the dns server in the trace. (It's not real clear here, but the IP being looked up is 192.168.0.1.)
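The address is hard to spot because the query carries it as a reversed in-addr.arpa name in DNS wire format, where every label is prefixed by its length. A minimal Python sketch of that encoding follows; `reverse_name` and `encode_qname` are hypothetical helper names, and only the standard library `ipaddress` module is used.

```python
import ipaddress


def reverse_name(address):
    """Return the reverse-lookup name for an IP address."""
    return ipaddress.ip_address(address).reverse_pointer


def encode_qname(name):
    """Encode a domain name as DNS wire-format labels.

    Each label is written as one length byte followed by the label bytes,
    and the name ends with a zero-length root label.
    """
    out = bytearray()
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out.append(len(raw))
        out += raw
    out.append(0)
    return bytes(out)
```

For 192.168.0.1 the reverse name is 1.0.168.192.in-addr.arpa, and its encoding starts with the same bytes strace prints in octal as "\0011\0010\003168\003192\7in-a". The encoded name is 26 bytes long; together with the 12-byte DNS header and 4 bytes of query type and class that gives the 42 bytes seen in the send() call.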

With mdns enabled, we retry several times (stracing avahi-daemon). I've annotated it with shell-style comments, since it's much longer.

# avahi-daemon gets the RESOLVE-ADDRESS command from sshd over its socket
02:23:43.930581 poll([{fd=7, events=POLLIN}, {fd=3, events=POLLIN, revents=POLLIN}, {fd=16, events=POLLIN}, {fd=15, events=POLLIN}, {fd=14, events=POLLIN}, {fd=13, events=POLLIN}, {fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=9, events=POLLIN}], 9, 2196610) = 1
02:23:43.930711 gettimeofday({1199615023, 930740}, NULL) = 0
02:23:43.930778 read(3, "RESOLVE-ADDRESS 192.168.0.1\n", 20480) = 28
# (snip)
# request #1
02:23:44.035615 sendmsg(14, {msg_name(16)={sa_family=AF_INET, sin_port=htons(5353), sin_addr=inet_addr("224.0.0.251")}, msg_iov(1)=[{"\0\0\0\0\0\1\0\0\0\0\0\0\0011\0010\003168\003192\7in-a"..., 42}], msg_controllen=24, {cmsg_len=24, cmsg_level=SOL_IP, cmsg_type=, ...}, msg_flags=0}, 0) = 42
# (snip)
02:23:44.036015 poll([{fd=7, events=POLLIN}, {fd=3, events=POLLIN}, {fd=16, events=POLLIN}, {fd=15, events=POLLIN}, {fd=14, events=POLLIN, revents=POLLIN}, {fd=13, events=POLLIN}, {fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=9, events=POLLIN}], 9, 100) = 1
02:23:44.036144 gettimeofday({1199615024, 36173}, NULL) = 0
02:23:44.036212 ioctl(14, FIONREAD, [42]) = 0
02:23:44.036307 recvmsg(14, {msg_name(16)={sa_family=AF...

Andreas Gustafsson (gson) wrote :

> Is there a reason why requests that get NXDomain responses are retried? I can't think of a situation where that would be what you'd want, but maybe I'm missing something.

Speaking as a long-time participant in the standardization of the DNS protocol, I can assure you that retrying a request that gets an NXDomain response is always the wrong thing to do.

ed_p (edpizzi) wrote :

After reading through related bugs, it looks like avahi / nss is trying multicast dns before traditional dns. With multicast DNS, the only real option is long timeouts and retries, since only one avahi-enabled machine on the network may have a response for a given request. (That is, a successful lookup could have NXDomain responses from all but one host.) Since IP networks are assumed unreliable, it makes sense to retry requests, since the request may not have reached the one host that has the record.

I'm not sure that there's an easy way to fix this. Anything we did to fix this issue would weaken multicast dns (lower the timeout, reduce the number of retries, etc). It's unfortunate that currently this impacts servers that have no use for avahi-style multicast dns, since avahi mdns is enabled by default on many systems (eg. gutsy).

Martin Pitt (pitti) on 2008-01-14
Changed in avahi:
status: New → Invalid

Peter Valdemar Mørch (pmorch) wrote :

Ok, so I don't pretend to understand everything in this post. If you, like me, simply want to avoid delays logging in with ssh, stop avahi. For me, a simple "ssh server" took about 10 secs. Running with "ssh -o GSSAPIAuthentication=no server" brought that delay down to nothing. If that is also true for you:

Disable avahi on the client like this:
$ sudo /etc/init.d/avahi-daemon stop
This made it log in without delay, with or without GSSAPIAuthentication=no, until the next reboot, when avahi is restarted.
Disabling avahi permanently by setting "AVAHI_DAEMON_START=0" in /etc/default/avahi-daemon worked for me. (Until I discover what avahi is and want to use it! :D)

jjoshi (jimy-joshi) wrote :

all,
I am on Gutsy and neither of the following changes works for me. The browser works fine for half a minute, then the lookups are back to their horrible limit, taking as much as 30 seconds..

files mdns4 [NOTFOUND=return] dns
or
files dns

any other suggestions..i can try out..

Loye Young (loyeyoung) wrote :

This bug is documented in http://www.ietf.org/rfc/rfc4795.txt and warned against in the manpage to resolv.conf.

Changed in avahi:
status: Invalid → Confirmed

DaveAbrahams (boostpro) wrote :

I've reproduced the slow ssh login problem on: Intrepid Minimal CD Install + ssh

There's no apparent avahi/mdns installed or activated

In that configuration, "-o GSSAPIAuthentication=no" on the client command line has no effect, nor does setting it in the server's ssh_config file (though why that would change anything, I have no idea). The only way to suppress the delay seems to be to put

UseDNS no

in the server's /etc/ssh/sshd_config

Martin Pitt (pitti) on 2009-05-27
Changed in nss-mdns (Ubuntu):
assignee: Martin Pitt (pitti) → nobody
status: Incomplete → New

I've confirmed this in Jaunty on two separate systems, with slow ping times on local LAN servers.

Still present in Karmic.

Filippo De Luca (dl-filippo) wrote :

Hi,
My local domain is named xxx.local. The Ubuntu machines cannot resolve hosts.

My solution is to remove [NOTFOUND=return] from the line.

Patrick (oc3an) wrote :

Has anyone else tried:

hosts: files mdns_minimal [NOTFOUND=return] dns mdns

works perfectly for me and also fixes broken ipv6 mdns behavior.

pauljohn32 (pauljohn32) wrote :

For me, turning off avahi-daemon on the target server solved the problem entirely.

Perhaps the other fixes have already been applied by some deb updates. In /etc/nsswitch.conf

hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4

I don't have any settings on DNS in /etc/ssh/ssh_config or /etc/sshd_config.

A few versions of linux ago, I got on a crusade to turn off all services that I did not know about. When I turned off avahi-daemon, it turned out the sound card would not work anymore.
(http://www.linux-archive.org/fedora-user/54765-what-root-no-sound-problem-some-programs.html)
 I'm very excited to go to the target machine and find out what I've broken by turning off avahi this time.

VladLazarenko (snail) wrote :

I had a very weird problem with a MacBook Pro and Ubuntu 9.10. When DNS resolving was in progress, everything else was frozen. For example, I do 'ping news.google.com' in one terminal, everything looks good, then I go and do 'ping linux.org.ru', and while the DNS lookup is in progress, ping in the first terminal just stops. I thought that this bug was somehow related and updated my hosts entry in nsswitch.conf to this one - "hosts: files dns". This solved the problem.

TrReardon (tr-reardon) wrote :

I think I have found some of the problem. avahi-daemon gets into a weird state when it sees advertisements for both IPv4 and IPv6 addresses for the same service. You will notice that when you run avahi-resolve just after ifdown/ifup, you will be given an IPv6 address. About 2-8 seconds later that same request yields an IPv4 address.

The caching behavior of avahi-daemon appears to allow this to occur about every 8 minutes, from my own testing. ssh (but not ping) is vulnerable to this and will report that the server in question is down. This will of course happen for rsync as well.

Conn O Griofa (psyke83) wrote :

Has there been any progress on this issue? I experience slow DNS lookup times on *all* my machines with Ubuntu (including Lucid), but not on any version of Windows.

I can confirm that changing the "hosts" line in /etc/nsswitch.conf from:
files mdns4_minimal [NOTFOUND=return] dns mdns4
to:
files mdns_minimal [NOTFOUND=return] dns mdns

...completely resolves the slow DNS lookup times. I assume this effectively enables IPv6 support for avahi-daemon, and taking into consideration that my router and ISP use IPv4 addressing, I assume that this change will not break other legacy IPv4 setups either.

Can we get someone with experience of the avahi-daemon to comment on the feasibility of this change?

Vitaliy Kulikov (slonua) wrote :

So, I had the same issue ... after some analysis, I decided to remove the package 'libnss-mdns'.
After that, the file /etc/nsswitch.conf is updated automatically to

hosts: files dns

well, everything works perfect now.

using karmic.

Björn Lindqvist (bjourne) wrote :

Same configuration change worked for me with no apparent problems. Why does Ubuntu need multicast dns at all? Which software is responsible for the bug? From the above comments it appears that the problem is inherent in how mdns works. Then it would be preferable not to have mdns at all rather than really slow internet.

Jan Claeys (janc) wrote :

As said before, the problem occurs when you use mDNS combined with a broken DNS server. If your DNS server is not broken, you will not have any delays. (Unfortunately the DNS relays on several cheap home routers are broken...)

Some ways to solve this:
* try to detect broken DNS servers & configure Ubuntu to not use them
  * use a local DNS server on the system instead? (contra: significantly increases the load on the root servers?)
  * configure Ubuntu to use a proper DNS server (either one run by Canonical or one provided by a 3rd party like Google?)
  * disable mDNS when broken routers (or other broken DNS servers) are detected (breaks other things though)
* current approach with patched glibc (which breaks several applications, so far from optimal)
* ... ?

One problem with detecting broken DNS servers is that that might change depending on where you connect your laptop of course, so it would have to be dynamic...

Andreas Gustafsson (gson) wrote :

The problem *does* occur with non-broken DNS servers, too. I just tried the test program "test3.c" attached to my comment from 2007-05-13 on a system booted from a Lucid alpha3 live CD, and it still takes more than five seconds to execute. The system was directly querying a BIND 9 DNS server, and there was no "cheap home router" involved. This is trivial to reproduce; please try it.

Why is Ubuntu doing mDNS lookups for reverse mappings, anyway? I can see the utility of doing mDNS for forward mappings, but doing mDNS for reverse mappings seems not only harmful (being the cause of this bug), but also quite pointless.

Jan Claeys (janc) wrote :

Andreas: I'd guess in most cases reverse lookup with mdns is only useful for IP addresses inside the current subnet(s), so you might have a point there (but that would probably have to go into another bug report).

Andreas Gustafsson (gson) wrote :

Disabling mdns only for reverse lookups outside the local subnet(s) would not solve the problem. The test3.c example program uses a non-local, reserved address because that's a convenient way to illustrate that the problem is not caused by the local DNS configuration, but the actual delays I'm experiencing occur when connecting to an Ubuntu host from a non-Ubuntu system on a local subnet, for example with ssh, causing the Ubuntu host to do a reverse lookup of a local address. The standard DNS reverse lookup for that address resolves to NXDOMAIN in a few milliseconds, but since the connecting non-Ubuntu system is not running an mdns responder, the subsequent (pointless, IMO) mdns lookup takes several seconds to time out.
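The sequence described here can be modelled in a few lines of Python. This is a toy model, not glibc's NSS code: `walk_chain` is a hypothetical function, and the timings in `EXAMPLE_OUTCOMES` are illustrative values borrowed from this thread (about 13 ms for the NXDOMAIN answer, about 5 s for the mdns timeout). It also assumes that mdns4_minimal declines a non-link-local address immediately with UNAVAIL. The default actions (return on SUCCESS, continue otherwise) are the ones documented for nsswitch.conf.

```python
# Default NSS actions: stop on success, otherwise try the next source.
DEFAULT_ACTIONS = {
    "SUCCESS": "return",
    "NOTFOUND": "continue",
    "UNAVAIL": "continue",
    "TRYAGAIN": "continue",
}

# Illustrative outcomes for a reverse lookup of a host with no PTR record
# and no mdns responder: source -> (status, seconds spent). Assumed values.
EXAMPLE_OUTCOMES = {
    "files": ("NOTFOUND", 0.0),
    "mdns4_minimal": ("UNAVAIL", 0.0),
    "dns": ("NOTFOUND", 0.013),
    "mdns4": ("NOTFOUND", 5.0),
}


def walk_chain(chain, outcomes):
    """Walk the sources in order and return (status, seconds, visited).

    chain is a list of (source, action_overrides) pairs, for example
    ("mdns4_minimal", {"NOTFOUND": "return"}).
    """
    status = "UNAVAIL"
    elapsed = 0.0
    visited = []
    for source, overrides in chain:
        status, cost = outcomes[source]
        elapsed += cost
        visited.append(source)
        actions = dict(DEFAULT_ACTIONS, **overrides)
        if actions[status] == "return":
            break
    return status, elapsed, visited
```

With the default chain (files, mdns4_minimal with NOTFOUND=return, dns, mdns4) the model spends roughly five seconds, nearly all of it in the trailing mdns4 lookup. With "files dns" it finishes in about 13 ms, which matches the pattern reported above.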

damaan (jon-ekdahl) wrote :

I installed Lucid beta2 and I'm experiencing slow networking, possibly related to DNS problems.

I ran Andreas' test3.c program attached above; it returns in a little over ten seconds with the default nsswitch.conf setting. I expected an improvement when using "hosts: files dns", but it still took ten seconds?! Maybe I misunderstood what the test case does.

Andreas Gustafsson (gson) wrote :

damaan - the behavior you report suggests that you are in fact having DNS problems, but that they are unrelated to mdns and this bug. Perhaps you have addresses in your /etc/resolv.conf that do not point to working DNS servers.

damaan (jon-ekdahl) wrote :

@Andreas: It seems that I _did_ have DNS problems. Disabling IPv6 certainly improved my overall situation (firefox, apt-get), but the test case was still running slowly. Then I tried switching to OpenDNS, which made your test case run very fast. So I guess my problem was the DNS relay in my D-Link router, or my ISP's DNS servers. Sorry for the noise, and thanks for pointing me in the right direction.

Lennart Hengstmengel (farenji) wrote :

I stumbled upon this issue as well. I am using Ubuntu in a local network with a local DNS server, which serves foo.local hostnames. When I did a "host bar.foo.local" I got the correct response "Host bar.foo.local has IP xx.xx.xx.xx"; however, "ping bar.foo.local", for example, gave an "unknown host" response.

Turned out that resolving was done using multicast dns instead of normal dns - I discovered that by using a network sniffer.

By googling for "mdns ubuntu" I found this bugreport.

I removed the "mdns4_minimal [NOTFOUND=return]" part from /etc/nsswitch.conf and that indeed resolved the issue. IMHO that should be the default for ubuntu.

problem is still here also in lucid

ITSEC (david-sanders-thewg) wrote :

Using Ubuntu 10.04.01 in an interesting situation...
Local network is connected to router with a VPN Gateway.
All work traffic goes through NetworkA via VPN over NetworkB.
All non work traffic flows through Network B directly to the internet (including NetworkC traffic).
Network B has its own wireless router to handle home clients (NetworkC)
NetworkB filters all NetworkC traffic directly to the internet.

So, my conundrum is to resolve work addresses (foo.local) AND internet addresses (bar.com) at the same time while on NetworkA. I can use my internal DNS servers, since they are recursive, and the router will correctly toss all internet traffic out NetworkB. Network C does not require this setup, only internet DNS servers.

I could get nslookups to work, -

nslookup time
Server: xxx.xxx.xxx.xxx
Address: xxx.xxx.xxx.xxx#53

Name: time.foo.local
Address: xxx.xxx.xxx.xxx

but not ping -
ping time - No response, only solid cursor
ping time.foo.local - No response, only solid cursor

After dealing with this for a few days, I wound up here. After reading this and eagerly copying the changes I needed to make to my nsswitch.conf file (current is "hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4") , it hit me...I'm not using search domains for foo.local. hmmmm....

Catting the file showed me this -
cat /etc/resolv.conf
nameserver xxx.xxx.xxx.xxx
nameserver xxx.xxx.xxx.xxx

So I changed it to this -

nameserver xxx.xxx.xxx.xxx
nameserver xxx.xxx.xxx.xxx
search foo.local

Voilà! I correctly get domain expansion on the hostnames (ping time resolves as time.foo.local), and I have internet resolution.

So, the fix for me was to add search domains to my /etc/resolv.conf. I realize this may not work for everyone, but it did for me.
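
For readers wondering why the "search" line makes a bare "ping time" work, here is a minimal Python sketch of the documented search-list behaviour from resolv.conf(5). It is an illustration only, not the resolver's actual code; the function name candidate_names is made up for this example, and ndots=1 is the documented default.

```python
def candidate_names(name, search_domains, ndots=1):
    """Return the names a stub resolver would try, in order (simplified)."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute; the search list is skipped.
        return [name.rstrip(".")]
    expanded = [f"{name}.{domain}" for domain in search_domains]
    if name.count(".") >= ndots:
        # Enough dots: try the name as given first, then the search list.
        return [name] + expanded
    # Too few dots: try the search list first, then the bare name.
    return expanded + [name]


if __name__ == "__main__":
    # Without a search list, "time" is only ever tried as the bare name.
    print(candidate_names("time", []))
    # With "search foo.local", the expanded name is tried first.
    print(candidate_names("time", ["foo.local"]))
```

With "search foo.local" in place, the short name "time" is first tried as time.foo.local, which matches the domain expansion described above.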

ShineOn (shineon1) wrote :

Regarding Microsoft suggesting in their documentation to use .local as the TLD for an Active Directory domain, that's just an example, not really a suggestion, as in "you might want to use a name such as mycompany.local". This was back before the major problems with the .local TLD and Bonjour/Rendezvous/zeroconf hit the fan, and the link Aaron C. de Bruyn gave in 2007 is to one document, from 2003, that had that unfortunate example.

More recently, since 2004 actually, Microsoft has acknowledged the proposed reservation of .local as the TLD for mDNS search, and strongly recommends against using .local for your AD domain TLD. Further, they now recommend against using any other "illegal" TLD, because, as with .local, you never know when that TLD will suddenly become legit, causing you all sorts of havoc, like those Microsoft victims, er, users that were unfortunate enough to take the earlier example as gospel.

Microsoft now recommends that you use a "private" subdomain of your company's registered domain, like "corp.mycompany.com" or "lan.mycompany.org" or whatever, where the TLD/registered domain is mycompany.com or mycompany.org, respectively. Again, those are examples, and not gospel...

As to this issue with avahi, it's still an issue if you are in that group that still uses .local as your AD TLD. What I did to work around it was to put "dns" in front of "mdns4_minimal" in the hosts: line of nsswitch.conf. There's still a lag, but it at least resolves by fqdn now, letting me join the Linux box to the AD domain, without removing avahi.

Andreas Gustafsson (gson) wrote :

I just did a fresh install of Ubuntu 10.10 (i386 desktop), and the bug is still there. Specifically, logging in to the Ubuntu machine over a LAN configured with 10.x.x.x addresses and a configured DNS server that immediately returns an NXDOMAIN response for the reverse mappings of those addresses, there is a 5-second delay before the ssh login completes, due to Ubuntu's pointless attempt at reverse mapping the net 10 address of the originating machine using mdns. Removing the "mdns4" entry from /etc/nsswitch.conf (but leaving "mdns4_minimal") fixes the problem - doing "time ssh ubuntu-machine true", the elapsed time falls from 5.03 seconds to 0.03 seconds.
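
To measure the reverse-lookup delay on its own, without ssh involved, something like the following Python sketch could be used. It assumes that Python's socket.gethostbyaddr goes through the same NSS "hosts:" configuration as sshd does; the function name time_reverse_lookup and the 127.0.0.1 address are placeholders for illustration.

```python
import socket
import time


def time_reverse_lookup(address):
    """Return (hostname or None, elapsed seconds) for one reverse lookup."""
    start = time.monotonic()
    try:
        hostname = socket.gethostbyaddr(address)[0]
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        hostname = None
    return hostname, time.monotonic() - start


if __name__ == "__main__":
    # Placeholder address; substitute the client address that sshd would see.
    name, elapsed = time_reverse_lookup("127.0.0.1")
    print(f"{name!r} in {elapsed:.2f} s")
```

Running it before and after editing /etc/nsswitch.conf should show whether the delay comes from the "hosts:" lookup chain.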

Celsius (celsius-netbel) wrote :

Using 10.04 (64bit desktop) I was having important delays when using ssh. The login prompt was taking several seconds to appear.

I changed the line to:

hosts: files mdns4_minimal dns [NOTFOUND=return] mdns4

and now it's connecting in a fraction of a second. I don't know why it works although I keep the 'mdns4' and just change the position of 'dns', but the result is the same.
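
A possible explanation, sketched in Python: in nsswitch.conf a bracketed action list applies to the service written immediately before it, so "dns [NOTFOUND=return]" ends the lookup as soon as DNS reports that the name does not exist, and mdns4 is not consulted in that case. The parser below is a simplified illustration, not glibc's implementation; the function names are invented here, and negated criteria such as [!UNAVAIL=return] are not handled.

```python
import re


def parse_hosts_line(line):
    """Split an nsswitch.conf 'hosts:' line into (service, actions) pairs."""
    _, _, spec = line.partition(":")
    entries = []
    # Tokens are either a bracketed action list or a bare service name.
    for token in re.findall(r"\[[^\]]*\]|\S+", spec):
        if token.startswith("["):
            if not entries:
                raise ValueError("action list without a preceding service")
            # An action list applies to the service immediately before it.
            for item in token[1:-1].split():
                status, _, action = item.partition("=")
                entries[-1][1][status.upper()] = action.lower()
        else:
            entries.append((token, {}))
    return entries


def consulted_on_notfound(entries):
    """List the services tried, in order, if each one reports NOTFOUND."""
    tried = []
    for service, actions in entries:
        tried.append(service)
        # The default action for NOTFOUND is 'continue'.
        if actions.get("NOTFOUND", "continue") == "return":
            break
    return tried


if __name__ == "__main__":
    line = "hosts: files mdns4_minimal dns [NOTFOUND=return] mdns4"
    print(consulted_on_notfound(parse_hosts_line(line)))
```

For the line above, the sketch stops at dns and never reaches mdns4. Note that the sketch pretends every service answers NOTFOUND, which is a simplification: a real module may report a different status for names it does not handle, and the lookup then continues.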

Dave Gordon (python-bugz) wrote :

My network was showing this (multiple) 5-second timeout problem. To test it, I turned off DNS caching
# /etc/init.d/nscd stop
and then
# time curl -I www.google.co.uk
which consistently took more than 5 seconds.
I tried several of the approaches described above, and eventually determined that the problem was primarily down to the modem/router that I was using; its internal DNS could not handle two concurrent UDP DNS enquiries. In addition, if I sent out the two UDP DNS queries (A and AAAA) in parallel to an *external* DNS server, the router often dropped one of the incoming reply packets, presumably because its stateful firewall only recorded sending a packet, not how many had been sent, and so cleared the entry for the DNS server when the first reply arrived.
In the end I settled on this in /etc/nsswitch.conf:

hosts: files nis mdns4_minimal dns [NOTFOUND=return] mdns4

(I use NIS, but not mdns - but I've left it in there in case a friend brings a machine that uses it).
But more importantly, I put this in /etc/resolv.conf:

options single-request
nameserver 213.120.234.2
nameserver 192.168.1.1

where 213.120.234.2 is one of my ISP's public DNS servers, and 192.168.1.1 is my own router as a fallback. With this configuration, the curl command above completes in well under one second. So in this case at least, the problem is not mdns but the limited capabilities of the ADSL router. I think I'll switch over to running a proper DNS (and DHCP) service on one of my own machines now, rather than relying on the router!
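
One rough way to check whether the parallel A/AAAA queries are the culprit is to time an IPv4-only lookup against an unrestricted one. The Python sketch below does that with socket.getaddrinfo; it assumes that AF_UNSPEC makes the resolver ask for both record types while AF_INET asks only for A records. The function name time_forward_lookup and the "localhost" default are placeholders.

```python
import socket
import time


def time_forward_lookup(hostname, family):
    """Return (number of addresses found, elapsed seconds) for one lookup."""
    start = time.monotonic()
    try:
        results = socket.getaddrinfo(
            hostname, None, family=family, type=socket.SOCK_STREAM
        )
    except socket.gaierror:
        results = []
    return len(results), time.monotonic() - start


if __name__ == "__main__":
    host = "localhost"  # placeholder; use a name that is not cached locally
    lookups = (("A only", socket.AF_INET), ("A + AAAA", socket.AF_UNSPEC))
    for label, family in lookups:
        count, elapsed = time_forward_lookup(host, family)
        print(f"{label}: {count} address(es) in {elapsed:.2f} s")
```

If the IPv4-only lookup is fast and the unrestricted one takes about five seconds, that would be consistent with the router behaviour described above, and "options single-request" would be worth trying.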

gene (eugenios) wrote :

Thanks for the tip, Dave. I've seen the single-request workaround before, it did not seem to help. Maybe adding the string before the dns list works better???
In my case the problem was with a few browsers at times being very sluggish for some sites, namely Firefox, Chromium and Epiphany. All other programs using DNS were fast as always, even text-based browsers.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nss-mdns (Ubuntu):
status: New → Confirmed

Joel Maslak (jmaslak) wrote :

Can confirm this affects me on Oneiric Beta 2. Host has working IPv4 and IPv6 network. All networks have working reverse DNS, but not all hosts have actual DNS entries (this is true of both IPv4 and IPv6).

Fix for me is to remove mdns4 from the nsswitch.conf.

"sudo /sbin/arp" takes a long time to look up hostnames. This makes facter take a long time to run (+1 minute longer than necessary). This in turn makes puppet take a substantially longer time to run (100+ seconds vs. 28 seconds).

gene (eugenios) wrote :

If anyone is interested, here's a related issue: bug# 788274 https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/788274

I recently attached some tcpdumps with the report.

The experience is different from that of the original reporter. Tweaking either /etc/nsswitch.conf or /etc/resolv.conf does not help much. Hence it is not an exact duplicate of this bug.

It is extremely annoying!!!

This problem seems to have started for me at the time of upgrade from 11.04 to 11.10. Changing the lookup order on the hosts: line in /etc/nsswitch.conf to:

files dns mdns4_minimal [NOTFOUND=return] mdns4 wins

fixes the problem.

Helge (helgesdk) wrote :

On Precise 12.04 the reverse lookup causes ssh with gssapi enabled to wait for ~15-20 seconds before login.
Removing mdns4 from the hosts: line in /etc/nsswitch.conf fixes the issue.

Mark Thornton (mthornton-2) wrote :

A default installation of Microsoft Small Business Server Essentials 2011 will use a .local domain. It is quite tedious to override this choice (http://titlerequired.com/2011/08/02/installing-sbs-essentials-using-an-answer-file/). Anyone using Ubuntu on such a network has to remove the default mDNS entries from the nsswitch.conf file. Not helpful.

Note that SBS 2011 is a current product, so we can expect there to be many mycompany.local domains in use.

dronus (paul-geisler) wrote :

Many people would have "cheap home routers", so this still is very annoying. It spreads delays over a wide range of applications while not showing an explanation or progress indicator to the "normal user".

What are the caveats of changing the nsswitch.conf like stated above?

Zta77 (zta77) wrote :

Great. I just spent half a day debugging my network (as I've just set up my own bind service on a new server) before I finally found an article[1] and later this bug report that explained what was going on and how to fix it.

I noticed that my Ubuntu Server has the working nsswitch.conf installed by default with: hosts: files dns

I'm running Desktop Ubuntu 12.04 on my laptop so I don't know if this has been fixed in later Ubuntus. But if not, please fix it! The fix seems pretty easy.

[1] http://www.unchartedbackwaters.co.uk/pyblosxom/debian_ubuntu_dns_resolution_delays

Timo Jyrinki (timo-jyrinki) wrote :

For me this change (minimal change, only adding "[NOTFOUND=return]" after dns, not changing the order) in /etc/nsswitch.conf fixed my problem of slow SSH connecting in internal network (where UseDNS=no in sshd_config did not help):
-hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
+hosts: files mdns4_minimal [NOTFOUND=return] dns [NOTFOUND=return] mdns4

Changed in nss-mdns (Debian):
status: New → Fix Released

Luis Alvarado (luisalvarado) wrote :

Has the fix been applied to 12.04 and 13.04? I am getting abnormal network activity. Many small packets are sent and received, and it is due to this.

Feisar (f3isar) wrote :

I can confirm that this is still a problem on 13.04. I have noticed it affecting:

ping
ssh
ntp

I fix it by using the following line in /etc/nsswitch.conf:

hosts: files dns

Zta77 (zta77) wrote :

Still a problem in 13.10 as far as I can tell. Fix #75 helped.

According to this log, a fix was released nearly a year ago, but still people keep finding the bug. How does this Launchpad work again?
