Regression in getaddrinfo(): calls block for much longer on Bionic (compared to Xenial), please disable LLMNR

Bug #1739672 reported by Mike Pontillo
This bug affects 4 people
Affects             Importance  Assigned to
glibc (Ubuntu)      Undecided   Unassigned
  Artful            Undecided   Unassigned
  Bionic            Undecided   Unassigned
linux (Ubuntu)      High        Unassigned
  Artful            Undecided   Unassigned
  Bionic            High        Unassigned
systemd (Ubuntu)    High        Dimitri John Ledkov
  Artful            Undecided   Unassigned
  Bionic            High        Dimitri John Ledkov

Bug Description

When testing MAAS on Bionic, we noticed sluggish performance that we could not immediately explain.

After comparing the results from a run of the test suite on Xenial to a run on Bionic, we determined that the slowdowns had to do with DNS lookups. In particular, if MAAS attempts to resolve a hostname using getaddrinfo() and the call fails, on Xenial the negative result is returned in a fraction of a second. On Bionic, the negative result is returned in ~1.6 seconds, according to some measures.

### To run the test ###

git clone https://github.com/mpontillo/test-getaddrinfo
cd test-getaddrinfo
make

### Results on Xenial ###
$ time ./test not-a-real-hostname
Trying to resolve: not-a-real-hostname
    getaddrinfo errno: Success
    getaddrinfo() return value: -2 (Name or service not known)

real 0m0.015s
user 0m0.000s
sys 0m0.000s

### Results on Bionic ###
$ time ./test not-a-real-hostname
Trying to resolve: not-a-real-hostname
    getaddrinfo errno: Resource temporarily unavailable
    getaddrinfo() return value: -3 (Temporary failure in name resolution)

real 0m1.609s
user 0m0.004s
sys 0m0.000s
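
For readers who do not want to build the test program, a rough equivalent can be put together with `getent ahosts`, which also resolves names through getaddrinfo(). This is only a sketch and not the code from the repository above: the helper name `elapsed_ms` is made up for illustration, and it assumes GNU `date` (for the `%N` nanosecond format).

```shell
# elapsed_ms CMD [ARGS...]: run CMD, discard its output, and print the
# wall-clock time it took in milliseconds.
# Assumption: GNU date, which supports %N (nanoseconds).
elapsed_ms() {
    start=$(date +%s%N)
    # The exit status is ignored on purpose: failed lookups are the
    # interesting case here.
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}
```

Running `elapsed_ms getent ahosts not-a-real-hostname` on both releases should show the same kind of gap as the `time ./test` runs above.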

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1739672

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: xenial
Mike Pontillo (mpontillo) wrote : Re: Regression in getaddrinfo(): calls block for much longer on Bionic (compared to Xenial)

Note: I doubt this bug is in the kernel itself; I initially attempted to file it under glibc, but for some reason the `linux` package was selected.

I also added `systemd` in case the difference in behavior can be explained by the addition of resolved.

Note that these tests were run on Bionic Desktop, so it's using network-manager, not networkd.

Joseph Salisbury (jsalisbury) wrote :

One way to tell whether this bug is due to the kernel is to boot with some prior kernel versions and see if the bug goes away.

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key
Mike Pontillo (mpontillo) wrote :

I just tested with the Xenial kernel and Bionic userspace and observed that the bug still occurs, so I marked it Invalid for 'linux'.

Changed in linux (Ubuntu):
status: Incomplete → Invalid
description: updated
tags: added: bionic
Blake Rouse (blake-rouse) wrote :

$ time ./test not-a-real-hostname
Trying to resolve: not-a-real-hostname
    getaddrinfo errno: No such file or directory
    getaddrinfo() return value: -2 (Name or service not known)

real 0m10.007s
user 0m0.001s
sys 0m0.001s

Blake Rouse (blake-rouse) wrote :

The issue is with the systemd resolver, not with glibc.

With the systemd-resolved stub address in /etc/resolv.conf:

# This file is managed by man:systemd-resolved(8). Do not edit.
#
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.
nameserver 127.0.0.53

$ time ./test not-a-hostname
Trying to resolve: not-a-hostname
    getaddrinfo errno: No such file or directory
    getaddrinfo() return value: -2 (Name or service not known)

real 0m10.076s
user 0m0.001s
sys 0m0.000s

Without systemd-resolved in /etc/resolv.conf (I changed it to point to my local DNS server directly):

# This file is managed by man:systemd-resolved(8). Do not edit.
#
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.
#nameserver 127.0.0.53
nameserver 192.168.1.1

$ time ./test not-a-hostname
Trying to resolve: not-a-hostname
    getaddrinfo errno: No such file or directory
    getaddrinfo() return value: -2 (Name or service not known)

real 0m0.097s
user 0m0.001s
sys 0m0.000s
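
A quick way to tell which of the two situations above a machine is in is to check whether any active `nameserver` line points at the stub address. The following is a minimal sketch; the function name `uses_resolved_stub` is hypothetical, and it only looks at the text it is given (it knows nothing about how the file is managed).

```shell
# uses_resolved_stub: read resolv.conf text on stdin and exit 0 if an
# active (uncommented) nameserver line points at 127.0.0.53, else exit 1.
uses_resolved_stub() {
    awk '$1 == "nameserver" && $2 == "127.0.0.53" { found = 1 }
         END { exit !found }'
}
```

For example: `uses_resolved_stub < /etc/resolv.conf && echo "lookups go through systemd-resolved"`.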

Changed in glibc (Ubuntu):
status: New → Invalid
Blake Rouse (blake-rouse) wrote :

$ systemd-resolve --status
Global
          DNSSEC NTA: 10.in-addr.arpa
                      16.172.in-addr.arpa
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa
                      18.172.in-addr.arpa
                      19.172.in-addr.arpa
                      20.172.in-addr.arpa
                      21.172.in-addr.arpa
                      22.172.in-addr.arpa
                      23.172.in-addr.arpa
                      24.172.in-addr.arpa
                      25.172.in-addr.arpa
                      26.172.in-addr.arpa
                      27.172.in-addr.arpa
                      28.172.in-addr.arpa
                      29.172.in-addr.arpa
                      30.172.in-addr.arpa
                      31.172.in-addr.arpa
                      corp
                      d.f.ip6.arpa
                      home
                      internal
                      intranet
                      lan
                      local
                      private
                      test

Link 8 (vethJWPPL8)
      Current Scopes: LLMNR/IPv6
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 4 (lxdbr0)
      Current Scopes: LLMNR/IPv4 LLMNR/IPv6
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 3 (eno1)
      Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 192.168.1.1

Link 2 (enp4s0)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Changed in systemd (Ubuntu):
status: New → Confirmed
Steve Langasek (vorlon)
Changed in systemd (Ubuntu):
importance: Undecided → High
Dimitri John Ledkov (xnox) wrote :

$ time systemd-resolve -p dns not-a-real-hostname
not-a-real-hostname: resolve call failed: No appropriate name servers or networks for name found

real 0m0.003s
user 0m0.000s
sys 0m0.004s

$ time systemd-resolve -p llmnr not-a-real-hostname
not-a-real-hostname: resolve call failed: All attempts to contact name servers or networks failed

real 0m0.850s
user 0m0.000s
sys 0m0.004s

$ time systemd-resolve -p llmnr-ipv4 not-a-real-hostname
not-a-real-hostname: resolve call failed: All attempts to contact name servers or networks failed

real 0m0.820s
user 0m0.000s
sys 0m0.000s

$ time systemd-resolve -p llmnr-ipv6 not-a-real-hostname
not-a-real-hostname: resolve call failed: All attempts to contact name servers or networks failed

real 0m0.750s
user 0m0.000s
sys 0m0.000s

$ time systemd-resolve not-a-real-hostname
not-a-real-hostname: resolve call failed: All attempts to contact name servers or networks failed

real 0m0.712s
user 0m0.004s
sys 0m0.000s

The dns resolution from systemd-resolve is fast; the llmnr one is not. We currently have llmnr resolution enabled by default. ...it's a feature?!

What is the use case of resolving things that do not exist? Surely we optimise for the fact that most resolutions will succeed, from a performance point of view. The first result retrieved is returned.

Steve Langasek (vorlon) wrote :

> The dns resolution from systemd-resolve is fast; the llmnr one is not.
> We currently have llmnr resolution enabled by default. ...it's a feature?!

It is not a feature for a local DNS resolver to be doing llmnr under the hood by default. Please disable this.

> What is the use case of resolving things that do not exist?

DNS search paths.
User typoed the name and shouldn't have to wait 10 seconds for the response.
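
To illustrate the search-path point: a stub resolver turns one short name into several candidate names, and every candidate that does not exist costs one negative lookup. The sketch below is a simplified model of the `search` / `ndots` rules described in resolv.conf(5), not glibc's actual code; the function name and the example domains are made up.

```shell
# search_candidates NAME NDOTS DOMAIN...: print, in order, the names that
# would be tried for NAME given a "search" list and an "ndots" value.
search_candidates() {
    name=$1
    ndots=$2
    shift 2
    # A trailing dot marks the name as fully qualified: no search list.
    case "$name" in
        *.) echo "$name"; return 0 ;;
    esac
    # Count the dots in the name.
    dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
    # With at least ndots dots, the name is tried as-is first.
    if [ "$dots" -ge "$ndots" ]; then echo "$name"; fi
    for domain in "$@"; do
        echo "$name.$domain"
    done
    # Otherwise the bare name is tried last.
    if [ "$dots" -lt "$ndots" ]; then echo "$name"; fi
    return 0
}
```

For instance, `search_candidates db 1 corp.example lan.example` prints `db.corp.example`, `db.lan.example` and `db`, so a host that lives in the second domain is only found after one failed lookup.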

Mike Pontillo (mpontillo) wrote :

Interesting; the first thing I tried when triaging this was to edit /etc/nsswitch.conf as follows:

# hosts: files mdns4_minimal [NOTFOUND=return] dns myhostname
hosts: files dns

... to eliminate the possibility that it was multicast DNS causing the slowdown. But it appears I'm behind the times. ;-) (And didn't this only affect the .local domain?)

Does this mean there are now two subsystems responsible for link-local address resolution? (avahi and systemd-resolved?)
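
One likely reason the nsswitch.conf edit made no difference: with `hosts: files dns`, the `dns` source still sends its queries to whatever nameserver /etc/resolv.conf lists, and per the earlier comment that is the 127.0.0.53 stub. A small sketch for listing the active sources (the helper name `hosts_sources` is hypothetical):

```shell
# hosts_sources: read nsswitch.conf text on stdin and print the lookup
# sources on the active "hosts:" line, one per line, skipping bracketed
# actions such as [NOTFOUND=return]. Commented-out lines are ignored
# because their first field is "#", not "hosts:".
hosts_sources() {
    awk '$1 == "hosts:" {
        for (i = 2; i <= NF; i++)
            if ($i !~ /^\[/) print $i
    }'
}
```

Usage: `hosts_sources < /etc/nsswitch.conf`. With the edit above it prints `files` and `dns`.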

tags: removed: kernel-da-key
Mike Pontillo (mpontillo) wrote :

Workaround:

    grep -q '^LLMNR=no' /etc/systemd/resolved.conf || \
        echo 'LLMNR=no' | sudo tee -a /etc/systemd/resolved.conf
    # resolved.conf is read by systemd-resolved, so that is the service to restart
    sudo service systemd-resolved restart

Dimitri John Ledkov (xnox) wrote :

Avahi implements zeroconf: mDNS/DNS-SD, plus IPv4LL (RFC 3927) via avahi-autoipd; whilst LLMNR is RFC 4795. In many respects LLMNR is like mDNS, yet the two are distinct and, on closer inspection, resolve different things quite differently.

Changed in systemd (Ubuntu):
status: Confirmed → Triaged
summary: Regression in getaddrinfo(): calls block for much longer on Bionic
- (compared to Xenial)
+ (compared to Xenial), please disable LLMNR
tags: added: rls-bb-incoming
Scott Moser (smoser) wrote :

Hi,

I'm coming here from bug 1730744.
Is this bug expected to be fixed for 18.04?

Dimitri John Ledkov (xnox) wrote :

yes, but I'm not uploading systemd until after the build-farm is open.

Changed in systemd (Ubuntu Bionic):
assignee: nobody → Dimitri John Ledkov (xnox)
Steve Langasek (vorlon)
tags: removed: rls-bb-incoming
Changed in systemd (Ubuntu Bionic):
status: Triaged → Fix Committed
tags: added: id-5a4e5d0285ca388b893cf09d
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 237-3ubuntu3

---------------
systemd (237-3ubuntu3) bionic; urgency=medium

  * tests/control: drop qemu-system-ppc.
    Whilst some tests pass, many regress / fail to boot. This is not a regression,
    as qemu-based tests were not run previously.

 -- Dimitri John Ledkov <email address hidden> Tue, 20 Feb 2018 17:40:02 +0000

Changed in systemd (Ubuntu Bionic):
status: Fix Committed → Fix Released
Ryan Harper (raharper) wrote :

I'm seeing this on Artful as well, in Azure cloud.

Ryan Harper (raharper) wrote :

ubuntu@foufoune:~$ lsb_release -rd
Description: Ubuntu 17.10
Release: 17.10
ubuntu@foufoune:~$ apt-cache policy systemd
systemd:
  Installed: 234-2ubuntu12.3
  Candidate: 234-2ubuntu12.3
  Version table:
 *** 234-2ubuntu12.3 500
        500 http://azure.archive.ubuntu.com/ubuntu artful-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     234-2ubuntu12.1 500
        500 http://security.ubuntu.com/ubuntu artful-security/main amd64 Packages
     234-2ubuntu12 500
        500 http://azure.archive.ubuntu.com/ubuntu artful/main amd64 Packages

ubuntu@foufoune:~$ time ping __cloud_init_expected_not_found
ping: __cloud_init_expected_not_found: Temporary failure in name resolution

real 0m15.016s
user 0m0.000s
sys 0m0.003s

After applying the workaround from comment #11, I see fast lookups again.

ubuntu@foufoune:~$ cat /etc/systemd/resolved.conf
# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.
#
# Entries in this file show the compile time defaults.
# You can change settings by editing this file.
# Defaults can be restored by simply deleting this file.
#
# See resolved.conf(5) for details

[Resolve]
#DNS=
#FallbackDNS=
#Domains=
#LLMNR=yes
#MulticastDNS=yes
#DNSSEC=no
#Cache=yes
#DNSStubListener=udp
LLMNR=no

ubuntu@foufoune:~$ time ping __cloud_init_expected_not_found
ping: __cloud_init_expected_not_found: Name or service not known

real 0m0.006s
user 0m0.000s
sys 0m0.001s
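
Since the file above contains both a commented-out `#LLMNR=yes` and the appended `LLMNR=no`, it can be handy to check which value actually counts. A minimal sketch, assuming the usual systemd rule that the last uncommented assignment in a section wins; the name `llmnr_setting` is made up, and the function only looks at the one file it is given (it does not consider drop-in files or per-link settings).

```shell
# llmnr_setting: read resolved.conf text on stdin and print the value of the
# last uncommented LLMNR= assignment in the [Resolve] section, or "unset"
# if there is none (in which case the compiled-in default applies).
llmnr_setting() {
    awk '
        # Track whether the current section is [Resolve].
        /^\[/ { in_resolve = ($0 ~ /^\[Resolve\]/) }
        in_resolve && /^[ \t]*LLMNR[ \t]*=/ {
            value = $0
            sub(/^[^=]*=[ \t]*/, "", value)   # drop "LLMNR=" and padding
            sub(/[ \t]+$/, "", value)         # drop trailing whitespace
        }
        END { print (value == "" ? "unset" : value) }
    '
}
```

Usage: `llmnr_setting < /etc/systemd/resolved.conf`, which prints `no` for the file shown above.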
