systemd-resolved and dnsmasq use 100% CPU when using lxc name resolution

Bug #1721092 reported by Alex Garel
This bug affects 1 person
Affects: systemd (Ubuntu)
Status: Won't Fix
Importance: Undecided
Assigned to: Unassigned
Milestone: none listed

Bug Description

This bug may be a duplicate of https://bugs.launchpad.net/ubuntu/+source/dnsmasq/+bug/1688364 but I'm not sure, hence this new entry.

It occurred when I configured systemd-resolved to resolve the lxc domain for my internal lxc hosts.

Steps to (maybe) reproduce:

- have lxc installed
- create a user-space lxc container
- uncomment the line LXC_DOMAIN="lxc" in /etc/default/lxc-net
- restart the lxc-net service

Now the dnsmasq instance on 10.0.3.1 should act as a DNS server that resolves lxc names.

- add a file /etc/systemd/resolved.conf.d/lxc.conf containing:

  [Resolve]
  DNS=10.0.3.1
  Domains=~.lxc
  DNSSEC=false

- restart the systemd-resolved service

Now systemd-resolved knows it should send queries for .lxc names to dnsmasq.

- start an lxc container; let's assume it's called my-container
- ping it using my-container.lxc; it should work

After some time, systemd-resolved starts using 100% CPU.

After commenting out the contents of /etc/systemd/resolved.conf.d/lxc.conf and restarting systemd-resolved, it no longer consumes excessive CPU.
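
The configuration steps above can be sketched as two small shell helpers. This is an editorial sketch, not the reporter's own commands: the function names enable_lxc_domain and write_resolved_dropin are hypothetical, GNU sed is assumed, and the target paths are arguments so the helpers can be tried on scratch copies first. The drop-in body is the one quoted in the description, unchanged.

```shell
#!/bin/sh
# Sketch of the configuration steps from the bug description.
# Function names are hypothetical; paths are passed as arguments so the
# helpers can be exercised on scratch copies before touching /etc.

# Uncomment the LXC_DOMAIN="lxc" line in an lxc-net defaults file.
# Assumes GNU sed (for -i without a suffix argument).
enable_lxc_domain() {
    defaults_file=$1
    sed -i 's/^#[[:space:]]*\(LXC_DOMAIN="lxc"\)/\1/' "$defaults_file"
}

# Write the drop-in quoted in the report into the given directory.
write_resolved_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir"
    cat > "$dropin_dir/lxc.conf" <<'EOF'
[Resolve]
DNS=10.0.3.1
Domains=~.lxc
DNSSEC=false
EOF
}
```

On the affected host this would be run as root, as `enable_lxc_domain /etc/default/lxc-net` and `write_resolved_dropin /etc/systemd/resolved.conf.d`, each followed by `systemctl restart lxc-net` and `systemctl restart systemd-resolved` respectively. resolved.conf(5) describes a routing-only domain as a domain name prefixed with `~`; whether the extra leading dot in `~.lxc` is accepted by the systemd versions in this report is not established in the thread, so `Domains=~lxc` may be worth testing as a variant.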

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: systemd 232-21ubuntu5
Uname: Linux 4.10.16-041016-generic x86_64
ApportVersion: 2.20.4-0ubuntu4.5
Architecture: amd64
CurrentDesktop: GNOME
Date: Tue Oct 3 18:18:56 2017
InstallationDate: Installed on 2015-11-10 (692 days ago)
InstallationMedia: Ubuntu 15.10 "Wily Werewolf" - Release amd64 (20151021)
MachineType: Intel Corporation Skylake Platform
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.10.16-041016-generic root=UUID=6814e3c1-8cea-4ecc-964d-535fd18782e9 ro quiet splash crashkernel=384M-:128M vt.handoff=7
SourcePackage: systemd
UpgradeStatus: Upgraded to zesty on 2017-02-25 (219 days ago)
dmi.bios.date: 11/06/2015
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 5.11
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: WhiteTip Mountain1 Fab2
dmi.board.vendor: Topstar
dmi.board.version: RVP7
dmi.chassis.asset.tag: Default string
dmi.chassis.type: 9
dmi.chassis.vendor: Default string
dmi.chassis.version: Default string
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr5.11:bd11/06/2015:svnIntelCorporation:pnSkylakePlatform:pvr0.1:rvnTopstar:rnWhiteTipMountain1Fab2:rvrRVP7:cvnDefaultstring:ct9:cvrDefaultstring:
dmi.product.name: Skylake Platform
dmi.product.version: 0.1
dmi.sys.vendor: Intel Corporation

Alex Garel (alex-garel) wrote :

Hello, can I do anything to help this progress toward a solution?

Steve Langasek (vorlon) wrote :

This may be related to LP: #1694156.

Can you show /etc/resolv.conf and the output of systemd-resolve --status from the affected (host) system?

Changed in systemd (Ubuntu):
status: New → Incomplete

Alex Garel (alex-garel) wrote :

Hello,

I verified the bug is still there on my laptop.

Here are the items you asked for (see also the attached file).

-------------------------
$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 127.0.0.53

-------------------------

In case it helps, here are the installed versions:

lxc Version: 2.1.0-0ubuntu1
lxc-common Version: 2.1.0-0ubuntu1
systemd Version: 234-2ubuntu12.1
dnsmasq Version: 2.78-1

Steve Langasek (vorlon) wrote :

Ok. From what I can interpret of the --status output, it looks like your /etc/systemd/resolved.conf.d/lxc.conf is not having the intended effect, and queries that should not go to the lxc DNS server are perhaps being passed to it. This may still be related to LP: #1694156. Would you be able to provide a network trace showing what queries are being sent while systemd-resolved is spinning?

Changed in systemd (Ubuntu):
status: Incomplete → Triaged

Alex Garel (alex-garel) wrote :

Hello,

I tried to capture DNS packets using:

$ sudo tcpdump -i lxcbr0 -l -vvv dst host 10.0.3.1 and dst port 53 |tee /tmp/tcpdump-dns

I then waited until systemd-resolved was using 100% of the CPU. Unfortunately, by the time I noticed, it may already have been consuming CPU for quite a while. However, I can't see any strange exchange between systemd-resolved and dnsmasq.

I attach all the packets captured by tcpdump.

I'm not an expert in networking, but I'm comfortable with Linux and CLI commands, so feel free to tell me which experiments you would like me to run.
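
One thing worth checking about the capture above: 10.0.3.1 is an address of the host itself, and on Linux traffic from a local process to a local address normally travels over the loopback interface, so a capture bound to lxcbr0 may not see queries sent by the host's systemd-resolved. The filter also keeps only packets going to port 53, not the replies. A possible follow-up is to capture on all interfaces and count how often each name is asked for, since a loop would likely show the same question repeated at a high rate. The helper below is a sketch under those assumptions: count_dns_queries is a hypothetical name, and it relies on tcpdump's usual text rendering of DNS questions (for example `A? my-container.lxc.`).

```shell
#!/bin/sh
# Count DNS questions in tcpdump text output, most frequent first.
# Reads the named files, or stdin when no file is given.
# Assumes tcpdump prints each question as "<TYPE>? <name>", e.g. "A? host.lxc."
count_dns_queries() {
    awk '{
        # Look for a query-type token such as "A?" or "AAAA?" and take
        # the following field as the queried name.
        for (i = 1; i < NF; i++)
            if ($i ~ /^[A-Z]+\?$/) names[$i " " $(i + 1)]++
    }
    END { for (n in names) print names[n], n }' "$@" | sort -rn
}
```

A capture to feed it could be taken with `sudo tcpdump -i any -l -n port 53 | tee /tmp/tcpdump-dns-any` and then summarized with `count_dns_queries /tmp/tcpdump-dns-any | head`.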

steve cohen (steve-si9yrl01qsu4bt4tonx56g) wrote :

Hello. On bionic, I also tried to get systemd-resolved and dnsmasq working together.

/etc/default/lxc-net sets up dnsmasq, ultimately starting the dnsmasq process shown below.
This configuration did assign static and dynamic IPs to the containers; however, I could not reach the dynamically addressed containers by name. So I added the settings shown below to /etc/systemd/resolved.conf. That worked, but with the side effect of high CPU utilization.

The CPU usage seemed to jump after I accessed the internet while not accessing the containers on 10.0.3.x, as if systemd-resolved was sending queries to dnsmasq in a loop. Please look at the global section: it has 10.0.3.1 as the DNS server, and that doesn't look right. It is placed there by /etc/systemd/resolved.conf.

ps ax:
dnsmasq --conf-file=/etc/lxc/dnsmasq.conf -s lxc -S /lxc/ -u lxc-dnsmasq --strict-order --bind-interfaces --pid-file=/run/lxc/dnsmasq.pid --listen-address 10.0.3.1 --dhcp-range 10.0.3.128,10.0.3.254 --dhcp-lease-max=253 --dhcp-no-override --except-interface=lo --interface=lxcbr0 --dhcp-leasefile=/var/lib/misc/dnsmasq.lxcbr0.leases --dhcp-authoritative

with /etc/systemd/resolved.conf:
[Resolve]
#DNS=
DNS=10.0.3.1
#FallbackDNS=
#Domains=
Domains=lxc
#LLMNR=no
#MulticastDNS=no
#DNSSEC=no
#Cache=yes

I was also watching tcpdump on lxcbr0, which showed no activity.
After a while, systemd-resolved climbed toward 100% CPU, with dnsmasq at 50%.

netstat -nlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address    Foreign Address  State   PID/Program name
tcp        0      0 0.0.0.0:22       0.0.0.0:*        LISTEN  1643/sshd
tcp        0      0 127.0.0.1:631    0.0.0.0:*        LISTEN  1263/cupsd
tcp        0      0 10.0.3.1:53      0.0.0.0:*        LISTEN  2518/dnsmasq
tcp6       0      0 :::22            :::*             LISTEN  1643/sshd
tcp6       0      0 ::1:631          :::*             LISTEN  1263/cupsd
udp    14592      0 0.0.0.0:5353     0.0.0.0:*                1399/avahi-daemon:
udp        0      0 0.0.0.0:22168    0.0.0.0:*                -
udp        0      0 0.0.0.0:6670     0.0.0.0:*                -
udp        0      0 0.0.0.0:56840    0.0.0.0:*                -
udp        0      0 0.0.0.0:62910    0.0.0.0:*                -
udp        0      0 0.0.0.0:48051    0.0.0.0:*                1399/avahi-daemon:
udp    13824      0 10.0.3.1:53      0.0.0.0:*                2518/dnsmasq
udp     9216      0 127.0.0.53:53    0.0.0.0:*                1007/systemd-resolv
udp    10240      0 0.0.0.0:67       0.0.0.0:*                2518/dnsmasq
udp        0      0 0.0.0.0:68       0.0.0.0:*                6025/dhclient
udp        0      0 0.0.0.0:631      0.0.0.0:*                1481/cups-browsed
udp6    4608      0 :::5353          :::*                     1399/avahi-daemon:
udp6       0      0 :::59159         :::*                     1399/avahi-daemon:
raw6       0      0 :::58            :::*             ...
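
In the listing above, several UDP sockets show a non-zero Recv-Q, including dnsmasq on 10.0.3.1:53 (13824) and systemd-resolved on 127.0.0.53:53 (9216). For UDP sockets this column is the number of bytes received but not yet read by the program, which may indicate that the daemons are not keeping up, although the listing alone does not show why. A small filter like the following could pick out such rows from future `netstat -nlp` output; backlogged_sockets is a hypothetical name and this is only a sketch.

```shell
#!/bin/sh
# Print protocol, local address and Recv-Q for every socket in
# `netstat -nlp` output whose Recv-Q is non-zero.
# Reads the named files, or stdin when no file is given.
backlogged_sockets() {
    awk '$1 ~ /^(tcp|udp|raw)/ && $2 ~ /^[0-9]+$/ && $2 > 0 {
        print $1, $4, $2
    }' "$@"
}
```

For example, `sudo netstat -nlp | backlogged_sockets` prints one line per socket with queued data, such as the dnsmasq and systemd-resolved UDP sockets in the listing above.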

steve cohen (steve-si9yrl01qsu4bt4tonx56g) wrote :

I added --dns-loop-detect to the dnsmasq options (dnsmasq.conf / command line).
This stopped the excessive CPU usage, but it also left only the container names resolvable; names in the outside world no longer resolved.
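
One possible reading of this result, not confirmed anywhere in the thread: by default dnsmasq takes its upstream servers from /etc/resolv.conf, which in the earlier paste lists only the systemd-resolved stub at 127.0.0.53, while the resolved.conf above makes 10.0.3.1 a global DNS server for systemd-resolved. A non-lxc name could then bounce between the two daemons, and with --dns-loop-detect dnsmasq would stop using the looping upstream, leaving only container names resolvable. The check below is a sketch of how to spot that combination; possible_dns_loop is a hypothetical name and the three arguments are assumptions about where each setting lives.

```shell
#!/bin/sh
# Succeed (and print a note) when the two files describe a possible
# forwarding loop: dnsmasq's upstream is the systemd-resolved stub, while
# systemd-resolved's global DNS= setting points back at dnsmasq.
possible_dns_loop() {
    resolv_conf=$1      # file dnsmasq reads its upstream servers from
    resolved_conf=$2    # systemd-resolved configuration file
    dnsmasq_addr=$3     # address dnsmasq listens on, e.g. 10.0.3.1

    # Is the stub resolver listed as a nameserver?
    grep -Eq '^nameserver[[:space:]]+127\.0\.0\.53[[:space:]]*$' "$resolv_conf" || return 1

    # Does an uncommented DNS= line contain the dnsmasq address?
    awk -F= -v addr="$dnsmasq_addr" '
        $1 == "DNS" {
            n = split($2, servers, " ")
            for (i = 1; i <= n; i++) if (servers[i] == addr) found = 1
        }
        END { exit !found }' "$resolved_conf" || return 1

    echo "possible loop: systemd-resolved <-> dnsmasq at $dnsmasq_addr"
}
```

On the system described here it would be called as `possible_dns_loop /etc/resolv.conf /etc/systemd/resolved.conf 10.0.3.1`. Newer systemd releases also allow setting the server and routing domain on the bridge only, with `resolvectl dns lxcbr0 10.0.3.1` and `resolvectl domain lxcbr0 '~lxc'`, which avoids a global DNS= entry; that was not tried in this thread.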

Dan Streetman (ddstreet) wrote :

Please reopen if this is still an issue.

Changed in systemd (Ubuntu):
status: Triaged → Won't Fix