systemd-resolved using 100% CPU

Bug #1670959 reported by Tamas Papp on 2017-03-08
This bug affects 55 people
Affects            Status      Importance  Assigned to  Milestone
dnsmasq (Ubuntu)   Incomplete  Undecided   Unassigned
systemd (Ubuntu)   Confirmed   Medium      Unassigned

Bug Description

[Triage Notes]
"Incomplete" in dnsmasq: may be a valid bug, but developers cannot make progress until someone can provide steps to reproduce. See comment 55.

[Original Description]
Sometimes the systemd-resolved process uses 100% CPU.
After a while it goes back to normal.

It usually happens after connecting to the (wifi) network, for example right after starting the OS.

strace output:

sendmsg(12, {msg_name(16)={sa_family=AF_INET, sin_port=htons(33589), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"6\215\201\200\0\1\0\1\0\0\0\1\4cs41\3wac\vedgecastcdn\3net\0\0\34\0\1\300\f\0\34\0\1\0\0\10\235\0\20&\6(\0\0024\0Y%L\4\6#f&\214\0\0)\377\326\0\0\0\0\0\0", 81}], msg_controllen=28, [{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, {ipi_ifindex=if_nametoindex("lo"), ipi_spec_dst=inet_addr("127.0.0.53"), ipi_addr=inet_addr("127.0.0.53")}}], msg_flags=0}, 0) = 81
sendmsg(3, {msg_name(0)=NULL, msg_iov(4)=[{"PRIORITY=6\nSYSLOG_FACILITY=3\nCODE_FILE=../src/resolve/resolved-dns-stub.c\nCODE_LINE=363\nCODE_FUNCTION=dns_stub_process_query\nSYSLOG_IDENTIFIER=systemd-resolved\n", 160}, {"MESSAGE=", 8}, {"Processing query...", 19}, {"\n", 1}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 188
epoll_wait(4, [{EPOLLIN, {u32=3176459184, u64=94565471415216}}], 16, -1) = 1
clock_gettime(CLOCK_BOOTTIME, {44665, 938069872}) = 0
recvfrom(12, NULL, 0, MSG_PEEK|MSG_TRUNC, NULL, NULL) = 53
recvmsg(12, {msg_name(16)={sa_family=AF_INET, sin_port=htons(33589), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"Z\262\1\20\0\1\0\0\0\0\0\1\4cs41\3wac\vedgecastcdn\3net\0\0\34\0\1\0\0)\2\0\0\0\0\0\0\0", 3936}], msg_controllen=56, [{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, {ipi_ifindex=if_nametoindex("lo"), ipi_spec_dst=inet_addr("127.0.0.53"), ipi_addr=inet_addr("127.0.0.53")}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, {ttl=64}}], msg_flags=0}, 0) = 53
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=303, ...}) = 0
getrandom("\365I", 2, GRND_NONBLOCK) = 2
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=303, ...}) = 0
getrandom("\203;", 2, GRND_NONBLOCK) = 2
clock_gettime(CLOCK_BOOTTIME, {44665, 938446937}) = 0
open("/run/systemd/netif/links/3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=303, ...}) = 0
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=303, ...}) = 0
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 18
connect(18, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 18, {EPOLLIN, {u32=3176610576, u64=94565471566608}}) = 0
write(18, "\203;\1\20\0\1\0\0\0\0\0\1\4cs41\3wac\vedgecastcdn\3net\0\0\34\0\1\0\0)\2\0\0\0\0\0\0\0", 53) = 53
clock_gettime(CLOCK_BOOTTIME, {44665, 938833717}) = 0
clock_gettime(CLOCK_BOOTTIME, {44665, 938875138}) = 0
epoll_ctl(4, EPOLL_CTL_DEL, 18, NULL) = 0
close(18) = 0

journalctl output:

Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:41 parsec dnsmasq[1545]: Maximum number of concurrent DNS queries reached (max: 150)

As you can see, I use it together with dnsmasq.

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: systemd 232-18ubuntu1
ProcVersionSignature: Ubuntu 4.10.0-9.11-generic 4.10.0
Uname: Linux 4.10.0-9-generic x86_64
NonfreeKernelModules: zfs zunicode zavl zcommon znvpair
ApportVersion: 2.20.4-0ubuntu2
Architecture: amd64
Date: Wed Mar 8 08:20:18 2017
MachineType: Hewlett-Packard HP EliteBook Folio 1020 G1
ProcEnviron:
 LANGUAGE=en_US:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/zsh
ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-4.10.0-9-generic root=UUID=a54fe703-35d4-47ac-9c6e-4034421531fb ro rootflags=subvol=@
SourcePackage: systemd
UpgradeStatus: Upgraded to zesty on 2015-05-24 (653 days ago)
dmi.bios.date: 03/09/2015
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: M77 Ver. 01.05
dmi.board.name: 2271
dmi.board.vendor: Hewlett-Packard
dmi.board.version: KBC Version 91.4C
dmi.chassis.asset.tag: CNU51199KV
dmi.chassis.type: 10
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvrM77Ver.01.05:bd03/09/2015:svnHewlett-Packard:pnHPEliteBookFolio1020G1:pvrA3009DD18303:rvnHewlett-Packard:rn2271:rvrKBCVersion91.4C:cvnHewlett-Packard:ct10:cvr:
dmi.product.name: HP EliteBook Folio 1020 G1
dmi.product.version: A3009DD18303
dmi.sys.vendor: Hewlett-Packard

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Changed in systemd (Ubuntu):
importance: Undecided → Medium
Luis Michael Ibarra (clvx) wrote :

It also happens when you are connected by wire. XPS13 9350 affected after upgrading from 16.10 (yakkety) to 17.04 (zesty).

debb1046 (debb1046) wrote :

Also had 100% CPU after upgrade, together with failing DNS resolution and dnsmasq complaining about having reached the max number of simultaneous requests.
Removing dnsmasq has fixed it for me.

Encolpe Degoute (encolpe) wrote :

Removing dnsmasq is not an option when you are using lxd/lxc.
You may reassign this bug to dnsmasq.

A temporary solution can be to add your nameserver to /etc/resolvconf/resolv.conf.d/head

Jeremy Bicha (jbicha) wrote :

lxc and lxd do not depend on dnsmasq in Ubuntu 17.04.

Danny (lesarde) wrote :

1. top
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19147 systemd+ 20 0 50848 6232 4572 R 99.0 0.1 5:43.81 systemd-resolve
 1472 dnsmasq 20 0 57020 3084 2620 R 63.1 0.0 44:03.49 dnsmasq

2. journalctl -u systemd-resolved.service
Apr 16 13:01:59 Danny systemd-resolved[19147]: DNSSEC validation failed for question clients-china.l.google.com IN A: failed-auxiliary
Apr 16 13:02:02 Danny systemd-resolved[19147]: Using degraded feature set (UDP+EDNS0) for DNS server 127.0.0.1.
Apr 16 13:02:08 Danny systemd-resolved[19147]: Server 127.0.0.1 does not support DNSSEC, downgrading to non-DNSSEC mode.
Apr 16 13:06:24 Danny systemd-resolved[19147]: Server 192.168.128.1 does not support DNSSEC, downgrading to non-DNSSEC mode.
Apr 16 13:11:24 Danny systemd-resolved[19147]: Grace period over, resuming full feature set (UDP+EDNS0+DO+LARGE) for DNS server 127.0.0.1.

3. journalctl -u dnsmasq.service
Apr 16 13:18:27 Danny dnsmasq[1472]: Maximum number of concurrent DNS queries reached (max: 150)
Apr 16 13:18:33 Danny dnsmasq[1472]: Maximum number of concurrent DNS queries reached (max: 150)
Apr 16 13:18:39 Danny dnsmasq[1472]: Maximum number of concurrent DNS queries reached (max: 150)
Apr 16 13:18:45 Danny dnsmasq[1472]: Maximum number of concurrent DNS queries reached (max: 150)
Apr 16 13:18:51 Danny dnsmasq[1472]: Maximum number of concurrent DNS queries reached (max: 150)

4. lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 17.04
Release: 17.04
Codename: zesty

5. sudo strace -p 1472 --- dnsmasq
bind(12, {sa_family=AF_INET, sin_port=htons(37051), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
sendto(12, "g\325\1\20\0\1\0\0\0\0\0\1\rclients-china\1l\6goo"..., 55, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, 16) = 55
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}, {fd=12, events=POLLIN}], 8, -1) = 2 ([{fd=4, revents=POLLIN}, {fd=12, revents=POLLIN}])
recvfrom(12, "g\325\201\200\0\1\0\6\0\0\0\1\rclients-china\1l\6goo"..., 5131, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, [16]) = 151
sendmsg(4, {msg_name(16)={sa_family=AF_INET, sin_port=htons(33355), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"\272\303\201\220\0\1\0\6\0\0\0\1\rclients-china\1l\6goo"..., 151}], msg_controllen=0, msg_flags=0}, 0) = 151
close(12) = 0
recvmsg(4, {msg_name(16)={sa_family=AF_INET, sin_port=htons(57326), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"C\21\1\20\0\1\0\0\0\0\0\1\rclients-china\1l\6goo"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 55
socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 12
fcntl(12, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(12, F_SETFL, O_RDWR|O_NONBLOCK) = 0

6. sudo strace -p 1914
clock_gettime(CLOCK_BOOTTIME, {5858, 172230641}) = 0
open("/run/systemd/netif/links/2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=303, ...}) = 0
stat("/e...


Danny (lesarde) on 2017-04-16
description: updated
Kai-Heng Feng (kaihengfeng) wrote :

Does putting 'DNSSEC=off' in /etc/systemd/resolved.conf help?

Luis Michael Ibarra (clvx) wrote :

Kai-Heng, at least for me, it's working now. Btw, before setting DNSSEC=off, I tested removing dnsmasq to keep only systemd-resolved (as someone mentioned above), and systemd-resolved didn't resolve any domain. I had to install dnsmasq again to have some sort of connection.

Is this bug related to this one[1]?

[1] https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1624320

Tamas Papp (tomposmiko) wrote :

I have not been able to reproduce the issue recently, for about a week.

Let me check it in another location tomorrow.

ii dnsmasq 2.76-5 all Small caching DNS proxy and DHCP/TFTP server
ii systemd 232-21ubuntu2 amd64 system and service manager
ii systemd-sysv 232-21ubuntu2 amd64 system and service manager - SysV links

Colan Schwartz (colan) wrote :

@clvx: I think the other bug is unrelated, although maybe that's what you're running into instead of this one.

I was also able to fix this problem by removing dnsmasq:

* sudo apt remove dnsmasq

As soon as I did that, both processes dropped to nothing, and DNS still resolves.

This is a good workaround, but perhaps the actual fix should be to remove the package on upgrade to 17.04?

Tamas Papp (tomposmiko) wrote :

Weird, I can reproduce the issue again after about 1-2 weeks of silence...
I can test the DNSSEC=off setting.

Tamas Papp (tomposmiko) wrote :

> This is a good workaround, but perhaps the actual fix should be to remove the package on upgrade to 17.04?

What do you mean by this?
The fix should never be something like this.
There is a reason why dnsmasq is installed on these machines.

This is a workaround, but not a good one.

Tamas Papp (tomposmiko) wrote :

Recently I can definitely reproduce the issue anytime.

It's extremely annoying.

Tamas Papp (tomposmiko) wrote :

Actually it's happening continuously, it does not even require a wakeup, a wifi reconnection, or anything...

It's a completely useless piece of crap.

Sergio Callegari (callegar) wrote :

I see it all the time after an upgrade to 17.04. Medium priority seems a euphemism. This is a true showstopper. It is not just systemd-resolved taking a whole core. Dnsmasq takes half of another. The CPU runs hot. On a laptop, batteries drain in a snap.

Does this have to do with https://unix.stackexchange.com/questions/304050/how-to-avoid-conflicts-between-dnsmasq-and-systemd-resolved? Should DNSStubListener=no be added to /etc/systemd/resolved.conf? Should dnsmasq not be there?

Note that this bug is now 2 months old, that is over 20% of the expected life of zesty.

tlk (sarcasticskull) wrote :

On a fresh 17.04 install systemd-resolved didn't give me any grief until I manually installed dnsmasq, because I wanted NetworkManager to share my connection from desktop to a laptop and without dnsmasq it fails to do so.
Now I'm hitting this CPU hog problem.

tlk (sarcasticskull) wrote :

On another note, it seems that dnsmasq was dropped intentionally in regards to NM (https://launchpad.net/ubuntu/zesty/+source/network-manager/+changelog), in favor of systemd-resolved. But when I initially tried to NAT my desktop's connection to a laptop it didn't work out, so I looked into journalctl and found out that NM explicitly wants the 'dnsmasq' executable... Another bug probably?

Adam Dingle (adam-yorba) wrote :

I've also run into this CPU problem on 17.04. Like tlk, I had to install dnsmasq because I'm running a hotspot on a USB wifi device, and the system log indicated that it wouldn't work unless dnsmasq is installed.

Adam Dingle (adam-yorba) wrote :

I just tried the workaround Kai-Heng mentioned in #9 above: I added 'DNSSEC=off' to /etc/systemd/resolved.conf, then rebooted. Unfortunately I'm still seeing this CPU problem.

Adam Dingle (adam-yorba) wrote :

It seems that the problem is that dnsmasq is forwarding DNS queries to the systemd-resolved stub server, but they are incompatible in some way. From /var/log/syslog:

Jun 12 17:05:34 sparkly dnsmasq[1177]: using nameserver 127.0.0.53#53
Jun 12 17:05:49 sparkly dnsmasq[1177]: Maximum number of concurrent DNS queries reached (max: 150)

As a workaround, I edited /etc/dnsmasq.conf and added these lines to tell it to ignore resolv.conf and use my ISP's DNS servers, hardcoded, instead:

no-resolv
server=75.75.75.75
server=75.75.76.76

It seems to work: I'm running a hotspot on a USB wifi device and it's working fine, and there's no sign of the CPU trouble.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in dnsmasq (Ubuntu):
status: New → Confirmed
MF (mmuruev) wrote :

Have the same issue with 17.04, especially when qBittorrent is starting. There is also a huge lag when apt update resolves servers.

tags: added: resolved

Hmm I knew I remembered something related ...
In bug 1694156 we have had a - what appears to me as - similar case.
There we could track it down to a config issue, but maybe you here are affected by the same issue yet without a simple config issue to fix.

The clue there was that dnsmasq was configured as a global DNS server for resolved in a manually added /etc/resolvconf/resolv.conf.d/tail. Because of that, failed lookups in the local search domain started to be passed back and forth between dnsmasq and systemd-resolved.
Does anybody affected here have the same setup of dnsmasq being registered as global DNS? You can check with:
 $ systemd-resolve --status
The local dnsmasq should not be listed in there.

Not sure - but if all affected users would have such a "local dnsmasq is configured as global dns in systemd-resolved" then we will have to find which package/config/guide/upgrade-path sets it up that way and resolve that - but first of all please verify that this would help you.

Also, as another option, if you explicitly want your local dnsmasq to be configured in resolved as global DNS server, you might want to check whether dnsmasq's option "--dns-loop-detect", set via DNSMASQ_OPTS in /etc/default/dnsmasq, helps.
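
A minimal sketch of the "is dnsmasq registered as global DNS" check, assuming the output of systemd-resolve --status has the layout pasted elsewhere in this report (a "Global" block followed by "Link N" blocks). The function name global_dns_servers is made up for illustration and is not part of any package; it only filters text on stdin.

```shell
# Hypothetical helper: print the DNS servers listed in the "Global"
# section of `systemd-resolve --status` output read from stdin.
global_dns_servers() {
    awk '
        /^Global/      { in_global = 1; next }
        /^Link [0-9]+/ { in_global = 0; in_list = 0 }
        # First server is on the same line as the "DNS Servers:" label.
        in_global && /DNS Servers: / { print $NF; in_list = 1; next }
        # Any other "Label: value" line ends the server list.
        # IPv6 addresses contain ":" but never ": ", so they pass through.
        in_global && in_list && /: / { in_list = 0 }
        # Continuation lines hold one additional server each.
        in_global && in_list && NF == 1 { print $1 }
    '
}
```

Usage would be something like: systemd-resolve --status | global_dns_servers. If 127.0.0.1 shows up in the result, the local dnsmasq is registered as a global DNS server.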

"if dnsmasq's option "--dns-loop-detect" via DNSMASQ_OPTS in /etc/default/dnsmasq."
this does not help.

Michael Roth (mdroth) wrote :

@ChristianEhrhardt: I'm not sure how to determine whether systemd-resolved is forwarding back to dnsmasq, hopefully it's clear from the output:

mdroth@sif:~$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 127.0.0.1

mdroth@sif:~$ ps aux | grep dnsmasq
dnsmasq 1122 1.6 0.0 55276 2864 ? S Sep18 10:11 /usr/sbin/dnsmasq -x /run/dnsmasq/dnsmasq.pid -u dnsmasq -r /run/dnsmasq/resolv.conf -7 /etc/dnsmasq.d,.dpkg-dist,.dpkg-old,.dpkg-new --local-service --trust-anchor=.,19036,8,2,49AAC11D7B6F6446702E54A1607371607A1A41855200FD2CE1CDDE32F24E8FB5
libvirt+ 1794 0.0 0.0 52380 396 ? S Sep18 0:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt_leaseshelper
root 1795 0.0 0.0 52352 396 ? S Sep18 0:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt_leaseshelper
mdroth 7792 0.0 0.0 14244 1028 pts/6 S+ 06:43 0:00 grep --color=auto dnsmasq

mdroth@sif:~$ cat /run/dnsmasq/resolv.conf
nameserver 127.0.0.53

mdroth@sif:~$ systemd-resolve --status
Global
         DNS Servers: 127.0.0.1
          DNSSEC NTA: 10.in-addr.arpa
                      16.172.in-addr.arpa
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa
                      18.172.in-addr.arpa
                      19.172.in-addr.arpa
                      20.172.in-addr.arpa
                      21.172.in-addr.arpa
                      22.172.in-addr.arpa
                      23.172.in-addr.arpa
                      24.172.in-addr.arpa
                      25.172.in-addr.arpa
                      26.172.in-addr.arpa
                      27.172.in-addr.arpa
                      28.172.in-addr.arpa
                      29.172.in-addr.arpa
                      30.172.in-addr.arpa
                      31.172.in-addr.arpa
                      corp
                      d.f.ip6.arpa
                      home
                      internal
                      intranet
                      lan
                      local
                      private
                      test

Link 5 (virbr0-nic)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 4 (virbr0)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 3 (wlp4s0)
      Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 192.168.2.254
                      fd6d:6e53:4042::1
          DNS Domain: lan

Link 2 (enp0s31f6)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Hi Michael,
I'm certainly the least expert in this case - I was just trying to help mapping that other case I have seen.

Your output tells (me oO) that your resolved:
1. talks to a link scope DNS 192.168.2.254 on wlp4s0
2. is set up to talk to a global DNS at 127.0.0.1

IIRC it will ask both and whoever answers first is the reply it gives (can be bad at times if the nack of server A is faster than the detail from server B).

On 127.0.0.53 it is resolved itself that is listening.

The libvirt dnsmasq usually has "except-interface=lo" so it should not listen on 127.0.0.1.
The case I mentioned in c#25 was a manual config change that made resolved call the dnsmasq causing the loop.
I agree that you can't easily derive if you are "looped" now from the output you have.

Maybe you could check in your case if/who does listen on 127.0.0.1:53?
$ sudo netstat -ltaupn | sed -rne 2p -e '/:53\b/p'

Does libvirt know the pid of the dnsmasq it spawned? Maybe this can be used to make this check more fine grained.
Like "is any of our pids binding :53 on something in systemd-resolve --status" maybe.
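
A rough sketch of a narrower filter for the listener check, assuming the usual netstat -ltaupn column layout in which the fourth field is the local address and the last field is PID/Program name. port53_listeners is a hypothetical name; it reads netstat output on stdin and keeps only sockets whose local port is exactly 53.

```shell
# Hypothetical helper: from `netstat -ltaupn` output on stdin, print the
# local address and the PID/program of every socket bound to port 53.
port53_listeners() {
    # $4 is the local address; ":53$" does not match ports such as 5353.
    awk '$4 ~ /:53$/ { print $4, $NF }'
}
```

Usage would be something like: sudo netstat -ltaupn | port53_listeners. A line starting with 127.0.0.1:53 names the process that answers on the address where dnsmasq is expected. Program names that contain spaces would be cut off, since only the last field is printed.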

Tamas Papp (tomposmiko) wrote :

hi All,

Is it possible to eliminate systemd-resolved from the system?

Dimitri John Ledkov (xnox) wrote :

@ Michael Roth (mdroth)

On zesty and later, /etc/resolv.conf must not list "nameserver 127.0.0.1"; this seems to be a broken configuration state.
I am concerned about how you got into such a state. Is your network managed by ifupdown with /etc/network/interfaces or by NetworkManager? Do you have resolvconf installed? What is the output of $ NetworkManager --print-config | grep dns ?

@tomposmiko
$ systemctl disable systemd-resolved
$ systemctl stop systemd-resolved
$ systemctl mask systemd-resolved
$ apt remove --purge libnss-resolve

Should do it. Then you should still have resolvconf & ifupdown installed, or resolvconf+dnsmasq+NetworkManager installed. Or without resolvconf manage /etc/resolv.conf by hand.
Note such system configurations may not be fully supported, but should be operational for simplistic networking setups.

Nikhil Verma (nikhilweee) wrote :

Is there an official solution yet? BTW https://askubuntu.com/a/968309/ seemed to help me out.

James Cuzella (trinitronx) wrote :

Also seeing this issue in Ubuntu 17.10 (artful) with:

dnsmasq-base 2.78-1
systemd 234-2ubuntu12.1

I can confirm that following https://askubuntu.com/a/968309/ resolves the issue!

Steps to fix the issue:

Added this line to /etc/default/dnsmasq:

    DNSMASQ_EXCEPT=lo

Then restarting dnsmasq:

    sudo systemctl daemon-reload
    sudo systemctl restart dnsmasq

No more systemd-resolved taking 99-100% of a CPU core!

Interesting James,
so there is a little twist that might be related and lead to a resolution.

Usually the service of dnsmasq would add itself to the resolvconf server set.
It does so in:
start_resolvconf()
{
# If interface "lo" is explicitly disabled in /etc/default/dnsmasq
# Then dnsmasq won't be providing local DNS, so don't add it to
# the resolvconf server set.
        for interface in $DNSMASQ_EXCEPT
        do
                [ $interface = lo ] && return
        done

        # Also skip this if DNS functionality is disabled in /etc/dnsmasq.conf
        if grep -qs '^port=0' /etc/dnsmasq.conf; then
                return
        fi

        if [ -x /sbin/resolvconf ] ; then
                echo "nameserver 127.0.0.1" | /sbin/resolvconf -a lo.$NAME
        fi
        return 0
}

So I wonder if that "not adding to resolvconf" is the actual way this is fixed for you per https://askubuntu.com/a/968309/

So while the config says "do not serve lo", what it really does to fix your case is not adding it. That in turn one could test.

So once you are in the bad case could you try if
$ resolvconf -a lo.dnsmasq
would resolve the issue as well?

I currently fail to grasp all the potential regressions that change could have, but I'd think that knowing if
a) not serving on 127.0.0.1
or
b) not registering in resolvconf
is the actual trigger might help someone else to go further.

James Cuzella (trinitronx) wrote :

Christian: I've tried as I believe you have asked.

When in the bad case (I comment out the line: DNSMASQ_EXCEPT=lo and then reboot):

 - Running this did not resolve the issue: `sudo resolvconf -a lo.dnsmasq`
 - Running this did not resolve the issue: `echo "nameserver 127.0.0.1" | /sbin/resolvconf -a lo.dnsmasq`

So far only the DNSMASQ_EXCEPT=lo line in /etc/default/dnsmasq appears to resolve the issue.

James Cuzella (trinitronx) wrote :

Maybe interesting things to note:

1. When in bad state:
  - The mentioned ENV var change in /etc/default/dnsmasq and `sudo systemctl restart dnsmasq` immediately resolves the issue
2. When in good state:
  - Commenting out the ENV var and running `sudo systemctl restart dnsmasq` does not immediately re-cause the issue
  - After a reboot, the bad state can be observed again (systemd-resolved using 100% of a CPU core)

Maybe there is an order of operations thing here?

Dave Chiluk (chiluk) on 2017-12-12
tags: added: indeed
Peter (peter-schmitteckert) wrote :

Adding

 DNSMASQ_EXCEPT=lo

to
 /etc/default/dnsmasq

solves the problem in my case, kubuntu 17.10, kernel 4.15.rc7,
for several days now.

Peter

Roland (roland-breedveld) wrote :

same issue with 17.10, takes a complete core,
DNSMASQ_EXCEPT=lo is not working, also after reboot, it still takes a core

Ketil Malde (ketil-ii) wrote :

Just to chime in: Noticed slow machine and systemd-resolved taking 100% cpu. Edited /etc/systemd/resolved.conf to DNSSEC=no (not off, as "no" was commented out in the file). Tried systemctl restart systemd-resolved, but nothing changed. Tried killall systemd-resolved, nothing. Tried killall -9, and it respawned a well-behaved process. I recently installed dnsmasq when my computer failed to resolve anything. This is a fairly up-to-date Ubuntu 17.10.

Amir H Piri (amirh.piri) wrote :

I am also trying to deal with this problem, but have found no solution at all. I tried to disable the systemd-resolved service, but after that domain resolution stopped working and I lost access to webpages. I also have dnsmasq. I don't know why, but I think this service comes from Ubuntu 16.04.
I upgraded my Ubuntu from 16.04 to Ubuntu 17.10 and then upgraded it to 18.04. Maybe this is what caused the problem.

Pieter (diepes) wrote :

Ubuntu 18.04.

Edited with: $ sudo vim /etc/resolv.conf

changed

nameserver 127.0.0.1

to

nameserver 127.0.0.53

CPU calmed down.

Andreas Hasenack (ahasenack) wrote :

@diepes, interesting. Looks like there was a loop going on there.

Can everybody please check their /etc/resolv.conf to see if something similar might be happening with it?
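
To make that check concrete, here is a small sketch, assuming dnsmasq reads its upstream servers from /run/dnsmasq/resolv.conf (the -r path visible in the ps output earlier in this report; other setups may point dnsmasq at a different file). The helper names are made up for illustration.

```shell
# Print the addresses on "nameserver" lines of a resolv.conf-style file.
nameservers_in() {
    awk '$1 == "nameserver" { print $2 }' "$1"
}

# Hypothetical check: succeeds (exit 0) when the system resolv.conf points
# at 127.0.0.1 (dnsmasq) while dnsmasq forwards to 127.0.0.53 (the
# systemd-resolved stub), i.e. the two resolvers may be feeding each other.
looks_like_dns_loop() {
    system_conf=${1:-/etc/resolv.conf}
    dnsmasq_conf=${2:-/run/dnsmasq/resolv.conf}   # assumed default path
    nameservers_in "$system_conf" | grep -qx '127\.0\.0\.1' &&
        nameservers_in "$dnsmasq_conf" | grep -qx '127\.0\.0\.53'
}
```

Usage would be something like: looks_like_dns_loop && echo "possible dnsmasq <-> systemd-resolved loop". It only compares the two files; it does not prove that queries are actually bouncing between the two daemons.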

Ricky Brent (rickybrent) wrote :

This happened on my machine, and diepes's fix worked for me as well.

Justin Bennett (justinjb) wrote :

I just started noticing this issue on my machine today. Ubuntu 18.04. I also had nameserver 127.0.0.1 in /etc/resolv.conf. Changed to 127.0.0.53, per diepes' comment and CPU calmed down for me as well.

amir (amirse) wrote :

This solution works, but it's not persistent. /etc/resolv.conf is a symlink to /run/resolvconf/resolv.conf, which is ephemeral. It'll be recreated as 127.0.0.1.
Any news on an actual fix for this, rather than just a workaround?
Thanks!

Andreas Hasenack (ahasenack) wrote :

What do you guys have inside /etc/resolvconf? In terms of directories, files, and their contents.

Do you have the resolvconf package installed by any chance?

Andreas Hasenack (ahasenack) wrote :

And looks like we need to find out what is adding 127.0.0.1 to resolv.conf. It could come from many different sources, like a nameserver configuration in some /etc/network/interfaces carried over from xenial, or something in network-manager, etc. I guess you could start with a brute-force grep -r on /etc for 127.0.0.1?

It seems to be dnsmasq itself:

/etc/init.d/dnsmasq:163: echo "nameserver 127.0.0.1" | /sbin/resolvconf -a lo.$NAME

Andreas Hasenack (ahasenack) wrote :

I don't know what is the use case you guys have that you have to have dnsmasq and resolvconf installed. Maybe there is none and this is just an artifact from a release upgrade.

In Bionic, I only have dnsmasq-base (not dnsmasq) installed, and no resolvconf.

Looking at the initscript from dnsmasq, there seem to be a few options to disable it for local dns resolving, like the one suggested in https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1670959/comments/32, which I reproduce below:
Added this line to /etc/default/dnsmasq:

    DNSMASQ_EXCEPT=lo

Then restarting dnsmasq:

    sudo systemctl daemon-reload
    sudo systemctl restart dnsmasq

That will prevent the code quoted in https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1670959/comments/47 from running.

I also think it's an upgrade artifact. It happened on my upgraded laptop but not in the freshly installed desktop machine.
The problem on the laptop now is (I think) hostapd, which requires dnsmasq.
The workaround cited by @ahasenack in #49 seems to work.

Håkon Enger (hakon-enger) wrote :

I also had this issue after upgrading to 18.04. During the upgrade, I got asked if I wanted to keep or replace my /etc/dnsmasq.conf, I opted to replace with the package maintainer's version. The old config file had the option interface=enp0s25, the new one does not. I don't know/remember where this option came from. Could this be related?

tuxinvader (tuxinvader) wrote :

I have this on Ubuntu 18.04. In my case I use dnsmasq with a custom configuration providing DHCP and DNS to multiple bridges (Virtual Machines and containers). So I need them both (in so far as I need dnsmasq and am being forced to use systemd).

The problem is a DNS loop between systemd and dnsmasq. systemd-resolved forwards DNS queries in parallel to entries in /etc/resolv.conf and servers picked up from interfaces via DHCP. If you, like me have dnsmasq in resolv.conf, and a dnsmasq configuration that forwards to systemd, then you have a loop.

My work-around is to add `DNS=127.1.1.1` into /etc/systemd/resolved.conf. Nothing is listening on 127.1.1.1 so the queries go unanswered. The systemd resolver gets answers only from servers provided to me via DHCP or statically assigned to an interface.

Inside my dnsmasq.conf I have `server=127.0.0.53`, and resolv.conf has `nameserver 127.0.0.1` which is dnsmasq.

Karl Kastner (kastner-karl) wrote :

This annoying bug affected all upgrades to Bionic and Cosmic Cuttlefish I did so far and also affects Cosmic Cuttlefish. systemd-resolved and dnsmasq each consume 100% CPU and internet access becomes extremely slow.

My latest workaround is to disable automatic overwriting of /etc/resolv.conf with sudo dpkg-reconfigure resolvconf, and to replace the spurious nameserver 127.0.0.1 in /etc/resolv.conf with nameserver 8.8.8.8

Andreas Hasenack (ahasenack) wrote :

There is some configuration that leads to a loop between these two resolvers, and that is what causes the cpu usage. If we could get a step by step way to reproduce it, it could probably be addressed.

JuanJo Ciarlante (jjo) wrote :

Also happening to me after 16.04 -> 18.04 LTS upgrade (via do-release-upgrade)

Robie Basak (racb) wrote :

From the above comments I think it's clear that the problem is caused by some kind of misconfiguration resulting in a loop. It isn't clear if that misconfiguration is coming from a user error or from some bug in an upgrade path somewhere. If the latter, then it's a valid bug. However, since we don't have steps to reproduce the problem, I'm marking this bug as Incomplete in dnsmasq, to make it clear that developers aren't expected to be able to make any progress on this problem until someone can provide steps to reproduce.

Changed in dnsmasq (Ubuntu):
status: Confirmed → Incomplete
description: updated
JuanJo Ciarlante (jjo) wrote :

For other souls facing this "Medium" issue,
a hammer-ish workaround that works for me:

1) Run:
apt-get install cpulimit

2) edit /lib/systemd/system/systemd-resolved.service:

2a) Comment out:
#Type=notify

2b) Replace line (may want to remove the -k to let cpulimit throttle it):
#ExecStart=!!/lib/systemd/systemd-resolved
ExecStart=!!/usr/bin/cpulimit -f -q -k -l 50 -- /lib/systemd/systemd-resolved

3) Run:

systemctl daemon-reload
systemctl restart systemd-resolved
