systemd-resolved using 100% CPU

Bug #1670959 reported by Tamas Papp
This bug affects 56 people
Affects             Status    Importance   Assigned to   Milestone
dnsmasq (Ubuntu)    Expired   Undecided    Unassigned
systemd (Ubuntu)    Invalid   Medium       Unassigned

Bug Description

[Triage Notes]
"Incomplete" in dnsmasq: may be a valid bug, but developers cannot make progress until someone can provide steps to reproduce. See comment 55.

[Original Description]
Sometimes the systemd-resolved process uses 100% CPU.
After a while it goes back to normal.

It usually happens after connecting to the (wifi) network, for example right after starting the OS.

strace output:

sendmsg(12, {msg_name(16)={sa_family=AF_INET, sin_port=htons(33589), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"6\215\201\200\0\1\0\1\0\0\0\1\4cs41\3wac\vedgecastcdn\3net\0\0\34\0\1\300\f\0\34\0\1\0\0\10\235\0\20&\6(\0\0024\0Y%L\4\6#f&\214\0\0)\377\326\0\0\0\0\0\0", 81}], msg_controllen=28, [{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, {ipi_ifindex=if_nametoindex("lo"), ipi_spec_dst=inet_addr("127.0.0.53"), ipi_addr=inet_addr("127.0.0.53")}}], msg_flags=0}, 0) = 81
sendmsg(3, {msg_name(0)=NULL, msg_iov(4)=[{"PRIORITY=6\nSYSLOG_FACILITY=3\nCODE_FILE=../src/resolve/resolved-dns-stub.c\nCODE_LINE=363\nCODE_FUNCTION=dns_stub_process_query\nSYSLOG_IDENTIFIER=systemd-resolved\n", 160}, {"MESSAGE=", 8}, {"Processing query...", 19}, {"\n", 1}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 188
epoll_wait(4, [{EPOLLIN, {u32=3176459184, u64=94565471415216}}], 16, -1) = 1
clock_gettime(CLOCK_BOOTTIME, {44665, 938069872}) = 0
recvfrom(12, NULL, 0, MSG_PEEK|MSG_TRUNC, NULL, NULL) = 53
recvmsg(12, {msg_name(16)={sa_family=AF_INET, sin_port=htons(33589), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"Z\262\1\20\0\1\0\0\0\0\0\1\4cs41\3wac\vedgecastcdn\3net\0\0\34\0\1\0\0)\2\0\0\0\0\0\0\0", 3936}], msg_controllen=56, [{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, {ipi_ifindex=if_nametoindex("lo"), ipi_spec_dst=inet_addr("127.0.0.53"), ipi_addr=inet_addr("127.0.0.53")}}, {cmsg_len=20, cmsg_level=SOL_IP, cmsg_type=IP_TTL, {ttl=64}}], msg_flags=0}, 0) = 53
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=303, ...}) = 0
getrandom("\365I", 2, GRND_NONBLOCK) = 2
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=303, ...}) = 0
getrandom("\203;", 2, GRND_NONBLOCK) = 2
clock_gettime(CLOCK_BOOTTIME, {44665, 938446937}) = 0
open("/run/systemd/netif/links/3", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=303, ...}) = 0
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=303, ...}) = 0
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 18
connect(18, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 18, {EPOLLIN, {u32=3176610576, u64=94565471566608}}) = 0
write(18, "\203;\1\20\0\1\0\0\0\0\0\1\4cs41\3wac\vedgecastcdn\3net\0\0\34\0\1\0\0)\2\0\0\0\0\0\0\0", 53) = 53
clock_gettime(CLOCK_BOOTTIME, {44665, 938833717}) = 0
clock_gettime(CLOCK_BOOTTIME, {44665, 938875138}) = 0
epoll_ctl(4, EPOLL_CTL_DEL, 18, NULL) = 0
close(18) = 0

journalctl output:

Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:35 parsec systemd-resolved[1512]: Processing query...
Mar 08 08:25:41 parsec dnsmasq[1545]: Maximum number of concurrent DNS queries reached (max: 150)

As you can see, I use it together with dnsmasq.

ProblemType: Bug
DistroRelease: Ubuntu 17.04
Package: systemd 232-18ubuntu1
ProcVersionSignature: Ubuntu 4.10.0-9.11-generic 4.10.0
Uname: Linux 4.10.0-9-generic x86_64
NonfreeKernelModules: zfs zunicode zavl zcommon znvpair
ApportVersion: 2.20.4-0ubuntu2
Architecture: amd64
Date: Wed Mar 8 08:20:18 2017
MachineType: Hewlett-Packard HP EliteBook Folio 1020 G1
ProcEnviron:
 LANGUAGE=en_US:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/zsh
ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-4.10.0-9-generic root=UUID=a54fe703-35d4-47ac-9c6e-4034421531fb ro rootflags=subvol=@
SourcePackage: systemd
UpgradeStatus: Upgraded to zesty on 2015-05-24 (653 days ago)
dmi.bios.date: 03/09/2015
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: M77 Ver. 01.05
dmi.board.name: 2271
dmi.board.vendor: Hewlett-Packard
dmi.board.version: KBC Version 91.4C
dmi.chassis.asset.tag: CNU51199KV
dmi.chassis.type: 10
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvrM77Ver.01.05:bd03/09/2015:svnHewlett-Packard:pnHPEliteBookFolio1020G1:pvrA3009DD18303:rvnHewlett-Packard:rn2271:rvrKBCVersion91.4C:cvnHewlett-Packard:ct10:cvr:
dmi.product.name: HP EliteBook Folio 1020 G1
dmi.product.version: A3009DD18303
dmi.sys.vendor: Hewlett-Packard

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Changed in systemd (Ubuntu):
importance: Undecided → Medium
Luis Michael Ibarra (clvx) wrote :

It also happens when you are connected by wire. An XPS13 9350 is affected after upgrading from 16.10 (yakkety) to 17.04 (zesty).

debb1046 (debb1046) wrote :

Also had 100% CPU after upgrade, together with failing DNS resolution and dnsmasq complaining about having reached the max number of simultaneous requests.
Removing dnsmasq has fixed it for me.

Encolpe Degoute (encolpe) wrote :

Removing dnsmasq is not an option when you are using lxd/lxc.
You may reassign this bug to dnsmasq.

A temporary workaround can be to add your nameserver to /etc/resolvconf/resolv.conf.d/head

Jeremy Bícha (jbicha) wrote :

lxc and lxd do not depend on dnsmasq in Ubuntu 17.04.

Danny (lesarde) wrote :

1. top
  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19147 systemd+ 20 0 50848 6232 4572 R 99.0 0.1 5:43.81 systemd-resolve
 1472 dnsmasq 20 0 57020 3084 2620 R 63.1 0.0 44:03.49 dnsmasq

2. journalctl -u systemd-resolved.service
Apr 16 13:01:59 Danny systemd-resolved[19147]: DNSSEC validation failed for question clients-china.l.google.com IN A: failed-auxiliary
Apr 16 13:02:02 Danny systemd-resolved[19147]: Using degraded feature set (UDP+EDNS0) for DNS server 127.0.0.1.
Apr 16 13:02:08 Danny systemd-resolved[19147]: Server 127.0.0.1 does not support DNSSEC, downgrading to non-DNSSEC mode.
Apr 16 13:06:24 Danny systemd-resolved[19147]: Server 192.168.128.1 does not support DNSSEC, downgrading to non-DNSSEC mode.
Apr 16 13:11:24 Danny systemd-resolved[19147]: Grace period over, resuming full feature set (UDP+EDNS0+DO+LARGE) for DNS server 127.0.0.1.

3. journalctl -u dnsmasq.service
Apr 16 13:18:27 Danny dnsmasq[1472]: Maximum number of concurrent DNS queries reached (max: 150)
Apr 16 13:18:33 Danny dnsmasq[1472]: Maximum number of concurrent DNS queries reached (max: 150)
Apr 16 13:18:39 Danny dnsmasq[1472]: Maximum number of concurrent DNS queries reached (max: 150)
Apr 16 13:18:45 Danny dnsmasq[1472]: Maximum number of concurrent DNS queries reached (max: 150)
Apr 16 13:18:51 Danny dnsmasq[1472]: Maximum number of concurrent DNS queries reached (max: 150)

4. lsb_release -a
Distributor ID: Ubuntu
Description: Ubuntu 17.04
Release: 17.04
Codename: zesty

5. sudo strace -p 1472 --- dnsmasq
bind(12, {sa_family=AF_INET, sin_port=htons(37051), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
sendto(12, "g\325\1\20\0\1\0\0\0\0\0\1\rclients-china\1l\6goo"..., 55, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, 16) = 55
poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN}, {fd=9, events=POLLIN}, {fd=12, events=POLLIN}], 8, -1) = 2 ([{fd=4, revents=POLLIN}, {fd=12, revents=POLLIN}])
recvfrom(12, "g\325\201\200\0\1\0\6\0\0\0\1\rclients-china\1l\6goo"..., 5131, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("127.0.0.53")}, [16]) = 151
sendmsg(4, {msg_name(16)={sa_family=AF_INET, sin_port=htons(33355), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"\272\303\201\220\0\1\0\6\0\0\0\1\rclients-china\1l\6goo"..., 151}], msg_controllen=0, msg_flags=0}, 0) = 151
close(12) = 0
recvmsg(4, {msg_name(16)={sa_family=AF_INET, sin_port=htons(57326), sin_addr=inet_addr("127.0.0.1")}, msg_iov(1)=[{"C\21\1\20\0\1\0\0\0\0\0\1\rclients-china\1l\6goo"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 55
socket(AF_INET, SOCK_DGRAM, IPPROTO_IP) = 12
fcntl(12, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(12, F_SETFL, O_RDWR|O_NONBLOCK) = 0

6. sudo strace -p 1914
clock_gettime(CLOCK_BOOTTIME, {5858, 172230641}) = 0
open("/run/systemd/netif/links/2", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=303, ...}) = 0
stat("/e...


Danny (lesarde)
description: updated
Kai-Heng Feng (kaihengfeng) wrote :

Does putting 'DNSSEC=off' in /etc/systemd/resolved.conf help?

Luis Michael Ibarra (clvx) wrote :

Kai-Heng, at least for me, it's working now. By the way, before setting DNSSEC=off, I tested removing dnsmasq to keep only systemd-resolved (as someone mentioned above), and systemd-resolved didn't resolve any domain. I had to install dnsmasq again to have some sort of connection.

Is this bug related to this one[1]?

[1] https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1624320

Tamas Papp (tomposmiko) wrote :

I have not been able to reproduce the issue recently, for about 1 week now.

Let me check it in another location tomorrow.

ii dnsmasq 2.76-5 all Small caching DNS proxy and DHCP/TFTP server
ii systemd 232-21ubuntu2 amd64 system and service manager
ii systemd-sysv 232-21ubuntu2 amd64 system and service manager - SysV links

Colan Schwartz (colan) wrote :

@clvx: I think the other bug is unrelated, although maybe that's what you're running into instead of this one.

I was also able to fix this problem by removing dnsmasq:

* sudo apt remove dnsmasq

As soon as I did that, both processes dropped to nothing, and DNS still resolves.

This is a good workaround, but perhaps the actual fix should be to remove the package on upgrade to 17.04?

Tamas Papp (tomposmiko) wrote :

Weird, I can reproduce the issue again after about 1-2 weeks of silence...
I can test the DNSSEC=off setting.

Tamas Papp (tomposmiko) wrote :

> This is a good workaround, but perhaps the actual fix should be to remove the package on upgrade to 17.04?

What do you mean by this?
The fix should never be something like this.
There is a reason why dnsmasq is installed on these machines.

This is a workaround, but not a good one.

Tamas Papp (tomposmiko) wrote :

Recently I can definitely reproduce the issue anytime.

It's extremely annoying.

Tamas Papp (tomposmiko) wrote :

Actually it's happening continuously; it does not even require a wakeup, a wifi reconnection, or anything...

It's a completely useless piece of crap.

Sergio Callegari (callegar) wrote :

I see it all the time after an upgrade to 17.04. Medium priority seems a euphemism. This is a true showstopper. It is not just systemd-resolved taking a whole core. Dnsmasq takes half of another. The CPU runs hot. On a laptop, the battery drains in a snap.

Has this to do with https://unix.stackexchange.com/questions/304050/how-to-avoid-conflicts-between-dnsmasq-and-systemd-resolved? Should DNSStubListener=no be added to /etc/systemd/resolved.conf? Should dnsmasq not be there?

Note that this bug is now 2 months old, that is over 20% of the expected life of zesty.

tlk (sarcasticskull) wrote :

On a fresh 17.04 install, systemd-resolved didn't give me any grief until I manually installed dnsmasq, because I wanted NetworkManager to share my connection from a desktop to a laptop, and without dnsmasq it fails to do so.
Now I'm hitting this CPU hog problem.

tlk (sarcasticskull) wrote :

On another note, it seems that dnsmasq was dropped intentionally with regard to NM (https://launchpad.net/ubuntu/zesty/+source/network-manager/+changelog), in favor of systemd-resolved. But when I initially tried to NAT my desktop's connection to a laptop, it didn't work, so I looked into journalctl and found that NM explicitly wants the 'dnsmasq' executable... Another bug, probably?

Adam Dingle (adam-yorba) wrote :

I've also run into this CPU problem on 17.04. Like tlk, I had to install dnsmasq because I'm running a hotspot on a USB wifi device, and the system log indicated that it wouldn't work unless dnsmasq is installed.

Adam Dingle (adam-yorba) wrote :

I just tried the workaround Kai-Heng mentioned in #9 above: I added 'DNSSEC=off' to /etc/systemd/resolved.conf, then rebooted. Unfortunately I'm still seeing this CPU problem.

Adam Dingle (adam-yorba) wrote :

It seems that the problem is that dnsmasq is forwarding DNS queries to the systemd-resolved stub server, but they are incompatible in some way. From /var/log/syslog:

Jun 12 17:05:34 sparkly dnsmasq[1177]: using nameserver 127.0.0.53#53
Jun 12 17:05:49 sparkly dnsmasq[1177]: Maximum number of concurrent DNS queries reached (max: 150)

As a workaround, I edited /etc/dnsmasq.conf and added these lines to tell it to ignore resolv.conf and use my ISP's DNS servers, hardcoded, instead:

no-resolv
server=75.75.75.75
server=75.75.76.76

It seems to work: I'm running a hotspot on a USB wifi device and it's working fine, and there's no sign of the CPU trouble.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in dnsmasq (Ubuntu):
status: New → Confirmed
MF (mmuruev) wrote :

I have the same issue with 17.04, especially when qBittorrent is starting. There is also a huge lag when apt update resolves servers.

tags: added: resolved
Christian Ehrhardt  (paelzer) wrote :

Hmm I knew I remembered something related ...
In bug 1694156 we had what appears to me to be a similar case.
There we could track it down to a config issue, but maybe you are affected by the same problem here without a simple config issue to fix.

The clue there was that dnsmasq was configured as the global DNS server for resolved in a manually added /etc/resolvconf/resolv.conf.d/tail. Because of that, failed lookups in the local search domain were passed back and forth between dnsmasq and systemd-resolved.
Does anybody affected here have the same setup of dnsmasq being registered as global DNS? You can check with:
 $ systemd-resolve --status
The local dnsmasq should not be listed in there.

Not sure, but if all affected users have such a "local dnsmasq is configured as global DNS in systemd-resolved" setup, then we will have to find which package/config/guide/upgrade path sets it up that way and resolve that. But first of all, please verify that removing it helps you.

As another option, if you explicitly want your local dnsmasq to be configured in resolved as the global DNS server, you might want to check whether dnsmasq's option "--dns-loop-detect", set via DNSMASQ_OPTS in /etc/default/dnsmasq, helps.

Tim Richardson (tim-richardson) wrote :

"if dnsmasq's option "--dns-loop-detect" via DNSMASQ_OPTS in /etc/default/dnsmasq."
this does not help.

Michael Roth (mdroth) wrote :

@ChristianEhrhardt: I'm not sure how to determine whether systemd-resolved is forwarding back to dnsmasq; hopefully it's clear from the output:

mdroth@sif:~$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
# 127.0.0.53 is the systemd-resolved stub resolver.
# run "systemd-resolve --status" to see details about the actual nameservers.

nameserver 127.0.0.1

mdroth@sif:~$ ps aux | grep dnsmasq
dnsmasq 1122 1.6 0.0 55276 2864 ? S Sep18 10:11 /usr/sbin/dnsmasq -x /run/dnsmasq/dnsmasq.pid -u dnsmasq -r /run/dnsmasq/resolv.conf -7 /etc/dnsmasq.d,.dpkg-dist,.dpkg-old,.dpkg-new --local-service --trust-anchor=.,19036,8,2,49AAC11D7B6F6446702E54A1607371607A1A41855200FD2CE1CDDE32F24E8FB5
libvirt+ 1794 0.0 0.0 52380 396 ? S Sep18 0:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt_leaseshelper
root 1795 0.0 0.0 52352 396 ? S Sep18 0:00 /usr/sbin/dnsmasq --conf-file=/var/lib/libvirt/dnsmasq/default.conf --leasefile-ro --dhcp-script=/usr/lib/libvirt/libvirt_leaseshelper
mdroth 7792 0.0 0.0 14244 1028 pts/6 S+ 06:43 0:00 grep --color=auto dnsmasq

mdroth@sif:~$ cat /run/dnsmasq/resolv.conf
nameserver 127.0.0.53

mdroth@sif:~$ systemd-resolve --status
Global
         DNS Servers: 127.0.0.1
          DNSSEC NTA: 10.in-addr.arpa
                      16.172.in-addr.arpa
                      168.192.in-addr.arpa
                      17.172.in-addr.arpa
                      18.172.in-addr.arpa
                      19.172.in-addr.arpa
                      20.172.in-addr.arpa
                      21.172.in-addr.arpa
                      22.172.in-addr.arpa
                      23.172.in-addr.arpa
                      24.172.in-addr.arpa
                      25.172.in-addr.arpa
                      26.172.in-addr.arpa
                      27.172.in-addr.arpa
                      28.172.in-addr.arpa
                      29.172.in-addr.arpa
                      30.172.in-addr.arpa
                      31.172.in-addr.arpa
                      corp
                      d.f.ip6.arpa
                      home
                      internal
                      intranet
                      lan
                      local
                      private
                      test

Link 5 (virbr0-nic)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 4 (virbr0)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Link 3 (wlp4s0)
      Current Scopes: DNS LLMNR/IPv4 LLMNR/IPv6
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no
         DNS Servers: 192.168.2.254
                      fd6d:6e53:4042::1
          DNS Domain: lan

Link 2 (enp0s31f6)
      Current Scopes: none
       LLMNR setting: yes
MulticastDNS setting: no
      DNSSEC setting: no
    DNSSEC supported: no

Christian Ehrhardt  (paelzer) wrote :

Hi Michael,
I'm certainly the least expert in this case - I was just trying to help by mapping it to that other case I have seen.

Your output tells me that your resolved:
1. talks to a link-scope DNS server, 192.168.2.254, on wlp4s0
2. is set up to talk to a global DNS server at 127.0.0.1

IIRC it will ask both, and whoever answers first provides the reply it gives (which can be bad at times if the NACK from server A is faster than the details from server B).

On 127.0.0.53 it is resolved itself that is listening.

The libvirt dnsmasq usually has "except-interface=lo", so it should not listen on 127.0.0.1.
The case I mentioned in c#25 was a manual config change that made resolved call dnsmasq, causing the loop.
I agree that you can't easily derive from the output you have whether you are "looped" now.

Maybe you could check in your case whether anything listens on 127.0.0.1:53, and what?
$ sudo netstat -ltaupn | sed -rne 2p -e '/:53\b/p'

Does libvirt know the pid of the dnsmasq it spawned? Maybe this can be used to make this check more fine-grained.
Like "is any of our pids binding :53 on something in systemd-resolve --status", maybe.

Tamas Papp (tomposmiko) wrote :

hi All,

Is it possible to eliminate systemd-resolved from the system?

Dimitri John Ledkov (xnox) wrote :

@ Michael Roth (mdroth)

On zesty and later, /etc/resolv.conf must not list "nameserver 127.0.0.1"; this seems to be a broken configuration state.
I am concerned about how you got into such a state. Is your network managed by ifupdown with /etc/network/interfaces, or by NetworkManager? Do you have resolvconf installed? What is the output of $ NetworkManager --print-config | grep dns ?

@tomposmiko
$ systemctl disable systemd-resolved
$ systemctl stop systemd-resolved
$ systemctl mask systemd-resolved
$ apt remove --purge libnss-resolve

Should do it. Then you should still have resolvconf & ifupdown installed, or resolvconf+dnsmasq+NetworkManager installed. Or without resolvconf manage /etc/resolv.conf by hand.
Note such system configurations may not be fully supported, but should be operational for simplistic networking setups.

Nikhil Verma (nikhilweee) wrote :

Is there an official solution yet? BTW https://askubuntu.com/a/968309/ seemed to help me out.

James Cuzella (trinitronx) wrote :

Also seeing this issue in Ubuntu 17.10 (artful) with:

dnsmasq-base 2.78-1
systemd 234-2ubuntu12.1

I can confirm that following https://askubuntu.com/a/968309/ resolves the issue!

Steps to fix the issue:

Added this line to /etc/default/dnsmasq:

    DNSMASQ_EXCEPT=lo

Then restarting dnsmasq:

    sudo systemctl daemon-reload
    sudo systemctl restart dnsmasq

No more systemd-resolved taking 99-100% of a CPU core!

Christian Ehrhardt  (paelzer) wrote :

Interesting James,
so there is a little twist that might be related and lead to a resolution.

Usually the service of dnsmasq would add itself to the resolvconf server set.
It does so in:
start_resolvconf()
{
# If interface "lo" is explicitly disabled in /etc/default/dnsmasq
# Then dnsmasq won't be providing local DNS, so don't add it to
# the resolvconf server set.
        for interface in $DNSMASQ_EXCEPT
        do
                [ $interface = lo ] && return
        done

        # Also skip this if DNS functionality is disabled in /etc/dnsmasq.conf
        if grep -qs '^port=0' /etc/dnsmasq.conf; then
                return
        fi

        if [ -x /sbin/resolvconf ] ; then
                echo "nameserver 127.0.0.1" | /sbin/resolvconf -a lo.$NAME
        fi
        return 0
}

So I wonder if "not adding to resolvconf" is the actual way this is fixed for you per https://askubuntu.com/a/968309/

So while the config says "do not serve lo", what it really does to fix your case is not adding it. That in turn is something one could test.

So once you are in the bad case, could you check whether
$ resolvconf -a lo.dnsmasq
would resolve the issue as well?

I currently fail to grasp all the potential regressions that change could have, but I'd think that knowing if
a) not serving on 127.0.0.1
or
b) not registering in resolvconf
is the actual trigger might help someone else to go further.
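
To see which branch of the quoted start_resolvconf() applies on a given machine, the DNSMASQ_EXCEPT check can be pulled out on its own. This is only a sketch: the function name would_register_lo is made up for illustration and is not part of the dnsmasq init script, and it covers only the DNSMASQ_EXCEPT loop, not the port=0 or /sbin/resolvconf checks.

```shell
#!/bin/sh
# Sketch of the DNSMASQ_EXCEPT test from start_resolvconf(), isolated.
# would_register_lo is a hypothetical name, not part of the real init script.
# $1: the value of DNSMASQ_EXCEPT (space-separated interface names).
# Returns 0 if this check would let the script go on to register
# 127.0.0.1 with resolvconf, 1 if "lo" is excepted and it returns early.
would_register_lo() {
    for interface in $1; do
        [ "$interface" = lo ] && return 1
    done
    return 0
}
```

After sourcing /etc/default/dnsmasq, something like `would_register_lo "$DNSMASQ_EXCEPT" && echo "127.0.0.1 would be registered"` should show which case applies.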

James Cuzella (trinitronx) wrote :

Christian: I've tried as I believe you have asked.

When in the bad case (I comment out the line: DNSMASQ_EXCEPT=lo and then reboot):

 - Running this did not resolve the issue: `sudo resolvconf -a lo.dnsmasq`
 - Running this did not resolve the issue: `echo "nameserver 127.0.0.1" | /sbin/resolvconf -a lo.dnsmasq`

So far only the DNSMASQ_EXCEPT=lo line in /etc/default/dnsmasq appears to resolve the issue.

James Cuzella (trinitronx) wrote :

Maybe interesting things to note:

1. When in bad state:
  - The mentioned ENV var change in /etc/default/dnsmasq and `sudo systemctl restart dnsmasq` immediately resolves the issue
2. When in good state:
  - Commenting out the ENV var and running `sudo systemctl restart dnsmasq` does not immediately re-cause the issue
  - After a reboot, the bad state can be observed again (systemd-resolved using 100% of a CPU core)

Maybe there is an order of operations thing here?

Dave Chiluk (chiluk)
tags: added: indeed
Peter (peter-schmitteckert) wrote :

Adding

 DNSMASQ_EXCEPT=lo

to
 /etc/default/dnsmasq

solves the problem in my case, kubuntu 17.10, kernel 4.15.rc7,
for several days now.

Peter

Roland (roland-breedveld) wrote :

same issue with 17.10, takes a complete core,
DNSMASQ_EXCEPT=lo is not working, also after reboot, it still takes a core

Ketil Malde (ketil-ii) wrote :

Just to chime in: Noticed slow machine and systemd-resolved taking 100% cpu. Edited /etc/systemd/resolved.conf to DNSSEC=no (not off, as "no" was commented out in the file). Tried systemctl restart systemd-resolved, but nothing changed. Tried killall systemd-resolved, nothing. Tried killall -9, and it respawned a well-behaved process. I recently installed dnsmasq when my computer failed to resolve anything. This is a fairly up-to-date Ubuntu 17.10.

Amir H Piri (amirh.piri) wrote :

I am also facing this problem, but have found no solution at all. I tried to disable the systemd-resolved service, but after that my domain resolver stopped working and I lost access to web pages. I also have dnsmasq. I don't know why, but I think this service comes from Ubuntu 16.04.
I upgraded my Ubuntu from 16.04 to 17.10 and then upgraded it to 18.04. Maybe this is why the problem occurs.

Pieter (diepes) wrote :

Ubuntu 18.04.

Edited /etc/resolv.conf with $ sudo vim /etc/resolv.conf

changed

nameserver 127.0.0.1

to

nameserver 127.0.0.53

CPU calmed down.

Andreas Hasenack (ahasenack) wrote :

@diepes, interesting. Looks like there was a loop going on there.

Can everybody please check their /etc/resolv.conf to see if something similar might be happening with it?

Ricky Brent (rickybrent) wrote :

This happened on my machine, and diepes's fix worked for me as well.

Justin Bennett (justinjb) wrote :

I just started noticing this issue on my machine today. Ubuntu 18.04. I also had nameserver 127.0.0.1 in /etc/resolv.conf. Changed to 127.0.0.53, per diepes' comment and CPU calmed down for me as well.

amir (amirse) wrote :

This solution works, but it's not persistent. /etc/resolv.conf is a symlink to /run/resolvconf/resolv.conf, which is ephemeral. It'll be recreated as 127.0.0.1.
Any news on an actual fix for this, rather than just a workaround?
Thanks!

Andreas Hasenack (ahasenack) wrote :

What do you guys have inside /etc/resolvconf? In terms of directories, files, and their contents.

Do you have the resolvconf package installed by any chance?

Andreas Hasenack (ahasenack) wrote :

And looks like we need to find out what is adding 127.0.0.1 to resolv.conf. It could come from many different sources, like a nameserver configuration in some /etc/network/interfaces carried over from xenial, or something in network-manager, etc. I guess you could start with a brute-force grep -r on /etc for 127.0.0.1?
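
A minimal sketch of that brute-force search, wrapped in a function so the directory can be varied. The name find_loopback_refs is hypothetical; -w is there so that addresses such as 127.0.0.10 do not match, and -F so the dots are taken literally.

```shell
#!/bin/sh
# find_loopback_refs is a hypothetical helper name, used here for illustration.
# $1: directory to search (defaults to /etc).
# Prints file:line:text for every line that mentions 127.0.0.1 as a whole word.
# Errors from unreadable files are discarded.
find_loopback_refs() {
    grep -rnwF -- '127.0.0.1' "${1:-/etc}" 2>/dev/null
}
```

Running `sudo sh -c '. ./find_loopback_refs.sh; find_loopback_refs /etc'` (the script file name is also just an example) should list candidates such as the dnsmasq init script or an old /etc/network/interfaces entry, if they exist.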

Darryl Browne (darryl.browne.icom) wrote :

It seems to be dnsmasq itself:

/etc/init.d/dnsmasq:163: echo "nameserver 127.0.0.1" | /sbin/resolvconf -a lo.$NAME

Andreas Hasenack (ahasenack) wrote :

I don't know what use case you have that requires both dnsmasq and resolvconf to be installed. Maybe there is none and this is just an artifact of a release upgrade.

In Bionic, I only have dnsmasq-base (not dnsmasq) installed, and no resolvconf.

Looking at the initscript from dnsmasq, there seem to be a few options to disable it for local dns resolving, like the one suggested in https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1670959/comments/32, which I reproduce below:
Added this line to /etc/default/dnsmasq:

    DNSMASQ_EXCEPT=lo

Then restarting dnsmasq:

    sudo systemctl daemon-reload
    sudo systemctl restart dnsmasq

That will prevent the code quoted in https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1670959/comments/47 from running.

Romano Giannetti (romano-giannetti) wrote :

I also think it's an upgrade artifact. It happened on my upgraded laptop but not in the freshly installed desktop machine.
The problem on the laptop now is (I think) hostapd, which requires dnsmasq.
The workaround cited by @ahasenack in #49 seems to work.

Håkon Enger (hakon-enger) wrote :

I also had this issue after upgrading to 18.04. During the upgrade, I got asked if I wanted to keep or replace my /etc/dnsmasq.conf, I opted to replace with the package maintainer's version. The old config file had the option interface=enp0s25, the new one does not. I don't know/remember where this option came from. Could this be related?

TuxInvader (tuxinvader) wrote :

I have this on Ubuntu 18.04. In my case I use dnsmasq with a custom configuration providing DHCP and DNS to multiple bridges (Virtual Machines and containers). So I need them both (in so far as I need dnsmasq and am being forced to use systemd).

The problem is a DNS loop between systemd and dnsmasq. systemd-resolved forwards DNS queries in parallel to the entries in /etc/resolv.conf and to servers picked up from interfaces via DHCP. If you, like me, have dnsmasq in resolv.conf and a dnsmasq configuration that forwards to systemd, then you have a loop.

My workaround is to add `DNS=127.1.1.1` to /etc/systemd/resolved.conf. Nothing is listening on 127.1.1.1, so the queries go unanswered. The systemd resolver gets answers only from servers provided to me via DHCP or statically assigned to an interface.

Inside my dnsmasq.conf I have `server=127.0.0.53`, and resolv.conf has `nameserver 127.0.0.1` which is dnsmasq.
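
The loop pattern described here can be spotted from the two configuration files alone. The following is a rough sketch, assuming the setup in this comment: the system resolv.conf points at dnsmasq on 127.0.0.1, and dnsmasq's upstream configuration points at the resolved stub on 127.0.0.53. The function names are hypothetical, only plain `nameserver IP` and `server=IP` lines are recognised, and a loop set up some other way would not be caught.

```shell
#!/bin/sh
# Hypothetical helpers; names and file choices are assumptions for illustration.

# Succeeds if file $1 names server $2, either as "nameserver IP"
# (resolv.conf style) or as a bare "server=IP" line (dnsmasq.conf style).
names_server() {
    awk -v ip="$2" '
        ($1 == "nameserver" && $2 == ip) || $0 == ("server=" ip) { found = 1 }
        END { exit !found }
    ' "$1"
}

# $1: file the system resolver reads, e.g. /etc/resolv.conf
# $2: file dnsmasq takes its upstream servers from
# Returns 0 and prints a warning when each side points at the other.
detect_dns_loop() {
    if names_server "$1" 127.0.0.1 && names_server "$2" 127.0.0.53; then
        echo "possible loop: $1 -> 127.0.0.1 (dnsmasq), $2 -> 127.0.0.53 (resolved stub)"
        return 0
    fi
    echo "no loop pattern found"
    return 1
}
```

With the files from this thread that would be `detect_dns_loop /etc/resolv.conf /etc/dnsmasq.conf`, or `/run/dnsmasq/resolv.conf` as the second argument when dnsmasq is started with -r as in the ps output further up.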

Karl Kastner (kastner-karl) wrote :

This annoying bug has affected all the upgrades to Bionic and Cosmic Cuttlefish I have done so far, and it also affects Cosmic Cuttlefish. systemd-resolved and dnsmasq each consume 100% CPU and internet access becomes extremely slow.

My latest workaround is to disable automatic overwriting of /etc/resolv.conf with sudo dpkg-reconfigure resolvconf, and to replace the spurious nameserver 127.0.0.1 in /etc/resolv.conf with nameserver 8.8.8.8

Andreas Hasenack (ahasenack) wrote :

There is some configuration that leads to a loop between these two resolvers, and that is what causes the cpu usage. If we could get a step by step way to reproduce it, it could probably be addressed.

JuanJo Ciarlante (jjo) wrote :

Also happening to me after 16.04 -> 18.04 LTS upgrade (via do-release-upgrade)

Robie Basak (racb) wrote :

From the above comments I think it's clear that the problem is caused by some kind of misconfiguration resulting in a loop. It isn't clear if that misconfiguration is coming from a user error or from some bug in an upgrade path somewhere. If the latter, then it's a valid bug. However, since we don't have steps to reproduce the problem, I'm marking this bug as Incomplete in dnsmasq, to make it clear that developers aren't expected to be able to make any progress on this problem until someone can provide steps to reproduce.

Changed in dnsmasq (Ubuntu):
status: Confirmed → Incomplete
description: updated
JuanJo Ciarlante (jjo) wrote :

For other souls facing this "Medium" issue,
a hammer-ish workaround that works for me:

1) Run:
apt-get install cpulimit

2) edit /lib/systemd/system/systemd-resolved.service:

2a) Comment out:
#Type=notify

2b) Replace line (may want to remove the -k to let cpulimit throttle it):
#ExecStart=!!/lib/systemd/systemd-resolved
ExecStart=!!/usr/bin/cpulimit -f -q -k -l 50 -- /lib/systemd/systemd-resolved

3) Run:

systemctl daemon-reload
systemctl restart systemd-resolved

Jasem Mutlaq (mutlaqja) wrote :

I'm affected by this bug, but using dnsmasq-base employed by NetworkManager and not dnsmasq package. The dnsmasq is what is consuming one core of the CPU 100% after a few minutes, not systemd-resolved. After heating the CPU for a few minutes, it drops again and the cycle repeats after a while. I keep seeing these errors:

Maximum number of concurrent DNS queries reached (max: 150)
failed to send packet: Resource temporarily unavailable

This is on a fresh 18.04 installation, not an upgrade.

Christian Ehrhardt  (paelzer) wrote :

Hi Jasem,
it seems you are hitting a similar, but not necessarily the same, issue.
As outlined before, all discussion here ended as Incomplete: it cannot be acted on and needs clear steps to reproduce in order to continue.

I'd ask you to file a new bug for your case and there provide as much info as possible how to recreate your case.

Ali Baghernejad (alibaghernejad) wrote :

Same issue on Ubuntu 18.04, occurring after system startup.

Linux ali-GE620DX 4.15.0-51-generic #55-Ubuntu SMP Wed May 15 14:27:21 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

Dan Streetman (ddstreet) wrote :

please reopen if this is still an issue

Changed in systemd (Ubuntu):
status: Confirmed → Invalid
Launchpad Janitor (janitor) wrote :

[Expired for dnsmasq (Ubuntu) because there has been no activity for 60 days.]

Changed in dnsmasq (Ubuntu):
status: Incomplete → Expired