systemd-resolve crashes fairly often (and reports various assertions)

Bug #1906331 reported by Mekk
This bug affects 5 people
Affects             Status        Importance  Assigned to  Milestone
systemd (Ubuntu)    Fix Released  Medium      Unassigned
  Focal             Incomplete    Medium      Unassigned
  Groovy            Won't Fix     Medium      Unassigned
  Hirsute           Fix Released  Medium      Unassigned

Bug Description

[impact]

systemd-resolved crashes

[test case]

See the original description. I can't reproduce this, so I'm relying on the reporter(s) to test and verify.

[regression potential]

Any regression would most likely occur while processing sd_event objects, which are used throughout the systemd code, so it could cause crashes in almost any part of systemd. A more likely regression, however, would be leaked sd_event objects caused by failing to release the final reference to an object.

[scope]

This is needed for Focal, Groovy and Hirsute (f/g/h).

This might be fixed by upstream commit f814c871e65df8552a055dd887bc94b074037833; if so, that commit isn't included in any systemd release yet, so it is needed in Hirsute and earlier.

[other info]

I believe this is caused by an sd_event object that is processed after it has been freed and then calls the on_query_timeout callback with invalid state; that leads to a failed assertion, which crashes resolved. That is what analysis of the crash dump appears to indicate. This may be fixed by the upstream commit referenced in [scope], which takes additional refs during function calls. However, I haven't reproduced this myself, so at this point I'm only guessing at the cause and the solution.

I'm unsure why this would not occur in Bionic, but per comment 5 it apparently doesn't happen in that release.
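The upstream commit is described above as taking additional refs during function calls. The general pattern can be sketched in C: the dispatcher pins the object with an extra reference for the duration of a callback, so a callback that drops the owner's last reference cannot free the object while the dispatcher is still using it. Everything here (`Obj`, `obj_ref()`, `obj_unref()`, `obj_dispatch()`, `drop_owner_ref()`) is hypothetical and is not the sd-event API; the `n_ref > 0` check only mirrors the style of the `p->n_ref > 0` assertions reported in this bug.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; this is NOT the sd-event API. */
typedef struct Obj Obj;
typedef void (*obj_callback)(Obj *o, void *userdata);

struct Obj {
    unsigned n_ref;
    obj_callback cb;
    void *userdata;
};

static int n_freed; /* counts frees, so the behaviour can be checked */

static Obj *obj_new(obj_callback cb, void *userdata)
{
    Obj *o = calloc(1, sizeof(Obj));
    if (!o)
        return NULL;
    o->n_ref = 1;
    o->cb = cb;
    o->userdata = userdata;
    return o;
}

static Obj *obj_ref(Obj *o)
{
    assert(o->n_ref > 0);
    o->n_ref++;
    return o;
}

/* Drops one reference and frees the object when the last one goes away.
   Always returns NULL so callers can write `p = obj_unref(p);`. */
static Obj *obj_unref(Obj *o)
{
    if (!o)
        return NULL;
    assert(o->n_ref > 0);       /* an extra unref would trip this */
    if (--o->n_ref == 0) {
        free(o);
        n_freed++;
    }
    return NULL;
}

/* Runs the callback while holding an extra reference ("pin"), so the
   object stays valid even if the callback releases the owner's reference.
   Returns the reference count observed right after the callback. */
static unsigned obj_dispatch(Obj *o)
{
    unsigned refs_after;

    obj_ref(o);                 /* pin */
    o->cb(o, o->userdata);
    refs_after = o->n_ref;      /* safe: the pin keeps o alive */
    obj_unref(o);               /* may free o here, and only here */
    return refs_after;
}

/* Example callback that drops its owner's (the only other) reference. */
static void drop_owner_ref(Obj *o, void *userdata)
{
    Obj **owner = userdata;
    (void) o;
    *owner = obj_unref(*owner);
}
```

Without the `obj_ref()`/`obj_unref()` pair in `obj_dispatch()`, reading `o->n_ref` after the callback would be a use-after-free. Conversely, forgetting the final `obj_unref()` would leak the object, which is the kind of regression mentioned under [regression potential].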

[original description]

(Tested on regularly updated Ubuntu 20.04; I currently use systemd 245.4-4ubuntu3.2.)

I observe a fair number of systemd-resolve segfaults. The frequency varies, but… see below.

I have no idea what the cause is. One specific feature of my machine is that, apart from the normal wired connection (to an OpenWRT router), I use OpenVPN for a business network (which provides a specific nameserver for the myorg.local domain).

~~~~~
$ LC_ALL=C dmesg -T --level=info | grep systemd-resolve
[Sun Nov 29 11:47:37 2020] systemd-resolve[1629307]: segfault at 190eed7bdc6 ip 00007fd98f771dc9 sp 00007ffc2352a100 error 4 in libsystemd-shared-245.so[7fd98f74c000+16e000]
[Sun Nov 29 11:57:27 2020] systemd-resolve[1629787]: segfault at 1f ip 000055ab7b0cb686 sp 00007fff78ce4bd0 error 4 in systemd-resolved[55ab7b0a4000+3e000]
[Sun Nov 29 12:07:37 2020] systemd-resolve[1630481]: segfault at 191 ip 000055ca69fed91c sp 00007ffc4d757dc0 error 6 in systemd-resolved[55ca69fc2000+3e000]
[Sun Nov 29 13:12:26 2020] systemd-resolve[1638829]: segfault at 19224162371 ip 00007fc1bc9b9dc9 sp 00007ffc21378170 error 4 in libsystemd-shared-245.so[7fc1bc994000+16e000]
[Sun Nov 29 13:32:57 2020] systemd-resolve[1639886]: segfault at 1926d8126d3 ip 00007f7ed17e9dc9 sp 00007ffda2cea0b0 error 4 in libsystemd-shared-245.so[7f7ed17c4000+16e000]
[Sun Nov 29 13:42:37 2020] systemd-resolve[1640246]: segfault at 61 ip 0000558d992e2686 sp 00007fff08906af0 error 4 in systemd-resolved[558d992bb000+3e000]
[Sun Nov 29 15:42:26 2020] systemd-resolve[1645397]: segfault at 1943c92afc7 ip 00007fd4c1721dc9 sp 00007fff25259ce0 error 4 in libsystemd-shared-245.so[7fd4c16fc000+16e000]
[Sun Nov 29 16:02:36 2020] systemd-resolve[1646052]: segfault at 1947ecb3726 ip 00007f1008549dc9 sp 00007fff44a6db70 error 4 in libsystemd-shared-245.so[7f1008524000+16e000]
[Sun Nov 29 17:42:35 2020] systemd-resolve[1649403]: segfault at 71 ip 000055a37fe5a686 sp 00007ffd9a160440 error 4 in systemd-resolved[55a37fe33000+3e000]
[Sun Nov 29 17:52:35 2020] systemd-resolve[1649759]: segfault at 558d292947d0 ip 0000558d292947d0 sp 00007ffec7ab3bf8 error 15
[Sun Nov 29 19:17:55 2020] systemd-resolve[1652349]: segfault at 558995b77cf0 ip 0000558995b77cf0 sp 00007ffe545ae4a8 error 15
[Sun Nov 29 19:32:35 2020] systemd-resolve[1652640]: segfault at 19773c20194 ip 00007f66bb529dc9 sp 00007fffd7066fc0 error 4 in libsystemd-shared-245.so[7f66bb504000+16e000]
[Sun Nov 29 20:03:54 2020] systemd-resolve[1653715]: segfault at 197e3aee918 ip 00007fdc40b51dc9 sp 00007ffde484fbf0 error 4 in libsystemd-shared-245.so[7fdc40b2c000+16e000]
[Sun Nov 29 20:22:24 2020] systemd-resolve[1654540]: segfault at 19820a05297 ip 00007f6a92839dc9 sp 00007ffe4ba00440 error 4 in libsystemd-shared-245.so[7f6a92814000+16e000]
[Sun Nov 29 21:13:10 2020] systemd-resolve[1660272]: segfault at 555f9a5915e0 ip 0000555f9a5915e0 sp 00007fff053e5e68 error 15
[Sun Nov 29 21:32:34 2020] systemd-resolve[1661026]: segfault at 1991af73f2e ip 00007ff194021dc9 sp 00007fffa6d61680 error 4 in libsystemd-shared-245.so[7ff193ffc000+16e000]
[Sun Nov 29 22:03:20 2020] systemd-resolve[1661941]: segfault at 5625966828e0 ip 00005625966828e0 sp 00007ffdf5a8bb48 error 15
[Sun Nov 29 22:32:44 2020] systemd-resolve[1662604]: segfault at 199f18ae01d ip 00007f457c9d1dc9 sp 00007ffc62b80ef0 error 4 in libsystemd-shared-245.so[7f457c9ac000+16e000]
[Sun Nov 29 23:12:23 2020] systemd-resolve[1664072]: segfault at 73b8 ip 0000562619f8c93a sp 00007ffd527b7ef0 error 6 in systemd-resolved[562619f61000+3e000]
[Sun Nov 29 23:22:34 2020] systemd-resolve[1664423]: segfault at 19aaa4d4c00 ip 00007f2621539dc9 sp 00007ffc73102280 error 4 in libsystemd-shared-245.so[7f2621514000+16e000]
[Mon Nov 30 00:12:23 2020] systemd-resolve[1666158]: segfault at 19b5c72000a ip 00007f530b5c1dc9 sp 00007ffc6007ccf0 error 4 in libsystemd-shared-245.so[7f530b59c000+16e000]
[Mon Nov 30 00:47:54 2020] systemd-resolve[1667280]: segfault at 100000036 ip 00007f0736b8bbe8 sp 00007fffed4d3cb0 error 4 in libsystemd-shared-245.so[7f0736acc000+16e000]
[Mon Nov 30 01:57:53 2020] systemd-resolve[1669463]: segfault at 558d6b61c0c0 ip 0000558d6b61c0c0 sp 00007ffc68df7198 error 15
[Mon Nov 30 02:58:08 2020] traps: systemd-resolve[1672553] general protection fault ip:55b967d86760 sp:7fffaecf4468 error:0 in systemd-resolved[55b967d5f000+3e000]
[Mon Nov 30 03:38:08 2020] systemd-resolve[1673682]: segfault at 19e3c4d5050 ip 00007fdf0ba29dc9 sp 00007ffe4d561430 error 4 in libsystemd-shared-245.so[7fdf0ba04000+16e000]
[Mon Nov 30 05:07:22 2020] systemd-resolve[1681387]: segfault at 7f7e31c39c10 ip 00007f7e31c39c10 sp 00007ffe5ad31f58 error 15 in libc-2.31.so[7f7e31c39000+3000]
[Mon Nov 30 05:47:43 2020] systemd-resolve[1682761]: segfault at 1a00b86dff0 ip 00007f69066c9dc9 sp 00007fff079c4fd0 error 4 in libsystemd-shared-245.so[7f69066a4000+16e000]
[Mon Nov 30 06:17:42 2020] systemd-resolve[1684158]: segfault at 1a070831ad3 ip 00007f308aa19dc9 sp 00007fffa7de12e0 error 4 in libsystemd-shared-245.so[7f308a9f4000+16e000]
[Mon Nov 30 06:27:21 2020] systemd-resolve[1684457]: segfault at 1a094465eb1 ip 00007f751c9b9dc9 sp 00007ffee6452930 error 4 in libsystemd-shared-245.so[7f751c994000+16e000]
[Mon Nov 30 07:12:21 2020] systemd-resolve[1685888]: segfault at 1a13a2db693 ip 00007f7dfb029dc9 sp 00007ffeb5f30600 error 4 in libsystemd-shared-245.so[7f7dfb004000+16e000]
[Mon Nov 30 08:17:21 2020] traps: systemd-resolve[1688510] general protection fault ip:559181bc2760 sp:7fff2ea2f1d8 error:0 in systemd-resolved[559181b9b000+3e000]
[Mon Nov 30 08:27:31 2020] systemd-resolve[1691391]: segfault at 1a2475be1c2 ip 00007fd38b271dc9 sp 00007ffdc116dad0 error 4 in libsystemd-shared-245.so[7fd38b24c000+16e000]
[Mon Nov 30 08:58:25 2020] systemd-resolve[1692357]: segfault at 1a2af5e7f63 ip 00007fbae5c81dc9 sp 00007ffdc36e31a0 error 4 in libsystemd-shared-245.so[7fbae5c5c000+16e000]
[Mon Nov 30 10:02:20 2020] systemd-resolve[1697183]: segfault at 208 ip 000055f232f3793a sp 00007ffc9fad5230 error 6 in systemd-resolved[55f232f0c000+3e000]
[Mon Nov 30 10:12:20 2020] systemd-resolve[1697500]: segfault at 1a3be1bb4b4 ip 00007fbd22749dc9 sp 00007ffec3404b30 error 4 in libsystemd-shared-245.so[7fbd22724000+16e000]
[Mon Nov 30 11:17:30 2020] systemd-resolve[1701684]: segfault at 1a4a6e7be98 ip 00007f8a6c801dc9 sp 00007ffea4117850 error 4 in libsystemd-shared-245.so[7f8a6c7dc000+16e000]
[Mon Nov 30 12:58:18 2020] systemd-resolve[1707203]: segfault at ffffffff00000020 ip 000055bdf867d686 sp 00007ffe2d636bb0 error 5 in systemd-resolved[55bdf8656000+3e000]
[Mon Nov 30 14:25:41 2020] systemd-resolve[1710420]: segfault at 1a7426fba1a ip 00007f2a1e869dc9 sp 00007ffec5869d40 error 4 in libsystemd-shared-245.so[7f2a1e844000+16e000]
[Mon Nov 30 14:52:39 2020] systemd-resolve[1712923]: segfault at 55f7d8f636e8 ip 000055f7d8f636e8 sp 00007ffce3c1f528 error 15
[Mon Nov 30 15:02:29 2020] systemd-resolve[1713448]: segfault at 1a7c60ef1a0 ip 00007f173d8f1dc9 sp 00007ffc715163d0 error 4 in libsystemd-shared-245.so[7f173d8cc000+16e000]
[Mon Nov 30 15:14:18 2020] systemd-resolve[1714394]: segfault at 54 ip 00007f3806f89dc9 sp 00007ffea68d70e0 error 4 in libsystemd-shared-245.so[7f3806f64000+16e000]
[Mon Nov 30 15:42:18 2020] systemd-resolve[1715906]: segfault at 100000003 ip 00007fe65ee59dc9 sp 00007ffe0f486460 error 4 in libsystemd-shared-245.so[7fe65ee34000+16e000]
[Mon Nov 30 16:02:39 2020] systemd-resolve[1717460]: segfault at 7 ip 00007f5a4de19dc9 sp 00007ffd1fa46490 error 4 in libsystemd-shared-245.so[7f5a4ddf4000+16e000]
[Mon Nov 30 16:22:28 2020] systemd-resolve[1719182]: segfault at 1a8e42924e4 ip 00007faa5c251dc9 sp 00007ffedcf4c120 error 4 in libsystemd-shared-245.so[7faa5c22c000+16e000]
[Mon Nov 30 16:37:18 2020] systemd-resolve[1719462]: segfault at 1f ip 0000557e68947686 sp 00007ffd6ec75570 error 4 in systemd-resolved[557e68920000+3e000]
[Mon Nov 30 17:43:02 2020] traps: systemd-resolve[1722859] general protection fault ip:5575fa1388d4 sp:7ffd0835bce0 error:0 in systemd-resolved[5575fa10d000+3e000]
[Mon Nov 30 18:12:48 2020] systemd-resolve[1725814]: segfault at 55b69de577c8 ip 000055b69de577c8 sp 00007ffee9d70f18 error 15
[Mon Nov 30 18:37:38 2020] systemd-resolve[1727788]: segfault at 74756f666d ip 00005563cd41091c sp 00007ffd51addc90 error 6 in systemd-resolved[5563cd3e5000+3e000]
[Mon Nov 30 18:48:02 2020] systemd-resolve[1728444]: segfault at 308 ip 000055b57aa0b93a sp 00007ffeefd7f400 error 6 in systemd-resolved[55b57a9e0000+3e000]
[Mon Nov 30 19:32:28 2020] systemd-resolve[1730898]: segfault at 5648012b4770 ip 00005648012b4770 sp 00007ffea0bd9e18 error 15
[Mon Nov 30 19:53:01 2020] systemd-resolve[1732347]: segfault at ffffffff00000020 ip 000055b0ace66686 sp 00007ffc73f3c9d0 error 5 in systemd-resolved[55b0ace3f000+3e000]
[Mon Nov 30 19:59:17 2020] systemd-resolve[1733001]: segfault at 52dd6b26 ip 00007fafebdf9dc9 sp 00007ffd591cd2a0 error 4 in libsystemd-shared-245.so[7fafebdd4000+16e000]
[Mon Nov 30 20:27:17 2020] systemd-resolve[1735033]: segfault at 1ac55647a0b ip 00007fb26ea49dc9 sp 00007ffdd481ff30 error 4 in libsystemd-shared-245.so[7fb26ea24000+16e000]
[Mon Nov 30 22:42:37 2020] systemd-resolve[1746028]: segfault at 1ae39b02ae4 ip 00007f82f18d9dc9 sp 00007ffc088615b0 error 4 in libsystemd-shared-245.so[7f82f18b4000+16e000]
[Mon Nov 30 22:57:36 2020] systemd-resolve[1746999]: segfault at 1ae6963116c ip 00007f85cc831dc9 sp 00007ffce6f3f5f0 error 4 in libsystemd-shared-245.so[7f85cc80c000+16e000]
[Tue Dec 1 00:22:16 2020] traps: systemd-resolve[1862379] general protection fault ip:55fb9c6b3760 sp:7fff017519a8 error:0 in systemd-resolved[55fb9c68c000+3e000]
[Tue Dec 1 00:57:16 2020] systemd-resolve[1905008]: segfault at 1b015f1c2e7 ip 00007f47185b1dc9 sp 00007ffd899bd1a0 error 4 in libsystemd-shared-245.so[7f471858c000+16e000]
~~~~~
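Each `segfault` line above records the faulting instruction pointer (`ip`) and, when known, the object it falls in as `name[base+size]`. Subtracting the base gives an offset inside that object that stays the same across restarts despite address randomization, which makes it easier to see whether crashes hit the same code. A sketch in plain C; `fault_offset()` is a hypothetical helper name:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Offset of the faulting instruction inside the object named at the end
   of a kernel "segfault at ..." line, e.g.
   "... ip 00007fd98f771dc9 ... in libsystemd-shared-245.so[7fd98f74c000+16e000]".
   Returns 0 on success, or -1 if the line has no "in <object>[base+size]"
   suffix or the ip lies outside that mapping. */
static int fault_offset(const char *line, unsigned long long *off)
{
    unsigned long long ip, base, size;
    const char *p;

    assert(line && off);
    p = strstr(line, " ip ");
    if (!p || sscanf(p, " ip %llx", &ip) != 1)
        return -1;

    /* The last '[' opens "[base+size]". On lines without that suffix it is
       the "[pid]" bracket, which does not match "%llx+%llx". */
    p = strrchr(line, '[');
    if (!p || sscanf(p, "[%llx+%llx", &base, &size) != 2)
        return -1;

    if (ip < base || ip - base >= size)
        return -1;

    *off = ip - base;
    return 0;
}
```

Applied to the log above, most `libsystemd-shared-245.so` lines map to offset 0x25dc9 and several `systemd-resolved` lines to 0x27686, so the crashes cluster at a handful of instructions. With matching debug symbols, an offset like this can typically be resolved with `addr2line -f -e <object> 0x25dc9`. Lines without the `in <object>[...]` suffix, and the `traps:` lines (which use `ip:`), are not handled by this sketch.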

I also noticed various interesting assertions in the journal; see the next comment…

Mekk (marcin-kasperski) wrote :

The systemd-resolved journal is full of failed assertions. Over the last 3 days I got 301 of them.

This one is very frequent (I got it 294 times):
~~~~~
Nov 28 21:10:02 platon systemd-resolved[1590676]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:520, function dns_query_complete(). Aborting.
~~~~~

The rest:
~~~~~
Nov 28 03:35:02 platon systemd-resolved[1542718]: Assertion '*_head == _item' failed at src/resolve/resolved-dns-query.c:397, function dns_query_free(). Aborting.
Nov 28 04:20:02 platon systemd-resolved[1546490]: Assertion 'p->n_ref > 0' failed at src/resolve/resolved-dns-question.c:33, function dns_question_unref(). Aborting.
Nov 28 14:20:23 platon systemd-resolved[1572471]: Assertion '*_head == _item' failed at src/resolve/resolved-dns-query.c:397, function dns_query_free(). Aborting.
Nov 28 19:34:52 platon systemd-resolved[1587570]: Assertion '*_head == _item' failed at src/resolve/resolved-dns-query.c:372, function dns_query_free(). Aborting.
Nov 29 18:59:52 platon systemd-resolved[1651715]: Assertion 'p->n_ref > 0' failed at src/libsystemd/sd-event/sd-event.c:1912, function sd_event_source_unref(). Aborting.
Nov 30 14:35:23 platon systemd-resolved[1710789]: Assertion '*_head == _item' failed at src/resolve/resolved-dns-query.c:397, function dns_query_free(). Aborting.
Nov 30 18:29:52 platon systemd-resolved[1726691]: Assertion 'p->n_ref > 0' failed at src/libsystemd/sd-event/sd-event.c:1912, function sd_event_source_unref(). Aborting.
~~~~~
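The `'*_head == _item'` assertion appears to come from systemd's intrusive linked-list helpers (the `LIST_REMOVE()` macro): when the item being removed has no predecessor, it must be the current head of the list. A query that is unlinked twice, or unlinked from a list it no longer heads, breaks that invariant. Below is a simplified, hypothetical re-implementation as a function (`Node`, `list_prepend()` and `list_remove_checked()` are made-up names) that reports the violation instead of aborting, so it can be demonstrated:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node with intrusive prev/next links, loosely modelled on
   the way systemd embeds list links in its objects. */
typedef struct Node {
    int id;
    struct Node *prev, *next;
} Node;

static void list_prepend(Node **head, Node *item)
{
    assert(head && item);
    item->prev = NULL;
    item->next = *head;
    if (*head)
        (*head)->prev = item;
    *head = item;
}

/* Unlinks item from the list whose head is *head. An item without a
   predecessor must be the head; where systemd would abort with
   "Assertion '*_head == _item' failed", this returns -1 and leaves the
   list untouched. */
static int list_remove_checked(Node **head, Node *item)
{
    assert(head && item);
    if (!item->prev && *head != item)
        return -1;

    if (item->next)
        item->next->prev = item->prev;
    if (item->prev)
        item->prev->next = item->next;
    else
        *head = item->next;

    item->prev = item->next = NULL;
    return 0;
}
```

Both a second removal of the same item and a removal from the wrong list trip the check, which is consistent with (though not proof of) a query being freed after it was already unlinked.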

Mekk (marcin-kasperski) wrote :

Apart from that, the machine works without many problems. It is occasionally under heavy load (make -j4 and such…), but I use it as my work desktop without noticeable problems. The network also works.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in systemd (Ubuntu):
status: New → Confirmed
Andreas Bühmann (buehmann) wrote :

I seem to experience the same crashes with systemd 246.6-1ubuntu1 in Ubuntu 20.10.

Maybe this is just a coincidence, but my nearest upstream router and DNS server is also an OpenWRT box.

----
root@nb:~# LC_ALL=C dmesg -T --level=info | grep systemd-resolve
[Sun Jan 3 18:16:16 2021] systemd-resolve[111337]: segfault at bd4fc2b1 ip 00007fd30bfcce2d sp 00007fffe20fc600 error 4 in libsystemd-shared-246.so[7fd30be7c000+183000]
[Sun Jan 3 18:24:52 2021] systemd-resolve[113203]: segfault at 2938c5e25a ip 00007f411794fe2d sp 00007ffdf9cb4090 error 4 in libsystemd-shared-246.so[7f41177ff000+183000]
[Sun Jan 3 19:19:31 2021] systemd-resolve[128158]: segfault at ce4afa92 ip 00007fa333aace2d sp 00007ffd7a9fad80 error 4 in libsystemd-shared-246.so[7fa33395c000+183000]
[Sun Jan 3 19:31:09 2021] systemd-resolve[130559]: segfault at 208 ip 000055f0f07a88de sp 00007ffdb6a507b0 error 6 in systemd-resolved[55f0f078e000+43000]
[Sun Jan 3 20:07:40 2021] systemd-resolve[140293]: segfault at 1600000000 ip 0000001600000000 sp 00007ffdf00dc878 error 14 in systemd-resolved[5641b5699000+a000]
[Sun Jan 3 20:30:54 2021] systemd-resolve[145920]: segfault at 100000003 ip 00007f2ef6e2ee2d sp 00007ffee0887750 error 4 in libsystemd-shared-246.so[7f2ef6cde000+183000]
[Sun Jan 3 20:42:32 2021] systemd-resolve[148365]: segfault at cd29ee30 ip 00007fecea8cce2d sp 00007ffcfb1d0f10 error 4 in libsystemd-shared-246.so[7fecea77c000+183000]
[Sun Jan 3 20:52:42 2021] systemd-resolve[151159]: segfault at 208 ip 00005556f9cec8de sp 00007ffd290577b0 error 6 in systemd-resolved[5556f9cd2000+43000]
----
root@nb:~# journalctl -u systemd-resolved.service -g Assertion | tail
Jan 03 14:37:37 nb systemd-resolved[115214]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
Jan 03 14:43:02 nb systemd-resolved[118852]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
Jan 03 14:47:40 nb systemd-resolved[120689]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
Jan 03 14:58:03 nb systemd-resolved[123746]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
Jan 03 15:08:12 nb systemd-resolved[125963]: Assertion '*_head == _item' failed at src/resolve/resolved-dns-query.c:331, function dns_query_free(). Aborting.
Jan 03 19:39:20 nb systemd-resolved[133292]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
Jan 03 19:52:41 nb systemd-resolved[136857]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
Jan 03 20:19:23 nb systemd-resolved[143423]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
Jan 03 21:00:18 nb systemd-resolved[155101]: Assertion '*_head == _item' failed at src/resolve/resolved-dns-query.c:331, ...
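In the `segfault at` lines, the number after `error` is the x86 page-fault error code, which the kernel prints in hexadecimal, so `error 15` is 0x15 and `error 14` is 0x14. Decoding the low bits (0: protection violation rather than a non-present page, 1: write, 2: user mode, 3: reserved bit, 4: instruction fetch) shows that the `error 15` and `error 14` lines are user-mode instruction fetches at the faulting address itself (`ip` equals the `segfault at` address), which is consistent with calls through stale or corrupted code pointers. A sketch assuming x86-64; `PfError`, `decode_pf_error()`, `pf_error_from_line()` and `print_pf_error()` are hypothetical names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Decoded x86 page-fault error code (low five bits only). */
typedef struct {
    bool protection; /* bit 0: 1 = protection violation, 0 = page not present */
    bool write;      /* bit 1: 1 = write access, 0 = read */
    bool user;       /* bit 2: 1 = fault raised in user mode */
    bool reserved;   /* bit 3: reserved bit set in a page-table entry */
    bool ifetch;     /* bit 4: 1 = instruction fetch */
} PfError;

static PfError decode_pf_error(unsigned code)
{
    PfError e = {
        .protection = code & 0x01,
        .write      = code & 0x02,
        .user       = code & 0x04,
        .reserved   = code & 0x08,
        .ifetch     = code & 0x10,
    };
    return e;
}

/* Reads the value after " error " in a kernel segfault line as hex.
   Returns 0 on success, -1 if the line has no such field. */
static int pf_error_from_line(const char *line, unsigned *code)
{
    const char *p;

    assert(line && code);
    p = strstr(line, " error ");
    if (!p || sscanf(p, " error %x", code) != 1)
        return -1;
    return 0;
}

static void print_pf_error(unsigned code)
{
    PfError e = decode_pf_error(code);
    printf("error %x: %s-mode %s, %s%s\n", code,
           e.user ? "user" : "kernel",
           e.ifetch ? "instruction fetch" : e.write ? "write" : "read",
           e.protection ? "protection violation" : "page not present",
           e.reserved ? ", reserved bit set" : "");
}
```

By the same decoding, the common `error 4` and `error 6` lines are user-mode reads and writes of unmapped addresses. The `traps:` lines report a general protection fault (`error:0`) rather than a page fault and are not parsed by this sketch.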


Jim MacKenzie (jim-photojim) wrote :

This bug affects me too, on an amd64 system that I upgraded in place from 18.04 LTS with do-release-upgrade yesterday.

I have no interactions with an OpenWRT box, though.

My config is currently a WiFi connection away from home, with an automatic OpenVPN tunnel to my router at home, using split DNS to resolve a private .prv domain for my home network.

The problem did not exist with 18.04 LTS.

Dan Streetman (ddstreet) wrote :

Can anyone attach a core dump from a crashed systemd-resolved?

Changed in systemd (Ubuntu):
status: Confirmed → Incomplete
Jarryd Keir (jkeir) wrote :

This bug affects me too; I have attached the crash file.

$ sudo LC_ALL=C dmesg -T --level=info | grep systemd-resolve
[Thu Feb 11 15:02:39 2021] systemd-resolve[79914]: segfault at 22e7ea78de ip 00007f5ac34d3eed sp 00007fffb411e9a0 error 4 in libsystemd-shared-246.so[7f5ac3383000+183000]
$ journalctl -u systemd-resolved.service -g Assertion | tail
Feb 09 08:33:52 x1 systemd-resolved[306534]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
Feb 09 14:38:06 x1 systemd-resolved[310807]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
Feb 09 15:13:59 x1 systemd-resolved[358205]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
-- Reboot --
-- Reboot --
Feb 10 15:03:18 x1 systemd-resolved[1516]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
Feb 11 19:02:09 x1 systemd-resolved[174016]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
Feb 11 23:58:38 x1 systemd-resolved[194154]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
Feb 12 10:37:00 x1 systemd-resolved[226148]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.
Feb 12 12:34:16 x1 systemd-resolved[291595]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:479, function dns_query_complete(). Aborting.

Dan Streetman (ddstreet)
Changed in systemd (Ubuntu):
status: Incomplete → In Progress
Dan Streetman (ddstreet) wrote :

If this is reproducible for anyone affected, can you test with the systemd build from this PPA:
https://launchpad.net/~ddstreet/+archive/ubuntu/systemd

Dan Streetman (ddstreet)
description: updated
Changed in systemd (Ubuntu Groovy):
status: New → In Progress
Changed in systemd (Ubuntu Focal):
status: New → In Progress
Changed in systemd (Ubuntu Groovy):
importance: Undecided → Medium
Changed in systemd (Ubuntu Hirsute):
importance: Undecided → Medium
Changed in systemd (Ubuntu Focal):
importance: Undecided → Medium
Changed in systemd (Ubuntu Hirsute):
assignee: nobody → Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Focal):
assignee: nobody → Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Groovy):
assignee: nobody → Dan Streetman (ddstreet)
Dan Streetman (ddstreet)
description: updated
Dan Streetman (ddstreet) wrote :

@marcin-kasperski, @buehmann, @jim-photojim, @jkeir, can any of you please test with the build from the ppa from comment 8?

Jarryd Keir (jkeir) wrote : Re: [Bug 1906331] Re: systemd-resolve crashes fairly often (and reports various assertions)

Hi Dan,

I will test the patch today and report back to you

Thanks,
J

On Wed, 3 Mar 2021, 5:51 am Dan Streetman, <email address hidden>
wrote:

> @marcin-kasperski, @buehmann, @jim-photojim, @jkeir, can any of you
> please test with the build from the ppa from comment 8?

Mekk (marcin-kasperski) wrote :

It seems it did not help much.

I installed the PPA version from comment 8 and rebooted about an hour ago. Since then there have been 3 core dumps.

Two of those were preceded by the failed assertion quoted below:

Mar 03 18:56:12 platon systemd-resolved[9217]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:520, function dns_query_complete(). Aborting.
Mar 03 18:56:13 platon systemd[1]: systemd-resolved.service: Main process exited, code=dumped, status=6/ABRT
-- Subject: Unit process exited
-- Defined-By: systemd
-- Support: http://www.ubuntu.com/support
--
-- An ExecStart= process belonging to unit systemd-resolved.service has exited.
--
-- The process' exit code is 'dumped' and its exit status is 6.
Mar 03 18:56:13 platon systemd[1]: systemd-resolved.service: Failed with result 'core-dump'.

Mekk (marcin-kasperski) wrote :

The above was obtained with:

$ apt-cache policy systemd
systemd:
  Installed: 245.4-4ubuntu3.5~202103031348~ubuntu20.04.1

Mekk (marcin-kasperski) wrote :

Since I wrote the above, there have been two more crashes. I have apport enabled, so the last one created something:

$ sudo ls -al /var/crash/_lib_systemd_systemd-resolved.145.crash
-rw-r----- 1 systemd-resolve whoopsie 962231 Feb 27 07:38 /var/crash/_lib_systemd_systemd-resolved.145.crash

Can I use this file to provide useful information?

Mekk (marcin-kasperski) wrote :

Oops, scratch that: it's an old file, I misread it.

Dan Streetman (ddstreet) wrote :

That's unfortunate; if you're able to gather a new crash dump, it would help.

Mekk (marcin-kasperski) wrote :

systemd-resolved is still crashing more or less every 10 minutes.

An update arrived (I installed it 15 minutes ago), 245.4-4ubuntu3.5~202103051349~ubuntu20.04.1, and it also crashed a few minutes after installation.

I tried various kernel.core_pattern settings, but nothing gets collected. As I understand it, the way it aborts itself doesn't lead to a core file being created.

~~~~~~~~

A few different failing assertions lead to the crash (stats for the version mentioned in #12):

a) Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:520, function dns_query_complete(). Aborting

   Most frequent one, 753 times this week

b) Assertion '*_head == _item' failed at src/resolve/resolved-dns-query.c:372, function dns_query_free(). Aborting.

   49 times this week

c) Assertion 'p->n_ref > 0' failed at src/libsystemd/sd-event/sd-event.c:1912, function sd_event_source_unref(). Aborting.

   9 times this week

d) Assertion 'q->auxiliary_for->n_auxiliary_queries > 0' failed at src/resolve/resolved-dns-query.c:370, function dns_query_free(). Aborting

   6 times this week

e) Assertion 'p->n_ref > 0' failed at src/resolve/resolved-dns-question.c:33, function dns_question_unref(). Aborting.

   2 times this week
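Tallies like the ones above can be produced by grouping journal lines by the asserted expression and its location. A sketch of the parsing step, assuming the `Assertion '<expr>' failed at <file>:<line>, function <func>(). Aborting.` format seen throughout this report; `AssertInfo` and `parse_assertion()` are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    char expr[128];  /* asserted expression, e.g. "p->n_ref > 0" */
    char file[128];  /* source file, e.g. "src/resolve/resolved-dns-query.c" */
    int  line;       /* source line number */
    char func[64];   /* function name without "()" */
} AssertInfo;

/* Extracts the fields of a systemd "Assertion '...' failed at ..." message
   from one journal line. Returns 0 on success, -1 if the line does not
   contain a complete message of that form. */
static int parse_assertion(const char *line, AssertInfo *out)
{
    const char *p;

    assert(line && out);
    p = strstr(line, "Assertion '");
    if (!p)
        return -1;
    if (sscanf(p, "Assertion '%127[^']' failed at %127[^:]:%d, function %63[^(]",
               out->expr, out->file, &out->line, out->func) != 4)
        return -1;
    return 0;
}
```

The source line numbers differ between builds (for example `resolved-dns-query.c:520` in 245 versus `:479` in 246 for the same `dns_query_complete()` assertion), so the expression plus function name is the more stable grouping key.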

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 247.3-1ubuntu4

---------------
systemd (247.3-1ubuntu4) hirsute; urgency=medium

  [ Dimitri John Ledkov ]
  * d/p/debian/UBUNTU-resolved-Mitigate-DVE-2018-0001-by-retrying-NXDOMAIN-with.patch:
    Patch updated to reduce log level to debug
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=299002546ec2d62e7f0dd7d614ba958fc9df83c2

  [ Dan Streetman ]
  * d/p/lp1906331-sd-event-ref-event-loop-while-in-sd_event_prepare-ot.patch:
    Take event reference while processing (LP: #1906331)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=1bc38abcd3b62d317fcb62b72e26d9cb2e35ccf9
  * d/p/lp1917458-udev-rules-add-rule-to-create-dev-ptp_hyperv.patch:
    Create symlink for hyperv-provided ptp device (LP: #1917458)
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=8f1ee790ad66395457ca64cb5f8a01fdd8aabe47

  [ Balint Reczey ]
  * Pick proposed patch for not returning early in udevadm (LP: #1914062)
    File: debian/patches/lp1914062-udevadm-don-t-return-early.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=d8c80751a97b0c6c4df972f6f8325293aa1607c4
  * debian/tests/control: Mark systemd-fsckd flaky again.
    As promised in LP: 1915126, until further investigation.
    File: debian/tests/control
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=68fbaab272af81aab29497f7c6a3e4e6e9aa091b

 -- Balint Reczey <email address hidden> Thu, 04 Mar 2021 12:19:05 +0100

Changed in systemd (Ubuntu Hirsute):
status: In Progress → Fix Released
Mekk (marcin-kasperski) wrote :

Do you plan releasing this version on PPA (for 20.04, which I use)? I'd be glad to test…

Dan Streetman (ddstreet) wrote :

> Do you plan releasing this version on PPA

That was for the Hirsute release. Are you able to test that (development) release to confirm the problem is fixed? It has the same patches as the Focal build, so the problem may not be fixed there either.

Alternatively, if you have a test environment (NOT a system that you care about, as it may break), can you test the latest upstream systemd code from this PPA:
https://launchpad.net/~ubuntu-support-team/+archive/ubuntu/systemd

Mekk (marcin-kasperski) wrote :

I use my computer for work and need a reasonably stable environment, so no, sorry. I am OK with installing a newer systemd, testing it, and possibly downgrading in case of (new) problems, but I can't upgrade the whole distro just now.

Would using PPA from comment 19 be as limited as PPA from comment 8 (systemd only, without messing with other packages)?

Dan Streetman (ddstreet) wrote :

> Would using PPA from comment 19 be as limited as PPA from comment 8
> (systemd only, without messing with other packages)?

It's only systemd and debhelper version 13, but I understand if you don't want to install it on your system, as the build from that PPA is completely untested (it's just a daily build of upstream systemd code). I would not recommend installing it on a system that you aren't OK with breaking.

Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Groovy):
assignee: Dan Streetman (ddstreet) → nobody
Changed in systemd (Ubuntu Focal):
assignee: Dan Streetman (ddstreet) → nobody
Changed in systemd (Ubuntu Groovy):
status: In Progress → Incomplete
Changed in systemd (Ubuntu Focal):
status: In Progress → Incomplete
Dan Streetman (ddstreet)
Changed in systemd (Ubuntu Groovy):
status: Incomplete → Won't Fix
Mekk (marcin-kasperski) wrote :

Hmm, hmm. Maybe I found something?

My systemd-resolved, which has steadily crashed every 3-6 minutes for years, has now survived a full 12 minutes since the last restart. It also started resolving unknown names quickly instead of lagging on them for 10 s.

The change?

   sudo apt remove resolvconf

(In particular, this removal changed the /etc/resolv.conf symlink target from /run/resolvconf/resolv.conf to /run/systemd/resolve/stub-resolv.conf.)
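Which of the two setups a machine is in can be read from the `/etc/resolv.conf` symlink target (from the shell, `readlink /etc/resolv.conf`). A sketch using POSIX `readlink()`; `resolv_conf_mode()` and `ends_with()` are hypothetical helpers, and matching on the path suffix is an assumption made so that a relative target such as `../run/systemd/resolve/stub-resolv.conf` is recognized as well as the absolute paths quoted above:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <unistd.h>

enum resolv_mode {
    RESOLV_NOT_LINK,      /* missing, unreadable, or a regular file */
    RESOLV_SYSTEMD_STUB,  /* points at systemd-resolved's stub file */
    RESOLV_RESOLVCONF,    /* points at the resolvconf-managed file */
    RESOLV_OTHER          /* some other symlink target */
};

static int ends_with(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    return n >= m && strcmp(s + n - m, suffix) == 0;
}

/* Classifies the symlink at path (normally /etc/resolv.conf) by its target. */
static enum resolv_mode resolv_conf_mode(const char *path)
{
    char target[4096];
    ssize_t n;

    assert(path);
    n = readlink(path, target, sizeof target - 1);
    if (n < 0)
        return RESOLV_NOT_LINK;
    target[n] = '\0';   /* readlink() does not NUL-terminate */

    if (ends_with(target, "run/systemd/resolve/stub-resolv.conf"))
        return RESOLV_SYSTEMD_STUB;
    if (ends_with(target, "run/resolvconf/resolv.conf"))
        return RESOLV_RESOLVCONF;
    return RESOLV_OTHER;
}
```

On the machine described above, `resolv_conf_mode("/etc/resolv.conf")` should have returned `RESOLV_RESOLVCONF` before `apt remove resolvconf` and `RESOLV_SYSTEMD_STUB` afterwards.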

Mekk (marcin-kasperski) wrote :

…so if there is some clash between the two packages, perhaps they should conflict?

PS For the record: this is a very old system that I have used since 2009 (starting from Ubuntu Karmic and upgrading from LTS to LTS). resolvconf has been installed since 2012, and I simply didn't know that it should be removed (should it?) after the upgrade that brought systemd-resolved.

PS2 systemd-resolved is still running fine, 20 minutes so far.

Mekk (marcin-kasperski) wrote :

More than 12 hours now (since `apt remove resolvconf`).

systemd-resolved is still running without a crash.

Mekk (marcin-kasperski) wrote (last edit ):

To summarize:

1. Removing resolvconf looks like a practical solution for people who face the problem.

2. I do not know whether crashes in such a setup are natural and expected, or
   whether it simply triggers some buggy behaviour that is unlikely in a "normal"
   situation. Depending on that, either there should be some installation-level
   protection (like a Conflicts relationship between the packages)… or this is
   the way to reproduce the bug.

Dan Streetman (ddstreet)
Changed in systemd (Ubuntu):
assignee: Dan Streetman (ddstreet) → nobody
Changed in systemd (Ubuntu Hirsute):
assignee: Dan Streetman (ddstreet) → nobody
Yaroslav (glodov) wrote :

What if I need it for a name server?
I can't remove resolvconf. :(
