systemd-resolved crashes due to use-after-free bug

Bug #2012943 reported by Naveen chand
This bug affects 2 people
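| Affects          | Status       | Importance | Assigned to | Milestone |
|------------------|--------------|------------|-------------|-----------|
| systemd (Ubuntu) | Fix Released | Undecided  | Unassigned  |           |
| Focal            | Fix Released | Undecided  | Unassigned  |           |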

Bug Description

[ Impact ]

Repeated systemd-resolved crashes delay or hang device startup, which leaves devices in the system unresponsive. Specifically, the crash looks like:

Dec 16 12:51:21 TREND-24-AF-7A systemd[1]: Started Time & Date Service.
Dec 16 12:51:24 TREND-24-AF-7A systemd[1]: systemd-resolved.service: Main process exited, code=killed, status=11/SEGV
[...]
Dec 16 12:53:47 TREND-24-AF-7A systemd-resolved[2591]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:520, function dns_query_complete(). Aborting.
Dec 16 12:53:47 TREND-24-AF-7A systemd[1]: systemd-resolved.service: Main process exited, code=killed, status=6/ABRT

[ Test Plan ]

The exact steps to reproduce this issue are still not known.
We see this crash only when static IP addressing is enabled, where systemd-resolved is enabled for the LLMNR service.
We have not been able to reproduce the crash in DHCP mode.

Steps to reproduce:
1) Powercycle the device.
2) Soft-reboot.
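
When repeating the steps above, it may help to check the journal for the crash signature automatically rather than by eye. The following is a minimal sketch, assuming journal lines in the same format as the excerpt in the Impact section; the function name `parse_resolved_crash` is hypothetical and is not part of systemd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns true if `line` reports that the
 * systemd-resolved main process was killed by a signal, and stores the
 * numeric status and the signal name (e.g. 11 and "SEGV"). */
static bool parse_resolved_crash(const char *line, int *status,
                                 char *signame, size_t signame_len) {
    static const char marker[] =
        "systemd-resolved.service: Main process exited, code=killed, status=";
    char name[16];
    int st;

    assert(line && status && signame && signame_len > 0);

    const char *p = strstr(line, marker);
    if (!p)
        return false;
    p += sizeof marker - 1; /* skip past "status=" */

    /* Expect "<number>/<SIGNAME>", for example "11/SEGV". */
    if (sscanf(p, "%d/%15[A-Z]", &st, name) != 2)
        return false;

    *status = st;
    snprintf(signame, signame_len, "%s", name);
    return true;
}
```

Feeding each journal line for the boot through this function and counting the matches gives a simple pass/fail signal: a fixed package should produce no matches for either `11/SEGV` or `6/ABRT`.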

Brian Murray also pointed out that this error in the Ubuntu error tracker is likely the same bug: https://errors.ubuntu.com/problem/3cb08ae5efaa4d8c6ce992f7cebd2751ae3f168f. We would therefore expect this error to stop appearing in the tracker once the patch is released.

[ Where problems could occur ]

The patch[1] simply disables the timer event source for a DNS query when the struct representing that query is freed. I cannot see any realistic regression potential: if the timer event fired on the DNS query after it had been freed, that would be exactly this bug. In other words, no working code should rely on the timer event source still being active after the query is freed.

[1] https://github.com/systemd/systemd/commit/73bfd7be042cc63e7649242b377ad494bf74ea4b
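
To illustrate the failure mode described above, here is a self-contained toy model. It is a sketch under stated assumptions, not the upstream patch: every type and function (`ToySource`, `ToyQuery`, `toy_source_unref`, `toy_source_disable_unref`, `run_scenario`) is hypothetical, loosely modeled on the difference between `sd_event_source_unref()` and `sd_event_source_disable_unref()` in sd-event. It also assumes that some other holder still has a reference to the timer source when the query is freed. To stay free of undefined behaviour, the toy marks the query as freed instead of releasing its memory before the timer fires.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy reference-counted timer source (hypothetical, not the sd-event API). */
typedef struct ToySource {
    unsigned n_ref;
    bool enabled;
    void (*callback)(void *userdata);
    void *userdata;
} ToySource;

/* Toy DNS query. `freed` marks logical destruction so the demo can detect
 * a stale callback without touching memory that was really released. */
typedef struct ToyQuery {
    bool freed;
    ToySource *timeout_source;
} ToyQuery;

static int stale_callbacks;

/* Timer callback: real code would dereference the query here, which is
 * a use-after-free if the query is already gone. The toy only counts. */
static void on_timeout(void *userdata) {
    ToyQuery *q = userdata;
    if (q->freed)
        stale_callbacks++;
}

static ToySource *toy_source_new(void (*callback)(void *), void *userdata) {
    ToySource *s = calloc(1, sizeof *s);
    assert(s);
    s->n_ref = 1;
    s->enabled = true;
    s->callback = callback;
    s->userdata = userdata;
    return s;
}

/* Drops one reference; the source stays enabled if others remain. */
static ToySource *toy_source_unref(ToySource *s) {
    if (s && --s->n_ref == 0)
        free(s);
    return NULL;
}

/* Disables the source first, so it can never fire again, then unrefs. */
static ToySource *toy_source_disable_unref(ToySource *s) {
    if (s)
        s->enabled = false;
    return toy_source_unref(s);
}

static void toy_query_free(ToyQuery *q, bool disable_on_free) {
    q->timeout_source = disable_on_free
        ? toy_source_disable_unref(q->timeout_source)
        : toy_source_unref(q->timeout_source);
    q->freed = true; /* stand-in for free(q) */
}

/* Returns how many times the timer fired on an already-freed query. */
static int run_scenario(bool disable_on_free) {
    ToyQuery *q = calloc(1, sizeof *q);
    assert(q);
    q->timeout_source = toy_source_new(on_timeout, q);

    /* Assumption: another holder keeps a second reference to the source. */
    ToySource *other_ref = q->timeout_source;
    other_ref->n_ref++;

    toy_query_free(q, disable_on_free);

    /* Simulated dispatch: only enabled sources fire. */
    stale_callbacks = 0;
    if (other_ref->enabled)
        other_ref->callback(other_ref->userdata);

    toy_source_unref(other_ref);
    free(q);
    return stale_callbacks;
}
```

In this model, `run_scenario(false)` lets the timer fire on the freed query, while `run_scenario(true)` does not, which matches the regression argument: disabling the source on free only removes callbacks that would have been bugs anyway.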

Nick Rosbrook (enr0n) wrote :

From what I can tell, this patch is present in Jammy and newer, but not Focal.

Changed in systemd (Ubuntu):
status: New → Fix Released
Changed in systemd (Ubuntu Focal):
status: New → Triaged
Nick Rosbrook (enr0n) wrote :

I have prepared the fix for this in git because I have more context from other discussions, but can you please add a bit more detail here? For example, could you at least provide the stack trace you observed that suggests the above patch is correct?

Naveen chand (h413048) wrote :

The system hangs due to a systemd-resolved crash:

```
Dec 16 12:51:21 TREND-24-AF-7A systemd[1]: Started Time & Date Service.
Dec 16 12:51:24 TREND-24-AF-7A systemd[1]: systemd-resolved.service: Main process exited, code=killed, status=11/SEGV
```

The systemd-resolved service is also aborted:

```
Dec 16 12:53:47 TREND-24-AF-7A systemd-resolved[2591]: Assertion 'DNS_TRANSACTION_IS_LIVE(q->state)' failed at src/resolve/resolved-dns-query.c:520, function dns_query_complete(). Aborting.
Dec 16 12:53:47 TREND-24-AF-7A systemd[1]: systemd-resolved.service: Main process exited, code=killed, status=6/ABRT
```

Naveen chand (h413048) wrote :

Hi Nick,
The setup has multiple devices with the same configuration connected to a network.
If you need more details about this, please go through the bug below.
https://bugs.launchpad.net/shiner/+bug/2001620

Regards,
Naveen

Nick Rosbrook (enr0n) wrote :

Thanks, I am aware of the private bug you linked. My intention was to get as much information as appropriate onto this public bug so that the case for SRU'ing is clear.

Nick Rosbrook (enr0n)
description: updated
Nick Rosbrook (enr0n) wrote :

In any case, I think we are good now. Thanks for the extra information about the crash.

summary: - Systemd-resolved is crashing
+ systemd-resolved crashes due to use-after-free bug
Nick Rosbrook (enr0n)
description: updated
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Naveen, or anyone else affected,

Accepted systemd into focal-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/systemd/245.4-4ubuntu3.22 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-focal to verification-done-focal. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-focal. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in systemd (Ubuntu Focal):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-focal
Ubuntu SRU Bot (ubuntu-sru-bot) wrote : Autopkgtest regression report (systemd/245.4-4ubuntu3.22)

All autopkgtests for the newly accepted systemd (245.4-4ubuntu3.22) for focal have finished running.
The following regressions have been reported in tests triggered by the package:

fwupd/1.7.9-1~20.04.1 (armhf)
gvfs/1.44.1-1ubuntu1.2 (amd64, arm64, ppc64el)
linux-bluefield/5.4.0-1063.69 (arm64)
linux-hwe-5.15/5.15.0-72.79~20.04.1 (ppc64el)
linux-intel-iotg-5.15/5.15.0-1030.35~20.04.1 (amd64)
linux-lowlatency-hwe-5.15/5.15.0-72.79~20.04.1 (arm64)

Please visit the excuses page listed below and investigate the failures, proceeding afterwards as per the StableReleaseUpdates policy regarding autopkgtest regressions [1].

https://people.canonical.com/~ubuntu-archive/proposed-migration/focal/update_excuses.html#systemd

[1] https://wiki.ubuntu.com/StableReleaseUpdates#Autopkgtest_Regressions

Thank you!

Bugra Aydogar (bugraaydogar) wrote :

Hi Brian,

The Commercial Engineering (CE) and field teams provided a custom core20 snap that includes the suggested fix to Naveen (the customer), and it has been under test for around 2 months. According to the customer, the reported issue is no longer seen and the fix is valid.

Thank you,
Bugra

tags: added: verification-done-focal
removed: verification-needed verification-needed-focal
Kenyon Ralph (kralph) wrote :

With 245.4-4ubuntu3.21, I was seeing resolved crash every few minutes due to this bug. With 245.4-4ubuntu3.22, I have not seen this crash, and resolved seems to be working correctly.

Robie Basak (racb) wrote :

This is blocked from release - please see comment 8.

Kenyon Ralph (kralph) wrote :

Those test failures look like they don't have anything to do with the systemd update.

Nick Rosbrook (enr0n) wrote :

I agree that the remaining failures look unrelated to this systemd update, and they should be hinted:

fwupd/1.7.9-1~20.04.1 armhf: This is a known issue (bug 1994143) and is addressed by a fwupd upload in focal-proposed.
linux-azure-5.15/5.15.0-1039.46~20.04.1 amd64, linux-lowlatency-hwe-5.15/5.15.0-73.80~20.04.1 arm64: These are failing with badpkg[1][2], which appears to be caused by toolchain dependency issues.

[1] https://autopkgtest.ubuntu.com/results/autopkgtest-focal/focal/arm64/l/linux-lowlatency-hwe-5.15/20230531_172319_82f87@/log.gz
[2] https://autopkgtest.ubuntu.com/results/autopkgtest-focal/focal/amd64/l/linux-azure-5.15/20230531_140824_1120e@/log.gz

Robie Basak (racb) wrote :

Thank you for the analysis! This all seems reasonable, so we can land this.

But first I need to get the report clean. I ran fwupd/armhf on Focal against migration-reference/0 which failed as expected. That should cause fwupd to no longer flag as a regression when the report is regenerated (I hope!).

I'll do the same for linux-azure-5.15/amd64 and linux-lowlatency-hwe-5.15/arm64 now.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 245.4-4ubuntu3.22

---------------
systemd (245.4-4ubuntu3.22) focal; urgency=medium

  * resolve: fix potential memleak and use-after-free (LP: #2012943)
    File: debian/patches/lp2012943-resolve-fix-potential-memleak-and-use-after-free.patch
    https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=ed2729587663dbab3583d06492b715df2896874e

 -- Nick Rosbrook <email address hidden> Mon, 27 Mar 2023 13:54:06 -0400

Changed in systemd (Ubuntu Focal):
status: Fix Committed → Fix Released
Robie Basak (racb) wrote : Update Released

The verification of the Stable Release Update for systemd has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
