cpu might go into a hot loop on systems with uncommon udev data

Bug #1996176 reported by Christian Ehrhardt 
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
libvirt (Ubuntu)
Fix Released
Undecided
Christian Ehrhardt 
Jammy
Fix Released
Undecided
Christian Ehrhardt 
Kinetic
Fix Released
Undecided
Christian Ehrhardt 

Bug Description

[ Impact ]

 * udev rules can carry a lot of data. In a particular case
   that we have found it was a somewhat overeager bios that
   delivered more than 8k content in a single event. This
   triggered an issue in systemds libudev which we fixed
   in systemd upstream and will find its way into stable
   releases as well.

 * But at the same time it showed that libvirt is crashing
   rather hard on an event like this and that led to a
   libvirt thread to essentially
    a) become a busy loop burnign cpu
    b) missing events as the handler died

 * Upstream accepted my fix to harden the event handler for
   that situation which allows it to survive working with
   the older broken systemd as well as any other situation
   that might cause this.
   This SRU will backport that to Ubuntu releases.

[ Test Plan ]

 * There might be artificial ways to test this, but we know
   that the internal system diglett has one of the bad bios
   that trigger this error.
 * On such a system one just needs to run these steps

1. libvirt runs fine
2. something (could be something completely else) triggers udev events
3. one of those udev events has a massive content size
4. systemd libudev delivers EINVAL due to failing on the size
5. libvirt udev listener dies due to that
6. later on many thing in libvirt do a "push event, wait for handler"
   which appears to be a hot loop burning a full cpu all the time.

   With the fix the busy loop will not be triggered.

[ Where problems could occur ]

 * The udev event handler of libvirt will now survive EINVAL
   return codes. I consider it unlikely but there might be a
   chance that other situations cause those which now would
   no more leading to an exit of the handler.
   Nevertheless the consequence of silently having that handler
   dying is bad, and IMHO with the fix - even in this unlikely
   environment - it is better to have the handler continue working.

[ Other Info ]

 * n/a

----

Situation:
1. libvirt runs fine
2. something (could be something completely else) triggers udev events
3. one of those udev events has a massive content size
4. systemd libudev delivers EINVAL due to failing on the size
5. libvirt udev listener dies due to that
6. later on many thing in libvirt do a "push event, wait for handler"
   which appears to be a hot loop burning a full cpu all the time.

As outlined in the upstream bug [2] there is an issue between
a) systemd before a fix for [1] (=anything before ~now) fixing #4 of the above
b) libvirt before a fix for [2] fixing #5 of the above
c) bios data (or anything else) creating rather huge udev content

By now systemd was fixed upstream avoiding the issue in the long run.

But we should consider to harden at least the most recent LTS by backporting the fix [3] to avoid this issue. If we get a real report by anyone on older releases we can even consider that.

[1]: https://github.com/systemd/systemd/issues/24987
[2]: https://gitlab.com/libvirt/libvirt/-/issues/245
[3]: https://listman.redhat.com/archives/libvir-list/2022-November/235723.html

Related branches

tags: added: libvirt-23.04 server-todo
tags: removed: server-todo
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

This has completed upstream now
  https://listman.redhat.com/archives/libvir-list/2022-November/235723.html
and is applied as:
  https://gitlab.com/libvirt/libvirt/-/commit/33a38492b75acb7dbec9b64c41a5dba4acde4240

There is another bug 1993304 which I'd like to combine in one upload.
So there is a bit more delay to get this fixed.

Changed in libvirt (Ubuntu):
status: New → Triaged
Changed in libvirt (Ubuntu Jammy):
status: New → Triaged
Changed in libvirt (Ubuntu Kinetic):
status: New → Triaged
tags: added: server-todo
Changed in libvirt (Ubuntu):
assignee: nobody → Christian Ehrhardt  (paelzer)
Changed in libvirt (Ubuntu Jammy):
assignee: nobody → Christian Ehrhardt  (paelzer)
Changed in libvirt (Ubuntu Kinetic):
assignee: nobody → Christian Ehrhardt  (paelzer)
description: updated
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Already uploaded to Lunar-proposed.
Now tests and review is complete for SRUs as well so I uploaded it to J+K -unapproved too.

Changed in libvirt (Ubuntu):
status: Triaged → In Progress
Changed in libvirt (Ubuntu Jammy):
status: Triaged → In Progress
Changed in libvirt (Ubuntu Kinetic):
status: Triaged → In Progress
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 8.6.0-0ubuntu5

---------------
libvirt (8.6.0-0ubuntu5) lunar; urgency=medium

  * d/p/u/tests-Fix-libxlxml2domconfigtest-with-latest-xen.patch: fix FTBFS
    with latest libxl

libvirt (8.6.0-0ubuntu4) lunar; urgency=medium

  [ Lena Voytek ]
  * d/p/u/fix-swtpm-pid-duplication.patch: Clean up swtpm pids after a vm
    shuts down (LP: #1997269)

  [Christian Ehrhardt ]
  * d/p/u/lp-1993304-apparmor-allow-getattr-on-usb-devices.patch: prevent
    apparmor denials on USB forwarding (LP: #1993304)
  * d/p/u/lp-1996176-nodedev-ignore-EINVAL-from-libudev-in-udevEventHandl.patch:
    tolerate the impact of too large udev data avoiding a busy loop
    (LP: #1996176)

 -- Christian Ehrhardt <email address hidden> Tue, 22 Nov 2022 16:13:36 +0100

Changed in libvirt (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Please test proposed package

Hello Christian, or anyone else affected,

Accepted libvirt into kinetic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/8.6.0-0ubuntu3.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-kinetic to verification-done-kinetic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-kinetic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in libvirt (Ubuntu Kinetic):
status: In Progress → Fix Committed
tags: added: verification-needed verification-needed-kinetic
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Christian, or anyone else affected,

Accepted libvirt into jammy-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/libvirt/8.0.0-1ubuntu7.4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, what testing has been performed on the package and change the tag from verification-needed-jammy to verification-done-jammy. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-jammy. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in libvirt (Ubuntu Jammy):
status: In Progress → Fix Committed
tags: added: verification-needed-jammy
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Check if a system is affected by bad dmi data and a systemd passing on the error.
$ sudo SYSTEMD_LOG_LEVEL=debug /usr/bin/udevadm monitor &
$ sudo udevadm trigger --action=add --subsystem-match dmi
KERNEL[669861.508604] add /devices/virtual/dmi/id (dmi)
sd-device-monitor: Invalid message length.

While the system I found the issue was upgraded and no more has the issue I found that the sister system is affected by the same. Some re-setup and I should be able to verify there.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Doing nothing it got stuck as expected without the fix (jammy):
   2035 root 20 0 1576496 39888 31488 R 100.0 0.1 0:15.20 libvirtd

$ sudo systemctl restart libvirtd
gets it back down

$ sudo udevadm trigger --action=add --subsystem-match dmi
# plus waiting a minute or so gets it back up

There are so many thing sin proposed right now - wow.
Not copy&pasting the full output here as I'd usually do :-/
Lots of work party people.
Nevertheless a full upgrade to proposed worked just fine.

After that I was no more able to tip it into the hot loop no matter what I triggered in udev.

=> Verified for Jammy

Will look at kinetic tomorrow, need to check another bug on jammy before upgrading ...

tags: added: verification-done-jammy
removed: verification-needed-jammy
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Upgrading to Kinetic for the next test ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Same problem on Kinetic (as expected) before the update:
   1073 root 20 0 1659848 47424 33744 R 93.8 0.1 0:20.74 libvirtd

Upgrading was just fine (again there was so much in flight that I won't copy all of it).
But an excerpt:
...
Setting up python3-software-properties (0.99.27.1) ...
Setting up linux-headers-5.19.0-27-generic (5.19.0-27.28) ...
Setting up libvirt-daemon-driver-qemu (8.6.0-0ubuntu3.1) ...
Setting up libvirt-daemon (8.6.0-0ubuntu3.1) ...
Setting up libvirt-daemon-driver-storage-rbd (8.6.0-0ubuntu3.1) ...
Setting up qemu-system-gui (1:7.0+dfsg-7ubuntu2.1) ...
Setting up libqmi-glib5:amd64 (1.32.0-1ubuntu0.22.10.1) ...
Setting up linux-headers-generic (5.19.0.27.24) ...
Setting up software-properties-common (0.99.27.1) ...
Setting up libvirt-daemon-system (8.6.0-0ubuntu3.1) ...
Installing new version of config file /etc/apparmor.d/abstractions/libvirt-qemu ...
virtlockd.service is a disabled or a static unit, not starting it.
Setting up libvirt-daemon dnsmasq configuration.
...

BTW even correctly spotting this due to a lib update :-)
...
VM guests are running outdated hypervisor (qemu) binaries on this host:
 'j' with pid 1711

On trigger and libvirt interaction it says down in regard to cpu usage:
   4589 root 20 0 1659824 44516 33428 S 0.3 0.1 0:00.96 libvirtd

tags: added: verification-done verification-done-kinetic
removed: verification-needed verification-needed-kinetic
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 8.0.0-1ubuntu7.4

---------------
libvirt (8.0.0-1ubuntu7.4) jammy; urgency=medium

  * d/p/u/lp-1993304-apparmor-allow-getattr-on-usb-devices.patch: prevent
    apparmor denials on USB forwarding (LP: #1993304)
  * d/p/u/lp-1996176-nodedev-ignore-EINVAL-from-libudev-in-udevEventHandl.patch:
    tolerate the impact of too large udev data avoiding a busy loop
    (LP: #1996176)

 -- Christian Ehrhardt <email address hidden> Tue, 22 Nov 2022 15:59:28 +0100

Changed in libvirt (Ubuntu Jammy):
status: Fix Committed → Fix Released
Revision history for this message
Brian Murray (brian-murray) wrote : Update Released

The verification of the Stable Release Update for libvirt has completed successfully and the package is now being released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 8.6.0-0ubuntu3.1

---------------
libvirt (8.6.0-0ubuntu3.1) kinetic; urgency=medium

  [ Lena Voytek ]
  * d/p/u/fix-swtpm-pid-duplication.patch: Clean up swtpm pids after a vm
    shuts down (LP: #1997269)

  [Christian Ehrhardt ]
  * d/p/u/lp-1993304-apparmor-allow-getattr-on-usb-devices.patch: prevent
    apparmor denials on USB forwarding (LP: #1993304)
  * d/p/u/lp-1996176-nodedev-ignore-EINVAL-from-libudev-in-udevEventHandl.patch:
    tolerate the impact of too large udev data avoiding a busy loop
    (LP: #1996176)

 -- Christian Ehrhardt <email address hidden> Tue, 22 Nov 2022 11:21:30 +0100

Changed in libvirt (Ubuntu Kinetic):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.