Behaviour change in systemd 243 breaks libvirt autopkgtest

Bug #1844879 reported by Balint Reczey
Affects            Status         Importance  Assigned to          Milestone
systemd            New            Unknown
libvirt (Ubuntu)   Invalid        Medium      Christian Ehrhardt
systemd (Ubuntu)   Fix Released   High        Unassigned

Bug Description

With systemd 243 libvirt autopkgtest starts failing:

http://autopkgtest.ubuntu.com/packages/libv/libvirt/eoan/amd64

https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-eoan/eoan/amd64/libv/libvirt/20190921_104518_69667@/log.gz
...
+ virt-xml-validate debian/tests/smoke-lxc.xml
debian/tests/smoke-lxc.xml validates
+ virsh define debian/tests/smoke-lxc.xml
Domain sl defined from debian/tests/smoke-lxc.xml

+ rm -f /var/log/libvirt/lxc/sl.log
+ virsh start sl
Domain sl started

+ grep -qs starting up /var/log/libvirt/lxc/sl.log
+ check_domain
+ grep/bin/ls

Domain sl has been undefined

 -qs sl[[:space:]]\+running
+ virsh list
+ virsh lxc-enter-namespace --noseclabel sl /bin/ls /bin/ls
+ systemctl restart libvirtd
+ check_domain
+ grep -qs sl[[:space:]]\+running
+ virsh list
+ cleanup
+ [ -z ]
+ virsh destroy sl
error: Failed to destroy domain sl
error: Requested operation is not valid: domain is not running
+ true
+ virsh undefine sl
+ CLEANED_UP=1
autopkgtest [10:41:05]: test smoke-lxc: -----------------------]
autopkgtest [10:41:07]: test smoke-lxc: - - - - - - - - - - results - - - - - - - - - -
smoke-lxc FAIL non-zero exit status 1
...

The behaviour change is still present in systemd master (v243-112-g984b96aa7a).

I can revert the change in systemd, but the change also fixes an issue reported upstream.
I will take another look later, but wanted to document my findings.

Balint Reczey (rbalint) wrote :

The regression was introduced in this commit:

commit 0219b3524f414e23589e63c6de6a759811ef8474
Author: Donald Buczek <email address hidden>
Date: Thu Apr 25 09:39:41 2019 +0200

    cgroup: Continue unit reset if cgroup is busy

    When part of the cgroup hierarchy cannot be deleted (e.g. because there
    are still processes in it), do not exit unit_prune_cgroup early, but
    continue so that u->cgroup_realized is reset.

    Log the known case of non-empty cgroups at debug level and other errors
    at warning level.

    Fixes https://github.com/systemd/systemd/issues/12386

Christian Ehrhardt  (paelzer) wrote :

Thanks rbalint for the report.
Once the post-sprint cleanup work is done I'll try to take a look myself on the libvirt side as well.

This area already carries some delta because Debian's autopkgtests never worked well in LP infra.
Maybe this will be another one of those ...

Christian Ehrhardt  (paelzer) wrote :

For now I kicked off tests (local VMs) with and without --apt-pocket=proposed=src:systemd, to be revisited later.

Christian Ehrhardt  (paelzer) wrote :

To summarize the noisy test log:
with this patch applied, libvirt loses its LXC guest when the service is restarted.

With the XML from the test this is what happens:
$ virsh define debian/tests/smoke-lxc.xml
$ virsh start sl
$ virsh list
 Id Name State
------------------------
 2280 sl running
$ virsh lxc-enter-namespace --noseclabel sl /bin/ls /bin/ls
/bin/ls
# At this point we know and have confirmed the guest works
# Now restarting libvirtd breaks it.
$ systemctl restart libvirtd
# Now the guest container is gone
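
The check_domain step that fails after the restart can also be run by hand. A minimal sketch, assuming the same grep pattern that appears in the autopkgtest log; the helper name domain_is_running is made up for illustration and is not part of the test suite:

```shell
# Succeed only if the given domain is listed as "running" in
# `virsh list` output supplied on stdin.
# The pattern is taken from the autopkgtest log; \+ is a GNU grep extension.
domain_is_running() {
    # $1: domain name, e.g. sl
    grep -qs "$1[[:space:]]\+running"
}
```

On an affected system, `virsh list | domain_is_running sl` would be expected to succeed before `systemctl restart libvirtd` and to fail afterwards.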

Pre restart it looks like:
4 0 2426 1 20 0 138712 16132 poll_s Sl ? 0:00 /usr/lib/libvirt/libvirt_lxc --name sl --console 20 --security=apparmor --handshake 23
4 0 2428 2426 20 0 4084 3216 poll_s Ss+ pts/0 0:00 \_ /bin/bash

The guest log in /var/log/libvirt/lxc/sl.log gets this entry in !BOTH! cases:
2019-09-23 10:51:46.004+0000: 2426: error : virNetSocketReadWire:1804 : End of file while reading data: Input/output error
So this is a red herring: the system with systemd 241-7ubuntu1 logs the same error.

Another red herring is the libvirtd service reporting the LXC processes as left-over on restart. This also happens with both the old and the new systemd:
Sep 23 12:51:46 autopkgtest systemd[1]: libvirtd.service: Found left-over process 2428 (bash) in control group while starting unit. Ignoring.
Sep 23 12:51:46 autopkgtest systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Sep 23 12:51:46 autopkgtest systemd[1]: libvirtd.service: Found left-over process 2426 (libvirt_lxc) in control group while starting unit. Ignoring.
Sep 23 12:51:46 autopkgtest systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.

The error that differs between 241 and 243 comes from the libvirtd daemon:
Sep 23 12:51:46 autopkgtest libvirtd[2448]: internal error: No valid cgroup for machine sl
Sep 23 12:51:46 autopkgtest libvirtd[2448]: End of file while reading data: Input/output error

This is not just a visibility problem; the process that was the guest really is gone at this point.
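
One way to see which cgroup the guest sits in before the restart is to read /proc/<pid>/cgroup for the libvirt_lxc process (PID 2426 in the listing above). A small sketch, assuming the standard hierarchy-ID:controller-list:cgroup-path line format described in cgroups(7); cgroup_paths is a hypothetical helper name:

```shell
# Print the unique cgroup paths from a /proc/<pid>/cgroup style file.
# Each line reads hierarchy-ID:controller-list:cgroup-path.
cgroup_paths() {
    # $1: file to read, e.g. /proc/2426/cgroup
    # -f3- keeps everything after the second colon, so paths that
    # themselves contain ':' are not truncated.
    cut -d: -f3- "$1" | sort -u
}
```

Running `cgroup_paths /proc/2426/cgroup` before the restart may help to compare the guest's actual cgroup with the one libvirtd fails to validate afterwards.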

Christian Ehrhardt  (paelzer) wrote :

Hmm, there is currently no obvious cgroup/LXC patch in the upstream libvirt master branch that suggests itself as a fix to test against libvirt.

Christian Ehrhardt  (paelzer) wrote :

Not sure yet what we could do on the libvirt side.
For the time being, reverting the change in systemd (as suggested by rbalint) might be the better fix.

Changed in systemd (Ubuntu):
status: New → Confirmed
Changed in libvirt (Ubuntu):
status: New → Incomplete
Changed in systemd (Ubuntu):
importance: Undecided → High
Changed in libvirt (Ubuntu):
importance: Undecided → Medium
Christian Ehrhardt  (paelzer) wrote :

FYI: reported upstream (systemd) as https://github.com/systemd/systemd/issues/13629

Balint Reczey (rbalint) wrote :

@paelzer: OK, I will go with the revert. It still sounds like libvirt may need a change to keep the LXC guest even with the systemd change in place, so if you don't mind I'll keep this bug open to track this.
If it turns out that libvirt does everything right, I'll be happy to forward the issue to systemd upstream.

Christian Ehrhardt  (paelzer) wrote :

Pinged upstream libvirt to be aware of the case as well.
=> https://www.redhat.com/archives/libvir-list/2019-September/msg00974.html

Changed in systemd:
status: Unknown → New
Christian Ehrhardt  (paelzer) wrote :

FYI: We have identified a behavioral change in libvirt. I'll address that, but it was not the reason for the issue reported here.
Upstream discussion on the systemd issue continues ...

Changed in libvirt (Ubuntu):
assignee: nobody → Christian Ehrhardt  (paelzer)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package systemd - 243-2ubuntu1

---------------
systemd (243-2ubuntu1) focal; urgency=medium

  * Merge to Ubuntu from experimental
  * Refresh patches:
    - Dropped changes:
      * Cherrypick ask-password: prevent buffer overrow when reading from keyring.
        File: debian/patches/ask-password-prevent-buffer-overrow-when-reading-fro.patch
        https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=6d6e9cbd4fc6e018031a4762e88f2c3aa19e24e8
      * random-util: eat up bad RDRAND values seen on AMD CPUs.
        File: debian/patches/+rdrand-workaround-on-amd.patch
        https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?h=ubuntu-eoan&id=6ab88231efca4b04b26de6cfb5d671be154aabe0
    - Remaining changes:
      * Recommend networkd-dispatcher
        File: debian/control
        https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=d1e3b2c7e4757119da0d550b0b3c0a6626a176dc
      * Enable EFI/bootctl on armhf.
        File: debian/control
        https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=043122f7d8a1487bfd357e815a6ece1ceea6e7d1
      * debian/control: strengthen dependencies.
        File: debian/control
        https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=d1ecf0c372f5212129c85ae60fddf26b2271a1fe
      * Add conflicts with upstart and systemd-shim
        File: debian/control
        https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=83ed7496afc7c27be026014d109855f7d0ad1176
      * Specify Ubuntu's Vcs-Git
        File: debian/control
        https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=fd832930ef280c9a4a9dda2440d5a46a6fdb6232
      * Ubuntu/extra: ship dhclient-enter hook.
        Files:
        - debian/extra/dhclient-enter-resolved-hook
        - debian/rules
        https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=f3398a213f80b02bf3db0c1ce9e22d69f6d56764
        https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=258893bae8cbb12670e4807636fe8f7e9fb5407a
        https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=0725c1169ddde4f41cacba7af3e546704e2206be
      * udev-udeb: ship modprobe.d snippet to force scsi_mod.scan=sync in d-i.
        Files:
        - debian/extra/modprobe.d-udeb/scsi-mod-scan-sync.conf
        - debian/udev-udeb.install
        https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=eb6d8a2b9504917abb7aa2c4035fdbb7b98227f7
      * debian/extra/start-udev: Set scsi_mod scan=sync even if it's builtin to the kernel (we previously only set it in modprobe.d)
        Files:
        - debian/extra/start-udev
        https://git.launchpad.net/~ubuntu-core-dev/ubuntu/+source/systemd/commit/?id=6b72628f8de991e2c67ac4289fc74daf3abe7d14
      * debian/extra/units/systemd-resolved.service.d/resolvconf.conf:
        drop resolvconf.conf drop-in, resolved integration moved to resolvconf package.
      * debian/extra/wrap_cl.py: add changelog formatter
        Files:
        - debian/extra/wrap_cl.py
        - debian/gbp...

Changed in systemd (Ubuntu):
status: Confirmed → Fix Released
Christian Ehrhardt  (paelzer) wrote :

This no longer affects libvirt; marking it as such so that it does not show up in excuses.

Changed in libvirt (Ubuntu):
status: Incomplete → Invalid