autopkgtest breaks in groovy at 6.6.0-1ubuntu1 for smoke-lxc

Bug #1892826 reported by Christian Ehrhardt 
Affects              Status        Importance  Assigned to  Milestone
libguestfs (Ubuntu)  Fix Released  Undecided   Unassigned
libvirt (Ubuntu)     Fix Released  Undecided   Unassigned

Bug Description

In the past the smoke tests have behaved differently in Debian CI and in Ubuntu.
We already carry some delta for these, which might need updating since this looks like another related case:
  smoke-lxc FAIL non-zero exit status 1

Domain sl started

+ grep -qs starting up /var/log/libvirt/lxc/sl.log
+ check_domain
+ grep -qs sl[[:space:]]\+running
+ virsh list
+ virsh lxc-enter-namespace --noseclabel sl /bin/ls /bin/ls
error: Requested operation is not valid: Init pid is not yet available

Currently passes in Debian because of:
smoke-lxc SKIP Test requires machine-level isolation but testbed does not provide that

I only see this error with the version in proposed, so maybe a real bug in libvirt-lxc?


Changed in libvirt (Ubuntu):
status: New → Confirmed
tags: added: update-excuse
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hrm, no idea how that went unnoticed so far; maybe, since libvirt-lxc is a universe package, the test was no longer triggered. IMHO it also isn't much of a focus for (or loved by) upstream, which is to some extent why we demoted it.

But I see:
[582093.524644] libvirt_lxc[261446]: segfault at 0 ip 0000000000000000 sp 00007ffdd2345598 error 14 in libvirt_lxc[5587e42aa000+8000]
[582093.524650] Code: Bad RIP value.
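
The kernel line above reports an instruction pointer of 0, meaning the process ended up executing at a NULL address. As a minimal sketch (the function name `null_ip_segfaults` is hypothetical, and it assumes the kernel log format shown above), such lines could be filtered out of a kernel log like this:

```shell
# Hypothetical helper: print kernel log lines where the named process
# segfaulted with an instruction pointer of zero (execution at NULL).
# Reads log text on stdin; $1 is the process name as it appears in the log.
null_ip_segfaults() {
    grep -E "$1\[[0-9]+\]: segfault at [0-9a-f]+ ip 0+ sp "
}
```

It could be fed from the kernel log, e.g. `dmesg | null_ip_segfaults libvirt_lxc`.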

description: updated
summary: - autopkgtest breaks in groovy at 6.6.0-1ubuntu1
+ autopkgtest breaks in groovy at 6.6.0-1ubuntu1 for smoke-lxc
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

libvirt log from the crash (we already know about the segfault, but just before it there is a PID/machine-not-found error which might be the reason):

libvirtd[262920]: error from service: GetMachineByPID: PID 263033 does not belong to any known machine
kernel: libvirt_lxc[263033]: segfault at 0 ip 0000000000000000 sp 00007ffc8b365758 error 14 in libvirt_lxc[557529f91000+8000]
groovy kernel: Code: Bad RIP value.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

systemd[1]: machine-lxc\x2d263033\x2dsl.scope: Succeeded.
systemd-machined[6502]: Machine lxc-263033-sl terminated.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

2020-08-25 07:14:30.657+0000: 264543: debug : virFileFindResourceFull:1768 : Resolved 'libvirt_lxc' to '/usr/lib/libvirt/libvirt_lxc'

From package:
libvirt-daemon: /usr/lib/libvirt/libvirt_lxc

The following should be enough for debug:
$ sudo apt install libvirt-daemon-system-dbgsym libvirt-daemon-dbgsym libvirt-daemon-driver-lxc-dbgsym

debug : virCommandRunAsync:2618 : About to run PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin LIBVIRT_DEBUG=1 LIBVIRT_LOG_OUTPUTS=1:stderr /usr/lib/libvirt/libvirt_lxc --name sl --console 24 --security=apparmor --handshake 30
...

To have the correct env set up we need to stop libvirt (via gdb) just when it would start the lxc helper.

break on
virCommandRunAsync
and then on
virLXCProcessConnectMonitor

At this point we have the helper running
265197 /usr/lib/libvirt/libvirt_lxc --name sl --console 25 --security=apparmor --handshake 30

The backtrace on the crash is this:
#0 0x0000000000000000 in ?? ()
#1 0x00007fd6da74ef83 in __GI_xdr_uint64_t (xdrs=0x7ffd4a0179e0, uip=0x7ffd4a017a90) at xdr_intXX_t.c:72
#2 0x00005579bfc2f60d in xdr_virLXCMonitorInitEventMsg (xdrs=<optimized out>, objp=<optimized out>) at lxc/lxc_monitor_protocol.c:32
#3 0x00007fd6dac02823 in virNetMessageEncodePayload () from target:/lib/x86_64-linux-gnu/libvirt.so.0
#4 0x00005579bfc2acbd in virLXCControllerEventSend (ctrl=0x5579c0acec20, procnr=2, proc=0x5579bfc2f600 <xdr_virLXCMonitorInitEventMsg>, data=0x7ffd4a017a90)
    at ../../../src/lxc/lxc_controller.c:2246
#5 0x00005579bfc2ca8b in virLXCControllerEventSendInit (initpid=265199, ctrl=0x5579c0acec20) at ../../../src/lxc/lxc_controller.c:2308
#6 virLXCControllerClientPrivateNew (client=<optimized out>, opaque=0x5579c0acec20) at ../../../src/lxc/lxc_controller.c:935
#7 0x00007fd6dac0f458 in virNetServerClientNew () from target:/lib/x86_64-linux-gnu/libvirt.so.0
#8 0x00007fd6dac12906 in ?? () from target:/lib/x86_64-linux-gnu/libvirt.so.0
#9 0x00007fd6dac0d9ae in ?? () from target:/lib/x86_64-linux-gnu/libvirt.so.0
#10 0x00007fd6daac217a in ?? () from target:/lib/x86_64-linux-gnu/libvirt.so.0
#11 0x00007fd6da8cd2df in g_main_context_dispatch () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#12 0x00007fd6da8cd688 in ?? () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#13 0x00007fd6da8cd753 in g_main_context_iteration () from target:/lib/x86_64-linux-gnu/libglib-2.0.so.0
#14 0x00007fd6daac2e34 in virEventGLibRunOnce () from target:/lib/x86_64-linux-gnu/libvirt.so.0
#15 0x00007fd6dac11f25 in virNetDaemonRun () from target:/lib/x86_64-linux-gnu/libvirt.so.0
#16 0x00005579bfc21932 in virLXCControllerMain (ctrl=0x5579c0acec20) at ../../../src/lxc/lxc_controller.c:1349
#17 virLXCControllerRun (ctrl=0x5579c0acec20) at ../../../src/lxc/lxc_controller.c:2433
#18 main (argc=<optimized out>, argv=<optimized out>) at ../../../src/lxc/lxc_controller.c:2702

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The host side (libvirtd) starts to communicate with libvirt_lxc
in src/lxc/lxc_process.c:
    monitor = virLXCMonitorNew(vm, cfg->stateDir, &monitorCallbacks);

On the receiving side this then crashes.
__GI_xdr_uint64_t is xdr_uint64_t from glibc-2.31/sunrpc/xdr_intXX_t.c

Once the libvirt_lxc process exists,
communication is via /run/libvirt/lxc/sl.sock

Then in libvirtd break on:
b virLXCMonitorNew

On the receiving end (libvirt_lxc) break on:
b virLXCControllerEventSendInit

That breakpoint is hit like this:
Thread 1 "libvirt_lxc" hit Breakpoint 1, virLXCControllerEventSendInit (initpid=265337, ctrl=0x56399bb35c20) at ../../../src/lxc/lxc_controller.c:2304
2304 VIR_DEBUG("Init pid %lld", (long long)initpid);

The pid here is 265337 and the process exists
(gdb) p initpid
$1 = 265337
$ ps axlf
4 0 265335 1 20 0 143352 19352 - tl ? 0:00 /usr/lib/libvirt/libvirt_lxc --name sl --console 25 --security=apparmor --handshake 30
4 0 265337 265335 20 0 4240 3452 - Ss+ pts/0 0:00 \_ /bin/bash

(gdb) p *ctrl
$3 = {name = 0x56399bb360f0 "sl", vm = 0x56399bb468a0, def = 0x56399bb44c10, handshakeFd = -1, initpid = 265337, nnbdpids = 0, nbdpids = 0x0, nveths = 0, veths = 0x0, nnicindexes = 0,
  nicindexes = 0x0, npassFDs = 0, passFDs = 0x0, nsFDs = 0x0, nconsoles = 1, consoles = 0x56399bb4a000, devptmx = 0x56399bb45800 "/run/libvirt/lxc/sl.devpts/ptmx", nloopDevs = 0,
  loopDevFds = 0x0, securityManager = 0x56399bb49830, daemon = 0x56399bb4b030, firstClient = true, client = 0x56399bb71080, prog = 0x56399bb41810, inShutdown = false, timerShutdown = 1,
  cgroup = 0x56399bb6f250, fuse = 0x56399bb3ef00}

It wants to reply:
2308 virLXCControllerEventSend(ctrl,
2309 VIR_LXC_MONITOR_PROC_INIT_EVENT,
2310 (xdrproc_t)xdr_virLXCMonitorInitEventMsg,
2311 (void*)&msg);

(gdb) p ctrl->client
$6 = (virNetServerClientPtr) 0x56399bb71080

This eventually calls virNetMessageEncodePayload(msg, proc, data)

(gdb) p *msg
$14 = {tracked = false, buffer = 0x56399bb72000 "", bufferLength = 65540, bufferOffset = 28, header = {prog = 305402420, vers = 1, proc = 2, type = VIR_NET_MESSAGE, serial = 1,
    status = VIR_NET_OK}, cb = 0x0, opaque = 0x0, nfds = 0, fds = 0x0, donefds = 0, next = 0x0}
(gdb) p data
$15 = (void *) 0x7ffe62f988d0
(gdb) p proc
$16 = (xdrproc_t) 0x56399aba6600 <xdr_virLXCMonitorInitEventMsg>

All looks quite normal; then it jumps with these values into
xdr_virLXCMonitorInitEventMsg -> __GI_xdr_uint64_t and on the return path
from there returns to 0x0, which causes the crash.

Also needs:
$ sudo apt install libvirt0-dbgsym
And ideally this is done with a -O0 build.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I can reproduce this with libvirt v6.6.0 built from git.
The packaged 6.0 works.
v6.0.0 built from git works as well.

There are two approaches now:
#1 the introduction of tirpc
v6.0.0
configure: xdr: yes (CFLAGS='' LIBS='')
vs
v6.6.0
configure: xdr: yes (CFLAGS='-I/usr/include/tirpc' LIBS='-ltirpc')

Since the xdr code that wraps the data crashes, and the use of libntirpc-dev (new transport-independent RPC library - development files) is new, that might be the primary issue here.
Also, this old thread [1] hits all the keywords matching this issue in one go, which is a +1 for it being related.
And there is [2] in v6.6, for which we added libtirpc-dev to the build-deps (as it is now strictly required):
configure: error: You must install the libtirpc >= 0.1.10 pkg-config module to compile libvirt
We can't build without tirpc anymore, so I can't just disable it and try.

#2 just raw bisect the code and do something else until then
Chances are this will identify [2] but we will see.

[1]: https://www.redhat.com/archives/libvir-list/2014-September/msg01360.html
[2]: https://gitlab.com/libvirt/libvirt/-/commit/d7147b3797380de2d159ce6324536f3e1f2d97e3

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

The bisect confirms the assumption I had.

$ git bisect log
git bisect start
# old: [10ff7997c510171ab231d7c4a484abc1fecf3e78] Release of libvirt-6.0.0
git bisect old 10ff7997c510171ab231d7c4a484abc1fecf3e78
# new: [da94294404dde04c74fa5aba18d407fdb930e2dc] Release of libvirt-6.6.0
git bisect new da94294404dde04c74fa5aba18d407fdb930e2dc
# old: [95fcd29c72f43839c517786e343abc70e2a48445] docs: style git mirror links less prominently
git bisect old 95fcd29c72f43839c517786e343abc70e2a48445
# old: [8c6257942425c36e6f6d96629d334fb09a94da28] qemu: Use qemuSecurityDomainSetPathLabel() to set seclabes on not saved state files
git bisect old 8c6257942425c36e6f6d96629d334fb09a94da28
# old: [065f7d5ba9999edd16332b381139dbca1ebae140] remove redundant calls to virBufferFreeAndReset()
git bisect old 065f7d5ba9999edd16332b381139dbca1ebae140
# new: [ad3adcd5ec2c5eef0dadee709ee45240dfd589ab] qemu: hotplug: Don't regenerate iSCSI secret alias
git bisect new ad3adcd5ec2c5eef0dadee709ee45240dfd589ab
# new: [53a55eff5962d2975bcdfc2dc85ae60f475977dc] qemu_domain: moved qemuDomainNamespace to `qemu_domain`
git bisect new 53a55eff5962d2975bcdfc2dc85ae60f475977dc
# old: [c4951694786ecd45424769979762c17e4c8e56d0] m4: virt-sanlock: drop check for SANLK_INQ_WAIT
git bisect old c4951694786ecd45424769979762c17e4c8e56d0
# old: [78e76a8a42b7d3777e1bf6d6e8e0b594439d2b33] tests: use WITH_NSS instead of NSS
git bisect old 78e76a8a42b7d3777e1bf6d6e8e0b594439d2b33
# old: [f68a14d17f27f1ed26227e880f8ed8c900bbe711] secdrivers: Rename @stdin_path argument of virSecurityDomainSetAllLabel()
git bisect old f68a14d17f27f1ed26227e880f8ed8c900bbe711
# old: [9ad637c9651ff29955dd6aa8fe31f639b42b7315] docs: convert FIG files into SVG format
git bisect old 9ad637c9651ff29955dd6aa8fe31f639b42b7315
# new: [d7147b3797380de2d159ce6324536f3e1f2d97e3] m4: virt-xdr: rewrite XDR check
git bisect new d7147b3797380de2d159ce6324536f3e1f2d97e3
# old: [d3a1a3d708701a31078da5d68f50c268f52123e5] m4: virt-secdriver-selinux: drop obsolete function checks
git bisect old d3a1a3d708701a31078da5d68f50c268f52123e5
# first new commit: [d7147b3797380de2d159ce6324536f3e1f2d97e3] m4: virt-xdr: rewrite XDR check

commit d7147b3797380de2d159ce6324536f3e1f2d97e3
Author: Pavel Hrdina <email address hidden>
Date: Fri Jun 19 00:44:07 2020 +0200

    m4: virt-xdr: rewrite XDR check
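
For reference, the verdict can be pulled out of such a log mechanically. This is only a sketch; the helper name `first_new_commit` is made up here, and it assumes the "first new commit" / "first bad commit" line format shown in the log above:

```shell
# Hypothetical helper: print the commit id from the "first new commit"
# (or "first bad commit") line of `git bisect log` output read on stdin.
first_new_commit() {
    sed -n -E 's/^# first (new|bad) commit: \[([0-9a-f]+)\].*/\2/p'
}
```

Usage would be along the lines of `git bisect log | first_new_commit`.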

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

With the switch to meson everything will be different anyway.
But I'll report it upstream to make them aware that there might be an issue hidden underneath.

The commit is a recent change in 6.6 that should not really alter behavior (but it does).
For Groovy I'll revert the commit; with the revert the build works just as it did before and the libvirt-lxc guest works again (no crash).

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Just verified that git v6.6.0 with d7147b379 reverted works.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Discussed here
https://www.redhat.com/archives/libvir-list/2020-August/msg00921.html

It comes down to glibc vs tirpc: the linking at build time is wrong (it uses the wrong version).

With glibc 2.32 this won't work anymore as --enable-obsolete-rpc doesn't exist by then.

For now I'll do the revert to unbreak things.
Once 2.32 hits we can drop the revert and rebuild.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

libguestfs could be affected by the same issue:

root@g:~# eu-readelf -a /usr/lib/x86_64-linux-gnu/libguestfs.so.0 | grep xdr_uint64 | grep GLOBAL
  185: 0000000000000000 0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@GLIBC_2.2.5 (3)
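
The check above boils down to looking at the version suffix of the xdr_* symbols (GLIBC_* vs TIRPC_*). A minimal sketch of that classification follows; the helper name `xdr_provider` is hypothetical and it assumes symbol lines in the eu-readelf format shown above:

```shell
# Hypothetical helper: read eu-readelf symbol lines on stdin and report
# which library the versioned xdr_* symbols are bound to:
# "glibc", "tirpc", "mixed" or "none".
xdr_provider() {
    # Collect every versioned xdr symbol, e.g. xdr_uint64_t@GLIBC_2.2.5
    syms=$(grep -oE 'xdr_[A-Za-z0-9_]+@+[A-Z]+_[0-9.]+' || true)
    case "$syms" in
        *@GLIBC_*@TIRPC_* | *@TIRPC_*@GLIBC_*) echo mixed ;;
        *@GLIBC_*) echo glibc ;;
        *@TIRPC_*) echo tirpc ;;
        *) echo none ;;
    esac
}
```

For example: `eu-readelf -s /usr/lib/x86_64-linux-gnu/libguestfs.so.0 | xdr_provider`.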

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package libvirt - 6.6.0-1ubuntu2

---------------
libvirt (6.6.0-1ubuntu2) groovy; urgency=medium

  * d/p/u/lp-1892826-Revert-m4-virt-xdr-rewrite-XDR-check.patch: avoid clashes
    between libtripc and glibc that break libvirt-lxc (LP: #1892826)
  * d/p/ubuntu-aa/lp-1892736-apparmor-allow-libvirtd-to-call-virtiofsd.patch:
    allow libvirt to control virtiofsd (LP: #1892736)

libvirt (6.6.0-1ubuntu1) groovy; urgency=medium

  * Merge with Debian 6.6.0-1 from experimental
    Among many other new features and fixes this includes fixes for:
    (LP: #1874647) - Stale libvirt cache leads to VM startup failures
    (LP: #1869796) - bad ordering and dependent restarts of services/sockets
    Remaining changes:
    - d/p/ubuntu-aa/lp-1847361-load-versioned-module.patch: allow loading
      versioned modules after qemu package upgrades (LP 1847361)
    - libvirt-uri.sh: Automatically switch default libvirt URI for users
      via user profile (xen URI on dom0, qemu:///system otherwise)
    - Disable libssh2 support (universe dependency)
    - Disable firewalld support (universe dependency)
    - Set qemu-group to kvm (for compat with older ubuntu)
    - Additional apport package-hook
    - Autostart default bridged network (As upstream does, but not Debian).
      In addition to just enabling it our solution provides:
      + do not autostart if subnet is already taken (e.g. in guests).
      + iterate some alternative subnets before giving up
    - d/p/ubuntu/Allow-libvirt-group-to-access-the-socket.patch: This is
      the group based access to libvirt functions as it was used in Ubuntu
      for quite long.
      + d/p/ubuntu/daemon-augeas-fix-expected.patch fix some related tests
        due to the group access change.
      + d/libvirt-daemon-system.postinst: add users in sudo to the libvirt
        group.
    - ubuntu/parallel-shutdown.patch: set parallel shutdown by default.
    - Update README.Debian with Ubuntu changes
    - d/p/ubuntu/ubuntu_machine_type.patch: accept ubuntu types as pci440fx
    - fix autopkgtests
      + d/t/control, d/t/smoke-qemu-session: fixup smoke-qemu-session by making
        vmlinuz available and accessible (Debian bug 848314)
      + d/t/control: fix smoke-qemu-session by ensuring the service will run
        installing libvirt-daemon-system
      + d/t/smoke-lxc: fix smoke-lxc by ignoring potential issues on destroy as
        long as the following undefine succeeds
      + d/t/smoke-lxc: use systemd instead of sysV to restart the service
    - dnsmasq related enhancements
      + run dnsmasq as libvirt-dnsmasq (LP: 1743718)
      + d/libvirt-daemon-system.postinst: add libvirt-dnsmasq user and group
      + d/libvirt-daemon-system.postrm: remove libvirt-dnsmasq user and group
        on purge
      + d/p/ubuntu/dnsmasq-as-priv-user: write dnsmasq config with user
        libvirt-dnsmasq and adapt the self tests to expect that config
      + d/libvirt-daemon-system.postinst: fix old libvirt-dnsmasq users group
      + Add dnsmasq configuration to work with system wide dnsmasq-base
    - debian/rules: disable the netcf backend. (LP: 1764314)
    - debian/patches/ubuntu/ovmf_paths.patch...

Changed in libvirt (Ubuntu):
status: Confirmed → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Update now that we have glibc 2.32

libguestfs was rebuilt correctly:
root@g:~# eu-readelf -a /usr/lib/x86_64-linux-gnu/libguestfs.so.0 | grep xdr_uint64 | grep GLOBAL
  102: 0000000000000000 0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@TIRPC_0.3.0 (12)

libvirt is not rebuilt yet; I'll need to upload a revert of the change that was needed prior to glibc 2.32:
root@g:~# eu-readelf -a /usr/lib/libvirt/libvirt_lxc | grep xdr_uint64 | grep GLOBAL
  104: 0000000000000000 0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@GLIBC_2.2.5 (4)

This is needed because 2.32 doesn't provide that anymore.

Changed in libguestfs (Ubuntu):
status: New → Fix Released
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Our glibc 2.32 in groovy still has the xdr symbols of tirpc, so the revert has to be retained for now:

root@g:~# dpkg -S /lib/x86_64-linux-gnu/libc.so.6
libc6:amd64: /lib/x86_64-linux-gnu/libc.so.6

root@g:~# dpkg -l libc6
ii libc6:amd64 2.32-0ubuntu3 amd64 GNU C Library: Shared libraries

root@g:~# eu-readelf -a /lib/x86_64-linux-gnu/libc.so.6 | grep xdr_uint64
 2049: 0000000000150c70 228 FUNC GLOBAL DEFAULT 16 xdr_uint64_t@GLIBC_2.2.5
  [ 180] xdr_uint64_t
   initial_location: +0x0000000000150c70 <xdr_uint64_t> (offset: 0x150c70)
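
The difference between this output and the earlier ones is that here the symbol is defined (section index 16) rather than UNDEF. A rough sketch of that check, with a made-up helper name `defines_symbol` and assuming the column layout of the symbol lines shown above:

```shell
# Hypothetical helper: succeed if the eu-readelf symbol lines on stdin
# contain a defined (not UNDEF) global function named $1.
# Columns assumed: Num Value Size Type Bind Vis Ndx Name
defines_symbol() {
    awk -v sym="$1" '
        $4 == "FUNC" && $5 == "GLOBAL" && $7 != "UNDEF" {
            name = $8
            sub(/@.*/, "", name)   # strip the @VERSION suffix
            if (name == sym) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example: `eu-readelf -s /lib/x86_64-linux-gnu/libc.so.6 | defines_symbol xdr_uint64_t && echo "libc still provides it"`.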

Also, while it built fine with 2.31 (but later failed at runtime due to the symbol collision), with 2.32 it is now an FTBFS without the patch:
/usr/bin/ld: lxc/libvirt_lxc-lxc_monitor_protocol.o: undefined reference to symbol 'xdr_enum@@TIRPC_0.3.0'
/usr/bin/ld: /lib/x86_64-linux-gnu/libtirpc.so.3: error adding symbols: DSO missing from command lin
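
To spot this kind of failure in a build log, the ld message can be reduced to the symbol and the version it wanted. This is just a sketch; the helper name `missing_dso_symbols` is hypothetical and it assumes the ld message format shown above:

```shell
# Hypothetical helper: read a build log on stdin and print
# "SYMBOL VERSION" for each ld "undefined reference to symbol" error.
missing_dso_symbols() {
    sed -n -E "s/.*undefined reference to symbol '([^'@]+)@@([^']+)'.*/\1 \2/p"
}
```

A result such as `xdr_enum TIRPC_0.3.0` would indicate that the object wants the tirpc version of the symbol while libtirpc is not on the link line.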

To be sure, this needs to be retested with a libvirt rebuild ...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

hmmm :-/

Old/Current:
root@g:~# eu-readelf -a /usr/lib/libvirt/libvirt_lxc | grep xdr_uint64 | grep GLOBAL
  104: 0000000000000000 0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@GLIBC_2.2.5 (4)

New build has TIRPC linked now (built with the revert applied):
root@g:~# eu-readelf -a /usr/lib/libvirt/libvirt_lxc | grep xdr_uint64 | grep GLOBAL
   85: 0000000000000000 0 FUNC GLOBAL DEFAULT UNDEF xdr_uint64_t@TIRPC_0.3.0 (8)

But as mentioned even our glibc 2.32 still has xdr elements.
Let us hope this does not crash again.

root@g:~# eu-readelf -a /lib/x86_64-linux-gnu/libc.so.6 | grep xdr_uint64
 2049: 0000000000150c70 228 FUNC GLOBAL DEFAULT 16 xdr_uint64_t@GLIBC_2.2.5
  [ 180] xdr_uint64_t
   initial_location: +0x0000000000150c70 <xdr_uint64_t> (offset: 0x150c70)

So to summarize:
a) the current libvirt build looks for xdr in glibc and still works
b) a rebuild as-is looks for xdr in tirpc
c) a rebuild without the revert of the xdr changes is an FTBFS

(a)+(b) seem to work fine in sniff tests done for bug 1887490 with lxc guests, but for extra confidence we need to make sure the autopkgtests pass pre-upload as well.
