qemu-user-static 1:5.0-5ubuntu4 in groovy does not start armhf container

Bug #1890881 reported by Ryutaroh Matsumoto on 2020-08-08
Affects         Status         Importance   Assigned to   Milestone
qemu (Debian)   Fix Released   Unknown
qemu (Ubuntu)   Fix Released   Medium       Unassigned
  Focal         Won't Fix      Undecided    Unassigned

Bug Description

This is similar to, but different from,
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887606

The following bug does not occur with the upstream version qemu-5.1.0-rc3
built from its source code.

How to reproduce (as root) on amd64 host running Ubuntu:

apt-get install -t groovy qemu-user-static
(the Ubuntu package version is 1:5.0-5ubuntu4)

mmdebstrap --components="main restricted universe multiverse" --variant=standard --architectures=armhf focal /var/lib/machines/armhf-focal http://ports.ubuntu.com/ubuntu-ports/

systemd-nspawn -D /var/lib/machines/armhf-focal -b

# systemd-nspawn -M armhf-focal -b
Spawning container armhf-focal on /var/lib/machines/armhf-focal.
Press ^] three times within 1s to kill container.
systemd 245.4-4ubuntu3 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization systemd-nspawn.
Detected architecture arm.

Welcome to Ubuntu 20.04 LTS!

Set hostname to <armhf-focal>.
Caught <SEGV>, dumped core as pid 3.
Exiting PID 1...
Container armhf-focal failed with error code 255.

Again, with qemu-5.1.0-rc3, the container starts fine.
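
The pass/fail signal in the transcript above is the "Caught <SEGV>" line printed by PID 1. A minimal sketch of a check for that line follows. The function name boot_log_crashed is made up for illustration, and it assumes the systemd-nspawn console output has already been captured and is fed in on stdin.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the captured
# systemd-nspawn console output on stdin shows PID 1 crashing.
boot_log_crashed() {
    # -q: print nothing, report only via exit status.
    # -F: treat the pattern as a literal string, not a regex.
    grep -qF 'Caught <SEGV>'
}
```

For example, `boot_log_crashed < boot.log && echo "still broken"`, where boot.log is whatever file the console output was saved to.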

Revision history for this message
Rafael David Tinoco (rafaeldtinoco) wrote :

Hello @emojifreak,

Thanks for reporting this. I was indeed able to reproduce LP: #1886811 back then.

I'm subscribing @paelzer in this and the other bug...

https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887606
https://bugs.launchpad.net/qemu/+bug/1886811

Changed in qemu (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
tags: added: server-next
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Thank you both already!

The last fix in git that clearly reads like a fix for this is 65b261a63a ("linux-user: add netlink RTM_SETLINK command"), which is already applied. It is also the latest change to that file, so if another change affects this it must be somewhere else.

Since this is reported to be bad at 5.0 but good at 5.1-rc3 - does one of you have the time to throw this test into a git bisect? That should be an easy way to identify which fix to add to the 5.0 in groovy.

Revision history for this message
Ryutaroh Matsumoto (emojifreak) wrote :

Hi @paelzer, what I had done was
testing "netlink RTM_SETLINK command" against the
latest git source of qemu (around July 2020),
and seeing that self-compiled version from the source worked fine.
I had never seen a working Ubuntu qemu-user-static package
that can start an armhf container on an amd64 host...
I just saw "fix released" status, so I assumed someone verified the
Ubuntu package...

If an urgent fix is not required (nobody else has complained about this issue here,
so very few people seem to use armhf LXC/systemd-nspawn containers on amd64,
which in my opinion are convenient for testing Raspberry Pi SD card images),
then we can just wait for the official release of qemu 5.1, which should not have this issue...

If fixing this issue is important (I doubt it is), note that
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1887606
still exists in qemu-user-static 1:4.2-3ubuntu6.3 in focal-updates,
so an armhf container cannot be used on an amd64 Focal host, and
a fix in Focal is desirable.

I cannot tell whether this issue #LP1890881 exists in Focal or not,
as #LP1887606 masks it...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

So if I could give you a PPA [1] for Focal with the fix for #LP1887606/#1886811 applied you could give it a try on focal right?

I have added a low prio task for focal to 1886811.

Once tested I'd ask you to:
1. state here if this bug (1890881) is present in focal as well
2. state there (1886811) that 1886811 is present in focal
3. state there (1886811) that the PPA helps
4. state there (1886811) how one would test that (to make an SRU template out of it)

I'm still unsure what we should do about this bug 1890881 then without having an isolated fix, but one step at a time ...

[1]: https://launchpad.net/~ci-train-ppa-service/+archive/ubuntu/4197

Note to myself, branch "lp-1890881-test-fix-1886811-in-FOCAL"

Revision history for this message
Ryutaroh Matsumoto (emojifreak) wrote :

> you could give it a try on focal right?

Yes, I can and I will.

> I'm still unsure what we should do about this bug 1890881 then without
> having an isolated fix, but one step at a time ...

I had a similar feeling: After seeing #LP1886811 fixed in
upstream QEMU, "systemd-nspawn -b" started fine on some architectures
and failed on others with self-compiled QEMU binaries.
But since the failing architectures were not used in my job, I did not file a
bug report...

By the way, QEMU 5.1.0 was released.
https://download.qemu.org/qemu-5.1.0.tar.xz

Revision history for this message
Ryutaroh Matsumoto (emojifreak) wrote :

@paelzer, I got the output below. This seems to mean that #LP1887606 is fixed
in 1:4.2-3ubuntu6.4~ppa1 while #LP1890881 remains.

By the way, Debian Bullseye 1:5.1+dfsg-0exp1 worked fine as far as I can see.

Preparing to unpack .../qemu-user-static_1%3a4.2-3ubuntu6.4~ppa1_amd64.deb ...
Unpacking qemu-user-static (1:4.2-3ubuntu6.4~ppa1) over (1:4.2-3ubuntu6.3) ...
Setting up qemu-user-static (1:4.2-3ubuntu6.4~ppa1) ...
Processing triggers for man-db (2.9.1-1) ...
root@ryutaroh-CFSZ6-1L:/home/ryutaroh# systemctl restart binfmt-support
root@ryutaroh-CFSZ6-1L:/home/ryutaroh# systemd-nspawn -M armhf-focal -b
Spawning container armhf-focal on /var/lib/machines/armhf-focal.
Press ^] three times within 1s to kill container.
systemd 245.4-4ubuntu3 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization systemd-nspawn.
Detected architecture arm.

Welcome to Ubuntu 20.04 LTS!

Set hostname to <armhf-focal>.
Caught <SEGV>, dumped core as pid 3.
Exiting PID 1...
Container armhf-focal failed with error code 255.
root@ryutaroh-CFSZ6-1L:/home/ryutaroh#

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@Rafael you said you were "able to reproduce LP: #1886811 back then." - is that true for this bug here as well? It tries to hide from me (fails before I ever get to it). If you can recreate it maybe we can sync so I can set up a git bisect or such.

@Ryutaroh - thanks for the test already.
5.1 has many (probably too many) new things to throw it into Groovy so close to feature freeze.
If possible I'd prefer an individual fix.

Changed in qemu (Ubuntu Focal):
status: New → Confirmed
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Hmm, I can recreate the issue myself.
So I thought - yeah onto bisect !

But I fail to get a self built /usr/bin/qemu-arm-static into this.
I wanted to see if 5.1 from git would work and then bisect from there.
But no matter what I try it keeps breaking on the same error.

/me starts to wonder where in the chain of KVM host, qemu-arm-static in the guest, systemd-nspawn in the guest that binary really would need to be.

At least
$ sudo apt remove qemu-user-static
$ sudo systemd-nspawn -D /var/lib/machines/armhf-focal -M armhf-focal -b
Spawning container armhf-focal on /var/lib/machines/armhf-focal.
Press ^] three times within 1s to kill container.
execv(/usr/lib/systemd/systemd, /lib/systemd/systemd, /sbin/init) failed: Exec format error
Container armhf-focal failed with error code 1.

So the qemu-static binfmt is used as I'd have assumed.

Init is systemd which is:
$ sudo file /var/lib/machines/armhf-focal/lib/systemd/systemd
/var/lib/machines/armhf-focal/lib/systemd/systemd: ELF 32-bit LSB shared object, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /lib/ld-linux-armhf.so.3, BuildID[sha1]=3ca7712fe69ab99bb55585c35af762a2491fc856, for GNU/Linux 3.2.0, stripped

But even if I delete ALL /usr/bin/qemu* binaries it still does the same.

$ sudo systemd-nspawn -D /var/lib/machines/armhf-focal -M armhf-focal -b
Spawning container armhf-focal on /var/lib/machines/armhf-focal.
Press ^] three times within 1s to kill container.
systemd 245.4-4ubuntu3 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=hybrid)
Detected virtualization systemd-nspawn.
Detected architecture arm.

Welcome to Ubuntu 20.04 LTS!

Set hostname to <groovy>.
Caught <SEGV>, dumped core as pid 3.
Exiting PID 1...
Container armhf-focal failed with error code 255.

The target in /var/lib/machines/armhf-focal/ only has core dumps of qemu_systemd and bash completions. It can't come from there either.

I beg your pardon for not being an nspawn+mmdebstrap expert but WTH where would I need to place my new qemu binary that I need to test?

@Rafael - if you see where I'm blocking my own insight please feel free to speak up and let me know :-)
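
On the question of which interpreter is actually used: the kernel's view of a binfmt_misc registration can be read from /proc/sys/fs/binfmt_misc/<name>. The sketch below pulls the interpreter path and the flags out of such an entry. The function names are made up for illustration, and the entry name (for example qemu-arm) is assumed from the package's binfmt names. If the flags include F ("fix binary"), the kernel opened the interpreter when the format was registered, which would be consistent with deleting /usr/bin/qemu* having no effect. Whether the Ubuntu package registers with F is an assumption here, not something confirmed in this report.

```shell
#!/bin/sh
# Hypothetical helpers for reading a binfmt_misc entry such as
# /proc/sys/fs/binfmt_misc/qemu-arm (path passed as $1).

# Print the interpreter path the kernel has registered.
binfmt_interpreter() {
    awk '$1 == "interpreter" { print $2 }' "$1"
}

# Print the flag letters, e.g. "OCF" (empty if none are set).
binfmt_flags() {
    awk '$1 == "flags:" { print $2 }' "$1"
}

# Succeed if the F ("fix binary") flag is set for this entry.
binfmt_is_fixed() {
    case "$(binfmt_flags "$1")" in
        *F*) return 0 ;;
        *)   return 1 ;;
    esac
}
```

For example, `binfmt_is_fixed /proc/sys/fs/binfmt_misc/qemu-arm && echo "interpreter is pinned at registration time"`.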

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I disabled/enabled arm and this confirmed it is the interpreter being used.

$ sudo update-binfmts --disable qemu-arm
# now fails missing interpreter

Eventually I found I need to unimport&import it after exchanging the binary.
Then I get it to use the new version.

Note (disable all of them to make the output easier to read)
$ for i in /usr/share/binfmts/qemu-*; do sudo update-binfmts --unimport $i; done

$ sudo update-binfmts --unimport /usr/share/binfmts/qemu-arm
$ sudo cp /usr/bin/qemu-arm-static.pkg /usr/bin/qemu-arm-static
$ sudo update-binfmts --import /usr/share/binfmts/qemu-arm
$ sudo systemd-nspawn -D /var/lib/machines/armhf-focal -M armhf-focal -b
...
works now
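
The unimport, replace, import sequence above can be wrapped in a small helper for repeated use. This is only a sketch under stated assumptions: the function name swap_qemu_arm and the UPDATE_BINFMTS override variable are made up for illustration, the default paths are the ones used in the commands above, and it must run as root. It copies the new binary next to the target and renames it into place, so the existing file is never written to while it might still be executing.

```shell
#!/bin/sh
# Hypothetical helper: swap the registered qemu-arm interpreter for a
# freshly built binary. Usage: swap_qemu_arm NEW_BINARY [TARGET] [FORMAT]
swap_qemu_arm() {
    new_binary=$1
    target=${2:-/usr/bin/qemu-arm-static}
    format=${3:-/usr/share/binfmts/qemu-arm}
    # Assumption: UPDATE_BINFMTS may point at a stand-in for dry runs.
    ub=${UPDATE_BINFMTS:-update-binfmts}

    # Drop the registration that still refers to the old binary.
    "$ub" --unimport "$format" || return 1

    # Copy beside the target, then rename over it. The rename replaces
    # the directory entry instead of writing into the existing file.
    cp "$new_binary" "$target.new" || return 1
    chmod 755 "$target.new" || return 1
    mv -f "$target.new" "$target" || return 1

    # Register again so the kernel picks up the new binary.
    "$ub" --import "$format"
}
```

For example, `swap_qemu_arm ./build/qemu-arm`, where ./build/qemu-arm stands for wherever the self-built static binary ended up.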

$ machinectl
MACHINE CLASS SERVICE OS VERSION ADDRESSES
armhf-focal container systemd-nspawn ubuntu 20.04 -

And I see it running properly with its tree using qemu static:
$ sudo machinectl status armhf-focal
armhf-focal(a008dc84011f4382b74546f73f43cbaa)
           Since: Tue 2020-08-18 12:38:02 UTC; 2min 59s ago
          Leader: 81194 (systemd)
         Service: systemd-nspawn; class container
            Root: /var/lib/machines/armhf-focal
              OS: Ubuntu 20.04 LTS
            Unit: machine-armhf\x2dfocal.scope
                  └─payload
                    ├─init.scope
                    │ └─81194 /usr/bin/qemu-arm-static /lib/systemd/systemd
                    └─system.slice
                      ├─accounts-daemon.service
                      │ └─82253 /usr/bin/qemu-arm-static /usr/lib/accountsservice/accounts-daemon
                      ├─console-getty.service
                      │ └─82301 /usr/bin/qemu-arm-static /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 xterm-256color
                      ├─cron.service
                      │ └─82254 /usr/bin/qemu-arm-static /usr/sbin/cron -f
                      ├─dbus.service
                      │ ├─81377 /usr/bin/qemu-arm-static /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
                      │ ├─81689 /usr/bin/qemu-arm-static /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
                      │ └─82256 /usr/bin/qemu-arm-static /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-only
                      ├─networkd-dispatcher.service
                      │ └─82265 /usr/bin/qemu-arm-static /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
                      ├─rsyslog.service
                      │ └─82267 /usr/bin/qemu-arm-static /usr/sbin/rsyslogd -n -iNONE
                      ├─systemd-journald.service
                      │ └─82065 /usr/bin/qemu-arm-static /lib/systemd/systemd-journald
                      ├─systemd-logind.service
                      │ └─82271 /usr/bin/qemu-arm-static /lib/systemd/systemd-logind
                      └─systemd-resolved.service
                        └─82250 /usr/bin/qemu-arm-static /lib/systemd/systemd-resolved

There is one issue left to replace this in a bisect loop around
  cp: cannot create regular file '/usr/bin/qemu-arm-static': Text fi...


Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

commit 65b261a63a48fbb3b11193361d4ea0c38a3c3dfd
Author: Laurent Vivier <email address hidden>
Date: Thu Jul 9 09:23:32 2020 +0200

    linux-user: add netlink RTM_SETLINK command

    This command is needed to be able to boot systemd in a container.

      $ sudo systemd-nspawn -D /chroot/armhf/sid/ -b
      Spawning container sid on /chroot/armhf/sid.
      Press ^] three times within 1s to kill container.
      systemd 245.6-2 running in system mode.
      Detected virtualization systemd-nspawn.
      Detected architecture arm.

      Welcome to Debian GNU/Linux bullseye/sid!

      Set hostname to <virt-arm>.
      Failed to enqueue loopback interface start request: Operation not supported
      Caught <SEGV>, dumped core as pid 3.
      Exiting PID 1...
      Container sid failed with error code 255.

    Signed-off-by: Laurent Vivier <email address hidden>
    Message-Id: <email address hidden>

 linux-user/fd-trans.c | 1 +
 1 file changed, 1 insertion(+)

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

But that makes no sense, since that is the patch we already carry in debian/patches/linux-user-add-netlink-RTM_SETLINK-command.patch

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

So some other patch between 5.0 and 65b261a6 is required "as well" but can't be seen in the bisecting as only later after 65b261a6 the test is successful.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

Ignoring all changes later than RTM_SETLINK and keeping those that sound related I came to:

  1 pick 13a0c21e64 linux-user/arm: BKPT should cause SIGTRAP, not be a syscall
  2 pick 62f141a426 linux-user/arm: Remove bogus SVC 0xf0002 handling
  3 pick ab546bd238 linux-user/arm: Handle invalid arm-specific syscalls correctly
  4 pick 3986a1721e linux-user/arm: Fix identification of syscall numbers
  5 pick 268b1b3dfb target/arm: Allow user-mode code to write CPSR.E via MSR
  6 pick 45e2813964 linux-user/arm: Reset CPSR_E when entering a signal handler
  7 pick fafe722927 linux-user/arm/signal.c: Drop TARGET_CONFIG_CPU_32
  8 pick 538fabcb46 linux-user: return target error codes for socket() and prctl()
  9 pick 2d92c6827c linux-user: implement OFD locks
 10 pick e865b97ff4 linux-user: syscall: ioctls: support DRM_IOCTL_VERSION
 11 pick d9679ee592 linux-user: add new netlink types
 12 pick 65b261a63a linux-user: add netlink RTM_SETLINK command

v5.0 + the above still fails, so it must be something else.
Maybe I need to bisect "5.0 + 65b261a63a", but not today

Revision history for this message
Ryutaroh Matsumoto (emojifreak) wrote :

> Eventually I found I need to unimport&import it after exchanging the binary.
> Then I get it to use the new version.

Yes... I also needed to run "systemctl restart binfmt-support",
which looks equivalent.

After the qemu-user-static package is upgraded, user space emulation sometimes
fails, so I suspect that the qemu-user(-static) upgrade script also needs to
run "systemctl restart binfmt-support" or equivalent.
But I am not completely sure and I have not filed a bug report
with either Ubuntu or Debian...

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

git bisect reset
git bisect start
git bisect old v5.0.0
git bisect new 65b261a63a
#And then on each stop:
git cherry-pick 65b261a63a
# And some tweaks to convince bisect to do the right thing
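
One possible shape for the per-stop step, for use with "git bisect run", is sketched below. It is not the script that was actually used here. build_qemu and container_boots are hypothetical hooks that have to be filled in with the real build and boot test. Because the bisect uses old/new terms, the exit codes are inverted compared with a usual good/bad bisect: exit 0 means "old" (container still fails), 1 means "new" (container boots), and 125 tells git to skip a commit that cannot be tested.

```shell
#!/bin/sh
# Commit to apply on top of every bisect candidate (from this thread).
FIX_COMMIT=${FIX_COMMIT:-65b261a63a}

# Hypothetical hooks: replace the bodies with the real build and test.
build_qemu() { return 1; }        # placeholder: build the static binary
container_boots() { return 1; }   # placeholder: succeed if nspawn boots

bisect_step() {
    # Apply the fix without committing so HEAD stays on the candidate.
    if ! git cherry-pick --no-commit "$FIX_COMMIT" >/dev/null 2>&1; then
        git reset --hard -q
        return 125                # cannot apply: skip this commit
    fi
    if ! build_qemu; then
        git reset --hard -q
        return 125                # cannot build: skip this commit
    fi
    # old = boot still fails (0), new = boot works (1).
    if container_boots; then verdict=1; else verdict=0; fi
    # Drop the uncommitted cherry-pick before git moves on.
    git reset --hard -q
    return "$verdict"
}
```

Saved as a script that ends by calling bisect_step and exiting with its status, this could be passed to "git bisect run" after the "git bisect old" and "git bisect new" commands above.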

That brings me to:
commit ee94743034bfb443cf246eda4971bdc15d8ee066
Author: Alex Bennée <email address hidden>
Date: Wed May 13 18:51:28 2020 +0100

    linux-user: completely re-write init_guest_space

With the preceding commit, aae8b87e "travis.yml: Improve the --disable-tcg test on s390x", plus 65b261a63a, the test fails, so ee947430 "linux-user: completely re-write init_guest_space" really is a hard dependency for the fix to work.

Trying v5.0 + ee947430 + 65b261a63a works as well.
So that is the fix for groovy - and being unreleased that seems fine for there.
But given the size of the patch I'll need to re-read it a few times if that is SRUable to Focal as well.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

@Ryutaroh - yes, the package updates these on install/upgrade.
But I needed something to do the same during the bisect - well, it worked and I found what is needed.

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

There are a bunch of Fixes for this commit in 5.1 and some dependencies to those.
These we need to pull in as well here - which makes it even less SRUable :-/

I guess we will fix it in Groovy (20.10), and users that rely on this valid but rare use case can use Groovy or, later on, the Ubuntu Cloud Archive [1], which will backport the qemu of Groovy together with OpenStack Victoria.

commit 2667e069e7b5807c69f32109d930967bc1b222cb
Author: Alex Bennée <email address hidden>
Date: Fri Jul 24 07:45:01 2020 +0100

    linux-user: don't use MAP_FIXED in pgd_find_hole_fallback

commit c1f6ad798c7bb328a6f387f2509bf86305383d37
Author: Alex Bennée <email address hidden>
Date: Wed Jul 1 14:56:45 2020 +0100

    linux-user/elfload: use MAP_FIXED_NOREPLACE in pgb_reserved_va

commit 5c3e87f345ac93de9260f12c408d2afd87a6ab3b
Author: Alex Bennée <email address hidden>
Date: Fri Jun 5 16:49:27 2020 +0100

    linux-user: deal with address wrap for ARM_COMMPAGE on 32 bit

commit ad592e37dfccf730378a44c5fa79acb603a7678d
Author: Alex Bennée <email address hidden>
Date: Fri Jun 5 16:49:26 2020 +0100

    linux-user: provide fallback pgd_find_hole for bare chroots

commit a932eec49d9ec106c7952314ad1adc28f0986076
Author: Alex Bennée <email address hidden>
Date: Thu May 21 14:57:48 2020 +0100

    linux-user: limit check to HOST_LONG_BITS < TARGET_ABI_BITS

[1]: https://wiki.ubuntu.com/OpenStack/CloudArchive
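
One way to look for follow-up commits like the ones listed above is to search commit messages for the id of the original commit, since upstream fixes often carry a "Fixes:" line naming it. The sketch below is not necessarily how the list above was assembled. The function name list_followups is made up, and the search will miss dependencies that do not mention the commit id at all.

```shell
#!/bin/sh
# Hypothetical helper: list commits in RANGE whose message mentions
# COMMIT, oldest first. Usage: list_followups COMMIT RANGE
list_followups() {
    commit=$1
    range=$2
    # Search for a 10-character prefix so abbreviated ids match too.
    short=$(printf '%.10s' "$commit")
    git log --oneline --reverse --grep="$short" "$range"
}
```

For example, in a QEMU checkout: `list_followups ee94743034bfb443cf246eda4971bdc15d8ee066 v5.0.0..v5.1.0`.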

Changed in qemu (Ubuntu Focal):
status: Confirmed → Won't Fix
Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

FYI - Tried on the proposed PPA, working now

Revision history for this message
Ryutaroh Matsumoto (emojifreak) wrote :

Thank you very much.
qemu-user-static 1:5.0-5ubuntu5~ppa2
also works fine for me!

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package qemu - 1:5.0-5ubuntu6

---------------
qemu (1:5.0-5ubuntu6) groovy; urgency=medium

  * d/p/ubuntu/lp-1887935-vfio-ccw-allow-non-prefetch-ORBs.patch: fix boot
    from vfio-ccw (LP: #1887935)

qemu (1:5.0-5ubuntu5) groovy; urgency=medium

  * fix qemu-user-static initialization to allow executing systemd
    (LP: #1890881)
    - d/p/u/lp1890881-linux-user-completely-re-write-init_guest_space.patch
    - d/p/u/lp1890881-linux-user-deal-with-address-wrap-for-ARM_COMMPAGE-o.patch
    - d/p/u/lp1890881-linux-user-don-t-use-MAP_FIXED-in-pgd_find_hole_fall.patch
    - d/p/u/lp1890881-linux-user-elfload-use-MAP_FIXED_NOREPLACE-in-pgb_re.patch
    - d/p/u/lp1890881-linux-user-limit-check-to-HOST_LONG_BITS-TARGET_ABI_.patch
    - d/p/u/lp1890881-linux-user-provide-fallback-pgd_find_hole-for-bare-c.patch
  * fix assertion failue in net_tx_pkt_add_raw_fragment (LP: #1891187)
    CVE-2020-16092
    - d/p/u/lp-1891187-hw-net-net_tx_pkt-fix-assertion-failure-in-net_tx.patch

 -- Christian Ehrhardt <email address hidden> Tue, 25 Aug 2020 11:09:12 +0200

Changed in qemu (Ubuntu):
status: Triaged → Fix Released
Changed in qemu (Debian):
status: Unknown → Fix Released