armhf/armel cmake hangs when run with qemu-arm-static on amd64 host

Bug #1764555 reported by David Lechner on 2018-04-16
This bug affects 1 person
Affects Status Importance Assigned to Milestone
qemu (Ubuntu)

Bug Description

This is essentially a duplicate, at least symptom-wise, of #955379, but I am opening a new issue since that one was marked as fixed.

I have a script that uses pbuilder to build packages for Debian armel and armhf targets; it runs on an Ubuntu bionic amd64 host. Packages that use CMake for building tend to hang when searching for libraries. The hang is not always in the same place, though, which suggests a race condition. The larger the package being built, the more likely a hang is. In particular, the opencv package is large enough that it hangs every time and I cannot get it to build.

Steps to reproduce:

# on ubuntu bionic amd64 host
sudo apt-add-repository ppa:ev3dev/tools
# assuming apt-add-repository does apt update now
sudo apt install pbuilder-ev3dev git
git clone --depth=1
cd opencv
OS=debian ARCH=armhf DIST=stretch pbuilder-ev3dev base
OS=debian ARCH=armhf DIST=stretch pbuilder-ev3dev dev-build

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: qemu-user-static 1:2.11+dfsg-1ubuntu6
ProcVersionSignature: Ubuntu 4.15.0-13.14-generic 4.15.10
Uname: Linux 4.15.0-13-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.9-0ubuntu4
Architecture: amd64
CurrentDesktop: GNOME
Date: Mon Apr 16 17:35:27 2018
InstallationDate: Installed on 2013-05-13 (1799 days ago)
InstallationMedia: This
MachineType: Gigabyte Technology Co., Ltd. To be filled by O.E.M.
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-13-generic root=UUID=5ce09130-6a2a-4c3a-b11e-c1923a6bb767 ro acpi_enforce_resources=lax quiet splash
SourcePackage: qemu
UpgradeStatus: No upgrade log present (probably fresh install) 07/13/2017
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: F1
dmi.board.asset.tag: To be filled by O.E.M. 970A-DS3P FX
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrF1:bd07/13/2017:svnGigabyteTechnologyCo.,Ltd.:pnTobefilledbyO.E.M.:pvrTobefilledbyO.E.M.:rvnGigabyteTechnologyCo.,Ltd.:rn970A-DS3PFX:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvrToBeFilledByO.E.M.: To be filled by O.E.M. To be filled by O.E.M.
dmi.product.version: To be filled by O.E.M.
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

David Lechner (dlech) wrote :

FWIW, I may have noticed this issue a few times when I was running xenial, but it happened so seldom that I never really took note. On bionic, however, building the same packages with the exact same script, the hang happens much more frequently.

Note: spin-off of old bug 955379

The repro fails, complaining about an invalid treeish even though it is in the repo you mentioned.

# OS=debian ARCH=armhf DIST=stretch pbuilder-ev3dev dev-build
gbp:info: Building with pbuilder for stretch:armhf
gbp:error: upstream/3.2.0+dfsg is not a valid treeish

I was trying to run that on qemu 2.11.1 and 2.12 as I'm pretty sure - just like the older bug - this is an upstream and not an Ubuntu specific issue.
But without being able to verify that I can't yet reasonably add the upstream qemu task.

Can you by any chance build qemu from git and try with that?

Changed in qemu (Ubuntu):
status: New → Incomplete
David Lechner (dlech) wrote :

> gbp:error: upstream/3.2.0+dfsg is not a valid treeish

If you used --depth=1 as I suggested, this is probably causing the problem. I just suggested this since the repo is about 300MB. I think you can fix it with `git pull --unshallow`.
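For reference, a minimal shell sketch of that check: it tests whether the tag gbp asks for resolves to a commit, and only fetches the full history if the clone is shallow. The function names `has_treeish` and `ensure_treeish` are made up for illustration, and the remote is assumed to be called `origin`.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, remote assumed to be "origin".

# Succeed if the given ref resolves to a commit in the current repository.
has_treeish() {
    git rev-parse --verify --quiet "$1^{commit}" >/dev/null 2>&1
}

# If the ref is missing and the clone is shallow, fetch the full history
# and all tags, then check again.
ensure_treeish() {
    if has_treeish "$1"; then
        return 0
    fi
    if [ "$(git rev-parse --is-shallow-repository)" = "true" ]; then
        git fetch --quiet --unshallow --tags origin
    fi
    has_treeish "$1"
}
```

Run from inside the clone, for example `ensure_treeish upstream/3.2.0+dfsg`. Note that `git rev-parse --is-shallow-repository` needs git 2.15 or newer.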

I'll see if I can make some time to build and test upstream qemu.

David Lechner (dlech) wrote :

I'm having trouble building static qemu from git. I did `./configure --static --enable-debug` and get the following error:

  LINK aarch64-softmmu/qemu-system-aarch64
/usr/bin/x86_64-linux-gnu-ld: cannot find -lcacard
/usr/bin/x86_64-linux-gnu-ld: cannot find -lcacard
/usr/bin/x86_64-linux-gnu-ld: cannot find -lusbredirparser
/usr/bin/x86_64-linux-gnu-ld: cannot find -ludev
/usr/bin/x86_64-linux-gnu-ld: cannot find -lgtk-3
/usr/bin/x86_64-linux-gnu-ld: cannot find -latk-bridge-2.0
/usr/bin/x86_64-linux-gnu-ld: cannot find -latspi
/usr/bin/x86_64-linux-gnu-ld: cannot find -lsystemd
/usr/bin/x86_64-linux-gnu-ld: cannot find -lgdk-3
/usr/bin/x86_64-linux-gnu-ld: cannot find -lwayland-egl
/usr/bin/x86_64-linux-gnu-ld: cannot find -lepoxy
/usr/bin/x86_64-linux-gnu-ld: cannot find -lgraphite2
collect2: error: ld returned 1 exit status
Makefile:193: recipe for target 'qemu-system-aarch64' failed
make[1]: *** [qemu-system-aarch64] Error 1
Makefile:478: recipe for target 'subdir-aarch64-softmmu' failed
make: *** [subdir-aarch64-softmmu] Error 2

FWIW, qemu built fine without the --static option, but I need the static version to put into the chroot for testing.

It will need all the dev libs I assume.
Maybe just this:
 $ sudo apt build-dep qemu

Also, configure enables "too much" by default. For comparison, this is from the last static build that is in Ubuntu:

../configure --with-pkgversion="Debian 1:2.11+dfsg-1ubuntu6" --extra-cflags="-g -O2 -fdebug-prefix-map=/<<BUILDDIR>>/qemu-2.11+dfsg=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -DCONFIG_QEMU_DATAPATH='\"/usr/share/qemu:/usr/share/seabios:/usr/lib/ipxe/qemu\"' -DVENDOR_UBUNTU" --extra-ldflags="-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,--as-needed" --prefix=/usr --sysconfdir=/etc --libdir=/usr/lib/x86_64-linux-gnu --libexecdir=/usr/lib/qemu --localstatedir=/var --disable-blobs --disable-strip --interp-prefix=/etc/qemu-binfmt/%M --localstatedir=/var
 --static --disable-system
 --target-list="i386-linux-user x86_64-linux-user alpha-linux-user aarch64-linux-user arm-linux-user armeb-linux-user cris-linux-user hppa-linux-user m68k-linux-user microblaze-linux-user microblazeel-linux-user mips-linux-user mipsel-linux-user mips64-linux-user mips64el-linux-user mipsn32-linux-user mipsn32el-linux-user nios2-linux-user or1k-linux-user ppc-linux-user ppc64-linux-user ppc64abi32-linux-user ppc64le-linux-user sh4-linux-user sh4eb-linux-user sparc-linux-user sparc64-linux-user sparc32plus-linux-user s390x-linux-user tilegx-linux-user"

Obviously you can strip targets and dirs for your case, so maybe the following after having the full build deps?

../configure --extra-cflags="-g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -DCONFIG_QEMU_DATAPATH='\"/usr/share/qemu:/usr/share/seabios:/usr/lib/ipxe/qemu\"'" --extra-ldflags="-Wl,-Bsymbolic-functions -Wl,-z,relro -Wl,--as-needed" --prefix=/usr --sysconfdir=/etc --libdir=/usr/lib/x86_64-linux-gnu --libexecdir=/usr/lib/qemu --localstatedir=/var --disable-blobs --disable-strip --interp-prefix=/etc/qemu-binfmt/%M --localstatedir=/var \
 --static --disable-system \
 --target-list="aarch64-linux-user arm-linux-user armeb-linux-user"

Feel free to look through [1] for more.

On the retry to build on my side - I used depth=1
Here from my history file:
  304 sudo apt-add-repository ppa:ev3dev/tools
  305 sudo apt install pbuilder-ev3dev git
  306 apt install pbuilder-ev3dev git
  307 git clone --depth=1
  308 cd opencv/
  309 OS=debian ARCH=armhf DIST=stretch pbuilder-ev3dev base
  310 echo $?
  311 OS=debian ARCH=armhf DIST=stretch pbuilder-ev3dev dev-build

Anyway, if we can get new qemu working for you that would be even better.


David Lechner (dlech) wrote :

What I was trying to say is do NOT use --depth=1. It was bad advice. It did not occur to me that you would need more than one commit, but you do.

I'll keep trying as well.

David Lechner (dlech) wrote :

I managed to get it to build:

./configure --static --enable-debug --target-list=arm-linux-user

Then, I ran my pbuilder script again. It hung in cmake as usual. Here are the stack traces from gdb:

(gdb) bt
#0 0x000000006007b5c2 in process_pending_signals (cpu_env=0x635f6960)
    at /home/david/work/qemu/linux-user/signal.c:7406
#1 0x000000006004fabb in cpu_loop (env=0x635f6960)
    at /home/david/work/qemu/linux-user/main.c:798
#2 0x0000000060051114 in main (argc=52, argv=0x7ffc589855b8,
    envp=0x7ffc58985760) at /home/david/work/qemu/linux-user/main.c:5147
(gdb) info threads
  Id Target Id Frame
* 1 Thread 0x635c2940 (LWP 3644) "cmake" 0x000000006007b5c2 in process_pending_signals (cpu_env=0x635f6960)
    at /home/david/work/qemu/linux-user/signal.c:7406
  2 Thread 0x7ff9644a4700 (LWP 3646) "cmake" 0x0000000060321059 in syscall
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ff9644a4700 (LWP 3646))]
#0 0x0000000060321059 in syscall ()
(gdb) bt
#0 0x0000000060321059 in syscall ()
#1 0x000000006016456d in qemu_futex_wait (
    f=0x628562d8 <rcu_call_ready_event>, val=4294967295)
    at /home/david/work/qemu/include/qemu/futex.h:29
#2 0x0000000060164734 in qemu_event_wait (
    ev=0x628562d8 <rcu_call_ready_event>)
    at /home/david/work/qemu/util/qemu-thread-posix.c:445
#3 0x000000006016cdca in call_rcu_thread (opaque=0x0)
    at /home/david/work/qemu/util/rcu.c:261
#4 0x00000000602911db in start_thread (arg=0x7ff9644a4700)
    at pthread_create.c:463
#5 0x0000000060322e3f in clone ()

David Lechner (dlech) wrote :

Oops, ignore the previous comment. CMake was actually not stuck, it was just a *really* long running operation. I should have been suspicious since it did not get "stuck" in the usual place.

David Lechner (dlech) wrote :

Well, this is embarrassing... I just realized that my chroot had a copy of qemu-arm-static in it from before I upgraded to bionic and that binfmt_misc uses that copy instead of the host system copy. I wiped out the chroot and rebuilt it so that the new qemu-arm-static from bionic is included in the chroot. Running pbuilder no longer hangs.
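For anyone hitting the same thing, a rough sketch of how the stale copy could be spotted. It assumes the binfmt_misc entry is named `qemu-arm` (check `ls /proc/sys/fs/binfmt_misc/` on your system) and that the entry was registered without the `F` (fix binary) flag, in which case the kernel resolves the interpreter path inside the chroot at exec time. The function names and the chroot path in the usage line are hypothetical.

```shell
#!/bin/sh
# Sketch only: function names and default entry path are assumptions.

# Print the interpreter path recorded in a binfmt_misc entry file.
# The entry file contains a line such as "interpreter /usr/bin/qemu-arm-static".
binfmt_interpreter() {
    sed -n 's/^interpreter //p' "$1"
}

# Succeed only if both files exist and are byte-for-byte identical.
same_binary() {
    [ -f "$1" ] && [ -f "$2" ] && cmp -s "$1" "$2"
}

# Compare the host interpreter with the copy at the same path in the chroot.
# $1: chroot directory, $2: binfmt_misc entry file (optional).
check_chroot_qemu() {
    chroot_dir=$1
    entry=${2:-/proc/sys/fs/binfmt_misc/qemu-arm}
    interp=$(binfmt_interpreter "$entry" 2>/dev/null) || interp=""
    if [ -z "$interp" ]; then
        echo "no interpreter found in $entry"
        return 2
    fi
    if same_binary "$interp" "$chroot_dir$interp"; then
        echo "chroot copy matches host: $interp"
    else
        echo "chroot copy differs from host or is missing: $chroot_dir$interp"
        return 1
    fi
}
```

Usage would look like `check_chroot_qemu /path/to/chroot`; a non-zero exit status means the chroot copy should be refreshed from the host.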

Changed in qemu (Ubuntu):
status: Incomplete → Invalid

> my chroot had a copy of qemu-arm-static in it from before I upgraded to bionic and that binfmt_misc uses that copy instead of the host system copy.

Hmm, I wouldn't have expected it to use the internal one - nice catch - I'm glad it no longer hangs for you!
