qemu-debootstrap second stage hangs indefinitely

Bug #1712534 reported by Juerg Haefliger
This bug affects 1 person
Affects                   Status   Importance  Assigned to  Milestone
Ubuntu on IBM z Systems   Invalid  Low         Unassigned
qemu (Ubuntu)             Invalid  Low         Unassigned

Bug Description

I'm trying to build a foreign-architecture chroot (on artful) using qemu-debootstrap, but the result is an indefinite hang during the second stage.

To reproduce:

$ sudo qemu-debootstrap --arch=s390x xenial /tmp/chroot-xenial
<snip>
I: Extracting perl-base...
I: Extracting procps...
I: Extracting sed...
I: Extracting sensible-utils...
I: Extracting systemd...
I: Extracting systemd-sysv...
I: Extracting sysv-rc...
I: Extracting sysvinit-utils...
I: Extracting tar...
I: Extracting tzdata...
I: Extracting util-linux...
I: Extracting zlib1g...
I: Running command: chroot /tmp/chroot-xenial /debootstrap/debootstrap --second-stage
<hang>

Package versions:

$ dpkg -l qemu-user-static debootstrap
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==================-==============-==============-=========================================
ii debootstrap 1.0.91ubuntu1 all Bootstrap a basic Debian system
ii qemu-user-static 1:2.10~rc3+dfs amd64 QEMU user mode emulation binaries (static

Juerg Haefliger (juergh)
description: updated
Christian Ehrhardt  (paelzer) wrote :

Hi Jürg,
I've never tried that, as I always had real s390x systems.
I need time to reproduce and analyze this properly.

But in the meantime, this sounded vaguely familiar, and I remember bug 1643619.
Could you check whether it looks like the same issue to you (just for this architecture)?

Christian Ehrhardt  (paelzer) wrote :

Emulating s390x on other architectures (cross s390x) has always been kept as unsupported as possible, to avoid people migrating off the real hardware.
I've never seen it work, but let me try your case.

A little matrix (rows: host, columns: guest):

Host \ Guest  amd64   s390x  ppc64el
amd64         Y       N      Y
s390x         Y       Y      slow-Y
ppc64el       slow-Y  N      Y

I think it is just not meant to fully work.
It is a QEMU crash after all, so I could take a look, but knowing some of the history of how much a well-working s390x-on-x86 emulation is "wanted", this is priority < low.

On the other hand, your team has a set of real s390x KVM and z/VM systems, so could you use some of those for your testing?

Changed in qemu (Ubuntu):
status: New → Confirmed
importance: Undecided → Low
Christian Ehrhardt  (paelzer) wrote :

I asked for this to be mirrored back to IBM, to find out whether this is even supposed to work at the moment.

Juerg Haefliger (juergh) wrote :

Did some more testing and it doesn't seem to be a QEMU problem. I have an s390x VM and if I mount that image I can chroot into it just fine (after copying qemu-s390x-static).

strace shows that chroot is hanging on a futex:

rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], ~[BUS KILL SEGV STOP], 8) = 0
getpid() = 19585
gettid() = 19585
tgkill(19585, 19585, SIGABRT) = 0
rt_sigprocmask(SIG_SETMASK, ~[BUS KILL SEGV STOP], NULL, 8) = 0
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
futex(0x604eff00, FUTEX_WAIT_PRIVATE, 2, NULL

Not sure what's going on. As a test, I copied /lib and /bin from the VM image into the chroot and after that I was able to chroot into the chroot dir just fine. Any ideas?

Btw, this process (using qemu-debootstrap) works just fine for building an arm64 chroot (on an amd64 host).

Juerg Haefliger (juergh) wrote :

Correction: That is actually a Debian VM image, not Ubuntu.

Christian Ehrhardt  (paelzer) wrote :

Hmm, so the created chroot environment hangs on a native s390x: interesting.
Thanks for the cross-check, Jürg.

Regarding the arm64 note: yes, it works for many other architecture combinations, as seen in my (rather unreadable) matrix in comment #2.

I'm quite puzzled, as I got it working on an s390x host by just running qemu-debootstrap (which falls back to not needing QEMU, since host = guest).

In the case where you tried on s390x and the chroot hung: did you create the chroot on x86 and then copy it over?
Or did you run qemu-debootstrap on s390x?
Did you run it on the (older?) Debian VM there?

On the other hand, the strace doesn't ring a bell as to what it could be.
But if it really is 100% reproducible on the s390x host,
and exchanging /lib and /bin really unblocks it, how about copying the files over in a loop and testing until it succeeds?

You said you copied from a Debian VM into the chroot; would you mind trying that with an Ubuntu VM?
And, if reasonable/possible, copying the files in a loop to find the file after which it suddenly starts to work?
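Christian's suggestion of copying the files over one at a time and re-testing after each copy can be scripted. Below is a minimal Python sketch, assuming it runs as root, that the working VM image is mounted somewhere as the source tree, and that a hang shows up as `chroot <dir> /bin/true` not finishing within a timeout. The helper names `find_unblocking_file` and `chroot_works`, the `/bin/true` probe and the 10-second timeout are hypothetical choices, not taken from the report.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Callable, Iterable, Optional


def chroot_works(root: Path, timeout: float = 10.0) -> bool:
    """Hypothetical probe: run /bin/true inside the chroot (requires root).

    A hang is treated as a failure once the timeout expires; subprocess.run
    kills the child process in that case.
    """
    try:
        subprocess.run(["chroot", str(root), "/bin/true"], timeout=timeout, check=True)
    except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
        return False
    return True


def find_unblocking_file(
    files: Iterable[Path],
    src_root: Path,
    dst_root: Path,
    works: Callable[[], bool],
) -> Optional[Path]:
    """Copy files (paths relative to both roots) one at a time from src_root
    into dst_root, and return the first file after which works() succeeds.

    Returns None if the chroot still fails after every file was copied.
    """
    for rel in files:
        src, dst = src_root / rel, dst_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        if dst.is_symlink() or dst.exists():
            dst.unlink()  # replace the chroot's own copy of this file
        # Keep symlinks (e.g. loader and library links) as links, not copies.
        shutil.copy2(src, dst, follow_symlinks=False)
        if works():
            return rel
    return None
```

The probe is passed in as `works` (for example `lambda: chroot_works(chroot_dir)`), which keeps the copy loop itself testable without root or a real chroot.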

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-158057 severity-high targetmilestone-inin1604
bugproxy (bugproxy)
tags: added: targetmilestone-inin1710
removed: targetmilestone-inin1604
Juerg Haefliger (juergh) wrote :

Sorry for the confusion: I did *not* try this on an s390x host. All my testing is done on an amd64 host. I'll do some more digging before I post more results/data, and this time I'll try to do it before I'm tired, so that my comments (hopefully) make more sense :-P

bugproxy (bugproxy)
tags: added: severity-low
removed: severity-high
Juerg Haefliger (juergh) wrote :

More test results:

1) Copying qemu-s390x-static from artful into a Debian chroot -> chroot works fine.
2) Copying qemu-s390x-static from artful into an artful chroot -> chroot hangs (default case).
3) Copying qemu-s390x-static from xenial into an artful chroot -> chroot aborts (core dumped).

Since this works with a Debian chroot, it looks like QEMU is not the problem here.

Also, entering the artful chroot as follows works:
  $ chroot artful-s390x /bin/static-sh

static-sh uses a statically linked busybox. But as soon as I enter any other command, the chroot hangs again. This looks like a libc issue now.
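The observation above hinges on the difference between statically linked binaries (the busybox behind /bin/static-sh) and dynamically linked ones that go through the glibc loader. As a sketch, assuming 64-bit ELF files, one way to check which kind a chroot binary is, and that it really targets s390x, is to read its ELF header: an e_machine of 22 is EM_S390, and a PT_INTERP program header (type 3) means the binary requests a dynamic loader. The function name `inspect_elf` is hypothetical; the standard `file` command reports the same static/dynamic distinction.

```python
import struct

EM_S390 = 22    # ELF e_machine value used by s390 and s390x
PT_INTERP = 3   # program header type that names the dynamic loader


def inspect_elf(data: bytes) -> dict:
    """Return whether a 64-bit ELF image targets s390x and needs a dynamic loader."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2:  # EI_CLASS: 2 = ELFCLASS64
        raise ValueError("only 64-bit ELF is handled in this sketch")
    # EI_DATA: 1 = little-endian (e.g. amd64), 2 = big-endian (s390x)
    order = "<" if data[5] == 1 else ">"
    (e_machine,) = struct.unpack_from(order + "H", data, 18)
    (e_phoff,) = struct.unpack_from(order + "Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from(order + "HH", data, 54)
    dynamic = False
    for i in range(e_phnum):
        # p_type is the first 32-bit field of each program header entry
        (p_type,) = struct.unpack_from(order + "I", data, e_phoff + i * e_phentsize)
        if p_type == PT_INTERP:
            dynamic = True
            break
    return {"s390x": e_machine == EM_S390, "dynamic": dynamic}


if __name__ == "__main__":
    import sys

    # Usage: python3 inspect_elf.py artful-s390x/bin/static-sh artful-s390x/bin/bash
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, inspect_elf(f.read()))
```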

Juerg Haefliger (juergh) wrote :

Attached gdb to the hanging qemu-s390x-static process:

(gdb) info threads
  Id Target Id Frame
* 1 Thread 0x63a43900 (LWP 8276) "bash" __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
  2 Thread 0x7fe82c80f700 (LWP 8277) "bash" 0x000000006017fd09 in syscall ()

(gdb) bt
#0 __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1 0x0000000060111c0d in __pthread_mutex_lock (mutex=mutex@entry=0x604eff00 <tcg_ctx+288>) at ../nptl/pthread_mutex_lock.c:80
#2 0x00000000600c46da in qemu_mutex_lock (mutex=mutex@entry=0x604eff00 <tcg_ctx+288>) at ./util/qemu-thread-posix.c:65
#3 0x0000000060032ce3 in tb_lock () at ./accel/tcg/translate-all.c:170
#4 cpu_restore_state (cpu=cpu@entry=0x63a59060, retaddr=retaddr@entry=1611863860) at ./accel/tcg/translate-all.c:353
#5 0x0000000060031078 in handle_cpu_signal (old_set=0x7ffeb1cdef68, is_write=<optimized out>, address=<optimized out>, pc=1611863858) at ./user-exec.c:124
#6 cpu_s390x_signal_handler (host_signum=<optimized out>, pinfo=pinfo@entry=0x7ffeb1cdef70, puc=0x7ffeb1cdee40) at ./user-exec.c:229
#7 0x000000006004d341 in host_signal_handler (host_signum=11, info=0x7ffeb1cdef70, puc=0x7ffeb1cdee40) at ./linux-user/signal.c:646
#8 <signal handler called>
#9 0x0000000060131732 in abort ()
#10 0x0000000060058cdd in op_risbg (s=<optimized out>, o=0x7ffeb1cdf690) at ./target/s390x/translate.c:3390
#11 0x0000000060062bf9 in translate_one (env=<optimized out>, s=0x7ffeb1cdf6c0) at ./target/s390x/translate.c:5750
#12 gen_intermediate_code (cs=cs@entry=0x63a59060, tb=tb@entry=0x6050fa40 <static_code_gen_buffer+43376>) at ./target/s390x/translate.c:5851
#13 0x0000000060032f1f in tb_gen_code (cpu=cpu@entry=0x63a59060, pc=pc@entry=274886346830, cs_base=cs_base@entry=0, flags=flags@entry=3, cflags=<optimized out>,
    cflags@entry=0) at ./accel/tcg/translate-all.c:1283
#14 0x0000000060031f79 in tb_find (tb_exit=0, last_tb=0x0, cpu=0x0) at ./accel/tcg/cpu-exec.c:367
#15 cpu_exec (cpu=cpu@entry=0x63a59060) at ./accel/tcg/cpu-exec.c:675
#16 0x0000000060034470 in cpu_loop (env=env@entry=0x63a612f0) at ./linux-user/main.c:3236
#17 0x000000006000145b in main (argc=<optimized out>, argv=0x7ffeb1ce00b8, envp=<optimized out>) at ./linux-user/main.c:4862

(gdb) frame 1
#1 0x0000000060111c0d in __pthread_mutex_lock (mutex=mutex@entry=0x604eff00 <tcg_ctx+288>) at ../nptl/pthread_mutex_lock.c:80
80 ../nptl/pthread_mutex_lock.c: No such file or directory.

(gdb) print mutex.__data
$3 = {__lock = 2, __count = 0, __owner = 8276, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}

Hmm, thread 1 is waiting on a lock that is owned by ... thread 1. Not good.
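The backtrace suggests how this self-deadlock arises: frames #14 to #9 are translating s390x code (tb_find, tb_gen_code, then op_risbg) and reach abort(); the resulting signal is handled on the same thread by QEMU's host signal handler (frame #7, host_signum=11), whose cpu_restore_state path calls tb_lock. The mutex dump shows __owner = 8276, i.e. thread 1 already holds that lock, and __kind = 0 is the default non-recursive mutex, so the request can never be granted. The Python sketch below only models that pattern; the function names are hypothetical stand-ins, not QEMU code, and a timeout replaces the infinite wait so the self-deadlock is observable.

```python
import threading

# Hypothetical stand-in for QEMU's tb_lock: a plain, non-recursive mutex,
# like the default-kind pthread mutex inspected in gdb (__kind = 0).
tb_lock = threading.Lock()


def fault_handler(timeout: float = 0.1) -> bool:
    """Model the signal-handler path (cpu_restore_state -> tb_lock).

    A blocking acquire would wait forever, which is the reported hang;
    the timeout only exists so this sketch terminates. Returns True if
    the lock could be taken.
    """
    acquired = tb_lock.acquire(timeout=timeout)
    if acquired:
        tb_lock.release()
    return acquired


def translate_block() -> bool:
    """Model tb_find/tb_gen_code: translation runs with tb_lock held, and
    the failure is handled on the same thread before the lock is released."""
    with tb_lock:
        return fault_handler()


print(translate_block())  # False: the owning thread cannot re-take the lock
print(fault_handler())    # True: once released, the lock is free again
```

Swapping in threading.RLock would make the first call succeed; that contrast is only illustrative and says nothing about how QEMU handles tb_lock.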

Dimitri John Ledkov (xnox) wrote :

Ubuntu targets zEC12+, which QEMU cannot emulate on non-s390x hardware.
Debian has a lower ABI level.
qemu-debootstrap is not expected to work for creating Ubuntu s390x chroots on non-s390x hosts.
I believe all of the above is (non-)working as expected.

Changed in qemu (Ubuntu):
status: Confirmed → Invalid
Christian Ehrhardt  (paelzer) wrote :

Thanks for confirming my expectation, xnox!
Good catch on the ABI: I didn't realize we were higher than Debian and thought they had also moved, but that totally makes sense.

Juerg Haefliger (juergh) wrote :

I don't understand. I can execute a static s390x busybox binary just fine using qemu-s390x-static. Things fall over as soon as glibc gets involved. Which makes me think it's a glibc issue and not a qemu problem. What am I missing?

Juerg Haefliger (juergh) wrote :

Ok, so I think I finally understand what's going on. Ubuntu is compiling the s390x binaries for an architecture that is too new for QEMU to emulate. This is not expected to work, so closing the ticket.

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-09-01 03:17 EDT-------
IBM Bugzilla Status -> Rejected. See last comment.

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Invalid
importance: Undecided → Low