focal-live-server install fails on s390x (serial: >=20200327)

Bug #1869402 reported by Paride Legovini
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Ubuntu on IBM z Systems
Fix Released
Undecided
Unassigned
curtin
Invalid
Undecided
Unassigned
subiquity
Fix Released
Undecided
Unassigned

Bug Description

Starting from the image with serial 20200327 the focal-live-server image is not installable anymore in s390x. The problem is storage related, and on 2020-03-27 subiquity changed the curtin snapshot it imports in the snap [1] to bf03e4f78744449a7dc45f4cff2134ef3b989a3b, which is a refactor of the storage config, so that's where I'd look. The arm64 image is failing too, but I didn't check if the failure mode is the same (yet).

It seems that despite the logs not logging a failure on:

Running command ['mount', '-t', 'ext4', '-o', 'defaults', '/dev/vda1', '/target/'] with allowed return codes [0] (capture=True)

/target is not mounted during the installation. Attached are the full logs.

[1] https://github.com/CanonicalLtd/subiquity/commit/c87e8a15d6d988021c175afc26ca8442daa38cf3

Revision history for this message
Paride Legovini (paride) wrote :
Paride Legovini (paride)
tags: added: rls-ff-incoming
Revision history for this message
Ryan Harper (raharper) wrote :

Mar 27 17:02:11 ubuntu-server curtin_log.1819[1946]: Running command ['mount', '-t', 'ext4', '-o', 'defaults', '/dev/vda1', '/target/'] with allowed return codes [0] (capture=True)
Mar 27 17:02:11 ubuntu-server curtin_log.1819[1946]: Running command ['udevadm', 'info', '--query=property', '--export', '/dev/vda1'] with allowed return codes [0] (capture=True)
Mar 27 17:02:11 ubuntu-server kernel: [ 88.992774] EXT4-fs (vda1): mounted filesystem with ordered data mode. Opts: (null)

Curtin has clearly mounted it to /target

The only thing we don't know is what the state of the source dir is:

start: cmd-install/stage-extract/builtin/cmd-extract: acquiring and extracting image from cp:///media/filesystem

Running command ['sh', '-c', 'mkdir -p "$2" && cd "$2" && rsync -aXHAS --one-file-system "$1/" .', '--', '/media/filesystem', '/target']

so; if possible, ls -al /media/filesystem ?

There's also a error in the journal that crashed a python process

Mar 27 17:02:12 ubuntu-server kernel: [ 89.617245] report_user_fault: 3 callbacks suppressed
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617246] User process fault: interruption code 0001 ilc:1 in libstdc++.so.6.0.21[3ff7e530000+9000]
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617258] CPU: 0 PID: 1848 Comm: python3 Tainted: P O 5.4.0-18-generic #22-Ubuntu
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617259] Hardware name: IBM 2964 N63 400 (KVM/Linux)
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617260] User PSW : 0705200180000000 000003ff7e53252a
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617262] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:1 AS:0 CC:2 PM:0 RI:0 EA:3
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617263] User GPRS: 00000000000000c8 000003ff7e532528 000003ff6c0a76d0 000003ff6c0008d0
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617264] 000003ff7e6e3466 000003ff6c098a60 0000000000860c6e 000003ff6c09b3a0
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617265] 000003ff6c0896b8 000003ff6c0b50b8 0000000000000001 000003ff6c0896b0
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617266] 000003ff7e7d5518 000003ff6c0896b0 000003ff7e6e34ec 000003ff76ff8d58
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617274] User Code: 000003ff7e532524: 7e532500 au %f5,1280(%r3,%r2)
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617274] #000003ff7e532528: 0000 illegal
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617274] >000003ff7e53252a: 03ff unknown
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617274] 000003ff7e53252c: 7e4275a8 au %f4,1448(%r2,%r7)
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617274] 000003ff7e532530: 0000 illegal
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617274] 000003ff7e532532: 03ff unknown
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617274] 000003ff7e532534: 7e4275c0 au %f4,1472(%r2,%r7)
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617274] 000003ff7e532538: 0000 illegal
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617282] Last Breaking-Event-Address:
Mar 27 17:02:12 ubuntu-server kernel: [ 89.617285] [<000003ff7e6e34ea>] 0x3ff7e6e34ea

Revision history for this message
Ryan Harper (raharper) wrote :

I'm marking curtin task invalid. At this point, curtin logs show it doing everything it should; please reopen if you find out more information that points to curtin not doing the right thing.

Changed in curtin:
status: New → Invalid
Revision history for this message
Paride Legovini (paride) wrote :

Thanks for the feedback Ryan. This failure mode does not always happen, for example the latest ISO testing job on s390x succeeded, however I can easily reproduce it by attempting a manual install like this:

virt-install --name focal-test --memory 4096 --disk disk-image.img,bus=virtio,size=8 --cdrom /path/to/focal-live-server-s390x.iso

After subiquity detect the error, if I jump to a shell, /target is not mounted, however I think this is because subiquity unmounts it at some point:

DEBUG subiquitycore.utils:83 arun_command ['/snap/subiquity/1570/usr/bin/python3', '-m', 'curtin', 'unmount', '-t', '/target'] exited with code 0

The Python crash you pointed out seems to be the first point where the install process fails. This is from another run:

https://paste.ubuntu.com/p/6SJvCB3whx/

Revision history for this message
Paride Legovini (paride) wrote :

Still happens with the beta image; problem reported on the ISO tracker.

summary: - focal-live-server install fails on s390x (serial: 20200327)
+ focal-live-server install fails on s390x (serial: >=20200327)
Revision history for this message
Ubuntu QA Website (ubuntuqa) wrote :

This bug has been reported on the Ubuntu ISO testing tracker.

A list of all reports related to this bug can be found here:
http://iso.qa.ubuntu.com/qatracker/reports/bugs/1869402

tags: added: iso-testing
Revision history for this message
Ryan Harper (raharper) wrote :

Nothing in subiquity or curtin python can cause an illegal instruction
directly; it's byte-code running in python3. python3 is not linked to
libstdc++ so maybe one of the programs that curtin execs is linked to
libstdc++

It looks like sgdisk is linked to libstdc++; I don't know if this
is related or not. This likely needs some s390x library/kernel
debugging to get further.

Revision history for this message
Paride Legovini (paride) wrote :

The same crash was reported in LP: #1854965, but apparently it just went away at some point, the root cause was not found.

Frank Heimes (fheimes)
tags: added: installer s390x
Revision history for this message
Frank Heimes (fheimes) wrote :

Just tried this on a 20.04 kvm host with a 20.04 guest, but with first of all (manually: snap refresh) updating to edge (passed over: subiquity-channel=latest/edge/case), so that I used subiquity 20.03.3+git24.2ae75954 and I could complete the installation.
(Hoping that the issue you reported, Paride, is not happening sporadically ...)

Revision history for this message
Paride Legovini (paride) wrote :

Thanks Frank for testing this out. I just retried with the latest Focal live-server image (serial: 20200331), and hit the problem again. This on a Bionic host and installing with this command:

virt-install --name focal-test --memory 4096 --disk focal-test.img,bus=virtio,size=8 --cdrom /path/to/focal-live-server-s390x.iso

Found this in the kernel log again:

User process fault: interruption code 0001 ilc:1 in libstdc++.so.6.0.21[3ff88830000+9000]

Revision history for this message
Paride Legovini (paride) wrote :

I am not sure this happens all the time: the automated daily ISO tests do sometimes succeed, but this happens at least very, very often. I can't recall being able to finish a manual install since when I opened this bug.

Revision history for this message
Paride Legovini (paride) wrote :

Attaching /var/log and /var/crash of a failed install. In /var/crash we have:

1585753281.509275198.block_probe_fail.crash
1585753281.509275198.block_probe_fail.meta
1585753281.951209307.block_probe_fail.crash
1585753281.951209307.block_probe_fail.meta
1585753283.064648390.block_probe_fail.crash
1585753283.064648390.block_probe_fail.meta
1585753283.196457148.block_probe_fail.crash
1585753283.196457148.block_probe_fail.meta
1585753283.197583437.disk_probe_fail.crash
1585753283.197583437.disk_probe_fail.meta
1585753283.900408268.block_probe_fail.crash
1585753283.900408268.block_probe_fail.meta
1585753283.901388884.disk_probe_fail.crash
1585753283.901388884.disk_probe_fail.meta
1585753291.641262770.install_fail.crash
1585753291.641262770.install_fail.meta

There are similarities with:

https://bugs.launchpad.net/subiquity/+bug/1865027

Revision history for this message
Paride Legovini (paride) wrote :

I tested it only a couple of times, but subiquity edge does not appear to crash.

Revision history for this message
Joshua Powers (powersj) wrote :

Fixes are committed to edge, will be published to stable soon. Not sure of actual root cause, but not reproducing anymore.

Changed in ubuntu-z-systems:
status: New → Fix Committed
Changed in subiquity:
status: New → Fix Committed
Revision history for this message
Frank Heimes (fheimes) wrote :

I think this is fixed since quite a while.
I at least change the project part to Fix Released.

Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Changed in subiquity:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.