[Ubuntu 20.04] zdevs autoconfigured via DPM lead to crash

Bug #1919420 reported by bugproxy
This bug affects 1 person
Affects                  Status        Importance  Assigned to                 Milestone
Ubuntu on IBM z Systems  Fix Released  High        Canonical Foundations Team
subiquity                Fix Released  Undecided   Skipper Bug Screeners

Bug Description

I tried to install Ubuntu 20.04.2 from the ISO image "ubuntu-20.04.2-live-server-s390x.iso" available at http://cdimage.ubuntu.com/ubuntu/releases/20.04.2/release/ on an LPAR.

For this I used the "Load from ISO image" panel on the HMC with the
"/ubuntu.ins" file. I then used the manual network config dialog
to configure a qeth network device and loaded the ISO image from
the URL: http://cdimage.ubuntu.com/ubuntu/releases/20.04.2/release/ubuntu-20.04.2-live-server-s390x.iso

This all worked nicely (Thanks for bringing back the manual network
configuration!) and gave me a subiquity installer available via "ssh installer@<ip>" as well as on the "Integrated ASCII Console".

In the first installer step I chose "English" as the language,
then the next step is the keyboard layout but there the installer
immediately shows a "Sorry the installer encountered an internal error"
dialog (both via SSH and the integrated ASCII console).
The "Continue" button doesn't get me further and "Sent to Canonical"
then completely crashes the installer (which at least restarts).

I've used the same installation flow successfully with the
Ubuntu 20.10 ISO image. So I don't think I'm doing something
wrong. I also collected the crash log, which I've attached.
I think the relevant part here might be the following:

 2021-03-16 11:19:31,460 INFO subiquity:119 Arguments passed: ['/snap/subiquity/2278/usr/bin/subiquity']
 2021-03-16 11:19:31,460 DEBUG subiquitycore.screen:133 KDGKBTYPE failed OSError(25, 'Inappropriate ioctl for device')
 2021-03-16 11:19:31,461 DEBUG asyncio:54 Using selector: EpollSelector
 2021-03-16 11:19:31,464 DEBUG subiquity.signals:50 connect_signal: l10n:language-selected -> KeyboardController.language_selected
 2021-03-16 11:19:31,465 DEBUG subiquitycore.core:92 known signals: ['l10n:language-selected']
 2021-03-16 11:19:31,465 DEBUG subiquitycore.core:120 starting controllers
 2021-03-16 11:19:31,465 DEBUG subiquitycore.core:123 controllers started
 2021-03-16 11:19:31,465 DEBUG subiquitycore.screen:133 KDGKBTYPE failed OSError(25, 'Inappropriate ioctl for device')
 2021-03-16 11:19:31,466 DEBUG subiquity/Progress/_wait_status:107 start:
 2021-03-16 11:19:31,467 INFO subiquity/Welcome:107 start: starting UI
 2021-03-16 11:19:31,468 DEBUG subiquitycore.screen:133 KDGKBTYPE failed OSError(25, 'Inappropriate ioctl for device')
 2021-03-16 11:19:31,468 DEBUG subiquitycore.screen:133 KDGKBTYPE failed OSError(25, 'Inappropriate ioctl for device')
 2021-03-16 11:19:31,468 DEBUG subiquitycore.screen:133 KDGKBTYPE failed OSError(25, 'Inappropriate ioctl for device')
 2021-03-16 11:19:31,468 DEBUG subiquitycore.screen:133 KDGKBTYPE failed OSError(25, 'Inappropriate ioctl for device')
 2021-03-16 11:19:31,468 DEBUG subiquitycore.screen:133 KDGKBTYPE failed OSError(25, 'Inappropriate ioctl for device')

So I'm guessing the code is probing for a keyboard, but we don't
have one (or even USB), so something fails.

Revision history for this message
bugproxy (bugproxy) wrote : Installer Log

Default Comment by Bridge

tags: added: architecture-s39064 bugnameltc-192018 severity-high targetmilestone-inin20042
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
tags: added: s390x subiquity
tags: added: installer
removed: subiquity
affects: linux (Ubuntu) → subiquity
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Foundations Team (canonical-foundations)
summary: - On s390x interactive subiquity from the 20.04.2 ISO crashes on keyboard
- selection screen breaking install
+ [Ubuntu 20.04] On s390x interactive subiquity from the 20.04.2 ISO
+ crashes on keyboard selection screen breaking install
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
importance: Undecided → High
Revision history for this message
Frank Heimes (fheimes) wrote : Re: [Ubuntu 20.04] On s390x interactive subiquity from the 20.04.2 ISO crashes on keyboard selection screen breaking install

Hi, I'm not convinced that this problem is due to the keyboard change itself, since the installer performs different tasks asynchronously and in parallel.

I just started a 20.04.2 LPAR install (on a z13) and was able to change the keyboard to German and to "English (US)" without problems (and even back to "English (UK)"), both when using the Integrated ASCII Console and in a remote ssh installer session.

Btw. it should not be needed to change the keyboard if you perform the installation via a remote ssh connection, because the local keyboard layout is kept in this case.

For example, I have a German keyboard active on my local workstation. If I perform an installation over a remote ssh session and proceed with the default language and keyboard (which is "English (US)") until I reach a screen that lets me type characters (the first one is the Proxy screen), I can type, for example, umlauts, € or ß (not that it makes sense there, but it proves that with a remote installer connection via ssh, no keyboard change is needed).

So you should be able to perform an installation w/o having to change your keyboard layout in the installer.

Nevertheless, a crash happened in your case and we'll have a look.

Did you do the installation with the help of a remote ssh connection (which is recommended) or using the "Integrated ASCII Console"?
And did you do something 'out of band' in the installer shell?

In case you still have access to the system, could you attach the entire /var/log content (and /var/crash, although I guess the already attached file was the only one in /var/crash <in addition to its corresponding meta file>)?

Changed in ubuntu-z-systems:
status: New → Triaged
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-03-17 05:52 EDT-------
(In reply to comment #8)
> Hi, I'm not convinced that this problem is due to the keyboard change
> itself, since the installer does performs different tasks asynchronously and
> in parallel.
>
> I just started a 20.04.2 LPAR install (on a z13) and was able to change the
> keyboard to German and to "English (US)" without problems (and even back to
> "English (UK)" - if using the Integrated ASCII Console as well as in a
> remote ssh installer session.
>
> Btw. it should not be needed to change the keyboard if you perform the
> installation via a remote ssh connection, because the local keyboard layout
> is kept in this case.
>
> I have for example a German keyboard active on my local workstation and if I
> perform an installation using a remote ssh session and proceed with the
> default language and keyboard (which is "English (US)" and proceed until I
> finally reach a screen that allows me to enter any characters with the
> keyboard (the first one is the Proxy screen), I can type for example
> "Umlauts", € or ß (not that it makes sense, but just to proof that in case
> of a remote installer connection via ssh, not keyboard change in needed).
>
> So you should be able to perform an installation w/o having to change your
> keyboard layout in the installer.

Oh I didn't even attempt to change the keyboard, it just crashed on that
screen without making any changes. But yeah makes sense that at least
via SSH the keyboard layout isn't really relevant.

>
> Nevertheless, a crash happened in your case and we'll have a look.
>
> Did you do the installation with the help of a remote ssh connection (which
> is recommended) or using the "Integrated ASCII Console"?
> And did you do something 'out of band' in the installer shell?

I tried both and had the same effect on both. I also tried the installation
twice, crashing at the same point. I didn't touch the shell until
after the crash.

>
> In case you still have access to the system could you attach the entire
> /var/log content (and /var/crash - but I guess the already attached file was
> the only one in /var/crash <in addition to it's corresponding meta file>).

I've retried a couple of times getting the same crash so there might be duplication but I'll attach it.

Revision history for this message
bugproxy (bugproxy) wrote : /var/crash

------- Comment (attachment only) From <email address hidden> 2021-03-17 05:55 EDT-------

Revision history for this message
bugproxy (bugproxy) wrote : /var/log

------- Comment (attachment only) From <email address hidden> 2021-03-17 05:56 EDT-------

Revision history for this message
Frank Heimes (fheimes) wrote : Re: [Ubuntu 20.04] On s390x interactive subiquity from the 20.04.2 ISO crashes on keyboard selection screen breaking install

Yes, it would make sense that the crash merely surfaced on the keyboard screen but was caused by some other work in the background.
Thx for sharing the additional logs.

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

So this is (in a roundabout way) a failure to parse lszdev output. It expects the value for 'pers' to be 'on' or 'off', but here it appears to be 'auto'. Does that sound plausible? I don't actually know what pers=auto would mean, nor how it should be presented to the user...

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Oh, this is a z14/DPM thing? My reading is that pers=auto should probably be presented more-or-less the same as pers=on but maybe we should expose the distinction to the user?

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-03-18 04:23 EDT-------
(In reply to comment #14)
> Oh, this is a z14/DPM thing? My reading is that pers=auto should probably be
> presented more-or-less the same as pers=on but maybe we should expose the
> distinction to the user?

z15 but yes DPM. Where are you thinking of exposing this?

Revision history for this message
Frank Heimes (fheimes) wrote : Re: [Ubuntu 20.04] On s390x interactive subiquity from the 20.04.2 ISO crashes on keyboard selection screen breaking install

Yes, that must be a DPM system.
Niklas, can you confirm that this is a DPM system (maybe also the type/model and whether auto-conf is enabled <see below>)?
(Unfortunately, I am currently not aware of a direct way to figure out from within Linux whether a system is in DPM mode; 'lszdev --auto-conf' is probably not sufficient since it can be deactivated, and /proc/sysinfo, /proc/cpuinfo and lscpu may at most report a PR/SM system.)

And please could you run:
'lszdev --pairs --columns id,type,on,exists,pers,auto,failed,names' as well as 'lszdev --auto-conf'
in an installer shell (at the very beginning, first screen, before any device configurations were done) and share the output?

Unfortunately, the man page does not go into much detail about the possible "pers" values and their meaning, but I found some more details in the 'Device Drivers, Features, and Commands' guide (https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ufdd/ludd_c_auto_manage_disp.html).
That leads me to think that if a device is listed by 'lszdev --auto-conf' [there is a typo in the guide: it's '--auto-conf', not '--auto-config'], it will have a 'yes' in the 'AUTO' column and is therefore probably marked with 'auto' in the 'pers' column (the above output would show that).

Auto-conf data (i.e. the devices) can be overridden, but only by manually changing a device's status with chzdev, either in the current session or with 'zdev:early=1' in a previous session. Neither is the case here, since all this happens in an early phase of the installation itself, so I think it should be okay to treat "pers=auto" like "pers=on".
But it would be good if IBM could confirm this?!

Revision history for this message
Frank Heimes (fheimes) wrote :

Hi Niklas, we obviously commented in parallel.
Well, we don't really want to expose that or distinguish between DPM and non-DPM (rather the opposite). The installer's probing includes a 'lszdev --pairs --columns id,type,on,exists,pers,auto,failed,names' call and further processes the output.
So far we expected only 'on' or 'off' under pers, but your case showed that it can also be 'auto' on a DPM system. Now thinking of how to bring it together again ...

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-03-18 05:11 EDT-------
Here are the two commands you asked about. I'll ask around how to best handle the auto case, I'm definitely
no expert on zFCP.

root@ubuntu-server:/# lszdev --pairs --columns id,type,on,exists,pers,auto,failed,names
id="0.0.1900" type="zfcp-host" on="yes" exists="yes" pers="auto" auto="yes" failed="no" names=""
id="0.0.1940" type="zfcp-host" on="yes" exists="yes" pers="auto" auto="yes" failed="no" names=""
id="0.0.1980" type="zfcp-host" on="yes" exists="yes" pers="auto" auto="yes" failed="no" names=""
id="0.0.19c0" type="zfcp-host" on="yes" exists="yes" pers="auto" auto="yes" failed="no" names=""
id="0.0.1900:0x5005076309049435:0x4078408400000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdj sg9"
id="0.0.1900:0x5005076309049435:0x4078408500000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdl sg11"
id="0.0.1900:0x5005076309049435:0x407840d800000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdn sg13"
id="0.0.1900:0x5005076309049435:0x4079404200000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdp sg15"
id="0.0.1940:0x5005076309005435:0x4078408400000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdb sg1"
id="0.0.1940:0x5005076309005435:0x4078408500000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdd sg3"
id="0.0.1940:0x5005076309005435:0x407840d800000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdf sg5"
id="0.0.1940:0x5005076309005435:0x4079404200000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdh sg7"
id="0.0.1980:0x5005076309045435:0x4078408400000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdi sg8"
id="0.0.1980:0x5005076309045435:0x4078408500000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdk sg10"
id="0.0.1980:0x5005076309045435:0x407840d800000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdm sg12"
id="0.0.1980:0x5005076309045435:0x4079404200000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdo sg14"
id="0.0.19c0:0x5005076309009435:0x4078408400000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sda sg0"
id="0.0.19c0:0x5005076309009435:0x4078408500000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdc sg2"
id="0.0.19c0:0x5005076309009435:0x407840d800000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sde sg4"
id="0.0.19c0:0x5005076309009435:0x4079404200000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdg sg6"
id="0.0.bd00:0.0.bd01:0.0.bd02" type="qeth" on="yes" exists="yes" pers="auto" auto="yes" failed="no" names="encbd00"

root@ubuntu-server:/# lszdev --auto-conf
TYPE ID AUTO
zfcp-host 0.0.1900 yes
zfcp-host 0.0.1940 yes
zfcp-host 0.0.1980 ...
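
To see at a glance which 'pers' values appear in output like the above, the rows can be tallied per device type. This is only a rough sketch and not the installer's code: the helper names parse_pairs_row and tally_pers are made up for illustration, and SAMPLE holds three rows copied from the output above rather than a live lszdev call.

```python
import shlex
from collections import Counter


def parse_pairs_row(row):
    """Split one `lszdev --pairs` line into a dict of column -> string value."""
    # shlex handles the double quotes, so names="sdj sg9" stays one token.
    return dict(token.split("=", 1) for token in shlex.split(row))


def tally_pers(output):
    """Count (type, pers) combinations across all non-empty rows."""
    counts = Counter()
    for line in output.splitlines():
        if line.strip():
            row = parse_pairs_row(line)
            counts[(row["type"], row["pers"])] += 1
    return counts


# Three rows taken from the output posted in this comment.
SAMPLE = '''\
id="0.0.1900" type="zfcp-host" on="yes" exists="yes" pers="auto" auto="yes" failed="no" names=""
id="0.0.1900:0x5005076309049435:0x4078408400000000" type="zfcp-lun" on="yes" exists="yes" pers="no" auto="no" failed="no" names="sdj sg9"
id="0.0.bd00:0.0.bd01:0.0.bd02" type="qeth" on="yes" exists="yes" pers="auto" auto="yes" failed="no" names="encbd00"
'''

if __name__ == "__main__":
    for (dev_type, pers), count in sorted(tally_pers(SAMPLE).items()):
        print(dev_type, pers, count)
```

On this sample the auto-configured zfcp-host and qeth devices report pers="auto", while the zfcp-lun row reports pers="no".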


Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-03-18 05:41 EDT-------
(In reply to comment #16)
> Since auto-conf data (aka devices) can be overwritten, but only by manually
> changing it's status (with chzdev) in the current session or with chzdev
> using 'zdev:early=1' in a previous session, which is both not the case here
> since all this happens in an early phase of the installation itself, I think
> it should be okay to treat "pers=auto" like "pers=on".
> But it would be good if IBM could confirm this?!

The intention of auto-config is that users don't have to manually perform any of the z-specific device configuration steps (e.g. setting CCW devices online, grouping CCWGROUP networking devices, etc.). For this purpose DPM provides all required data via a firmware interface and an initramfs script applies that via chzdev during early boot. Since users may change the LPAR configuration via DPM later, this z-specific data should also not be persisted in Linux to prevent stale config data.

I'm wondering why the installer system cares for the persistent settings at all - can a user inject persistent device configuration into the installer image in any way?

To get back to your question: I'm not entirely sure what the effect of treating pers=auto and pers=on the same would be in this particular context. The intended effect is that the installer should work with the resulting Linux devices (e.g. block devices, networking interfaces) that are enabled via auto-config as if they were manually enabled by the user (which they are, just not from within Linux).

As a side-note: you can check if auto-config data is available by reading from the associated sysfs file and checking if any data is returned.

Example on DPM:
# wc -c < /sys/firmware/sclp_sd/config/data
4096

And on non-DPM:
# wc -c < /sys/firmware/sclp_sd/config/data
0
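
The same check could be done from Python roughly as follows. This is a minimal sketch under assumptions: the function name has_auto_config_data is made up, and treating a missing or unreadable file as "no auto-config data" is a choice made here, not something stated in this thread.

```python
from pathlib import Path

# sysfs file named in the comment above.
SCLP_SD_CONFIG = Path("/sys/firmware/sclp_sd/config/data")


def has_auto_config_data(path=SCLP_SD_CONFIG):
    """Return True if reading the file yields at least one byte."""
    try:
        return len(Path(path).read_bytes()) > 0
    except OSError:
        # Assumption: a missing or unreadable file counts as "no data".
        return False


if __name__ == "__main__":
    print("auto-config data available:", has_auto_config_data())
```

The path parameter is there only so the function can be pointed at an ordinary file when trying it out on a machine without this sysfs entry.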

Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote : Re: [Ubuntu 20.04] On s390x interactive subiquity from the 20.04.2 ISO crashes on keyboard selection screen breaking install

The value of pers is accessed only in this function, which is determining the status text to show for a given zdev:

def status(zdevinfo):
    if zdevinfo.failed:
        # for translator: failed is a zdev device status
        return Color.info_error(Text(_("failed"), align="center"))
    if zdevinfo.auto and zdevinfo.on:
        # for translator: auto is a zdev device status
        return Color.info_minor(Text(_("auto"), align="center"))
    if zdevinfo.pers and zdevinfo.on:
        # for translator: online is a zdev device status
        return Text(_("online"), align="center")
    return Text("", align="center")

So I think that in fact what we set the 'pers' field to when pers is 'auto' is going to be irrelevant, because if pers is 'auto' then the 'auto' field will be true. In that case we just need to fix the parsing to handle the pers='auto' case better.
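
A rough sketch of what more tolerant parsing could look like, to illustrate the point above. This is not the actual subiquity change: ZdevRow, to_bool, parse_row and status_label are hypothetical names, mapping pers="auto" to True is the assumption discussed in this thread, and status_label returns plain strings instead of urwid widgets so that it runs standalone.

```python
import shlex
from dataclasses import dataclass, replace


@dataclass
class ZdevRow:
    id: str
    type: str
    on: bool
    exists: bool
    pers: bool
    auto: bool
    failed: bool
    names: str


BOOL_FIELDS = ("on", "exists", "pers", "auto", "failed")


def to_bool(key, value):
    """Map a yes/no column to bool; accept 'auto' for the pers column only."""
    if value == "yes":
        return True
    if value == "no":
        return False
    if key == "pers" and value == "auto":
        # Assumption: treat pers="auto" like pers="yes".
        return True
    raise ValueError(f"unexpected value {value!r} for column {key!r}")


def parse_row(line):
    """Parse one `lszdev --pairs` line into a ZdevRow."""
    raw = dict(token.split("=", 1) for token in shlex.split(line))
    for key in BOOL_FIELDS:
        raw[key] = to_bool(key, raw[key])
    return ZdevRow(**raw)


def status_label(row):
    """Plain-string mirror of the checks in status() quoted above."""
    if row.failed:
        return "failed"
    if row.auto and row.on:
        return "auto"
    if row.pers and row.on:
        return "online"
    return ""


if __name__ == "__main__":
    qeth = parse_row(
        'id="0.0.bd00:0.0.bd01:0.0.bd02" type="qeth" on="yes" exists="yes" '
        'pers="auto" auto="yes" failed="no" names="encbd00"'
    )
    # The label is the same whichever boolean pers="auto" is mapped to.
    print(status_label(qeth), status_label(replace(qeth, pers=False)))
```

Because the 'auto' check comes before the 'pers' check, the displayed status for an auto-configured device does not depend on which boolean pers="auto" ends up as; the parser only has to stop rejecting the value.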

Changed in subiquity:
status: New → In Progress
summary: - [Ubuntu 20.04] On s390x interactive subiquity from the 20.04.2 ISO
- crashes on keyboard selection screen breaking install
+ [Ubuntu 20.04] zdevs autoconfigured via DPM lead to crash
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Changed in subiquity:
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

I'm pretty sure this should be fixed in the 21.04 release or the latest focal dailies but I would appreciate a check of that!

Changed in subiquity:
status: Fix Committed → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-05-06 09:09 EDT-------
(In reply to comment #21)
> I'm pretty sure this should be fixed in the 21.04 release or the latest
> focal dailies but I would appreciate a check of that!

I can confirm that the problem does not appear on 21.04.
I couldn't find daily images for 20.04 for s390x but might
be missing something, could you give me a pointer?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-05-06 10:36 EDT-------
Heinz-Werner gave me a hint and I can confirm this problem is solved on
the current installer image downloaded from http://cdimage.ubuntu.com/ubuntu-server/focal/daily-live/pending/

Revision history for this message
Frank Heimes (fheimes) wrote :

I'm personally not sure whether it's already in the 20.04 dailies
(which are, by the way, available here: https://cdimage.ubuntu.com/ubuntu-server/focal/daily-live/current/),
but we plan to get it there, and especially to have it in 20.04.3.

Changed in ubuntu-z-systems:
milestone: none → ubuntu-20.04.3
Revision history for this message
Frank Heimes (fheimes) wrote :

And btw. thanks for confirming, Niklas.

Revision history for this message
Frank Heimes (fheimes) wrote :

Oh nice ! (missed comment #17)
In this case I can close the ticket now as Fix Released - thx !

Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Revision history for this message
Michael Hudson-Doyle (mwhudson) wrote :

Yay thanks for confirming.

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-05-07 02:23 EDT-------
IBM Bugzilla status->closed, Fix Released by Canonical
