Getting error "ipconfig: no devices to configure" while trying to autoinstall in a VLAN env on s390x

Bug #1924794 reported by Frank Heimes
This bug affects 2 people
Affects                   Status         Importance   Assigned to                   Milestone
Ubuntu on IBM z Systems   Fix Released   High         Canonical Foundations Team
initramfs-tools           Fix Released   Undecided    Canonical Foundations Team
subiquity                 Invalid        Undecided    Unassigned
casper (Ubuntu)           Invalid        Undecided    Unassigned

Bug Description

Autoinstall on an LPAR (connected to a VLAN environment) using today's 21.04 image (time stamp April 16th) fails to boot and ends up with the following error message:

BusyBox v1.30.1 (Ubuntu 1:1.30.1-6ubuntu2) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs) [6n
ip: SIOCGIFFLAGS: No such device
ip: can't find device 'encc000'
ipconfig: encc000.2653: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: encc000.2653: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: encc000.2653: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: encc000.2653: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: encc000.2653: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: encc000.2653: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: encc000.2653: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: encc000.2653: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: encc000.2653: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
ipconfig: encc000.2653: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
no search or nameservers found in /run/net-encc000.2653.conf /run/net-*.conf /run/net6-*.conf
Connecting to installserver:80 (installserver:80)
wget: can't connect to remote host (installserver): Network is unreachable
Unable to find a live file system on the network

So the network device encc000 and its related VLAN interface encc000.2653 are not automatically activated/enabled like they were in the past.

It worked in the past with the same config I used today:
ip=10.245.236.14::10.245.236.1:255.255.255.0:hostname:encc000.2653:none:10.245.236.1 vlan=encc000.2653:encc000 url=http://installserver:80/ubuntu-live-server-21.04/hirsute-live-server-s390x.iso autoinstall ds=nocloud-net;s=http://installserver:80/autoinstall/hostname/ --- quiet
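For readers less familiar with these two kernel arguments, here is a minimal sketch of how the values above break down. The helper names `parse_vlan` and `ip_device` are hypothetical and this is not the initramfs-tools code; it only assumes the `vlan=<name>:<link>` form and the colon-separated `ip=` field order used in the parm-file line above.

```sh
#!/bin/sh
# Hypothetical helper: split "<vlan-name>:<link>" into name, link and VLAN id.
parse_vlan() {
    spec=$1
    vlan_name=${spec%%:*}      # part before the colon, e.g. encc000.2653
    vlan_link=${spec#*:}       # part after the colon, e.g. encc000
    vlan_id=${vlan_name##*.}   # numeric suffix after the last dot
    printf '%s %s %s\n' "$vlan_name" "$vlan_link" "$vlan_id"
}

# Hypothetical helper: print the device field (6th field) of an ip= value.
# Assumed field order: client:server:gateway:netmask:hostname:device:autoconf:dns0
ip_device() {
    IFS=: read -r _client _server _gw _mask _host dev _rest <<EOF
$1
EOF
    printf '%s\n' "$dev"
}

parse_vlan "encc000.2653:encc000"
# prints: encc000.2653 encc000 2653
ip_device "10.245.236.14::10.245.236.1:255.255.255.0:hostname:encc000.2653:none:10.245.236.1"
# prints: encc000.2653
```

So the `ip=` argument configures the VLAN interface encc000.2653, which can only exist once the underlying link encc000 has been brought up.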

By "worked in the past" I mean that it has been a while since I last tried it. Right now it does not seem to work with 21.04 and 20.04.2 - BUT only in this particular environment, where I have to specify the VLAN as a kernel argument ("parm-file").

It works in different non-VLAN environments (for example in a z/VM guest environment, where I do NOT have to specify the VLAN in the autoinstall config, since the VLAN is handled there by the z/VM vSwitch).

Doing an interactive installation on this particular LPAR (incl. specifying the VLAN id) works fine!


Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
importance: Undecided → High
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Frank Heimes (fheimes)
tags: added: 21.04 hirsute
Frank Heimes (fheimes) wrote :

I just did an autoinstall attempt after removing all config snippets required for the VLAN from the parm-file:

--------%<----------------%<----------------%<----------------%<----------------%<--------
A manual update of the initial RAM-disk is required.
QETH device 0.0.c000:0.0.c001:0.0.c002 configured
Note: The initial RAM-disk must be updated for these changes to take effect:
- QETH device 0.0.c000:0.0.c001:0.0.c002
IP-Config: encc000 hardware address 12:34:45:67:89:0a mtu 1500
IP-Config: encc000 guessed broadcast address 10.123.124.255
IP-Config: encc000 complete:
address: 10.123.124.14 broadcast: 10.123.124.255 netmask: 255.255.255.0

gateway: 10.123.124.1 dns0 : 10.123.124.1 dns1 : 0.0.0.0

host : hostname
rootserver: 0.0.0.0 rootpath:
filename :
Connecting to install-server:80 (installer-server:80)

BusyBox v1.30.1 (Ubuntu 1:1.30.1-6ubuntu2) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs) [6nwget: can't connect to remote host (install-server): Connection timed out
Unable to find a live file system on the network
--------%<----------------%<----------------%<----------------%<----------------%<--------

and now the device c000 and the interface encc000 get activated:

encc000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq qlen 1000
link/ether 26:e4:7c:23:c1:c9 brd ff:ff:ff:ff:ff:ff
inet 10.245.236.14/24 brd 10.245.236.255 scope global encc000
valid_lft forever preferred_lft forever
inet6 fe80::24e4:7cff:fe23:c1c9/64 scope link
valid_lft forever preferred_lft forever

The installation can of course not be completed in this case (since the VLAN config is now missing), but it shows that this is only a problem with handling potential VLAN config in the early boot stage (casper?).

Frank Heimes (fheimes)
tags: added: regression
tags: added: fr-1294
Frank Heimes (fheimes)
Changed in casper (Ubuntu):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
assignee: Canonical Foundations Team (canonical-foundations) → nobody
Changed in initramfs-tools:
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Michael Hudson-Doyle (mwhudson) wrote :

The part of this that doesn't make sense is "ip: can't find device 'encc000'" -- is that device just not appearing in time? It should be chzdev activated just before the initramfs code tries to set up the vlan, maybe it's a race? (is this behaviour consistent?). It's strange that using the nic without a vlan works but there is a bit more time (and a wait for udev, hmm) between the device getting chzdev-ed and used in that case.

The code in initramfs-tools hasn't changed at all here between groovy and hirsute so I don't know what's changed. We can always blame the kernel I guess.

Frank Heimes (fheimes) wrote :

Yes, this behavior is quite consistent - and it happens on different systems that require vlan configurations (tested on two systems - each several times - happened every time).
I may try on one or more other systems - maybe under lower load ...

Well, I'm not sure if looking back to groovy is enough, I have to admit that I mainly used autoinstall on focal, and tried it here and there on other releases too, but then w/o vlan for just doing a quick test (I should have taken more time for tests with vlan though).

One of the two systems that I used for the test is the same one I used before with autoinstall (and vlan), and the environment didn't really change - so I'm sure that it's not due to changes in the hw, setup or environment itself. (On the 2nd system autoinstall was used for the 1st time).

The z system got moved a few months ago to a different data center, but that didn't change anything internally with the OSA network adapters, their ports or configuration - there might have been changes in the external network (different switches etc.), but that shouldn't have such an impact. And btw. doing a manual/interactive installation (incl. vlan) works fine - and that is what I do very often - on all releases.

And yes, the test with just dropping the vlan config is interesting. That makes me think that it must have to do with the vlan config and setup itself. Well, maybe a race - but that race was for some reason introduced at some point in time ...

Frank Heimes (fheimes) wrote :

I found an older (known to work) config in my notes for 20.04.1 - and I just retried it and can confirm that autoinstall (with vlan) works with 20.04.1 on the same LPAR system.

I'll proceed now using newer and newer releases and see where issues start to occur ...

Frank Heimes (fheimes) wrote :

Summary of autoinstall w/ vlan tests:

*** 20.04.1:
I see these messages, but the device gets finally configured:
"chzdev: Unknown device type or device ID format: c000.2653"
"Use 'chzdev --help' for more information"
"QETH device 0.0.c000:0.0.c001:0.0.c002 configured"
"IP-Config: encc000.2653 hardware address d6:21:7e:b3:49:da mtu 1500"
"IP-Config: encc000.2653 guessed broadcast address 10.245.236.255"
"IP-Config: encc000.2653 complete:"
(there seems to be a second try to get the device configured after the ISO was downloaded, but that doesn't seem to do any harm:
"chzdev: Unknown device type or device ID format: c000.2653"
"Use 'chzdev --help' for more information"
"QETH device 0.0.c000:0.0.c001:0.0.c002 already configured" )
No further error visible.

*** 20.04.2:
Similar to 20.04.1
but I get later msgs like this:
"finish: subiquity/ErrorReporter/1619073531.572634459.unknown/add_info: written"
"to /var/crash/1619073531.572634459.unknown.crash"
probably LP#1924575

*** 20.10 (and later):
The device cannot be configured anymore:
"(initramfs) [6n"
"ip: SIOCGIFFLAGS: No such device"
"ip: can't find device 'encc000'"
"ipconfig: encc000.2653: SIOCGIFINDEX: No such device"
"ipconfig: no devices to configure"
"ipconfig: encc000.2653: SIOCGIFINDEX: No such device"
"ipconfig: no devices to configure"
"ipconfig: encc000.2653: SIOCGIFINDEX: No such device"
"ipconfig: no devices to configure"
"ipconfig: encc000.2653: SIOCGIFINDEX: No such device"
"ipconfig: no devices to configure"
"ipconfig: encc000.2653: SIOCGIFINDEX: No such device"
"ipconfig: no devices to configure"
"ipconfig: encc000.2653: SIOCGIFINDEX: No such device"
"ipconfig: no devices to configure"
"ipconfig: encc000.2653: SIOCGIFINDEX: No such device"
"ipconfig: no devices to configure"
"ipconfig: encc000.2653: SIOCGIFINDEX: No such device"
"ipconfig: no devices to configure"
"ipconfig: encc000.2653: SIOCGIFINDEX: No such device"
"ipconfig: no devices to configure"
"ipconfig: encc000.2653: SIOCGIFINDEX: No such device"
"ipconfig: no devices to configure"
"no search or nameservers found in /run/net-encc000.2653.conf /run/net-*.conf /run/net6-*.conf"
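A possible explanation for the harmless "chzdev: Unknown device type or device ID format: c000.2653" message in the 20.04.x logs, sketched under assumptions: the s390x activation loop in initramfs-tools (quoted further down in this report) derives the device ID by stripping the "enc" prefix from the interface name, and presumably hands the result to chzdev. The function name `zdev_from_iface` is hypothetical; only the `${dev#enc}` expansion is taken from that loop.

```sh
#!/bin/sh
# Hypothetical helper mirroring the "enc*) zdev=${dev#enc}" step of the loop.
zdev_from_iface() {
    case $1 in
    enc*) printf '%s\n' "${1#enc}" ;;   # strip the "enc" prefix
    *) return 1 ;;                      # not an enc* interface name
    esac
}

zdev_from_iface encc000.2653   # prints: c000.2653
zdev_from_iface encc000        # prints: c000
```

If that is what happens, "c000.2653" is simply the VLAN interface name with the prefix removed, which is not a valid device ID, while "c000" from the link name is the one that gets the QETH device configured.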

Michael Hudson-Doyle (mwhudson) wrote :

Well in the focal -> groovy time we find this change (http://launchpadlibrarian.net/481474731/initramfs-tools_0.137ubuntu4_0.137ubuntu5.diff.gz):

  # activate non-autoconfigured s390x devices
  for dev in $DEVICE $DEVICE6 $IP6 $VLAN_LINK; do
+ # skip $DEVICE which is processed in $VLAN_LINK
+ echo ${VLAN} | grep -q ${dev} && continue
   case ${dev} in
   enc*)
    zdev=${dev#enc}

which I think is just wrong: it's trying to avoid calling chzdev -e on the encc000.2653 device but it also avoids calling it on encc000 :(
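To illustrate the point, here is a small sketch that can be run in any POSIX shell. It assumes that $VLAN holds the raw value of the vlan= kernel argument from this report. `skipped_by_grep` reproduces the test from the added diff line; `is_vlan_name` is a hypothetical stricter test for comparison only and is not the fix that was uploaded.

```sh
#!/bin/sh
# Assumption: VLAN holds the raw value of the vlan= kernel argument.
VLAN="encc000.2653:encc000"

# Same test as the added line in the diff: substring (regex) match on $VLAN.
skipped_by_grep() {
    echo "${VLAN}" | grep -q "$1"
}

# Hypothetical stricter test: only the VLAN interface name itself matches.
is_vlan_name() {
    [ "$1" = "${VLAN%%:*}" ]
}

for dev in encc000.2653 encc000; do
    if skipped_by_grep "$dev"; then
        echo "grep test:  skip $dev"
    else
        echo "grep test:  activate $dev"
    fi
    if is_vlan_name "$dev"; then
        echo "exact test: skip $dev"
    else
        echo "exact test: activate $dev"
    fi
done
```

With the grep test both names are skipped, because "encc000" is a substring of "encc000.2653:encc000". That would mean chzdev never enables c000, which is consistent with the "ip: can't find device 'encc000'" message in the failing logs.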

Frank Heimes (fheimes) wrote :

Yes, that could be it - thx for digging into this!

Michael Hudson-Doyle (mwhudson) wrote :

Can you try https://people.canonical.com/~mwh/initrd.lp1924794? This is a patched initrd from the 21.04 release ISO, so the other files should also come from that ISO.

Michael Hudson-Doyle (mwhudson) wrote :

I've uploaded the fix to impish. The bug is present in groovy and hirsute but there's probably no reason to backport the fix -- we are unlikely to make release media for them again...

Changed in subiquity:
status: New → Invalid
Changed in casper (Ubuntu):
status: New → Invalid
Changed in initramfs-tools:
status: New → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Fix Committed
Frank Heimes (fheimes) wrote :

I agree, no reason to backport to groovy and hirsute.
We may just need to ensure that the problem does not affect 20.04.3.

Michael Hudson-Doyle (mwhudson) wrote :

So the initramfs-tools with the fix has migrated now, the next impish daily should have it (serial 20210511 or later). Would you be able to test to see if that helps?

Frank Heimes (fheimes) wrote :

Yes, I can confirm that autoinstall of an s390x system, where the qeth device is attached to a VLAN (in my case an LPAR system) works with the updated initramfs that is included in the impish image with timestamp 20210511.

I've attached the entire console log.

Looks good - many thx!

Changed in initramfs-tools:
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released