Some restarts fail due to missing base devices

Bug #1623383 reported by Stefan Bader on 2016-09-14
This bug affects 3 people
Affects: systemd (Ubuntu)
Status: Confirmed
Importance: Critical
Assigned to: Dimitri John Ledkov

Bug Description

Arch: s390x
Release: Yakkety / 16.10

This happens on some (but not all) system starts with Yakkety. In Xenial (which uses the same 4.4 kernel version the Yakkety systems were running when the problem was first observed) this did not happen. The system (an LPAR) on which this was first seen was an upgrade from Xenial, but it has since been freshly installed with Yakkety. The same behaviour is seen on a zVM guest running Yakkety.

The attached syslog shows a failed boot, followed by one that did work. Note the "Found device .*(sclp|encc00).*" messages in the good boot. Those are missing in the bad attempt, and as a result networking and the console are unusable. Also note: those boots were 4.8 kernels, but we saw this with 4.4 kernels, too.
This might be a systemd problem / race; I filed it against udev for now since that better matches the symptom of basic devices not being found.
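
As a sketch, good and bad boots can be told apart by grepping each boot's journal for those "Found device" messages (the unit names sclp/encc00 are from this system; the sample excerpt below is illustrative, not taken from the attached log):

```shell
# Illustrative only: apply the "Found device" pattern to a sample
# journal excerpt. Against a live system one would run e.g.:
#   journalctl -b -1 | grep -E 'Found device .*(sclp|encc00)'
# A failed boot prints no matches; a good boot prints one per device.
grep -E 'Found device .*(sclp|encc00)' <<'EOF'
systemd[1]: Found device /dev/sclp_line0.
systemd[1]: Reached target Basic System.
systemd[1]: Found device /sys/subsystem/net/devices/encc00.
EOF
```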

Stefan Bader (smb) wrote :
affects: ubuntu → udev (Ubuntu)
tags: added: s390x
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in udev (Ubuntu):
status: New → Confirmed
Brad Figg (brad-figg) wrote :

This issue is now blocking kernel regression testing for these s390 variants.

Changed in udev (Ubuntu):
importance: Undecided → Critical
Stefan Bader (smb) wrote :

For testing I enabled "debug" on the kernel command line since that has some influence on how systemd acts, but I found both the LPAR and the zVM guest unreachable this morning, which is a sign they did not survive being rebooted.

Stefan Bader (smb) wrote :

This is the (compressed; otherwise it would be 13M) journal from a failed boot of a zVM guest. One thing that maybe should be noted: we have some zfcp disks enabled on the LPAR and the zVM guest. At least for the zVM guest, that is the same config as we have for the guest running Xenial.

Stefan Bader (smb) wrote :

enc600 is the qeth net device for the zVM guest. There seem to be issues with it visible in the log.

Stefan Bader (smb) wrote :
tags: added: xenial
tags: added: yakkety
removed: xenial
Changed in udev (Ubuntu):
assignee: nobody → Dimitri John Ledkov (xnox)
Stefan Bader (smb) wrote :

Spending more time looking at the two logs, I saw that in the failed boot case
systemd-modules-load.service never stops. The last message there is that the ib_iser module (the second of two) got loaded, and that's all.
When the boot succeeds, the systemd init process receives a SIGCHLD from systemd-modules-load.service.

Stefan Bader (smb) wrote :

I took a dump of another run that ended in the failed state. While not all messages that appear in the journal show up in the dmesg log from a dump, there does not seem to be a message about systemd noticing the exit of systemd-modules-load; however, I cannot see the process listed in the ps output in crash either. The ib_iser module is loaded.
So, together with the previous info, this feels more like a communication issue between the systemd main process and the loader process. And not on every boot, either.

Stefan Bader (smb) wrote :

Still happened, even without any modules getting loaded by that service. Maybe the specific service is a red herring and it's something else that just shows up there because the service is in the critical path. To make things clearer, here are the related messages from boot -1 (a failed one) and boot 0 (a successful one):

ubuntu@s2lp4:~$ journalctl -b-1|grep modules-load
Sep 22 08:13:54 s2lp4 systemd[582]: systemd-modules-load.service: Executing: /lib/systemd/systemd-modules-load
Sep 22 08:13:54 s2lp4 systemd-modules-load[582]: apply: /etc/modules-load.d/modules.conf
Sep 22 13:15:44 s2lp4 systemd[1]: systemd-modules-load.service: Failed to send unit remove signal for systemd-modules-load.service: Transport endpoint is not connected
Sep 22 13:15:44 s2lp4 systemd[1]: systemd-modules-load.service: Changed dead -> exited
Sep 22 13:15:45 s2lp4 systemd[1]: systemd-modules-load.service: Failed to send unit change signal for systemd-modules-load.service: Transport endpoint is not connected

ubuntu@s2lp4:~$ journalctl -b0|grep modules-load
Sep 22 23:44:44 s2lp4 systemd[1613]: systemd-modules-load.service: Executing: /lib/systemd/systemd-modules-load
Sep 22 23:44:44 s2lp4 systemd-modules-load[1613]: apply: /etc/modules-load.d/modules.conf
Sep 22 23:44:48 s2lp4 systemd[1]: Got cgroup empty notification for: /system.slice/systemd-modules-load.service
Sep 22 23:44:48 s2lp4 systemd[1]: systemd-modules-load.service: cgroup is empty

Stefan Bader (smb) wrote :

Adding the setup of the zVM guest that shows the problem in case that helps to re-create the problem:

Memory: 4G, 8 vCPUs

Device Subchan. DevType CU Type Use PIM PAM POM CHPIDs
----------------------------------------------------------------------
0.0.e002 0.0.0000 1732/03 1731/03 yes 80 80 ff 20000000 00000000
0.0.e102 0.0.0001 1732/03 1731/03 yes 80 80 ff 21000000 00000000
0.0.0200 0.0.0002 3390/0c 3990/e9 yes 80 80 ff ff000000 00000000
0.0.0201 0.0.0003 3390/0c 3990/e9 yes 80 80 ff ff000000 00000000
0.0.0009 0.0.0004 0000/00 3215/00 yes 80 80 ff ff000000 00000000
0.0.000c 0.0.0005 0000/00 2540/00 80 80 ff ff000000 00000000
0.0.000d 0.0.0006 0000/00 2540/00 80 80 ff ff000000 00000000
0.0.000e 0.0.0007 0000/00 1403/00 80 80 ff ff000000 00000000
0.0.0600 0.0.0008 1732/01 1731/01 yes 80 80 ff 00000000 00000000
0.0.0601 0.0.0009 1732/01 1731/01 yes 80 80 ff 00000000 00000000
0.0.0602 0.0.000a 1732/01 1731/01 yes 80 80 ff 00000000 00000000

0.0.e002 and e102 are two zfcp adapters leading to one disk via two paths each. The resulting 4 paths are grouped by multipath and make up an lvm2 VG, but nothing is mounted from there by default.

lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdd 8:48 0 10G 0 disk
`-36005076306ffd6b60000000000002a03 254:1 0 10G 0 mpath
dasda 94:0 0 6.9G 0 disk
`-dasda1 94:1 0 6.9G 0 part /
sdb 8:16 0 10G 0 disk
`-36005076306ffd6b60000000000002a03 254:1 0 10G 0 mpath
dasdb 94:4 0 20.6G 0 disk
`-dasdb1 94:5 0 20.6G 0 part
  `-datavg-home 254:0 0 20.6G 0 lvm /home
sdc 8:32 0 10G 0 disk
`-36005076306ffd6b60000000000002a03 254:1 0 10G 0 mpath
sda 8:0 0 10G 0 disk
`-36005076306ffd6b60000000000002a03 254:1 0 10G 0 mpath

Dimitri John Ledkov (xnox) wrote :

Sep 22 13:15:45 s2lp4 systemd[1]: systemd-modules-load.service: Failed to send unit change signal for systemd-modules-load.service: Transport endpoint is not connected

This pretty much means that pid1 stopped listening on /run/systemd/private. It does not receive the exited notification, does not mark systemd-modules-load as started, and everything fails from then on.

Output of lvs and vgs would be helpful, too.

Stefan Bader (smb) wrote :

vgs:
  VG #PV #LV #SN Attr VSize VFree
  datavg 1 1 0 wz--n- 20.63g 0
  zfcpvg 1 0 0 wz--n- 10.00g 10.00g

lvs:
  LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
  home datavg -wi-ao---- 20.63g

Stefan Bader (smb) wrote :

Changing the type of systemd-modules-load from oneshot to simple might work around the problem. At least this output looks like it would otherwise have been a hang:

* systemd-modules-load.service - Load Kernel Modules
   Loaded: loaded (/lib/systemd/system/systemd-modules-load.service; static; vendor preset: enabled)
   Active: active (exited) since Fri 2016-09-23 09:57:49 BST; 5h 12min ago
     Docs: man:systemd-modules-load.service(8)
           man:modules-load.d(5)
 Main PID: 546 (code=exited, status=0/SUCCESS)
    Tasks: 0 (limit: 4915)
   Memory: 0B
      CPU: 0
   CGroup: /system.slice/systemd-modules-load.service

Sep 23 09:57:57 kernel02 systemd[1]: systemd-modules-load.service: Failed to send unit remove signal for systemd-modules-load.service: Transport endpoint is not connected
Sep 23 09:57:57 kernel02 systemd[1]: systemd-modules-load.service: Changed dead -> exited

I can ssh into the system, and netstat -an shows something listening on /run/systemd/private:

netstat -an|grep private
unix 2 [ ACC ] STREAM LISTENING 31143 /run/user/1000/systemd/private
unix 2 [ ACC ] STREAM LISTENING 9536 /run/systemd/private
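
The oneshot-to-simple switch mentioned above can be tried without editing the packaged unit, using a standard systemd drop-in (a workaround sketch, not a confirmed fix; the path follows the usual drop-in convention):

```ini
# /etc/systemd/system/systemd-modules-load.service.d/override.conf
# Sketch: run the loader as Type=simple so pid1 does not depend on the
# (apparently lost) exit notification of the oneshot unit.
[Service]
Type=simple
```

followed by systemctl daemon-reload and a reboot.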

Stefan Bader (smb) wrote :

Is that normal? However, I also see those disconnects in the journal of a good boot...

journalctl -b0|grep private
Sep 23 09:57:51 kernel02 systemd-tmpfiles[1936]: Entry "/tmp/systemd-private-4f4f7561b59a4b06a720204adeaff628-*" does not match any include prefix, skipping.
Sep 23 09:57:51 kernel02 systemd-tmpfiles[1936]: Entry "/tmp/systemd-private-4f4f7561b59a4b06a720204adeaff628-*/tmp" does not match any include prefix, skipping.
Sep 23 09:57:51 kernel02 systemd-tmpfiles[1936]: Entry "/var/tmp/systemd-private-4f4f7561b59a4b06a720204adeaff628-*" does not match any include prefix, skipping.
Sep 23 09:57:51 kernel02 systemd-tmpfiles[1936]: Entry "/var/tmp/systemd-private-4f4f7561b59a4b06a720204adeaff628-*/tmp" does not match any include prefix, skipping.
Sep 23 09:57:54 kernel02 systemd[1]: Accepted new private connection.
Sep 23 09:57:54 kernel02 systemd[1]: Accepted new private connection.
Sep 23 09:57:54 kernel02 systemd[1]: Got disconnect on private connection.
Sep 23 09:57:54 kernel02 systemd[1]: Accepted new private connection.

Dimitri John Ledkov (xnox) wrote :

#15 -> those things look fine.

affects: udev (Ubuntu) → systemd (Ubuntu)
Frank Heimes (frank-heimes) wrote :

Looks like this can happen on zVM guests and KVM VMs, too.
(It could be somehow related to zfs ...)

Frank Heimes (frank-heimes) wrote :

After further investigation the issue could be related to udev.

Would it be possible to start the systems with kernel parameters udev.log_priority and rd.udev.log_priority set to debug and share again the logs from a failed system?

Po-Hsu Lin (cypressyew) wrote :

Hello Frank, I have set udev.log_priority and rd.udev.log_priority to debug on kernel03 with smb's help. Let's wait for this to happen again.

Ian Thompson (ithompson) wrote :

I have the same problem with two Dell PowerEdge FC630.
Ubuntu 16.04.2 LTS/ 4.4.0-66-generic

Shutdown/reboot stalls at different points each attempt.

An IBM System x3650 M4 in the same rack, at the same software level, reboots cleanly every time.

Dimitri John Ledkov (xnox) wrote :

Could you please open a new bug report using the ubuntu-bug tool and
include the systemd journal covering the failed boots?
Just in case this is not the same issue; so far we have only
reproduced this on the s390x architecture.
Please enable the persistent journal first by doing $ sudo mkdir -p /var/log/journal

--
Regards,

Dimitri.

Po-Hsu Lin (cypressyew) wrote :

Node s2lp6g003 (s390x.zKVM) booted to emergency mode. Please find the journalctl log attached.

Po-Hsu Lin (cypressyew) wrote :

journalctl log for kernel02 (s390x.zVM), which drops to emergency mode as well

Andrew Cloke (andrew-cloke) wrote :

Hi, we’re trying to explore different approaches to investigate, resolve and/or work around this issue, but regrettably there do not appear to be any simple solutions.

Would it be possible to record the failure rate during the testing for the next SRU cycle? Perhaps something simple like “5 failures in 50 redeployments over a 2 week period”?

Many thanks.

Po-Hsu Lin (cypressyew) wrote :

Here are the numbers from the beginning of this SRU cycle until today (since Jul. 3).

We have 6 jobs for each node, which requires at least 6 re-deployments (note that for s390x it's just a reboot, since we don't re-deploy them).

The failures here mean the number of times we needed to reboot a node manually.
The total job count on Jenkins is larger than the sum of failures + 6, as one failure may cause all the other jobs to fail and need to be restarted.

s2lp4 (Ubuntu on LPAR) - 1 failure, jobs count: 17
kernel02 (zVM) - 2 failures, jobs count: 16
s2lp6g003 (zKVM) - 6 failures, jobs count: 9
kernel03 (zKVM) - 7 failures, jobs count: 21

kernel03 is still running, so its number of failures and jobs count will change.

Frank Heimes (frank-heimes) wrote :

Okay, the suggestion is to:
- Update zKVM (which is based on IBM KVM on z) to the latest and last release, 1.1.2 FP4.
  1.1.2 FP4 is the only one that is still supported. I can help with that.
- It might also be worth trying to limit the devices on the LPAR with the help of cio_ignore
  and see if that makes a difference - in case of a potential race condition.
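
As a sketch of the cio_ignore idea, using device numbers from the zVM table in the earlier comment (which devices to keep visible is an assumption; here the console, the DASDs, and the qeth triplet stay visible while everything else, including the zfcp adapters, is ignored):

```text
cio_ignore=all,!0.0.0009,!0.0.0200-0.0.0201,!0.0.0600-0.0.0602
```

This goes on the kernel command line; ignored devices can still be freed at runtime via /proc/cio_ignore if needed.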
