Activity log for bug #1587142

Date Who What changed Old value New value Message
2016-05-30 16:51:55 Benoît Thébaudeau bug added bug
2016-05-30 16:51:55 Benoît Thébaudeau attachment added shutdown-log-part1.png https://bugs.launchpad.net/bugs/1587142/+attachment/4673087/+files/shutdown-log-part1.png
2016-05-30 16:52:29 Benoît Thébaudeau attachment added shutdown-log-part2.png https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1587142/+attachment/4673088/+files/shutdown-log-part2.png
2016-05-31 08:42:10 Benoît Thébaudeau summary Reboot hangs once RAID1 resynced Shutdown hangs in md kworker after "Reached target Shutdown."
2016-05-31 08:46:42 Benoît Thébaudeau description
Old value: the original report text, identical to the new value below except that it does not yet contain the two [EDIT] notes.
New value:

I'm booting a fully patched 16.04 from an Intel Rapid Storage Technology enterprise RAID1 volume (ThinkServer TS140 with two SATA ST1000NM0033-9ZM drives, ext4 root partition, no LVM, UEFI mode).

If the RAID volume is recovering or resyncing for whatever reason, then `sudo systemctl reboot` and `sudo systemctl poweroff` work fine (I had to `sudo systemctl --now disable lvm2-lvmetad lvm2-lvmpolld lvm2-monitor` in order to consistently get that). However, once the recovery/resync is complete and clean, the reboot and poweroff commands above hang forever after "Reached target Shutdown." Note that issuing `sudo swapoff -a` beforehand (suggested in bug #1464917) does not help.

[EDIT]Actually, the shutdown also hangs from time to time during a resync. But I've never seen it succeed once the resync is complete.[/EDIT]

Then, if the server has been forcibly restarted with the power button, the Intel Matrix Storage Manager indicates a "Normal" status for the RAID1 volume, but Ubuntu then resyncs the volume anyway:

[ 1.223649] md: bind<sda>
[ 1.228426] md: bind<sdb>
[ 1.230030] md: bind<sdb>
[ 1.230738] md: bind<sda>
[ 1.232985] usbcore: registered new interface driver usbhid
[ 1.233494] usbhid: USB HID core driver
[ 1.234022] md: raid1 personality registered for level 1
[ 1.234876] md/raid1:md126: not clean -- starting background reconstruction
[ 1.234956] input: CHESEN USB Keyboard as /devices/pci0000:00/0000:00:14.0/usb3/3-10/3-10:1.0/0003:0A81:0101.0001/input/input5
[ 1.236273] md/raid1:md126: active with 2 out of 2 mirrors
[ 1.236797] md126: detected capacity change from 0 to 1000202043392
[ 1.246271] md: md126 switched to read-write mode.
[ 1.246834] md: resync of RAID array md126
[ 1.247325] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
[ 1.247503]  md126: p1 p2 p3 p4
[ 1.248269] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
[ 1.248774] md: using 128k window, over a total of 976759940k.

Note that the pain of "resync upon every (re)boot" cannot even be relieved a bit by using bitmaps, because mdadm does not support them for IMSM containers:

$ sudo mdadm --grow --bitmap=internal /dev/md126
mdadm: Cannot add bitmaps to sub-arrays yet

I also get this in syslog during boot when the individual drives are detected, but this seems to be harmless:

May 30 17:26:07 wssrv1 systemd-udevd[608]: Process '/sbin/mdadm --incremental /dev/sdb --offroot' failed with exit code 1.
May 30 17:26:07 wssrv1 systemd-udevd[608]: Process '/lib/udev/hdparm' failed with exit code 1.
May 30 17:26:07 wssrv1 systemd-udevd[606]: Process '/sbin/mdadm --incremental /dev/sda --offroot' failed with exit code 1.
May 30 17:26:07 wssrv1 systemd-udevd[606]: Process '/lib/udev/hdparm' failed with exit code 1.

During a resync, `sudo sh -c 'echo idle > /sys/block/md126/md/sync_action'` actually stops it as expected, but it restarts immediately even though nothing seems to have triggered it:

May 30 18:17:02 wssrv1 kernel: [ 3106.826710] md: md126: resync interrupted.
May 30 18:17:02 wssrv1 kernel: [ 3106.836320] md: checkpointing resync of md126.
May 30 18:17:02 wssrv1 kernel: [ 3106.836623] md: resync of RAID array md126
May 30 18:17:02 wssrv1 kernel: [ 3106.836625] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
May 30 18:17:02 wssrv1 kernel: [ 3106.836626] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for resync.
May 30 18:17:02 wssrv1 kernel: [ 3106.836627] md: using 128k window, over a total of 976759940k.
May 30 18:17:02 wssrv1 kernel: [ 3106.836628] md: resuming resync of md126 from checkpoint.
May 30 18:17:02 wssrv1 mdadm[982]: RebuildStarted event detected on md device /dev/md/Volume0

I attach screenshots of the hanging shutdown log after a `sudo sh -c 'echo 8 > /proc/sys/kernel/printk'`. The second screenshot shows that the kernel has deadlocked in md_write_start(). Note that `sudo systemctl start debug-shell` is unusable on this machine at this point because Ctrl+Alt+F9 brings up tty9 without any keyboard. [EDIT]But I can still switch back to tty1.[/EDIT]

I have also tried with much lower values for vm.dirty_background_ratio and vm.dirty_ratio, but to no avail. Linux 4.6.0-040600-generic_4.6.0-040600.201605151930_amd64 from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6-yakkety/ did not help either.

More information below:

$ lsb_release -rd
Description: Ubuntu 16.04 LTS
Release: 16.04

$ uname -a
Linux wssrv1 4.4.0-22-generic #40-Ubuntu SMP Thu May 12 22:03:46 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ apt-cache policy systemd
systemd:
  Installed: 229-4ubuntu6
  Candidate: 229-4ubuntu6
  Version table:
 *** 229-4ubuntu6 500
        500 http://fr.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     229-4ubuntu4 500
        500 http://fr.archive.ubuntu.com/ubuntu xenial/main amd64 Packages

$ cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md126 : active raid1 sda[1] sdb[0]
      976759808 blocks super external:/md127/0 [2/2] [UU]
      [>....................]  resync =  3.3% (32651584/976759940) finish=85.9min speed=183164K/sec

md127 : inactive sdb[1](S) sda[0](S)
      5288 blocks super external:imsm

unused devices: <none>

$ sudo mdadm -D /dev/md127
/dev/md127:
        Version : imsm
     Raid Level : container
  Total Devices : 2
Working Devices : 2
           UUID : e9bb2216:cb1bbc0f:96943390:bb65943c
  Member Arrays : /dev/md/Volume0

    Number   Major   Minor   RaidDevice
       0       8        0        -        /dev/sda
       1       8       16        -        /dev/sdb

$ sudo mdadm -D /dev/md126
/dev/md126:
      Container : /dev/md/imsm0, member 0
     Raid Level : raid1
     Array Size : 976759808 (931.51 GiB 1000.20 GB)
  Used Dev Size : 976759940 (931.51 GiB 1000.20 GB)
   Raid Devices : 2
  Total Devices : 2

          State : clean, resyncing
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

  Resync Status : 5% complete

           UUID : 3d724b1d:ac75cddb:600ac81a:ccdc2090

    Number   Major   Minor   RaidDevice State
       1       8        0        0      active sync   /dev/sda
       0       8       16        1      active sync   /dev/sdb

$ sudo mdadm -E /dev/sda
/dev/sda:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.1.00
    Orig Family : 92b6e3e4
         Family : 92b6e3e4
     Generation : 00000075
     Attributes : All supported
           UUID : e9bb2216:cb1bbc0f:96943390:bb65943c
       Checksum : 5ad6e3c8 correct
    MPB Sectors : 2
          Disks : 2
   RAID Devices : 1

  Disk00 Serial : Z1W50P5E
          State : active
             Id : 00000000
    Usable Size : 1953519880 (931.51 GiB 1000.20 GB)

[Volume0]:
           UUID : 3d724b1d:ac75cddb:600ac81a:ccdc2090
     RAID Level : 1 <-- 1
        Members : 2 <-- 2
          Slots : [UU] <-- [UU]
    Failed disk : none
      This Slot : 0
     Array Size : 1953519616 (931.51 GiB 1000.20 GB)
   Per Dev Size : 1953519880 (931.51 GiB 1000.20 GB)
  Sector Offset : 0
    Num Stripes : 7630936
     Chunk Size : 64 KiB <-- 64 KiB
       Reserved : 0
  Migrate State : repair
      Map State : normal <-- normal
     Checkpoint : 201165 (512)
    Dirty State : dirty

  Disk01 Serial : Z1W519DN
          State : active
             Id : 00000001
    Usable Size : 1953519880 (931.51 GiB 1000.20 GB)

$ sudo mdadm -E /dev/sdb
/dev/sdb:
          Magic : Intel Raid ISM Cfg Sig.
        Version : 1.1.00
    Orig Family : 92b6e3e4
         Family : 92b6e3e4
     Generation : 00000075
     Attributes : All supported
           UUID : e9bb2216:cb1bbc0f:96943390:bb65943c
       Checksum : 5ad6e3c8 correct
    MPB Sectors : 2
          Disks : 2
   RAID Devices : 1

  Disk01 Serial : Z1W519DN
          State : active
             Id : 00000001
    Usable Size : 1953519880 (931.51 GiB 1000.20 GB)

[Volume0]:
           UUID : 3d724b1d:ac75cddb:600ac81a:ccdc2090
     RAID Level : 1 <-- 1
        Members : 2 <-- 2
          Slots : [UU] <-- [UU]
    Failed disk : none
      This Slot : 1
     Array Size : 1953519616 (931.51 GiB 1000.20 GB)
   Per Dev Size : 1953519880 (931.51 GiB 1000.20 GB)
  Sector Offset : 0
    Num Stripes : 7630936
     Chunk Size : 64 KiB <-- 64 KiB
       Reserved : 0
  Migrate State : repair
      Map State : normal <-- normal
     Checkpoint : 201165 (512)
    Dirty State : dirty

  Disk00 Serial : Z1W50P5E
          State : active
             Id : 00000000
    Usable Size : 1953519880 (931.51 GiB 1000.20 GB)

2016-09-27 03:13:27 Launchpad Janitor systemd (Ubuntu): status New Confirmed
2016-09-27 22:55:56 Alberto Salvia Novella systemd (Ubuntu): importance Undecided High
2016-10-13 19:02:51 0perl bug added subscriber Oleg Perelstein
2016-10-23 03:44:03 John Center bug added subscriber John Center
2016-10-26 16:12:28 Benoît Thébaudeau attachment added reboot-md-dyn-debug-patch-4.8.4.png https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1587142/+attachment/4767751/+files/reboot-md-dyn-debug-patch-4.8.4.png
2017-01-14 00:35:51 Matt Wear bug added subscriber Matt Wear
2017-02-18 04:23:01 Michael Cain bug added subscriber Michael Cain
2017-03-16 11:16:09 Dimitri John Ledkov systemd (Ubuntu): importance High Critical
2017-03-16 11:16:12 Dimitri John Ledkov systemd (Ubuntu): assignee Dimitri John Ledkov (xnox)
2017-03-20 10:23:34 Emiliano Carmona bug added subscriber Emiliano Carmona
2017-04-04 05:03:54 Alexander Solovets bug added subscriber Alexander Solovets
2017-05-17 07:54:09 Jens bug added subscriber Jens Eich
2017-06-20 13:22:30 Alex Solovyev bug added subscriber Alex Solovyev
2017-08-02 20:17:54 Risto H. Kurppa bug added subscriber Risto H. Kurppa
2017-09-07 08:51:30 max moro attachment added screenshot https://bugs.launchpad.net/ubuntu/+source/systemd/+bug/1587142/+attachment/4945763/+files/reached_target.png
2017-09-08 14:26:19 Dr. Thomas Orgis bug watch added https://bugzilla.redhat.com/show_bug.cgi?id=752593
2017-09-08 14:26:19 Dr. Thomas Orgis bug watch added https://bugzilla.redhat.com/show_bug.cgi?id=785739
2017-09-08 14:33:19 Dr. Thomas Orgis bug added subscriber Dr. Thomas Orgis
2017-09-29 22:34:51 Marek bug watch added https://bugs.centos.org/view.php?id=13916
2019-06-20 09:39:58 Manoj Kumar bug added subscriber Manoj Kumar
2020-02-07 16:05:38 Joshua Diamant bug added subscriber Joshua Diamant
2022-09-24 21:21:21 Tammy Alexander bug watch removed https://bugzilla.redhat.com/show_bug.cgi?id=752593
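
For reference, the diagnostic steps quoted in the description row above can be collected into a short command sequence. This is only a sketch assembled from the commands the reporter quotes; it assumes the IMSM RAID1 member array appears as /dev/md126, as on the reporter's machine, and the md device name may differ on other systems.

# Check the resync state of the IMSM member array (assumed here to be /dev/md126).
cat /proc/mdstat
sudo mdadm -D /dev/md126

# Try to pause the resync; in the report it resumes again immediately on its own.
sudo sh -c 'echo idle > /sys/block/md126/md/sync_action'

# Make kernel messages visible on the console so the hang after
# "Reached target Shutdown." can be captured (as in the attached screenshots).
sudo sh -c 'echo 8 > /proc/sys/kernel/printk'

# Suggested in bug #1464917; the report says it does not help with the hang.
sudo swapoff -a

# The reporter had to disable these units to shut down consistently while a resync was running.
sudo systemctl --now disable lvm2-lvmetad lvm2-lvmpolld lvm2-monitor

# Optional debug shell on tty9 (Ctrl+Alt+F9); on the reporter's machine the keyboard
# was dead there, but switching back to tty1 still worked.
sudo systemctl start debug-shell

# Then trigger the shutdown that hangs once the resync is complete:
sudo systemctl reboot    # or: sudo systemctl poweroff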