Symptoms:
When the <Save and Quite> key pressed, the script hangs on systemctl start docker.
Actually the docker.service wait docker-storage storage creating LVM volume, which in his turn
hang up on the semaphore.
Here the CLI investigation
=========================================================================
(gdb) bt
#0 0x00007f7ce75884e7 in semop () from /lib64/libc.so.6
#1 0x00007f7ce7cb40cc in dm_udev_wait () from /lib64/libdevmapper.so.1.02
#2 0x00007f7ce88537bd in fs_unlock ()
#3 0x00007f7ce8860a05 in _file_lock_resource ()
#4 0x00007f7ce87f3878 in _lock_vol ()
#5 0x00007f7ce87f41c3 in lock_vol ()
#6 0x00007f7ce8808b6c in wipe_lv ()
#7 0x00007f7ce880c54f in _lv_create_an_lv ()
#8 0x00007f7ce880d55f in lv_create_single ()
#9 0x00007f7ce879e13a in lvcreate ()
#10 0x00007f7ce87a63d8 in lvm_run_command ()
#11 0x00007f7ce87a6e40 in lvm2_main ()
#12 0x00007f7ce74b1b15 in __libc_start_main () from /lib64/libc.so.6
#13 0x00007f7ce878de41 in _start ()
(gdb)
The docker-storage hangs on semaphore, because to udev has been stopped for processing incoming events in queue. Udev is stopped by puppet during executing lnx.rb
Root cause analysis:
Symptoms:
When the <Save and Quite> key pressed, the script hangs on systemctl start docker.
Actually the docker.service wait docker-storage storage creating LVM volume, which in his turn
hang up on the semaphore.
Here the CLI investigation ======= ======= ======= ======= ======= ======= ======= ======= ======= === libdevmapper. so.1.02
=======
(gdb) bt
#0 0x00007f7ce75884e7 in semop () from /lib64/libc.so.6
#1 0x00007f7ce7cb40cc in dm_udev_wait () from /lib64/
#2 0x00007f7ce88537bd in fs_unlock ()
#3 0x00007f7ce8860a05 in _file_lock_resource ()
#4 0x00007f7ce87f3878 in _lock_vol ()
#5 0x00007f7ce87f41c3 in lock_vol ()
#6 0x00007f7ce8808b6c in wipe_lv ()
#7 0x00007f7ce880c54f in _lv_create_an_lv ()
#8 0x00007f7ce880d55f in lv_create_single ()
#9 0x00007f7ce879e13a in lvcreate ()
#10 0x00007f7ce87a63d8 in lvm_run_command ()
#11 0x00007f7ce87a6e40 in lvm2_main ()
#12 0x00007f7ce74b1b15 in __libc_start_main () from /lib64/libc.so.6
#13 0x00007f7ce878de41 in _start ()
(gdb)
# ipcs -a
------ Message Queues --------
key msqid owner perms used-bytes messages
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
------ Semaphore Arrays --------
key semid owner perms nsems
0x0d4dbf3c 98304 root 600 1
# ipcs -sti 98304
Semaphore Array semid=98304
uid=0 gid=0 cuid=0 cgid=0
mode=0600, access_perms=0600
nsems = 1
otime = Wed Jul 6 12:29:04 2016
ctime = Wed Jul 6 12:29:04 2016
semnum value ncount zcount pid
0 1 0 0 25896
# ps xawf -eo pid,ppid, user,cgroup, args
1 0 root - /usr/lib/ systemd/ systemd --switched-root --system --deserialize 21 /system. slice/sys /usr/lib/ systemd/ systemd- journald /system. slice/lvm /usr/sbin/lvmetad -f /system. slice/sys /usr/lib/ systemd/ systemd- udevd /system. slice/aud /sbin/auditd -n /system. slice/irq /usr/sbin/ irqbalance --foreground /system. slice/dbu /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd- activation /system. slice/sys /usr/lib/ systemd/ systemd- logind /system. slice/cro /usr/sbin/crond -n /system. slice/wpa /usr/sbin/ wpa_supplicant -u -f /var/log/ wpa_supplicant. log -c /etc/wpa_ supplicant/ wpa_supplicant. conf -u -f /var/log/ wpa_supplica /system. slice/pol /usr/lib/ polkit- 1/polkitd --no-debug /system. slice/rsy /usr/sbin/rsyslogd -n /system. slice/ntp /usr/sbin/ntpd -u ntp:ntp -g /system. slice,1: n /usr/sbin/sshd -D systemd: /user.slice/ \_ sshd: root@pts/0 systemd: /user.slice/ | \_ -bash systemd: /user.slice/ | \_ lvdisplay systemd: /user.slice/ \_ sshd: root@pts/1 systemd: /user.slice/ | \_ -bash systemd: /user.slice/ | \_ gdb attach 25896 systemd: /user.slice/ \_ sshd: root@pts/2 systemd: /user.slice/ \_ -bash systemd: /user.slice/ \_ ps xawf -eo pid,ppid, user,cgroup, args /system. slice,1: n /bin/bash /usr/bin/ docker- storage- setup /system. slice,1: n \_ lvcreate -y -L 20472s -n docker-poolmeta docker ======= ======= ======= ======= ======= ======= ======= ======= ======= ===
491 1 root 7:devices:
505 1 root 7:devices:
512 1 root 7:devices:
662 1 root 7:devices:
686 1 root 7:devices:
687 1 dbus 7:devices:
692 1 root 7:devices:
698 1 root 7:devices:
715 1 root 7:devices:
716 1 polkitd 7:devices:
10693 1 root 7:devices:
11272 1 ntp 7:devices:
25796 1 root 7:devices:
26982 25796 root 1:name=
26986 26982 root 1:name=
27011 26986 root 1:name=
27039 25796 root 1:name=
27043 27039 root 1:name=
27279 27043 root 1:name=
29387 25796 root 1:name=
29391 29387 root 1:name=
29543 29391 root 1:name=
25861 1 root 7:devices:
25896 25861 root 7:devices:
=======
Udev at that particular moment doesn't process any messages.
Here the output of udevadm monitor: ======= ======= ======= ======= ======= ======= ======= ======= ======= ===
=======
# udevadm monitor
KERNEL[3666.987063] remove /devices/ virtual/ block/dm- 7 (block) slab/:t- 0000320 (slab) virtual/ bdi/253: 7 (bdi) virtual/ block/dm- 7 (block) virtual/ net/docker0 (net) virtual/ net/docker0/ queues/ rx-0 (queues) virtual/ net/docker0/ queues/ tx-0 (queues)
KERNEL[3666.999446] remove /kernel/
KERNEL[3667.000505] remove /devices/
KERNEL[3667.001290] remove /devices/
KERNEL[3667.075290] add /devices/
KERNEL[3667.076606] add /devices/
KERNEL[3667.076626] add /devices/
======= ======= ======= ======= ======= ======= ======= ======= ======= ======= ==
There is no any UDEV processing message, just events from the kernel.
Further investigation shows, that udev was stopped
/var/log/ puppet/ fuelmenu- puppet. log ======= ======= ======= ======= ======= ======= ======= ======= ======= == ======= ======= ======= ======= ======= ======= ======= ======= ======= ==
=======
2016-07-13 07:57:32 +0000 Puppet (debug): Executing '/sbin/udevadm control --stop-exec-queue'
=======
The reason is following peace of code, which stops udev execution for some reason:
https:/ /github. com/openstack/ fuel-library/ blob/master/ deployment/ puppet/ l23network/ lib/puppet/ provider/ disable_ hotplug/ lnx.rb# L31
Conclusion:
The docker-storage hangs on semaphore, because to udev has been stopped for processing incoming events in queue. Udev is stopped by puppet during executing lnx.rb