Comment 8 for bug 1593207

Revision history for this message
Albert Syriy (asyriy) wrote :

Root cause analysis:

Symptoms:
When the <Save and Quite> key pressed, the script hangs on systemctl start docker.
Actually the docker.service wait docker-storage storage creating LVM volume, which in his turn
hang up on the semaphore.

Here the CLI investigation
=========================================================================
(gdb) bt
#0 0x00007f7ce75884e7 in semop () from /lib64/libc.so.6
#1 0x00007f7ce7cb40cc in dm_udev_wait () from /lib64/libdevmapper.so.1.02
#2 0x00007f7ce88537bd in fs_unlock ()
#3 0x00007f7ce8860a05 in _file_lock_resource ()
#4 0x00007f7ce87f3878 in _lock_vol ()
#5 0x00007f7ce87f41c3 in lock_vol ()
#6 0x00007f7ce8808b6c in wipe_lv ()
#7 0x00007f7ce880c54f in _lv_create_an_lv ()
#8 0x00007f7ce880d55f in lv_create_single ()
#9 0x00007f7ce879e13a in lvcreate ()
#10 0x00007f7ce87a63d8 in lvm_run_command ()
#11 0x00007f7ce87a6e40 in lvm2_main ()
#12 0x00007f7ce74b1b15 in __libc_start_main () from /lib64/libc.so.6
#13 0x00007f7ce878de41 in _start ()
(gdb)

# ipcs -a

------ Message Queues --------
key msqid owner perms used-bytes messages

------ Shared Memory Segments --------
key shmid owner perms bytes nattch status

------ Semaphore Arrays --------
key semid owner perms nsems
0x0d4dbf3c 98304 root 600 1

# ipcs -sti 98304

Semaphore Array semid=98304
uid=0 gid=0 cuid=0 cgid=0
mode=0600, access_perms=0600
nsems = 1
otime = Wed Jul 6 12:29:04 2016
ctime = Wed Jul 6 12:29:04 2016
semnum value ncount zcount pid
0 1 0 0 25896

# ps xawf -eo pid,ppid,user,cgroup,args

    1 0 root - /usr/lib/systemd/systemd --switched-root --system --deserialize 21
  491 1 root 7:devices:/system.slice/sys /usr/lib/systemd/systemd-journald
  505 1 root 7:devices:/system.slice/lvm /usr/sbin/lvmetad -f
  512 1 root 7:devices:/system.slice/sys /usr/lib/systemd/systemd-udevd
  662 1 root 7:devices:/system.slice/aud /sbin/auditd -n
  686 1 root 7:devices:/system.slice/irq /usr/sbin/irqbalance --foreground
  687 1 dbus 7:devices:/system.slice/dbu /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
  692 1 root 7:devices:/system.slice/sys /usr/lib/systemd/systemd-logind
  698 1 root 7:devices:/system.slice/cro /usr/sbin/crond -n
  715 1 root 7:devices:/system.slice/wpa /usr/sbin/wpa_supplicant -u -f /var/log/wpa_supplicant.log -c /etc/wpa_supplicant/wpa_supplicant.conf -u -f /var/log/wpa_supplica
  716 1 polkitd 7:devices:/system.slice/pol /usr/lib/polkit-1/polkitd --no-debug
10693 1 root 7:devices:/system.slice/rsy /usr/sbin/rsyslogd -n
11272 1 ntp 7:devices:/system.slice/ntp /usr/sbin/ntpd -u ntp:ntp -g
25796 1 root 7:devices:/system.slice,1:n /usr/sbin/sshd -D
26982 25796 root 1:name=systemd:/user.slice/ \_ sshd: root@pts/0
26986 26982 root 1:name=systemd:/user.slice/ | \_ -bash
27011 26986 root 1:name=systemd:/user.slice/ | \_ lvdisplay
27039 25796 root 1:name=systemd:/user.slice/ \_ sshd: root@pts/1
27043 27039 root 1:name=systemd:/user.slice/ | \_ -bash
27279 27043 root 1:name=systemd:/user.slice/ | \_ gdb attach 25896
29387 25796 root 1:name=systemd:/user.slice/ \_ sshd: root@pts/2
29391 29387 root 1:name=systemd:/user.slice/ \_ -bash
29543 29391 root 1:name=systemd:/user.slice/ \_ ps xawf -eo pid,ppid,user,cgroup,args
25861 1 root 7:devices:/system.slice,1:n /bin/bash /usr/bin/docker-storage-setup
25896 25861 root 7:devices:/system.slice,1:n \_ lvcreate -y -L 20472s -n docker-poolmeta docker
=========================================================================

Udev at that particular moment doesn't process any messages.

Here the output of udevadm monitor:
=========================================================================
# udevadm monitor

KERNEL[3666.987063] remove /devices/virtual/block/dm-7 (block)
KERNEL[3666.999446] remove /kernel/slab/:t-0000320 (slab)
KERNEL[3667.000505] remove /devices/virtual/bdi/253:7 (bdi)
KERNEL[3667.001290] remove /devices/virtual/block/dm-7 (block)
KERNEL[3667.075290] add /devices/virtual/net/docker0 (net)
KERNEL[3667.076606] add /devices/virtual/net/docker0/queues/rx-0 (queues)
KERNEL[3667.076626] add /devices/virtual/net/docker0/queues/tx-0 (queues)

========================================================================

There is no any UDEV processing message, just events from the kernel.

Further investigation shows, that udev was stopped

/var/log/puppet/fuelmenu-puppet.log
========================================================================
2016-07-13 07:57:32 +0000 Puppet (debug): Executing '/sbin/udevadm control --stop-exec-queue'
========================================================================

The reason is following peace of code, which stops udev execution for some reason:

https://github.com/openstack/fuel-library/blob/master/deployment/puppet/l23network/lib/puppet/provider/disable_hotplug/lnx.rb#L31

Conclusion:

The docker-storage hangs on semaphore, because to udev has been stopped for processing incoming events in queue. Udev is stopped by puppet during executing lnx.rb