We found that the reason of puppet being stuck is hanging of sshd while reading output from update-master-node.sh script. There are some logs:
1) ps ouput with stucked sshd fork (pid 11866):
06:root 11866 0.1 0.0 136812 2404 ? Ss 09:16 0:03 sshd: root@notty 107-root 11892 0.0 0.0 52700 792 ? Ss 09:16 0:00 \_ /usr/libexec/openssh/sftp-server 108-root 17818 0.0 0.0 113128 1468 ? Ss 09:33 0:00 \_ /bin/bash /usr/share/fuel-utils/update-master-node.sh 109-root 17825 0.0 0.0 113124 824 ? S 09:33 0:00 \_ /bin/bash /usr/share/fuel-utils/update-master-node.sh 110-root 17827 0.0 0.0 107896 664 ? S 09:33 0:00 | \_ tee -i /var/log/puppet/update_master_node.log 111-root 17867 0.0 0.0 113128 1436 ? S 09:33 0:00 \_ bash -x /etc/puppet/modules/fuel/examples/deploy.sh 112-root 24787 0.1 1.8 749696 53604 ? Sl 09:35 0:02 \_ /usr/bin/ruby /usr/bin/puppet apply -d -v --color false --detailed-exitcodes /etc/puppet/modules/fuel/examples/keystone_token_disable.pp
2) There are file descriptors opened by sshd:
ls -la /proc/11866/fd/ total 0 dr-x------. 2 root root 0 лют 23 09:31 . dr-xr-xr-x. 9 root root 0 лют 23 09:16 .. lrwx------. 1 root root 64 лют 23 09:31 0 -> /dev/null lrwx------. 1 root root 64 лют 23 09:31 1 -> /dev/null lr-x------. 1 root root 64 лют 23 09:31 10 -> pipe:[34793] l-wx------. 1 root root 64 лют 23 09:35 11 -> pipe:[204872] lr-x------. 1 root root 64 лют 23 09:31 12 -> pipe:[34794] lr-x------. 1 root root 64 лют 23 09:35 13 -> pipe:[204873] lr-x------. 1 root root 64 лют 23 09:35 15 -> pipe:[204874] lrwx------. 1 root root 64 лют 23 09:31 2 -> /dev/null lrwx------. 1 root root 64 лют 23 09:31 3 -> socket:[34621] lrwx------. 1 root root 64 лют 23 09:31 4 -> socket:[34681] lr-x------. 1 root root 64 лют 23 09:31 5 -> pipe:[34687] l-wx------. 1 root root 64 лют 23 09:31 6 -> /run/systemd/sessions/2.ref l-wx------. 1 root root 64 лют 23 09:31 7 -> pipe:[34687] l-wx------. 1 root root 64 лют 23 09:31 9 -> pipe:[34792]
FD 13 is descriptor of pipe connected to update-master-node.sh stdout
FD 3 is socket of ssh connection
3) netstat output:
tcp 0 0 10.109.35.2:22 10.109.35.1:50157 ESTABLISHED
lsof -n | grep 34621 sshd 11866 root 3u IPv4 34621 0t0 TCP 10.109.35.2:ssh->10.109.35.1:50157 (ESTABLISHED)
4) Fragment of strace of sshd:
http://paste.openstack.org/show/600221/
As you can see for some reason FD13 disappeared from SSHD select() syscall. Compare lines 4 and 17 of the above fragment. The exact reason of such behavior is still unclear. So we decided to fix this in fuel-qa code.
We found that the reason of puppet being stuck is hanging of sshd while reading output from update- master- node.sh script. There are some logs:
1) ps ouput with stucked sshd fork (pid 11866):
06:root 11866 0.1 0.0 136812 2404 ? Ss 09:16 0:03 sshd: root@notty openssh/ sftp-server fuel-utils/ update- master- node.sh fuel-utils/ update- master- node.sh puppet/ update_ master_ node.log modules/ fuel/examples/ deploy. sh exitcodes /etc/puppet/ modules/ fuel/examples/ keystone_ token_disable. pp
107-root 11892 0.0 0.0 52700 792 ? Ss 09:16 0:00 \_ /usr/libexec/
108-root 17818 0.0 0.0 113128 1468 ? Ss 09:33 0:00 \_ /bin/bash /usr/share/
109-root 17825 0.0 0.0 113124 824 ? S 09:33 0:00 \_ /bin/bash /usr/share/
110-root 17827 0.0 0.0 107896 664 ? S 09:33 0:00 | \_ tee -i /var/log/
111-root 17867 0.0 0.0 113128 1436 ? S 09:33 0:00 \_ bash -x /etc/puppet/
112-root 24787 0.1 1.8 749696 53604 ? Sl 09:35 0:02 \_ /usr/bin/ruby /usr/bin/puppet apply -d -v --color false --detailed-
2) There are file descriptors opened by sshd:
ls -la /proc/11866/fd/ sessions/ 2.ref
total 0
dr-x------. 2 root root 0 лют 23 09:31 .
dr-xr-xr-x. 9 root root 0 лют 23 09:16 ..
lrwx------. 1 root root 64 лют 23 09:31 0 -> /dev/null
lrwx------. 1 root root 64 лют 23 09:31 1 -> /dev/null
lr-x------. 1 root root 64 лют 23 09:31 10 -> pipe:[34793]
l-wx------. 1 root root 64 лют 23 09:35 11 -> pipe:[204872]
lr-x------. 1 root root 64 лют 23 09:31 12 -> pipe:[34794]
lr-x------. 1 root root 64 лют 23 09:35 13 -> pipe:[204873]
lr-x------. 1 root root 64 лют 23 09:35 15 -> pipe:[204874]
lrwx------. 1 root root 64 лют 23 09:31 2 -> /dev/null
lrwx------. 1 root root 64 лют 23 09:31 3 -> socket:[34621]
lrwx------. 1 root root 64 лют 23 09:31 4 -> socket:[34681]
lr-x------. 1 root root 64 лют 23 09:31 5 -> pipe:[34687]
l-wx------. 1 root root 64 лют 23 09:31 6 -> /run/systemd/
l-wx------. 1 root root 64 лют 23 09:31 7 -> pipe:[34687]
l-wx------. 1 root root 64 лют 23 09:31 9 -> pipe:[34792]
FD 13 is descriptor of pipe connected to update- master- node.sh stdout
FD 3 is socket of ssh connection
3) netstat output:
tcp 0 0 10.109.35.2:22 10.109.35.1:50157 ESTABLISHED
lsof -n | grep 34621 35.2:ssh- >10.109. 35.1:50157 (ESTABLISHED)
sshd 11866 root 3u IPv4 34621 0t0 TCP 10.109.
4) Fragment of strace of sshd:
http:// paste.openstack .org/show/ 600221/
As you can see for some reason FD13 disappeared from SSHD select() syscall. Compare lines 4 and 17 of the above fragment. The exact reason of such behavior is still unclear. So we decided to fix this in fuel-qa code.