Comment 5 for bug 1664635

Revision history for this message
Nikita Karpin (mkarpin) wrote : Re: Puppet stuck on applying keystone_token_disable.pp

We found that the reason of puppet being stuck is hanging of sshd while reading output from update-master-node.sh script. There are some logs:

1) ps ouput with stucked sshd fork (pid 11866):

06:root 11866 0.1 0.0 136812 2404 ? Ss 09:16 0:03 sshd: root@notty
107-root 11892 0.0 0.0 52700 792 ? Ss 09:16 0:00 \_ /usr/libexec/openssh/sftp-server
108-root 17818 0.0 0.0 113128 1468 ? Ss 09:33 0:00 \_ /bin/bash /usr/share/fuel-utils/update-master-node.sh
109-root 17825 0.0 0.0 113124 824 ? S 09:33 0:00 \_ /bin/bash /usr/share/fuel-utils/update-master-node.sh
110-root 17827 0.0 0.0 107896 664 ? S 09:33 0:00 | \_ tee -i /var/log/puppet/update_master_node.log
111-root 17867 0.0 0.0 113128 1436 ? S 09:33 0:00 \_ bash -x /etc/puppet/modules/fuel/examples/deploy.sh
112-root 24787 0.1 1.8 749696 53604 ? Sl 09:35 0:02 \_ /usr/bin/ruby /usr/bin/puppet apply -d -v --color false --detailed-exitcodes /etc/puppet/modules/fuel/examples/keystone_token_disable.pp

2) There are file descriptors opened by sshd:

ls -la /proc/11866/fd/
total 0
dr-x------. 2 root root 0 лют 23 09:31 .
dr-xr-xr-x. 9 root root 0 лют 23 09:16 ..
lrwx------. 1 root root 64 лют 23 09:31 0 -> /dev/null
lrwx------. 1 root root 64 лют 23 09:31 1 -> /dev/null
lr-x------. 1 root root 64 лют 23 09:31 10 -> pipe:[34793]
l-wx------. 1 root root 64 лют 23 09:35 11 -> pipe:[204872]
lr-x------. 1 root root 64 лют 23 09:31 12 -> pipe:[34794]
lr-x------. 1 root root 64 лют 23 09:35 13 -> pipe:[204873]
lr-x------. 1 root root 64 лют 23 09:35 15 -> pipe:[204874]
lrwx------. 1 root root 64 лют 23 09:31 2 -> /dev/null
lrwx------. 1 root root 64 лют 23 09:31 3 -> socket:[34621]
lrwx------. 1 root root 64 лют 23 09:31 4 -> socket:[34681]
lr-x------. 1 root root 64 лют 23 09:31 5 -> pipe:[34687]
l-wx------. 1 root root 64 лют 23 09:31 6 -> /run/systemd/sessions/2.ref
l-wx------. 1 root root 64 лют 23 09:31 7 -> pipe:[34687]
l-wx------. 1 root root 64 лют 23 09:31 9 -> pipe:[34792]

FD 13 is descriptor of pipe connected to update-master-node.sh stdout

FD 3 is socket of ssh connection

3) netstat output:

tcp 0 0 10.109.35.2:22 10.109.35.1:50157 ESTABLISHED

lsof -n | grep 34621
sshd 11866 root 3u IPv4 34621 0t0 TCP 10.109.35.2:ssh->10.109.35.1:50157 (ESTABLISHED)

4) Fragment of strace of sshd:

http://paste.openstack.org/show/600221/

As you can see for some reason FD13 disappeared from SSHD select() syscall. Compare lines 4 and 17 of the above fragment. The exact reason of such behavior is still unclear. So we decided to fix this in fuel-qa code.