sshd stops reading puppet output on applying keystone.pp during fuel-qa auto tests

Bug #1664635 reported by Ivan Suzdal on 2017-02-14
This bug affects 2 people
Affects: Fuel for OpenStack
Importance: High
Assigned to: Nikita Karpin

Bug Description

Detailed bug description:
For some reason, puppet gets stuck while applying /etc/puppet/modules/fuel/examples/keystone_token_disable.pp in our systests.
Applying the same manifest manually works fine.

Failed systest example: https://packaging-ci.infra.mirantis.net/job/master-pkg-systest-centos/2206/console
As you can see, the systest failed due to a timeout.

Steps to reproduce:
Make any change to any _centos_ package and send it for review.
In my case, even a version change was enough.

Expected results:
The systest succeeds.

Actual results:
The systest fails.

Reproducibility:
Always.

Ivan Suzdal (isuzdal) wrote :
Changed in fuel:
status: New → Confirmed
milestone: none → 11.0
tags: added: area-puppet
Nikita Karpin (mkarpin) wrote :

When I strace the puppet run, it hangs on:

/usr/bin/python2 /usr/bin/openstack user show --format shell monitord --domain 058cf5ffa0e9418ab11f22cc1a9cfe16

Trying to find the cause...

Nikita Karpin (mkarpin) on 2017-02-17
Changed in fuel:
status: Confirmed → In Progress
Nikita Karpin (mkarpin) wrote :

The problem is not in the manifests, puppet, or keystone. It is some interaction between fuel-devops/qa, bash, and puppet output: when I redirected puppet logging to syslog, update-master-node.sh did not get stuck - https://custom-ci.infra.mirantis.net/view/11.0/job/11.0.custom.system_test/128/console
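A minimal sketch of why output handling alone can hang a process: a pipe has a fixed kernel buffer, and once it is full a blocking writer waits until somebody reads. The function name `fill_pipe_until_blocked` is hypothetical and is not part of fuel-qa or fuel-devops. The write end is switched to non-blocking mode only so that the demo reports the limit instead of hanging.

```python
import os


def fill_pipe_until_blocked(chunk_size=4096):
    """Write to a pipe that nobody reads and return how many bytes fit."""
    read_fd, write_fd = os.pipe()
    # Non-blocking mode makes write() raise instead of hanging forever.
    os.set_blocking(write_fd, False)
    chunk = b"x" * chunk_size
    written = 0
    try:
        while True:
            try:
                written += os.write(write_fd, chunk)
            except BlockingIOError:
                # The kernel buffer is full. A blocking writer (for example
                # a script printing puppet debug output) would stall here.
                return written
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

Calling `fill_pipe_until_blocked()` returns the number of bytes accepted before the pipe filled up; on Linux the default pipe capacity is typically 64 KiB. Verbose puppet output can plausibly exceed that quickly, which would be consistent with the hang disappearing once logging went to syslog.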

We found that puppet gets stuck because sshd hangs while reading output from the update-master-node.sh script. Here are some logs:

1) ps output with the stuck sshd fork (pid 11866):

root 11866 0.1 0.0 136812 2404 ? Ss 09:16 0:03 sshd: root@notty
root 11892 0.0 0.0 52700 792 ? Ss 09:16 0:00 \_ /usr/libexec/openssh/sftp-server
root 17818 0.0 0.0 113128 1468 ? Ss 09:33 0:00 \_ /bin/bash /usr/share/fuel-utils/update-master-node.sh
root 17825 0.0 0.0 113124 824 ? S 09:33 0:00 \_ /bin/bash /usr/share/fuel-utils/update-master-node.sh
root 17827 0.0 0.0 107896 664 ? S 09:33 0:00 | \_ tee -i /var/log/puppet/update_master_node.log
root 17867 0.0 0.0 113128 1436 ? S 09:33 0:00 \_ bash -x /etc/puppet/modules/fuel/examples/deploy.sh
root 24787 0.1 1.8 749696 53604 ? Sl 09:35 0:02 \_ /usr/bin/ruby /usr/bin/puppet apply -d -v --color false --detailed-exitcodes /etc/puppet/modules/fuel/examples/keystone_token_disable.pp

2) File descriptors opened by sshd:

ls -la /proc/11866/fd/
total 0
dr-x------. 2 root root 0 Feb 23 09:31 .
dr-xr-xr-x. 9 root root 0 Feb 23 09:16 ..
lrwx------. 1 root root 64 Feb 23 09:31 0 -> /dev/null
lrwx------. 1 root root 64 Feb 23 09:31 1 -> /dev/null
lr-x------. 1 root root 64 Feb 23 09:31 10 -> pipe:[34793]
l-wx------. 1 root root 64 Feb 23 09:35 11 -> pipe:[204872]
lr-x------. 1 root root 64 Feb 23 09:31 12 -> pipe:[34794]
lr-x------. 1 root root 64 Feb 23 09:35 13 -> pipe:[204873]
lr-x------. 1 root root 64 Feb 23 09:35 15 -> pipe:[204874]
lrwx------. 1 root root 64 Feb 23 09:31 2 -> /dev/null
lrwx------. 1 root root 64 Feb 23 09:31 3 -> socket:[34621]
lrwx------. 1 root root 64 Feb 23 09:31 4 -> socket:[34681]
lr-x------. 1 root root 64 Feb 23 09:31 5 -> pipe:[34687]
l-wx------. 1 root root 64 Feb 23 09:31 6 -> /run/systemd/sessions/2.ref
l-wx------. 1 root root 64 Feb 23 09:31 7 -> pipe:[34687]
l-wx------. 1 root root 64 Feb 23 09:31 9 -> pipe:[34792]

FD 13 is the descriptor of the pipe connected to the stdout of update-master-node.sh.

FD 3 is the socket of the ssh connection.
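The mapping from descriptors to pipes can also be checked programmatically. The sketch below is Linux-only and reads the same /proc/<pid>/fd links that `ls -la` shows; the helper names `list_fds` and `shared_pipes` are hypothetical, and inspecting another user's process generally requires root.

```python
import os


def list_fds(pid):
    """Return {fd_number: link_target} read from /proc/<pid>/fd."""
    fd_dir = "/proc/{}/fd".format(pid)
    result = {}
    for name in os.listdir(fd_dir):
        try:
            result[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return result


def shared_pipes(pid_a, pid_b):
    """Return pipe targets such as 'pipe:[204873]' open in both processes."""
    pipes_a = {t for t in list_fds(pid_a).values() if t.startswith("pipe:")}
    pipes_b = {t for t in list_fds(pid_b).values() if t.startswith("pipe:")}
    return pipes_a & pipes_b
```

For the processes in the ps output, `shared_pipes(11866, 17818)` would be expected to include the pipe behind FD 13 of sshd, assuming both processes are still running.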

3) netstat output:

tcp 0 0 10.109.35.2:22 10.109.35.1:50157 ESTABLISHED

lsof -n | grep 34621
sshd 11866 root 3u IPv4 34621 0t0 TCP 10.109.35.2:ssh->10.109.35.1:50157 (ESTABLISHED)

4) Fragment of the sshd strace:

http://paste.openstack.org/show/600221/

As you can see, for some reason FD 13 disappeared from the sshd select() syscall; compare lines 4 and 17 of the fragment above. The exact reason for this behavior is still unclear, so we decided to fix this in the fuel-qa code.
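To illustrate what a descriptor missing from select() means in practice, here is a small self-contained sketch. It does not reproduce the sshd behavior; it only shows that select() reports data solely for the descriptors it is asked to watch, so a pipe dropped from the read set is never drained even though data is pending. The names `ready_to_read` and `demo` are hypothetical.

```python
import os
import select


def ready_to_read(watched_fds, timeout=0.1):
    """Return the subset of watched_fds that select() reports as readable."""
    ready, _, _ = select.select(watched_fds, [], [], timeout)
    return ready


def demo():
    """Both pipes hold data, but only watched descriptors are reported."""
    r1, w1 = os.pipe()
    r2, w2 = os.pipe()
    try:
        os.write(w1, b"a")
        os.write(w2, b"b")
        # Watching both read ends: both are reported as readable.
        both_seen = set(ready_to_read([r1, r2])) == {r1, r2}
        # Watching only r1: r2 still has pending data but is never reported.
        only_first_seen = ready_to_read([r1]) == [r1]
        return both_seen, only_first_seen
    finally:
        for fd in (r1, w1, r2, w2):
            os.close(fd)
```

If the unwatched pipe keeps receiving output, its buffer eventually fills and the writer blocks, which matches the symptom described in this report.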

Nikita Karpin (mkarpin) on 2017-02-23
summary: - Puppet stuck on applying keystone_token_disable.pp
+ sshd stops reading puppet output on applying keystone.pp
summary: - sshd stops reading puppet output on applying keystone.pp
+ sshd stops reading puppet output on applying keystone.pp during fuel-qa
+ auto tests

Change abandoned by Mykyta Karpin (<email address hidden>) on branch: master
Review: https://review.openstack.org/436982
Reason: in favor of https://review.openstack.org/#/c/437373/

Changed in fuel:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/437373
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=ee0bf1cfa424aca4c2eb65a5d8506661b8e3346a
Submitter: Jenkins
Branch: master

commit ee0bf1cfa424aca4c2eb65a5d8506661b8e3346a
Author: Mykyta Karpin <email address hidden>
Date: Thu Feb 23 15:22:10 2017 +0200

    Redirect update-master-node.sh stdout and stderr

    SSHD stops reading of update-master-node.sh stdout
    during puppet run, this causes puppet hanging on
    keystone tasks. In order to avoid this we need to
    redirect script's output to /dev/null.

    Change-Id: I99959cb72caeec33a91358af4b58fa858b9c22c8
    Closes-Bug: #1664635
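
The idea in the commit message can be sketched as follows. This is not the actual fuel-qa change: `quiet_command` and `run_quiet` are hypothetical helpers, and the sketch runs the command locally with `subprocess` rather than over SSH. The command is wrapped in a subshell so that the redirection covers every statement in it.

```python
import subprocess


def quiet_command(command):
    """Wrap a shell command so that stdout and stderr go to /dev/null."""
    return "({}) > /dev/null 2>&1".format(command)


def run_quiet(command):
    """Run the command through the shell, discard output, return exit code."""
    return subprocess.call(quiet_command(command), shell=True)
```

For example, `run_quiet("/usr/share/fuel-utils/update-master-node.sh")` would leave nothing for the caller to read, while the exit code is still available. Judging by the `tee -i /var/log/puppet/update_master_node.log` process in the ps output, the script's log file should still capture the output.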

Reviewed: https://review.openstack.org/438915
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=b431c985530508ec590d618e4381d5d8dd886dbf
Submitter: Jenkins
Branch: stable/newton

commit b431c985530508ec590d618e4381d5d8dd886dbf
Author: Mykyta Karpin <email address hidden>
Date: Thu Feb 23 15:22:10 2017 +0200

    Redirect update-master-node.sh stdout and stderr

    SSHD stops reading of update-master-node.sh stdout
    during puppet run, this causes puppet hanging on
    keystone tasks. In order to avoid this we need to
    redirect script's output to /dev/null.

    Change-Id: I99959cb72caeec33a91358af4b58fa858b9c22c8
    Closes-Bug: #1664635
    (cherry picked from commit ee0bf1cfa424aca4c2eb65a5d8506661b8e3346a)

tags: added: in-stable-newton
tags: added: in-stable-ocata

Reviewed: https://review.openstack.org/438914
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=cb4a6873534c8103a43d17209e19b475809d55e4
Submitter: Jenkins
Branch: stable/ocata

commit cb4a6873534c8103a43d17209e19b475809d55e4
Author: Mykyta Karpin <email address hidden>
Date: Thu Feb 23 15:22:10 2017 +0200

    Redirect update-master-node.sh stdout and stderr

    SSHD stops reading of update-master-node.sh stdout
    during puppet run, this causes puppet hanging on
    keystone tasks. In order to avoid this we need to
    redirect script's output to /dev/null.

    Change-Id: I99959cb72caeec33a91358af4b58fa858b9c22c8
    Closes-Bug: #1664635
    (cherry picked from commit ee0bf1cfa424aca4c2eb65a5d8506661b8e3346a)

This issue was fixed in the openstack/fuel-qa 11.0.0.0rc2 release candidate.
