sshd stops reading puppet output on applying keystone.pp during fuel-qa auto tests

Bug #1664635 reported by Ivan Suzdal
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Fix Committed
Importance: High
Assigned to: Nikita Karpin

Bug Description

Detailed bug description:
For some reason puppet gets stuck while applying /etc/puppet/modules/fuel/examples/keystone_token_disable.pp in our systests.
At the same time, applying it manually works fine.

Example of a failed systest: https://packaging-ci.infra.mirantis.net/job/master-pkg-systest-centos/2206/console
As the console log shows, the systest failed due to a timeout.

Steps to reproduce:
Make any change in any _centos_ package and send it for review.
In my case even a version change was enough.

Expected results:
The systest succeeds.

Actual results:
The systest fails.

Reproducibility:
Always.

Ivan Suzdal (isuzdal) wrote :
Changed in fuel:
status: New → Confirmed
milestone: none → 11.0
tags: added: area-puppet
Nikita Karpin (mkarpin) wrote :

Running strace on the puppet run shows that it hangs on:

/usr/bin/python2 /usr/bin/openstack user show --format shell monitord --domain 058cf5ffa0e9418ab11f22cc1a9cfe16

Still trying to find the cause...

Nikita Karpin (mkarpin)
Changed in fuel:
status: Confirmed → In Progress
Nikita Karpin (mkarpin) wrote :

The problem is not in the manifests, puppet, or keystone. It is some interaction between fuel-devops/qa, bash, and puppet output: when I redirected puppet logging to syslog, update-master-node.sh did not get stuck. See https://custom-ci.infra.mirantis.net/view/11.0/job/11.0.custom.system_test/128/console

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-library (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/436982

Nikita Karpin (mkarpin) wrote : Re: Puppet stuck on applying keystone_token_disable.pp

We found that puppet gets stuck because sshd hangs while reading output from the update-master-node.sh script. Here are some logs:

1) ps output with the stuck sshd fork (pid 11866):

root 11866 0.1 0.0 136812 2404 ? Ss 09:16 0:03 sshd: root@notty
root 11892 0.0 0.0 52700 792 ? Ss 09:16 0:00 \_ /usr/libexec/openssh/sftp-server
root 17818 0.0 0.0 113128 1468 ? Ss 09:33 0:00 \_ /bin/bash /usr/share/fuel-utils/update-master-node.sh
root 17825 0.0 0.0 113124 824 ? S 09:33 0:00 \_ /bin/bash /usr/share/fuel-utils/update-master-node.sh
root 17827 0.0 0.0 107896 664 ? S 09:33 0:00 | \_ tee -i /var/log/puppet/update_master_node.log
root 17867 0.0 0.0 113128 1436 ? S 09:33 0:00 \_ bash -x /etc/puppet/modules/fuel/examples/deploy.sh
root 24787 0.1 1.8 749696 53604 ? Sl 09:35 0:02 \_ /usr/bin/ruby /usr/bin/puppet apply -d -v --color false --detailed-exitcodes /etc/puppet/modules/fuel/examples/keystone_token_disable.pp

2) File descriptors opened by sshd:

ls -la /proc/11866/fd/
total 0
dr-x------. 2 root root 0 Feb 23 09:31 .
dr-xr-xr-x. 9 root root 0 Feb 23 09:16 ..
lrwx------. 1 root root 64 Feb 23 09:31 0 -> /dev/null
lrwx------. 1 root root 64 Feb 23 09:31 1 -> /dev/null
lr-x------. 1 root root 64 Feb 23 09:31 10 -> pipe:[34793]
l-wx------. 1 root root 64 Feb 23 09:35 11 -> pipe:[204872]
lr-x------. 1 root root 64 Feb 23 09:31 12 -> pipe:[34794]
lr-x------. 1 root root 64 Feb 23 09:35 13 -> pipe:[204873]
lr-x------. 1 root root 64 Feb 23 09:35 15 -> pipe:[204874]
lrwx------. 1 root root 64 Feb 23 09:31 2 -> /dev/null
lrwx------. 1 root root 64 Feb 23 09:31 3 -> socket:[34621]
lrwx------. 1 root root 64 Feb 23 09:31 4 -> socket:[34681]
lr-x------. 1 root root 64 Feb 23 09:31 5 -> pipe:[34687]
l-wx------. 1 root root 64 Feb 23 09:31 6 -> /run/systemd/sessions/2.ref
l-wx------. 1 root root 64 Feb 23 09:31 7 -> pipe:[34687]
l-wx------. 1 root root 64 Feb 23 09:31 9 -> pipe:[34792]

FD 13 is the descriptor of the pipe connected to the stdout of update-master-node.sh.

FD 3 is the socket of the ssh connection.
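
For anyone repeating this inspection, the same descriptor-to-target mapping can be collected from a script. This is a minimal sketch, assuming a Linux /proc filesystem; the helper names list_fds and pipe_fds are hypothetical and are not part of fuel-qa or any tool used in this report.

```python
import os


def list_fds(pid):
    """Return {fd_number: link_target} for a process, read from /proc/<pid>/fd."""
    fd_dir = "/proc/{}/fd".format(pid)
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets


def pipe_fds(pid):
    """Return (fd, target) pairs for descriptors that point at pipes, sorted by fd."""
    return sorted(
        (fd, target)
        for fd, target in list_fds(pid).items()
        if target.startswith("pipe:[")
    )
```

Calling pipe_fds(11866) as root on the affected node would list the same pipe entries as the ls output above. Both ends of one pipe show the same pipe:[inode] value, which is how a reader and a writer can be matched across processes.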

3) netstat output:

tcp 0 0 10.109.35.2:22 10.109.35.1:50157 ESTABLISHED

lsof -n | grep 34621
sshd 11866 root 3u IPv4 34621 0t0 TCP 10.109.35.2:ssh->10.109.35.1:50157 (ESTABLISHED)

4) Fragment of strace of sshd:

http://paste.openstack.org/show/600221/

As you can see, for some reason FD 13 disappeared from the descriptor set that sshd passes to select(). Compare lines 4 and 17 of the above fragment. The exact reason for this behavior is still unclear, so we decided to fix this in the fuel-qa code.
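
Why a reader that stops polling a pipe stalls the writer can be shown in isolation. The sketch below is only an illustration of general pipe behavior, not a reproduction of the sshd issue: it fills a pipe that nobody reads and counts how many bytes fit before a write would block. The function name bytes_until_pipe_blocks is hypothetical.

```python
import os


def bytes_until_pipe_blocks(chunk_size=4096):
    """Write into a pipe that is never read and return the bytes accepted
    before the kernel refuses more data."""
    read_end, write_end = os.pipe()
    # Non-blocking mode turns "would hang forever" into BlockingIOError,
    # so the demonstration can finish.
    os.set_blocking(write_end, False)
    chunk = b"x" * chunk_size
    written = 0
    try:
        while True:
            try:
                written += os.write(write_end, chunk)
            except BlockingIOError:
                # The pipe buffer is full; a blocking writer would stop here
                # until some reader drained the pipe.
                return written
    finally:
        os.close(read_end)
        os.close(write_end)
```

On Linux the default pipe capacity is usually 64 KiB, so a script that keeps printing, as puppet does with -d -v, would stop in write() soon after its reader stops draining the pipe.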

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-qa (master)

Fix proposed to branch: master
Review: https://review.openstack.org/437373

Nikita Karpin (mkarpin)
summary: - Puppet stuck on applying keystone_token_disable.pp
+ sshd stops reading puppet output on applying keystone.pp
summary: - sshd stops reading puppet output on applying keystone.pp
+ sshd stops reading puppet output on applying keystone.pp during fuel-qa
+ auto tests
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Mykyta Karpin (<email address hidden>) on branch: master
Review: https://review.openstack.org/436982
Reason: in favor of https://review.openstack.org/#/c/437373/

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/437373
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=ee0bf1cfa424aca4c2eb65a5d8506661b8e3346a
Submitter: Jenkins
Branch: master

commit ee0bf1cfa424aca4c2eb65a5d8506661b8e3346a
Author: Mykyta Karpin <email address hidden>
Date: Thu Feb 23 15:22:10 2017 +0200

    Redirect update-master-node.sh stdout and stderr

    SSHD stops reading of update-master-node.sh stdout
    during puppet run, this causes puppet hanging on
    keystone tasks. In order to avoid this we need to
    redirect script's output to /dev/null.

    Change-Id: I99959cb72caeec33a91358af4b58fa858b9c22c8
    Closes-Bug: #1664635
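
The idea behind this commit can be sketched locally. This is not the actual fuel-qa change, which is only described by the commit message above; it is a minimal illustration, assuming a local subprocess instead of an ssh session, of running a chatty command with its output sent to /dev/null so that no pipe can fill up. DEMO_CMD and run_discarding_output are hypothetical names.

```python
import subprocess
import sys

# Hypothetical stand-in for a chatty script: writes about 200 KB to stdout,
# more than a typical pipe buffer holds, then exits with status 3.
DEMO_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write('x' * 200000); sys.exit(3)",
]


def run_discarding_output(cmd):
    """Run cmd with stdin, stdout and stderr attached to /dev/null and
    return its exit status."""
    completed = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode
```

The trade-off is that the caller only gets the exit status. In this bug the script already writes its own log through tee to /var/log/puppet/update_master_node.log, so the output is still available on the node.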

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-qa (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/438914

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-qa (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/438915

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-qa (stable/newton)

Reviewed: https://review.openstack.org/438915
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=b431c985530508ec590d618e4381d5d8dd886dbf
Submitter: Jenkins
Branch: stable/newton

commit b431c985530508ec590d618e4381d5d8dd886dbf
Author: Mykyta Karpin <email address hidden>
Date: Thu Feb 23 15:22:10 2017 +0200

    Redirect update-master-node.sh stdout and stderr

    SSHD stops reading of update-master-node.sh stdout
    during puppet run, this causes puppet hanging on
    keystone tasks. In order to avoid this we need to
    redirect script's output to /dev/null.

    Change-Id: I99959cb72caeec33a91358af4b58fa858b9c22c8
    Closes-Bug: #1664635
    (cherry picked from commit ee0bf1cfa424aca4c2eb65a5d8506661b8e3346a)

tags: added: in-stable-newton
tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-qa (stable/ocata)

Reviewed: https://review.openstack.org/438914
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=cb4a6873534c8103a43d17209e19b475809d55e4
Submitter: Jenkins
Branch: stable/ocata

commit cb4a6873534c8103a43d17209e19b475809d55e4
Author: Mykyta Karpin <email address hidden>
Date: Thu Feb 23 15:22:10 2017 +0200

    Redirect update-master-node.sh stdout and stderr

    SSHD stops reading of update-master-node.sh stdout
    during puppet run, this causes puppet hanging on
    keystone tasks. In order to avoid this we need to
    redirect script's output to /dev/null.

    Change-Id: I99959cb72caeec33a91358af4b58fa858b9c22c8
    Closes-Bug: #1664635
    (cherry picked from commit ee0bf1cfa424aca4c2eb65a5d8506661b8e3346a)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-qa 11.0.0.0rc2

This issue was fixed in the openstack/fuel-qa 11.0.0.0rc2 release candidate.
