TL;DR - when tasks take too long (likely due to SSH instability), tripleo-quickstart is prone to failing with "Timeout (12s) waiting for privilege escalation prompt" errors.
Potential mitigations include:
- increasing the timeout in tripleo-quickstart's ansible.cfg.
- upgrading to Ansible 2.3.1, to pick up https://github.com/ansible/ansible/pull/23710
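A minimal sketch of the first mitigation. It assumes the 12s in the error message is Ansible's `timeout` setting (default 10) plus a 2-second margin added by the ssh connection plugin, which matches the `ConnectTimeout=10` in the logs below. The value 30 is purely illustrative and has not been validated against these jobs.

```ini
# ansible.cfg (sketch; the value is an assumption, not a tested recommendation)
[defaults]
# Connection timeout in seconds. The privilege escalation prompt wait is
# believed to be derived from this value (timeout + 2), so 30 here would be
# expected to raise the "Timeout (12s)" limit to roughly 32s.
timeout = 30
```

A longer timeout only hides slow responses from the target host; it does not address the underlying ssh instability.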
---
Longer version...
We have been seeing intermittent CI failures in RDO Phase 2, where we use tripleo-quickstart to run a variety of CI jobs that validate RDO on HA, bare metal, and other configurations. The CI debugging Trello card is here:
- https://trello.com/c/e3zbRidd/261-rdophase2-ansible-ssh-timeouts-in-become-module-timeout-12s-waiting-for-privilege-escalation-prompt
Here are a few concrete examples:
===
- https://thirdparty.logs.rdoproject.org/jenkins-promote-rhel-pike-rdo_trunk-virtha-3ctlr_1comp_192gb-3/console.txt.gz
In this run the timeout occurs several times during teardown in tasks whose failures are ignored, until the run finally fails on a task that is not ignored:
```
21:18:09 TASK [environment/teardown : Remove bridge whitelisting from qemu bridge helper] ***
21:18:09 task path: /home/rhos-ci/jenkins/workspace/promote-rhel-pike-rdo_trunk-virtha-3ctlr_1comp_192gb/tripleo-quickstart/roles/environment/teardown/tasks/main.yml:46
21:18:09 Tuesday 29 August 2017 21:18:09 +0000 (0:00:00.229) 0:04:37.703 ********
21:18:21 <haa-08.ha.lab.eng.bos.redhat.com> ssh_retry: attempt: 0, caught exception(Timeout (12s) waiting for privilege escalation prompt: ) from cmd (/bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-diprjprzcicizfblssoufgxasxhdlbna; /usr/bin/python'"'"' && sleep 0'...), pausing for 0 seconds
21:18:33 <haa-08.ha.lab.eng.bos.redhat.com> ssh_retry: attempt: 1, caught exception(Timeout (12s) waiting for privilege escalation prompt: ) from cmd (/bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-diprjprzcicizfblssoufgxasxhdlbna; /usr/bin/python'"'"' && sleep 0'...), pausing for 1 seconds
21:18:46 <haa-08.ha.lab.eng.bos.redhat.com> ssh_retry: attempt: 2, caught exception(Timeout (12s) waiting for privilege escalation prompt: ) from cmd (/bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-diprjprzcicizfblssoufgxasxhdlbna; /usr/bin/python'"'"' && sleep 0'...), pausing for 3 seconds
21:19:01 fatal: [haa-08.ha.lab.eng.bos.redhat.com]: FAILED! => {"failed": true, "msg": "Timeout (12s) waiting for privilege escalation prompt: "}
```
- https://thirdparty.logs.rdoproject.org/jenkins-oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans-27/console.txt.gz
```
TASK [repo-setup : Setup repos on live host] ***********************************
task path: /home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/tripleo-quickstart/roles/repo-setup/tasks/setup_repos.yml:1
Tuesday 29 August 2017 17:43:24 -0400 (0:00:00.247) 0:36:34.460 ********
Using module file /home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/lib/python2.7/site-packages/ansible/modules/core/commands/command.py
<undercloud> ESTABLISH SSH CONNECTION FOR USER: stack
<undercloud> SSH: EXEC ssh -vvv -F /home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/ssh.config.ansible -o StrictHostKeyChecking=no -o 'IdentityFile="/home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/id_rsa_undercloud"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=stack -o ConnectTimeout=10 -F /home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/ssh.config.ansible undercloud '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-pxlzdbjebzfjsbnptwrvskdngvvgbfld; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<undercloud> ssh_retry: attempt: 0, caught exception(Timeout (12s) waiting for privilege escalation prompt: ) from cmd (/bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-pxlzdbjebzfjsbnptwrvskdngvvgbfld; /usr/bin/python'"'"' && sleep 0'...), pausing for 0 seconds
<undercloud> ESTABLISH SSH CONNECTION FOR USER: stack
<undercloud> SSH: EXEC ssh -vvv -F /home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/ssh.config.ansible -o StrictHostKeyChecking=no -o 'IdentityFile="/home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/id_rsa_undercloud"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=stack -o ConnectTimeout=10 -F /home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/ssh.config.ansible undercloud '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-pxlzdbjebzfjsbnptwrvskdngvvgbfld; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<undercloud> ssh_retry: attempt: 1, caught exception(Timeout (12s) waiting for privilege escalation prompt: ) from cmd (/bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-pxlzdbjebzfjsbnptwrvskdngvvgbfld; /usr/bin/python'"'"' && sleep 0'...), pausing for 1 seconds
<undercloud> ESTABLISH SSH CONNECTION FOR USER: stack
<undercloud> SSH: EXEC ssh -vvv -F /home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/ssh.config.ansible -o StrictHostKeyChecking=no -o 'IdentityFile="/home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/id_rsa_undercloud"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=stack -o ConnectTimeout=10 -F /home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/ssh.config.ansible undercloud '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-pxlzdbjebzfjsbnptwrvskdngvvgbfld; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
<undercloud> ssh_retry: attempt: 2, caught exception(Timeout (12s) waiting for privilege escalation prompt: ) from cmd (/bin/sh -c 'sudo -H -S -n -u root /bin/sh -c '"'"'echo BECOME-SUCCESS-pxlzdbjebzfjsbnptwrvskdngvvgbfld; /usr/bin/python'"'"' && sleep 0'...), pausing for 3 seconds
<undercloud> ESTABLISH SSH CONNECTION FOR USER: stack
<undercloud> SSH: EXEC ssh -vvv -F /home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/ssh.config.ansible -o StrictHostKeyChecking=no -o 'IdentityFile="/home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/id_rsa_undercloud"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=stack -o ConnectTimeout=10 -F /home/rhos-ci/jenkins/workspace/oooq-pike-rdo_trunk-bmu-haa16-lab-float_nic_with_vlans/ssh.config.ansible undercloud '/bin/sh -c '"'"'sudo -H -S -n -u root /bin/sh -c '"'"'"'"'"'"'"'"'echo BECOME-SUCCESS-pxlzdbjebzfjsbnptwrvskdngvvgbfld; /usr/bin/python'"'"'"'"'"'"'"'"' && sleep 0'"'"''
fatal: [undercloud]: FAILED! => {
"failed": true,
"msg": "Timeout (12s) waiting for privilege escalation prompt: "
}
```
As this is intermittent, Importance is not 'critical'. However, because it jams the production chain when it does occur, setting it to 'high'.