ceph-ansible collect_uuid task times out during ssh

Bug #1745108 reported by John Fulton
Affects: tripleo
Status: Won't Fix
Importance: Low
Assigned to: John Fulton
Milestone: (not listed)

Bug Description

When deploying a two-node HCI overcloud in an underpowered virtual environment, the step 2 Mistral execution fails on the collect_nodes_uuid task. Investigation via Mistral [2] showed that the task timed out with Ansible's "Timeout (12s) waiting for privilege escalation prompt:" error. ANSIBLE_SSH_RETRIES had been set to a higher value to account for this environment, as per the docs [1], but that parameter is not passed to the collect_nodes_uuid task. It might make sense to merge some parameters from CephAnsibleEnvironmentVariables into all Ansible playbooks in the ceph-ansible workbook where appropriate.

[1] https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ceph_config.html#override-ansible-run-options
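
A minimal sketch of the idea proposed in the description, assuming the goal is simply that every playbook run sees the same Ansible overrides. The function name run_with_ceph_ansible_env and the default values are hypothetical and are not part of tripleo-common. ANSIBLE_SSH_RETRIES is the variable named in the description; ANSIBLE_TIMEOUT is added here as an assumption, since the wait for the privilege escalation prompt appears to be derived from Ansible's connection timeout rather than from the retry count.

```shell
# Hypothetical helper: run any command with a shared set of Ansible overrides.
# Values already exported by the caller win; the defaults below are examples only.
run_with_ceph_ansible_env() {
    env \
        ANSIBLE_SSH_RETRIES="${ANSIBLE_SSH_RETRIES:-6}" \
        ANSIBLE_TIMEOUT="${ANSIBLE_TIMEOUT:-30}" \
        "$@"
}

# Example use (inventory and playbook names are placeholders):
#   run_with_ceph_ansible_env ansible-playbook -i inventory.yml some_playbook.yml
```

Wrapping the command with env keeps the overrides scoped to that one run instead of changing the caller's shell.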

[2] Mistral investigation details:

(undercloud) [stack@undercloud tripleo-ceph-ansible]$ mistral execution-list | grep ceph
| 12dad531-921c-4fb8-8401-a8352d54f41b | 412797f7-5435-4e0e-ac28-7b4c9ff2d3b9 | tripleo.storage.v1.ceph-install | sub-workflow execution | 6fe3c833-afea-4660-aebd-efd23c833965 | ERROR | Failure caused by error i... | 2018-01-22 17:08:14 | 2018-01-22 17:30:39 |
(undercloud) [stack@undercloud tripleo-ceph-ansible]$

UUID=6fe3c833-afea-4660-aebd-efd23c833965
mistral task-list $UUID

| f7df66b1-63ca-4776-9281-cffc1da3375a | set_ip_lists | tripleo.storage.v1.ceph-install | 12dad531-921c-4fb8-8401-a8352d54f41b | SUCCESS | None | 2018-01-22 17:21:07 | 2018-01-22 17:21:07 |
| 6cb7e439-1453-4a4c-a0c6-30560647f5bd | collect_nodes_uuid | tripleo.storage.v1.ceph-install | 12dad531-921c-4fb8-8401-a8352d54f41b | ERROR | Failed to run action [act... | 2018-01-22 17:21:08 | 2018-01-22 17:30:39 |

TASK_ID=6cb7e439-1453-4a4c-a0c6-30560647f5bd
mistral task-get-result $TASK_ID | jq . | sed -e 's/\\n/\n/g' -e 's/\\"/"/g'

...

Exit code: 2
Stdout: u'{\
    "plays": [\
        {\
            "play": {\
                "id": "003c73f7-022e-cecd-8908-000000000007", \
                "name": "overcloud"\
            }, \
            "tasks": [\
                {\
                    "hosts": {\
                        "192.168.24.15": {\
                            "failed": true, \
                            "msg": "Timeout (12s) waiting for privilege escalation prompt: "\
                        }, \
                        "192.168.24.9": {\
                            "failed": true, \
                            "msg": "Timeout (12s) waiting for privilege escalation prompt: "\
                        }\
                    }, \
                    "task": {\
                        "id": "003c73f7-022e-cecd-8908-000000000009", \
                        "name": "collect machine id"\
                    }\
                }\
            ]\
        }\
    ], \
    "stats": {\
        "192.168.24.15": {\
            "changed": 0, \
            "failures": 1, \
            "ok": 0, \
            "skipped": 0, \
            "unreachable": 0\
        }, \
        "192.168.24.9": {\
            "changed": 0, \
            "failures": 1, \
            "ok": 0, \
            "skipped": 0, \
            "unreachable": 0\
        }\
    }\
}\
'
Stderr: u''"
(undercloud) [stack@undercloud tripleo-ceph-ansible]$
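
With more hosts the decoded result gets long. A small sketch, assuming the output has already been decoded with the jq and sed pipeline shown above, that tallies identical Ansible "msg" values so a repeated timeout stands out. The function name summarize_ansible_msgs is made up for illustration.

```shell
# Hypothetical helper: read a decoded task result on stdin and count
# how many times each distinct "msg" value appears, most frequent first.
summarize_ansible_msgs() {
    grep -o '"msg": "[^"]*"' | sort | uniq -c | sort -rn
}

# Example use with the commands from this report:
#   mistral task-get-result $TASK_ID | jq . | sed -e 's/\\n/\n/g' -e 's/\\"/"/g' | summarize_ansible_msgs
```

For the result above this would report the privilege escalation timeout twice, once per failed host.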

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.openstack.org/537662

Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
milestone: queens-3 → queens-rc1
John Fulton (jfulton-org) wrote :

- The root cause of this bug is the SSH server doing DNS lookups of the incoming host.
- This is still a good feature to have (consistent Ansible behavior), but it is not as pressing.
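
For anyone hitting the same timeout, a rough way to check the DNS lookup setting on an overcloud node. This is a sketch that assumes the relevant OpenSSH option is UseDNS in sshd_config; the function name sshd_usedns is hypothetical, and "unset" only means the file does not set the option, in which case the compiled-in default applies (it differs between OpenSSH versions and distributions).

```shell
# Hypothetical helper: print the UseDNS value from an sshd_config-style file.
# Commented-out lines are ignored; the first uncommented match is reported.
sshd_usedns() {
    awk 'tolower($1) == "usedns" { print tolower($2); found = 1; exit }
         END { if (!found) print "unset" }' "${1:-/etc/ssh/sshd_config}"
}

# Example use on a node:
#   sshd_usedns /etc/ssh/sshd_config
```

Running "sshd -T" as root prints the effective configuration and is the more reliable check when Match blocks or include files are in play.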

Changed in tripleo:
importance: Medium → Low
milestone: queens-rc1 → rocky-3
John Fulton (jfulton-org) wrote :

Ansible behavior is consistent in Rocky with config-download. Cleaning up this old bug.

Changed in tripleo:
status: In Progress → Won't Fix
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-common (master)

Change abandoned by John Fulton (<email address hidden>) on branch: master
Review: https://review.openstack.org/537662
