tripleo

ceph-ansible collect_uuid task times out during ssh

Bug #1745108 reported by John Fulton on 2018-01-24

Affects	Status	Importance	Assigned to	Milestone
tripleo	Won't Fix	Low	John Fulton	tripleo rocky-3

Bug Description

When deploying a two-node HCI overcloud in an underpowered virtual environment, the step 2 Mistral execution fails on the collect_nodes_uuid task. Investigation via Mistral [2] showed that the task timed out with Ansible's "Timeout (12s) waiting for privilege escalation prompt:". ANSIBLE_SSH_RETRIES had been set to a higher value to account for this environment, as described in the docs [1], but that parameter is not passed to the collect_nodes_uuid task. It might make sense to merge some parameters from CephAnsibleEnvironmentVariables into all Ansible playbooks in the ceph-ansible workbook where appropriate.
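
The proposed merge can be sketched roughly as follows. This is a minimal illustration and not the tripleo-common implementation: the function name build_playbook_env, the value "6", and the base environment are assumptions made for the example. The idea is only that one set of user overrides is applied to the environment of every playbook run, including the one that collects node UUIDs.

```python
import os

# Hypothetical stand-in for the user's CephAnsibleEnvironmentVariables setting.
# The value "6" is illustrative only.
ceph_ansible_environment_variables = {"ANSIBLE_SSH_RETRIES": "6"}


def build_playbook_env(overrides, base_env=None):
    """Return the environment for one ansible-playbook run.

    Starts from base_env (the current process environment by default) and
    applies the user-supplied overrides on top, so that every playbook in
    the workflow sees the same settings.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Environment variable values must be strings.
    env.update({key: str(value) for key, value in overrides.items()})
    return env


# The same merged environment would be handed to every playbook invocation,
# not only to the main ceph-ansible run.
node_uuid_env = build_playbook_env(
    ceph_ansible_environment_variables, base_env={"PATH": "/usr/bin"}
)
print(node_uuid_env["ANSIBLE_SSH_RETRIES"])
```

Copying the base environment instead of mutating it keeps one playbook's overrides from leaking into the calling process.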

[1] https://docs.openstack.org/tripleo-docs/latest/install/advanced_deployment/ceph_config.html#override-ansible-run-options

[2] Mistral investigation details:

UUID=6fe3c833-afea-4660-aebd-efd23c833965
mistral task-list $UUID

TASK_ID=6cb7e439-1453-4a4c-a0c6-30560647f5bd
mistral task-get-result $TASK_ID | jq . | sed -e 's/\\n/\n/g' -e 's/\\"/"/g'

...

Exit code: 2
Stdout: u'{\
    "plays": [\
        {\
            "play": {\
                "id": "003c73f7-022e-cecd-8908-000000000007", \
                "name": "overcloud"\
            }, \
            "tasks": [\
                {\
                    "hosts": {\
                        "192.168.24.15": {\
                            "failed": true, \
                            "msg": "Timeout (12s) waiting for privilege escalation prompt: "\
                        }, \
                        "192.168.24.9": {\
                            "failed": true, \
                            "msg": "Timeout (12s) waiting for privilege escalation prompt: "\
                        }\
                    }, \
                    "task": {\
                        "id": "003c73f7-022e-cecd-8908-000000000009", \
                        "name": "collect machine id"\
                    }\
                }\
            ]\
        }\
    ], \
    "stats": {\
        "192.168.24.15": {\
            "changed": 0, \
            "failures": 1, \
            "ok": 0, \
            "skipped": 0, \
            "unreachable": 0\
        }, \
        "192.168.24.9": {\
            "changed": 0, \
            "failures": 1, \
            "ok": 0, \
            "skipped": 0, \
            "unreachable": 0\
        }\
    }\
}\
'
Stderr: u''"
(undercloud) [stack@undercloud tripleo-ceph-ansible]$
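
For longer results it can help to reduce the JSON to just the failing hosts. The following is a small sketch, assuming the Stdout payload has been saved as clean JSON (trailing backslashes and the u'...' wrapper removed); the function name failed_hosts and the trimmed SAMPLE are illustrative and not part of any TripleO or Mistral tooling.

```python
import json


def failed_hosts(ansible_json):
    """Return (task name, host, message) for every failed host result."""
    report = json.loads(ansible_json)
    failures = []
    for play in report.get("plays", []):
        for task in play.get("tasks", []):
            task_name = task.get("task", {}).get("name", "")
            # Sort hosts so the report order is stable between runs.
            for host, result in sorted(task.get("hosts", {}).items()):
                if result.get("failed"):
                    failures.append((task_name, host, result.get("msg", "")))
    return failures


# Trimmed copy of the result above, with the trailing backslashes removed.
SAMPLE = """
{
  "plays": [
    {
      "play": {"name": "overcloud"},
      "tasks": [
        {
          "hosts": {
            "192.168.24.15": {
              "failed": true,
              "msg": "Timeout (12s) waiting for privilege escalation prompt: "
            },
            "192.168.24.9": {
              "failed": true,
              "msg": "Timeout (12s) waiting for privilege escalation prompt: "
            }
          },
          "task": {"name": "collect machine id"}
        }
      ]
    }
  ]
}
"""

for task_name, host, msg in failed_hosts(SAMPLE):
    print(f"{task_name}: {host}: {msg.strip()}")
```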


OpenStack Infra (hudson-openstack) wrote on 2018-01-24: Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.openstack.org/537662

Changed in tripleo:
status:	Triaged → In Progress

Emilien Macchi (emilienm) on 2018-01-26

Changed in tripleo:
milestone:	queens-3 → queens-rc1

John Fulton (jfulton-org) wrote on 2018-01-29:

- The root cause of this bug is the SSH server performing DNS lookups on the incoming host.
- This is still a good feature to have (consistent Ansible behavior), but it is not as pressing.
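
One sshd_config option that controls such lookups is UseDNS. The comment above does not say which setting was involved or how it was changed, so the following is only a sketch of how one might check a node's configuration text for that option. The function name effective_use_dns and the sample excerpt are hypothetical, and the parser is deliberately simplified (whitespace-separated keywords only, no Include handling).

```python
def effective_use_dns(sshd_config_text):
    """Return the first UseDNS value found in sshd_config text, or None.

    sshd uses the first value it obtains for a keyword, and keywords are
    case-insensitive, so the scan stops at the first match.
    """
    for raw_line in sshd_config_text.splitlines():
        # Drop comments and surrounding whitespace before splitting.
        line = raw_line.split("#", 1)[0].strip()
        fields = line.split()
        if len(fields) >= 2 and fields[0].lower() == "usedns":
            return fields[1].lower()
    return None


# Hypothetical sshd_config excerpt, not taken from the affected nodes.
SAMPLE = """\
# Authentication
PermitRootLogin no
UseDNS yes   # look up the remote host name for each connection
"""

print(effective_use_dns(SAMPLE))
```

A return value of None means the option is not set in the text, in which case the default of the installed OpenSSH build applies.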

Changed in tripleo:
importance:	Medium → Low
milestone:	queens-rc1 → rocky-3

John Fulton (jfulton-org) wrote on 2018-05-03:

Ansible behavior is consistent in Rocky with config-download. Cleaning up this old bug.

Changed in tripleo:
status:	In Progress → Won't Fix

OpenStack Infra (hudson-openstack) wrote on 2018-05-03: Change abandoned on tripleo-common (master)

Change abandoned by John Fulton (<email address hidden>) on branch: master
Review: https://review.openstack.org/537662
