queens: overcloud-deploy.sh fails with mistral error "timed out waiting for ping module test success: SSH Error: data could not be sent to remote host"

Bug #1790144 reported by Sorin Sbarnea
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Alex Schultz
Milestone: stein-1

Bug Description

This happens only on queens, in the tripleo-ci-centos-7-scenario001-multinode-oooq-container jobs.
The same jobs pass on master, pike, and rocky, according to http://cistatus.tripleo.org/

2018-08-31 07:58:01Z [overcloud.AllNodesDeploySteps.WorkflowTasks_Step2_Execution]: CREATE_FAILED resources.WorkflowTasks_Step2_Execution: ERROR
2018-08-31 07:58:10 | 2018-08-31 07:58:01Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2018-08-31 07:58:10 | 2018-08-31 07:58:02Z [overcloud.AllNodesDeploySteps]: CREATE_FAILED resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2018-08-31 07:58:10 | 2018-08-31 07:58:02Z [overcloud]: CREATE_FAILED Resource CREATE failed: resources.AllNodesDeploySteps: Resource CREATE failed: resources.WorkflowTasks_Step2_Execution: ERROR
2018-08-31 07:58:10 | Stack overcloud CREATE_FAILED

ERROR mistral.executors.default_executor

Unexpected error while running command.
Command: ansible-playbook -vvvvv /tmp/ansible-mistral-actionm9Bj3V/playbook.yaml --user tripleo-admin --inventory-file /tmp/ansible-mistral-actionm9Bj3V/inventory.yaml --private-key /tmp/ansible-mistral-actionm9Bj3V/ssh_private_key

FAILED! => {\n "changed": false, \n "elapsed": 366, \n "failed": true, \n "msg": "timed out waiting for ping module test success: SSH Error: data could not be sent to remote host \\"192.168.24.3\\". Make sure this host can be reached over ssh"\n}\n\nPLAY RECAP *********************************************************************\n192.168.24.3 : ok=0 changed=0 unreachable=0 failed=1

http://logs.openstack.org/57/595357/1/check/tripleo-ci-centos-7-scenario001-multinode-oooq-container/a3f7856/logs/undercloud/var/log/mistral/executor.log.txt.gz#_2018-08-30_00_40_15_950
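For triage, the key facts in the escaped executor log line above (target host, elapsed seconds, and the PLAY RECAP counters) can be pulled out with a few regular expressions. This is only a minimal sketch: the `summarize_failure` helper and the `SAMPLE` constant are hypothetical names, and `SAMPLE` is an abbreviated copy of the line quoted in this report (the run of asterisks is shortened).

```python
import re

# Abbreviated copy of the executor log line quoted above. The "\n" and '\\"'
# sequences are literal characters in the log, not real newlines or quotes.
SAMPLE = (
    r'FAILED! => {\n "changed": false, \n "elapsed": 366, \n "failed": true, '
    r'\n "msg": "timed out waiting for ping module test success: SSH Error: '
    r'data could not be sent to remote host \\"192.168.24.3\\". Make sure '
    r'this host can be reached over ssh"\n}\n\nPLAY RECAP ****\n'
    r'192.168.24.3 : ok=0 changed=0 unreachable=0 failed=1'
)


def summarize_failure(raw):
    """Extract host, elapsed seconds, and recap counters from an escaped log line."""
    # Turn literal "\n" into real newlines and drop backslashes before quotes.
    text = raw.replace("\\n", "\n")
    text = re.sub(r'\\+"', '"', text)

    host = re.search(r'remote host "([^"]+)"', text)
    elapsed = re.search(r'"elapsed":\s*(\d+)', text)
    recap = re.search(
        r"^(\S+)\s*:\s*ok=(\d+)\s+changed=(\d+)\s+unreachable=(\d+)\s+failed=(\d+)",
        text,
        re.MULTILINE,
    )

    counters = None
    if recap:
        names = ("ok", "changed", "unreachable", "failed")
        counters = {n: int(v) for n, v in zip(names, recap.groups()[1:])}

    return {
        "host": host.group(1) if host else None,
        "elapsed": int(elapsed.group(1)) if elapsed else None,
        "recap_host": recap.group(1) if recap else None,
        "recap": counters,
    }


if __name__ == "__main__":
    print(summarize_failure(SAMPLE))
```

Note that the recap reports the host as failed=1 rather than unreachable=1, because the timeout was raised by the wait-for-connection step itself.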

Alex Schultz (alex-schultz) wrote :

192.168.24.3 is the undercloud, so it seems kinda odd that mistral (on the undercloud) can't ssh to the undercloud.

Alex Schultz (alex-schultz) wrote :

So I spun up a reproducer: 192.168.24.3 is the second node, but the tripleo-admin user is not being created on it.

Alex Schultz (alex-schultz) wrote :

Looks like it may be improperly calling the create_admin_via_nova workflow when it should be calling create_admin_via_ssh

- Created at: '2018-08-31 21:22:32'
  Description: sub-workflow execution
  ID: 25fd9ae8-a792-4cdc-96f8-c3b326feb881
  State: ERROR
  State info: Failure caused by error i...
  Task Execution ID: 5c2e38fb-a840-4e81-ab42-a0c0001ba4d0
  Updated at: '2018-08-31 21:28:46'
  Workflow ID: 5149d6cf-9969-43db-b6c8-0533646f7383
  Workflow name: tripleo.storage.v1.ceph-install
  Workflow namespace: ''
- Created at: '2018-08-31 21:22:32'
  Description: Heat managed
  ID: 78344a31-6584-490e-84ed-b8087c02af72
  State: ERROR
  State info: Failure caused by error i...
  Task Execution ID: <none>
  Updated at: '2018-08-31 21:28:47'
  Workflow ID: 640dbffc-c619-46dc-a561-cdf9d85d1278
  Workflow name: tripleo.overcloud.workflow_tasks.step2
  Workflow namespace: ''
- Created at: '2018-08-31 21:22:34'
  Description: sub-workflow execution
  ID: 88d4ab5d-ecb4-4133-ab70-83c39ff1ce03
  State: ERROR
  State info: Failure caused by error i...
  Task Execution ID: 4e877976-7e16-412b-8209-a1a60f7d0a8a
  Updated at: '2018-08-31 21:28:45'
  Workflow ID: 64401d75-8044-4c89-859d-bebdce8cf5b2
  Workflow name: tripleo.access.v1.enable_ssh_admin
  Workflow namespace: ''
- Created at: '2018-08-31 21:22:35'
  Description: sub-workflow execution
  ID: 6f5a1a77-9eb2-4504-9c99-0105526bdcc6
  State: ERROR
  State info: Failure caused by error i...
  Task Execution ID: ab030502-3239-4616-9a31-0078ab734538
  Updated at: '2018-08-31 21:28:44'
  Workflow ID: d8c63d2f-b00a-44de-82d1-86c48cb1f3f6
  Workflow name: tripleo.access.v1.create_admin_via_nova
  Workflow namespace: ''
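To see at a glance which workflows failed and in what order they were started, a listing in the shape shown above can be reduced to its workflow names. The sketch below is an assumption-laden illustration: `parse_executions` and `failed_workflows` are hypothetical helpers, the parser only handles this flat "- Key: value" layout (it is not a general YAML parser), and `LISTING` is an abbreviated copy of the entries above.

```python
# Abbreviated copy of the execution listing above (only the fields used here).
LISTING = """\
- Created at: '2018-08-31 21:22:32'
  State: ERROR
  Workflow name: tripleo.overcloud.workflow_tasks.step2
- Created at: '2018-08-31 21:22:35'
  State: ERROR
  Workflow name: tripleo.access.v1.create_admin_via_nova
- Created at: '2018-08-31 21:22:34'
  State: ERROR
  Workflow name: tripleo.access.v1.enable_ssh_admin
"""


def parse_executions(text):
    """Parse a flat '- Key: value' listing into a list of dicts."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("- "):
            # A leading dash starts a new execution record.
            records.append({})
            line = line[2:]
        if not records:
            continue  # ignore anything before the first record
        # Split on the first ": " only, so timestamps keep their colons.
        key, _, value = line.strip().partition(": ")
        records[-1][key] = value.strip("'")
    return records


def failed_workflows(text):
    """Return names of workflows in ERROR state, ordered by creation time."""
    failed = [r for r in parse_executions(text) if r.get("State") == "ERROR"]
    failed.sort(key=lambda r: r.get("Created at", ""))
    return [r.get("Workflow name") for r in failed]


if __name__ == "__main__":
    for name in failed_workflows(LISTING):
        print(name)
```

The last name printed is the most recently started failing workflow, which in the listing above is tripleo.access.v1.create_admin_via_nova.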

Alex Schultz (alex-schultz) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart (master)

Fix proposed to branch: master
Review: https://review.openstack.org/599088

Changed in tripleo:
assignee: nobody → Alex Schultz (alex-schultz)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart (master)

Reviewed: https://review.openstack.org/599088
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart/commit/?id=a0aad6e280af998b7283b465c3d83f518971d16c
Submitter: Zuul
Branch: master

commit a0aad6e280af998b7283b465c3d83f518971d16c
Author: Alex Schultz <email address hidden>
Date: Fri Aug 31 16:21:56 2018 -0600

    Fix queens fs016 fs019

    In queens, the ceph-ansible workflow does not work with config download.
    These jobs should not be using config-download. This was improperly
    converted as part of I881f92e6cef4de58a9731f03669e42bc862964ec. This is
    a partial revert of that patch for featureset016 and featureset019. The
    featureset documentation indicates that these are not configured with
    config download.

    https://docs.openstack.org/tripleo-quickstart/latest/feature-configuration.html

    Change-Id: Ib6c3ca7886c5e4adb0883c501bfbb390e15d108f
    Closes-Bug: #1790144
    Related-Bug: #1789416

Changed in tripleo:
status: In Progress → Fix Released
Changed in tripleo:
milestone: rocky-rc2 → stein-1
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-quickstart 2.1.1

This issue was fixed in the openstack/tripleo-quickstart 2.1.1 release.
