Pacemaker cluster fails to start in CI jobs

Bug #1867744 reported by Arx Cruz
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
tripleo   Expired   Critical     Unassigned

Bug Description

2020-03-17 03:36:53 | FAILED - RETRYING: Wait for puppet host configuration to finish (3 retries left).
2020-03-17 03:36:54 | FAILED - RETRYING: Wait for puppet host configuration to finish (3 retries left).
2020-03-17 03:36:56 | FAILED - RETRYING: Wait for puppet host configuration to finish (2 retries left).
2020-03-17 03:36:57 | FAILED - RETRYING: Wait for puppet host configuration to finish (2 retries left).
2020-03-17 03:36:59 | FAILED - RETRYING: Wait for puppet host configuration to finish (1 retries left).
2020-03-17 03:37:00 | FAILED - RETRYING: Wait for puppet host configuration to finish (1 retries left).
2020-03-17 03:37:02 | fatal: [overcloud-controller-1]: FAILED! => changed=false
2020-03-17 03:37:02 | ansible_job_id: '921977500183.32607'
2020-03-17 03:37:02 | attempts: 1200
2020-03-17 03:37:02 | failed_when_result: true
2020-03-17 03:37:02 | finished: 0
2020-03-17 03:37:02 | started: 1
2020-03-17 03:37:03 | fatal: [overcloud-controller-2]: FAILED! => changed=false
2020-03-17 03:37:03 | ansible_job_id: '780626708625.32613'
2020-03-17 03:37:03 | attempts: 1200
2020-03-17 03:37:03 | failed_when_result: true
2020-03-17 03:37:03 | finished: 0
2020-03-17 03:37:03 | started: 1
2020-03-17 03:37:03 |
2020-03-17 03:37:03 | NO MORE HOSTS LEFT *************************************************************
2020-03-17 03:37:03 |
2020-03-17 03:37:03 | PLAY RECAP *********************************************************************
2020-03-17 03:37:03 | overcloud-controller-0 : ok=220 changed=126 unreachable=0 failed=1 skipped=13 rescued=0 ignored=0
2020-03-17 03:37:03 | overcloud-controller-1 : ok=214 changed=126 unreachable=0 failed=1 skipped=13 rescued=0 ignored=0
2020-03-17 03:37:03 | overcloud-controller-2 : ok=214 changed=126 unreachable=0 failed=1 skipped=38 rescued=0 ignored=0
2020-03-17 03:37:03 | overcloud-novacompute-0 : ok=190 changed=106 unreachable=0 failed=0 skipped=56 rescued=0 ignored=0
2020-03-17 03:37:03 | undercloud : ok=17 changed=10 unreachable=0 failed=0 skipped=7 rescued=0 ignored=0
2020-03-17 03:37:03 | Tuesday 17 March 2020 03:37:03 +0000 (1:04:18.882) 1:14:10.385 *********
2020-03-17 03:37:03 | ===============================================================================
2020-03-17 03:37:03 | Wait for puppet host configuration to finish ------------------------- 3858.88s
2020-03-17 03:37:03 | tripleo_container_image_prepare : Run tripleo_container_image_prepare logged to: /var/log/tripleo-container-image-prepare.log - 152.54s
2020-03-17 03:37:03 | Run NetworkConfig script ----------------------------------------------- 31.17s
2020-03-17 03:37:03 | Write kolla config json files ------------------------------------------ 27.32s
2020-03-17 03:37:03 | tripleo_firewall : Manage firewall rules ------------------------------- 25.40s
2020-03-17 03:37:03 | tripleo_container_tag : Pull 192.168.24.1:8787/tripleomaster/centos-binary-cinder-volume:a275f603d87e8669657c1bb34fed84de-updated-20200317011120 image -- 23.87s
2020-03-17 03:37:03 | Creating container startup configs for step_4 -------------------------- 21.74s
2020-03-17 03:37:03 | Creating container startup configs for step_3 -------------------------- 11.86s
2020-03-17 03:37:03 | Creating container startup configs for step_2 -------------------------- 10.73s
2020-03-17 03:37:03 | tripleo_container_tag : Pull 192.168.24.1:8787/tripleomaster/centos-binary-mariadb:a275f603d87e8669657c1bb34fed84de-updated-20200317011120 image -- 10.61s
2020-03-17 03:37:03 | tripleo_container_tag : Pull 192.168.24.1:8787/tripleomaster/centos-binary-rabbitmq:a275f603d87e8669657c1bb34fed84de-updated-20200317011120 image -- 10.05s
2020-03-17 03:37:03 | tripleo_container_tag : Pull 192.168.24.1:8787/tripleomaster/centos-binary-ovn-northd:a275f603d87e8669657c1bb34fed84de-updated-20200317011120 image --- 6.27s
2020-03-17 03:37:03 | tripleo_hieradata : Render hieradata from template ---------------------- 6.00s
2020-03-17 03:37:03 | tripleo_firewall : Manage firewall rules -------------------------------- 5.94s
2020-03-17 03:37:03 | Write container config scripts ------------------------------------------ 5.90s
2020-03-17 03:37:03 | tripleo_container_tag : Pull 192.168.24.1:8787/tripleomaster/centos-binary-haproxy:a275f603d87e8669657c1bb34fed84de-updated-20200317011120 image --- 5.57s
2020-03-17 03:37:03 | tripleo_kernel : Set extra sysctl options ------------------------------- 5.19s
2020-03-17 03:37:03 | Creating container startup configs for step_4 --------------------------- 3.47s
2020-03-17 03:37:03 | Gathering Facts --------------------------------------------------------- 3.36s
2020-03-17 03:37:03 | Render all_nodes data as group_vars for overcloud ----------------------- 3.26s
2020-03-17 03:37:04 | Ansible execution failed. playbook: /var/lib/mistral/overcloud/deploy_steps_playbook.yaml, Run Status: failed, Return Code: 2, To rerun the failed command manually execute the following script: /var/lib/mistral/overcloud/ansible-playbook-command.sh
2020-03-17 03:37:05 | Exception occured while running the command
2020-03-17 03:37:05 | Traceback (most recent call last):
2020-03-17 03:37:05 | File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 34, in run
2020-03-17 03:37:05 | super(Command, self).run(parsed_args)
2020-03-17 03:37:05 | File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
2020-03-17 03:37:05 | return super(Command, self).run(parsed_args)
2020-03-17 03:37:05 | File "/usr/lib/python3.6/site-packages/cliff/command.py", line 187, in run
2020-03-17 03:37:05 | return_code = self.take_action(parsed_args) or 0
2020-03-17 03:37:05 | File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 994, in take_action
2020-03-17 03:37:05 | in_flight_validations=parsed_args.inflight
2020-03-17 03:37:05 | File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 467, in config_download
2020-03-17 03:37:05 | tags=tags
2020-03-17 03:37:05 | File "/usr/lib/python3.6/site-packages/tripleoclient/utils.py", line 620, in run_ansible_playbook
2020-03-17 03:37:05 | raise RuntimeError(err_msg)
2020-03-17 03:37:05 | RuntimeError: Ansible execution failed. playbook: /var/lib/mistral/overcloud/deploy_steps_playbook.yaml, Run Status: failed, Return Code: 2, To rerun the failed command manually execute the following script: /var/lib/mistral/overcloud/ansible-playbook-command.sh
2020-03-17 03:37:05 | Ansible execution failed. playbook: /var/lib/mistral/overcloud/deploy_steps_playbook.yaml, Run Status: failed, Return Code: 2, To rerun the failed command manually execute the following script: /var/lib/mistral/overcloud/ansible-playbook-command.sh
2020-03-17 03:37:06 | + status_code=1
2020-03-17 03:37:06 | + openstack stack list
2020-03-17 03:37:06 | + grep -q overcloud
2020-03-17 03:37:10 | + openstack stack list
2020-03-17 03:37:10 | + grep -Eq '(CREATE|UPDATE)_COMPLETE'
2020-03-17 03:37:12 | + openstack overcloud status
2020-03-17 03:37:12 | + grep -Eq DEPLOY_SUCCESS
2020-03-17 03:37:17 | /usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py:551: ResourceWarning: unclosed <ssl.SSLSocket fd=8, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 55630), raddr=('192.168.24.2', 13989)>
2020-03-17 03:37:17 | status = get_deployment_status.run(context=context)
2020-03-17 03:37:17 | sys:1: ResourceWarning: unclosed <ssl.SSLSocket fd=7, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('192.168.24.2', 55012), raddr=('192.168.24.2', 13808)>
2020-03-17 03:37:17 | + openstack overcloud failures
2020-03-17 03:37:20 | + exit 1
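
For triage, the PLAY RECAP block in the log above can be scanned for hosts with a non-zero `failed` or `unreachable` count; here that flags the three controllers. The following is a minimal sketch, assuming the log lines keep the `timestamp | host : key=value ...` layout shown above. The names `parse_recap` and `failed_hosts` are hypothetical helpers for illustration, not part of tripleoclient or Ansible.

```python
import re

# A recap line looks like:
#   2020-03-17 03:37:03 | undercloud : ok=17 changed=10 unreachable=0 failed=0 ...
# The timestamp prefix is optional because the pattern is used with search().
RECAP_LINE = re.compile(r"(?P<host>\S+)\s+:\s+(?P<stats>(?:\w+=\d+\s*)+)$")


def parse_recap(log_text):
    """Return {host: {counter: int}} for each host line after PLAY RECAP."""
    results = {}
    in_recap = False
    for line in log_text.splitlines():
        if "PLAY RECAP" in line:
            in_recap = True
            continue
        if not in_recap:
            continue
        match = RECAP_LINE.search(line.rstrip())
        if match:
            pairs = re.findall(r"(\w+)=(\d+)", match.group("stats"))
            results[match.group("host")] = {key: int(val) for key, val in pairs}
    return results


def failed_hosts(recap):
    """Return the sorted hosts that report failed or unreachable tasks."""
    return sorted(
        host
        for host, stats in recap.items()
        if stats.get("failed", 0) or stats.get("unreachable", 0)
    )
```

Calling `failed_hosts(parse_recap(open(path).read()))` on a saved copy of the job log (the `path` is whatever local file the log was saved to) gives the list of nodes to inspect first. The task timing lines that follow the recap are skipped because they contain no `key=value` counters.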

Arx Cruz (arxcruz) wrote :
Changed in tripleo:
milestone: none → ussuri-3
Luke Short (ekultails) wrote :

The real error comes from the execution of this command: `/sbin/pcs cluster start --all`.
The Pacemaker-managed cluster is not starting.

http://paste.openstack.org/show/791085/

We need extra eyes from DFG:PIDONE to help out on this one.
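
To gather more detail on an affected controller, the failing command can be re-run by hand alongside a status check and the service journals. This is only a sketch, assuming root access on the node and that `pcs` and `journalctl` are on the PATH. The `run` and `diagnose` helpers and the 200-line journal limit are illustrative choices, not anything taken from the CI job.

```python
import shutil
import subprocess

# Commands to try, in order. The first is the command named in this bug;
# the other two only collect state and logs.
COMMANDS = [
    ["pcs", "cluster", "start", "--all"],
    ["pcs", "status"],
    ["journalctl", "--no-pager", "-n", "200", "-u", "pacemaker", "-u", "corosync"],
]


def run(cmd, timeout=300):
    """Run cmd; return (return_code, output). return_code is None if it could not run."""
    if shutil.which(cmd[0]) is None:
        return None, "{} not found on PATH".format(cmd[0])
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,  # text output; also works on Python 3.6
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None, "{} timed out after {}s".format(" ".join(cmd), timeout)
    return proc.returncode, proc.stdout


def diagnose(commands=COMMANDS):
    """Run each command and print its return code followed by its output."""
    for cmd in commands:
        code, output = run(cmd)
        print("$ {} (rc={})".format(" ".join(cmd), code))
        print(output)
```

Calling `diagnose()` as root on one controller prints the three outputs in sequence, which can then be attached to this report.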

summary: - Mistral failed command manually execute the following script:
- /var/lib/mistral/overcloud/ansible-playbook-command.sh
+ Pacemaker cluster fails to start in CI jobs
wes hayutin (weshayutin)
tags: added: promotion-blocker
tags: added: alert
removed: promotion-blocker
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-3 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
milestone: wallaby-3 → wallaby-rc1
Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Marios Andreou (marios-b) wrote :

This is an automated action. Bug status has been set to 'Incomplete' and target milestone has been removed due to inactivity. If you disagree please re-set these values and reach out to us on freenode #tripleo

Changed in tripleo:
milestone: xena-1 → none
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for tripleo because there has been no activity for 60 days.]

Changed in tripleo:
status: Incomplete → Expired