So after applying https://review.openstack.org/#/c/374791/ to noop the postdeployment, the major-upgrade-pacemaker-init step completes successfully. Interestingly, after running the major-upgrade-pacemaker step:
openstack overcloud deploy --templates /home/stack/tripleo-heat-templates --libvirt-type qemu \
--control-flavor oooq_control --compute-flavor oooq_compute \
--ceph-storage-flavor oooq_ceph --timeout 75 \
--control-scale 3 --neutron-network-type vxlan --neutron-tunnel-types vxlan \
-e /home/stack/tripleo-heat-templates/overcloud-resource-registry-puppet.yaml \
-e /home/stack/tripleo-heat-templates/environments/network-isolation.yaml \
-e /home/stack/tripleo-heat-templates/environments/net-single-nic-with-vlans.yaml \
-e $HOME/network-environment.yaml \
-e /home/stack/tripleo-heat-templates/environments/puppet-pacemaker.yaml \
-e /home/stack/tripleo-heat-templates/environments/major-upgrade-pacemaker.yaml \
--ntp-server clock.redhat.com \
${DEPLOY_ENV_YAML:+-e $DEPLOY_ENV_YAML} || deploy_status=1
It seems that heat times out:
2016-09-22 09:54:49Z [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-09-22 09:54:51Z [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-09-22 09:54:56Z [2]: SIGNAL_COMPLETE Unknown
2016-09-22 09:54:58Z [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-09-22 09:55:00Z [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-09-22 11:04:25Z [UpdateWorkflow]: UPDATE_FAILED UPDATE aborted
2016-09-22 11:04:25Z [overcloud]: UPDATE_FAILED Timed out
2016-09-22 11:04:25Z [CephMonUpgradeDeployment]: CREATE_FAILED CREATE aborted
2016-09-22 11:04:25Z [overcloud-UpdateWorkflow-nqo3h2msfua6]: UPDATE_FAILED Operation cancelled
Stack overcloud UPDATE_FAILED
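To put a number on how long the update sat idle, here is a minimal sketch that finds the longest quiet period between consecutive events. The event lines are copied from the output above; the names `parse_events` and `largest_gap` are made up for this illustration and are not part of heat or tripleo.

```python
import re
from datetime import datetime

# Event lines copied from the output above.
EVENTS = """\
2016-09-22 09:54:58Z [ControllerDeployment]: SIGNAL_COMPLETE Unknown
2016-09-22 09:55:00Z [NetworkDeployment]: SIGNAL_COMPLETE Unknown
2016-09-22 11:04:25Z [UpdateWorkflow]: UPDATE_FAILED UPDATE aborted
2016-09-22 11:04:25Z [overcloud]: UPDATE_FAILED Timed out
Stack overcloud UPDATE_FAILED
"""

# "<date> <time>Z [<resource>]: <STATUS> ..."
EVENT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})Z \[([^\]]+)\]: (\S+)"
)


def parse_events(text):
    """Return (timestamp, resource, status) tuples; non-event lines are skipped."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.match(line.strip())
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            events.append((stamp, match.group(2), match.group(3)))
    return events


def largest_gap(events):
    """Return (gap, event_before, event_after) for the longest quiet period."""
    return max(
        ((cur[0] - prev[0], prev, cur) for prev, cur in zip(events, events[1:])),
        key=lambda item: item[0],
    )


gap, before, after = largest_gap(parse_events(EVENTS))
print("quiet for %s between %s and %s" % (gap, before[1], after[1]))
# prints: quiet for 1:09:25 between NetworkDeployment and UpdateWorkflow
```

That is roughly 69 minutes with no events, which looks consistent with the stack simply running into the --timeout 75 from the deploy command rather than failing on its own.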
I have reproduced it twice now, and *after* the NetworkDeployment is COMPLETE nothing happens until the timeout kicks in. The resources are:
+--------------------------+------------------------------------+--------------------+----------------------+
| resource_name            | resource_type                      | resource_status    | updated_time         |
+--------------------------+------------------------------------+--------------------+----------------------+
| UpdateWorkflow           | OS::TripleO::Tasks::UpdateWorkflow | UPDATE_IN_PROGRESS | 2016-09-22T11:33:51Z |
| CephMonUpgradeDeployment | OS::Heat::SoftwareDeploymentGroup  | UPDATE_IN_PROGRESS | 2016-09-22T11:33:52Z |
| Controller               | OS::Heat::SoftwareDeployment       | CREATE_IN_PROGRESS | 2016-09-22T11:33:52Z |
| get_param                | OS::Heat::SoftwareDeployment       | CREATE_IN_PROGRESS | 2016-09-22T11:33:53Z |
+--------------------------+------------------------------------+--------------------+----------------------+
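For picking the leaf resources out of a listing like this, here is a small sketch that parses the ASCII table and keeps only the OS::Heat::SoftwareDeployment rows still IN_PROGRESS. The working assumption (not confirmed here) is that those are the deployments still waiting on a signal from the node, so they are the ones worth checking on the controller. `parse_table` is a helper made up for this example.

```python
# Rows copied from the resource listing above.
TABLE = """\
+--------------------------+------------------------------------+--------------------+----------------------+
| resource_name            | resource_type                      | resource_status    | updated_time         |
+--------------------------+------------------------------------+--------------------+----------------------+
| UpdateWorkflow           | OS::TripleO::Tasks::UpdateWorkflow | UPDATE_IN_PROGRESS | 2016-09-22T11:33:51Z |
| CephMonUpgradeDeployment | OS::Heat::SoftwareDeploymentGroup  | UPDATE_IN_PROGRESS | 2016-09-22T11:33:52Z |
| Controller               | OS::Heat::SoftwareDeployment       | CREATE_IN_PROGRESS | 2016-09-22T11:33:52Z |
| get_param                | OS::Heat::SoftwareDeployment       | CREATE_IN_PROGRESS | 2016-09-22T11:33:53Z |
+--------------------------+------------------------------------+--------------------+----------------------+
"""


def parse_table(text):
    """Turn a '|'-delimited ASCII table into a list of dicts keyed by header."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # border lines start with '+'
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


rows = parse_table(TABLE)
waiting = [
    row["resource_name"]
    for row in rows
    if row["resource_type"] == "OS::Heat::SoftwareDeployment"
    and row["resource_status"].endswith("IN_PROGRESS")
]
print(waiting)
# prints: ['Controller', 'get_param']
```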
It seems heat is constantly stuck in this loop:
2016-09-22 11:44:37.014 1404 DEBUG heat.engine.scheduler [req-c89c0266-ed18-4287-ac26-3b88a2c05220 - - - - -] Task _run_update from SoftwareDeploymentGroup "CephMonUpgradeDeployment" [fcf70ad4-1338-4868-bf4e-e02a59a280df] Stack "overcloud-UpdateWorkflow-nqo3h2msfua6" [c59f2f0c-b30b-482d-a9c7-92e62667f658] running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:216
2016-09-22 11:44:37.637 1406 DEBUG heat.engine.scheduler [req-d53c4a24-aa62-41fa-8781-6c092c7478a9 - - - - -] Task update_task from Stack "overcloud-UpdateWorkflow-nqo3h2msfua6-CephMonUpgradeDeployment-j3pp67lv7kx6" [fcf70ad4-1338-4868-bf4e-e02a59a280df] running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:216
2016-09-22 11:44:37.638 1406 DEBUG heat.engine.scheduler [req-d53c4a24-aa62-41fa-8781-6c092c7478a9 - - - - -] Task Stack "overcloud-UpdateWorkflow-nqo3h2msfua6-CephMonUpgradeDeployment-j3pp67lv7kx6" [fcf70ad4-1338-4868-bf4e-e02a59a280df] Update running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:216
2016-09-22 11:44:37.638 1406 DEBUG heat.engine.scheduler [req-d53c4a24-aa62-41fa-8781-6c092c7478a9 - - - - -] Task _resource_update from Stack "overcloud-UpdateWorkflow-nqo3h2msfua6-CephMonUpgradeDeployment-j3pp67lv7kx6" [fcf70ad4-1338-4868-bf4e-e02a59a280df] Update running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:216
2016-09-22 11:44:37.668 1406 DEBUG heat.engine.scheduler [req-d53c4a24-aa62-41fa-8781-6c092c7478a9 - - - - -] Task _resource_update from Stack "overcloud-UpdateWorkflow-nqo3h2msfua6-CephMonUpgradeDeployment-j3pp67lv7kx6" [fcf70ad4-1338-4868-bf4e-e02a59a280df] Update running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:216
2016-09-22 11:44:37.692 1406 DEBUG heat.engine.scheduler [req-d53c4a24-aa62-41fa-8781-6c092c7478a9 - - - - -] Task update_task from Stack "overcloud-UpdateWorkflow-nqo3h2msfua6-CephMonUpgradeDeployment-j3pp67lv7kx6" [fcf70ad4-1338-4868-bf4e-e02a59a280df] sleeping _sleep /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:157
2016-09-22 11:44:38.020 1404 DEBUG heat.engine.scheduler [req-c89c0266-ed18-4287-ac26-3b88a2c05220 - - - - -] Task _run_update from SoftwareDeploymentGroup "CephMonUpgradeDeployment" [fcf70ad4-1338-4868-bf4e-e02a59a280df] Stack "overcloud-UpdateWorkflow-nqo3h2msfua6" [c59f2f0c-b30b-482d-a9c7-92e62667f658] running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:216
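To check whether this is a tight loop or just the scheduler polling, a rough sketch that measures the interval between successive `_run_update` steps in the log. The two log lines are copied verbatim from above; `run_update_times` is a name invented for this example.

```python
from datetime import datetime

# The two "_run_update" lines from the heat-engine log above, verbatim.
LOG = """\
2016-09-22 11:44:37.014 1404 DEBUG heat.engine.scheduler [req-c89c0266-ed18-4287-ac26-3b88a2c05220 - - - - -] Task _run_update from SoftwareDeploymentGroup "CephMonUpgradeDeployment" [fcf70ad4-1338-4868-bf4e-e02a59a280df] Stack "overcloud-UpdateWorkflow-nqo3h2msfua6" [c59f2f0c-b30b-482d-a9c7-92e62667f658] running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:216
2016-09-22 11:44:38.020 1404 DEBUG heat.engine.scheduler [req-c89c0266-ed18-4287-ac26-3b88a2c05220 - - - - -] Task _run_update from SoftwareDeploymentGroup "CephMonUpgradeDeployment" [fcf70ad4-1338-4868-bf4e-e02a59a280df] Stack "overcloud-UpdateWorkflow-nqo3h2msfua6" [c59f2f0c-b30b-482d-a9c7-92e62667f658] running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:216
"""


def run_update_times(text, marker="Task _run_update"):
    """Return the timestamps of every log line that contains the marker."""
    times = []
    for line in text.splitlines():
        if marker in line:
            # The first two whitespace-separated fields are the date and time.
            stamp = " ".join(line.split()[:2])
            times.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"))
    return times


times = run_update_times(LOG)
intervals = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
print(intervals)
```

The steps are about one second apart, which reads more like the scheduler periodically polling a nested stack that never finishes than like a busy error loop, though that is an interpretation of the log and not something verified against the heat code.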
I still have the system where this occurred twice and I am not touching it, in case anyone wants to take a peek.
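So if I remove all references of CephMon in extraconfig/tasks/major_upgrade_pacemaker.yaml I still get the issue and it all hangs on:
+--------------------------------------------+------------------------------------+--------------------+----------------------+-----------------------------------------------------------------------------------------------+
| UpdateWorkflow                             | OS::TripleO::Tasks::UpdateWorkflow | UPDATE_IN_PROGRESS | 2016-09-23T08:35:39Z | overcloud                                                                                     |
| ControllerPacemakerUpgradeDeployment_Step1 | OS::Heat::SoftwareDeploymentGroup  | CREATE_IN_PROGRESS | 2016-09-23T08:35:43Z | overcloud-UpdateWorkflow-6otltc4s2uij                                                         |
| Controller                                 | OS::Heat::SoftwareDeployment       | CREATE_IN_PROGRESS | 2016-09-23T08:35:44Z | overcloud-UpdateWorkflow-6otltc4s2uij-ControllerPacemakerUpgradeDeployment_Step1-v26irogdbskp |
| get_param                                  | OS::Heat::SoftwareDeployment       | CREATE_IN_PROGRESS | 2016-09-23T08:35:44Z | overcloud-UpdateWorkflow-6otltc4s2uij-ControllerPacemakerUpgradeDeployment_Step1-v26irogdbskp |
+--------------------------------------------+------------------------------------+--------------------+----------------------+-----------------------------------------------------------------------------------------------+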