Mysql redeployment fails after you stop deployment of first controller

Bug #1371471 reported by Dennis Dmitriev
This bug affects 1 person
Affects              Status         Importance  Assigned to              Milestone
Fuel for OpenStack   Fix Committed  High        Vitaly Kramskikh
  5.1.x              Won't Fix      Medium      Registry Administrators
  6.0.x              Won't Fix      Medium      Registry Administrators
  6.1.x              Fix Committed  High        Vitaly Kramskikh

Bug Description

Re-deployment fails after the "Stop deploy" action.

Steps to reproduce:
    1) Start deploying a cluster: Ubuntu HA, Neutron VLAN, Ceph for images and volumes, Murano, 3 controllers, 1 compute, 3 Ceph nodes. Network bonding (balance-slb) enabled; "Management", "Private" and "Storage" networks assigned to a bond interface.
    2) Wait until the first controller becomes "Ready" and press "Stop deploy".
    3) Wait until the nodes return to the "Pending ..." state.
    * At this point the first controller remains in the "Ready" state; it was not bootstrapped or rebooted.
    4) Start deployment again.

After Ubuntu was installed on the other nodes, the first controller started to re-deploy and failed with the "Error" state.
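For anyone scripting this reproduction instead of clicking through the UI, here is a minimal sketch against the Nailgun REST API. The endpoint paths, node fields, master address (10.20.0.2) and cluster id are assumptions based on Fuel 5.x/6.x-era environments, not something confirmed by this report.
=====================
# Minimal sketch of the reproduction steps via the Nailgun REST API.
# Endpoint paths, node fields, addresses and ids below are assumptions.
import time
import requests

FUEL_MASTER = "http://10.20.0.2:8000"   # assumed Fuel master address
CLUSTER_ID = 1                          # assumed cluster id

def put(path):
    resp = requests.put(FUEL_MASTER + "/api" + path)
    resp.raise_for_status()

def first_controller_ready():
    nodes = requests.get(
        "{0}/api/nodes?cluster_id={1}".format(FUEL_MASTER, CLUSTER_ID)).json()
    controllers = [n for n in nodes if "controller" in n.get("roles", [])]
    return any(n.get("status") == "ready" for n in controllers)

# Step 1: start deployment ("Deploy changes").
put("/clusters/{0}/changes".format(CLUSTER_ID))

# Step 2: wait for the first controller to become ready, then stop deployment.
while not first_controller_ready():
    time.sleep(30)
put("/clusters/{0}/stop_deployment/".format(CLUSTER_ID))

# Steps 3-4: wait for the nodes to return to a pending state (not shown),
# then start deployment again; with this bug present, the first controller
# later fails on the mysql-service resource.
put("/clusters/{0}/changes".format(CLUSTER_ID))
=====================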

Diagnostic snapshot is attached. The second deployment on node-1 started at "2014-09-18 22:56:31" in the logs.

In the puppet log for node-1:
=====================
2014-09-18 23:05:22 ERR
 (/Stage[main]/Ceph::Mon/Exec[ceph-deploy mon create]/unless) Check "ceph mon stat | grep 192.168.20.3" exceeded timeout
...
2014-09-18 23:46:06 ERR
 (/Stage[main]/Galera/Service[mysql-service]/ensure) change from stopped to running failed: execution expired
...
and so on
=====================
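The two errors point at checks that can be run by hand on node-1. Below is a minimal sketch of those checks; the wsrep_local_state_comment status variable is standard Galera, the monitor IP 192.168.20.3 comes from the log above, and root shell access plus mysql credentials available to root are assumed.
=====================
# Manual checks implied by the failing puppet resources above, run on node-1.
# Assumes root shell access and mysql credentials usable by root.
import subprocess

def run(cmd):
    # Run a shell command and capture its output.
    return subprocess.run(cmd, shell=True, capture_output=True, text=True)

# The ceph-deploy resource waits for this monitor to appear in "ceph mon stat".
mon = run("ceph mon stat | grep 192.168.20.3")
print("ceph mon registered:", mon.returncode == 0)

# The Galera resource times out starting mysql; check whether mysqld answers
# and what state the node is in (a healthy Galera node reports 'Synced').
wsrep = run("mysql -NBe \"SHOW STATUS LIKE 'wsrep_local_state_comment'\"")
if wsrep.returncode != 0:
    print("mysqld not answering (consistent with 'execution expired')")
else:
    print("wsrep state:", wsrep.stdout.strip())
=====================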

Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :
Changed in fuel:
assignee: Fuel for Openstack (fuel) → Fuel Library Team (fuel-library)
Revision history for this message
Irina Povolotskaya (ipovolotskaya) wrote :

Should this be mentioned in the release notes? Is there any information on a workaround, or is everything still in progress?

summary: - Ubuntu HA: "Timeout of deployment is exceeded."
+ Mysql redeployment fails after you stop deployment of first controller
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

I would rather say that this issue is not high priority: stopping a deployment may lead to intermittent issues in the environment, and there is no guarantee that your redeployment will succeed. Actually, stopping the deployment should have rebooted the first controller and returned it to the bootstrapped state. There is a workaround available: simply press the "Reset" button.
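For reference, the "Reset" workaround can also be triggered without the UI. A minimal sketch, assuming the reset endpoint path used by the python-fuelclient of that era and a cluster id of 1:
=====================
# Trigger the "Reset" workaround via the Nailgun API instead of the UI.
# The endpoint path, address and cluster id are assumptions; adjust them.
import requests

FUEL_MASTER = "http://10.20.0.2:8000"   # assumed Fuel master address
CLUSTER_ID = 1                          # assumed cluster id

resp = requests.put("{0}/api/clusters/{1}/reset/".format(FUEL_MASTER, CLUSTER_ID))
resp.raise_for_status()
print("Reset requested; nodes will be rebooted back into bootstrap.")
=====================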

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Fuel Astute Team (fuel-astute)
tags: added: release-notes
no longer affects: fuel/6.0.x
Changed in fuel:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Igor Zinovik (izinovik) wrote :

It seems that yesterday I hit this bug.

Steps to reproduce:
1. I configured the following cluster:
    3 controllers
    1 cinder node (VMDK backend)
    vCenter as compute option
2. Start deployment
3. Wait till first controller (primary) is ready
4. Press 'Stop deployment' button
5. Wait till nodes are ready for deployment
6. Hit 'Deploy changes' button

Expected result: working OpenStack environment with vCenter as hypervisor
Actual result: deployment failed

This issue seems to be 100% reproducible. I made several tries (3 or 4) and
the result was always the same. Further puppet runs do not solve the problem;
I mean that I re-ran puppet after the failed deployment.

Here is a link to the test (vcenter_ha_stop_deployment) that I ran:
https://review.openstack.org/#/c/125048/21/fuelweb_test/tests/test_vcenter.py

I'm attaching diagnostic snapshot.

tags: added: module-astute
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 5.1 → 7.0
status: Triaged → Confirmed
Revision history for this message
Vladimir Sharshov (vsharshov) wrote :

Moving it to 7.0. We still cannot guarantee that the stop deployment operation will not damage the cluster, and the cluster reset operation still works as a workaround.

Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

Our possible solutions for this bug:
1) Add a warning message: "stop deployment is dangerous during controller setup and could lead to a broken state of the cluster"
2) We could re-run puppet on the controllers when deploying a stopped environment. This needs to be checked with the library team because it may be an unacceptable solution.
3) We can drop the controllers back to the 'discovered' state when deployment is stopped.

Vladimir K, we need your expertise here.

Changed in fuel:
milestone: 7.0 → 6.1
no longer affects: fuel/7.0.x
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

After discussion with Sergey Golovatiuk we chose option (1). UI guys, could you implement a warning?

tags: added: ui
Revision history for this message
Vitaly Kramskikh (vkramskikh) wrote :

How does this warning comply with the current message "Any progress will be lost and nodes will be reset to their pre-deployment state."? Shouldn't controllers be reset to the pre-deployment state?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/177355

Changed in fuel:
assignee: Fuel UI Team (fuel-ui) → Vitaly Kramskikh (vkramskikh)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (master)

Change abandoned by Vitaly Kramskikh (<email address hidden>) on branch: master
Review: https://review.openstack.org/177355
Reason: Done as a part of https://review.openstack.org/#/c/178214/
