Mysql redeployment fails after you stop deployment of first controller

Bug #1371471 reported by Dennis Dmitriev
This bug affects 1 person
Affects              Status         Importance  Assigned to              Milestone
Fuel for OpenStack   Fix Committed  High        Vitaly Kramskikh
  5.1.x              Won't Fix      Medium      Registry Administrators
  6.0.x              Won't Fix      Medium      Registry Administrators
  6.1.x              Fix Committed  High        Vitaly Kramskikh

Bug Description

Re-deployment fails after the "Stop deploy" action.

Steps to reproduce:
    1) Start deploying a cluster: Ubuntu HA, Neutron VLAN, Ceph for images and volumes, Murano, 3 controllers, 1 compute, 3 Ceph nodes. Network bonding (balance-slb) enabled; "Management", "Private" and "Storage" networks assigned to a bond interface.
    2) Wait until the first controller becomes "Ready" and press "Stop deploy".
    3) Wait until the nodes return to the "Pending ..." state.
    * At this point the first controller remains in the "Ready" state; it was not bootstrapped or rebooted.
    4) Start deployment again.

After Ubuntu was installed on the other nodes, the first controller started to re-deploy and failed with the "Error" state.
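For anyone scripting this reproduction instead of clicking through the UI, here is a minimal sketch against the Nailgun REST API. The endpoint paths, node fields, master address (10.20.0.2) and cluster id are assumptions based on Fuel 5.x/6.x-era environments, not something confirmed by this report.
=====================
# Minimal sketch of the reproduction steps via the Nailgun REST API.
# Endpoint paths, node fields, addresses and ids below are assumptions.
import time
import requests

FUEL_MASTER = "http://10.20.0.2:8000"   # assumed Fuel master address
CLUSTER_ID = 1                          # assumed cluster id

def put(path):
    resp = requests.put(FUEL_MASTER + "/api" + path)
    resp.raise_for_status()

def first_controller_ready():
    nodes = requests.get(
        "{0}/api/nodes?cluster_id={1}".format(FUEL_MASTER, CLUSTER_ID)).json()
    controllers = [n for n in nodes if "controller" in n.get("roles", [])]
    return any(n.get("status") == "ready" for n in controllers)

# Step 1: start deployment ("Deploy changes").
put("/clusters/{0}/changes".format(CLUSTER_ID))

# Step 2: wait for the first controller to become ready, then stop deployment.
while not first_controller_ready():
    time.sleep(30)
put("/clusters/{0}/stop_deployment/".format(CLUSTER_ID))

# Steps 3-4: wait for the nodes to return to a pending state (not shown),
# then start deployment again; with this bug present, the first controller
# later fails on the mysql-service resource.
put("/clusters/{0}/changes".format(CLUSTER_ID))
=====================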

Diagnostic snapshot is attached. The second deployment on node-1 started at "2014-09-18 22:56:31" in the logs.

In the puppet log for node-1:
=====================
2014-09-18 23:05:22 ERR
 (/Stage[main]/Ceph::Mon/Exec[ceph-deploy mon create]/unless) Check "ceph mon stat | grep 192.168.20.3" exceeded timeout
...
2014-09-18 23:46:06 ERR
 (/Stage[main]/Galera/Service[mysql-service]/ensure) change from stopped to running failed: execution expired
...
and so on
=====================
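The two errors point at checks that can be run by hand on node-1. Below is a minimal sketch of those checks; the wsrep_local_state_comment status variable is standard Galera, the monitor IP 192.168.20.3 comes from the log above, and root shell access plus mysql credentials available to root are assumed.
=====================
# Manual checks implied by the failing puppet resources above, run on node-1.
# Assumes root shell access and mysql credentials usable by root.
import subprocess

def run(cmd):
    # Run a shell command and capture its output.
    return subprocess.run(cmd, shell=True, capture_output=True, text=True)

# The ceph-deploy resource waits for this monitor to appear in "ceph mon stat".
mon = run("ceph mon stat | grep 192.168.20.3")
print("ceph mon registered:", mon.returncode == 0)

# The Galera resource times out starting mysql; check whether mysqld answers
# and what state the node is in (a healthy Galera node reports 'Synced').
wsrep = run("mysql -NBe \"SHOW STATUS LIKE 'wsrep_local_state_comment'\"")
if wsrep.returncode != 0:
    print("mysqld not answering (consistent with 'execution expired')")
else:
    print("wsrep state:", wsrep.stdout.strip())
=====================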

Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :
Changed in fuel:
assignee: Fuel for Openstack (fuel) → Fuel Library Team (fuel-library)
Revision history for this message
Irina Povolotskaya (ipovolotskaya) wrote :

Should this be mentioned in the release notes? Is there any information on a workaround, or is everything still in progress?

summary: - Ubuntu HA: "Timeout of deployment is exceeded."
+ Mysql redeployment fails after you stop deployment of first controller
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

I would rather say that this issue is not high priority: stopping a deployment may lead to intermittent issues in the environment, and there is no guarantee that your redeployment will succeed. Actually, stopping the deployment should have rebooted the first controller and returned it to the bootstrapped state. There is a workaround available: simply press the "Reset" button.
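For reference, the "Reset" workaround can also be triggered without the UI. A minimal sketch, assuming the reset endpoint path used by the python-fuelclient of that era and a cluster id of 1:
=====================
# Trigger the "Reset" workaround via the Nailgun API instead of the UI.
# The endpoint path, address and cluster id are assumptions; adjust them.
import requests

FUEL_MASTER = "http://10.20.0.2:8000"   # assumed Fuel master address
CLUSTER_ID = 1                          # assumed cluster id

resp = requests.put("{0}/api/clusters/{1}/reset/".format(FUEL_MASTER, CLUSTER_ID))
resp.raise_for_status()
print("Reset requested; nodes will be rebooted back into bootstrap.")
=====================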

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Fuel Astute Team (fuel-astute)
tags: added: release-notes
no longer affects: fuel/6.0.x
Changed in fuel:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
Igor Zinovik (izinovik) wrote :

It seems that yesterday I hit this bug.

Steps to reproduce:
1. I configured the following cluster:
    3 controllers
    1 cinder node (VMDK backend)
    vCenter as compute option
2. Start deployment
3. Wait till first controller (primary) is ready
4. Press 'Stop deployment' button
5. Wait till nodes are ready for deployment
6. Hit 'Deploy changes' button

Expected result: working OpenStack environment with vCenter as hypervisor
Actual result: deployment failed

This issue seems to be 100% reproducible. I made several tries (3 or 4) and
the result was always the same. Further puppet runs do not solve the problem;
I mean that I re-ran puppet after the failed deployment.

Here is a link to the test (vcenter_ha_stop_deployment) that I ran:
https://review.openstack.org/#/c/125048/21/fuelweb_test/tests/test_vcenter.py

I'm attaching diagnostic snapshot.

tags: added: module-astute
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 5.1 → 7.0
status: Triaged → Confirmed
Revision history for this message
Vladimir Sharshov (vsharshov) wrote :

Moving it to 7.0. We still cannot guarantee that the stop deployment operation will not damage the cluster, and the cluster reset operation still works as a workaround.

Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

Our possible solutions for this bug:
1) Add a warning message: "stop deployment is dangerous during controller setup and could lead to a broken state of the cluster"
2) We could re-run puppet on the controllers when deploying a stopped environment. This needs to be checked with the library team because it may be an unacceptable solution.
3) We can drop the controllers back to the 'discovered' state when deployment is stopped.

Vladimir K, we need your expertise here.

Changed in fuel:
milestone: 7.0 → 6.1
no longer affects: fuel/7.0.x
Revision history for this message
Dmitry Pyzhov (dpyzhov) wrote :

After discussion with Sergey Golovatiuk we chose option (1). UI guys, could you implement a warning?

tags: added: ui
Revision history for this message
Vitaly Kramskikh (vkramskikh) wrote :

How does this warning comply with the current message "Any progress will be lost and nodes will be reset to their pre-deployment state."? Shouldn't controllers be reset to the pre-deployment state?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/177355

Changed in fuel:
assignee: Fuel UI Team (fuel-ui) → Vitaly Kramskikh (vkramskikh)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (master)

Change abandoned by Vitaly Kramskikh (<email address hidden>) on branch: master
Review: https://review.openstack.org/177355
Reason: Done as a part of https://review.openstack.org/#/c/178214/
