Redeployment of reset environment (after previous, non-related error) failed.

Bug #1453031 reported by Marek Zawadzki
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Fuel Library (Deprecated)
Milestone: (not listed)

Bug Description

Redeployment of reset environment (after previous, non-related error) failed.
Deployment failed with the following error: http://172.16.48.59/show/368/

Steps to reproduce:

1) Create env (iso fuel-6.1-361):
- Ubuntu HA Neutron with VLAN segmentation
- 1x Controller, Storage - Ceph OSD
- 2x Compute, Storage - Ceph OSD
(probably any configuration including Ceph will be affected)

2) Start deployment.

3) Interrupt deployment shortly before the end (Ubuntu installed, disk partitioned) - e.g. by rebooting nodes.

4) Reset environment, redeploy.

Expected result:
- deployed cluster

Actual result:
- deployment error

Suspected reason:
- the previous deployment partitions the disks for Ceph, and resetting the environment doesn't remove these partitions, which is a problem for the installer during the next deployment

Manual FIX:
- remove all partitions on the nodes manually, reboot the nodes, reset the environment and redeploy
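
A rough sketch of the partition-removal part of the manual fix, assuming the disk is /dev/vda and the partitions are numbered 1 to 7 (both are illustrative values, not taken from the affected nodes). The helper name build_wipe_commands is hypothetical. It only builds and prints `parted -s DEVICE rm N` commands so they can be reviewed before anything is run on a node:

```python
import shlex


def build_wipe_commands(device, partition_numbers):
    """Return one `parted -s DEVICE rm N` command per partition.

    Partitions are removed highest number first, so that deleting one
    cannot renumber those still to be removed (this can happen with
    logical partitions on MBR-labelled disks).
    """
    commands = []
    for number in sorted(partition_numbers, reverse=True):
        commands.append(["parted", "-s", device, "rm", str(number)])
    return commands


if __name__ == "__main__":
    # Dry run only: print the commands instead of executing them.
    # /dev/vda and partitions 1..7 are assumptions for illustration.
    for cmd in build_wipe_commands("/dev/vda", range(1, 8)):
        print(shlex.join(cmd))
```

The actual partition numbers should be read from the node first (for example with `parted -s /dev/vda print`), since the layout left behind by an interrupted deployment may differ.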

This seems to be the same, still unfixed bug reported here:
https://bugs.launchpad.net/fuel/+bug/1323343
https://bugs.launchpad.net/fuel/+bug/1398096

Vladimir Kuklin (vkuklin) wrote :

Marek, this is really, really strange. We do wipe partitions on env reset. Are you 100% sure that your manual fix is working? If so, we need to investigate it further.

Changed in fuel:
assignee: nobody → Fuel Astute Team (fuel-astute)
milestone: none → 6.1
importance: Undecided → High
status: New → Confirmed
Marek Zawadzki (mzawadzki-f) wrote :

I am sure - I reset my env and redeployed - hit the same error.

After that I logged in to the nodes, removed all partitions manually with parted (there were ~7 of them), and rebooted the nodes. Only then did the redeployment succeed.

Changed in fuel:
assignee: Fuel Astute Team (fuel-astute) → Vladimir Sharshov (vsharshov)
Vladimir Sharshov (vsharshov) wrote :

Guys, this looks really strange.
In fact, we wipe the disk twice: first using the mcollective agent erase_node, and then, after reboot, fuel-agent wipes it again and creates the partitions.

Both of these stages finished successfully.

Will try to reproduce using fresh ISO #393.

Vladimir Sharshov (vsharshov) wrote :

Could not reproduce using ISO #393.

I checked the filesystem after the reset and got:

[root@bootstrap ~]# lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL
NAME FSTYPE SIZE MOUNTPOINT LABEL
vda 50G

NAME FSTYPE SIZE MOUNTPOINT LABEL
vda 50G

[root@bootstrap ~]# lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL
NAME FSTYPE SIZE MOUNTPOINT LABEL
vda 50G
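
The same check can be scripted. The sketch below assumes a different invocation from the listing above, `lsblk -rn -o NAME,TYPE` (raw output, no header, one device per line), and treats any row of type `part` as a leftover partition. The function name leftover_partitions and the sample strings are hypothetical:

```python
def leftover_partitions(lsblk_raw_output):
    """Return names of partitions found in `lsblk -rn -o NAME,TYPE` output.

    Each line is expected to look like "vda disk" or "vda1 part";
    lines that do not have exactly two fields are ignored.
    """
    leftovers = []
    for line in lsblk_raw_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "part":
            leftovers.append(fields[0])
    return leftovers


if __name__ == "__main__":
    # Sample inputs written by hand for illustration, not captured from a node.
    clean = "vda disk\n"
    dirty = "vda disk\nvda1 part\nvda5 part\n"
    print(leftover_partitions(clean))
    print(leftover_partitions(dirty))
```

An empty result after a reset corresponds to the clean `vda 50G` listing shown above.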

Changed in fuel:
status: Confirmed → Incomplete
Ryan Moe (rmoe) wrote :

I think there might be two issues here.

First is the disks. Is it possible you ran into this bug https://bugs.launchpad.net/fuel/+bug/1437511?

The second is the deployment failure. I don't think this is related to the disks. The deployment failed because ceph-deploy osd prepare failed. It failed because it could not find the bootstrap-osd key. This key is created by a call to ceph-deploy gatherkeys. On node-1, ceph-deploy gatherkeys was successful (or at least it had an exit code of 0). However, in the diagnostic snapshot I don't see these bootstrap keys in /root. So somehow gatherkeys "succeeded" but the bootstrap keys (and only the bootstrap keys) were missing. gatherkeys was successful on node-2 and node-3.
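
Since the exit code alone evidently does not prove the keys exist, a check like the following could be run after gatherkeys. This is only a sketch: the keyring file names are assumptions based on ceph-deploy's usual `<cluster>.<name>.keyring` naming with the default cluster name "ceph", and missing_keyrings is a hypothetical helper, not something Fuel ships.

```python
import os
import tempfile

# Assumed file names (default cluster name "ceph"); adjust to the real setup.
EXPECTED_KEYRINGS = (
    "ceph.client.admin.keyring",
    "ceph.bootstrap-osd.keyring",
    "ceph.bootstrap-mds.keyring",
)


def missing_keyrings(directory, expected=EXPECTED_KEYRINGS):
    """Return the expected keyring names that are absent or empty in directory."""
    missing = []
    for name in expected:
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Demo in a temporary directory that mimics the reported state:
    # the admin keyring is present, the bootstrap keyrings are not.
    with tempfile.TemporaryDirectory() as workdir:
        with open(os.path.join(workdir, "ceph.client.admin.keyring"), "w") as f:
            f.write("[client.admin]\n")
        print(missing_keyrings(workdir))
```

On a node, the directory argument would be wherever gatherkeys was run (/root in the snapshot described above); a non-empty result would flag the "succeeded but keys missing" case before ceph-deploy osd prepare is attempted.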

Changed in fuel:
status: Incomplete → Confirmed
Vladimir Sharshov (vsharshov) wrote :

Fuel Library team, please take a look if new logs showing this problem come in.

I am moving it to Incomplete because I could not reproduce it myself and there have been no new confirmations of this problem.

Changed in fuel:
status: Confirmed → Incomplete
assignee: Vladimir Sharshov (vsharshov) → Fuel Library Team (fuel-library)
Vladimir Sharshov (vsharshov) wrote :

As for https://bugs.launchpad.net/fuel/+bug/1437511, I agree that it can potentially help to prevent this, but I still think this bug should stay Incomplete unless there is a new confirmation.

Ryan Moe (rmoe) wrote :

I was not able to reproduce the Ceph issue on ISO 412.

Andrey Maximov (maximov)
summary: - Reployment error:
- /Stage[main]/Ceph::Osds/Ceph::Osds::Osd[/dev/vda5]/Exec[ceph-deploy osd
- prepare node-1:/dev/vda5]/returns) change from notrun to 0 failed: ceph-
- deploy osd prepare node-1:/dev/vda5 returned 1 instead of one of [0]
+ Redeployment of reset environment (after previous, non-related error)
+ failed.
tags: added: on verification
Andrey Sledzinskiy (asledzinskiy) wrote :

I tried the following steps:
1) Start deployment of the same cluster as described in the bug
2) Simulate a deployment failure at the controller deployment stage
3) Reset the cluster
4) Redeploy the cluster
The cluster was deployed successfully, so the issue wasn't reproduced.

tags: removed: on verification
Changed in fuel:
status: Incomplete → Invalid