[astute] OS install failure blocks environment

Bug #1374376 reported by Damia Pastor
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: High
Assigned to: Vladimir Sharshov
Milestone: 6.1

Bug Description

Scenario:

While deploying multiple nodes with Ubuntu, one of the nodes has a hardware failure and restarts. While Fuel tries to reinstall it, neither stop nor reset works, leaving the process in a loop.

Steps to reproduce:

- Deploy multiple nodes
- During OS installation (tested with Ubuntu), hard reset the node to simulate a hardware failure
- Stop the process, either via the GUI or the CLI.

Expected behaviour:

- Deployment stops and Fuel returns the environment to its pre-deployment state

Actual behaviour:

- The deployment gets stuck, repeating the OS installation in a loop, and there is no way to cancel it.

Notes:

- We also tried a "reset" from the Fuel CLI and deleting the environment.

Tags: astute
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → High
milestone: none → 6.0
Revision history for this message
Damia Pastor (magradallegir) wrote :

Hi Aleksey,

I think this was caused by an "idle user" syndrome: the user gets nervous because the task takes too long, so we start trying things:

- Cancel, Reset, Delete

I will retest to see if I can still reproduce it; if not, I will open a new ticket to block the Reset and Delete actions while a Cancel is in progress.

Thanks and sorry for the inconvenience.

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

I can't reproduce that.

Tried CentOS HA with 3 controllers and force-rebooted one of them, then stopped the deployment - everything stopped OK.
Also tried Ubuntu simple mode with 5 nodes and force-rebooted two of them, then stopped the deployment - everything stopped OK.

Changed in fuel:
status: New → Confirmed
assignee: Fuel Library Team (fuel-library) → Fuel Astute Team (fuel-astute)
Revision history for this message
Vladimir Sharshov (vsharshov) wrote :

This behaviour is really strange, because in the case of Stop Deployment we kill the main process (to prevent conflicting commands), then try to reach all of the cluster's nodes via SSH and erase them. By default this step can take up to 5 minutes (a 60-second timeout with 5 retries). After that we remove the nodes from Cobbler and then send them a reboot command over SSH.

Given the above, I think the idea from Damià Pastor is closest to the truth.

Unfortunately, without logs I cannot say anything more specific.
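
For readers unfamiliar with Astute internals, the stop sequence described above can be summarised roughly as in the sketch below. This is a minimal illustrative sketch in Python (Astute itself is written in Ruby); the helper names kill_main_process, erase_node_via_ssh, remove_from_cobbler and reboot_via_ssh are hypothetical placeholders, and only the 60-second timeout and 5 retries come from the comment above.

# Illustrative sketch of the Stop Deployment sequence described above.
# All helpers passed in are hypothetical placeholders, not real Fuel/Astute APIs.
import time

SSH_TIMEOUT = 60  # seconds per SSH attempt (per the comment above)
SSH_RETRIES = 5   # retries per node, so up to ~5 minutes in total

def stop_deployment(nodes, kill_main_process, erase_node_via_ssh,
                    remove_from_cobbler, reboot_via_ssh):
    # 1. Kill the main deployment process to prevent conflicting commands.
    kill_main_process()

    # 2. Try to reach every node over SSH and erase it, retrying on failure.
    for node in nodes:
        for attempt in range(SSH_RETRIES):
            if erase_node_via_ssh(node, timeout=SSH_TIMEOUT):
                break
            time.sleep(1)  # short pause before the next attempt
        # A node that never answers (e.g. the hard-reset node in this bug)
        # simply exhausts its retries here.

    # 3. Remove the nodes from Cobbler.
    for node in nodes:
        remove_from_cobbler(node)

    # 4. Finally, send a reboot command to the nodes over SSH.
    for node in nodes:
        reboot_via_ssh(node)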

Changed in fuel:
status: Confirmed → Incomplete
assignee: Fuel Astute Team (fuel-astute) → Vladimir Sharshov (vsharshov)
tags: added: astute
summary: - OS install failure blocks environment
+ [astute]OS install failure blocks environment
summary: - [astute]OS install failure blocks environment
+ [astute] OS install failure blocks environment
Revision history for this message
Damia Pastor (magradallegir) wrote :

No problem, Vladimir.

I will try to build a better case and provide logs. I think there are two points of interest:

a) What happens if Fuel cannot reach one of the nodes scheduled for deployment?

b) Should we block any interaction with the cluster (delete, reset) once a stop is in progress?

I will work further on these two scenarios and create a new bug if necessary. :)

Revision history for this message
Matthew Mosesohn (raytrac3r) wrote :

Is there an update on this bug?

Revision history for this message
Damia Pastor (magradallegir) wrote :

Hi Matthew,

I need to differentiate the two cases and re-test them:

a) The nodes do not come back and the user only issues a stop. In 1 out of 3 attempts the operation got stuck (reset after 24h).
b) The nodes do not come back, the user stops and then tries to delete the environment. The operation got stuck in 3 out of 3 cases (reset after 24h).

I will check whether it is possible to kill the task through the CLI as a feasible workaround, based on support tickets.
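
For illustration only, one way to attempt the workaround mentioned above would be to delete the stuck task through Nailgun's REST-style API rather than the GUI. The master address, port, endpoint path, force parameter and task ID in the sketch below are all assumptions, not details confirmed by this report.

# Hypothetical sketch: force-delete a stuck deployment task via a
# Nailgun-style REST API. Address, port, path and parameters are assumptions.
import requests

NAILGUN = "http://10.20.0.2:8000"   # assumed Fuel master address and port
TASK_ID = 42                        # assumed ID of the stuck task

resp = requests.delete(
    "%s/api/tasks/%d/" % (NAILGUN, TASK_ID),
    params={"force": 1},             # assumed flag to override a running task
    headers={"X-Auth-Token": "..."}  # auth token intentionally omitted here
)
resp.raise_for_status()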

Revision history for this message
Mike Scherbakov (mihgen) wrote :

Vladimir,
any update on this bug?

Changed in fuel:
milestone: 6.0 → 6.1
Changed in fuel:
status: Incomplete → Invalid
Revision history for this message
Oleksiy Molchanov (omolchanov) wrote :

This bug has been incomplete for more than 4 weeks. We cannot investigate it further, so we are setting the status to Invalid. If you think this is not correct, please feel free to provide the requested information and reopen the bug, and we will look into it further.
