Sometimes Fuel fails to install Operating Systems

Bug #1330551 reported by Cristian A Sanchez
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: Medium
Assigned to: Matthew Mosesohn
Milestone: (none listed)

Bug Description

Sometimes when deploying a cloud, Fuel fails to boot or install the operating system on nodes.
I will gather some log files to post in this bug.

Changed in fuel:
status: New → Incomplete
Mike Scherbakov (mihgen) wrote :

Cristian, thanks for filing this bug. The status was set to Incomplete because there is not enough information to act on the issue. On the Support tab in the Fuel UI you will find a Diagnostic Snapshot button; please generate a bundle and attach it here. It will contain all the logs from your environment.
Also, steps to reproduce will shorten the time needed for initial bug classification.

Mike Scherbakov (mihgen)
Changed in fuel:
milestone: none → 5.1
Cristian A Sanchez (cristian-a-sanchez) wrote :

These are the steps I followed to reproduce:

I have 7 nodes in my environment.

1. Create a Multinode + HA cloud called "HA OpenStack".
2. Select KVM and Neutron with GRE segmentation, leaving everything else at the defaults.
3. Add the 7 nodes to the deployment: 3 controller, 3 compute, 1 storage.
4. Configure networks.
5. Click on Deploy Changes.

After a while, 4 of the 7 nodes completed the Ubuntu installation (1 controller and 3 compute). The other three never did, and eventually they appeared as OFFLINE in the dashboard. I then removed them from the deployment, which worked. But now I have a cloud with four nodes where Ubuntu is installed, and I see no option to continue the OpenStack deployment with the ones that worked.

I've generated the Diagnostic Snapshot, but it's a 160 MB file. Is there a server where I can upload it? I tried to upload it to Launchpad, but that failed.

Mike Scherbakov (mihgen) wrote :

Did you check the console of the nodes that went offline?
This can happen if they didn't boot over PXE for some reason.

After you removed the offline nodes, you should have been able to click the Deploy button to use the existing nodes. Was it unavailable for some reason?

For the diagnostic snapshot, can you use Google or some other public service?

Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Sergii Golovatiuk (sgolovatiuk) wrote :

Cristian, thanks for helping us. There are many reasons why nodes can go offline. Something could have happened after the reboot, for example a wrong boot order in the BIOS. This case requires some analysis of your installation. Please upload the snapshot to any public service so we can help you.

Cristian A Sanchez (cristian-a-sanchez) wrote :

More information:
This behaviour appears to be random. The same node with the same boot order sometimes gets the OS installed and sometimes does not.

Cristian A Sanchez (cristian-a-sanchez) wrote :
tags: added: customer-found
Changed in fuel:
importance: Undecided → Medium
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Matthew Mosesohn (raytrac3r)
status: Incomplete → In Progress
Matthew Mosesohn (raytrac3r) wrote :

Hi Cristian,

I'm looking into the issue now. There's quite a lot of data in the logs, so it's a challenge to find the exact piece of useful information about what failed. Is there any sort of error on the screen when Ubuntu fails to install? Or do the nodes fail to PXE boot at all and then boot from a local disk?

Changed in fuel:
status: In Progress → Incomplete
Changed in fuel:
status: Incomplete → Invalid
tags: added: in progress
tags: added: agree qa
removed: in progress
tags: added: qa-agree
removed: agree qa