Fuel for OpenStack

[VirtualBox]: Slave node cannot boot: Fatal: No bootable medium found. System halted

Bug #1401851 reported by Anastasia Palkina on 2014-12-12

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Fix Released	High	Serhii Ovsianikov	Fuel for OpenStack 6.0
6.0.x	Fix Released	High	Serhii Ovsianikov	Fuel for OpenStack 6.0
6.1.x	Fix Released	High	Serhii Ovsianikov	Fuel for OpenStack 6.1

Bug Description

On VirtualBox, the master node installed successfully.

However, often 1 of the 3 slave nodes cannot boot (see the screenshot).
If you restart this node, it boots successfully.

Roman Prykhodchenko (romcheg) on 2014-12-12

Changed in fuel:
milestone:	none → 6.1

Revision history for this message

Stanislav Makar (smakar) wrote on 2014-12-12:

Please provide the vbox scripts and the Fuel version.

Roman Prykhodchenko (romcheg) on 2014-12-12

tags:	added: virtualbox
summary:	- Slave node cannot boot: Fatal: No bootable medium found. System halted
	+ [VirtualBox]: Slave node cannot boot: Fatal: No bootable medium found. System halted

Stanislav Makar (smakar) on 2014-12-12

Changed in fuel:
status:	New → Incomplete
milestone:	6.1 → 6.0

Mike Scherbakov (mihgen) wrote on 2014-12-12:

Stas,
you can use latest scripts from fuel-main.
I've seen this issue too. You can try 5.1.1 & 6.0.

I'm pretty sure it happens on loaded systems, such as laptops, and that the expect script in our vbox scripts does not use the right source of information about when the Fuel Master node installation has finished.
I believe that either the cobbler container or the networking stack (forwarding, etc.) that the Fuel master node needs to pass PXE traffic is not yet fully initialized when the slave nodes are started.

Changed in fuel:
status:	Incomplete → Confirmed

Mike Scherbakov (mihgen) wrote on 2014-12-12:

Moving to critical, as it significantly affects user experience. Installation of Fuel and OpenStack has to be very smooth.

Roman Prykhodchenko (romcheg) wrote on 2014-12-12:

@Stanislav: due to HCF it cannot be targeted to 6.1

Changed in fuel:
milestone:	6.0 → 6.1

Mike Scherbakov (mihgen) on 2014-12-12

Changed in fuel:
importance:	High → Critical

Roman Prykhodchenko (romcheg) wrote on 2014-12-12:

g/6.1/6.0/

Roman Prykhodchenko (romcheg) wrote on 2014-12-12:

Now, since it was set to Critical, it can be merged to 6.0.*
I'll let someone else re-target it properly to avoid another race condition.

Anastasia Palkina (apalkina) wrote on 2014-12-12:

Screenshot from 2014-12-12 14:22:06.png (33.8 KiB, image/png)

Miroslav Anashkin (manashkin) wrote on 2014-12-12:

Probable root cause: there are not enough CPU cores on the master node.

On all my machines with VBox, the number of VMs I can PXE-boot at the same time matches almost exactly the number of physical CPU cores I have.
The PXE boot process fully loads 2 cores during the image transfer over TFTP: one on the server and one on the client.
If there are too many client nodes, the extra clients fail to boot on the first attempt and need a restart.

Since 6.0 uses image-based provisioning, the master node may be even more CPU bound.

So I usually set the CPU count for the master node to the number of physically available cores.
Even if the CPU supports hyper-threading, I set the CPU count to the number of physical cores, not virtual ones.

I propose the following modification to all versions of the VBox scripts, not only for 6.0:
Set the default CPU count for the master node = number of physical CPU cores.

Later we may implement sequential one-by-one node boot,
say, by introducing a configurable delay parameter before each new VM is created.

BTW, this issue appears on bare-metal installations as well, but only with a several times greater number of simultaneously booting machines. On bare metal, the clients have their own CPUs and the network interfaces do some offloading.
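
A minimal Python sketch of the CPU-count proposal above. It is not the change that went into the vbox scripts. The function names, the VM name and the default of 2 threads per core are assumptions made for illustration. Python's standard library only reports logical CPUs, so the physical core count is estimated by dividing by an assumed hyper-threading factor. The command is only built, not executed.

```python
import os
from typing import List, Optional


def master_cpu_count(logical_cpus: Optional[int] = None,
                     threads_per_core: int = 2) -> int:
    """Estimate the physical core count to give to the master node VM.

    threads_per_core is an assumption (2 for typical hyper-threading,
    1 when hyper-threading is off); it is not detected automatically.
    """
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be >= 1")
    if logical_cpus is None:
        # os.cpu_count() reports logical CPUs and may return None.
        logical_cpus = os.cpu_count() or 1
    # Never go below one CPU, even on a single-core host.
    return max(1, logical_cpus // threads_per_core)


def modifyvm_cpus_command(vm_name: str, cpus: int) -> List[str]:
    """Build (but do not run) the VBoxManage call that sets the vCPU count."""
    return ["VBoxManage", "modifyvm", vm_name, "--cpus", str(cpus)]


if __name__ == "__main__":
    # "fuel-master" is a hypothetical VM name used only for this example.
    print(modifyvm_cpus_command("fuel-master", master_cpu_count()))
```

On a host with 8 logical CPUs and hyper-threading, this would suggest 4 CPUs for the master node, which matches the "physical cores, not virtual" rule described above.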

Changed in fuel:
status:	Confirmed → Triaged
assignee:	Miroslav Anashkin (manashkin) → Serhiy Ovsianikov (sovsianikov)

Mike Scherbakov (mihgen) wrote on 2014-12-15:

#10

Miroslav,
> Since 6.0 uses image-based provisioning - master node may be even more CPU bound.
How is it related? What is eating CPU on the master node?

I just experienced this issue again. 1 VM booted fine, and 2 didn't. Previously, it had been running fine. I believe the problem is how we check that the Master node has finished provisioning.

This issue can be very easily mitigated by provisioning the slaves with a slight delay. Also, it would be better to start them 1-by-1 with a slight delay between runs (let's say 1 sec).
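
A rough Python sketch of the one-by-one start with a delay, for illustration only. The helper name `start_staggered` and its parameters are hypothetical, and the actual vbox scripts are shell, not Python. The function that starts a VM is passed in, so the sketch does not depend on VirtualBox being installed.

```python
import time
from typing import Callable, Iterable, List


def start_staggered(names: Iterable[str],
                    start_vm: Callable[[str], None],
                    delay_s: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Start VMs one at a time, pausing delay_s seconds between starts."""
    started: List[str] = []
    for index, name in enumerate(names):
        if index > 0:
            # Pause before every VM except the first, so the master node
            # does not have to serve all PXE/TFTP requests at once.
            sleep(delay_s)
        start_vm(name)
        started.append(name)
    return started


if __name__ == "__main__":
    # Hypothetical slave names; printing stands in for really starting a VM.
    start_staggered(["fuel-slave-1", "fuel-slave-2", "fuel-slave-3"],
                    start_vm=lambda name: print("starting", name),
                    delay_s=1.0)
```

In a real script, `start_vm` could wrap something like `subprocess.run(["VBoxManage", "startvm", name, "--type", "headless"], check=True)`. The 1-second default only mirrors the figure suggested above. A loaded host may need a longer delay.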

Tomasz 'Zen' Napierala (tzn) wrote on 2014-12-15:

#11

From my debugging, it all depends on how we check master node readiness. We only search for a line in the logs. This line does not mean that all containers are ready.
I had a discussion with Mike: we can simply introduce another ugly "wait" and fix it in 6.1.
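
One possible shape for a stricter readiness check, sketched in Python under stated assumptions: instead of trusting a single log line, poll a set of named checks until all of them pass or a timeout expires. The function name, the check names and the timeout values are hypothetical. What each check actually probes (container state, PXE/TFTP reachability) is left to the caller and is not taken from the Fuel code.

```python
import time
from typing import Callable, Dict


def wait_until_ready(checks: Dict[str, Callable[[], bool]],
                     timeout_s: float = 600.0,
                     interval_s: float = 5.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Return True once every check passes, False if the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        # Re-run every check on each pass; a single stale "installation
        # finished" log line is not treated as sufficient on its own.
        pending = [name for name, check in checks.items() if not check()]
        if not pending:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Placeholder checks; real ones would inspect the master node.
    ready = wait_until_ready({
        "install_log_line_seen": lambda: True,
        "pxe_services_answering": lambda: True,
    }, timeout_s=10.0, interval_s=1.0)
    print("master ready:", ready)
```

Slave nodes would only be started after this returns True. A fixed "wait" is simpler, but it guesses how long initialization takes instead of observing it.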

Bogdan Dobrelya (bogdando) wrote on 2014-12-16:

#12

We should not consider any kind of VirtualBox issue critical. VirtualBox is never used for production environment deployments, so such issues cannot be critical.

OpenStack Infra (hudson-openstack) wrote on 2014-12-16: Fix proposed to fuel-main (master)

#13

Fix proposed to branch: master
Review: https://review.openstack.org/142142

Changed in fuel:
status:	Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote on 2014-12-16: Fix proposed to fuel-main (stable/6.0)

#14

Fix proposed to branch: stable/6.0
Review: https://review.openstack.org/142143

OpenStack Infra (hudson-openstack) wrote on 2014-12-16: Fix proposed to fuel-main (stable/5.1)

#15

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/142161

OpenStack Infra (hudson-openstack) wrote on 2014-12-17: Fix merged to fuel-main (master)

#16

Reviewed: https://review.openstack.org/142142
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=65a9bcdbd228dc8478091999689c7c81bcd7d762
Submitter: Jenkins
Branch: master

commit 65a9bcdbd228dc8478091999689c7c81bcd7d762
Author: Serhiy Ovsianikov <email address hidden>
Date: Tue Dec 16 18:10:57 2014 +0200

The delay required for downloading tftp boot image

The delay allows to increase time for obtaining a boot image by
fuel-slave nodes and decreasing CPU load on fuel-master node.

Change-Id: I1da48a2374c4d4f0783b12bb9f736e54098fcb41
Closes-Bug: #1401851

Changed in fuel:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2014-12-17: Fix merged to fuel-main (stable/6.0)

#17

Reviewed: https://review.openstack.org/142143
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=93af8119dcc5758ead2a17ab48f729a26a15d7fd
Submitter: Jenkins
Branch: stable/6.0

commit 93af8119dcc5758ead2a17ab48f729a26a15d7fd
Author: Serhiy Ovsianikov <email address hidden>
Date: Tue Dec 16 18:10:57 2014 +0200

The delay required for downloading tftp boot image

The delay allows to increase time for obtaining a boot image by
fuel-slave nodes and decreasing CPU load on fuel-master node.

    Change-Id: I1da48a2374c4d4f0783b12bb9f736e54098fcb41
    Closes-Bug: #1401851
    (cherry picked from commit 65a9bcdbd228dc8478091999689c7c81bcd7d762 )

OpenStack Infra (hudson-openstack) wrote on 2014-12-17: Change abandoned on fuel-main (stable/5.1)

#18

Change abandoned by Serhiy Ovsianikov (<email address hidden>) on branch: stable/5.1
Review: https://review.openstack.org/142161
Reason: necessary to clarify

Anastasia Palkina (apalkina) wrote on 2014-12-19:

#19

Verified on ISO #56

"build_id": "2014-12-18_01-32-01", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "56", "auth_required": true, "api": "1.0", "nailgun_sha": "5f91157daa6798ff522ca9f6d34e7e135f150a90", "production": "docker", "fuelmain_sha": "45caacadb878abfbd9d60e134d72229698b469c9", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-12-18_01-32-01", "ostf_sha": "a9afb68710d809570460c29d6c3293219d3624d4", "build_number": "56", "api": "1.0", "nailgun_sha": "5f91157daa6798ff522ca9f6d34e7e135f150a90", "production": "docker", "fuelmain_sha": "45caacadb878abfbd9d60e134d72229698b469c9", "astute_sha": "16b252d93be6aaa73030b8100cf8c5ca6a970a91", "feature_groups": ["mirantis"], "release": "6.0", "fuellib_sha": "73332192a257ea02c40a39885c502ad1ebdf3eda"}}}, "fuellib_sha": "73332192a257ea02c40a39885c502ad1ebdf3eda"

Anastasia Palkina (apalkina) wrote on 2015-01-15:

#20

Reproduced on ISOs #58, 61 for 6.1

OpenStack Infra (hudson-openstack) wrote on 2015-01-23: Fix proposed to fuel-main (master)

#21

Fix proposed to branch: master
Review: https://review.openstack.org/149575

Changed in fuel:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-02-10: Fix merged to fuel-main (stable/5.1)

#22

Reviewed: https://review.openstack.org/142161
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=fb31eaaecd59fc0e681878e77f3b1b543fe12021
Submitter: Jenkins
Branch: stable/5.1

commit fb31eaaecd59fc0e681878e77f3b1b543fe12021
Author: Serhiy Ovsianikov <email address hidden>
Date: Tue Dec 16 18:10:57 2014 +0200

The delay required for downloading tftp boot image

The delay allows to increase time for obtaining a boot image by
fuel-slave nodes and decreasing CPU load on fuel-master node.

    Change-Id: I1da48a2374c4d4f0783b12bb9f736e54098fcb41
    Closes-Bug: #1401851
    (cherry picked from commit 65a9bcdbd228dc8478091999689c7c81bcd7d762 )

OpenStack Infra (hudson-openstack) wrote on 2015-02-11: Fix merged to fuel-main (master)

#23

Reviewed: https://review.openstack.org/149575
Committed: https://git.openstack.org/cgit/stackforge/fuel-main/commit/?id=0a96b4a59bf8ae28e1f88db7485b5ea425b0a6b5
Submitter: Jenkins
Branch: master

commit 0a96b4a59bf8ae28e1f88db7485b5ea425b0a6b5
Author: Serhiy Ovsianikov <email address hidden>
Date: Fri Jan 23 12:26:27 2015 +0200

Added the number of processor cores to fuel master node

Adding the number of processor cores increases the productivity of
fuel master node

Change-Id: Idd7933579b5309fc038488d1a808024d3132e829
Closes-Bug: #1401851

Changed in fuel:
status:	In Progress → Fix Committed

Anastasia Palkina (apalkina) wrote on 2015-02-25:

#24

Cannot reproduce on the latest ISOs.

Ali Jabbar (jabbar-ali) wrote on 2016-03-23:

#25

Yes, if you reset or reboot your slave nodes one by one with some delay between them, this problem does not occur.
