Some nodes don't boot over pxe after cluster deletion

Bug #1319869 reported by Andrey Sledzinskiy
This bug affects 2 people
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Dmitry Ilyin
Milestone: (none listed)

Bug Description

Reproduced on {"build_id": "2014-05-14_01-10-31", "mirantis": "yes", "build_number": "203", "ostf_sha": "ef970b442437072bdfa4ea99a7b2971215b2de18", "nailgun_sha": "155acff248aed9a8295d03d58346daa4851d49b4", "production": "docker", "api": "1.0", "fuelmain_sha": "9df792985f8984063979f16dc94b4df24ef40c2d", "astute_sha": "80e60b66e3cb4e3e61b22c61c4acfa127ba1bf7e", "release": "5.0", "fuellib_sha": "89ccab7ee76980e38c4d9a5fbcdf7df87e35d61f"}

Steps:
1. Create the following cluster: Ubuntu, HA, KVM, Neutron GRE, Ceph for images
2. Add 3 controllers, 1 compute, 1 cinder, 3 ceph nodes
3. Deploy cluster
4. After a successful deployment, delete the cluster

Actual result: some nodes failed to boot over PXE with error http://ipxe.org/err/040ee119 (see attached screenshot).

Snapshot is attached

Andrey Sledzinskiy (asledzinskiy) wrote :
Andrey Sledzinskiy (asledzinskiy) wrote :
Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
Vladimir Kuklin (vkuklin) wrote :

This looks related to https://bugs.launchpad.net/fuel/+bug/1317213, as Andrey's environment is likely to have dhcrelay performance problems. Maybe we could overcome this by introducing a fuzzy (randomized) delay into the node rebooting task in nailgun.
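To illustrate the fuzzy delay idea, here is a minimal sketch. It is not nailgun code: the function name `jittered_delays`, its parameters, and the node identifiers are all hypothetical. It only shows how each node could be given a random offset so that reboots, and therefore DHCP requests, do not all arrive at the same moment.

```python
import random


def jittered_delays(node_ids, base_delay=0.0, max_jitter=30.0, seed=None):
    """Return {node_id: delay_in_seconds} with a random offset per node.

    Hypothetical helper: spreading reboots over `max_jitter` seconds is
    one way to avoid a burst of simultaneous DHCP/PXE requests.
    """
    rng = random.Random(seed)  # a seed makes the schedule reproducible
    return {
        node_id: base_delay + rng.uniform(0.0, max_jitter)
        for node_id in node_ids
    }


if __name__ == "__main__":
    # Illustrative node names only.
    for node, delay in jittered_delays(["node-1", "node-2", "node-3"]).items():
        print(f"{node}: reboot after {delay:.1f}s")
```

A task scheduler would then wait the given number of seconds before issuing the reboot for each node.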

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Fuel Python Team (fuel-python)
Mike Scherbakov (mihgen)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Dmitry Ilyin (idv1985)
Dmitry Ilyin (idv1985) wrote :

On KVM I have never had any problem booting 5 or even more nodes simultaneously, either before docker+dhcrelay was introduced or with it, and I have never had problems with DHCP server performance.

On VBOX, booting more than one node hangs every time, and this was the case even before docker.

Perhaps it also has something to do with the versions of KVM and iPXE on our systems.

Dmitry Ilyin (idv1985) wrote :

It's also possible that host records were not deleted from cobbler when the environment was deleted. If you can reproduce this, please check in the cobbler web UI that these nodes have had their host profile records removed, so that they would receive the blue menu.

Dmitry Ilyin (idv1985) wrote :

We found out that this is most likely caused by starting slave nodes when either the DHCP server on the master node or dhcrelay is not ready yet. We decided to insert a dhcpcheck step to determine that the master node is ready before starting the slaves.
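The readiness gate described above can be sketched as a simple polling loop. This is an assumption-laden illustration, not the committed fix: `wait_until_ready` and the `probe` callable are hypothetical, and the actual dhcpcheck invocation is deliberately left out because its command line is not given in this report.

```python
import time


def wait_until_ready(probe, attempts=10, interval=5.0, sleep=time.sleep):
    """Call `probe` until it returns True or `attempts` is exhausted.

    `probe` is a hypothetical zero-argument callable standing in for a
    check that the DHCP server / dhcrelay answers. Returns True as soon
    as the probe succeeds, False if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return True
        if attempt < attempts:
            sleep(interval)  # back off before the next check
    return False


if __name__ == "__main__":
    # Stand-in probe that succeeds on the third call.
    answers = iter([False, False, True])
    if wait_until_ready(lambda: next(answers), interval=0.1):
        print("master node ready, slaves can be started")
    else:
        print("DHCP not ready, slaves not started")
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.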

Changed in fuel:
status: New → Triaged
Vladimir Kuklin (vkuklin) wrote :
Vladimir Kuklin (vkuklin) wrote :
Changed in fuel:
status: Triaged → Fix Committed
Egor Kotko (ykotko) wrote :

Verified on:
{"build_id": "2014-05-20_01-10-31", "mirantis": "yes", "build_number": "213", "ostf_sha": "353f918197ec53a00127fd28b9151f248a2a2d30", "nailgun_sha": "ab7f7dfddadfe0e08a39693c6d33aa0250f20142", "production": "docker", "api": "1.0", "fuelmain_sha": "68c62519bc788fd8ff27e4576a6cdf7e7fac14c0", "astute_sha": "a3432e6e31ffd6f1c56386b2eb54afeacb74750b", "release": "5.0", "fuellib_sha": "3d92142a5643af82596f0450e39282550a45e5db"}

Changed in fuel:
status: Fix Committed → Fix Released