Fuel for OpenStack

Only 100 of 200 nodes booted successfully with the Ubuntu-based bootstrap

Bug #1481721 reported by Alexei Sheplyakov on 2015-08-05

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Invalid	High	MOS Scale	Fuel for OpenStack 7.0

Bug Description

The other 100 nodes fail to PXE boot.

Presumably the link gets saturated by HTTP traffic, so DHCP requests time out due to a high collision rate.

Alexei Sheplyakov (asheplyakov) on 2015-08-05

description:	updated

OpenStack Infra (hudson-openstack) wrote on 2015-08-05: Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/209486

Changed in fuel:
status:	New → In Progress

Alexander Gordeev (a-gordeev) wrote on 2015-08-05:

Could it be related to the size of the bootstrap image?

What are their actual sizes?

Alexei Sheplyakov (asheplyakov) wrote on 2015-08-05:

> Could it be related to the size of the bootstrap image?

The Ubuntu-based bootstrap image is slightly smaller than the CentOS one:

[root@fuel /]# ls -lRh /var/www/nailgun/bootstrap/
/var/www/nailgun/bootstrap/:
total 233M
-rwxr-xr-x. 1 root root 228M Aug 4 08:22 initramfs.img
-rwxr-xr-x. 1 root root 4.7M Aug 4 08:22 linux
drwxr-xr-x. 2 root root 4.0K Aug 4 11:26 ubuntu

/var/www/nailgun/bootstrap/ubuntu:
total 229M
-rwxr-xr-x 1 root root 16M Aug 4 11:24 initramfs.img
-rwxr-xr-x 1 root root 5.6M Jul 29 12:35 linux
-rwxr-xr-x 1 root root 209M Aug 4 11:26 root.squashfs

The problem has nothing to do with the image size.
The tftp server we use, tftpd-hpa, is extremely dumb and spawns a process to handle each client,
so it's unable to saturate a 10Gb link. On the other hand, nginx uses proper IO multiplexing (and TCP)
and is much more efficient, so it can easily saturate the link. By the way, this is why astute limits
the number of nodes being provisioned concurrently [1].

[1] https://github.com/stackforge/fuel-astute/blob/master/lib/astute/config.rb#L81
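
As a rough sanity check on the numbers above, the following Python sketch sums the per-node payload for both bootstrap flavours from the listing, estimates an ideal lower bound for serving 200 nodes over a 10Gb link, and shows how a concurrency cap splits the nodes into batches. The cap of 50 is an illustrative value, not a setting taken from astute, and the estimate ignores protocol overhead and contention.

```python
# Sizes in MiB, copied from the `ls -lRh` listing above (rounded by ls).
CENTOS_MIB = {"initramfs.img": 228, "linux": 4.7}
UBUNTU_MIB = {"initramfs.img": 16, "linux": 5.6, "root.squashfs": 209}

LINK_GBIT_PER_S = 10   # link speed mentioned in the comment
NODES = 200            # node count from the bug title


def per_node_mib(files):
    """Total payload a single node downloads, in MiB."""
    return sum(files.values())


def ideal_transfer_seconds(total_mib, link_gbit_per_s):
    """Lower bound on transfer time; ignores overhead and contention."""
    total_bits = total_mib * 1024 * 1024 * 8
    return total_bits / (link_gbit_per_s * 1e9)


def batches(nodes, limit):
    """Split a node list into consecutive groups of at most `limit` nodes."""
    return [nodes[i:i + limit] for i in range(0, len(nodes), limit)]


if __name__ == "__main__":
    for name, files in (("CentOS", CENTOS_MIB), ("Ubuntu", UBUNTU_MIB)):
        size = per_node_mib(files)
        secs = ideal_transfer_seconds(size * NODES, LINK_GBIT_PER_S)
        print(f"{name}: {size:.1f} MiB per node, "
              f"at least {secs:.0f} s for {NODES} nodes")

    # HYPOTHETICAL_LIMIT is an assumption for illustration only.
    HYPOTHETICAL_LIMIT = 50
    groups = batches(list(range(NODES)), HYPOTHETICAL_LIMIT)
    print(f"{len(groups)} batches of up to {HYPOTHETICAL_LIMIT} nodes")
```

The two per-node totals differ by only a couple of MiB, which is consistent with the claim that image size is not the differentiating factor.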

Alexander Gordeev (a-gordeev) wrote on 2015-08-05:

> The tftp server we use, tftpd-hpa, is extremely dumb and spawns a process to handle each client, so it's unable to saturate a 10Gb link.

The root cause is that we use TFTP servers to distribute files that are large by TFTP standards.

Ideally, we should send only the initial PXE bootloader and configs through TFTP. Their total size is less than 100 KB per node.
The images would then be downloaded through HTTP. For example, iPXE is able to work with HTTP: http://ipxe.org/
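
To make the split concrete, here is a minimal Python sketch that renders an iPXE script fetching the kernel and initramfs over HTTP, so that only the small bootloader would need to travel over TFTP. The base URL, the kernel arguments and the function name are hypothetical; the file names match the bootstrap listing earlier in the thread, and `kernel`, `initrd` and `boot` are standard iPXE script commands.

```python
def render_ipxe_script(http_base, kernel="linux", initrd="initramfs.img",
                       kernel_args=""):
    """Return the text of an iPXE script that boots a kernel over HTTP.

    `http_base` is a hypothetical URL prefix under which the bootstrap
    files are served, e.g. by nginx.
    """
    lines = [
        "#!ipxe",
        # rstrip() avoids a trailing space when no kernel arguments are given.
        f"kernel {http_base}/{kernel} {kernel_args}".rstrip(),
        f"initrd {http_base}/{initrd}",
        "boot",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "http://fuel.example/bootstrap" is a placeholder, not a real endpoint.
    print(render_ipxe_script("http://fuel.example/bootstrap",
                             kernel_args="console=tty0"))
```

The sketch assumes the network is already configured by the time the script runs; a standalone script would typically need a `dhcp` step first.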

BTW, is it possible to monitor how many tftpd-hpa processes are spawned and figure out where the bottleneck is? It might be a lack of CPU time caused by the inefficient tftpd server implementation.
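
One possible way to watch the process count on Linux is to scan `/proc`, as in the Python sketch below. It assumes the tftpd-hpa daemon and its per-client children show up under the command name `in.tftpd`; verify the actual name on the Fuel master node before relying on it. The function name and the `proc_root` parameter are introduced here for illustration.

```python
import os


def count_processes(name, proc_root="/proc"):
    """Count running processes whose command name equals `name` (Linux only)."""
    count = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                stat = fh.read()
        except OSError:
            # The process exited between listdir() and open(); skip it.
            continue
        # The command name is the second field of stat, wrapped in parentheses.
        comm = stat[stat.find("(") + 1:stat.rfind(")")]
        if comm == name:
            count += 1
    return count


if __name__ == "__main__":
    # "in.tftpd" is an assumed process name for tftpd-hpa.
    print(count_processes("in.tftpd"))
```

Sampling this value once per second while the nodes are booting, alongside CPU usage, would show whether the number of spawned processes or CPU time is the limiting factor.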

Alexei Sheplyakov (asheplyakov) wrote on 2015-08-17:

Cannot reproduce the bug anymore; marking as Incomplete.

Changed in fuel:
status:	In Progress → Incomplete
assignee:	Alexei Sheplyakov (asheplyakov) → MOS Scale (mos-scale)

Dina Belova (dbelova) wrote on 2015-08-18:

Marking as Invalid until it is reproduced again.

Changed in fuel:
status:	Incomplete → Invalid

OpenStack Infra (hudson-openstack) wrote on 2015-08-19: Change abandoned on fuel-library (master)

Change abandoned by Alexei Sheplyakov (<email address hidden>) on branch: master
Review: https://review.openstack.org/209486
Reason: The patch does not really solve the referenced bug.
Also, the Ubuntu-based bootstrap won't be shipped with MOS 7.0.
