50 nodes bootstrap slow (tftp server problems)

Bug #1330938 reported by Timur Nurlygayanov
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Committed  Medium      Vladimir Sharshov
5.0.x               Fix Committed  Medium      Vladimir Sharshov

Bug Description

Steps To Reproduce:
1. Create an environment with 50 servers.
2. Deploy OpenStack.
3. Delete the environment.

Expected Result:
All 50 nodes return to the Fuel master node as slave nodes.

Observed Result:
We lose several servers and see only 33 or 48 of them. If we reboot the missing servers, they are added to Fuel again.
It looks like a TFTP performance problem: not all servers can bootstrap from a single Fuel master node in parallel.

Note:
This issue does not reproduce in small environments, or if we remove servers from the environment in several steps (for example, 10 servers per step).

In production environments we have more than 50 servers, so this issue will be critical for administrators who run large clouds.

How we can fix it:
1. Remove nodes one by one with a timeout (1 second per server), so that the TFTP service can keep up.
2. Improve the performance of the TFTP service.

Mike Scherbakov (mihgen)
Changed in fuel:
assignee: nobody → Vladimir Sharshov (vsharshov)
Evgeniy L (rustyrobot) wrote :

My suggestion is to remove nodes in batches of 5-10 per cycle, without sleeps.
We also need to make this parameter configurable, as was done for node deployment:
https://github.com/stackforge/fuel-astute/blob/master/lib/astute/config.rb#L74

Vladimir Sharshov (vsharshov) wrote :

If we remove nodes without a sleep, we will hit the same problem, because the remove operation takes very little time and Cobbler will be loaded just as heavily as before.

I suggest also adding a second parameter, remove_interval: the interval in seconds between remove operations, with a default of 10 seconds. The fifth group of ten would then start 40 seconds after the first, which can solve this problem.
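
The arithmetic behind that estimate can be sketched as follows. This is an illustrative Python calculation, not code from fuel-astute; the function name `group_start_offsets` and its parameter names are hypothetical, and it assumes the remove call itself takes negligible time, so the only delay between groups is the interval.

```python
import math


def group_start_offsets(total_nodes, group_size=10, remove_interval=10):
    """Return the start offset, in seconds, of each removal group.

    Assumes the remove operation itself is near-instant, so groups are
    separated only by remove_interval (hypothetical parameter names).
    """
    group_count = math.ceil(total_nodes / group_size)
    return [index * remove_interval for index in range(group_count)]


offsets = group_start_offsets(50)
print(offsets)  # [0, 10, 20, 30, 40]
```

Under these assumptions the reboots of 50 nodes are spread over 40 seconds instead of hitting the TFTP server all at once.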

Changed in fuel:
status: Confirmed → In Progress
Vladimir Sharshov (vsharshov) wrote :

https://review.openstack.org/#/c/103518/

Timur Nurlygayanov, can you help with a test?

Timur Nurlygayanov (tnurlygayanov) wrote :

We will test it when we use Fuel 5.0 with the MOX project.

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/103518
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=2b3a8592cc71c6883d82f5bc4820641fba9292a2
Submitter: Jenkins
Branch: master

commit 2b3a8592cc71c6883d82f5bc4820641fba9292a2
Author: Vladimir Sharshov <email address hidden>
Date: Mon Jun 30 13:56:42 2014 +0400

    Avoid high load for Cobbler TFTP when delete many nodes.

    Delete operation does not have limit for nodes, but it
    erase disks and reboot nodes after which they try to boot
    using network. Cobbler installed at master node provide
    ability to boot using network and have limited resources.

    To prevent high load for Cobbler this changes add two things:
    * split all nodes to groups (default - 10) which process
    in series;
    * wait some time (default - 10 sec) between such groups.

    Both of this parameters can be changed in config.

    Closes-Bug: #1330938

    Change-Id: I9c3af6e8ab3c7c610e31baa6e58ec86aae20708d
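
The approach described in this commit message can be sketched roughly as below. The actual change is in fuel-astute, which is written in Ruby; this Python version is only an illustration. The names `remove_in_groups` and `remove_group` and the parameter names are hypothetical, the defaults of 10 nodes and 10 seconds are taken from the commit message, and skipping the wait after the last group is a choice made for this sketch.

```python
import time


def remove_in_groups(nodes, remove_group, group_size=10, interval=10,
                     sleep=time.sleep):
    """Process nodes in fixed-size groups, pausing between groups.

    remove_group is a caller-supplied callable (hypothetical) that
    erases and reboots one group of nodes. Returns the group count.
    """
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    groups = [nodes[i:i + group_size]
              for i in range(0, len(nodes), group_size)]
    for index, group in enumerate(groups):
        remove_group(group)
        # Wait only between groups, not after the last one.
        if index < len(groups) - 1:
            sleep(interval)
    return len(groups)


# Usage with stand-ins: record the groups and pauses instead of
# touching real nodes or actually sleeping.
processed = []
pauses = []
group_count = remove_in_groups(list(range(50)), processed.append,
                               sleep=pauses.append)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays; a production caller would leave the default.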

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (stable/5.0)

Fix proposed to branch: stable/5.0
Review: https://review.openstack.org/105459

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (stable/5.0)

Reviewed: https://review.openstack.org/105459
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=741b4fa9b964d73d9ab8f2fbcf5bb02836c98da1
Submitter: Jenkins
Branch: stable/5.0

commit 741b4fa9b964d73d9ab8f2fbcf5bb02836c98da1
Author: Vladimir Sharshov <email address hidden>
Date: Mon Jun 30 13:56:42 2014 +0400

    Avoid high load for Cobbler TFTP when delete many nodes.

    Delete operation does not have limit for nodes, but it
    erase disks and reboot nodes after which they try to boot
    using network. Cobbler installed at master node provide
    ability to boot using network and have limited resources.

    To prevent high load for Cobbler this changes add two things:
    * split all nodes to groups (default - 10) which process
    in series;
    * wait some time (default - 10 sec) between such groups.

    Both of this parameters can be changed in config.

    Closes-Bug: #1330938
    Closes-Bug: #1339024

    Backport from 5.1

    Change-Id: I9c3af6e8ab3c7c610e31baa6e58ec86aae20708d

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/178119

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (master)

Change abandoned by Igor Shishkin (<email address hidden>) on branch: master
Review: https://review.openstack.org/178119
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Fuel DevOps Robot (<email address hidden>) on branch: master
Review: https://review.openstack.org/178119
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.
