RPM catalogs weren't retrieved from master node under high load

Bug #1384510 reported by Aleksandr Shaposhnikov
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Committed
Importance: High
Assigned to: Matthew Mosesohn

Bug Description

Problem: during a 100-node deployment, not all nodes were able to retrieve the RPM catalog from the MOS/Fuel master node.
A screenshot from one of the nodes is attached.
A diagnostic snapshot is also attached.

Aleksandr Shaposhnikov (alashai8) wrote :
Changed in fuel:
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Fuel Library Team (fuel-library)
Changed in fuel:
milestone: none → 6.0
Matthew Mosesohn (raytrac3r) wrote :

We could see if it's possible to modify yum config in %pre section of kickstart to set timeout=240. The default is 30. I know that if the server doesn't send any data before timeout, the connection is considered bad and results in an error.
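
A minimal sketch of what such a %pre step might do, assuming the yum configuration is an INI-style file with a `[main]` section. The function name `set_yum_timeout` and the file path passed to it are hypothetical; this is not necessarily how the change was implemented.

```shell
#!/bin/sh
# Hypothetical helper: set "timeout=<seconds>" in a yum-style config file.
# Usage: set_yum_timeout <conf_file> <seconds>
set_yum_timeout() {
    conf="$1"
    seconds="$2"
    tmp="$(mktemp)" || return 1
    # Drop any existing timeout line, then add the new value right after
    # the [main] header. If the file has no [main] section, nothing is added.
    awk -v t="$seconds" '
        /^timeout[ \t]*=/ { next }
        { print }
        /^\[main\]/ { print "timeout=" t }
    ' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    rm -f "$tmp"
}
```

Calling `set_yum_timeout /path/to/yum.conf 240` would raise the limit from the default of 30 seconds to 240; the path here is a placeholder.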

Sergii Golovatiuk (sgolovatiuk) wrote :

One option is to tune nginx on the master node. At the moment it uses fairly standard options, which are not tuned for scale or parallel downloads.

tags: added: low-hanging-fruit
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/130523

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Matthew Mosesohn (raytrac3r)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/130523
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=5c6320638e993e37f4914185ed693c2b80be1438
Submitter: Jenkins
Branch: master

commit 5c6320638e993e37f4914185ed693c2b80be1438
Author: Matthew Mosesohn <email address hidden>
Date: Thu Oct 23 17:01:23 2014 +0400

    Add timeout for anaconda yum conf

    Raises timeout from 30s to 240s so that installs
    do not fail on yum operations during install under
    heavy load.

    Change-Id: If2a67b15beb6371979f1638d2c49ef935bfbed50
    Partial-Bug: #1384510

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/131672
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=9ea5104a432b0303f35606a3d7dc880f48f76d2f
Submitter: Jenkins
Branch: master

commit 9ea5104a432b0303f35606a3d7dc880f48f76d2f
Author: Sergii Golovatiuk <email address hidden>
Date: Wed Oct 29 09:56:02 2014 +0100

    Set nginx default settings for master node

    - increase number of workers from 2 to half available CPUs.
    - enable epoll by default
    - enable tcpnopush
    - enable tcpnodelay
    - disable server_token

    These settings allow to bootstrap many nodes in parallel allowing
    anaconda and debian installer to get all files without timeout.

    Partial-Bug: 1384510
    Partial-Bug: 1349360

    Change-Id: I44054cf2afa1fa5a70ba064f4957b964947b2b2c
    Implements: blueprint 100-nodes-support
    Signed-off-by: Sergii Golovatiuk <email address hidden>
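
As a rough illustration of the settings listed in this commit message, the sketch below prints an nginx configuration fragment using the corresponding nginx directives (`worker_processes`, `use epoll`, `tcp_nopush`, `tcp_nodelay`, `server_tokens`). The function name `nginx_tuning_fragment` and the minimum of one worker are assumptions; the actual fuel-library template may look different.

```shell
#!/bin/sh
# Hypothetical helper: print an nginx config fragment with the settings
# named in the commit message.
# Usage: nginx_tuning_fragment <cpu_count>
nginx_tuning_fragment() {
    cpus="$1"
    # Half the available CPUs, but never fewer than one worker (assumption).
    workers=$(( cpus / 2 ))
    [ "$workers" -ge 1 ] || workers=1
    cat <<EOF
worker_processes ${workers};

events {
    use epoll;
}

http {
    tcp_nopush on;
    tcp_nodelay on;
    server_tokens off;
}
EOF
}
```

On a Linux host this could be called as `nginx_tuning_fragment "$(getconf _NPROCESSORS_ONLN)"`. Note that nginx applies `tcp_nopush` only when `sendfile` is enabled, which the commit message does not mention.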

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/5.1)

Fix proposed to branch: stable/5.1
Review: https://review.openstack.org/134223

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/5.1)

Reviewed: https://review.openstack.org/134223
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=9ee34175ffd0070fd5041ef3bc052d452f009654
Submitter: Jenkins
Branch: stable/5.1

commit 9ee34175ffd0070fd5041ef3bc052d452f009654
Author: Sergii Golovatiuk <email address hidden>
Date: Wed Oct 29 09:56:02 2014 +0100

    Set nginx default settings for master node

    - increase number of workers from 2 to half available CPUs.
    - enable epoll by default
    - enable tcpnopush
    - enable tcpnodelay
    - disable server_token

    These settings allow to bootstrap many nodes in parallel allowing
    anaconda and debian installer to get all files without timeout.

    Partial-Bug: 1384510
    Partial-Bug: 1349360

    Change-Id: I44054cf2afa1fa5a70ba064f4957b964947b2b2c
    Implements: blueprint 100-nodes-support
    Signed-off-by: Sergii Golovatiuk <email address hidden>
