Bug #1274614 “nailgund bogs down on large number of nodes” (Fuel for OpenStack)

Dmitry Borodaenko (angdraug) on 2014-01-30

Changed in fuel:
milestone:	none → 4.1

OpenStack Infra (hudson-openstack) wrote on 2014-01-31: Related fix proposed to fuel-web (master)

#1

Related fix proposed to branch: master
Review: https://review.openstack.org/70264

OpenStack Infra (hudson-openstack) wrote on 2014-01-31: Fix proposed to fuel-library (master)

#2

Fix proposed to branch: master
Review: https://review.openstack.org/70270

Evgeniy L (rustyrobot) wrote on 2014-01-31: Re: nailgund needs to be running multiple threads

#3

I don't think it's OK to run several instances of nailgun; it may cause several problems:
1. Each nailgun instance runs a keep_alive thread, and we don't need 8 instances of this thread.
2. Each nailgun instance runs an RPC thread that listens on RabbitMQ and receives messages from the orchestrator.

So running several nailgun instances is OK only as a quick hack.

How to solve it:
1. Refactor the nodes collection handler, e.g. make a separate handler for the agent and don't update the DB state if the node hasn't changed.
2. To reduce the overhead described above, we can use this patch from services (but we need to test it and fix the Puppet manifests): https://review.openstack.org/#/c/54930/

Our new engineer has already started working on the first item; you can track the status here: https://blueprints.launchpad.net/fuel/+spec/nailgun-agent-handler
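
A minimal sketch of item 1 ("don't update the DB state if the node hasn't changed"), not taken from the nailgun code: the sqlite3 table, `SCHEMA`, `AGENT_FIELDS` and `agent_update` are hypothetical stand-ins for the real models and handler. It compares the agent's report with the stored row, writes only the columns that actually changed, and answers 404 for an unknown node.

```python
import json
import sqlite3

# Hypothetical schema standing in for the real node model.
SCHEMA = "CREATE TABLE nodes (id INTEGER PRIMARY KEY, mac TEXT, status TEXT, meta TEXT)"
AGENT_FIELDS = ("mac", "status", "meta")  # columns an agent may report


def agent_update(conn, node_id, payload):
    """Apply an agent report, writing only changed columns.

    Returns (HTTP status, number of columns written).
    """
    row = conn.execute(
        "SELECT mac, status, meta FROM nodes WHERE id = ?", (node_id,)
    ).fetchone()
    if row is None:
        return 404, 0  # unknown node: the agent should register instead
    current = dict(zip(AGENT_FIELDS, row))
    incoming = {k: payload[k] for k in AGENT_FIELDS if k in payload}
    if "meta" in incoming:
        # Canonical JSON, so equal dicts compare equal as strings.
        incoming["meta"] = json.dumps(incoming["meta"], sort_keys=True)
    changed = {k: v for k, v in incoming.items() if current[k] != v}
    if changed:
        # Column names come from the AGENT_FIELDS whitelist, never from input.
        assignments = ", ".join(f"{k} = ?" for k in changed)
        conn.execute(
            f"UPDATE nodes SET {assignments} WHERE id = ?",
            (*changed.values(), node_id),
        )
        conn.commit()
    return 200, len(changed)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.execute("INSERT INTO nodes VALUES (1, 'aa:bb', 'discover', '{}')")
    report = {"mac": "aa:bb", "status": "discover", "meta": {"cpu": 4}}
    print(agent_update(conn, 1, report))  # (200, 1): only meta changed
    print(agent_update(conn, 1, report))  # (200, 0): nothing to write
    print(agent_update(conn, 2, report))  # (404, 0): unknown node
```

The repeated report from an idle node, which is the common case with many nodes, costs one SELECT and no write.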

Ryan Moe (rmoe) wrote on 2014-01-31:

#4

I agree completely. This is just a short-term workaround that has been successfully used on two large deployments. At least now the workaround is documented somewhere.

Roman Alekseenkov (ralekseenkov) on 2014-02-06

tags:	added: customer-found
Changed in fuel:
importance:	Undecided → Medium
summary:	- nailgund needs to be running multiple threads + nailgund bogs down on large number of nodes

Mike Scherbakov (mihgen) wrote on 2014-02-18:

#5

Assigning this to Dmitry, as he is doing the "right" implementation for this issue now. I hope we can get it into 4.1. Dmitry, please make sure your patch contains "Closes-Bug: #1274614" in the git commit message.

Changed in fuel:
assignee:	Ryan Moe (rmoe) → Dmitry Sokolov (demon-mhm)

Mike Scherbakov (mihgen) on 2014-02-21

Changed in fuel:
milestone:	4.1 → 5.0

Andrew Woodward (xarses) wrote on 2014-02-21:

#6

The patch (even if fixed) is too large to merge this late in the cycle. Since separating the handlers is what caused https://review.openstack.org/#/c/70270/ to be -1'd, I've asked Ryan to look into whether we can add this back in, so that we have a usable workaround for the load in 5.0.

tags:	removed: multi-l3

OpenStack Infra (hudson-openstack) wrote on 2014-02-27: Fix proposed to fuel-web (master)

#7

Fix proposed to branch: master
Review: https://review.openstack.org/76831

Dmitry Sokolov (demon-mhm) wrote on 2014-03-06:

#8

I've performed some stress tests for my upcoming patch https://review.openstack.org/#/c/76831/, which introduces a dedicated agent handler for Nailgun and some improvements to the agent code. Sadly, the new handler didn't show any speedup. On the contrary, even with caching, speed dropped by about 10% compared with the old handler. This patch brings only one improvement: agents try to update the node state first, and register only if the update attempt returns 404. This will cut the number of requests to the master node roughly in half. Also, a separate handler for agents will allow us to optimize agent request processing without fear of harming other handlers.
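
The update-first logic described above can be sketched as follows. The real agent is a separate script on each node; the endpoint paths, `report_node` and the `FakeNailgun` stub here are hypothetical and only illustrate the request count: a known node costs one PUT per cycle, and only a new node pays for the extra POST.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: (method, path, body) -> HTTP status code.
Transport = Callable[[str, str, dict], int]

UPDATE_PATH = "/api/nodes/agent"  # hypothetical endpoint names
REGISTER_PATH = "/api/nodes"


def report_node(send: Transport, node: dict) -> List[Tuple[str, int]]:
    """One agent cycle: try to update, register only if the node is unknown."""
    calls = []
    status = send("PUT", UPDATE_PATH, node)
    calls.append(("PUT", status))
    if status == 404:
        status = send("POST", REGISTER_PATH, node)
        calls.append(("POST", status))
    return calls


class FakeNailgun:
    """In-memory stand-in for the master node, keyed by MAC address."""

    def __init__(self) -> None:
        self.nodes: Dict[str, dict] = {}
        self.requests = 0

    def __call__(self, method: str, path: str, body: dict) -> int:
        self.requests += 1
        known = body["mac"] in self.nodes
        if method == "PUT" and path == UPDATE_PATH:
            if not known:
                return 404
            self.nodes[body["mac"]] = body
            return 200
        if method == "POST" and path == REGISTER_PATH:
            if known:
                return 409
            self.nodes[body["mac"]] = body
            return 201
        return 405


if __name__ == "__main__":
    server = FakeNailgun()
    node = {"mac": "52:54:00:12:34:56"}
    for _ in range(10):
        report_node(server, node)
    print(server.requests)  # 11: PUT + POST once, then one PUT per cycle
```

For comparison, a register-first loop (POST, then PUT after a 409 for a known node) would send two requests per cycle for every already registered node; assuming that was the previous behaviour, this is where the roughly 2x reduction comes from.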

Changed in fuel:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2014-03-19: Fix merged to fuel-web (master)

#9

Reviewed: https://review.openstack.org/76831
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=59ed8c081d5065046ee638ebfc74e4dab2ff0677
Submitter: Jenkins
Branch: master

commit 59ed8c081d5065046ee638ebfc74e4dab2ff0677
Author: demon.mhm <email address hidden>
Date: Thu Feb 27 15:38:21 2014 +0400

Reduced database overhead from agents

     - added dedicated handler for node agents update only requests
     - caching data from agents to avoid db update with same data
     - nailgun responses with appropriate http statuses
     - changed agent update logic. Now it tries to update first and respects
       nailgun response statuses

    Change-Id: I2658cf7561cd8c9116acced2443d072d471f3bdb
    Implements: blueprint nailgun-agent-handler
    Closes-Bug: #1274614
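
The "caching data from agents to avoid db update with same data" item in the commit above can be illustrated with a per-node digest cache. This is a minimal sketch under stated assumptions, not the merged code: `AgentReportCache`, `handle_agent_report` and the `write_to_db` callback are hypothetical names, and `write_to_db` is assumed to return False for an unknown node.

```python
import hashlib
import json
from typing import Callable, Dict


class AgentReportCache:
    """Per-process memory of the last report digest for each node (hypothetical)."""

    def __init__(self) -> None:
        self._digests: Dict[int, str] = {}

    @staticmethod
    def digest(payload: dict) -> str:
        # Canonical JSON, so key order in the agent's report does not matter.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

    def is_unchanged(self, node_id: int, payload: dict) -> bool:
        return self._digests.get(node_id) == self.digest(payload)

    def remember(self, node_id: int, payload: dict) -> None:
        self._digests[node_id] = self.digest(payload)


def handle_agent_report(
    cache: AgentReportCache,
    write_to_db: Callable[[int, dict], bool],
    node_id: int,
    payload: dict,
) -> int:
    """Return an HTTP status; write_to_db returns False for unknown nodes."""
    if cache.is_unchanged(node_id, payload):
        return 200  # identical to the last stored report: no DB round trip
    if not write_to_db(node_id, payload):
        return 404  # unknown node: the agent should register it
    cache.remember(node_id, payload)
    return 200
```

Two caveats of this design: the cache lives in one process, so several nailgun instances would each keep their own copy, and it goes stale if another handler changes the node in the database, so a real implementation would need some invalidation.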

Dmitry Pyzhov (dpyzhov) on 2014-03-27

tags:	added: backports-4.1.1

Dmitry Pyzhov (dpyzhov) on 2014-03-27

Changed in fuel:
milestone:	5.0 → 4.1.1
status:	Fix Committed → Triaged
assignee:	Dmitry Sokolov (demon-mhm) → Fuel Python Team (fuel-python)

Ihor Kalnytskyi (ikalnytskyi) wrote on 2014-04-02:

#10

Has anyone profiled our WSGI app? Do we know where its bottleneck is?
The Werkzeug project has some tools for profiling it. If we don't know the slowest part of the node handler, I can run some tests.
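
Werkzeug's ready-made tool for this is its `ProfilerMiddleware`. As a dependency-free illustration of the same idea, here is a minimal sketch built only on the standard library's `cProfile`; the `ProfilingMiddleware` class is hypothetical and not part of nailgun or Werkzeug. It wraps any WSGI app and keeps a per-request report sorted by cumulative time, which is what you would read to find the slowest part of the node handler.

```python
import cProfile
import io
import pstats


class ProfilingMiddleware:
    """Wrap a WSGI app and keep a cProfile text report for every request."""

    def __init__(self, app, limit=20):
        self.app = app
        self.limit = limit  # number of rows kept from each report
        self.reports = []  # list of (PATH_INFO, report text)

    def __call__(self, environ, start_response):
        def run():
            result = self.app(environ, start_response)
            try:
                # Consume the body under the profiler so lazily generated
                # responses are measured too.
                return list(result)
            finally:
                if hasattr(result, "close"):
                    result.close()  # required by the WSGI spec (PEP 3333)

        profiler = cProfile.Profile()
        body = profiler.runcall(run)
        out = io.StringIO()
        stats = pstats.Stats(profiler, stream=out)
        stats.sort_stats("cumulative").print_stats(self.limit)
        self.reports.append((environ.get("PATH_INFO", ""), out.getvalue()))
        return body


if __name__ == "__main__":
    def slow_app(environ, start_response):
        total = sum(i * i for i in range(100_000))
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [str(total).encode("ascii")]

    app = ProfilingMiddleware(slow_app)
    app({"PATH_INFO": "/api/nodes", "REQUEST_METHOD": "GET"}, lambda s, h: None)
    print(app.reports[0][1])
```

Profiling every request adds overhead of its own, so a wrapper like this belongs on a test master node under synthetic agent load, not in production.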

Dmitry Pyzhov (dpyzhov) wrote on 2014-04-07:

#11

Postponed till 4.1.2

Mike Scherbakov (mihgen) on 2014-05-08

tags:	added: release-notes

Dmitry Pyzhov (dpyzhov) on 2014-05-15

Changed in fuel:
assignee:	Fuel Python Team (fuel-python) → Nikolay Markov (nmarkov)

Meg McRoberts (dreidellhasa) wrote on 2014-05-17:

#12

Marked as Fixed Issue in 5.0 Release Notes

OpenStack Infra (hudson-openstack) wrote on 2014-05-19: Fix proposed to fuel-web (stable/4.1)

#13

Fix proposed to branch: stable/4.1
Review: https://review.openstack.org/94178

OpenStack Infra (hudson-openstack) wrote on 2014-05-20: Fix merged to fuel-web (stable/4.1)

#14

Reviewed: https://review.openstack.org/94178
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=318af37b9d9b1b14fc2e938029bb2cdc1421c4ef
Submitter: Jenkins
Branch: stable/4.1

commit 318af37b9d9b1b14fc2e938029bb2cdc1421c4ef
Author: demon.mhm <email address hidden>
Date: Thu Feb 27 15:38:21 2014 +0400

Reduced database overhead from agents

     - added dedicated handler for node agents update only requests
     - caching data from agents to avoid db update with same data
     - nailgun responses with appropriate http statuses
     - changed agent update logic. Now it tries to update first and respects
       nailgun response statuses

Implements: blueprint nailgun-agent-handler
Closes-Bug: #1274614

Conflicts:
nailgun/nailgun/test/unit/test_node_nic_handler.py

Change-Id: I2658cf7561cd8c9116acced2443d072d471f3bdb

Dmitry Pyzhov (dpyzhov) on 2014-05-21

Changed in fuel:
status:	Triaged → Fix Committed
