Parallel periodic instance power state reporting from compute nodes has high impact on conductors and message broker

Bug #1388077 reported by James Page
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
OpenStack Compute (nova)  Expired       Undecided   Unassigned
nova (Ubuntu)             Fix Released  Undecided   Unassigned
  Utopic                  Fix Released  Undecided   Unassigned
  Vivid                   Fix Released  Undecided   Unassigned

Bug Description

Environment: OpenStack Juno release/Ubuntu 14.04/480 compute nodes/8 cloud controllers/40,000 instances +

The change made in:

  https://github.com/openstack/nova/commit/baabab45e0ae0e9e35872cae77eb04bdb5ee0545

switches power state reporting from a serial process that handles each instance on a hypervisor in turn to a parallel thread for every instance. For clouds running high instance counts, this has a significant impact on the conductor processes, which must deal with N instance refresh calls in parallel, where N is the number of instances running on the cloud.

It might be better to throttle this to a configurable level of parallelism so that periodic RPC load can be managed effectively in a larger cloud, or to continue to do this process in series but outside of the main thread.
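As a rough illustration of the throttling idea, the sketch below bounds how many per-instance syncs run at once by using a fixed-size worker pool. It is not the nova implementation: `sync_power_states`, `sync_one` and `max_parallel` are hypothetical names, and standard-library threads stand in for whatever concurrency primitive the compute manager actually uses.

```python
from concurrent.futures import ThreadPoolExecutor


def sync_power_states(instance_ids, sync_one, max_parallel=8):
    """Call sync_one(instance_id) for every instance, at most
    max_parallel at a time, and return {instance_id: result}.

    sync_one is a hypothetical callable standing in for the
    per-instance refresh that triggers RPC traffic to the conductor.
    """
    # The pool size is the throttle: no more than max_parallel
    # refresh calls are ever in flight from this node.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        results = list(pool.map(sync_one, instance_ids))
    return dict(zip(instance_ids, results))
```

With `max_parallel=1` this degenerates to the second option mentioned above: the instances are processed in series, but on a worker thread rather than the caller's main thread.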

The net result of this activity is that it places increased demands on the message broker, which has to deal with more parallel connections, and on the conductors as they try to consume all of the RPC requests. If the message broker hits its memory high water mark, it will stop publishers from publishing any more messages until memory usage drops below the high water mark again. This might not be achievable if all conductor processes are tied up with existing RPC calls trying to send replies, resulting in a message broker lockup and the collapse of all RPC in the cloud.

James Page (james-page)
description: updated
description: updated
summary: Parallel periodic power state reporting from compute nodes has high
- impact on conductors
+ impact on conductors and message broker
description: updated
James Page (james-page)
summary: - Parallel periodic power state reporting from compute nodes has high
- impact on conductors and message broker
+ Parallel periodic instance power state reporting from compute nodes has
+ high impact on conductors and message broker
description: updated
James Page (james-page)
tags: added: juno scale-testing
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/132263

Changed in nova:
assignee: nobody → James Page (james-page)
status: New → In Progress
Chuck Short (zulcss)
Changed in nova (Ubuntu):
status: New → In Progress
James Page (james-page)
Changed in nova (Ubuntu Utopic):
status: New → In Progress
status: In Progress → Fix Committed
James Page (james-page)
Changed in nova (Ubuntu Vivid):
status: In Progress → Fix Released
Changed in nova (Ubuntu Utopic):
status: Fix Committed → Fix Released
Changed in nova:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Joe Gordon (<email address hidden>) on branch: master
Review: https://review.openstack.org/132263
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

James Page (james-page)
Changed in nova:
assignee: James Page (james-page) → nobody
Changed in nova:
status: In Progress → Confirmed
Markus Zoeller (markus_z) (mzoeller) wrote : Cleanup EOL bug report

This is an automated cleanup. This bug report has been closed because it
is older than 18 months and there is no open code change to fix this.
After this time it is unlikely that the circumstances which led to
the observed issue can be reproduced.

If you can reproduce the bug, please:
* reopen the bug report (set to status "New")
* AND add the detailed steps to reproduce the issue (if applicable)
* AND leave a comment "CONFIRMED FOR: <RELEASE_NAME>"
  Only still supported release names are valid (LIBERTY, MITAKA, OCATA, NEWTON).
  Valid example: CONFIRMED FOR: LIBERTY

Changed in nova:
importance: Medium → Undecided
status: Confirmed → Expired