Several concurrent scheduling requests for CPU pinning may fail due to racy host_state handling
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | Nikola Đipanov |
Kilo | Fix Released | Medium | Nikola Đipanov |
Bug Description
The issue happens when multiple scheduling attempts that request CPU pinning are done in parallel.
2015-03-25 14:18:00.221 34127 TRACE oslo.messaging.
What is likely happening is:
* nova-scheduler is handling several RPC calls to select_destinations at the same time, in multiple greenthreads
* greenthread 1 runs the NUMATopologyFilter and selects a CPU on a particular compute node, updating host_state.
* greenthread 1 then blocks for some reason
* greenthread 2 runs the NUMATopologyFilter and selects the same CPU on the same compute node, updating host_state.
* greenthread 2 then blocks for some reason
* greenthread 1 gets scheduled and calls consume_
* greenthread 1 completes the scheduling operation
* greenthread 2 gets scheduled and calls consume_
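The interleaving described in the list above can be sketched in a few lines of plain Python. This is only an illustrative model, not nova code: `HostState`, `select_cpu`, `consume`, `schedule` and `run_interleaved` are hypothetical names, generators stand in for greenthreads, and the filter step is simplified to "pick a CPU without reserving it".

```python
class HostState:
    """Hypothetical stand-in for the scheduler's per-host view (not nova's class)."""

    def __init__(self, free_cpus):
        self.free_cpus = set(free_cpus)

    def select_cpu(self):
        # Filter step: pick the lowest free CPU without reserving it.
        return min(self.free_cpus)

    def consume(self, cpu):
        # Consume step: fails if another request already took the CPU.
        if cpu not in self.free_cpus:
            raise RuntimeError("CPU %d is already pinned" % cpu)
        self.free_cpus.remove(cpu)


def schedule(host, results, name):
    """One scheduling request. The `yield` models the point where a
    greenthread blocks and another one gets to run."""
    cpu = host.select_cpu()
    yield  # blocked between filtering and consuming
    try:
        host.consume(cpu)
        results[name] = "pinned to CPU %d" % cpu
    except RuntimeError as exc:
        results[name] = "failed: %s" % exc


def run_interleaved(host):
    results = {}
    requests = [
        schedule(host, results, "request-1"),
        schedule(host, results, "request-2"),
    ]
    # Round-robin: both requests finish their filter step before either consumes.
    for _ in range(2):
        for req in requests:
            next(req, None)
    return results


results = run_interleaved(HostState(free_cpus=[0, 1, 2, 3]))
for name in sorted(results):
    print(name, "->", results[name])
```

In this model both requests pick the same CPU because neither has consumed it yet, so the second consume fails even though other CPUs are free. Making the select and consume steps atomic per host, or re-checking the selection at consume time, would avoid the failure in this sketch.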
Changed in nova:
milestone: none → kilo-rc1
tags: added: kilo-rc-potential
tags: added: kilo-backport-potential
tags: removed: kilo-backport-potential kilo-rc-potential
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Changed in nova:
milestone: liberty-1 → 12.0.0
Can we get a reproduction test for it?