Periodic tasks are performed on all workers
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Magnum | Fix Released | Undecided | Spyros Trigazis | | |
Bug Description
Magnum conductor performs periodic tasks to synchronise magnum state with that of other services, such as heat. These tasks run at the interval set by [DEFAULT] periodic_
The magnum conductor service forks a number of worker processes to handle RPC requests. The number of workers is determined by [conductor] workers, or the number of CPU cores if unset.
Based on the magnum conductor logs, each of the workers runs periodic tasks, and they all seem to be temporally aligned. For example, if a cluster fails during creation due to the famous 'no valid hosts found' failure, there will generally be many of these messages in the logs:
2017-07-04 19:41:04.454 53 ERROR magnum.
Clearly this is not ideal, as there are many workers performing the same task at the same time, all with the same outcome, which includes spamming the logs with multiple identical error messages.
The periodic task execution framework should be updated to distribute tasks among conductor services on multiple hosts, and multiple worker processes within each conductor service.
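One way to picture the distribution proposed above is a static hash partition: each (host, worker) pair owns one slot, and a cluster is synchronised only by the worker whose slot its UUID hashes to. This is a minimal sketch, not magnum code. The function names, the idea of numbering hosts and workers from zero, and the assumption that every worker knows the total number of hosts and workers are all hypothetical.

```python
import hashlib


def owner_slot(cluster_uuid: str, total_slots: int) -> int:
    """Map a cluster UUID to one slot in [0, total_slots) with a stable hash."""
    # hashlib is used instead of hash() so every process computes the same value.
    digest = hashlib.sha256(cluster_uuid.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % total_slots


def clusters_for_worker(cluster_uuids, host_index, worker_index,
                        workers_per_host, num_hosts):
    """Return only the clusters this (host, worker) pair should synchronise.

    All parameters are hypothetical; they stand in for whatever membership
    information a real conductor would have.
    """
    total_slots = num_hosts * workers_per_host
    my_slot = host_index * workers_per_host + worker_index
    return [u for u in cluster_uuids if owner_slot(u, total_slots) == my_slot]
```

With this scheme each cluster is handled by exactly one worker per interval, so a failed cluster would be logged once rather than once per worker. The sketch assumes a fixed, known set of workers; a real implementation would also need to cope with workers or hosts joining and leaving.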
Changed in magnum: | |
assignee: | nobody → Spyros Trigazis (strigazi) |
status: | New → In Progress |
This does not occur to the same degree on ocata, as multi-worker support[1] was added during the pike development cycle. In ocata, conductor services running on multiple hosts will still all run the same periodic update jobs, which seems inefficient, but not nearly as bad as having multiple workers running periodic updates.
A simple solution that would at least revert to the ocata behaviour would be to only execute periodic updates in the parent process, and not the workers.
[1] https://blueprints.launchpad.net/magnum/+spec/magnum-multiple-process-workers
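The parent-only approach suggested in the comment above could look roughly like the following. This is a hedged sketch rather than the fix that was actually merged: ParentOnlyGate and run_periodic_tasks are hypothetical names, and it assumes the gate object is created in the parent before the workers are forked, so that each child inherits the parent's PID for comparison.

```python
import os


class ParentOnlyGate:
    """Records the PID of the process that created it.

    If created before the workers are forked, that PID is the parent's.
    """

    def __init__(self):
        self.parent_pid = os.getpid()

    def should_run(self, pid=None):
        """Return True only in the process that created the gate."""
        # The pid argument exists only so the check can be exercised
        # without actually forking.
        current = os.getpid() if pid is None else pid
        return current == self.parent_pid


def run_periodic_tasks(gate, tasks):
    """Run every task in the parent; do nothing in a forked worker."""
    if not gate.should_run():
        return []
    return [task() for task in tasks]
```

This limits each conductor service to a single run of the periodic tasks, matching the ocata behaviour described above, but it does nothing to coordinate conductors running on different hosts.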