Sensible cap on worker-multiplier is needed
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| Charm Helpers | Fix Released | Undecided | Nobuto Murata | |
| OpenStack API Layer | Fix Released | Undecided | Nobuto Murata | |
| OpenStack Charm Guide | Fix Released | High | Nobuto Murata | |
| OpenStack Nova Compute Charm | Fix Released | Wishlist | Nobuto Murata | |
Bug Description
worker-multiplier is a common option across multiple OpenStack charms. Most of the control-plane charms are deployed into LXD containers for higher density and better separation. However, nova-compute by its nature cannot be deployed into LXD, so no cap is applied to worker-multiplier when no value is set.
https:/
> worker-multiplier
> (float) The CPU core multiplier to use when configuring worker processes for this service, e.g. metadata-api. By default, the number of workers for each daemon is set to twice the number of CPU cores a service unit has. When deployed in a LXD container, this default value will be capped to 4 workers unless this configuration option is set.
One example: a customer ended up with 150+ workers, which ate almost all of the system's memory and triggered the OOM killer. While users can set an explicit value through the charm option, a sensible default cap would be nice to have even on bare metal.
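For illustration, here is a minimal sketch of the worker-count behaviour described above. This is a simplified model, not the charm-helpers implementation; the function name and the bare-metal cap value are hypothetical and only stand in for the cap this bug asks for:

```python
def default_workers(cores, multiplier=None, in_lxd=False, bare_metal_cap=None):
    """Illustrative model only; not the actual charm-helpers code.

    cores:          CPU cores visible to the unit
    multiplier:     worker-multiplier charm option (None = unset)
    in_lxd:         True when the unit runs inside a LXD container
    bare_metal_cap: hypothetical cap requested by this bug; none exists today
    """
    if multiplier is not None:
        return max(1, int(cores * multiplier))   # explicit option always wins
    workers = cores * 2                          # documented default: 2 x cores
    if in_lxd:
        return min(workers, 4)                   # existing cap inside LXD
    if bare_metal_cap is not None:
        return min(workers, bare_metal_cap)      # the proposed sensible cap
    return workers                               # bare metal today: uncapped

print(default_workers(cores=76))                    # 152 workers per daemon
print(default_workers(cores=76, in_lxd=True))       # 4
print(default_workers(cores=76, bare_metal_cap=8))  # 8 (hypothetical cap)
print(default_workers(cores=76, multiplier=0.25))   # 19 (explicit option)
```

As a workaround today, an operator can pin the value explicitly, e.g. `juju config nova-compute worker-multiplier=0.25` (0.25 is only an example value, not a recommendation).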
Changed in charm-nova-compute:
status: New → Triaged
importance: Undecided → Wishlist
tags: added: sts

Changed in charm-helpers:
assignee: nobody → Nobuto Murata (nobuto)
status: New → Fix Committed

Changed in charm-guide:
assignee: nobody → Nobuto Murata (nobuto)

Changed in charm-guide:
status: Triaged → In Progress

Changed in layer-openstack-api:
milestone: none → 21.04

Changed in charm-nova-compute:
status: Fix Committed → Fix Released

Changed in layer-openstack-api:
status: Fix Committed → Fix Released
We have encountered this in a deployment using reserved hugepages.
350 × 1G pages were reserved, leaving 22G for hypervisor processes.
Around 14G of that was consumed by metadata-api processes, which, in combination with the other system processes, left little free memory and resulted in memory fragmentation.
The outcome was that qemu was unable to allocate order-6 pages when launching VMs (which were to be backed by hugepages).
I expect that in setups using reserved hugepages the effects of this bug would be more pronounced and occur more often.
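To make the memory pressure concrete, a back-of-the-envelope sketch is below. The core count and per-worker footprint are assumptions chosen to roughly match the reported figures, not measurements from this deployment:

```python
# Rough arithmetic only; cores and per-worker RSS are assumed values.
hugepages_gib = 350          # 350 x 1G pages reserved, as reported
left_for_host_gib = 22       # memory left for hypervisor processes, as reported
cores = 76                   # assumed core count on the compute host
workers = cores * 2          # uncapped default of 2 x cores -> 152 workers
rss_per_worker_gib = 0.09    # ~90 MiB per metadata-api worker (assumption)

used_by_workers_gib = workers * rss_per_worker_gib   # ~13.7 GiB
print(f"{used_by_workers_gib:.1f} GiB of the {left_for_host_gib} GiB "
      f"not reserved for hugepages consumed by {workers} workers")
```

Under these assumptions the uncapped default alone accounts for roughly 14G of the 22G available to the host, which matches the fragmentation and allocation failures observed.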