Comment 8 for bug 1837426

Gerry Kopec (gerry-kopec) wrote:

Implemented changes to increase the rabbitmq pod probe period from 10s to 30s, per the reviews in comments 6 & 7. These should recover about 20% CPU (out of 200%) across the two platform CPUs.
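As a rough illustration of why the longer period helps, the sketch below counts probe executions per hour at each period. It only shows the change in invocation count; the 20% figure above is an observed estimate and is not derived from this arithmetic. The function name probes_per_hour is hypothetical and exists only for this example.

```python
def probes_per_hour(period_s):
    """Number of probe executions per hour for a probe period in seconds."""
    return 3600 // period_s


before = probes_per_hour(10)  # old period: 10s
after = probes_per_hour(30)   # new period: 30s

# Fraction of probe executions eliminated by the longer period.
reduction = 1 - after / before

print(before, after, round(reduction, 2))
```

Assuming each probe execution costs roughly the same amount of CPU (an assumption, not something measured here), tripling the period removes about two thirds of the probe executions.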

Other suggested areas of investigation:
- Increase the period of cinder-volume-usage-audit and heat-engine-cleaner, which currently run every 5 minutes.
- Saw the number of Erlang beam.smp threads in the openstack rabbitmq container drop from 631 after the initial install to 151 after a subsequent application remove/apply. This corresponded with a decrease in CPU usage for those threads. Commit https://review.opendev.org/#/c/676035/ may address this, but that should be confirmed.
- Saw the CPU usage of the kubelet process slowly increase over time (from 10% to 14% of cpu0 and cpu1 over a week).
- Subsequent application-apply attempts may fail because the nova-db-sync job fails under system overload and is then unable to create tables on retries, as they already exist. The nova databases have to be dropped to recover. Startup of the compute-kit charts (libvirt, nova, nova-api-proxy, neutron, placement) could be smoothed out by not running them all in parallel.
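
The last suggestion (not running all compute-kit charts in parallel) could be sketched roughly as below. This is only an illustration of limiting concurrency, not how the application-apply code actually works: apply_in_batches, apply_chart and max_parallel are hypothetical names, and the chart order shown is simply the order listed above, not a verified dependency order.

```python
from concurrent.futures import ThreadPoolExecutor

# Chart names taken from the comment above; the ordering is an assumption.
COMPUTE_KIT = ["libvirt", "nova", "nova-api-proxy", "neutron", "placement"]


def apply_in_batches(charts, apply_chart, max_parallel=2):
    """Apply charts in list order, running at most max_parallel at a time.

    apply_chart is a caller-supplied (hypothetical) function that applies
    one chart and returns its result. Each batch must finish before the
    next one starts, which spreads the startup load over time.
    """
    results = []
    for start in range(0, len(charts), max_parallel):
        batch = charts[start:start + max_parallel]
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            # pool.map preserves input order within the batch.
            results.extend(pool.map(apply_chart, batch))
    return results


if __name__ == "__main__":
    # Stand-in for a real apply step: just report the chart name.
    print(apply_in_batches(COMPUTE_KIT, lambda chart: "applied " + chart))
```

Setting max_parallel=1 makes the startup fully serial; larger values trade a longer apply time against a lower peak load on the platform CPUs.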