cpu_shared_set and cpu_dedicated_set values are wrongly set
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
StarlingX | Fix Released | High | Thiago Paiva Brito |
Bug Description
Brief Description
-----------------
After installing OpenStack, I launched a VM as a test. The VM was created, but it could not be pinged, and afterwards the host could not be accessed without a reboot. Alarms are raised because CPU usage reaches 98%, and "top -c" shows that the kvm-qemu processes are being scheduled on the platform cores.
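As an alternative to eyeballing "top -c", the CPU affinity of the QEMU processes can be compared with the platform cores. The following is a minimal Python sketch, not a StarlingX tool; the function names are hypothetical. It assumes the platform cores are 0-3 (inferred from the "4-27" VM CPU sets in the nova.conf captured in this report) and only reads each process's main affinity from /proc/<pid>/status, not the per-vCPU threads under /proc/<pid>/task.

```python
import os

# ASSUMPTION: cores 0-3 are the platform cores on this AIO-SX host,
# inferred from cpu_shared_set/cpu_dedicated_set = "4-27". Adjust as needed.
PLATFORM_CPUS = "0-3"


def parse_cpu_list(spec: str) -> set[int]:
    """Parse a Linux CPU list such as "0-3,8,10-11" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def allowed_cpus(pid: str) -> set[int]:
    """Read a process's CPU affinity from the Cpus_allowed_list line."""
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("Cpus_allowed_list:"):
                return parse_cpu_list(line.split(":", 1)[1])
    return set()


def qemu_processes_on_platform_cpus(platform: set[int]) -> list[tuple[str, set[int]]]:
    """Return (pid, overlapping CPUs) for qemu processes allowed on platform CPUs."""
    hits: list[tuple[str, set[int]]] = []
    if not os.path.isdir("/proc"):
        return hits  # not a Linux host
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as comm:
                name = comm.read().strip()
            if "qemu" not in name:
                continue
            overlap = allowed_cpus(pid) & platform
        except OSError:
            continue  # process exited or is not readable
        if overlap:
            hits.append((pid, overlap))
    return hits


if __name__ == "__main__":
    for pid, cpus in qemu_processes_on_platform_cpus(parse_cpu_list(PLATFORM_CPUS)):
        print(f"pid {pid} may run on platform CPUs {sorted(cpus)}")
```

Run on the host itself: any line printed means a QEMU process is allowed to run on a core the sketch treats as a platform core.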
Severity
--------
Critical: it is not possible to launch VMs.
Steps to Reproduce
------------------
Install stx-openstack
Launch VMs
Expected Behavior
-----------------
The VM is created and is available/reachable.
Actual Behavior
---------------
After a VM is created, it cannot be pinged, and the controller stops responding because CPU usage reaches 98%.
Reproducibility
---------------
Reproducible, although it might take a few attempts, since the VM can also be scheduled on any of the other (non-platform) cores.
System Configuration
-------
AIO-SX
Branch/Pull Time/Commit
-------
Last Pass
---------
Unknown
Timestamp/Logs
--------------
OpenStack client version:
[sysadmin@
openstack 4.0.0
Alarm ID | Reason Text | Entity ID | Severity | Time Stamp
---|---|---|---|---
400.002 | Service group web-services has no active members available; expected 1 active member | service_... | | ...105148
800.010 | Potential data loss. No available OSDs in storage replication group group-0: OSDs are down | cluster=...cc389fdc39e6. ... controller-0 | | ...306562
800.001 | Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s' for more details. | cluster=...cc389fdc39e6 | | ...02:25:32.118213
100.101 | Platform CPU threshold exceeded; threshold 95.00%, actual 99.96% | host=controller-0 | critical | 2021-05-17T01:42:32.217693
controller-0:~$ grep "platform cpu usage" /var/log/daemon.log |tail -1
2021-05-
controller-0:~$ lscpu | grep -e 'CPU(s)'
CPU(s): 28
On-line CPU(s) list: 0-27
NUMA node0 CPU(s): 0-13
NUMA node1 CPU(s): 14-27
controller-0:~$
* nova.conf in the nova-compute-0 container
[root@controller-0 /]# head -40 /etc/nova/nova.conf
[DEFAULT]
allow_resize_
block_device_
block_device_
compute_driver = libvirt.
compute_monitors = cpu.virt_driver
cpu_allocation_
cpu_dedicated_set = "4-27"
cpu_shared_set = "4-27"
default_
default_
disk_allocation
enable_new_services = false
firewall_driver = nova.virt.
instance_
instance_
linuxnet_
log_config_append = /etc/nova/
long_rpc_timeout = 400
map_new_hosts = false
metadata_host = ::
metadata_listen = ::
metadata_port = 80
metadata_workers = 1
mkisofs_cmd = /usr/bin/
my_ip = 192.168.206.2
network_
notify_
osapi_compute_
osapi_compute_
osapi_compute_
ram_allocation_
remove_
reserved_
reserved_huge_pages = node:0,
reserved_huge_pages = node:0,
reserved_huge_pages = node:1,
reserved_huge_pages = node:1,
resume_
running_
Test Activity
-------------
Feature Testing
Workaround
----------
No workaround
Changed in starlingx:
status: New → In Progress
Changed in starlingx:
importance: Undecided → High
tags: added: stx.5.0 stx.distro.openstack
tags: added: in-r-stx50; removed: stx.cherrypickneeded
I figured out that the change introduced cpu_shared_set and cpu_dedicated_set in the wrong section of nova.conf. They should be in the [compute] section, not in [DEFAULT]: https://github.com/openstack/nova/blob/stable/ussuri/nova/conf/compute.py#L317
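One way to spot this misplacement in a rendered nova.conf is to parse the file and look for the CPU-set options under [DEFAULT]. This is a minimal sketch using Python's standard configparser, not a nova or StarlingX tool, and the function names are hypothetical. configparser is not oslo.config: by default it merges [DEFAULT] into every section and rejects repeated keys, so the sketch turns both behaviors off.

```python
import configparser

# Options that nova reads from the [compute] section.
CPU_SET_OPTIONS = ("cpu_dedicated_set", "cpu_shared_set")


def load_nova_conf(text: str) -> configparser.ConfigParser:
    """Parse nova.conf text so that [DEFAULT] can be inspected on its own."""
    parser = configparser.ConfigParser(
        # Make [DEFAULT] an ordinary section; otherwise configparser copies its
        # values into every section and [compute] would appear to have them too.
        default_section="__no_default__",
        # nova.conf repeats keys such as reserved_huge_pages; with strict=False
        # configparser keeps the last value instead of raising an error.
        strict=False,
        # Take values literally (no %-interpolation).
        interpolation=None,
    )
    parser.read_string(text)
    return parser


def misplaced_cpu_options(parser: configparser.ConfigParser) -> list[str]:
    """Return the CPU-set options found in [DEFAULT] instead of [compute]."""
    if not parser.has_section("DEFAULT"):
        return []
    return [opt for opt in CPU_SET_OPTIONS if parser.has_option("DEFAULT", opt)]


if __name__ == "__main__":
    # Excerpt of the nova.conf captured in this report.
    sample = """
[DEFAULT]
cpu_dedicated_set = "4-27"
cpu_shared_set = "4-27"
"""
    print(misplaced_cpu_options(load_nova_conf(sample)))  # prints ['cpu_dedicated_set', 'cpu_shared_set']
```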
Opened a review that moves those options to the correct config section, removes `shared_pcpu_map` (a legacy option that is no longer used), and fixes the value of `cpu_dedicated_set`, which was using the wrong variable: https://review.opendev.org/c/starlingx/openstack-armada-app/+/791526
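The `cpu_dedicated_set` part of the fix means the two options should no longer receive the same value. The sketch below is a hypothetical illustration, not the code in the review: it formats two CPU sets into the range syntax seen in nova.conf and refuses to render them when they overlap, as they do in the "4-27"/"4-27" configuration captured here. Treating overlap as an error assumes nova expects the two sets to be disjoint; the function names and the example split of cores 4-27 are made up for illustration.

```python
def format_cpu_list(cpus: set[int]) -> str:
    """Format CPU ids as a range list, e.g. {4, 5, 6, 9} -> "4-6,9"."""
    ranges: list[tuple[int, int]] = []
    for cpu in sorted(cpus):
        if ranges and cpu == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], cpu)  # extend the current run
        else:
            ranges.append((cpu, cpu))  # start a new run
    return ",".join(str(a) if a == b else f"{a}-{b}" for a, b in ranges)


def render_compute_section(shared: set[int], dedicated: set[int]) -> str:
    """Render a [compute] block, refusing CPU sets that overlap."""
    overlap = shared & dedicated
    if overlap:
        # The reported bug has this shape: both options were set to "4-27".
        raise ValueError(
            f"cpu_shared_set and cpu_dedicated_set overlap on {format_cpu_list(overlap)}"
        )
    lines = ["[compute]"]
    if shared:
        lines.append(f'cpu_shared_set = "{format_cpu_list(shared)}"')
    if dedicated:
        lines.append(f'cpu_dedicated_set = "{format_cpu_list(dedicated)}"')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical split of the non-platform cores 4-27 from this report.
    print(render_compute_section(set(range(4, 16)), set(range(16, 28))))
```

With that hypothetical split, the script prints a [compute] section containing cpu_shared_set = "4-15" and cpu_dedicated_set = "16-27"; passing cores 4-27 for both sets raises ValueError instead.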
Already tested with a custom build: the VMs are now scheduled on the correct cores.
This will probably need to be cherry-picked to stx.5.0.