Platform CPU threshold exceeded in compute-1 after lock/unlock member of quorum compute-0

Bug #1840831 reported by Wendy Mitchell
Affects: StarlingX
Status: Invalid
Importance: Medium
Assigned to: Lin Shuicheng

Bug Description

Brief Description
-----------------
Platform CPU threshold exceeded (100%) on compute-1 after unlock of the other compute node (compute-0)

Severity
--------
Major: compute-1 remains in the degraded state after the unlock

Steps to Reproduce
------------------
1. system host-lock compute-0 (member of quorum so 800.001 alarm raised)
2. system host-unlock compute-0
3. confirm 800.001 alarm clears and confirm other worker host does not get critical CPU alarm

Expected Behavior
------------------
compute-0 unlocks and the 800.001 alarm clears without affecting CPU on the other worker

Actual Behavior
----------------
Critical CPU threshold alarm raised on the other worker (compute-1):
100.101 Platform CPU threshold exceeded ; threshold 95.00%, actual 100.00% host=compute-1 critical 2019-08-20T13:33:27

$ fm alarm-list; date
+----------+-----------------------------------------------------------------------+-----------------------+----------+----------------+
| Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+----------+-----------------------------------------------------------------------+-----------------------+----------+----------------+
| 100.101 | Platform CPU threshold exceeded ; threshold 95.00%, actual 100.00% | host=compute-1 | critical | 2019-08-20T17: |
| | | | | 33:27.367058 |
| | | | | |
| 100.114 | NTP address 2607:5300:60:33 is not a valid or a reachable NTP server. | host=controller-1.ntp | minor | 2019-08-20T14: |
| | | =2607:5300:60:33 | | 53:58.877494 |
| | | | | |

$ system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 3 | controller-1 | controller | unlocked | enabled | available |
| 4 | compute-0 | worker | unlocked | enabled | available |
| 6 | compute-1 | worker | unlocked | enabled | degraded |
+----+--------------+-------------+----------------+-------------+--------------+

Reproducibility
---------------
The critical CPU threshold was reached and the alarm did not clear when a different compute was locked/unlocked

System Configuration
--------------------
Standard system

Branch/Pull Time/Commit
-----------------------
20190819T033000Z
wcp 63-66
nova/test_cpu_thread.py::TestHTDisabled::test_boot_vm_cpu_thread_ht_disabled[2-require-None-CPUThreadErr.HT_HOST_UNAVAIL]

Last Pass
---------

Timestamp/Logs
--------------

fm-event.log
2019-08-20T16:26:06.000 controller-0 fmManager: info { "event_log_id" : "200.001", "reason_text" : "compute-0 was administratively locked to take it out-of-service.", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-0", "severity" : "warning", "state" : "clear", "timestamp" : "2019-08-20 16:26:06.862116" }
2019-08-20T16:26:06.000 controller-0 fmManager: info { "event_log_id" : "200.021", "reason_text" : "compute-0 manual 'unlock' request", "entity_instance_id" : "host=compute-0.command=unlock", "severity" : "not-applicable", "state" : "msg", "timestamp" : "2019-08-20 16:26:06.867437" }
2019-08-20T16:26:06.000 controller-0 fmManager: info { "event_log_id" : "750.004", "reason_text" : "Application Apply In Progress", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.k8s_application=stx-openstack", "severity" : "warning", "state" : "set", "timestamp" : "2019-08-20 16:26:06.767427" }
2019-08-20T16:26:11.000 controller-0 fmManager: info { "event_log_id" : "200.022", "reason_text" : "compute-0 is now 'offline'", "entity_instance_id" : "host=compute-0.status=offline", "severity" : "not-applicable", "state" : "msg", "timestamp" : "2019-08-20 16:26:11.876915" }
2019-08-20T16:26:37.000 controller-0 fmManager: info { "event_log_id" : "750.004", "reason_text" : "Application Apply In Progress", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.k8s_application=stx-openstack", "severity" : "warning", "state" : "clear", "timestamp" : "2019-08-20 16:26:37.485687" }
2019-08-20T16:26:37.000 controller-0 fmManager: info { "event_log_id" : "250.001", "reason_text" : "controller-0 Configuration is out-of-date.", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=controller-0", "severity" : "major", "state" : "set", "timestamp" : "2019-08-20 16:26:37.414867" }
2019-08-20T16:26:37.000 controller-0 fmManager: info { "event_log_id" : "250.001", "reason_text" : "controller-1 Configuration is out-of-date.", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=controller-1", "severity" : "major", "state" : "set", "timestamp" : "2019-08-20 16:26:37.535612" }
2019-08-20T16:26:51.000 controller-0 fmManager: info { "event_log_id" : "250.001", "reason_text" : "controller-0 Configuration is out-of-date.", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=controller-0", "severity" : "major", "state" : "clear", "timestamp" : "2019-08-20 16:26:51.309396" }
2019-08-20T16:26:51.000 controller-0 fmManager: info { "event_log_id" : "250.001", "reason_text" : "controller-1 Configuration is out-of-date.", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=controller-1", "severity" : "major", "state" : "clear", "timestamp" : "2019-08-20 16:26:51.513353" }
2019-08-20T16:32:10.000 controller-0 fmManager: info { "event_log_id" : "200.022", "reason_text" : "compute-0 is now 'online'", "entity_instance_id" : "host=compute-0.status=online", "severity" : "not-applicable", "state" : "msg", "timestamp" : "2019-08-20 16:32:10.442120" }
2019-08-20T16:32:43.000 controller-0 fmManager: info { "event_log_id" : "800.001", "reason_text" : "Storage Alarm Condition: HEALTH_WARN. Please check 'ceph -s' for more details.", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.cluster=b6d66f65-4a9b-4af2-83a8-5f83898abb47", "severity" : "warning", "state" : "clear", "timestamp" : "2019-08-20 16:32:43.254150" }
2019-08-20T16:33:06.000 controller-0 fmManager: info { "event_log_id" : "200.006", "reason_text" : "compute-0 is degraded due to the failure of its 'pci-irq-affinity-agent' process. Auto recovery of this major process is in progress.", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-0.process=pci-irq-affinity-agent", "severity" : "major", "state" : "set", "timestamp" : "2019-08-20 16:33:06.085508" }
2019-08-20T16:33:08.000 controller-0 fmManager: info { "event_log_id" : "200.022", "reason_text" : "compute-0 is now 'enabled'", "entity_instance_id" : "host=compute-0.state=enabled", "severity" : "not-applicable", "state" : "msg", "timestamp" : "2019-08-20 16:33:08.286408" }
2019-08-20T16:33:12.000 controller-0 fmManager: info { "event_log_id" : "275.001", "reason_text" : "Host compute-0 hypervisor is now unlocked-disabled", "entity_instance_id" : "host=compute-0.hypervisor=79bcdc89-28a2-4925-86fa-9f1e9cb93510", "severity" : "critical", "state" : "msg", "timestamp" : "2019-08-20 16:33:11.910350" }
2019-08-20T16:34:46.000 controller-0 fmManager: info { "event_log_id" : "200.006", "reason_text" : "compute-0 is degraded due to the failure of its 'pci-irq-affinity-agent' process. Auto recovery of this major process is in progress.", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-0.process=pci-irq-affinity-agent", "severity" : "major", "state" : "clear", "timestamp" : "2019-08-20 16:34:46.721902" }
2019-08-20T16:35:23.000 controller-0 fmManager: info { "event_log_id" : "275.001", "reason_text" : "Host compute-0 hypervisor is now unlocked-enabled", "entity_instance_id" : "host=compute-0.hypervisor=79bcdc89-28a2-4925-86fa-9f1e9cb93510", "severity" : "critical", "state" : "msg", "timestamp" : "2019-08-20 16:35:23.061089" }
2019-08-20T16:35:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 99.85%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 16:35:27.367770" }
2019-08-20T16:37:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.01%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 16:37:27.366975" }
2019-08-20T16:39:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.00%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 16:39:27.367986" }
2019-08-20T16:41:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.01%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 16:41:27.368017" }
2019-08-20T16:43:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.00%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 16:43:27.367715" }
2019-08-20T16:47:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.01%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 16:47:27.367035" }
2019-08-20T16:49:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.00%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 16:49:27.367982" }
2019-08-20T16:57:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.01%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 16:57:27.368275" }
2019-08-20T16:59:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.00%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 16:59:27.367815" }
2019-08-20T17:01:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.01%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 17:01:27.367033" }
2019-08-20T17:03:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.00%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 17:03:27.369314" }
2019-08-20T17:05:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 99.99%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 17:05:27.374079" }
2019-08-20T17:07:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.00%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 17:07:27.379294" }
2019-08-20T17:11:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.02%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 17:11:27.368355" }
2019-08-20T17:13:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.00%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 17:13:27.374143" }
2019-08-20T17:15:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.01%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 17:15:27.383858" }
2019-08-20T17:17:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.00%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 17:17:27.367882" }
2019-08-20T17:23:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 95.00%, actual 100.01%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "critical", "state" : "set", "timestamp" : "2019-08-20 17:23:27.367097" }
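For triage, the stream of 100.101 events above can be reduced programmatically. A minimal sketch (assuming only the fmManager line format shown in this log, with the JSON payload starting at the first "{"):

```python
import json

def unresolved_alarms(lines, event_id="100.101"):
    """Return alarms that were 'set' but never cleared, keyed by entity."""
    open_alarms = {}
    for line in lines:
        brace = line.find("{")
        if brace < 0:
            continue  # not an fmManager event line
        event = json.loads(line[brace:])
        if event.get("event_log_id") != event_id:
            continue
        key = event["entity_instance_id"]
        if event["state"] == "set":
            open_alarms[key] = event["timestamp"]  # keep latest set time
        elif event["state"] == "clear":
            open_alarms.pop(key, None)
    return open_alarms
```

Run against the fm-event.log excerpt above, this leaves the compute-1 entry outstanding, matching the degraded state shown in `system host-list`.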

Test Activity
-------------
regression

Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :
Numan Waheed (nwaheed)
tags: added: stx.retestneeded
Revision history for this message
Wendy Mitchell (wmitchellwr) wrote :

(Note: this test is not running CPU load.)
compute-0 has an 800.001 alarm (i.e. health warn) after compute-0 is locked, as compute-0 is one of the monitors.
This should not cause critical CPU on the other compute.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as stx.3.0 / medium priority - requires further investigation.

Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.3.0 stx.config
Changed in starlingx:
status: New → Triaged
assignee: nobody → Cindy Xie (xxie1)
Revision history for this message
Cindy Xie (xxie1) wrote :

Is this similar to LP https://bugs.launchpad.net/starlingx/+bug/1839181? Assigning Shuicheng to compare the logs.

Changed in starlingx:
assignee: Cindy Xie (xxie1) → Lin Shuicheng (shuicheng)
Revision history for this message
Lin Shuicheng (shuicheng) wrote :

From controller-0_20190820.165053/var/log/fm-event.log, we can see compute-1 had CPU spikes during 15:33:27.371002 -- 15:35:27.564281 and 16:35:27.367770 -- 16:49:27.367982.
Log messages look like the one below:
2019-08-20T15:33:27.000 controller-0 fmManager: info { "event_log_id" : "100.101", "reason_text" : "Platform CPU threshold exceeded ; threshold 90.00%, actual 92.48%", "entity_instance_id" : "region=RegionOne.system=yow-cgcs-ironpass-33_36.host=compute-1", "severity" : "major", "state" : "set", "timestamp" : "2019-08-20 15:33:27.371002" }

From compute-1_20190820.165053/var/extra/process.info, we can see the ovs-vswitchd main process (running on processor 0, which should be the platform CPU) consumed 12 minutes of CPU time, which is most of processor 0's time.
The DPDK polling threads run on processors 1 and 2, which should be dedicated to vSwitch.

cat compute-1_20190820.165053/var/extra/process.info | grep -v "00:00" | less
--------------------------------------------------------------------
Tue Aug 20 16:53:52 UTC 2019 : : ps -eL -o pid,lwp,ppid,state,class,nice,rtprio,priority,psr,stime,etime,time,wchan:16,tty,comm,command
--------------------------------------------------------------------
    PID LWP PPID S CLS NI RTPRIO PRI PSR STIME ELAPSED TIME WCHAN TT COMMAND COMMAND
  43223 43223 1 R TS -10 - 10 0 14:59 01:54:34 00:12:53 - ? ovs-vswitchd ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
  43223 43686 1 R TS -10 - 10 1 14:59 01:54:27 01:54:25 - ? pmd64 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
  43223 43694 1 R TS -10 - 10 2 14:59 01:54:27 01:54:25 - ? pmd65 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach
  87706 87706 87689 S TS 0 - 20 0 15:33 01:20:28 00:01:31 ep_poll ? /var/lib/openst /var/lib/openstack/bin/python /var/lib/openstack/bin/neutron-dhcp-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/dhcp_agent.ini --config-file /etc/neutron/metadata_agent.ini --config-file /etc/neutron/plugins/ml2/ml2_conf.ini --config-file /etc/neutron/plugins/ml2/openvswitch_agent.ini
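The affinity analysis above can be reproduced with a short filter over the collected process.info (a sketch; the column positions match the `ps -eL` output format shown, and the path is the tarball location from this report):

```shell
# List non-idle threads pinned to processor 0 (the platform CPU).
# In this ps layout: PSR is field 9, cumulative TIME field 12, COMM field 15.
awk '$1 ~ /^[0-9]+$/ && $9 == 0 && $12 != "00:00:00" { print $12, $15 }' \
    compute-1_20190820.165053/var/extra/process.info
```

This surfaces ovs-vswitchd at the top, consistent with the 12 minutes of CPU time noted above.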

And in compute-1_20190820.165053/var/log/openvswitch/ovs-vswitchd.log, there are the following error messages during the CPU spike:
2019-08-20T16:35:08.662Z|01058|dpif_netdev|ERR|error receiving data from tap95772ad7-94: File descriptor in bad state
2019-08-20T16:35:08.850Z|01059|dpif_netdev|ERR|error receiving data from tap95772ad7-94: File descriptor in bad state
2019-08-20T16:35:09.143Z|01060|connmgr|INFO|br-int<->unix#566: 6 flow_mods in the ...
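To gauge how this error rate lines up with the alarm window, the messages can be bucketed per minute (a sketch over the same collected log path):

```shell
# Count "File descriptor in bad state" errors per minute in the vswitchd log;
# the first 16 characters of each line are the timestamp down to the minute.
grep 'File descriptor in bad state' \
    compute-1_20190820.165053/var/log/openvswitch/ovs-vswitchd.log \
  | cut -c1-16 | sort | uniq -c
```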


Revision history for this message
Lin Shuicheng (shuicheng) wrote :

It should be the same issue as https://bugs.launchpad.net/starlingx/+bug/1839181, as the logs are similar.
Will mark this as a duplicate later.

Suspect it is related to OVS/DPDK; already asked the Networking team for help.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as Invalid to match the duplicate LP

Changed in starlingx:
status: Triaged → Invalid
Revision history for this message
Ghada Khalil (gkhalil) wrote :

LP was closed as invalid a long time ago; removing the stx.retestneeded tag

tags: removed: stx.retestneeded