Comment 0 for bug 1856064

Peng Peng (ppeng) wrote :

Brief Description
-----------------
After locking and unlocking one compute node (compute-0), the active controller (controller-0) became degraded and alarm 200.006 was raised.
After a forced reboot of the active controller, the system recovered and the alarm cleared.

Severity
--------
Major

Steps to Reproduce
------------------
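1. Lock one compute node: system host-lock compute-0
2. Unlock the same node: system host-unlock compute-0
3. After the unlock completes, check system host-list and fm alarm-list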

TC-name: mtc/test_lock_unlock_host.py::test_lock_unlock_host[compute]
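
The command sequence this test case drives, as seen in the logs of this report, can be sketched roughly as follows. This is only an illustration: `build_repro_commands` is a hypothetical helper and not part of the test framework, and the authentication options that appear in the logged commands are left out for brevity.

```python
def build_repro_commands(host="compute-0"):
    """Return the CLI sequence seen in the logs: list, lock, unlock, list, alarms.

    Hypothetical helper for illustration; it only builds strings and
    does not execute anything on a system.
    """
    return [
        "system host-list",
        f"system host-lock {host}",
        f"system host-unlock {host}",
        "system host-list",
        "fm alarm-list --nowrap --uuid",
    ]


if __name__ == "__main__":
    for cmd in build_repro_commands():
        print(cmd)
```

In the reported run roughly six minutes passed between the unlock and the next host-list, so a real reproduction would need to wait for the unlock to complete before checking state.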

Expected Behavior
------------------
After compute-0 is locked and unlocked, all hosts, including the active controller, remain unlocked/enabled/available and no 200.006 alarm is raised.

Actual Behavior
----------------
After compute-0 was locked and unlocked, controller-0 changed from available to degraded and a major 200.006 alarm was raised for the failure of its 'ceph' process. The condition cleared only after a forced reboot of controller-0.

Reproducibility
---------------
Unknown. This is the first time the issue has been seen in sanity; will monitor.

System Configuration
--------------------
Multi-node system
IPv4

Lab-name: WCP_3-6

Branch/Pull Time/Commit
-----------------------
2019-12-10_20-00-00

Last Pass
---------
2019-12-10_20-00-00 on (WP_8-12)

Timestamp/Logs
--------------
[2019-12-11 08:58:20,124] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2019-12-11 08:58:21,300] 433 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | available |
| 2 | compute-0 | worker | unlocked | enabled | available |
| 3 | compute-1 | worker | unlocked | enabled | available |
| 4 | controller-1 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+

[2019-12-11 08:58:22,661] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-lock compute-0'

[2019-12-11 08:59:40,320] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-unlock compute-0'

[2019-12-11 09:05:59,264] 311 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-list'
[2019-12-11 09:06:00,442] 433 DEBUG MainThread ssh.expect :: Output:
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | degraded |
| 2 | compute-0 | worker | unlocked | enabled | available |
| 3 | compute-1 | worker | unlocked | enabled | available |
| 4 | controller-1 | controller | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
[sysadmin@controller-0 ~(keystone_admin)]$
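
A check for this condition could be scripted by parsing the host-list table and flagging any host whose availability is not "available". The sketch below is a minimal example under stated assumptions: `parse_cli_table` and `unavailable_hosts` are hypothetical helper names (not part of the test framework), and `SAMPLE` is a two-row excerpt of the output above.

```python
def parse_cli_table(text):
    """Parse a '+---+' bordered CLI table into a list of row dicts."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip border lines, shell prompts, and log lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first '|' row holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def unavailable_hosts(table_text):
    """Return (hostname, availability) for hosts not reported as 'available'."""
    return [
        (row["hostname"], row["availability"])
        for row in parse_cli_table(table_text)
        if row["availability"] != "available"
    ]


# Excerpt of the host-list output captured at 09:06:00 in this report.
SAMPLE = """\
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | degraded |
| 2 | compute-0 | worker | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
"""

if __name__ == "__main__":
    print(unavailable_hosts(SAMPLE))
```

The same parser should also work on the alarm-list table, since it uses the same border and column layout, provided no cell value contains a "|" character.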

[2019-12-11 09:11:08,717] 311 DEBUG MainThread ssh.send :: Send 'fm --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne alarm-list --nowrap --uuid'
[2019-12-11 09:11:09,693] 433 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------------------+--------------------------------+----------+----------------------------+
| UUID | Alarm ID | Reason Text | Entity ID | Severity | Time Stamp |
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------------------+--------------------------------+----------+----------------------------+
| 26e10dab-15dd-45ee-b5ac-4ae73bb5db8d | 200.006 | controller-0 is degraded due to the failure of its 'ceph' process. Auto recovery of this major process is in progress. | host=controller-0.process=ceph | major | 2019-12-11T09:00:12.697608 |
+--------------------------------------+----------+------------------------------------------------------------------------------------------------------------------------+--------------------------------+----------+----------------------------+
[sysadmin@controller-0 ~(keystone_admin)]$

controller-0:~$
[2019-12-11 09:16:45,554] 166 INFO MainThread host_helper.reboot_hosts:: Rebooting controller-0
[2019-12-11 09:16:45,554] 311 DEBUG MainThread ssh.send :: Send 'sudo reboot -f'

Test Activity
-------------
Sanity