Bug Description: The management/infrastructure cable was pulled on the active controller (controller-1), and the system swacted as expected. The cable was reconnected within 30 seconds, yet controller-1 remained offline for 15 to 20 minutes before it finally rebooted. While system host-list was reporting controller-1 as offline, controller-1 could still be pinged and reached over ssh from controller-0.
2019-02-11T13:45:55.886 [8316.00249] controller-0 mtcAgent |-| nodeClass.cpp (2473) start_offline_handler : Info : controller-1 starting offline handler (unlocked-disabled-failed) (stage:0)
2019-02-11T13:45:55.886 [8316.00250] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (1903) recovery_handler : Warn : controller-1 cannot issue Reset
2019-02-11T13:45:55.886 [8316.00251] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (1904) recovery_handler : Warn : controller-1 ... board management not provisioned or accessible
2019-02-11T13:45:55.886 [8316.00252] controller-0 mtcAgent |-| mtcNodeHdlrs.cpp (1925) recovery_handler : Info : controller-1 Graceful Recovery Wait (1200 secs) (uptime was 0)
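The last mtcAgent line above shows controller-0 entering a 1200-second Graceful Recovery Wait for controller-1 after it could not issue a reset. This wait appears to account for the 15 to 20 minute offline window. The helper below is a minimal sketch (the name graceful_recovery_wait is hypothetical) that extracts the host name and wait time from such a line. It assumes the message format shown in the log above.

```python
import re
from typing import Optional, Tuple

# Matches the recovery_handler message seen above, e.g.
# "... controller-1 Graceful Recovery Wait (1200 secs) (uptime was 0)"
_WAIT_RE = re.compile(r"(\S+) Graceful Recovery Wait \((\d+) secs\)")


def graceful_recovery_wait(line: str) -> Optional[Tuple[str, int]]:
    """Return (hostname, wait_seconds) for a Graceful Recovery Wait log line, else None."""
    m = _WAIT_RE.search(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


log = ("2019-02-11T13:45:55.886 [8316.00252] controller-0 mtcAgent |-| "
       "mtcNodeHdlrs.cpp (1925) recovery_handler : Info : controller-1 "
       "Graceful Recovery Wait (1200 secs) (uptime was 0)")
host, secs = graceful_recovery_wait(log)
print(host, secs, secs / 60)  # controller-1 1200 20.0
```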
system host-list
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1 | controller-0 | controller | unlocked | enabled | degraded |
| 2 | controller-1 | controller | unlocked | disabled | offline |
| 3 | storage-0 | storage | unlocked | enabled | available |
| 4 | storage-1 | storage | unlocked | enabled | available |
| 5 | compute-0 | worker | unlocked | enabled | available |
| 6 | compute-1 | worker | unlocked | enabled | available |
| 7 | compute-2 | worker | unlocked | enabled | available |
| 8 | compute-3 | worker | unlocked | enabled | available |
| 9 | compute-4 | worker | unlocked | enabled | available |
+----+--------------+-------------+----------------+-------------+--------------+
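To check host state from a script instead of by eye, the ASCII table printed by system host-list can be parsed into rows. This is a minimal sketch that assumes the `+---+`/`| ... |` layout shown above and cell values that contain no `|`. parse_cli_table is a hypothetical helper, not part of the `system` CLI.

```python
from typing import Dict, List


def parse_cli_table(text: str) -> List[Dict[str, str]]:
    """Parse a +---+ bordered table, as printed by the `system` CLI, into row dicts."""
    def cells(line: str) -> List[str]:
        # "| a | b |" -> ["a", "b"]
        return [c.strip() for c in line.strip().strip("|").split("|")]

    # Keep only content rows; border lines start with "+".
    rows = [ln for ln in text.splitlines() if ln.lstrip().startswith("|")]
    if not rows:
        return []
    header = cells(rows[0])
    return [dict(zip(header, cells(r))) for r in rows[1:]]


HOST_LIST = """\
+----+--------------+-------------+----------------+-------------+--------------+
| id | hostname     | personality | administrative | operational | availability |
+----+--------------+-------------+----------------+-------------+--------------+
| 1  | controller-0 | controller  | unlocked       | enabled     | degraded     |
| 2  | controller-1 | controller  | unlocked       | disabled    | offline      |
| 3  | storage-0    | storage     | unlocked       | enabled     | available    |
+----+--------------+-------------+----------------+-------------+--------------+
"""

hosts = parse_cli_table(HOST_LIST)
not_ok = {h["hostname"]: h["availability"] for h in hosts if h["availability"] != "available"}
print(not_ok)  # {'controller-0': 'degraded', 'controller-1': 'offline'}
```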
system service-parameter-list
+--------------------------------------+----------+-------------+-----------------------------+-------+-------------+----------+
| uuid | service | section | name | value | personality | resource |
+--------------------------------------+----------+-------------+-----------------------------+-------+-------------+----------+
| ba7a62bc-a38f-40a9-bbd9-b4566b33b87d | identity | assignment | driver | sql | None | None |
| 71fc8993-3c5a-4f2b-95d8-129ffb9c6945 | horizon | auth | lockout_retries | 3 | None | None |
| e91fabbc-ec67-4000-8c34-2b95bd9c3eaf | horizon | auth | lockout_seconds | 300 | None | None |
| 1ad073ae-3cff-4ea7-a792-7cef6fced0e6 | swift | config | fs_size_mb | 25 | None | None |
| 778b7be0-41eb-484e-922a-ebe241f0e275 | http | config | http_port | 8080 | None | None |
| 8fce8fa0-8b45-4a55-839c-91d5368fc027 | http | config | https_port | 8443 | None | None |
| 5112802a-005d-414d-9222-c95709130081 | swift | config | service_enabled | false | None | None |
| 493f0d08-b88a-4bf7-abf6-117af08fc6c0 | identity | config | token_expiration | 3600 | None | None |
| c401d443-7694-4927-9525-84f49741e949 | aodh | database | alarm_history_time_to_live | 86400 | None | None |
| c083339b-fd3e-4000-920f-a724b20b392e | panko | database | event_time_to_live | 86400 | None | None |
| 456d240f-d251-4457-a16d-2ab60f01e372 | cinder | emc_vnx | enabled | false | None | None |
| 20569cc2-dd68-46a7-a6d3-176cd2443e8e | cinder | hpe3par | enabled | false | None | None |
| 6dcd8ec9-6df3-4055-a9e3-ca2b6c894aae | cinder | hpe3par10 | enabled | false | None | None |
| 55af06dc-db63-4109-a20f-44a84551f6b9 | cinder | hpe3par11 | enabled | false | None | None |
| fe880c3d-23cb-4b35-9848-c8f40aa86f5d | cinder | hpe3par12 | enabled | false | None | None |
| dd1e2864-7367-4fce-b03c-02c29f20e4bb | cinder | hpe3par2 | enabled | false | None | None |
| 3c5b0081-cb53-4ee9-8b7a-64b3f7ce80f0 | cinder | hpe3par3 | enabled | false | None | None |
| 539cfcc4-51fb-439a-9b5a-0e6f2a302924 | cinder | hpe3par4 | enabled | false | None | None |
| 29fca92a-e954-45b5-bdb4-1d290e8ddf49 | cinder | hpe3par5 | enabled | false | None | None |
| e911b5b9-9fdd-496e-9a69-f4df9a33b416 | cinder | hpe3par6 | enabled | false | None | None |
| 14c3fe2c-017c-4901-a568-d2b81278d9d2 | cinder | hpe3par7 | enabled | false | None | None |
| 3a15fc60-e54f-4c34-bead-3d26fad00e7d | cinder | hpe3par8 | enabled | false | None | None |
| 65711f82-3b09-4a03-83dc-469200227c5c | cinder | hpe3par9 | enabled | false | None | None |
| 1392c4a3-ea35-4e1c-af3b-86d6da51be71 | cinder | hpelefthand | enabled | false | None | None |
| e234748c-0c00-4cb6-b937-fda03694de09 | identity | identity | driver | sql | None | None |
| c68b8e30-8b28-4497-9b98-a15eada95ed7 | platform | maintenance | controller_boot_timeout | 1200 | None | None |
| dfc1b687-0dd8-4302-a50f-c5468956679d | platform | maintenance | heartbeat_degrade_threshold | 6 | None | None |
| 98f371fc-433e-48c8-b39b-7d55894c0f46 | platform | maintenance | heartbeat_failure_action | fail | None | None |
| f1d330df-bfe7-426c-8f99-1f5711886431 | platform | maintenance | heartbeat_failure_threshold | 10 | None | None |
| c3e4e458-f564-4d06-9ab1-8ea75a39aaf1 | platform | maintenance | heartbeat_period | 100 | None | None |
| bba069d0-f3f7-4036-a490-c49752a2119e | platform | maintenance | mnfa_threshold | 2 | None | None |
| 07d05bf3-352e-47c4-9036-c3f7fbbc5ce5 | platform | maintenance | mnfa_timeout | 0 | None | None |
| b9e1be7c-536e-4d2d-9de8-16e247439e36 | platform | maintenance | worker_boot_timeout | 720 | None | None |
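The platform/maintenance parameters above bound the timing of this scenario. As a rough back-of-the-envelope check, assume that heartbeat_period is in milliseconds, that controller_boot_timeout is in seconds, and that a host is degraded or failed after the respective threshold of consecutive missed heartbeats. These are assumptions, not facts established by this report. Under them, heartbeat loss is detected within about a second of the cable pull. The 1200-second controller_boot_timeout, by contrast, matches the 1200-second Graceful Recovery Wait in the mtcAgent log. The class name below is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MaintenanceParams:
    heartbeat_period_ms: int          # assumed unit: milliseconds
    heartbeat_degrade_threshold: int  # missed pulses before degrade (assumed meaning)
    heartbeat_failure_threshold: int  # missed pulses before failure (assumed meaning)
    controller_boot_timeout_s: int    # assumed unit: seconds

    def degrade_after_s(self) -> float:
        # Approximate time to declare degrade, one pulse per period.
        return self.heartbeat_period_ms * self.heartbeat_degrade_threshold / 1000

    def fail_after_s(self) -> float:
        # Approximate time to declare heartbeat failure, one pulse per period.
        return self.heartbeat_period_ms * self.heartbeat_failure_threshold / 1000


# Values taken from the service-parameter list above.
params = MaintenanceParams(100, 6, 10, 1200)
print(params.degrade_after_s(), params.fail_after_s(), params.controller_boot_timeout_s / 60)
# 0.6 1.0 20.0
```

The gap between roughly one second of detection time and a 20-minute recovery window suggests that the observed offline period is dominated by the recovery wait, not by heartbeat detection.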
Severity
--------
Major
Steps to Reproduce
------------------
1. Pull the management cable on the active controller (controller-1), whose management and infrastructure networks share the same interface over VLANs.
2. Verify that the system swacts to controller-0.
3. Reconnect the cable within 30 seconds.
4. Check the state of controller-1 in system host-list. As described above, it stays offline even though it can be pinged and reached over ssh.
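When reproducing, the offline window can be measured by polling the host state after the cable is reconnected. This is a minimal sketch. seconds_until_not is a hypothetical helper, and get_availability is a callable you supply, for example one that reads the availability column of system host-list for controller-1. The clock and sleep functions are injectable so that the loop can be exercised without actually waiting.

```python
import time
from typing import Callable, Optional


def seconds_until_not(
    get_availability: Callable[[], str],
    state: str = "offline",
    timeout_s: float = 1800.0,
    interval_s: float = 10.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until get_availability() no longer returns `state`.

    Returns the elapsed seconds, or None if `state` persisted past timeout_s.
    """
    start = clock()
    while clock() - start <= timeout_s:
        if get_availability() != state:
            return clock() - start
        sleep(interval_s)
    return None
```

A result close to 1200 seconds would line up with the Graceful Recovery Wait reported by mtcAgent. A result of a few seconds would match the expected behavior.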
Expected Behavior
------------------
controller-1 should reboot soon after the cable is reconnected and return to the correct state.
Actual Behavior
----------------
As described above, controller-1 is reported in the wrong state (offline) for 15 to 20 minutes before it reboots, even though it is reachable by ping and ssh.
Reproducibility
---------------
100% reproducible
System Configuration
--------------------
Storage system (2 controllers, 2 storage nodes, 5 worker nodes)
Branch/Pull Time/Commit
-----------------------
StarlingX_Upstream_build release branch build as of 2019-02-10_20-18-00
Timestamp/Logs
--------------
11 13:46:41 UTC 2019
Marking as release gating; requires further investigation