In looking at the logs I can see that the second reboot occurred as a result of the Cluster network heartbeat failure immediately following the recovery of controller-1 after the initial reboot. Here is the log analysis.

Heartbeat Loss due to cable pull:

2019-09-19T17:26:37.186 [3163376.00210] controller-0 mtcAgent hbs nodeClass.cpp (4696) manage_heartbeat_failure:Error : controller-1 Mgmnt *** Heartbeat Loss ***
2019-09-19T17:26:37.188 [3163376.00219] controller-0 mtcAgent hbs nodeClass.cpp (4696) manage_heartbeat_failure:Error : controller-1 Clstr *** Heartbeat Loss ***
2019-09-19T17:26:42.189 [3163376.00304] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (1846) recovery_handler : Warn : controller-1 Loss Of Communication for 5 seconds ; disabling host
2019-09-19T17:26:42.189 [3163376.00305] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (1847) recovery_handler : Warn : controller-1 ... stopping host services
2019-09-19T17:26:42.189 [3163376.00306] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (1848) recovery_handler : Warn : controller-1 ... continuing with graceful recovery
2019-09-19T17:26:42.189 [3163376.00310] controller-0 mtcAgent hbs nodeClass.cpp (1672) alarm_enabled_failure :Error : controller-1 critical enable failure
2019-09-19T17:26:42.205 [3163376.00318] controller-0 mtcAgent |-| mtcNodeHdlrs.cpp (1954) recovery_handler : Info : controller-1 Graceful Recovery Wait (1200 secs) (uptime was 0)

Cable reinserted resulting in observed mtcAlive:

2019-09-19T17:27:19.267 [3163376.00338] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (2016) recovery_handler : Info : controller-1 regained MTCALIVE from host that did not reboot (uptime:9231)
2019-09-19T17:27:19.267 [3163376.00339] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (2017) recovery_handler : Info : controller-1 ... uptimes before:0 after:9231
2019-09-19T17:27:19.267 [3163376.00340] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (2018) recovery_handler : Info : controller-1 ... exiting graceful recovery
2019-09-19T17:27:19.267 [3163376.00341] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (2019) recovery_handler : Info : controller-1 ... forcing full enable with reset
2019-09-19T17:27:19.267 [3163376.00342] controller-0 mtcAgent |-| nodeClass.cpp (7325) force_full_enable : Info : controller-1 Forcing Full Enable Sequence
2019-09-19T17:27:19.267 [3163376.00343] controller-0 mtcAgent hbs nodeClass.cpp (5941) allStateChange : Info : controller-1 unlocked-disabled-failed (seq:12)
2019-09-19T17:27:19.277 [3163376.00344] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp ( 560) enable_handler :Error : controller-1 Main Enable FSM (from failed)
2019-09-19T17:27:19.277 [3163376.00345] controller-0 mtcAgent msg mtcCtrlMsg.cpp ( 878) send_hbs_command : Info : controller-1 stop host service sent to controller-0 hbsAgent
2019-09-19T17:27:19.277 [3163376.00346] controller-0 mtcAgent msg mtcCtrlMsg.cpp ( 878) send_hbs_command : Info : controller-1 stop host service sent to controller-1 hbsAgent
2019-09-19T17:27:19.283 [107366.01375] controller-0 hbsAgent hbs nodeClass.cpp (7505) mon_host : Info : controller-1 stopping heartbeat service
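The decision between finishing graceful recovery and "forcing full enable with reset" appears to be driven by the uptime reported in the returning mtcAlive: a large uptime (9231 above) means the host never rebooted while it was unreachable, while a small uptime (188 later in this timeline) means it did. Below is only a minimal sketch of that decision, not the actual mtcAgent code; the threshold value and all names are assumptions.

// Minimal sketch of the uptime check implied by the recovery_handler logs.
// Not the real mtcNodeHdlrs.cpp logic; the threshold and names are assumptions.
#include <iostream>

constexpr unsigned int REBOOT_UPTIME_THRESHOLD_SECS = 600; // assumed value

enum class RecoveryAction { ContinueGraceful, ForceFullEnable };

RecoveryAction on_mtcalive(unsigned int saved_uptime, unsigned int current_uptime)
{
    (void)saved_uptime; // logged as "uptimes before:x after:y"
    if (current_uptime < REBOOT_UPTIME_THRESHOLD_SECS)
    {
        // Small uptime: the host just rebooted; finish graceful recovery
        // without another reset ("... without additional reboot").
        return RecoveryAction::ContinueGraceful;
    }
    // Large uptime: the host never rebooted while unreachable; force the
    // full enable sequence, which includes a reset.
    return RecoveryAction::ForceFullEnable;
}

int main()
{
    // The two cases from this timeline: after:9231 -> reset, curr:188 -> no reset.
    std::cout << (on_mtcalive(0, 9231) == RecoveryAction::ForceFullEnable) << "\n";  // prints 1
    std::cout << (on_mtcalive(0, 188) == RecoveryAction::ContinueGraceful) << "\n";  // prints 1
    return 0;
}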
mtcAgent is running on both controllers at this point. The peer re-enabled heartbeat, undesirable but with no real consequence.

2019-09-19T17:27:29.108 [995970.00106] controller-1 mtcAgent msg mtcCtrlMsg.cpp ( 878) send_hbs_command : Info : controller-1 add host sent to controller-0 hbsAgent
2019-09-19T17:27:29.108 [107366.01485] controller-0 hbsAgent hbs nodeClass.cpp (7489) mon_host : Info : controller-1 starting heartbeat service
2019-09-19T17:27:29.325 [3163376.00389] controller-0 mtcAgent |-| mtcNodeHdlrs.cpp (1022) enable_handler : Info : controller-1 Booting (timeout: 1200 secs) (0

Due to the unexpected heartbeat restart we see another Loss failure, which adds to the retry count:

2019-09-19T17:28:38.035 [3163376.00477] controller-0 mtcAgent |-| nodeClass.cpp (2523) stop_offline_handler : Info : controller-1 stopping offline handler (unlocked-disabled-failed) (stage:3)
2019-09-19T17:28:38.035 [3163376.00478] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (1846) recovery_handler : Warn : controller-1 Loss Of Communication for 5 seconds ; disabling host
2019-09-19T17:28:38.035 [3163376.00479] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (1847) recovery_handler : Warn : controller-1 ... stopping host services
2019-09-19T17:28:38.035 [3163376.00480] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (1848) recovery_handler : Warn : controller-1 ... continuing with graceful recovery
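Both disabling events so far come from the same 5 second loss-of-communication check logged by recovery_handler. The following is only an illustration of that kind of watchdog, assuming a simple "time since last message" test; it is not the mtcAgent implementation and the names are placeholders.

// Illustration of a "Loss Of Communication for 5 seconds" style check.
// Assumed structure and names; not the real recovery_handler code.
#include <chrono>
#include <iostream>

using Clock = std::chrono::steady_clock;

constexpr auto LOSS_OF_COMM_TIMEOUT = std::chrono::seconds(5); // from the log text

struct NodeComms
{
    Clock::time_point last_rx;  // updated on every mtcAlive / heartbeat pulse

    void on_message(Clock::time_point now) { last_rx = now; }

    bool loss_of_communication(Clock::time_point now) const
    {
        return (now - last_rx) >= LOSS_OF_COMM_TIMEOUT;
    }
};

int main()
{
    NodeComms controller_1;
    const auto start = Clock::now();
    controller_1.on_message(start);

    // Simulate a pulled cable: nothing received for 6 seconds.
    if (controller_1.loss_of_communication(start + std::chrono::seconds(6)))
        std::cout << "Loss Of Communication for 5 seconds ; disabling host\n";
    return 0;
}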
After the boot we regain mtcAlive and execute the Enable sequence:

2019-09-19T17:32:37.937 [3163376.00557] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (2047) recovery_handler : Info : controller-1 regained MTCALIVE from host that has rebooted (uptime curr:188 save:0)
2019-09-19T17:32:37.937 [3163376.00558] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (2050) recovery_handler : Info : controller-1 ... continuing with graceful recovery
2019-09-19T17:32:37.937 [3163376.00559] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (2052) recovery_handler : Info : controller-1 ... without additional reboot
2019-09-19T17:32:37.937 [3163376.00560] controller-0 mtcAgent inv mtcInvApi.cpp (1079) mtcInvApi_update_state : Info : controller-1 intest (seq:33)
2019-09-19T17:32:37.937 [3163376.00561] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (2133) recovery_handler : Info : controller-1 waiting for GOENABLED ; with 600 sec timeout
2019-09-19T17:33:23.101 [3163376.00562] controller-0 mtcAgent |-| mtcNodeHdlrs.cpp (2173) recovery_handler : Info : controller-1 got GOENABLED (Graceful Recovery)
2019-09-19T17:33:23.111 [3163376.00563] controller-0 mtcAgent |-| mtcNodeHdlrs.cpp (2197) recovery_handler : Info : controller-1 Starting Host Services
2019-09-19T17:33:23.111 [3163376.00564] controller-0 mtcAgent hbs nodeClass.cpp (7438) launch_host_services_cmd: Info : controller-1 start controller host services launch
2019-09-19T17:33:23.537 [3163376.00565] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (2667) host_services_handler : Info : controller-1 start controller host services completed
2019-09-19T17:33:23.547 [3163376.00566] controller-0 mtcAgent |-| mtcNodeHdlrs.cpp (2291) recovery_handler : Info : controller-1-worker configured
2019-09-19T17:33:23.557 [3163376.00567] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (2314) recovery_handler : Info : controller-1-worker running out-of-service tests

We start the 11 second heartbeat soak:

2019-09-19T17:34:12.479 [3163376.00605] controller-0 mtcAgent msg mtcCtrlMsg.cpp ( 878) send_hbs_command : Info : controller-1 start host service sent to controller-0 hbsAgent
2019-09-19T17:34:12.480 [3163376.00606] controller-0 mtcAgent msg mtcCtrlMsg.cpp ( 878) send_hbs_command : Info : controller-1 start host service sent to controller-1 hbsAgent
2019-09-19T17:34:12.480 [3163376.00607] controller-0 mtcAgent |-| mtcNodeHdlrs.cpp (2470) recovery_handler : Info : controller-1 Starting 11 sec Heartbeat Soak (with ready event)

... and immediately get another heartbeat failure, but on the Cluster network only:

2019-09-19T17:34:13.533 [3163376.00616] controller-0 mtcAgent hbs nodeClass.cpp (4696) manage_heartbeat_failure:Error : controller-1 Clstr *** Heartbeat Loss ***
2019-09-19T17:34:13.533 [3163376.00617] controller-0 mtcAgent hbs nodeClass.cpp (4707) manage_heartbeat_failure:Error : controller-1 Clstr network heartbeat failure
2019-09-19T17:34:13.533 [3163376.00618] controller-0 mtcAgent inv mtcInvApi.cpp (1079) mtcInvApi_update_state : Info : controller-1 failed (seq:36)
2019-09-19T17:34:13.533 [3163376.00619] controller-0 mtcAgent hbs nodeClass.cpp (4716) manage_heartbeat_failure: Warn : controller-1 restarting graceful recovery
2019-09-19T17:34:13.533 [3163376.00620] controller-0 mtcAgent |-| mtcNodeHdlrs.cpp (1637) recovery_handler : Info : controller-1 Graceful Recovery (uptime was 284)
2019-09-19T17:34:13.534 [3163376.00621] controller-0 mtcAgent msg mtcCtrlMsg.cpp ( 878) send_hbs_command : Info : controller-1 stop host service sent to controller-0 hbsAgent
2019-09-19T17:34:13.534 [3163376.00622] controller-0 mtcAgent msg mtcCtrlMsg.cpp ( 878) send_hbs_command : Info : controller-1 stop host service sent to controller-1 hbsAgent
2019-09-19T17:34:13.534 [3163376.00623] controller-0 mtcAgent hdl mtcNodeHdlrs.cpp (1665) recovery_handler :Error : controller-1 Graceful Recovery Failed (retries=3)
2019-09-19T17:34:13.534 [3163376.00624] controller-0 mtcAgent |-| nodeClass.cpp (7325) force_full_enable : Info : controller-1 Forcing Full Enable Sequence

That's the extra reboot. The issue is why the cluster network was failing heartbeat at that point.
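The "Graceful Recovery Failed (retries=3)" line is the point where the extra reboot is committed: each heartbeat failure that restarts graceful recovery counts against a small per-node retry budget, and once that budget is used up mtcAgent falls back to the full enable sequence, which resets the host. Below is only a minimal sketch of that accounting; the limit, the comparison, and the names are assumptions, not the actual nodeClass.cpp code.

// Sketch of per-node graceful recovery retry accounting.
// Assumed names and an assumed limit of 3; not the real mtcAgent code.
#include <iostream>

constexpr int MAX_GRACEFUL_RECOVERY_RETRIES = 3; // the log reports failure at retries=3

struct Node
{
    int graceful_recovery_count = 0; // persists across back-to-back failures
};

enum class Action { RestartGracefulRecovery, ForceFullEnable };

Action on_heartbeat_failure(Node &node)
{
    if (++node.graceful_recovery_count >= MAX_GRACEFUL_RECOVERY_RETRIES)
    {
        // Retry budget exhausted: give up on graceful recovery and force
        // the full enable sequence (which resets the node).
        return Action::ForceFullEnable;
    }
    return Action::RestartGracefulRecovery;
}

int main()
{
    Node controller_1;
    // In this timeline the cable pull, the loss during recovery, and the
    // post-soak cluster failure all draw from the same budget.
    for (int failure = 1; failure <= 3; ++failure)
    {
        const Action a = on_heartbeat_failure(controller_1);
        std::cout << "failure " << failure << " -> "
                  << (a == Action::ForceFullEnable ? "Force Full Enable (reset)"
                                                   : "restart graceful recovery")
                  << "\n";
    }
    return 0;
}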
This setup has the mgmnt and cluster VLANs on top of a single physical interface:

management_interface=enp24s0f0.186
cluster_host_interface=enp24s0f0.187

i.e. both heartbeat networks share the same physical port.