Comment 3 for bug 1833609

Lin Shuicheng (shuicheng) wrote :

Hi Peng,
The "ALL_NODES" log file contains only the controller-0 log. Is the controller-1 log still available?
The issue is probably caused by a failure on controller-1, after which the system automatically swacted to controller-0. That led to the hang.

When the issue occurred, controller-0 had just been unlocked, so it should have been in standby state. But sm.log shows an uncontrolled swact, which caused the active controller to switch from controller-1 to controller-0.
A swact stops the sysinv process on controller-1 and starts it on controller-0. The application-apply should have been running in controller-1's sysinv process, so the hang is expected, since that sysinv process was killed.

But the attached log contains controller-0 only, so I am not sure what caused the swact.

The swact log is below:
2019-06-20T17:37:54.000 controller-0 sm: debug time[191.633] log<77> INFO: sm[92936]: sm_service_domain_neighbor_fsm.c(732): Neighbor (controller-1) received event (exchange-message) was in the exchange state and is now in the full.
2019-06-20T17:37:57.000 controller-0 sm: debug time[194.562] log<78> ERROR: sm[92936]: sm_service_domain_waiting_state.c(236): Service domain (controller) neighbor (controller-0) not found.
2019-06-20T17:37:57.000 controller-0 sm: debug time[194.562] log<79> INFO: sm[92936]: sm_service_domain_fsm.c(308): Set state waiting->leader
2019-06-20T17:37:57.000 controller-0 sm: debug time[194.562] log<80> INFO: sm[92936]: sm_service_domain_fsm.c(493): Service Domain (controller) received event (wait-expired) was in the waiting state and is now in the leader state.
2019-06-20T17:37:57.000 controller-0 sm: debug time[194.563] log<4> INFO: sm_alarm[92950]: sm_alarm_thread.c(1224): Managing alarms for domain (controller).
2019-06-20T17:37:57.000 controller-0 sm: debug time[194.563] log<81> INFO: sm[92936]: sm_service_domain_filter.c(338): Uncontrolled swact start
2019-06-20T17:37:57.000 controller-0 sm: debug time[194.563] log<82> INFO: sm[92936]: sm_node_swact_monitor.cpp(29): Swact has started, host will be active
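
In case it helps when the controller-1 log turns up, here is a rough Python sketch for pulling the swact lines out of an sm.log. The regular expression is only inferred from the sample lines above, and the function name find_swact_events and the marker strings are my own choices for illustration, not anything provided by sm, so adjust as needed.

```python
import re

# Line layout inferred from the sample sm.log lines in this comment only
# (an assumption, not an official format description).
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<host>\S+)\s+sm:\s+debug\s+time\[(?P<uptime>[\d.]+)\]\s+"
    r"log<(?P<seq>\d+)>\s+(?P<level>[A-Z]+):\s+(?P<proc>\w+)\[(?P<pid>\d+)\]:\s+"
    r"(?P<src>[\w.]+)\((?P<line>\d+)\):\s+(?P<msg>.*)$"
)


def find_swact_events(lines, markers=("Uncontrolled swact start", "Swact has started")):
    """Return (timestamp, host, source file, message) for lines mentioning a swact.

    `markers` are substrings taken from the log excerpt above; lines that do
    not match the assumed layout are skipped silently.
    """
    events = []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # not in the assumed sm.log layout
        if any(marker in m.group("msg") for marker in markers):
            events.append((m.group("ts"), m.group("host"), m.group("src"), m.group("msg")))
    return events
```

To use it, open the sm.log from the collected logs and pass the file object (or a list of lines) to find_swact_events. Running it on both controllers' logs and comparing the timestamps should show which side triggered the swact first.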