Comment 3 for bug 1840176

Bin Qian (bqian20) wrote : Re: After host-swact completed, unexpected swact occurred

Logs show that controller-1 was shut down because of a fatal ceph-mon failure (it exceeded the maximum number of transition failures). controller-0 then went active, as expected.

| 2019-08-14T08:54:00.609 | 501 | service-scn | ceph-mon | enabling | disabling | enable failed
| 2019-08-14T08:56:01.831 | 502 | service-scn | ceph-mon | disabling | disabling-failed | disable failed
| 2019-08-14T08:56:01.831 | 503 | service-group-scn | controller-services | go-active | go-active-failed | ceph-mon(disabling, failed)
| 2019-08-14T08:56:02.256 | 504 | service-scn | ceph-mon | disabling-failed | enabling-failed | enabled-active state requested
| 2019-08-14T08:56:05.362 | 505 | service-group-scn | vim-services | active | disabling |
| 2019-08-14T08:56:05.363 | 506 | service-group-scn | cloud-services | active | disabling |
| 2019-08-14T08:56:05.364 | 507 | service-group-scn | controller-services | go-active-failed | disabling-failed | ceph-mon(enabling, failed)
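To pull the failing transitions out of a longer log, a filter along these lines may help. This is only a minimal sketch: it assumes the pipe-delimited column layout seen in the excerpt above (timestamp, log id, type, entity, from-state, to-state, optional reason), and the function and key names are hypothetical, not part of SM.

```python
KEYS = ("time", "log_id", "type", "entity", "from_state", "to_state", "reason")

def parse_scn(line):
    """Split one pipe-delimited state-change line into a dict, or return None."""
    fields = [f.strip() for f in line.strip().strip("|").split("|")]
    if len(fields) < 6:
        return None  # not a state-change line in the assumed layout
    # The reason column may be absent or empty; pad so every record has 7 fields.
    fields += [""] * (len(KEYS) - len(fields))
    return dict(zip(KEYS, fields[:len(KEYS)]))

def failed_transitions(lines):
    """Return records whose target state or reason mentions a failure."""
    records = (parse_scn(line) for line in lines)
    return [
        r for r in records
        if r and ("failed" in r["to_state"] or "failed" in r["reason"])
    ]
```

Run over the lines above, this would keep the ceph-mon and controller-services entries and skip the vim-services and cloud-services transitions, which carry no failure reason.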

2019-08-14T09:01:13.000 controller-1 sm: debug time[1513.951] log<1946> INFO: sm[97172]: sm_service_audit.c(364): Service (dbmon) already running audit-disabled action.
2019-08-14T09:01:13.000 controller-1 sm: debug time[1514.051] log<1947> INFO: sm[97172]: sm_service_engine.c(157): Service (ceph-mon) has had a fatal failure and is unrecoverable.
2019-08-14T09:01:14.000 controller-1 sm: debug time[1514.555] log<1948> INFO: sm[97172]: sm_service_engine.c(157): Service (ceph-mon) has had a fatal failure and is unrecoverable.
2019-08-14T09:01:14.000 controller-1 sm: debug time[1515.056] log<1949> INFO: sm[97172]: sm_service_engine.c(157): Service (ceph-mon) has had a fatal failure and is unrecoverable.
2019-08-14T09:01:15.000 controller-1 sm: debug time[1515.558] log<1950> INFO: sm[97172]: sm_service_engine.c(157): Service (ceph-mon) has had a fatal failure and is unrecoverable.
2019-08-14T09:01:15.000 controller-1 sm: debug time[1515.917] log<1951> INFO: sm[97172]: sm_service_group_api.c(212): Service group (controller-services) recovery from fatal condition escalated to a reboot.
2019-08-14T09:01:15.000 controller-1 sm: debug time[1515.917] log<1952> INFO: sm[97172]: sm_node_api.cpp(964): Reboot of controller-1 requested, reason=service group (controller-services) recovery from fatal condition escalated to a reboot..
2019-08-14T09:01:15.000 controller-1 sm: debug time[1515.918] log<1953> INFO: sm[97172]: sm_troubleshoot.c(150): Troubleshoot process (401470) created.

ceph-mon errors:
2019-08-14T09:00:01.762 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: drbd-cephmon still Primary, demoting.
2019-08-14T09:00:01.768 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: drbd-cephmon is still mounted via /dev/drbd9
2019-08-14T09:00:01.771 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: Waiting to drop drbd-cephmon
2019-08-14T09:00:02.777 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: drbd-cephmon still Primary, demoting.
2019-08-14T09:00:02.783 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: drbd-cephmon is still mounted via /dev/drbd9
2019-08-14T09:00:02.786 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: Waiting to drop drbd-cephmon
2019-08-14T09:00:03.792 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: drbd-cephmon still Primary, demoting.
2019-08-14T09:00:03.798 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: drbd-cephmon is still mounted via /dev/drbd9
2019-08-14T09:00:03.800 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: Waiting to drop drbd-cephmon
2019-08-14T09:00:04.806 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: drbd-cephmon still Primary, demoting.
2019-08-14T09:00:04.811 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: drbd-cephmon is still mounted via /dev/drbd9
2019-08-14T09:00:04.814 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: Waiting to drop drbd-cephmon
2019-08-14T09:00:05.819 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: drbd-cephmon still Primary, demoting.
2019-08-14T09:00:05.824 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: drbd-cephmon is still mounted via /dev/drbd9
2019-08-14T09:00:05.827 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: Waiting to drop drbd-cephmon
2019-08-14T09:00:06.833 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: drbd-cephmon still Primary, demoting.
2019-08-14T09:00:06.839 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: drbd-cephmon is still mounted via /dev/drbd9
2019-08-14T09:00:06.842 controller-1 OCF_drbd(drbd-cephmon:warning 1)[380462]: WARNING: Waiting to drop drbd-cephmon
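
The repeated demote warnings can be summarised to see how long the resource stayed stuck. The following is a rough sketch, assuming the line format shown in the excerpt above (timestamp, host, `OCF_drbd(<resource>:...)`); the regular expression and function name are illustrative only.

```python
import re
from datetime import datetime

# Matches the "still Primary, demoting" warning and captures timestamp and resource.
DEMOTE_RE = re.compile(r"^(\S+) \S+ OCF_drbd\((\S+?):.*still Primary, demoting")

def demote_retries(lines):
    """Return {resource: (attempts, seconds between first and last attempt)}."""
    seen = {}
    for line in lines:
        m = DEMOTE_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
        seen.setdefault(m.group(2), []).append(ts)
    return {
        res: (len(stamps), (stamps[-1] - stamps[0]).total_seconds())
        for res, stamps in seen.items()
    }
```

In the excerpt above, the demote is retried roughly once per second while drbd-cephmon is still mounted via /dev/drbd9.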