mariadb fails when standby controller rebooted in AIO-DX
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
StarlingX | Fix Released | High | Bin Qian | | |
Bug Description
Brief Description
-----------------
After force rebooting the standby controller in an AIO-DX (two-node) system, mariadb was unavailable until the standby controller came back into service (which can take about 10 minutes).
Severity
--------
Critical: If the standby controller had been powered down instead of rebooted, this would result in a complete outage of all openstack services.
Steps to Reproduce
------------------
Install an AIO-DX system and the stx-openstack application. Force reboot the standby controller.
Expected Behavior
------------------
When the standby controller is rebooted, the mariadb pod on the active controller should continue to function (after a brief disruption when the peer is lost).
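One way to tell a "brief disruption" from the roughly 10 minute outage described in this report is to poll the database's readiness during the test and measure the longest unavailable window. The sketch below only does the measurement. The function name `longest_outage`, the sample format and the sample values are hypothetical, and how the readiness samples are collected (for example by polling the pod's READY column) is left to the tester.

```python
from datetime import datetime, timedelta


def longest_outage(samples):
    """Return the longest continuous span during which the probe was not ready.

    `samples` is a list of (timestamp, ready) tuples sorted by timestamp.
    This format is an assumption made for illustration only.
    """
    longest = timedelta(0)
    outage_start = None
    for timestamp, ready in samples:
        if not ready and outage_start is None:
            # First failed probe of a new outage window.
            outage_start = timestamp
        elif ready and outage_start is not None:
            # Service is back: close the window and keep the longest one.
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    # An outage still open at the last sample is counted up to that sample.
    if outage_start is not None and samples:
        longest = max(longest, samples[-1][0] - outage_start)
    return longest


# Hypothetical probe results taken every 30 seconds (not data from this bug).
start = datetime(2000, 1, 1, 12, 0, 0)
samples = [
    (start, True),
    (start + timedelta(seconds=30), False),
    (start + timedelta(seconds=60), False),
    (start + timedelta(seconds=90), True),
]
print(longest_outage(samples))
```

A pass/fail threshold for the result is a test-design choice; this report does not define what counts as "brief".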
Actual Behavior
----------------
The mariadb pod on the active controller (controller-0) did not become primary, so the database was inaccessible. The mariadb pod on controller-0 shows 0/1 containers ready:
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mariadb-
mariadb-
mariadb-
mariadb-
mariadb-server-0 1/1 Terminating 0 29m 172.16.167.35 controller-1 <none> <none>
mariadb-server-1 0/1 Running 0 10m 172.16.192.112 controller-0 <none> <none>
The /var/log/
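The pod listing above can be checked mechanically when reproducing this issue. The following is a minimal sketch, assuming the whitespace-separated column layout shown in the listing (NAME, READY, STATUS first). The helper names `parse_pod_line` and `unhealthy_pods` are hypothetical, and the sample text is the two complete rows from this report.

```python
def parse_pod_line(line):
    """Extract name, ready counts and status from one pod row."""
    fields = line.split()
    name, ready, status = fields[0], fields[1], fields[2]
    ready_count, total = (int(n) for n in ready.split("/"))
    return {"name": name, "ready": ready_count, "total": total, "status": status}


def unhealthy_pods(output):
    """Return pods that are not Running or have containers that are not ready."""
    pods = []
    for line in output.strip().splitlines():
        # Skip blank lines and the header row.
        if not line.strip() or line.startswith("NAME"):
            continue
        pod = parse_pod_line(line)
        if pod["status"] != "Running" or pod["ready"] < pod["total"]:
            pods.append(pod)
    return pods


SAMPLE = """\
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
mariadb-server-0 1/1 Terminating 0 29m 172.16.167.35 controller-1 <none> <none>
mariadb-server-1 0/1 Running 0 10m 172.16.192.112 controller-0 <none> <none>
"""

for pod in unhealthy_pods(SAMPLE):
    print(f"{pod['name']}: {pod['ready']}/{pod['total']} ready, status {pod['status']}")
```

Both rows are flagged here: mariadb-server-0 because it is Terminating on the rebooted node, and mariadb-server-1 because it is Running but reports 0/1 containers ready.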
Reproducibility
---------------
Intermittent: I have seen this several times now.
System Configuration
-------
AIO-DX (two node) system
Branch/Pull Time/Commit
-------
Designer load:
BUILD_DATE=
Last Pass
---------
Unsure
Timestamp/Logs
--------------
The collect logs for controller-0 and controller-1 will be attached. Some relevant timestamps:
2019-07-23T17:08:11 - force reboot of controller-1 - mariadb did not recover until controller-1 recovered
2019-07-23T18:53:30 - deleted mariadb pod on controller-0 (with kubectl) and pod recovered
2019-07-23T19:01:26 - force reboot of controller-1 - mariadb did not recover until controller-1 recovered
Test Activity
-------------
Developer testing
tags: added: in-r-stx20
Marking as stx.2.0 / high priority given the system impact