Multinode MariaDB 10.5 (galera 4) deployments may fail on WSREP
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kolla-ansible | Fix Released | High | Radosław Piliszek |
Wallaby | Fix Released | High | Radosław Piliszek |
Xena | Fix Released | High | Radosław Piliszek |
Yoga | Fix Released | High | Radosław Piliszek |
Bug Description
There seems to be a bug in Galera that causes TASK [mariadb : Check MariaDB service WSREP sync status] to fail.
One node (in a 3-node cluster) or more (possible in clusters with more than 3 nodes) may "lose the race" and get stuck in the "initialized" state of WSREP. The failure is entirely random, as is the case with most race conditions.
Restarting the MariaDB service on the affected node fixes the situation, but it's unwieldy.
The above may happen because Kolla Ansible starts and waits for all new nodes at once.
This did not bother the old galera (galera 3), which figured out the ordering for itself and let each node join the cluster properly.
The proposed workaround is to start and wait for nodes serially.
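The serial start-and-wait idea can be sketched roughly as follows. This is only an illustration of the ordering, not the actual Kolla Ansible change: `start_nodes_serially`, `start_node` and `get_wsrep_state` are hypothetical names, and the sketch assumes the caller can obtain each node's WSREP state as a string (for example from the `wsrep_local_state_comment` status variable), with "Synced" meaning the node has joined.

```python
import time
from typing import Callable, Iterable, List


def start_nodes_serially(
    nodes: Iterable[str],
    start_node: Callable[[str], None],       # hypothetical: starts MariaDB on one node
    get_wsrep_state: Callable[[str], str],   # hypothetical: returns the node's WSREP state
    timeout: float = 60.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Start nodes one at a time, waiting for each to sync before the next."""
    synced: List[str] = []
    for node in nodes:
        start_node(node)
        deadline = clock() + timeout
        # Poll until this node reports "Synced"; only then move on, so that
        # no two new nodes are joining the cluster at the same time.
        while get_wsrep_state(node) != "Synced":
            if clock() >= deadline:
                raise TimeoutError(f"{node} did not reach Synced within {timeout}s")
            sleep(poll_interval)
        synced.append(node)
    return synced
```

The `sleep` and `clock` parameters are injectable only so the loop can be exercised without real delays; a node that stays in "initialized" surfaces as a `TimeoutError` for that specific node rather than as a failure of the whole batch.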
tags: added: mariadb
tags: added: galera
tags: added: multinode
In Wallaby, this affects only Debian, as only Debian uses MariaDB 10.5 and galera 4.