New MariaDB instance deployment fails due to a missing container, or may break the cluster
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| kolla-ansible | Fix Released | Medium | Radosław Piliszek | |
| Rocky | Won't Fix | Medium | Radosław Piliszek | |
| Stein | Won't Fix | Medium | Radosław Piliszek | |
| Train | Fix Released | Medium | Radosław Piliszek | |
| Ussuri | Fix Released | Medium | Radosław Piliszek | |
Bug Description
Scenario:
- have a working kolla-ansible (k-a) deployment with MariaDB on n hosts
- add an (n+1)th host to the mariadb group and make it the first entry (easily achievable with an INI inventory, happens randomly with YAML)
- k-a Stein will fail, complaining about a missing container:
```
RUNNING HANDLER [mariadb : remove restart policy from master mariadb]
Error response from daemon: No such container: mariadb
```
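The reproduction step of putting the new host first can be sketched as follows. This is an illustrative INI inventory fragment, not taken from the bug report; the host names are hypothetical:

```ini
# Before: working two-node cluster
# [mariadb]
# node1
# node2

# After: new-host added *first* in the group, which triggers the bug
[mariadb]
new-host
node1
node2
```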
This has been reported by users before, but the AIO case had a different root cause: the volume already existed while the container did not (see https:/
In the present scenario neither the volume nor the container exists, so it *should* work, yet it fails.
The culprit is that lookup_cluster ultimately does *not* register the master mariadb server properly and keeps the first host in the group as master instead. This breaks the Stein deploy action completely.
In other series (Rocky, Train) this violates the handler logic, which may break the MariaDB cluster if container images are about to change in the same action. Otherwise those series are affected only cosmetically, in that k-a claims it starts the master on the new node. :-)
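The flaw described above can be illustrated with a minimal sketch (this is not kolla-ansible code; the function and host names are hypothetical). Picking the first host in the group as master is wrong once a fresh, empty host is prepended; a correct selection would prefer a host that already holds cluster data:

```python
# Sketch of the master-selection flaw: "first host in group" vs.
# "first host that already has cluster data". Hypothetical names.

def pick_master_naive(mariadb_group, has_existing_data):
    """Buggy: always take the first host in the group."""
    return mariadb_group[0]

def pick_master_fixed(mariadb_group, has_existing_data):
    """Prefer a host that already holds cluster data; fall back to first."""
    for host in mariadb_group:
        if has_existing_data.get(host):
            return host
    return mariadb_group[0]

# Existing two-node cluster; a fresh host was added *first* in the group.
group = ["new-host", "node1", "node2"]
data = {"new-host": False, "node1": True, "node2": True}

print(pick_master_naive(group, data))  # → new-host (empty, no container!)
print(pick_master_fixed(group, data))  # → node1
```

With the naive selection, the deploy then tries to restart a "master" container on new-host, where no mariadb container exists yet, matching the error seen above.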
PS: It looks like this has been broken since the refactoring in Pike: https:/
tags: added: galera wsrep
Fix proposed to branch: master
Review: https://review.opendev.org/700785