mariadb recovery required too often/pc.recovery not working

Bug #1636302 reported by Mark Casey
This bug affects 2 people
Affects: kolla-ansible
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned
Milestone: none

Bug Description

In my opinion, this is why the mariadb container so often complains that it cannot find the primary view, even when the entire cluster was shut down relatively cleanly and all at once.

Upstream, when MariaDB with Galera was added to systemd platforms, pc.recovery did not work [1], because the pc.recovery feature is not entirely contained in the mysqld binary; it requires support from the init scripts/units. This patch fixes it upstream [2].

I believe Kolla's custom daemon-start process would have to adopt some of these aspects for pc.recovery to work with Kolla. If this is done, mariadb-recovery would be required much less often. mariadb-recovery will always be either risky or burdensome for the operator, because it requires choosing a CORRECT node to [re]bootstrap from.

Alternatively, we could implement the actions required for pc.recovery inside mariadb-recovery and decide there whether to (re)bootstrap (i.e., if pc.recovery worked, no rebootstrap is needed). The reasons for doing it there: there is already some discussion of running 'mysqld --wsrep-recover' when choosing the [re]bootstrap node, pc.recovery also requires running it, and Kolla's mysqld startup is already custom.
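
To illustrate what choosing a correct node involves, here is a minimal sketch. It assumes each node's recovered position is already available as a "uuid:seqno" string, and it applies the common rule of bootstrapping from the node with the highest seqno. The function name and host names are hypothetical and are not part of Kolla or mariadb-recovery.

```python
def pick_bootstrap_node(positions):
    """Return the host with the most advanced recovered position.

    positions maps a host name to a "uuid:seqno" string
    (hypothetical input format chosen for this sketch).
    """
    parsed = {}
    for host, pos in positions.items():
        # Split on the last colon: everything before it is the state UUID.
        uuid, _, seqno = pos.rpartition(":")
        parsed[host] = (uuid, int(seqno))

    # All nodes must belong to the same cluster history to be comparable.
    uuids = {uuid for uuid, _ in parsed.values()}
    if len(uuids) != 1:
        raise ValueError("nodes disagree on the cluster state UUID")

    # The node with the highest seqno has seen the most transactions.
    return max(parsed, key=lambda host: parsed[host][1])
```

If two nodes report the same seqno, this sketch simply returns whichever of them comes first in the mapping.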

[1]: https://jira.mariadb.org/browse/MDEV-10004
[2]: http://lists.askmonty.org/pipermail/commits/2016-May/009384.html

A quote from [2]:
> Galera recovery process works in two phases. In the first
> phase, mysqld is started as non-daemon with --wsrep-recover
> to recover and fetch the last logged global transaction ID.
> This ID is then used in second phase as the start position
> (--wsrep-start-position=XX) to start mysqld as daemon.

> As this process was implemented in mysqld_safe script, the
> recovery did not work when server was started using systemd.

> Fixed by introducing a shell script (wsrep_recovery.sh) that
> mimics the first phase of the recovery process.
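
The two phases quoted above could be glued together roughly as follows. This is a sketch, not Kolla's or upstream's implementation: it assumes the log written by the --wsrep-recover run contains a line of the form "WSREP: Recovered position: <uuid>:<seqno>", and the helper names are hypothetical.

```python
import re

# Assumed log line format: "... WSREP: Recovered position: <uuid>:<seqno>"
_RECOVERED = re.compile(r"WSREP: Recovered position:\s*([0-9a-fA-F-]+):(-?\d+)")


def parse_recovered_position(log_text):
    """Phase 1 result: extract "uuid:seqno" from the recovery log, or None."""
    matches = _RECOVERED.findall(log_text)
    if not matches:
        return None
    # Use the last match in case the log holds several recovery runs.
    uuid, seqno = matches[-1]
    return f"{uuid}:{seqno}"


def daemon_args(position):
    """Phase 2: build the mysqld argument list for the daemon start."""
    args = ["mysqld"]
    if position is not None:
        args.append(f"--wsrep-start-position={position}")
    return args
```

A caller would run mysqld with --wsrep-recover first, pass the resulting log text to parse_recovered_position, and then start the daemon with the arguments from daemon_args.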

Changed in kolla:
status: New → Triaged
importance: Undecided → Wishlist
Changed in kolla:
milestone: none → ocata-3
Changed in kolla:
milestone: ocata-3 → ocata-rc1
Changed in kolla:
milestone: ocata-rc1 → pike-1
Changed in kolla:
milestone: pike-2 → pike-3
Changed in kolla:
milestone: pike-3 → pike-rc1
Changed in kolla:
milestone: pike-rc1 → queens-1
Changed in kolla:
milestone: queens-2 → queens-3
Mark Goddard (mgoddard)
affects: kolla → kolla-ansible
Changed in kolla-ansible:
milestone: queens-3 → none