seeded file goes missing after an SST triggered by data inconsistency

Bug #2000107 reported by Hua Zhang
This bug affects 2 people
Affects:      OpenStack Percona Cluster Charm
Status:       Fix Committed
Importance:   Undecided
Assigned to:  Unassigned
Milestone:    (none)

Bug Description

The fix for LP bug 1868326 handles two situations in which the seeded file is lost, but it does not cover a third. The following customer logs show that an SST can be triggered by data inconsistency. The SST removes the seeded file, and the charm never recreates it.

2022-10-18T04:10:55.963214Z 48 [ERROR] Slave SQL: Could not execute Delete_rows event on table workloadmgr.setting_metadata; Can't find record in 'setting_metadata', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 352, Error_code: 1032
2022-10-18T04:10:55.963233Z 48 [Warning] WSREP: RBR event 3 Delete_rows apply warning: 120, 1426184637
......

2022-10-18T04:10:55.965096Z 48 [ERROR] WSREP: Failed to apply trx 1426184637 4 times
2022-10-18T04:10:55.965107Z 48 [ERROR] WSREP: Node consistency compromised, aborting...
2022-10-18T04:10:55.965217Z 48 [Note] WSREP: turning isolation on
......

2022-10-18T04:11:00.966995Z 48 [Note] WSREP: /usr/sbin/mysqld: Terminated.
Aborted
2022-10-18T04:11:01.269881Z mysqld_safe Number of processes running now: 0
2022-10-18T04:11:01.274788Z mysqld_safe WSREP: sleeping 15 seconds before restart
2022-10-18T04:11:16.280616Z mysqld_safe mysqld restarted
2022-10-18T04:11:16.308401Z mysqld_safe WSREP: Running position recovery with --log_error='/var/lib/percona-xtradb-cluster/wsrep_recovery.9kU4kC' --pid-file='/var/lib/percona-xtradb-cluster/juju-de2b34-26-lxd-7-recover.pid'
2022-10-18T04:11:25.664909Z mysqld_safe WSREP: Recovered position ea5dc9d5-351f-11eb-9431-aa4bc52e10af:1426184636
Log of wsrep recovery (--wsrep-recover):
2022-10-18T04:11:16.759233Z 0 [Note] /usr/sbin/mysqld (mysqld 5.7.20-18-18-log) starting as process 368837 ...
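
For triage, a minimal sketch (not part of the charm) that scans a mysqld error log for the consistency abort shown above. The marker string is copied from the excerpt; the function name `find_consistency_aborts` and the timestamp pattern are assumptions made for illustration.

```python
import re

# Marker text copied from the log excerpt in this report.
ABORT_MARKER = "WSREP: Node consistency compromised, aborting"
# Leading timestamp as printed by mysqld, e.g. 2022-10-18T04:10:55.965107Z
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z)\s")


def find_consistency_aborts(lines):
    """Return the timestamp (or None) of each line reporting a consistency abort."""
    hits = []
    for line in lines:
        if ABORT_MARKER in line:
            match = TIMESTAMP.match(line)
            hits.append(match.group(1) if match else None)
    return hits


# Three lines taken from the excerpt; only the middle one is an abort.
sample = [
    "2022-10-18T04:10:55.965096Z 48 [ERROR] WSREP: Failed to apply trx 1426184637 4 times",
    "2022-10-18T04:10:55.965107Z 48 [ERROR] WSREP: Node consistency compromised, aborting...",
    "2022-10-18T04:10:55.965217Z 48 [Note] WSREP: turning isolation on",
]
print(find_consistency_aborts(sample))
```

A hit here only indicates that the node aborted and restarted. Whether the restart led to an SST still has to be confirmed from the rest of the log.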

tags: added: sts
Rafael Lopez (rafael.lopez) wrote:

This is reproducible by triggering an SST to a secondary node. One way to do this:

1. Deploy the charm with 3 units and min-cluster-size set to 2, e.g.:
juju deploy -n 3 --series bionic percona-cluster --config min-cluster-size=2

2. Log into a non-leader node, stop mysql, delete /var/lib/percona-xtradb-cluster/grastate.dat, and start mysql again:
juju ssh {non leader}
sudo systemctl stop mysql
sudo rm /var/lib/percona-xtradb-cluster/grastate.dat
sudo systemctl start mysql

This triggers an SST that juju is not aware of. The SST wipes out most of /var/lib/percona-xtradb-cluster/ (normal operation, as documented in [1]), including the 'seeded' file.

3. After the database completes the SST, the Percona cluster is fully recovered. However, /var/lib/percona-xtradb-cluster/seeded is missing on the non-leader node, and juju status shows the unit stuck waiting to bootstrap:
Unit                Workload  Agent  Machine  Public address  Ports     Message
percona-cluster/0*  active    idle   37       10.133.201.64   3306/tcp  Unit is ready
percona-cluster/1   waiting   idle   38       10.133.201.185  3306/tcp  Unit waiting to bootstrap ('seeded' file missing)
percona-cluster/2   active    idle   39       10.133.201.124  3306/tcp  Unit is ready

[1] https://docs.percona.com/percona-xtradb-cluster/5.7/manual/xtrabackup_sst.html
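
To confirm step 3 on a unit without going through juju status, a small sketch along these lines may help. The data directory and the 'seeded' file name come from the steps above; the helper name `seeded_missing` is hypothetical, and the demonstration runs against a scratch directory rather than the real data directory.

```python
import tempfile
from pathlib import Path

# Data directory taken from the reproduction steps; adjust if yours differs.
DATA_DIR = "/var/lib/percona-xtradb-cluster"


def seeded_missing(data_dir=DATA_DIR):
    """True if data_dir exists but contains no 'seeded' marker file."""
    root = Path(data_dir)
    return root.is_dir() and not (root / "seeded").exists()


# Demonstration against a scratch directory instead of the real data dir.
with tempfile.TemporaryDirectory() as scratch:
    print(seeded_missing(scratch))        # marker not created yet
    (Path(scratch) / "seeded").touch()
    print(seeded_missing(scratch))        # marker now present
```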

Changed in charm-percona-cluster:
status: New → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-percona-cluster (master)
Changed in charm-percona-cluster:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-percona-cluster (master)

Reviewed: https://review.opendev.org/c/openstack/charm-percona-cluster/+/869894
Committed: https://opendev.org/openstack/charm-percona-cluster/commit/a86390aeabca27167e26f38afb202e7daf39b185
Submitter: "Zuul (22348)"
Branch: master

commit a86390aeabca27167e26f38afb202e7daf39b185
Author: Rafael Lopez <email address hidden>
Date: Thu Jan 12 05:01:59 2023 +0000

    Additional check to replace missing seeded file

    The additional check is based on cluster being bootstrapped and the last
    backup being a SST.

    The change includes new function for checking the last backup was SST
    and unittests to verify said function as well as the main
    charm_check_func where the check is used and seeded file is replaced.

    Closes-Bug: #2000107
    Signed-off-by: Rafael Lopez <email address hidden>
    Change-Id: I8e516059da5299cc0e0ce8ef0802d3a46abb1a54
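
The condition described in the commit message can be sketched roughly as follows. This is an illustration only, not the merged charm code: the names `should_recreate_seeded` and `ensure_seeded` are hypothetical, the two boolean inputs stand in for whatever the charm actually uses to decide "cluster is bootstrapped" and "last backup was an SST", and writing an empty marker file is an assumption about what the 'seeded' file needs to contain.

```python
import tempfile
from pathlib import Path


def should_recreate_seeded(cluster_bootstrapped, last_backup_was_sst, seeded_exists):
    """Recreate the marker only when both conditions hold and it is missing."""
    return cluster_bootstrapped and last_backup_was_sst and not seeded_exists


def ensure_seeded(data_dir, cluster_bootstrapped, last_backup_was_sst):
    """Create data_dir/seeded if the check passes; return True if it was created."""
    marker = Path(data_dir) / "seeded"
    if should_recreate_seeded(cluster_bootstrapped, last_backup_was_sst, marker.exists()):
        marker.touch()  # assumption: an empty marker file is sufficient
        return True
    return False


# Demonstration in a scratch directory standing in for the data directory.
with tempfile.TemporaryDirectory() as scratch:
    print(ensure_seeded(scratch, True, False))  # last backup not an SST: no action
    print(ensure_seeded(scratch, True, True))   # bootstrapped and SST: marker created
    print(ensure_seeded(scratch, True, True))   # marker already present: no action
```

Requiring both conditions keeps the check from marking a unit as seeded when the file is missing for some other reason, such as a unit that has never joined the cluster.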

Changed in charm-percona-cluster:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-percona-cluster (stable/bionic)

Fix proposed to branch: stable/bionic
Review: https://review.opendev.org/c/openstack/charm-percona-cluster/+/874978

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-percona-cluster (stable/bionic)

Reviewed: https://review.opendev.org/c/openstack/charm-percona-cluster/+/874978
Committed: https://opendev.org/openstack/charm-percona-cluster/commit/dc5790eabd687519c20d00a366888a89aeeb94be
Submitter: "Zuul (22348)"
Branch: stable/bionic

commit dc5790eabd687519c20d00a366888a89aeeb94be
Author: Rafael Lopez <email address hidden>
Date: Thu Jan 12 05:01:59 2023 +0000

    Additional check to replace missing seeded file

    The additional check is based on cluster being bootstrapped and the last
    backup being a SST.

    The change includes new function for checking the last backup was SST
    and unittests to verify said function as well as the main
    charm_check_func where the check is used and seeded file is replaced.

    Closes-Bug: #2000107
    Change-Id: I8e516059da5299cc0e0ce8ef0802d3a46abb1a54
    (cherry picked from commit a86390aeabca27167e26f38afb202e7daf39b185)

tags: added: in-stable-bionic