Does not work: MariaDB database backup and restore in kolla-ansible

Bug #1952966 reported by Marko Sluga
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
kolla-ansible
New
Undecided
Unassigned

Bug Description

- [x] This doc is inaccurate in this way: The restore procedure on ubuntu does not work. Tried wallaby and xena with the same problem

Restore:

1. I create a restore container with the volume attached:

sudo docker run --rm -it --volumes-from mariadb --name dbrestore --volume mariadb_backup:/backup kolla/ubuntu-binary-mariadb-server:xena /bin/bash

2. I iun the restore commands inside the restore container as per the documentation:
cd /backup
rm -rf /backup/restore
mkdir -p /backup/restore/full
gunzip mysqlbackup-01-12-2021-1638392789.qp.xbc.xbs.gz
mbstream -x -C /backup/restore/full/ < mysqlbackup-01-12-2021-1638392789.qp.xbc.xbs
mariabackup --prepare --target-dir /backup/restore/full

Result is: [00] 2021-12-01 22:24:41 completed OK!

exit

3. I stop the restore container:
sudo docker stop mariadb

4. I start up a new mariadb container:
sudo docker run --rm -it --volumes-from mariadb --name dbrestore --volume mariadb_backup:/backup kolla/ubuntu-binary-mariadb-server:xena /bin/bash

5. I restore mariadb to the new container:
rm -rf /var/lib/mysql/*
rm -rf /var/lib/mysql/\.[^\.]*
mariabackup --copy-back --target-dir /backup/restore/full

Result is [00] 2021-12-01 22:25:48 completed OK!

exit

6. I start mariadb:
sudo docker start mariadb

Check that it started:
sudo docker logs mariadb

output is:

Running command: '/usr/bin/mysqld_safe'
211201 21:26:20 mysqld_safe Logging to '/var/log/kolla/mariadb/mariadb.log'.
211201 21:26:20 mysqld_safe Starting mariadbd daemon with databases from /var/lib/mysql/
Running command: '/usr/bin/mysqld_safe'
211201 22:26:52 mysqld_safe Logging to '/var/log/kolla/mariadb/mariadb.log'.
211201 22:26:52 mysqld_safe Starting mariadbd daemon with databases from /var/lib/mysql/
Running command: '/usr/bin/mysqld_safe'
211201 22:27:26 mysqld_safe Logging to '/var/log/kolla/mariadb/mariadb.log'.
211201 22:27:26 mysqld_safe Starting mariadbd daemon with databases from /var/lib/mysql/

It does not show "Recovered position" or anything like what the documentation shows and is borked.

Running kolla-ansible -i ~/multinode mariadb_recovery fixes it but returns it to the last known database state not to the full backup that I am trying to restore.

Thanks!

If you have a troubleshooting or support issue, use the following resources:

 - The mailing list: https://lists.openstack.org
 - IRC: 'openstack' channel on OFTC

-----------------------------------
Release: 13.1.0.dev51 on 2021-01-18 23:14:14
SHA: fa2a0d572bd195a56e3843e024ac1fdada8e1764
Source: https://opendev.org/openstack/kolla-ansible/src/doc/source/admin/mariadb-backup-and-restore.rst
URL: https://docs.openstack.org/kolla-ansible/latest/admin/mariadb-backup-and-restore.html

Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

What do the mariadb logs in /var/log/kolla/mariadb say?

Changed in kolla-ansible:
status: New → Invalid
status: Invalid → Incomplete
Revision history for this message
Marko Sluga (markosluga) wrote :
Download full text (9.9 KiB)

This is the madiadb log right after I restore:

211202 14:02:53 mysqld_safe mysqld from pid file /var/lib/mysql/mariadb.pid ended
211202 14:03:04 mysqld_safe Starting mariadbd daemon with databases from /var/lib/mysql/
211202 14:03:04 mysqld_safe WSREP: Running position recovery with --disable-log-error --pid-file='/var/lib/mysql//kolla05-recover.pid'
211202 14:03:05 mysqld_safe WSREP: Recovered position 5f563bc7-52de-11ec-a1eb-4eaf1c89ec27:10082
2021-12-02 14:03:05 0 [Note] /usr/sbin/mariadbd (mysqld 10.5.13-MariaDB-1:10.5.13+maria~focal-log) starting as process 226 ...
2021-12-02 14:03:05 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 5f563bc7-52de-11ec-a1eb-4eaf1c89ec27:10082
2021-12-02 14:03:05 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2021-12-02 14:03:05 0 [Note] WSREP: wsrep_load(): Galera 26.4.9(r819f29cb) by Codership Oy <email address hidden> loaded successfully.
2021-12-02 14:03:05 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2021-12-02 14:03:05 0 [Warning] WSREP: Could not open state file for reading: '/var/lib/mysql//grastate.dat'
2021-12-02 14:03:05 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2021-12-02 14:03:05 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 0
UUID: 00000000-0000-0000-0000-000000000000
Seqno: -1 - -1
Offset: -1
Synced: 0
2021-12-02 14:03:05 0 [Note] WSREP: Skipped GCache ring buffer recovery: could not determine history UUID.
2021-12-02 14:03:05 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 192.168.1.155; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.listen_addr = tcp://192.168.1.155:4567; gmcast.segment = 0; gmcast.version
2021-12-02 14:03:05 0 [Note] WSREP: Start replication
2021-12-02 14:03:05 0 [Note] WSREP: Connecting with bootstrap option: 0
2021-12-02 14:03:05 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2021-12-02 14:03:05 0 [Note] WSREP: protonet asio version 0
2021-12-02 14:03:05 0 [Note] WSREP: Using CRC-32C for message checksums.
2021-12-02 14:03:05 0 [Note] WSREP: backend: asio
2021-12-02 14:03:05 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2021-12-02 14:03:05 0 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directo...

Revision history for this message
Marko Sluga (markosluga) wrote :
Download full text (4.4 KiB)

I actually found a fix.

Restoring the database with the steps described in the on ALL the controller nodes at the same time and then running:

kolla-ansible -i ~/multinode mariadb_recovery

will actually bring the cluster up to the state recorded in the backup.

This is the log after the recovery is done:

2021-12-02 15:57:23 6 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-12-02 15:57:23 6 [Note] WSREP: Lowest cert index boundary for CC from group: 3
2021-12-02 15:57:23 6 [Note] WSREP: Min available from gcache for CC from group: 2
2021-12-02 15:57:24 0 [Note] WSREP: Member 1.0 (kolla06) requested state transfer from '*any*'. Selected 0.0 (kolla05)(SYNCED) as donor.
2021-12-02 15:57:24 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 82)
2021-12-02 15:57:24 6 [Note] WSREP: Detected STR version: 1, req_len: 131, req: STRv1
2021-12-02 15:57:24 6 [Note] WSREP: IST request: 7caefad7-5388-11ec-8ce9-0a736430e74d:79-81|tcp://192.168.1.156:4568
2021-12-02 15:57:24 6 [Note] WSREP: Server status change synced -> donor
2021-12-02 15:57:24 6 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-12-02 15:57:24 6 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-12-02 15:57:24 0 [Note] WSREP: Running: 'wsrep_sst_mariabackup --role 'donor' --address '192.168.1.156:4444/xtrabackup_sst//1' --local-port '3306' --socket '/run/mysqld/mysqld.sock' --datadir '/var/lib/mysql/' --gtid '7caefad7-5388-11ec-8ce9-0a736430e74d:79' --gtid-domain-id '0' --binlog 'mysql-bin' --bypass --mysqld-args --basedir=/usr --datadir=/var/lib/mysql/ --plugin-dir=/usr/lib/mysql/plugin --wsrep_provider=/usr/lib/galera/libgalera_smm.so --wsrep_on=ON --log-error=/var/log/kolla/mariadb/mariadb.log --pid-file=/var/lib/mysql/mariadb.pid --port=3306 --wsrep_start_position=5f563bc7-52de-11ec-a1eb-4eaf1c89ec27:10082'
2021-12-02 15:57:24 0 [Note] WSREP: Donor monitor thread started to monitor
2021-12-02 15:57:24 6 [Note] WSREP: sst_donor_thread signaled with 0
2021-12-02 15:57:24 0 [Note] WSREP: async IST sender starting to serve tcp://192.168.1.156:4568 sending 3-81, preload starts from 3
2021-12-02 15:57:24 0 [Note] WSREP: IST sender 3 -> 81
WSREP_SST: [INFO] SSL configuration: CA='', CERT='', KEY='', MODE='DISABLED', encrypt='0' (20211202 15:57:24.107)
WSREP_SST: [INFO] Moving '/var/lib/mysql/mariabackup.prepare.log' to '/tmp/sst_log_archive/mariabackup.prepare.log.2021.12.02-15.57.24.189712156' (20211202 15:57:24.192)
WSREP_SST: [INFO] Moving '/var/lib/mysql/mariabackup.move.log' to '/tmp/sst_log_archive/mariabackup.move.log.2021.12.02-15.57.24.189712156' (20211202 15:57:24.196)
WSREP_SST: [INFO] Streaming with mbstream (20211202 15:57:24.205)
WSREP_SST: [INFO] Using socat as streamer (20211202 15:57:24.212)
WSREP_SST: [INFO] Bypassing the SST for IST (20211202 15:57:24.224)
WSREP_SST: [INFO] Evaluating '/usr//bin/mbstream' -c 'xtrabackup_galera_info' 'xtrabackup_ist' | socat -u stdio TCP:192.168.1.156:4444; RC=( ${PIPESTATUS[@]} ) (20211202 15:57:24.256)
2021-12-02 15:57:24 0 [Note] WSREP: SST sent: 7caefad7-5388-11ec-8ce9-0a736430e74d:79
2021-12-02 15:57:24 0 [Note] WSREP: Server status change donor -> joine...

Read more...

Revision history for this message
Marko Sluga (markosluga) wrote :

Can anyone explain why kolla-ansible -i ~/multinode mariadb_recovery is needed to get he cluster up and running again?

Is this a thing that needs to be done and the documentation is missing or is this specific to Ubunut?

Changed in kolla-ansible:
status: Incomplete → New
Revision history for this message
Yusuf Güngör (yusuf2) wrote :

Hi, MariaDB backup restore documentation illustrates backup-restore for one node mariadb setup. How about backup-restore for a 3 or 5 node MariaDB cluster?

Is it ok to still get backup for first controller?
Do we need to stop all mariadb containers and apply the restore steps to all mariadb containers?
Do we need to run mariadb_recovery?

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.