[2023.1] deployment fail at mariadb : Restart MariaDB container

Bug #2023937 reported by KangWoo Lee
This bug affects 2 people
Affects                   Status       Importance  Assigned to  Milestone
kolla-ansible             In Progress  Undecided   Unassigned
kolla-ansible (Antelope)  In Progress  Undecided   Unassigned

Bug Description

A 2023.1 kolla-ansible deployment fails during MariaDB cluster deployment.

TASK [mariadb : Ensure MariaDB is running normally on bootstrap host] ***************************************************************************************************************************************
changed: [r9-control3]
included: /usr/local/share/kolla-ansible/ansible/roles/mariadb/tasks/restart_services.yml for r9-control1

RUNNING HANDLER [mariadb : Restart MariaDB container] *******************************************************************************************************************************************************
changed: [r9-control1]
FAILED - RETRYING: [r9-control2]: Wait for MariaDB service port liveness (10 retries left).
FAILED - RETRYING: [r9-control3]: Wait for MariaDB service port liveness (10 retries left).
FAILED - RETRYING: [r9-control2]: Wait for MariaDB service port liveness (9 retries left).
FAILED - RETRYING: [r9-control3]: Wait for MariaDB service port liveness (9 retries left).
FAILED - RETRYING: [r9-control2]: Wait for MariaDB service port liveness (8 retries left).
FAILED - RETRYING: [r9-control3]: Wait for MariaDB service port liveness (8 retries left).
FAILED - RETRYING: [r9-control2]: Wait for MariaDB service port liveness (7 retries left).
FAILED - RETRYING: [r9-control3]: Wait for MariaDB service port liveness (7 retries left).
FAILED - RETRYING: [r9-control3]: Wait for MariaDB service port liveness (6 retries left).
FAILED - RETRYING: [r9-control2]: Wait for MariaDB service port liveness (6 retries left).
FAILED - RETRYING: [r9-control3]: Wait for MariaDB service port liveness (5 retries left).
FAILED - RETRYING: [r9-control2]: Wait for MariaDB service port liveness (5 retries left).
FAILED - RETRYING: [r9-control3]: Wait for MariaDB service port liveness (4 retries left).
FAILED - RETRYING: [r9-control2]: Wait for MariaDB service port liveness (4 retries left).
FAILED - RETRYING: [r9-control3]: Wait for MariaDB service port liveness (3 retries left).
FAILED - RETRYING: [r9-control2]: Wait for MariaDB service port liveness (3 retries left).
FAILED - RETRYING: [r9-control3]: Wait for MariaDB service port liveness (2 retries left).
FAILED - RETRYING: [r9-control2]: Wait for MariaDB service port liveness (2 retries left).
FAILED - RETRYING: [r9-control3]: Wait for MariaDB service port liveness (1 retries left).
FAILED - RETRYING: [r9-control2]: Wait for MariaDB service port liveness (1 retries left).

TASK [mariadb : Restart MariaDB container] ******************************************************************************************************************************************************************
fatal: [r9-control2]: FAILED! => {"attempts": 10, "changed": false, "elapsed": 60, "msg": "Timeout when waiting for search string MariaDB in 10.33.0.12:3306"}

TASK [mariadb : Restart MariaDB container] ******************************************************************************************************************************************************************
fatal: [r9-control3]: FAILED! => {"attempts": 10, "changed": false, "elapsed": 60, "msg": "Timeout when waiting for search string MariaDB in 10.33.0.13:3306"}

RUNNING HANDLER [mariadb : Wait for MariaDB service port liveness] ******************************************************************************************************************************************
FAILED - RETRYING: [r9-control1]: Wait for MariaDB service port liveness (10 retries left).
FAILED - RETRYING: [r9-control1]: Wait for MariaDB service port liveness (9 retries left).
FAILED - RETRYING: [r9-control1]: Wait for MariaDB service port liveness (8 retries left).
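For reference, the liveness check that is timing out here can be approximated by hand. A rough shell equivalent, assuming (from the "Timeout when waiting for search string MariaDB" messages above) that the task simply connects to port 3306 and greps the MySQL handshake for the string "MariaDB"; substitute the address of the node you are probing:

# hypothetical manual probe of the port-liveness condition (requires bash /dev/tcp)
timeout 10 bash -c 'exec 3<>/dev/tcp/10.33.0.12/3306 && head -c 64 <&3 | grep -aq MariaDB' \
    && echo "liveness OK" || echo "liveness FAILED"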

----------- control3 mariadb.log
2023-06-15 8:31:46 0 [Note] WSREP: save pc into disk
2023-06-15 8:31:46 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
2023-06-15 8:31:46 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
2023-06-15 8:31:46 0 [Note] WSREP: STATE EXCHANGE: sent state msg: a0eae2d3-0b0b-11ee-9aec-aa932c7b947b
2023-06-15 8:31:46 0 [Note] WSREP: STATE EXCHANGE: got state msg: a0eae2d3-0b0b-11ee-9aec-aa932c7b947b from 0 (r9-control1)
2023-06-15 8:31:46 0 [Note] WSREP: STATE EXCHANGE: got state msg: a0eae2d3-0b0b-11ee-9aec-aa932c7b947b from 2 (r9-control2)
2023-06-15 8:31:46 0 [Note] WSREP: STATE EXCHANGE: got state msg: a0eae2d3-0b0b-11ee-9aec-aa932c7b947b from 1 (r9-control3)
2023-06-15 8:31:46 0 [Warning] WSREP: Quorum: No node with complete state:

        Version : 6
        Flags : 0x1
        Protocols : 2 / 10 / 4
        State : NON-PRIMARY
        Desync count : 0
        Prim state : NON-PRIMARY
        Prim UUID : 00000000-0000-0000-0000-000000000000
        Prim seqno : -1
        First seqno : 1
        Last seqno : 2
        Commit cut : 2
        Last vote : -1.0
        Vote policy : 0
        Prim JOINED : 0
        State UUID : a0eae2d3-0b0b-11ee-9aec-aa932c7b947b
        Group UUID : 93636c74-0b0b-11ee-b4b1-72da202c4633
        Name : 'r9-control1'
        Incoming addr: '10.33.0.11:3306'

        Version : 6
        Flags : 00
        Protocols : 2 / 10 / 4
        State : NON-PRIMARY
        Desync count : 0
        Prim state : PRIMARY
        Prim UUID : 9977cd66-0b0b-11ee-85ec-127e8737e361
        Prim seqno : 2
        First seqno : -1
        Last seqno : 2
        Commit cut : 0
        Last vote : -1.0
        Vote policy : 0
        Prim JOINED : 1
        State UUID : a0eae2d3-0b0b-11ee-9aec-aa932c7b947b
        Group UUID : 93636c74-0b0b-11ee-b4b1-72da202c4633
        Name : 'r9-control3'
        Incoming addr: '10.33.0.13:3306'

        Version : 6
        Flags : 00
        Protocols : 2 / 10 / 4
        State : NON-PRIMARY
        Desync count : 0
        Prim state : JOINER
        Prim UUID : 9977cd66-0b0b-11ee-85ec-127e8737e361
        Prim seqno : 2
        First seqno : -1
        Last seqno : 2
        Commit cut : 0
        Last vote : -1.0
        Vote policy : 0
        Prim JOINED : 1
        State UUID : a0eae2d3-0b0b-11ee-9aec-aa932c7b947b
        Group UUID : 93636c74-0b0b-11ee-b4b1-72da202c4633
        Name : 'r9-control3'
        Incoming addr: '10.33.0.13:3306'

        Version : 6
        Flags : 00
        Protocols : 2 / 10 / 4
        State : NON-PRIMARY
        Desync count : 0
        Prim state : JOINER
        Prim UUID : 9977cd66-0b0b-11ee-85ec-127e8737e361
        Prim seqno : 2
        First seqno : -1
        Last seqno : 2
        Commit cut : 0
        Last vote : -1.0
        Vote policy : 0
        Prim JOINED : 1
        State UUID : a0eae2d3-0b0b-11ee-9aec-aa932c7b947b
        Group UUID : 93636c74-0b0b-11ee-b4b1-72da202c4633
        Name : 'r9-control2'
        Incoming addr: '10.33.0.12:3306'

2023-06-15 8:31:46 0 [Warning] WSREP: No re-merged primary component found.
2023-06-15 8:31:46 0 [Warning] WSREP: No bootstrapped primary component found.
2023-06-15 8:31:46 0 [ERROR] WSREP: /home/buildbot/buildbot/build/gcs/src/gcs_state_msg.cpp:gcs_state_msg_get_quorum():947: Failed to establish quorum.
2023-06-15 8:31:46 0 [Note] WSREP: Quorum results:
        version = 6,
        component = NON-PRIMARY,
        conf_id = -1,
        members = 1/3 (joined/total),
        act_id = -1,
        last_appl. = 0,
        protocols = -1/-1/-1 (gcs/repl/appl),
        vote policy= 1,
        group UUID = 00000000-0000-0000-0000-000000000000
2023-06-15 8:31:46 0 [Note] WSREP: Flow-control interval: [28, 28]
2023-06-15 8:31:46 0 [Note] WSREP: Received NON-PRIMARY.
2023-06-15 8:31:47 2 [Warning] WSREP: Send action {0x7face001f6b0, 151, STATE_REQUEST} returned -107 (Transport endpoint is not connected)
2023-06-15 8:31:48 2 [Warning] WSREP: Send action {0x7face001f6b0, 151, STATE_REQUEST} returned -107 (Transport endpoint is not connected)
2023-06-15 8:31:48 0 [Note] WSREP: (99242c70-bced, 'tcp://10.33.0.13:4567') turning message relay requesting off
2023-06-15 8:31:49 2 [Warning] WSREP: Send action {0x7face001f6b0, 151, STATE_REQUEST} returned -107 (Transport endpoint is not connected)
2023-06-15 8:31:50 2 [Warning] WSREP: Send action {0x7face001f6b0, 151, STATE_REQUEST} returned -107 (Transport endpoint is not connected)
2023-06-15 8:31:51 2 [Warning] WSREP: Send action {0x7face001f6b0, 151, STATE_REQUEST} returned -107 (Transport endpoint is not connected)
...the same warning repeats many times...
WSREP_SST: [ERROR] Possible timeout in receiving first data from donor in gtid stage: exit codes: 124 0 (20230615 08:36:34.835)
WSREP_SST: [ERROR] Cleanup after exit with status: 32 (20230615 08:36:34.836)
WSREP_SST: [ERROR] Removing /tmp/tmp.M7NXuwal17/xtrabackup_galera_info file due to signal (20230615 08:36:34.838)
WSREP_SST: [INFO] Removing the sst_in_progress file (20230615 08:36:34.840)
WSREP_SST: [INFO] Cleaning up temporary directories (20230615 08:36:34.842)
2023-06-15 8:36:34 0 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '10.33.0.13:4444' --datadir '/var/lib/mysql/' --parent 265 --progress 0 --binlog 'mysql-bin' --mysqld-args --basedir=/usr --datadir=/var/lib/mysql/ --plugin-dir=/usr/lib64/mysql/plugin --wsrep_provider=/usr/lib64/galera/libgalera_smm.so --wsrep_on=ON --log-error=/var/log/kolla/mariadb/mariadb.log --pid-file=/var/lib/mysql/mariadb.pid --port=3306 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1: 32 (Broken pipe)
2023-06-15 8:36:34 0 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
2023-06-15 8:36:34 3 [Note] WSREP: SST received
2023-06-15 8:36:34 3 [Note] WSREP: SST received: 00000000-0000-0000-0000-000000000000:-1
2023-06-15 8:36:35 2 [Warning] WSREP: Send action {0x7face001f6b0, 151, STATE_REQUEST} returned -107 (Transport endpoint is not connected)
-------------

----- control1
[root@r9-control1 _data]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2a08f29494da quay.io/openstack.kolla/mariadb-server:2023.1-rocky-9 "dumb-init -- kolla_…" 12 minutes ago Up 12 minutes (unhealthy) mariadb
683a09ae90e2 quay.io/openstack.kolla/keepalived:2023.1-rocky-9 "dumb-init --single-…" 13 minutes ago Up 13 minutes keepalived
343e1fcd8d32 quay.io/openstack.kolla/haproxy:2023.1-rocky-9 "dumb-init --single-…" 13 minutes ago Up 13 minutes (healthy) haproxy
50636a88ef13 quay.io/openstack.kolla/cron:2023.1-rocky-9 "dumb-init --single-…" 14 minutes ago Up 14 minutes cron
6c0b0066481e quay.io/openstack.kolla/kolla-toolbox:2023.1-rocky-9 "dumb-init --single-…" 14 minutes ago Up 14 minutes kolla_toolbox
f7af9e31e885 quay.io/openstack.kolla/fluentd:2023.1-rocky-9 "dumb-init --single-…" 14 minutes ago Up 14 minutes fluentd

[root@r9-control1 _data]# ll /var/lib/docker/volumes/mariadb/_data/
total 242188
-rw-rw---- 1 42434 42434 417792 Jun 15 08:31 aria_log.00000001
-rw-rw---- 1 42434 42434 52 Jun 15 08:31 aria_log_control
-rw-rw---- 1 42434 42434 16384 Jun 15 08:31 ddl_recovery-backup.log
-rw-rw---- 1 42434 42434 134219032 Jun 15 08:31 galera.cache
-rw-rw---- 1 42434 42434 113 Jun 15 08:31 grastate.dat
-rw-rw---- 1 42434 42434 264 Jun 15 08:31 gvwstate.dat
-rw-rw---- 1 42434 42434 946 Jun 15 08:31 ib_buffer_pool
-rw-rw---- 1 42434 42434 12582912 Jun 15 08:31 ibdata1
-rw-rw---- 1 42434 42434 100663296 Jun 15 08:31 ib_logfile0
-rw-rw---- 1 42434 42434 0 Jun 15 08:31 multi-master.info
drwx------ 2 42434 42434 4096 Jun 15 08:31 mysql
-rw-rw---- 1 42434 42434 27596 Jun 15 08:31 mysql-bin.000001
-rw-rw---- 1 42434 42434 2732 Jun 15 08:31 mysql-bin.000002
-rw-rw---- 1 42434 42434 365 Jun 15 08:31 mysql-bin.000003
-rw-rw---- 1 42434 42434 342 Jun 15 08:31 mysql-bin.000004
-rw-rw---- 1 42434 42434 365 Jun 15 08:31 mysql-bin.000005
-rw-rw---- 1 42434 42434 95 Jun 15 08:31 mysql-bin.index
-rw-rw---- 1 42434 42434 7 Jun 15 08:31 mysql-bin.state
srwxrwxrwx 1 42434 42434 0 Jun 15 08:31 mysql.sock
-rw-r--r-- 1 42434 42434 15 Jun 15 08:31 mysql_upgrade_info
drwx------ 2 42434 42434 20 Jun 15 08:31 performance_schema
drwx------ 2 42434 42434 8192 Jun 15 08:31 sys
-rw------- 1 42434 42434 3461 Jun 15 08:31 wsrep_recovery.ok
-rw-rw---- 1 42434 42434 41 Jun 15 08:31 xtrabackup_galera_info

[root@r9-control1 _data]# cat /var/lib/docker/volumes/mariadb/_data/grastate.dat
# GALERA saved state
version: 2.1
uuid: 93636c74-0b0b-11ee-b4b1-72da202c4633
seqno: -1
safe_to_bootstrap: 0
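With grastate.dat showing seqno: -1 and safe_to_bootstrap: 0, Galera will refuse to bootstrap from this node on its own. For reference, kolla-ansible ships a recovery action that determines the node with the highest recovered seqno and re-bootstraps the cluster from it; a minimal sketch, with the inventory path as an assumption:

# re-bootstrap the Galera cluster from the most advanced node
# (inventory path is illustrative; use your own)
kolla-ansible -i /etc/kolla/multinode mariadb_recovery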

[root@r9-control1 _data]# ss -nlp |grep maria
tcp LISTEN 0 4096 10.33.0.11:4567 0.0.0.0:* users:(("mariadbd",pid=24827,fd=8))

----- control2
[root@r9-control2 _data]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7bb5fb0bbdd4 quay.io/openstack.kolla/mariadb-server:2023.1-rocky-9 "dumb-init -- kolla_…" 19 minutes ago Up 19 minutes (unhealthy) mariadb
481bb2854925 quay.io/openstack.kolla/keepalived:2023.1-rocky-9 "dumb-init --single-…" 20 minutes ago Up 20 minutes keepalived
a1ae6e6c677c quay.io/openstack.kolla/haproxy:2023.1-rocky-9 "dumb-init --single-…" 20 minutes ago Up 20 minutes (healthy) haproxy
d978190ecfbd quay.io/openstack.kolla/cron:2023.1-rocky-9 "dumb-init --single-…" 21 minutes ago Up 21 minutes cron
a8df55b63f33 quay.io/openstack.kolla/kolla-toolbox:2023.1-rocky-9 "dumb-init --single-…" 21 minutes ago Up 21 minutes kolla_toolbox
232aa3d34809 quay.io/openstack.kolla/fluentd:2023.1-rocky-9 "dumb-init --single-…" 21 minutes ago Up 21 minutes fluentd

[root@r9-control2 _data]# ll /var/lib/docker/volumes/mariadb/_data/
total 131080
-rw-rw---- 1 42434 42434 134219032 Jun 15 08:31 galera.cache
-rw-rw---- 1 42434 42434 0 Jun 15 08:31 grastate.dat
-rw-rw---- 1 42434 42434 264 Jun 15 08:31 gvwstate.dat
-rw-rw---- 1 42434 42434 0 Jun 15 08:31 sst_in_progress

[root@r9-control2 _data]# ss -nlp |grep maria
tcp LISTEN 0 4096 10.33.0.12:4567 0.0.0.0:* users:(("mariadbd",pid=23440,fd=8))

----- control3
[root@r9-control3 _data]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
27da8eff47a7 quay.io/openstack.kolla/mariadb-server:2023.1-rocky-9 "dumb-init -- kolla_…" 13 minutes ago Up 13 minutes (unhealthy) mariadb
fdc36b92a1f3 quay.io/openstack.kolla/keepalived:2023.1-rocky-9 "dumb-init --single-…" 13 minutes ago Up 13 minutes keepalived
f065352bd406 quay.io/openstack.kolla/haproxy:2023.1-rocky-9 "dumb-init --single-…" 13 minutes ago Up 13 minutes (healthy) haproxy
1120e6b860bb quay.io/openstack.kolla/cron:2023.1-rocky-9 "dumb-init --single-…" 14 minutes ago Up 14 minutes cron
0519f81452bd quay.io/openstack.kolla/kolla-toolbox:2023.1-rocky-9 "dumb-init --single-…" 14 minutes ago Up 14 minutes kolla_toolbox
ae43216f7c43 quay.io/openstack.kolla/fluentd:2023.1-rocky-9 "dumb-init --single-…" 14 minutes ago Up 14 minutes fluentd

[root@r9-control3 _data]# ll /var/lib/docker/volumes/mariadb/_data/
total 241704
-rw-rw---- 1 42434 42434 16384 Jun 15 08:31 aria_log.00000001
-rw-rw---- 1 42434 42434 52 Jun 15 08:31 aria_log_control
-rw-rw---- 1 42434 42434 134219032 Jun 15 08:31 galera.cache
-rw-rw---- 1 42434 42434 0 Jun 15 08:31 grastate.dat
-rw-rw---- 1 42434 42434 264 Jun 15 08:31 gvwstate.dat
-rw-rw---- 1 42434 42434 12582912 Jun 15 08:31 ibdata1
-rw-rw---- 1 42434 42434 100663296 Jun 15 08:31 ib_logfile0
-rw-rw---- 1 42434 42434 351 Jun 15 08:31 mysql-bin.000001
-rw-rw---- 1 42434 42434 19 Jun 15 08:31 mysql-bin.index
-rw-rw---- 1 42434 42434 0 Jun 15 08:31 mysql-bin.state
srwxrwxrwx 1 42434 42434 0 Jun 15 08:31 mysql.sock
-rw------- 1 42434 42434 2621 Jun 15 08:31 wsrep_recovery.ok

[root@r9-control3 _data]# ss -nlp |grep maria
tcp LISTEN 0 4096 10.33.0.13:4568 0.0.0.0:* users:(("mariadbd",pid=23270,fd=15))
tcp LISTEN 0 4096 10.33.0.13:4567 0.0.0.0:* users:(("mariadbd",pid=23270,fd=8))

Revision history for this message
Eddie Yen (aksn74) wrote :

I can confirm that I got the same issue when deploying 2023.1 on the Ubuntu distro.

Revision history for this message
chanyeol yoon (ycy1766) wrote :

I also had the same issue with stable/2023.1 (Ubuntu 22.04).
However, it was solved using https://review.opendev.org/q/I5a12670d07077d24047aaff57ce8d33ccf7156ff (which still only exists in the master branch).
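For anyone hitting this before the backport lands, a minimal sketch of cherry-picking the fix onto a local 2023.1 source checkout (both the checkout path and the Gerrit refspec below are placeholders; the real refspec is listed on the change page behind the link above):

cd /path/to/kolla-ansible    # placeholder: your kolla-ansible source checkout
# placeholder refspec NN/NNNNNN/P; copy the real one from the Gerrit change page
git fetch https://review.opendev.org/openstack/kolla-ansible refs/changes/NN/NNNNNN/P
git cherry-pick FETCH_HEAD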

Revision history for this message
Maksim Malchuk (mmalchuk) wrote :

Please wait, this will be merged ASAP.

Revision history for this message
Lukas Fichtner (xedon) wrote :

This issue is resolved now, thank you!
