bootstrap_mariadb container fails to start

Bug #1815628 reported by David M Curran
This bug affects 2 people
Affects  Status        Importance  Assigned to  Milestone
kolla    Fix Released  Undecided   Unassigned

Bug Description

Deploys of stable/rocky (and master) both fail when bootstrap_mariadb throws:

INFO:__main__:Setting permission for /var/lib/mysql
++ cat /run_command
+ CMD=/usr/bin/mysqld_safe
+ ARGS=
+ [[ ! -n '' ]]
+ . kolla_extend_start
++ [[ ! -d /var/log/kolla/mariadb ]]
+++ stat -c %a /var/log/kolla/mariadb
++ [[ 755 != \7\5\5 ]]
++ [[ -n 0 ]]
++ mysql_install_db
Neither host 'virt01' nor 'localhost' could be looked up with
'/usr/sbin/resolveip'
Please configure the 'hostname' command to return a correct
hostname.
If you want to solve this at a later stage, restart this script
with the --force option

The latest information about mysql_install_db is available at
https://mariadb.com/kb/en/installing-system-tables-mysql_install_db

'/usr/sbin/resolveip' does not exist but '/usr/bin/resolveip' does. This occurred within the last day.

no longer affects: ubuntu
David M Curran (currand60) wrote :

I modified Dockerfile.j2 to copy /usr/bin/resolveip to /usr/sbin/resolveip. This allowed the first step to pass and the mariadb container to come up, but now Galera cannot form a cluster. I'm getting the following:

2019-02-12 16:54:21 140329390958336 [Warning] WSREP: Quorum: No node with complete state:

 Version : 4
 Flags : 0x1
 Protocols : 0 / 7 / 3
 State : NON-PRIMARY
 Desync count : 0
 Prim state : NON-PRIMARY
 Prim UUID : 00000000-0000-0000-0000-000000000000
 Prim seqno : -1
 First seqno : -1
 Last seqno : -1
 Prim JOINED : 0
 State UUID : c119a12b-2f10-11e9-bbd6-3ee11300979a
 Group UUID : 00000000-0000-0000-0000-000000000000
 Name : 'virt03'
 Incoming addr: '192.168.86.246:3306'

 Version : 4
 Flags : 00
 Protocols : 0 / 7 / 3
 State : NON-PRIMARY
 Desync count : 0
 Prim state : NON-PRIMARY
 Prim UUID : 00000000-0000-0000-0000-000000000000
 Prim seqno : -1
 First seqno : -1
 Last seqno : -1
 Prim JOINED : 0
 State UUID : c119a12b-2f10-11e9-bbd6-3ee11300979a
 Group UUID : 00000000-0000-0000-0000-000000000000
 Name : 'virt02'
 Incoming addr: '192.168.86.247:3306'

 Version : 4
 Flags : 00
 Protocols : 0 / 7 / 3
 State : NON-PRIMARY
 Desync count : 0
 Prim state : NON-PRIMARY
 Prim UUID : 00000000-0000-0000-0000-000000000000
 Prim seqno : -1
 First seqno : -1
 Last seqno : 0
 Prim JOINED : 0
 State UUID : c119a12b-2f10-11e9-bbd6-3ee11300979a
 Group UUID : a6d9c3f7-2f0f-11e9-a9b6-5f810564e9ef
 Name : 'virt01'
 Incoming addr: '192.168.86.248:3306'

2019-02-12 16:54:21 140329390958336 [Warning] WSREP: No re-merged primary component found.
2019-02-12 16:54:21 140329390958336 [Warning] WSREP: No bootstrapped primary component found.
2019-02-12 16:54:21 140329390958336 [ERROR] WSREP: gcs/src/gcs_state_msg.cpp:gcs_state_msg_get_quorum():818: Failed to establish quorum.
2019-02-12 16:54:21 140329390958336 [Note] WSREP: Quorum results:

David M Curran (currand60) wrote :

Also confirmed that this occurs with the pip version and with binary-based installs.

Antony Messerli (antonym) wrote :

Can confirm this as well. It appears that bootstrap_mariadb hangs because it cannot look up the hostname, which prevents the initial bootstrap from running. On the next pass, it assumes the bootstrap has completed when it actually hasn't.
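
One way to tell whether the initial bootstrap really ran is to look at the data directory itself. The following is only a rough sketch: `datadir_initialised` is a hypothetical helper, not something kolla ships, and it assumes that a successful mysql_install_db leaves a `mysql` system database directory under the data directory (`/var/lib/mysql` in the trace above). How kolla itself decides that the bootstrap is done is not shown in this report.

```shell
# Hypothetical helper, not part of kolla: succeed only if the data directory
# contains the "mysql" system database directory, which is assumed here to be
# the sign that mysql_install_db completed.
datadir_initialised() {
    datadir="${1:-/var/lib/mysql}"
    [ -d "$datadir/mysql" ]
}
```

Inside the mariadb container, `datadir_initialised || echo "bootstrap did not complete"` would flag a data directory that was never populated.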

David M Curran (currand60) wrote :

Modifying the Dockerfile did fix the issue. The issue I had after that was related to MTU settings. Copying 'resolveip' to the correct location should be a valid workaround. You'll have to push the rebuilt image to your local repo. In the end, I switched to the CentOS base and it works fine.
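
For anyone wanting to try the same workaround, here is a minimal sketch of the copy step as a shell function. The exact line added to Dockerfile.j2 is not shown in this report, so `ensure_resolveip` and its optional root-directory argument are assumptions made for illustration; only the two paths, /usr/bin/resolveip and /usr/sbin/resolveip, come from the description above.

```shell
# Hypothetical helper: make resolveip available at the path mysql_install_db
# looks in. The optional first argument is an alternative filesystem root,
# added here only so the function can be tried against a scratch directory.
ensure_resolveip() {
    root="${1:-}"
    src="$root/usr/bin/resolveip"
    dst="$root/usr/sbin/resolveip"

    # Nothing to do if the expected path already has an executable.
    if [ -x "$dst" ]; then
        return 0
    fi

    # Without a source binary there is nothing to copy.
    if [ ! -x "$src" ]; then
        echo "resolveip not found at $src" >&2
        return 1
    fi

    mkdir -p "$root/usr/sbin"
    cp -p "$src" "$dst"
}
```

Called with no argument, the function operates on the real filesystem root, so it would need to run as part of the image build (or as root in the container) for the copy to succeed.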

Antony Messerli (antonym) wrote :
Mark Goddard (mgoddard) wrote :

David, can you confirm whether the above commit fixes this issue for you?

David M Curran (currand60) wrote :

I moved to a different container OS to get around the issue, but I suspect it would fix the issue, as my workaround was to copy resolveip into the expected directory.

Mark Goddard (mgoddard) wrote :

Thanks for following up.

Changed in kolla:
status: New → Fix Released