Hi Mark,
Thanks for replying and trying to help!
I'll try to give more details, hopefully picturing the setup and symptoms
better.
ENV:
multinode (tried with 4 nodes, 2 of which are controllers, and with 5 nodes,
2 of which are controllers)
kolla_base_distro: "centos"
kolla_install_type: "source"
openstack_release: "ussuri"
pip3 freeze | grep kolla
kolla==10.1.0
kolla-ansible==10.1.0
ISSUE AND SYMPTOMS:
Description: kolla-ansible fails on the mariadb port liveness verification and
does not bring up the db cluster properly. I tried multiple times (the clean
way: destroying everything and restarting docker before starting over), and I
see that I get different errors from time to time, though always on the same
step, the mariadb port liveness handler.
1) The one I explained in my previous post, with python "[Errno 104] Connection
reset by peer\r\n", "msg": "MODULE FAILURE\nSee stdout/stderr for the exact
error", "rc": 1}". Investigation shows that one node has its mariadb container
up and running, but the other doesn't even have the image pulled, so it is no
surprise that the cluster quorum fails. What is not clear is why the second
controller node doesn't have the container pulled and started.
2) Again on the mariadb port liveness verification, but with a different exit
code and overall situation. It appears that none of the controller nodes have
the images pulled, hence kolla-ansible is not able to start the cluster and
gives:
"2020-10-20 08:11:52,517 p=151140 u=lineng n=ansible | fatal: [osce3]: FAILED! => {"changed": false, "msg": "MariaDB cluster exists but is stopped. Please start it using kolla-ansible mariadb_recovery"}
2020-10-20 08:11:52,551 p=151140 u=lineng n=ansible | fatal: [osce4]: FAILED! => {"changed": false, "msg": "MariaDB cluster exists but is stopped. Please start it using kolla-ansible mariadb_recovery"}"
- Continuing with "kolla-ansible -i multinode mariadb_recovery" gives similar
results:
"TASK [mariadb : Stop MariaDB containers] *******
fatal: [osce3]: FAILED! => {"changed": false, "msg": "No such container: mariadb to stop"}"
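To reproduce the liveness check by hand outside of Ansible, something like the following could be used. This is only a minimal sketch: it assumes the task does nothing more than wait for the string "MariaDB" in whatever the server sends on connect (as the timeout message in the log suggests). The function name wait_for_banner and its defaults are hypothetical, not kolla-ansible code.

```python
import socket
import time


def wait_for_banner(host, port, search=b"MariaDB", timeout=10.0):
    """Return True if `search` appears in the data the server sends on connect.

    Hypothetical helper: retries until `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0) as sock:
                sock.settimeout(1.0)
                data = b""
                while time.monotonic() < deadline:
                    chunk = sock.recv(4096)
                    if not chunk:
                        break  # server closed the connection
                    data += chunk
                    if search in data:
                        return True
        except OSError:
            # Connection refused/reset or read timeout: retry until the deadline.
            pass
        time.sleep(0.5)
    return False
```

Calling wait_for_banner("192.168.1.6", 3306) and wait_for_banner("192.168.1.7", 3306) from the deploy host should return False on both controllers while no mariadb container is running.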
Docker inspect shows on both nodes:
docker inspect mariadb
[
    {
        "CreatedAt": "2020-10-20T08:11:42+03:00",
        "Driver": "local",
        "Labels": null,
        "Mountpoint": "/var/lib/docker/volumes/mariadb/_data",
        "Name": "mariadb",
        "Options": null,
        "Scope": "local"
    }
]
I also attach below a history of the ansible log covering the mariadb logic
execution for issue #2, which highlights the strange absence of a
kolla_docker: pull action for the mariadb role.
ANSIBLE_LOG:
- FOLDERS and CONFIGS only
2020-10-20 08:11:25,033 p=151140 u=lineng n=ansible | TASK [mariadb : Ensuring config directories exist] *******
2020-10-20 08:11:26,755 p=151140 u=lineng n=ansible | TASK [mariadb : Ensuring database backup config directory exists] **************
2020-10-20 08:11:26,941 p=151140 u=lineng n=ansible | TASK [mariadb : Copying over my.cnf for mariabackup] *******
2020-10-20 08:11:27,128 p=151140 u=lineng n=ansible | TASK [mariadb : Copying over config.json files for services] *******************
2020-10-20 08:11:29,754 p=151140 u=lineng n=ansible | TASK [mariadb : Copying over config.json files for mariabackup] ****************
2020-10-20 08:11:29,941 p=151140 u=lineng n=ansible | TASK [mariadb : Copying over galera.cnf] *******
2020-10-20 08:11:32,342 p=151140 u=lineng n=ansible | TASK [mariadb : Copying over wsrep-notify.sh] *******
2020-10-20 08:11:35,117 p=151140 u=lineng n=ansible | TASK [mariadb : Copying over xinetd clustercheck.conf] *******
- Check containers afterwards, but based on the source code this still doesn't do any pulling
2020-10-20 08:32:19,005 p=161054 u=lineng n=ansible | TASK [Check mariadb containers]
2020-10-20 08:11:38,589 p=151140 u=lineng n=ansible | changed: [osce3] => (item={'key': 'mariadb', 'value': {'container_name': 'mariadb', 'group': 'mariadb', 'enabled': True, 'image': 'kolla/centos-source-mariadb:ussuri', 'volumes': ['/etc/kolla/mariadb/:/var/lib/kolla/config_files/:ro', '/etc/localtime:/etc/localtime:ro', '', 'mariadb:/var/lib/mysql', 'kolla_logs:/var/log/kolla/'], 'dimensions': {}, 'haproxy': {'mariadb': {'enabled': True, 'mode': 'tcp', 'port': '3306', 'listen_port': '3306', 'frontend_tcp_extra': ['option clitcpka', 'timeout client 3600s'], 'backend_tcp_extra': ['option srvtcpka', 'timeout server 3600s', 'option httpchk'], 'custom_member_list': ['server osce3 192.168.1.6:3306 check port 4569 inter 2000 rise 2 fall 5', 'server osce4 192.168.1.7:3306 check port 4569 inter 2000 rise 2 fall 5 backup', '']}, 'mariadb_external_lb': {'enabled': False, 'mode': 'tcp', 'port': '3306', 'listen_port': '3306', 'frontend_tcp_extra': ['option clitcpka', 'timeout client 3600s'], 'backend_tcp_extra': ['option srvtcpka', 'timeout server 3600s'], 'custom_member_list': ['server osce3 osce3:3306 check port 4569 inter 2000 rise 2 fall 5', 'server osce4 osce4:3306 check port 4569 inter 2000 rise 2 fall 5 backup', '']}}}})
2020-10-20 08:11:38,759 p=151140 u=lineng n=ansible | changed: [osce4] => (item={'key': 'mariadb', 'value': {'container_name': 'mariadb', 'group': 'mariadb', 'enabled': True, 'image': 'kolla/centos-source-mariadb:ussuri', 'volumes': ['/etc/kolla/mariadb/:/var/lib/kolla/config_files/:ro', '/etc/localtime:/etc/localtime:ro', '', 'mariadb:/var/lib/mysql', 'kolla_logs:/var/log/kolla/'], 'dimensions': {}, 'haproxy': {'mariadb': {'enabled': True, 'mode': 'tcp', 'port': '3306', 'listen_port': '3306', 'frontend_tcp_extra': ['option clitcpka', 'timeout client 3600s'], 'backend_tcp_extra': ['option srvtcpka', 'timeout server 3600s', 'option httpchk'], 'custom_member_list': ['server osce3 192.168.1.6:3306 check port 4569 inter 2000 rise 2 fall 5', 'server osce4 192.168.1.7:3306 check port 4569 inter 2000 rise 2 fall 5 backup', '']}, 'mariadb_external_lb': {'enabled': False, 'mode': 'tcp', 'port': '3306', 'listen_port': '3306', 'frontend_tcp_extra': ['option clitcpka', 'timeout client 3600s'], 'backend_tcp_extra': ['option srvtcpka', 'timeout server 3600s'], 'custom_member_list': ['server osce3 osce3:3306 check port 4569 inter 2000 rise 2 fall 5', 'server osce4 osce4:3306 check port 4569 inter 2000 rise 2 fall 5 backup', '']}}}})
2020-10-20 08:11:39,205 p=151140 u=lineng n=ansible | changed: [osce3] => (item={'key': 'mariadb-clustercheck', 'value': {'container_name': 'mariadb_clustercheck', 'group': 'mariadb', 'enabled': True, 'image': 'kolla/centos-source-mariadb-clustercheck:ussuri', 'volumes': ['/etc/kolla/mariadb-clustercheck/:/var/lib/kolla/config_files/:ro', '/etc/localtime:/etc/localtime:ro', '', 'kolla_logs:/var/log/kolla/'], 'dimensions': {}, 'environment': {'MYSQL_USERNAME': 'haproxy', 'MYSQL_PASSWORD': '', 'MYSQL_HOST': '192.168.1.6', 'AVAILABLE_WHEN_DONOR': '1'}}})
- Finishing up with more docker tasks for volumes
2020-10-20 08:11:40,187 p=151140 u=lineng n=ansible | TASK [mariadb : Create MariaDB volume] *******
2020-10-20 08:11:40,988 p=151140 u=lineng n=ansible | TASK [mariadb : Divide hosts by their MariaDB volume availability] *************
2020-10-20 08:11:41,181 p=151140 u=lineng n=ansible | TASK [mariadb : Establish whether the cluster has already existed] *************
- And service port liveness
2020-10-20 08:11:41,385 p=151140 u=lineng n=ansible | TASK [mariadb : Check MariaDB service port liveness] *******
2020-10-20 08:11:51,989 p=151140 u=lineng n=ansible | fatal: [osce3]: FAILED! => {"changed": false, "elapsed": 10, "msg": "Timeout when waiting for search string MariaDB in 192.168.1.6:3306"}
2020-10-20 08:11:51,989 p=151140 u=lineng n=ansible | ...ignoring
2020-10-20 08:11:52,145 p=151140 u=lineng n=ansible | fatal: [osce4]: FAILED! => {"changed": false, "elapsed": 10, "msg": "Timeout when waiting for search string MariaDB in 192.168.1.7:3306"}
2020-10-20 08:11:52,146 p=151140 u=lineng n=ansible | ...ignoring
2020-10-20 08:11:52,225 p=151140 u=lineng n=ansible | TASK [mariadb : Divide hosts by their MariaDB service port liveness] ***********
2020-10-20 08:11:52,319 p=151140 u=lineng n=ansible | changed: [osce3]
2020-10-20 08:11:52,344 p=151140 u=lineng n=ansible | changed: [osce4]
2020-10-20 08:11:52,424 p=151140 u=lineng n=ansible | TASK [mariadb : Fail on existing but stopped cluster] *******
2020-10-20 08:11:52,517 p=151140 u=lineng n=ansible | fatal: [osce3]: FAILED! => {"changed": false, "msg": "MariaDB cluster exists but is stopped. Please start it using kolla-ansible mariadb_recovery"}
2020-10-20 08:11:52,551 p=151140 u=lineng n=ansible | fatal: [osce4]: FAILED! => {"changed": false, "msg": "MariaDB cluster exists but is stopped. Please start it using kolla-ansible mariadb_recovery"}
2020-10-20 08:11:52,552 p=151140 u=lineng n=ansible | RUNNING HANDLER [mariadb : Restart MariaDB on existing cluster members] ********
2020-10-20 08:11:52,552 p=151140 u=lineng n=ansible | RUNNING HANDLER [mariadb : Start MariaDB on new nodes] *******
2020-10-20 08:11:52,553 p=151140 u=lineng n=ansible | RUNNING HANDLER [Restart mariadb-clustercheck container] *******
2020-10-20 08:11:52,554 p=151140 u=lineng n=ansible | PLAY RECAP *******
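To check the full ansible.log for any pull-related task rather than eyeballing it, the task and handler names can be extracted with a few lines of Python. This is a rough sketch under the assumption that every header line has the "| TASK [...]" or "| RUNNING HANDLER [...]" shape seen above; the names task_names and has_pull_task are hypothetical. It only looks at task names, so a pull performed inside another action would not show up.

```python
import re

# Matches header lines such as
# "... n=ansible | TASK [mariadb : Create MariaDB volume] *******"
HEADER = re.compile(r"\|\s*(TASK|RUNNING HANDLER) \[(.+?)\]")


def task_names(log_text):
    """Return the TASK / RUNNING HANDLER names in the order they appear."""
    return [m.group(2).strip() for m in HEADER.finditer(log_text)]


def has_pull_task(log_text):
    """True if any task or handler name mentions pulling."""
    return any("pull" in name.lower() for name in task_names(log_text))
```

Reading the log file into a string and passing it to task_names gives the same list as the excerpt above, and has_pull_task reports whether any of those names mentions a pull.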
Thanks, and let me know if I should attach the inventory, globals and all.yml for a complete picture.
Cheers