MariaDB in HA does not come up after kolla-ansible stop
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
kolla-ansible | Opinion | Undecided | Unassigned |
Bug Description
While adding new services to a kolla-ansible deployment, I stopped the MariaDB containers on all controller nodes (with kolla-ansible stop), and they will not come back up in a sensible state.
OS: CentOS 7.3
Kolla-ansible ver: 4.0.0
Multinode environment with 3 controller nodes, 7 compute nodes and 3 ceph storage nodes
I was adding new services (Gnocchi and Aodh, although I doubt the specific services matter) using the following procedure:
- shelve VMs
- stop containers using kolla-ansible stop
- remove Virtual IPs from controller interfaces
- run prechecks - kolla-ansible prechecks
- run deploy - kolla-ansible deploy
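For reference, the procedure above can be written down as an ordered command plan. This is a minimal Python sketch, not part of kolla-ansible: the inventory path, VIP addresses, interface name and server names are hypothetical placeholders, and by default it only prints each command (dry run). It also assumes the VIP removal step runs on the controller that currently holds the addresses.

```python
import shlex
import subprocess
from typing import List, Sequence

# Hypothetical placeholders: replace with your own values.
INVENTORY = "/root/multinode"        # path to the kolla-ansible inventory
VIP_CIDRS = ["192.0.2.10/24"]        # VIPs (with prefix length) to remove
VIP_INTERFACE = "eth0"               # interface that carries the VIPs
SERVERS = ["vm-1", "vm-2"]           # names or IDs of the VMs to shelve


def build_plan(inventory: str, servers: Sequence[str],
               vips: Sequence[str], iface: str) -> List[List[str]]:
    """Return the procedure as an ordered list of argv lists."""
    plan: List[List[str]] = []
    # 1. Shelve VMs (python-openstackclient).
    plan += [["openstack", "server", "shelve", s] for s in servers]
    # 2. Stop all kolla containers, mariadb included, on every host.
    plan.append(["kolla-ansible", "-i", inventory, "stop"])
    # 3. Remove the VIPs (must run on the controller holding them).
    plan += [["ip", "addr", "del", v, "dev", iface] for v in vips]
    # 4. and 5. Prechecks, then deploy.
    plan.append(["kolla-ansible", "-i", inventory, "prechecks"])
    plan.append(["kolla-ansible", "-i", inventory, "deploy"])
    return plan


def run(plan: List[List[str]], dry_run: bool = True) -> None:
    """Print each step; execute it only when dry_run is False."""
    for argv in plan:
        print("+", shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failing step


if __name__ == "__main__":
    run(build_plan(INVENTORY, SERVERS, VIP_CIDRS, VIP_INTERFACE))
```

Keeping the plan as data makes it easy to review before running, and check=True stops the sequence at the first failure instead of carrying on into deploy.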
The prechecks and all preceding steps completed successfully.
The deploy failed at the MariaDB stage; see the output below.
On the controller nodes, the mariadb containers existed but were stuck in a constant restart loop. The container logs are below.
----- (Logs from container)
170821 16:12:23 [Note] WSREP: Start replication
170821 16:12:23 [Note] WSREP: Setting initial position to cb03500b-
170821 16:12:23 [Note] WSREP: protonet asio version 0
170821 16:12:23 [Note] WSREP: Using CRC-32C for message checksums.
170821 16:12:23 [Note] WSREP: backend: asio
170821 16:12:23 [Note] WSREP: gcomm thread scheduling priority set to other:0
170821 16:12:23 [Warning] WSREP: access file(/var/
170821 16:12:23 [Note] WSREP: restore pc from disk failed
170821 16:12:23 [Note] WSREP: GMCast version 0
170821 16:12:23 [Note] WSREP: (c0ce933d, 'tcp://
170821 16:12:23 [Note] WSREP: (c0ce933d, 'tcp://
170821 16:12:23 [Note] WSREP: EVS version 0
170821 16:12:23 [Note] WSREP: gcomm: connecting to group 'openstack', peer '192.168.
170821 16:12:23 [Note] WSREP: (c0ce933d, 'tcp://
170821 16:12:23 [Note] WSREP: (c0ce933d, 'tcp://
170821 16:12:23 [Note] WSREP: declaring c0990df6 at tcp://192.
170821 16:12:23 [Warning] WSREP: no nodes coming from prim view, prim not possible
170821 16:12:23 [Note] WSREP: view(view_
c0990df6,0
c0ce933d,0
} joined {
} left {
} partitioned {
})
170821 16:12:27 [Note] WSREP: (c0ce933d, 'tcp://
170821 16:12:28 [Note] WSREP: (c0ce933d, 'tcp://
170821 16:12:28 [Note] WSREP: (c0ce933d, 'tcp://
170821 16:12:28 [Note] WSREP: declaring c0990df6 at tcp://192.
170821 16:12:28 [Note] WSREP: declaring c3a45fd0 at tcp://192.
170821 16:12:28 [Warning] WSREP: no nodes coming from prim view, prim not possible
170821 16:12:28 [Note] WSREP: view(view_
c0990df6,0
c0ce933d,0
c3a45fd0,0
} joined {
} left {
} partitioned {
})
170821 16:12:31 [Note] WSREP: (c0ce933d, 'tcp://
170821 16:12:54 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at gcomm/src/
170821 16:12:54 [ERROR] WSREP: gcs/src/
170821 16:12:54 [ERROR] WSREP: gcs/src/
170821 16:12:54 [ERROR] WSREP: gcs connect failed: Connection timed out
170821 16:12:54 [ERROR] WSREP: wsrep::
170821 16:12:54 [ERROR] Aborting
170821 16:12:54 [Note] WSREP: Service disconnected.
170821 16:12:55 [Note] WSREP: Some threads may fail to exit.
170821 16:12:55 [Note] /usr/sbin/mysqld: Shutdown complete
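The container log above can be triaged mechanically. The sketch below is a hypothetical helper (not a kolla or Galera tool) that scans the WSREP lines for the three symptoms visible here: the saved primary-component state could not be restored from disk, no peer came from a primary view, and the node finally timed out waiting for one. The diagnosis strings are my own heuristic reading of those messages.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List

# "YYMMDD HH:MM:SS [Level] message", the layout of the container log above.
LINE_RE = re.compile(r"^(\d{6} \d{2}:\d{2}:\d{2}) \[(\w+)\] (.*)$")


@dataclass
class WsrepSummary:
    pc_restored: bool = True        # set False by "restore pc from disk failed"
    no_prim_warnings: int = 0       # count of "no nodes coming from prim view"
    primary_timeout: bool = False   # "failed to reach primary view" was logged
    aborted: bool = False           # mysqld logged "Aborting"
    errors: List[str] = field(default_factory=list)  # every [ERROR] message


def summarize(lines: Iterable[str]) -> WsrepSummary:
    """Scan mysqld/WSREP log lines and record primary-component symptoms."""
    s = WsrepSummary()
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if m is None:
            continue  # continuation lines, e.g. the member listing inside view(...)
        _, level, msg = m.groups()
        if "restore pc from disk failed" in msg:
            s.pc_restored = False
        if "no nodes coming from prim view" in msg:
            s.no_prim_warnings += 1
        if "failed to reach primary view" in msg:
            s.primary_timeout = True
        if msg == "Aborting":
            s.aborted = True
        if level == "ERROR":
            s.errors.append(msg)
    return s


def diagnose(s: WsrepSummary) -> str:
    """Turn a summary into a one-line, heuristic diagnosis."""
    if s.primary_timeout and not s.pc_restored:
        return ("no primary component: saved state was not restored from disk "
                "and no peer is in a primary view; needs bootstrap or restored state")
    if s.primary_timeout:
        return "no primary component reachable from peers"
    return "no primary-view failure found"
```

Usage: diagnose(summarize(open("mariadb.log"))) on a saved copy of the container log (the file name is hypothetical).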
----- (output from kolla-ansible deploy - just the mariadb section)
PLAY [Apply role mariadb] *******
TASK [setup] *******
ok: [ned-controller-2]
ok: [ned-controller-1]
ok: [ned-controller-3]
TASK [common : include] *******
skipping: [ned-controller-1]
skipping: [ned-controller-2]
skipping: [ned-controller-3]
TASK [common : Registering common role has run] *******
skipping: [ned-controller-2]
skipping: [ned-controller-1]
skipping: [ned-controller-3]
TASK [mariadb : include] *******
included: /root/kolla-
TASK [mariadb : include] *******
included: /root/kolla-
TASK [mariadb : Ensuring config directories exist] *******
ok: [ned-controller-1] => (item=mariadb)
ok: [ned-controller-3] => (item=mariadb)
ok: [ned-controller-2] => (item=mariadb)
TASK [mariadb : Copying over config.json files for services] *******************
ok: [ned-controller-1] => (item=mariadb)
ok: [ned-controller-3] => (item=mariadb)
ok: [ned-controller-2] => (item=mariadb)
TASK [mariadb : Copying over galera.cnf] *******
ok: [ned-controller-2] => (item=mariadb)
ok: [ned-controller-3] => (item=mariadb)
ok: [ned-controller-1] => (item=mariadb)
TASK [mariadb : Copying over wsrep-notify.sh] *******
ok: [ned-controller-2] => (item=mariadb)
ok: [ned-controller-3] => (item=mariadb)
ok: [ned-controller-1] => (item=mariadb)
TASK [mariadb : include] *******
included: /root/kolla-
TASK [mariadb : include] *******
included: /root/kolla-
TASK [mariadb : Cleaning up temp file on localhost] *******
[DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..
This feature will be removed in version 2.4. Deprecation warnings can be
disabled by setting deprecation_
ok: [ned-controller-1 -> localhost]
TASK [mariadb : Creating temp file on localhost] *******
[DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..
This feature will be removed in version 2.4. Deprecation warnings can be
disabled by setting deprecation_
ok: [ned-controller-1 -> localhost]
TASK [mariadb : Creating mariadb volume] *******
ok: [ned-controller-1]
ok: [ned-controller-2]
ok: [ned-controller-3]
TASK [mariadb : Writing hostname of host with existing cluster files to temp file] ***
[DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..
This feature will be removed in version 2.4. Deprecation warnings can be
disabled by setting deprecation_
[DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..
This feature will be removed in version 2.4. Deprecation warnings can be
disabled by setting deprecation_
[DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..
This feature will be removed in version 2.4. Deprecation warnings can be
disabled by setting deprecation_
ok: [ned-controller-1 -> localhost]
ok: [ned-controller-2 -> localhost]
ok: [ned-controller-3 -> localhost]
TASK [mariadb : Registering host from temp file] *******
ok: [ned-controller-1]
ok: [ned-controller-2]
ok: [ned-controller-3]
TASK [mariadb : Cleaning up temp file on localhost] *******
[DEPRECATION WARNING]: always_run is deprecated. Use check_mode = no instead..
This feature will be removed in version 2.4. Deprecation warnings can be
disabled by setting deprecation_
ok: [ned-controller-1 -> localhost]
TASK [mariadb : include] *******
skipping: [ned-controller-1]
skipping: [ned-controller-2]
skipping: [ned-controller-3]
TASK [mariadb : include] *******
skipping: [ned-controller-1]
skipping: [ned-controller-2]
skipping: [ned-controller-3]
TASK [mariadb : include] *******
included: /root/kolla-
TASK [mariadb : Starting mariadb container] *******
ok: [ned-controller-3]
ok: [ned-controller-1]
ok: [ned-controller-2]
TASK [mariadb : Waiting for MariaDB service to be ready] *******
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (10 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (10 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (10 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (9 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (9 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (9 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (8 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (8 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (8 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (7 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (7 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (7 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (6 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (6 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (6 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (5 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (5 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (5 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (4 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (4 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (4 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (3 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (3 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (3 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (2 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (2 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (2 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (1 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (1 retries left).
FAILED - RETRYING: TASK: mariadb : Waiting for MariaDB service to be ready (1 retries left).
fatal: [ned-controller-1]: FAILED! => {"attempts": 10, "changed": false, "failed": true, "module_stderr": "Shared connection to ned-controller-1 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/
fatal: [ned-controller-2]: FAILED! => {"attempts": 10, "changed": false, "failed": true, "module_stderr": "Shared connection to ned-controller-2 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/
fatal: [ned-controller-3]: FAILED! => {"attempts": 10, "changed": false, "failed": true, "module_stderr": "Shared connection to ned-controller-3 closed.\r\n", "module_stdout": "Traceback (most recent call last):\r\n File \"/tmp/
to retry, use: --limit @/root/
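The "Waiting for MariaDB service to be ready" task retried 10 times before failing on all three controllers. A probe in the same spirit is sketched below: in the MySQL/MariaDB protocol the server sends a greeting as soon as a client connects, and a MariaDB server's version string contains "MariaDB". I am assuming this is roughly what the task checks; the function names, retry count and delay are illustrative, not kolla-ansible's actual implementation. A node that aborts because it cannot reach a primary view presumably never gets as far as accepting connections, so every attempt fails.

```python
import socket
import time


def mariadb_ready(host: str, port: int = 3306, timeout: float = 1.0) -> bool:
    """True if host:port accepts a TCP connection and greets with a MariaDB banner."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # In the MySQL/MariaDB protocol the server speaks first: the initial
            # handshake packet carries the server version string.
            greeting = sock.recv(256)
    except OSError:  # refused, timed out, host unreachable, ...
        return False
    return b"MariaDB" in greeting


def wait_for_mariadb(host: str, port: int = 3306,
                     retries: int = 10, delay: float = 6.0) -> bool:
    """Poll mariadb_ready() up to `retries` times, sleeping `delay` s between tries."""
    for attempt in range(retries):
        if mariadb_ready(host, port):
            return True
        left = retries - attempt - 1
        print(f"not ready ({left} retries left)")
        if left:
            time.sleep(delay)
    return False
```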
I've noticed the same issue using kolla-ansible for the Pike release.
From my investigations, when the mariadb container is stopped, the file /var/lib/docker/volumes/mariadb/_data/gvwstate.dat is deleted.
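gvwstate.dat is where Galera's pc.recovery feature saves the last primary-component view (the node's own UUID plus the member list) so that it can be restored on startup; the "restore pc from disk failed" line in the container log above is consistent with the file being missing. As far as I understand, Galera deletes this file on a graceful shutdown, which would explain why kolla-ansible stop removes it. The parser below is a sketch assuming the plain-text layout with my_uuid:, view_id: and member: lines; the helper names are hypothetical and the UUIDs in the test are placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SavedView:
    my_uuid: str             # UUID of the node that wrote the file
    view_id: Optional[str]   # raw "view_id:" value of the last primary view
    members: List[str]       # UUIDs listed on "member:" lines


def parse_gvwstate(text: str) -> SavedView:
    """Parse the assumed plain-text layout of Galera's gvwstate.dat."""
    my_uuid = ""
    view_id: Optional[str] = None
    members: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("my_uuid:"):
            my_uuid = line.split(":", 1)[1].strip()
        elif line.startswith("view_id:"):
            view_id = line.split(":", 1)[1].strip()
        elif line.startswith("member:"):
            # "member: <uuid> <segment>": keep only the node UUID
            members.append(line.split(":", 1)[1].split()[0])
    return SavedView(my_uuid, view_id, members)


def views_agree(views: List[SavedView]) -> bool:
    """True if every node saved the same view and appears in its own member list."""
    if not views:
        return False
    ref = views[0]
    return all(
        v.view_id == ref.view_id
        and set(v.members) == set(ref.members)
        and v.my_uuid in v.members
        for v in views
    )
```

My understanding is that when every controller restores the same saved view, the nodes can re-form the primary component among themselves; views_agree is a quick pre-check for that before restarting.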
As a workaround, I made a copy of the gvwstate.dat file before the container was stopped.
Once the mariadb containers were started again, I used the backup copy of gvwstate.dat to recover the MariaDB cluster.
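The workaround can be scripted roughly as follows. This is a sketch: DATA_DIR assumes Docker's default location for kolla's mariadb named volume, BACKUP_DIR is a hypothetical choice, and it has to run on each controller (backup_state before kolla-ansible stop, restore_state once the containers are back). shutil.copy2 keeps mode and timestamps but not ownership, so when running as root the restored file is given the owner of the data directory.

```python
import os
import shutil
from pathlib import Path

# Assumption: Docker's default path for kolla's "mariadb" named volume.
DATA_DIR = Path("/var/lib/docker/volumes/mariadb/_data")
# Hypothetical location for the backup copy.
BACKUP_DIR = Path("/root/gvwstate-backup")
STATE_FILE = "gvwstate.dat"


def backup_state(data_dir: Path = DATA_DIR, backup_dir: Path = BACKUP_DIR) -> Path:
    """Copy gvwstate.dat aside; run before kolla-ansible stop."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dst = backup_dir / STATE_FILE
    # Raises FileNotFoundError if the file is already gone (e.g. after a clean stop).
    shutil.copy2(data_dir / STATE_FILE, dst)
    return dst


def restore_state(backup: Path, data_dir: Path = DATA_DIR) -> Path:
    """Put the saved gvwstate.dat back into the data directory."""
    dst = data_dir / STATE_FILE
    shutil.copy2(backup, dst)  # keeps mode and timestamps, not ownership
    if os.geteuid() == 0:
        # Give the file the same owner as the data directory, so mysqld
        # inside the container can rewrite or delete it later.
        st = data_dir.stat()
        os.chown(dst, st.st_uid, st.st_gid)
    return dst
```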