Deploy mariadb getting errors on secondary nodes and not joining the cluster (centos7+ipv6)

Bug #1856532 reported by yj.bai
This bug affects 1 person
Affects        Status        Importance  Assigned to        Milestone
kolla-ansible  Fix Released  High        Radosław Piliszek
Train          Fix Released  High        Radosław Piliszek
Ussuri         Fix Released  High        Radosław Piliszek

Bug Description

FAILED - RETRYING: wait for slave mariadb (10 retries left).Result was: {
    "attempts": 1,
    "changed": false,
    "module_stderr": "Traceback (most recent call last):\n File \"<stdin>\", line 102, in <module>\n File \"<stdin>\", line 94, in _ansiballz_main\n File \"<stdin>\", line 40, in invoke_module\n File \"/usr/lib64/python2.7/runpy.py\", line 176, in run_module\n fname, loader, pkg_name)\n File \"/usr/lib64/python2.7/runpy.py\", line 82, in _run_module_code\n mod_name, mod_fname, mod_loader, pkg_name)\n File \"/usr/lib64/python2.7/runpy.py\", line 72, in _run_code\n exec code in run_globals\n File \"/tmp/ansible_wait_for_payload_5_SYE4/ansible_wait_for_payload.zip/ansible/modules/utilities/logic/wait_for.py\", line 687, in <module>\n File \"/tmp/ansible_wait_for_payload_5_SYE4/ansible_wait_for_payload.zip/ansible/modules/utilities/logic/wait_for.py\", line 615, in main\nsocket.error: [Errno 104] Connection reset by peer\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 1,
    "retries": 11
}
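
The traceback above shows the wait task dying on a raw socket error instead of reporting a clean result. When reproducing this by hand, it can help to classify what the port is doing. The following is a minimal sketch, not the code of the Ansible wait_for module; the function name probe_tcp and its return labels are hypothetical, chosen for illustration only.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify how a TCP endpoint responds (hypothetical helper).

    Returns one of: "greeting", "closed", "silent", "reset",
    "refused", "timeout", or "error: <details>".
    """
    try:
        # create_connection accepts IPv4 and IPv6 literals (no brackets).
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            try:
                data = sock.recv(64)  # a database server normally speaks first
            except socket.timeout:
                return "silent"  # connected, but nothing was sent in time
            return "greeting" if data else "closed"
    except ConnectionResetError:
        return "reset"  # Errno 104, as in the traceback above
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"  # the connect itself timed out
    except OSError as exc:
        return "error: {}".format(exc)
```

For example, probe_tcp("fd00:1001::103", 3306) would check one of the nodes from this report, assuming MariaDB listens on its default port 3306 (the report does not state the port). A "reset" result matches the failure shown above.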

mariadb.log:

2019-12-16 16:12:08 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():158
2019-12-16 16:12:08 0 [ERROR] WSREP: gcs/src/gcs_core.cpp:gcs_core_open():209: Failed to open backend connection: -110 (Connection timed out)
2019-12-16 16:12:08 0 [ERROR] WSREP: gcs/src/gcs.cpp:gcs_open():1458: Failed to open channel 'openstack' at 'gcomm://[fd00:1001::101]:4567,[fd00:1001::103]:4567,[fd00:1001::105]:4567': -110 (Connection timed out)
2019-12-16 16:12:08 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2019-12-16 16:12:08 0 [ERROR] WSREP: wsrep::connect(gcomm://[fd00:1001::101]:4567,[fd00:1001::103]:4567,[fd00:1001::105]:4567) failed: 7
2019-12-16 16:12:08 0 [ERROR] Aborting
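
The cluster address in the log uses bracketed IPv6 literals. To test each member separately, the address can be split into host and port pairs. This is a small sketch using only the Python standard library; the function name parse_gcomm is hypothetical, and it only handles the comma-separated host:port form shown in the log above (no URL options).

```python
from urllib.parse import urlsplit


def parse_gcomm(url):
    """Split a gcomm:// cluster address into (host, port) tuples.

    Brackets around IPv6 literals are stripped from the host.
    """
    prefix = "gcomm://"
    if not url.startswith(prefix):
        raise ValueError("not a gcomm:// address: {!r}".format(url))
    members = []
    for item in url[len(prefix):].split(","):
        if not item:
            continue  # tolerate an empty list or a trailing comma
        # Prefixing "//" makes urlsplit treat the item as a network location.
        parts = urlsplit("//" + item)
        members.append((parts.hostname, parts.port))
    return members
```

Applied to the address from the log, this yields the three nodes fd00:1001::101, fd00:1001::103 and fd00:1001::105, each on port 4567.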

Radosław Piliszek (yoctozepto) wrote :

Per the docs: https://docs.openstack.org/kolla-ansible/train/admin/production-architecture-guide.html#address-family-configuration-ipv4-ipv6

IPv6 is not fully supported on CentOS 7; this includes MariaDB HA clustering.

Radosław Piliszek (yoctozepto) wrote :

For CentOS you need to wait for CentOS 8 and Ussuri.
For Train you can use Debian or Ubuntu if you require IPv6.

tags: added: ipv6
tags: added: centos iov6
removed: ipv6
tags: added: ipv6
removed: iov6
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/699172

Changed in kolla-ansible:
assignee: nobody → yj.bai (baiyj)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/699173

OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (stable/train)

Change abandoned by Yongjun Bai (bai.yongjun@99cloud.net) on branch: stable/train
Review: https://review.opendev.org/699173

Changed in kolla-ansible:
assignee: yj.bai (baiyj) → Jeffrey Zhang (jeffrey4l)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/700875

Changed in kolla-ansible:
assignee: Jeffrey Zhang (jeffrey4l) → Radosław Piliszek (yoctozepto)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (master)

Change abandoned by Radosław Piliszek (<email address hidden>) on branch: master
Review: https://review.opendev.org/700875
Reason: we need both fixes

Radosław Piliszek (yoctozepto) wrote : Re: Deploy mariadb getting errors on secondary nodes and not joining the cluster

So this ended up being caused by two Galera issues, both of which actually have reasonable workarounds.

summary: Deploy mariadb getting errors on secondary nodes and not joining the
- cluster
+ cluster (ipv6)
summary: Deploy mariadb getting errors on secondary nodes and not joining the
- cluster (ipv6)
+ cluster (centos7+ipv6)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/701204

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/699172
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=908bffcfc2950e271fee1af24fb174fa6bee4aff
Submitter: Zuul
Branch: master

commit 908bffcfc2950e271fee1af24fb174fa6bee4aff
Author: yj.bai <bai.yongjun@99cloud.net>
Date: Mon Dec 16 17:05:09 2019 +0800

    Fix MariaDB galera IPv6 deployment on CentOS 7

    CentOS 7 uses old galera which has multiple issues handling
    IPv6 addressing.
    This patch applies two workarounds for CentOS 7.

    Co-Authored-By: Jeffrey Zhang <jeffrey.zhang@99cloud.net>
    Co-Authored-By: Radosław Piliszek <email address hidden>
    Change-Id: I7c178aba60c389e65075e0e6cbe4dfa5b8ce06ec
    Closes-Bug: #1856532
    Signed-off-by: yj.bai <bai.yongjun@99cloud.net>

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/train)

Reviewed: https://review.opendev.org/699173
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=51b361991975adde6903da53cf057e3b96f064c9
Submitter: Zuul
Branch: stable/train

commit 51b361991975adde6903da53cf057e3b96f064c9
Author: yj.bai <bai.yongjun@99cloud.net>
Date: Mon Dec 16 17:05:09 2019 +0800

    Fix MariaDB galera IPv6 deployment on CentOS 7

    CentOS 7 uses old galera which has multiple issues handling
    IPv6 addressing.
    This patch applies two workarounds for CentOS 7.

    Co-Authored-By: Jeffrey Zhang <jeffrey.zhang@99cloud.net>
    Co-Authored-By: Radosław Piliszek <email address hidden>
    Change-Id: I7c178aba60c389e65075e0e6cbe4dfa5b8ce06ec
    Closes-Bug: #1856532
    Signed-off-by: yj.bai <bai.yongjun@99cloud.net>
    (cherry picked from commit 908bffcfc2950e271fee1af24fb174fa6bee4aff)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/701204
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=8ac5ecb295bea8c437d37b1127682b31563dafe5
Submitter: Zuul
Branch: master

commit 8ac5ecb295bea8c437d37b1127682b31563dafe5
Author: Radosław Piliszek <email address hidden>
Date: Mon Jan 6 11:52:28 2020 +0100

    CentOS 7 IPv6 doc changes

    It advertises C7 as an IPv6-compatible platform.
    This is possible thanks to fixes in [1] and [2].

    [1] https://review.opendev.org/699458
    aka 7054b27dbb8bc893c50f66b492b7e14e5bc92237
    [2] https://review.opendev.org/699172
    aka 908bffcfc2950e271fee1af24fb174fa6bee4aff

    Change-Id: Ia353a1663a16f48ac83e5ee9a2cf1d6e183ac3a3
    Closes-bug: #1848444
    Closes-bug: #1848452
    Related-bug: #1856532
    Related-bug: #1856725

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/701347

OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/train)

Reviewed: https://review.opendev.org/701347
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=0a94a111a9e96d3c89831ab44eee36933a9b3c22
Submitter: Zuul
Branch: stable/train

commit 0a94a111a9e96d3c89831ab44eee36933a9b3c22
Author: Radosław Piliszek <email address hidden>
Date: Mon Jan 6 11:52:28 2020 +0100

    CentOS 7 IPv6 doc changes

    It advertises C7 as an IPv6-compatible platform.
    This is possible thanks to fixes in [1] and [2].

    [1] https://review.opendev.org/699458
    aka 7054b27dbb8bc893c50f66b492b7e14e5bc92237
    [2] https://review.opendev.org/699172
    aka 908bffcfc2950e271fee1af24fb174fa6bee4aff

    Change-Id: Ia353a1663a16f48ac83e5ee9a2cf1d6e183ac3a3
    Closes-bug: #1848444
    Closes-bug: #1848452
    Related-bug: #1856532
    Related-bug: #1856725
    (cherry picked from commit 8ac5ecb295bea8c437d37b1127682b31563dafe5)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 9.0.1

This issue was fixed in the openstack/kolla-ansible 9.0.1 release.
