Issues with kolla-ansible mariadb_recovery

Bug #1834467 reported by Mark Goddard
This bug affects 1 person
Affects        Status        Importance  Assigned to   Milestone
kolla-ansible  Fix Released  High        Mark Goddard
Rocky          Fix Released  High        Mark Goddard
Stein          Fix Released  High        Unassigned
Train          Fix Released  High        Mark Goddard

Bug Description

There are currently various issues with the kolla-ansible mariadb_recovery command.

* wsrep sequence number detection is broken. The log message format is
  'WSREP: Recovered position: <UUID>:<seqno>', but we were picking out
  the UUID rather than the sequence number, which is as good as random.

* Need to add become: true to log file removal since
  I4a5ebcedaccb9261dbc958ec67e8077d7980e496 added become: true to the
  'docker cp' command which creates it.

* Shouldn't run handlers during recovery. If the config files change we
  would end up restarting the cluster twice.

* Need to wait for wsrep recovery container completion (don't detach). This
  avoids a potential race between wsrep recovery and the subsequent
  'stop_container'.
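
The first item can be illustrated with a minimal Python sketch. It is not the kolla-ansible implementation (that lives in Ansible tasks); it only shows how to take the seqno rather than the UUID from a line in the format quoted above. The helper names `parse_seqno` and `pick_bootstrap_host` are hypothetical, and the idea that the host with the highest seqno should be preferred is an assumption based on how Galera recovery is normally done.

```python
import re
from typing import Dict, Optional

# Matches the format quoted in the report:
#   'WSREP: Recovered position: <UUID>:<seqno>'
# The seqno group allows a leading minus sign, since -1 can be reported
# when no position could be recovered.
RECOVERED_RE = re.compile(
    r"WSREP: Recovered position:\s*(?P<uuid>[0-9a-fA-F-]+):(?P<seqno>-?\d+)"
)


def parse_seqno(log_text: str) -> Optional[int]:
    """Return the seqno from the last 'Recovered position' line, or None."""
    matches = list(RECOVERED_RE.finditer(log_text))
    if not matches:
        return None
    # Take the part after the colon (the seqno), not the UUID before it.
    return int(matches[-1].group("seqno"))


def pick_bootstrap_host(logs_by_host: Dict[str, str]) -> str:
    """Pick the host with the highest recovered seqno (hypothetical helper)."""
    seqnos = {host: parse_seqno(text) for host, text in logs_by_host.items()}
    known = {host: s for host, s in seqnos.items() if s is not None}
    if not known:
        raise ValueError("no recovered position found in any log")
    # Iterate in sorted order so ties resolve to the same host every time.
    return max(sorted(known), key=lambda host: known[host])
```

Comparing UUIDs instead would order hosts by an identifier that says nothing about how far each node had progressed, which is why the description calls the old behaviour as good as random.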

Mark Goddard (mgoddard)
Changed in kolla-ansible:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/667904

Changed in kolla-ansible:
assignee: nobody → Mark Goddard (mgoddard)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (master)

Reviewed: https://review.opendev.org/667904
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=86f373a198620a7082db8e243644fd8d53802c73
Submitter: Zuul
Branch: master

commit 86f373a198620a7082db8e243644fd8d53802c73
Author: Mark Goddard <email address hidden>
Date: Thu Jun 27 12:17:17 2019 +0100

    Fixes for MariaDB bootstrap and recovery

    * Fix wsrep sequence number detection. Log message format is
      'WSREP: Recovered position: <UUID>:<seqno>' but we were picking out
      the UUID rather than the sequence number. This is as good as random.

    * Add become: true to log file reading and removal since
      I4a5ebcedaccb9261dbc958ec67e8077d7980e496 added become: true to the
      'docker cp' command which creates it.

    * Don't run handlers during recovery. If the config files change we
      would end up restarting the cluster twice.

    * Wait for wsrep recovery container completion (don't detach). This
      avoids a potential race between wsrep recovery and the subsequent
      'stop_container'.

    * Finally, we now wait for the bootstrap host to report that it is in
      an OPERATIONAL state. Without this we can see errors where the
      MariaDB cluster is not ready when used by other services.

    Change-Id: Iaf7862be1affab390f811fc485fd0eb6879fd583
    Closes-Bug: #1834467
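
The last item in the commit message, waiting for the bootstrap host to report an OPERATIONAL state, amounts to a bounded polling loop. The sketch below shows that pattern in Python under stated assumptions: `wait_for_state` and its parameters are hypothetical names, the retry count and delay are arbitrary, and `get_state` stands in for whatever query the deployment uses to read the state (for example a Galera status variable such as `wsrep_evs_state`, which is an assumption here, not something stated in the report).

```python
import time
from typing import Callable


def wait_for_state(
    get_state: Callable[[], str],
    wanted: str = "OPERATIONAL",
    retries: int = 10,
    delay: float = 6.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_state() until it returns `wanted`; return the attempt number.

    Raises TimeoutError if the state is not reached within `retries` attempts.
    """
    for attempt in range(1, retries + 1):
        try:
            if get_state().strip() == wanted:
                return attempt
        except Exception:
            # The server may refuse connections while it is still starting,
            # so a failed query is treated the same as "not ready yet".
            pass
        if attempt < retries:
            sleep(delay)
    raise TimeoutError(f"state {wanted!r} not reached after {retries} attempts")
```

Passing the query and the sleep function in as arguments keeps the loop testable without a running database; other services would only be started once the call returns.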

Changed in kolla-ansible:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/669701

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/669703

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/rocky)

Reviewed: https://review.opendev.org/669701
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=25bf57fb56ab9cf414d7c020b40971e39c72ac20
Submitter: Zuul
Branch: stable/rocky

commit 25bf57fb56ab9cf414d7c020b40971e39c72ac20
Author: Mark Goddard <email address hidden>
Date: Thu Jun 27 12:17:17 2019 +0100

    Fixes for MariaDB bootstrap and recovery

    Change-Id: Iaf7862be1affab390f811fc485fd0eb6879fd583
    Closes-Bug: #1834467
    (cherry picked from commit 86f373a198620a7082db8e243644fd8d53802c73)

OpenStack Infra (hudson-openstack) wrote : Fix merged to kolla-ansible (stable/queens)

Reviewed: https://review.opendev.org/669703
Committed: https://git.openstack.org/cgit/openstack/kolla-ansible/commit/?id=9ec00c24eb09bc7bc6ab533022f56837b7505253
Submitter: Zuul
Branch: stable/queens

commit 9ec00c24eb09bc7bc6ab533022f56837b7505253
Author: Mark Goddard <email address hidden>
Date: Thu Jun 27 12:17:17 2019 +0100

    Fixes for MariaDB bootstrap and recovery

    Change-Id: Iaf7862be1affab390f811fc485fd0eb6879fd583
    Closes-Bug: #1834467
    (cherry picked from commit 86f373a198620a7082db8e243644fd8d53802c73)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 6.2.2

This issue was fixed in the openstack/kolla-ansible 6.2.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 7.1.2

This issue was fixed in the openstack/kolla-ansible 7.1.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kolla-ansible 9.0.0.0rc1

This issue was fixed in the openstack/kolla-ansible 9.0.0.0rc1 release candidate.
