seeded file is missing after bootstrap process

Bug #1868326 reported by Felipe Reyes
This bug affects 2 people
Affects: OpenStack Percona Cluster Charm
Status: Fix Released
Importance: High
Assigned to: Alex Kavanagh

Bug Description

The charm relies on the /var/lib/percona-xtradb-cluster/seeded file to determine whether it can process requests from related units [0][1]. However, when the bootstrap-pxc action is executed, the units that connect to the bootstrapped node to receive a snapshot from the donor wipe the contents of the datadir (/var/lib/percona-xtradb-cluster). This removes the seeded file, and the charm never recreates it, as can be seen in my comment on version 3 of the patch: https://review.opendev.org/#/c/713316/3

[0] https://github.com/openstack/charm-percona-cluster/blob/master/hooks/percona_hooks.py#L692
[1] https://github.com/openstack/charm-percona-cluster/blob/master/hooks/percona_hooks.py#L808
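
The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the charm's code: the helper names (is_seeded, mark_seeded, simulate_sst_wipe), their signatures, and the marker contents are assumptions made for illustration, and a temporary directory stands in for /var/lib/percona-xtradb-cluster.

```python
import os
import shutil
import tempfile

SEEDED_MARKER = "seeded"  # marker file name, as given in the bug description


def seeded_path(datadir):
    return os.path.join(datadir, SEEDED_MARKER)


def is_seeded(datadir):
    """Hypothetical check: the unit counts as seeded if the marker exists."""
    return os.path.exists(seeded_path(datadir))


def mark_seeded(datadir):
    """Hypothetical helper: create the marker inside the datadir."""
    with open(seeded_path(datadir), "w") as f:
        f.write("done")  # contents are an assumption; only existence matters here


def simulate_sst_wipe(datadir):
    """Stand-in for a joining unit emptying its datadir before receiving a snapshot."""
    for name in os.listdir(datadir):
        path = os.path.join(datadir, name)
        if os.path.isdir(path):
            shutil.rmtree(path)
        else:
            os.remove(path)


def demo():
    datadir = tempfile.mkdtemp()
    try:
        mark_seeded(datadir)
        before = is_seeded(datadir)
        simulate_sst_wipe(datadir)
        after = is_seeded(datadir)
        return before, after
    finally:
        shutil.rmtree(datadir)
```

Because the marker lives inside the directory that gets emptied, nothing outside the datadir remembers that the unit was seeded once the wipe has happened.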

Revision history for this message
Edward Hope-Morley (hopem) wrote :
Changed in charm-percona-cluster:
milestone: none → 20.05
assignee: nobody → Felipe Reyes (freyes)
importance: Undecided → High
David Ames (thedac)
Changed in charm-percona-cluster:
milestone: 20.05 → 20.08
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-percona-cluster (master)

Reviewed: https://review.opendev.org/714235
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=1f56c36b9fef4d61a2b7ae8ea0f4431698242f97
Submitter: Zuul
Branch: master

commit 1f56c36b9fef4d61a2b7ae8ea0f4431698242f97
Author: Felipe Reyes <email address hidden>
Date: Fri Mar 20 19:03:14 2020 -0300

    Mark seeded on cluster-relation-changed

    When a node is bootstrapped and the other join, they will receive a copy of
    the database via SST with the help of XtraBackup, during this process the
    /var/lib/percona-xtradb-cluster will be emptied so after this the seeded
    file will be missing. Since the action bootstrap-pxc will trigger the
    cluster-relation-changed hook it's a good place to call mark_seeded().

    Change-Id: I8510bb81435a3096d6a005610fce88ff2b7ffeab
    Closes-Bug: #1868326
    Func-Test-Pr: https://github.com/openstack-charmers/zaza-openstack-tests/pull/203
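
The idea in this commit message, re-creating the marker from a hook that is known to run after bootstrap-pxc, might look roughly like the sketch below. It is illustrative only: the handler name, the unit_is_bootstrapped flag, and the mark_seeded signature are assumptions, and the real hook may apply different conditions before marking the unit as seeded.

```python
import os
import tempfile


def mark_seeded(seeded_file):
    """Create the marker if it is absent; calling it again is harmless."""
    if not os.path.exists(seeded_file):
        with open(seeded_file, "w") as f:
            f.write("done")  # contents are an assumption


def cluster_relation_changed(seeded_file, unit_is_bootstrapped):
    """Hypothetical handler: re-mark the unit as seeded on every run.

    unit_is_bootstrapped is an assumed input; how the charm decides
    this is not shown in the bug report.
    """
    if unit_is_bootstrapped:
        mark_seeded(seeded_file)
    return os.path.exists(seeded_file)


def demo():
    with tempfile.TemporaryDirectory() as datadir:
        seeded_file = os.path.join(datadir, "seeded")
        not_ready = cluster_relation_changed(seeded_file, False)
        ready = cluster_relation_changed(seeded_file, True)
        repeat = cluster_relation_changed(seeded_file, True)
        return not_ready, ready, repeat
```

Making the marking step idempotent means the hook can fire any number of times without side effects, which matters because relation hooks run repeatedly.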

Changed in charm-percona-cluster:
status: In Progress → Fix Committed
Aurelien Lourot (aurelien-lourot) wrote :

Since 2020-05-22 our full_stack/next_series_upgrade xenial-queens CI tests [0] fail with

2020-05-26 17:51:12 DEBUG post-series-upgrade Traceback (most recent call last):
2020-05-26 17:51:12 DEBUG post-series-upgrade File "/var/lib/juju/agents/unit-mysql-2/charm/hooks/post-series-upgrade", line 1151, in <module>
2020-05-26 17:51:12 DEBUG post-series-upgrade main()
2020-05-26 17:51:12 DEBUG post-series-upgrade File "/var/lib/juju/agents/unit-mysql-2/charm/hooks/post-series-upgrade", line 1141, in main
2020-05-26 17:51:12 DEBUG post-series-upgrade hooks.execute(sys.argv)
2020-05-26 17:51:12 DEBUG post-series-upgrade File "/var/lib/juju/agents/unit-mysql-2/charm/charmhelpers/core/hookenv.py", line 943, in execute
2020-05-26 17:51:12 DEBUG post-series-upgrade self._hooks[hook_name]()
2020-05-26 17:51:12 DEBUG post-series-upgrade File "/var/lib/juju/agents/unit-mysql-2/charm/hooks/post-series-upgrade", line 467, in series_upgrade
2020-05-26 17:51:12 DEBUG post-series-upgrade resume_unit_helper(register_configs())
2020-05-26 17:51:12 DEBUG post-series-upgrade File "/var/lib/juju/agents/unit-mysql-2/charm/hooks/percona_utils.py", line 854, in resume_unit_helper
2020-05-26 17:51:12 DEBUG post-series-upgrade _pause_resume_helper(resume_unit, configs)
2020-05-26 17:51:12 DEBUG post-series-upgrade File "/var/lib/juju/agents/unit-mysql-2/charm/hooks/percona_utils.py", line 868, in _pause_resume_helper
2020-05-26 17:51:12 DEBUG post-series-upgrade ports=None)
2020-05-26 17:51:12 DEBUG post-series-upgrade File "/var/lib/juju/agents/unit-mysql-2/charm/charmhelpers/contrib/openstack/utils.py", line 1635, in resume_unit
2020-05-26 17:51:12 DEBUG post-series-upgrade raise Exception("Couldn't resume: {}".format("; ".join(messages)))
2020-05-26 17:51:12 DEBUG post-series-upgrade Exception: Couldn't resume: Unit waiting to bootstrap ('seeded' file missing)
2020-05-26 17:51:12 ERROR juju.worker.uniter.operation runhook.go:132 hook "post-series-upgrade" failed: exit status 1

I wonder if this is caused by this change. Re-opening and attaching crashdump.

[0] http://osci:8080/job/mojo_runner/22606/consoleFull

Changed in charm-percona-cluster:
status: Fix Committed → New
Aurelien Lourot (aurelien-lourot) wrote :

Happened again on our series-upgrades tests from xenial-queens:

http://10.245.162.58:8080/view/MojoMatrix/job/mojo_runner/22821/consoleFull

Changed in charm-percona-cluster:
status: New → Confirmed
James Page (james-page)
Changed in charm-percona-cluster:
milestone: 20.08 → none
Alex Kavanagh (ajkavanagh) wrote :

So the bug in the series-upgrade is due to the 'resume' action (which restarts the mysql service) syncing data over the directory and wiping out the seeded file. I've put in a 'rescue' for the series-upgrade: if the seeded file is present before the restart, it is put back after the restart so that the charm continues to operate.

Review: https://review.opendev.org/#/c/749922
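
The 'rescue' described in this comment can be sketched as a context manager wrapped around the restart. This is an assumption-laden illustration rather than the code under review: preserve_seeded and fake_restart are hypothetical names, and the marker is restored only when the wrapped block finishes without raising.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def preserve_seeded(seeded_file):
    """Hypothetical guard: put the marker back if the wrapped step removed it."""
    was_seeded = os.path.exists(seeded_file)
    yield
    # Reached only if the wrapped block completed without an exception.
    if was_seeded and not os.path.exists(seeded_file):
        with open(seeded_file, "w") as f:
            f.write("done")  # contents are an assumption


def fake_restart(datadir):
    """Stand-in for a service restart that empties the data directory."""
    for name in os.listdir(datadir):
        os.remove(os.path.join(datadir, name))


def demo(initially_seeded):
    with tempfile.TemporaryDirectory() as datadir:
        seeded_file = os.path.join(datadir, "seeded")
        if initially_seeded:
            open(seeded_file, "w").close()
        with preserve_seeded(seeded_file):
            fake_restart(datadir)
        return os.path.exists(seeded_file)
```

A unit that was not seeded beforehand stays unseeded, so the guard never promotes a unit that had not already completed bootstrap.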

Changed in charm-percona-cluster:
assignee: Felipe Reyes (freyes) → Alex Kavanagh (ajkavanagh)
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/749922
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=1b813d1fd928aadb4195c8630f9f575f6b8eafeb
Submitter: Zuul
Branch: master

commit 1b813d1fd928aadb4195c8630f9f575f6b8eafeb
Author: Alex Kavanagh <email address hidden>
Date: Fri Sep 4 11:42:41 2020 +0100

    Fix series-upgrade issue where seeded file goes missing

    During resume on a non-leader unit, the 'seeded' file can go missing
    when the package syncs over to the /var/lib/mysql or
    /var/lib/percona-xtradb-cluster directories (vivid+). It's not really
    clear why it doesn't do this every time (i.e. not every non-leader unit
    fails), but this fix ensures that if the unit *is* seeded prior to the
    series-upgrade, then it stays seeded after the series upgrade.

    The related zaza-openstack-tests change [1] is about fixing the
    series-upgrade test.

    Note there is no trusty-mitaka test here as the charm doesn't support
    trusty. The last version of percona-cluster in the charm store that
    supports trusty is rev. 276

    [1]: https://github.com/openstack-charmers/zaza-openstack-tests/pull/406

    Change-Id: I628be1c24081d7e0e150e5064c5fa4ab694632e9
    Closes-bug: #1868326

Changed in charm-percona-cluster:
status: In Progress → Fix Committed
Changed in charm-percona-cluster:
milestone: none → 20.10
Changed in charm-percona-cluster:
status: Fix Committed → Fix Released
Drew Freiberger (afreiberger) wrote :

I ran into this after rebooting a node hosting mysql. I checked that the mysql database was part of the cluster and running properly, then touched the seeded file and ran hooks/update-status to resolve it.
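
The manual workaround above (verify cluster health first, then touch the marker) could be scripted along these lines. This is a sketch under stated assumptions: the comment does not say which checks were performed, so the Galera status variables used here (wsrep_cluster_status, wsrep_ready, wsrep_local_state_comment) are my choice of health criteria, and the status dictionary is assumed to have been collected separately, for example from SHOW STATUS LIKE 'wsrep_%'. Running hooks/update-status afterwards is left as a manual step.

```python
import tempfile
from pathlib import Path

# Assumed health criteria; the comment does not specify which checks were made.
REQUIRED_STATUS = {
    "wsrep_cluster_status": "Primary",
    "wsrep_ready": "ON",
    "wsrep_local_state_comment": "Synced",
}


def node_looks_healthy(status):
    """True only if every assumed criterion matches the reported status."""
    return all(status.get(key) == value for key, value in REQUIRED_STATUS.items())


def restore_seeded_if_healthy(status, seeded_file):
    """Touch the marker only when the node appears to be a healthy member."""
    if not node_looks_healthy(status):
        return False
    Path(seeded_file).touch()
    return True


def demo():
    healthy = dict(REQUIRED_STATUS)
    joining = dict(healthy, wsrep_local_state_comment="Joining")
    with tempfile.TemporaryDirectory() as datadir:
        seeded_file = Path(datadir) / "seeded"
        refused = restore_seeded_if_healthy(joining, seeded_file)
        still_absent = not seeded_file.exists()
        restored = restore_seeded_if_healthy(healthy, seeded_file)
        return refused, still_absent, restored, seeded_file.exists()
```

Refusing to touch the marker on an unhealthy node avoids telling the charm that a unit is ready when it has not finished syncing.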
