Charm upgrade from rev. 248-253 to rev. 259 with min-cluster-size: 3 fails with Bootstrap PXC failed

Bug #1755507 reported by Sandor Zeestraten
This bug affects 1 person
Affects: OpenStack Percona Cluster Charm
Status: Fix Released
Importance: Critical
Assigned to: David Ames
Milestone: 18.02

Bug Description

# Issue
Upgrading an existing percona-cluster charm with min-cluster-size: 3 from rev. 250 to rev. 259 results in one of the units failing with the error message "Bootstrap PXC failed".

I did a bit more digging and I managed to reproduce it at least on rev. 248-253. Rev. 254 and up seemed to work fine.

I saw there was some work done in that area of code last year in lp#1668833, but not sure if related.

# Reproduction
## bundle.yaml
applications:
  mysql:
    charm: cs:percona-cluster-250
    num_units: 3
    options:
      min-cluster-size: 3

## Steps
* `juju deploy bundle.yaml`
* Wait for cluster to settle
* `juju upgrade-charm mysql`

# Logs
Excerpt from /var/log/juju/unit-percona-cluster-0.log from a reproduction on LXD:

2018-03-13 14:48:46 DEBUG upgrade-charm active
2018-03-13 14:48:46 INFO juju-log Unit is ready
2018-03-13 14:48:49 DEBUG juju-log Hardening function 'install'
2018-03-13 14:48:49 DEBUG juju-log Hardening function 'upgrade'
2018-03-13 14:48:49 DEBUG juju-log Hardening function 'config_changed'
2018-03-13 14:48:49 DEBUG juju-log Hardening function 'update_status'
2018-03-13 14:48:50 DEBUG juju-log No hardening applied to 'config_changed'
2018-03-13 14:48:50 INFO juju-log MySQL already installed, skipping
2018-03-13 14:48:54 DEBUG juju-log Leader is NOT bootstrapped root-password: 34bc1742-26cd-11e8-b877-aa4304b6c454
2018-03-13 14:48:55 DEBUG juju-log Leader unit - bootstrap required=True
2018-03-13 14:48:58 DEBUG juju-log Writing file /etc/mysql/percona-xtradb-cluster.conf.d/mysqld.cnf root:root 444
2018-03-13 14:49:05 DEBUG config-changed Unknown operation bootstrap-pxc.
2018-03-13 14:49:15 DEBUG config-changed Job for run-r7f3ba74eb1d44473b3ea94030535d578.service failed because the control process exited with error code. See "systemctl status run-r7f3ba74eb1d44473b3ea94030535d578.service" and "journalctl -xe" for details.
2018-03-13 14:49:15 ERROR juju-log Bootstrap PXC failed: Command '['systemd-run', '--service-type=forking', 'service', 'mysql', 'bootstrap-pxc']' returned non-zero exit status 1
2018-03-13 14:49:15 DEBUG config-changed Traceback (most recent call last):
2018-03-13 14:49:15 DEBUG config-changed File "/var/lib/juju/agents/unit-percona-cluster-0/charm/hooks/config-changed", line 837, in <module>
2018-03-13 14:49:15 DEBUG config-changed main()
2018-03-13 14:49:15 DEBUG config-changed File "/var/lib/juju/agents/unit-percona-cluster-0/charm/hooks/config-changed", line 827, in main
2018-03-13 14:49:15 DEBUG config-changed hooks.execute(sys.argv)
2018-03-13 14:49:15 DEBUG config-changed File "/var/lib/juju/agents/unit-percona-cluster-0/charm/hooks/charmhelpers/core/hookenv.py", line 800, in execute
2018-03-13 14:49:15 DEBUG config-changed self._hooks[hook_name]()
2018-03-13 14:49:15 DEBUG config-changed File "/var/lib/juju/agents/unit-percona-cluster-0/charm/hooks/charmhelpers/contrib/hardening/harden.py", line 79, in _harden_inner2
2018-03-13 14:49:15 DEBUG config-changed return f(*args, **kwargs)
2018-03-13 14:49:15 DEBUG config-changed File "/var/lib/juju/agents/unit-percona-cluster-0/charm/hooks/config-changed", line 374, in config_changed
2018-03-13 14:49:15 DEBUG config-changed bootstrap=not leader_bootstrapped)
2018-03-13 14:49:15 DEBUG config-changed File "/var/lib/juju/agents/unit-percona-cluster-0/charm/hooks/config-changed", line 237, in render_config_restart_on_changed
2018-03-13 14:49:15 DEBUG config-changed bootstrap_pxc()
2018-03-13 14:49:15 DEBUG config-changed File "/var/lib/juju/agents/unit-percona-cluster-0/charm/hooks/percona_utils.py", line 475, in bootstrap_pxc
2018-03-13 14:49:15 DEBUG config-changed raise Exception(error_msg)
2018-03-13 14:49:15 DEBUG config-changed Exception: Bootstrap PXC failed: Command '['systemd-run', '--service-type=forking', 'service', 'mysql', 'bootstrap-pxc']' returned non-zero exit status 1
2018-03-13 14:49:15 ERROR juju.worker.uniter.operation runhook.go:113 hook "config-changed" failed: exit status 1
2018-03-13 14:49:21 DEBUG juju-log Hardening function 'install'
2018-03-13 14:49:21 DEBUG juju-log Hardening function 'upgrade'
2018-03-13 14:49:21 DEBUG juju-log Hardening function 'config_changed'
2018-03-13 14:49:21 DEBUG juju-log Hardening function 'update_status'
2018-03-13 14:49:22 DEBUG juju-log No hardening applied to 'config_changed'
2018-03-13 14:49:22 INFO juju-log MySQL already installed, skipping
2018-03-13 14:49:24 DEBUG juju-log Leader is NOT bootstrapped root-password: 34bc1742-26cd-11e8-b877-aa4304b6c454
2018-03-13 14:49:25 DEBUG juju-log Leader unit - bootstrap required=True
2018-03-13 14:49:26 DEBUG config-changed Unknown operation bootstrap-pxc.

# Versions
juju 2.3.4
percona-cluster rev. 250, 259

summary:
- Upgrade from rev. 250 to rev. 259 with min-cluster-size: 3 fails with Bootstrap PXC failed
+ Charm upgrade from rev. 250 to rev. 259 with min-cluster-size: 3 fails with Bootstrap PXC failed
summary:
- Charm upgrade from rev. 250 to rev. 259 with min-cluster-size: 3 fails with Bootstrap PXC failed
+ Charm upgrade from rev. 248-253 to rev. 259 with min-cluster-size: 3 fails with Bootstrap PXC failed
description: updated
description: updated
Liam Young (gnuoy)
Changed in charm-percona-cluster:
status: New → Confirmed
importance: Undecided → Critical
assignee: nobody → Liam Young (gnuoy)
milestone: none → 18.02
Ryan Beisner (1chb1n)
Changed in charm-percona-cluster:
milestone: 18.02 → 18.05
Revision history for this message
David Ames (thedac) wrote :

We think this is is_leader_bootstrapped getting in the way because root-password is not in leader settings.

upgrade-charm should stabilize expected leader settings values.
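
A minimal sketch of the gate described above, not the charm's actual code. It assumes is_leader_bootstrapped returns true only when every expected key is present in leader settings. The key list is illustrative (root-password and sst-password are the ones named in this report; the exact set the charm checks is an assumption), and leader settings are modelled as a plain dict instead of being read through the charm's helpers.

```python
# Hypothetical model of the bootstrap gate; not the charm's real implementation.
# The key names are assumptions drawn from this bug report.
EXPECTED_LEADER_KEYS = ("root-password", "sst-password")


def is_leader_bootstrapped(leader_settings):
    """Return True only if every expected key has a non-empty value."""
    return all(leader_settings.get(key) for key in EXPECTED_LEADER_KEYS)


# Settings as an older charm revision might have left them: the password
# is stored under 'mysql.passwd', so 'root-password' is absent.
old_settings = {"mysql.passwd": "s3cret"}

# The gate reports "not bootstrapped", so a second bootstrap is attempted
# even though the cluster is already running.
bootstrap_required = not is_leader_bootstrapped(old_settings)
print(bootstrap_required)
```

With the settings above, the gate asks for a bootstrap, which matches the "Leader unit - bootstrap required=True" line in the log.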

David Ames (thedac)
Changed in charm-percona-cluster:
assignee: Liam Young (gnuoy) → David Ames (thedac)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-percona-cluster (master)

Fix proposed to branch: master
Review: https://review.openstack.org/553083

Changed in charm-percona-cluster:
status: Confirmed → In Progress
David Ames (thedac) wrote :

Sandor,

https://review.openstack.org/553083 should fix things.

For thoroughness, if a unit is already in this error state, it can be recovered without the new charm as follows.

On the leader node (this should be the node in the error state):

# Start mysql back up
sudo systemctl start mysql

# If root-password is not set in config
# Find the existing root password
leader-get mysql.passwd

# Set 'root-password' to the same value
leader-set root-password=$MYSQLPW

# If sst-password is not set
# Either set via config or manually
# Via config
juju config percona-cluster sst-password=$NEWPASSWD
# Manually
leader-set sst-password=$NEWPASSWD

# Finally resolve the unit
juju resolved percona-cluster/$N

If you are upgrading to the new charm, the following will work instead.
On the leader node (this should be the node in the error state):

# Start mysql back up
sudo systemctl start mysql

# Upgrade the charm:
juju upgrade-charm percona-cluster --force-units --switch /path/to/new/charm

# Resolve the unit
juju resolved percona-cluster/$N

Sandor Zeestraten (szeestraten) wrote :

Hi David, thank you and the gang for the speedy workaround.

Just so I'm reading things right: the change will land in 18.05 and will not be backported to 18.02 because it has a workaround, right?

Liam Young (gnuoy) wrote :

Hi Sandor, I think this is a candidate for backporting. I see 1chb1n changed the milestone to target this fix against 18.05. I'll catch up with him in a few hours and see if backporting to stable is still an option.

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-percona-cluster (master)

Reviewed: https://review.openstack.org/553083
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=5bce1985e1fdc590a96050be7f35ea9a6d5358e3
Submitter: Zuul
Branch: master

commit 5bce1985e1fdc590a96050be7f35ea9a6d5358e3
Author: David Ames <email address hidden>
Date: Wed Mar 14 21:43:11 2018 +0000

    Ensure leader settings on charm upgrade

    Currently bootstrapping is gated by is_leader_bootstrapped which
    checks a handful of leader settings. When upgrading from older
    versions of the charm, these settings are missing leading to an
    attempt to bootstrap an already bootstrapped cluster.

    This change makes sure the leader settings is_leader_bootstrapped is
    checking for are all set by the leader on upgrade-charm.

    Closes-Bug: #1755507

    Change-Id: I172f10221b9447ca3e0c5403feaa49acccfa9e42
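
The approach in this commit message can be sketched as follows. This is an illustration under assumptions, not the merged patch: leader settings are modelled as a plain dict, `ensure_leader_settings` is a hypothetical helper name, and the fallback sources (the legacy `mysql.passwd` entry and a config-provided `sst-password`) are taken from the manual workaround earlier in this thread.

```python
def ensure_leader_settings(leader_settings, config):
    """Return a copy of leader_settings with the gate's keys filled in.

    Hypothetical helper for illustration; existing values are never overwritten.
    """
    updated = dict(leader_settings)
    # Reuse the password stored under the legacy key if 'root-password' is missing.
    if not updated.get("root-password") and updated.get("mysql.passwd"):
        updated["root-password"] = updated["mysql.passwd"]
    # Take 'sst-password' from charm config when leader settings lack it.
    if not updated.get("sst-password") and config.get("sst-password"):
        updated["sst-password"] = config["sst-password"]
    return updated


# Leader settings as left by an older charm revision (assumed shape).
before = {"mysql.passwd": "s3cret"}
after = ensure_leader_settings(before, {"sst-password": "sstpw"})
print(sorted(after))
```

Running something like this from upgrade-charm on the leader, before config-changed evaluates the gate, would leave both keys populated so no second bootstrap is attempted.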

Changed in charm-percona-cluster:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-percona-cluster (stable/18.02)

Fix proposed to branch: stable/18.02
Review: https://review.openstack.org/553285

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-percona-cluster (stable/18.02)

Reviewed: https://review.openstack.org/553285
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=85551aced023011ab347aa4f5bad5af1462f5dcb
Submitter: Zuul
Branch: stable/18.02

commit 85551aced023011ab347aa4f5bad5af1462f5dcb
Author: David Ames <email address hidden>
Date: Wed Mar 14 21:43:11 2018 +0000

    Ensure leader settings on charm upgrade

    Currently bootstrapping is gated by is_leader_bootstrapped which
    checks a handful of leader settings. When upgrading from older
    versions of the charm, these settings are missing leading to an
    attempt to bootstrap an already bootstrapped cluster.

    This change makes sure the leader settings is_leader_bootstrapped is
    checking for are all set by the leader on upgrade-charm.

    Closes-Bug: #1755507

    Change-Id: I172f10221b9447ca3e0c5403feaa49acccfa9e42
    (cherry picked from commit 5bce1985e1fdc590a96050be7f35ea9a6d5358e3)

Liam Young (gnuoy)
Changed in charm-percona-cluster:
status: Fix Committed → Fix Released
milestone: 18.05 → 18.02