Ubuntu
percona-xtradb-cluster-5.6 package

juju agent upgrade causes mysqld to stop (part of same systemd cgroup)

Bug #1664025 reported by Nobuto Murata on 2017-02-12

This bug affects 3 people

Affects	Status	Importance	Assigned to	Milestone
Canonical Juju	Invalid	Undecided	Unassigned
OpenStack Percona Cluster Charm	Fix Released	Critical	James Page	OpenStack Percona Cluster Charm 17.05
percona-cluster (Juju Charms Collection)	Invalid	Critical	Unassigned
percona-xtradb-cluster-5.6 (Ubuntu)	Confirmed	Undecided	Unassigned

Bug Description

A Juju agent upgrade brings the whole OpenStack deployment down, which is critical.

How to reproduce:
$ juju bootstrap --config agent-version=2.0.2

$ juju deploy ./bundle.yaml

$ juju run --unit mysql/0 'pgrep -af mysqld'
14743 /bin/sh /usr/bin/mysqld_safe --wsrep-new-cluster
15242 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/percona-xtradb-cluster --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep-provider=/usr/lib/libgalera_smm.so --wsrep-new-cluster --log-error=/var/log/mysql/error.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/run/mysqld/mysqld.sock --port=3306 --wsrep_start_position=00000000-0000-0000-0000-000000000000:-1

(controller model)
$ juju upgrade-juju -m controller --agent-version 2.0.3
(openstack model)
$ juju upgrade-juju --agent-version 2.0.3

$ juju run --unit mysql/0 'pgrep -af mysqld'
-> empty (no mysqld is running)

$ juju status mysql
Model Controller Cloud/Region Version
default localhost-localhost localhost/localhost 2.0.3

App Version Status Scale Charm Store Rev OS Notes
mysql 5.6.21-25.8 error 1 percona-cluster jujucharms 241 ubuntu

Unit Workload Agent Machine Public address Ports Message
mysql/0* error idle 2 10.0.8.104 hook failed: "config-changed"

Machine State DNS Inst id Series AZ
2 started 10.0.8.104 juju-50b253-2 xenial
...

Nobuto Murata (nobuto) wrote on 2017-02-12:

juju-debug-log_controller.log.gz (43.5 KiB, application/octet-stream)

description:	updated

Nobuto Murata (nobuto) wrote on 2017-02-12:

juju-debug-log_openstack_model.log.gz (282.5 KiB, application/octet-stream)

Nobuto Murata (nobuto) wrote on 2017-02-12:

bundle.yaml (2.7 KiB, text/plain)

Nobuto Murata (nobuto) wrote on 2017-02-12:

var_log_mysql_error.log (50.2 KiB, text/plain)

Nobuto Murata (nobuto) wrote on 2017-02-12:

unit-mysql-0.log.gz (15.3 KiB, application/octet-stream)

Somehow a mysqld "Normal shutdown" was triggered twice around the agent upgrade.

In the logs below, 00:37 is local time and 15:37 is the same moment in UTC.

[/var/log/mysql/error.log]
...
2017-02-12 15:37:47 15242 [Note] /usr/sbin/mysqld: Normal shutdown
...
2017-02-12 15:37:52 15242 [Note] /usr/sbin/mysqld: Shutdown complete
...
2017-02-12 15:38:52 20201 [Note] /usr/sbin/mysqld: Normal shutdown
...
2017-02-12 15:38:56 20201 [Note] /usr/sbin/mysqld: Shutdown complete

[journalctl -u mysql]
Feb 12 15:37:47 juju-50b253-2 systemd[1]: Stopping LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon...
Feb 12 15:37:47 juju-50b253-2 mysql[19583]: * Stopping MySQL (Percona XtraDB Cluster) mysqld
Feb 12 15:37:52 juju-50b253-2 /etc/init.d/mysql[19644]: MySQL PID not found, pid_file detected/guessed: /var/run/mysqld/mysqld.pid
Feb 12 15:37:52 juju-50b253-2 /etc/init.d/mysql[19648]: MySQL PID not found, pid_file detected/guessed: /var/run/mysqld/mysqld.pid
Feb 12 15:37:52 juju-50b253-2 mysql[19583]: ...done.
Feb 12 15:37:52 juju-50b253-2 systemd[1]: Stopped LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon.
Feb 12 15:38:03 juju-50b253-2 systemd[1]: Starting LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon...
Feb 12 15:38:04 juju-50b253-2 mysql[20252]: * Starting MySQL (Percona XtraDB Cluster) database server mysqld
Feb 12 15:38:04 juju-50b253-2 mysql[20252]: ...done.
Feb 12 15:38:04 juju-50b253-2 systemd[1]: Started LSB: Start and stop the mysql (Percona XtraDB Cluster) daemon.

[unit-mysql-0.log]
unit-mysql-0: 00:37:31 INFO juju.worker.upgrader desired tool version: 2.0.2
...
unit-mysql-0: 00:37:44 DEBUG unit.mysql/0.juju-log Leader unit - bootstrap required=True
unit-mysql-0: 00:37:47 INFO unit.mysql/0.juju-log Writing file /etc/mysql/percona-xtradb-cluster.conf.d/mysqld.cnf root:root 444
unit-mysql-0: 00:37:52 INFO unit.mysql/0.config-changed Unknown operation bootstrap-pxc.
unit-mysql-0: 00:37:52 INFO unit.mysql/0.config-changed * Bootstrapping Percona XtraDB Cluster database server mysqld
unit-mysql-0: 00:38:00 INFO juju.worker.leadership mysql/0 will renew mysql leadership at 2017-02-12 15:38:30.142709216 +0000 UTC
unit-mysql-0: 00:38:03 INFO unit.mysql/0.config-changed ...done.
unit-mysql-0: 00:38:04 DEBUG unit.mysql/0.juju-log Bootstrap PXC Succeeded
...

unit-mysql-0: 00:38:51 INFO juju.worker.upgrader upgrade requested from 2.0.2 to 2.0.3

Nobuto Murata (nobuto) wrote on 2017-02-12:

journalctl.log.gz (24.4 KiB, application/octet-stream)

Nobuto Murata (nobuto) wrote on 2017-02-12:

OK, this issue is easily reproducible with a single percona-cluster unit. An OpenStack deployment is not necessary to reproduce it.

description:	updated

Nobuto Murata (nobuto) wrote on 2017-02-12:

Well, I reproduced it once with a single percona-cluster unit, but not the second time. It might be related to a race condition, so an OpenStack deployment with more relations may be necessary to reproduce it.

Nobuto Murata (nobuto) wrote on 2017-02-13:

It looks like the juju unit agent is somehow tied to the mysqld process. That's why an agent upgrade (agent stop/start) causes a clean mysql shutdown.

$ sudo systemctl status jujud-unit-mysql-0
● jujud-unit-mysql-0.service - juju unit agent for mysql/0
   Loaded: loaded (/var/lib/juju/init/jujud-unit-mysql-0/jujud-unit-mysql-0.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2017-02-13 09:57:30 UTC; 1min 47s ago
Main PID: 16588 (bash)
    Tasks: 40
   Memory: 1.2G
      CPU: 4.353s
   CGroup: /system.slice/jujud-unit-mysql-0.service
           ├─16588 bash /var/lib/juju/init/jujud-unit-mysql-0/exec-start.sh
           ├─16592 /var/lib/juju/tools/unit-mysql-0/jujud unit --data-dir /var/lib/juju --unit-name mysql/0 --debug
           ├─16918 /bin/sh /usr/bin/mysqld_safe --wsrep-new-cluster
           └─17400 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/percona-xtradb-cluster --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep-provider=/usr/lib/libgalera_smm.so --wsrep

Feb 13 09:57:30 juju-cb5bdd-0 systemd[1]: Started juju unit agent for mysql/0.

$ sudo systemctl stop jujud-unit-mysql-0.service
$ pgrep -af mysqld
-> empty (no mysqld is running)
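
The cgroup membership shown in the `systemctl status` tree above can also be read directly from /proc. The following is a minimal sketch, assuming the usual `/proc/<pid>/cgroup` line format (`id:controllers:path`); `cgroup_unit` is a hypothetical helper name used only for illustration, not something provided by juju or the charm.

```shell
#!/bin/sh
# Print the systemd unit (last path component) from cgroup data in the
# format of /proc/<pid>/cgroup, read on stdin.
cgroup_unit() {
    # Fields are hierarchy-ID:controllers:path. Use the name=systemd entry
    # (cgroup v1) or the unified "0::" entry (cgroup v2).
    awk -F: '$2 == "name=systemd" || ($1 == "0" && $2 == "") {
        n = split($3, parts, "/"); print parts[n]; exit
    }'
}

# Report the owning unit for every running mysqld (prints nothing if none).
for pid in $(pgrep -x mysqld); do
    printf '%s %s\n' "$pid" "$(cgroup_unit < "/proc/$pid/cgroup")"
done
```

On an affected leader unit this would be expected to print a `jujud-unit-*.service` name next to the mysqld PID rather than `mysql.service`.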

James Page (james-page) wrote on 2017-02-13:

#10

Erm that does not look right to me (the fact that the mysql processes are part of the cgroup for the juju unit, resulting in them being terminated by systemd).

James Page (james-page) wrote on 2017-02-13:

#11

Confirmed:

$ sudo systemctl status jujud-unit-percona-cluster-0.service
● jujud-unit-percona-cluster-0.service - juju unit agent for percona-cluster/0
   Loaded: loaded (/var/lib/juju/init/jujud-unit-percona-cluster-0/jujud-unit-percona-cluster-0.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2017-02-13 10:44:10 UTC; 6min ago
Main PID: 17665 (bash)
    Tasks: 39
   Memory: 505.5M
      CPU: 1min 6.129s
   CGroup: /system.slice/jujud-unit-percona-cluster-0.service
           ├─17665 bash /var/lib/juju/init/jujud-unit-percona-cluster-0/exec-start.sh
           ├─17671 /var/lib/juju/tools/unit-percona-cluster-0/jujud unit --data-dir /var/lib/juju --unit-name percona-cluster/0 --debug
           ├─28429 /bin/sh /usr/bin/mysqld_safe --wsrep-new-cluster
           └─28927 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/percona-xtradb-cluster --plugin-dir=/usr/lib/mysql/plugin --user=mysql --wsrep-provider=/usr/lib/libgalera_smm.so --wsrep-new-cluster -

James Page (james-page) wrote on 2017-02-13:

#12

(with juju 2.1 beta5)

Changed in percona-cluster (Juju Charms Collection):
status:	New → Confirmed

James Page (james-page) wrote on 2017-02-13:

#13

I think this may be to do with the way that we have to bootstrap the PXC cluster.

James Page (james-page) wrote on 2017-02-13:

#14

The charm has to init the local instance using:

service mysql bootstrap-pxc

It would appear that this ends up being tracked under the cgroup for the unit daemon, resulting in this problem when a restart of the unit daemon occurs; as this can happen across all unit daemons in the pxc cluster during a juju upgrade-juju operation, this is probably the root cause.

However, I would expect the follower units to have been started normally, and not be part of the unit cgroup - so only the lead unit should see this effect.

summary:

- juju agent upgrade causes mysqld to stop
+ juju agent upgrade causes mysqld to stop (part of same systemd cgroup)

James Page (james-page) wrote on 2017-02-13:

#15

FWIW we might expect the service command to ensure that anything started under it is not part of the systemd cgroup for the calling daemon.

Changed in percona-cluster (Juju Charms Collection):
importance:	Undecided → Critical

Anastasia (anastasia-macmood) wrote on 2017-02-14:

#16

I am marking this bug as Incomplete for Juju while James investigates further.

Thank you for your advice and patience!

Changed in juju:
status:	New → Incomplete

Nobuto Murata (nobuto) wrote on 2017-02-14:

#17

> However, I would expect the follower units to have been started normally, and not be part of the unit cgroup - so only the lead unit should see this effect.

Right. A quick workaround for this issue is to reboot the leader unit once the cluster is up and running with multiple units, so that mysqld is spawned outside the process tree of the juju unit daemon.

I saw this issue at a customer site, where one of three mysqld instances suddenly went down on agent upgrade.

Sandor Zeestraten (szeestraten) wrote on 2017-02-23:

#18

Hit the same issue when upgrading Juju from 2.0.3 to 2.1.0.1 with the percona-cluster charm rev 247.

James Page (james-page) on 2017-02-23

Changed in charm-percona-cluster:
importance:	Undecided → Critical
status:	New → Confirmed
Changed in percona-cluster (Juju Charms Collection):
status:	Confirmed → Invalid

Anastasia (anastasia-macmood) wrote on 2017-02-24:

#19

@Sandor Zeestraten (szeestraten),
Did the workaround mentioned in comment #17 (rebooting the leader unit once the cluster is up and running) help?

Sandor Zeestraten (szeestraten) wrote on 2017-02-24:

#20

@anastasia-macmood

Yes, restarting the leader unit seemed to work.

James Page (james-page) wrote on 2017-02-24:

#21

You can also just do a stop/start of the mysql service; the issue is that the lead unit is bootstrapped using the service command, not systemctl:

service mysql bootstrap-pxc

As a result the mysqld process ends up in the wrong cgroup; restarting it using systemctl will correct this.
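
The stop/start workaround described above could be scripted roughly as follows. This is only a sketch under stated assumptions: `is_juju_agent_unit` is a hypothetical helper, `ps -o unit=` requires a procps build with systemd support, and whether `systemctl stop` reaches a mysqld that systemd did not start itself has not been verified here. Restarting a bootstrapped node also has cluster implications that this sketch does not handle.

```shell
#!/bin/sh
# Succeeds (exit 0) if the unit name looks like a Juju unit agent service.
is_juju_agent_unit() {
    case "$1" in
        jujud-unit-*.service) return 0 ;;
        *) return 1 ;;
    esac
}

# Unit that owns the first mysqld process reported by ps; empty if mysqld is
# not running or ps has no systemd support.
owner=$(ps -o unit= -C mysqld 2>/dev/null | head -n 1 | tr -d ' ')

# Only act when mysqld is tracked under the agent's unit.
if is_juju_agent_unit "$owner"; then
    echo "mysqld is tracked under $owner; restarting it via systemctl"
    sudo systemctl stop mysql
    sudo systemctl start mysql
fi
```

On a unit where mysqld already runs under its own service, the script does nothing.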

James Page (james-page) wrote on 2017-02-24:

#22

Really the bootstrap startup needs to be executed by systemd as well; that's a bit of a packaging change to make that easier to consume (the CentOS packages already provide appropriate systemd units for this).

Instead of that, I'm looking to see if we can persuade the service command to place the mysqld processes outside the scope of the jujud-unit-* cgroup.

James Page (james-page) on 2017-02-25

Changed in charm-percona-cluster:
status:	Confirmed → Triaged
milestone:	none → 17.05

James Page (james-page) wrote on 2017-02-28:

#23

Marking Juju task as invalid - the way that pxc is bootstrapped is the primary cause of this issue.

Changed in juju:
status:	Incomplete → Invalid

James Page (james-page) wrote on 2017-02-28:

#24

I can work around this problem in the charm by using systemd-run to ensure that the bootstrap-pxc mysqld gets its own cgroup, but this does need a broader fix in packaging as well (raised distro task to cover this).
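
As an illustration of the systemd-run approach, the bootstrap can be launched in a transient unit so that mysqld is no longer tracked under the caller's cgroup. This is a sketch, not necessarily the exact invocation used by the charm fix: the unit name `pxc-bootstrap`, the `SYSTEMD_RUN` override and the function name are assumptions made for this example, and `--service-type=forking` is assumed to suit an init script that backgrounds mysqld and then exits.

```shell
#!/bin/sh
# Command used to launch the transient unit; overridable for dry runs,
# e.g. SYSTEMD_RUN=echo to print the arguments instead of running them.
SYSTEMD_RUN=${SYSTEMD_RUN:-systemd-run}

# Run the PXC bootstrap inside its own transient systemd unit instead of
# the cgroup of whichever daemon invoked this script.
bootstrap_pxc_detached() {
    unit=${1:-pxc-bootstrap}   # hypothetical unit name
    "$SYSTEMD_RUN" --unit="$unit" --service-type=forking \
        service mysql bootstrap-pxc
}
```

Calling `bootstrap_pxc_detached` as root from a hook would then start mysqld under the transient unit, which can be confirmed with `systemctl status pxc-bootstrap`.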

Changed in charm-percona-cluster:
status:	Triaged → In Progress
assignee:	nobody → James Page (james-page)

OpenStack Infra (hudson-openstack) wrote on 2017-02-28: Fix proposed to charm-percona-cluster (master)

#25

Fix proposed to branch: master
Review: https://review.openstack.org/438917

OpenStack Infra (hudson-openstack) wrote on 2017-02-28: Fix merged to charm-percona-cluster (master)

#26

Reviewed: https://review.openstack.org/438917
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=fddc1b78f4251db97129f5246fff92fefb18353f
Submitter: Jenkins
Branch: master

commit fddc1b78f4251db97129f5246fff92fefb18353f
Author: James Page <email address hidden>
Date: Tue Feb 28 11:57:43 2017 +0100

    Ensure bootstrap-pxc mysqld not in unit cgroup

    The bootstrap process for percona-xtradb-cluster requires execution
    of a non-standard init.d scrip target to start the mysqld in wsrep
    new cluster mode.

    The processes started by the operation where ending up in the cgroup
    associated with the Juju unit daemon, which on restart (as a result
    of a upgrade to juju for example) would result in the mysql daemon
    being killed and not restarted.

    Use systemd-run to ensure that the bootstrap-pxc operation ends up
    in a distinct cgroup so that this does not happen.

    Change-Id: Iff998c4c23fcad71cffe9bbee60df7f00d2c9893
    Closes-Bug: 1664025

Changed in charm-percona-cluster:
status:	In Progress → Fix Committed

OpenStack Infra (hudson-openstack) wrote on 2017-03-02: Fix proposed to charm-percona-cluster (stable/17.02)

#27

Fix proposed to branch: stable/17.02
Review: https://review.openstack.org/440278

OpenStack Infra (hudson-openstack) wrote on 2017-03-02: Fix merged to charm-percona-cluster (stable/17.02)

#28

Reviewed: https://review.openstack.org/440278
Committed: https://git.openstack.org/cgit/openstack/charm-percona-cluster/commit/?id=189e5377e0b2917cb663eec3611071e5853707b5
Submitter: Jenkins
Branch: stable/17.02

commit 189e5377e0b2917cb663eec3611071e5853707b5
Author: James Page <email address hidden>
Date: Tue Feb 28 11:57:43 2017 +0100

    Ensure bootstrap-pxc mysqld not in unit cgroup

    The bootstrap process for percona-xtradb-cluster requires execution
    of a non-standard init.d scrip target to start the mysqld in wsrep
    new cluster mode.

    Use systemd-run to ensure that the bootstrap-pxc operation ends up
    in a distinct cgroup so that this does not happen.

    Drop capture of pty for bootstrap-pxc

    Use of the '-t' flag to capture the output of the pty results
    in a non-zero return code in later systemd/Ubuntu releases
    (specifically zesty).

    Drop use of this flag for broader compatibility.

    Change-Id: Iff998c4c23fcad71cffe9bbee60df7f00d2c9893
    Closes-Bug: 1664025
    Closes-Bug: 1668833
    (cherry picked from commit fddc1b78f4251db97129f5246fff92fefb18353f)
    (cherry picked from commit 1cae1942d451e0daf9681b2c823643c42565bc33)

James Page (james-page) on 2017-03-02

Changed in charm-percona-cluster:
status:	Fix Committed → Fix Released

Launchpad Janitor (janitor) wrote on 2017-03-02:

#29

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in percona-xtradb-cluster-5.6 (Ubuntu):
status:	New → Confirmed


Duplicates of this bug

Bug #1644154
