percona server crashes on Power, ARM

Bug #1713778 reported by bugproxy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
OpenStack Percona Cluster Charm | Invalid | Undecided | Unassigned |
The Ubuntu-power-systems project | Triaged | High | Ryan Beisner |
juju (Ubuntu) | Invalid | Undecided | Unassigned |
percona-xtradb-cluster-5.6 (Ubuntu) | In Progress | High | Jorge Niedbalski |

Bug Description

== Comment: #0 - MANOJ N. KUMAR - 2017-08-29 09:41:10 ==
---Problem Description---
When deploying Openstack on a MaaS cluster using Juju charms, the MySQL instance (or cluster) is often in 'error' state.

---uname output---
Linux juju-6db2c9-0-lxd-5 4.4.0-92-generic #115-Ubuntu SMP Thu Aug 10 09:04:26 UTC 2017 ppc64le ppc64le ppc64le GNU/Linux

---Additional Hardware Info---
Openstack cluster on 4 nodes configured using MaaS.
Juju openstack bundle (in this case deployed by JOID).
However JOID is not relevant

Machine Type = LXD container running the MySQL instance, on ppc64el

---Steps to Reproduce---
 juju deploy cs:bundle/openstack-base-49

Userspace tool common name: mysqld

Userspace rpm: percona-xtradb-cluster-server-5.6: /usr/sbin/mysqld

Revision history for this message
bugproxy (bugproxy) wrote : mysql error.log

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-158124 severity-critical targetmilestone-inin16043
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → juju (Ubuntu)
Revision history for this message
bugproxy (bugproxy) wrote :

Default Comment by Bridge

Changed in ubuntu-power-systems:
importance: Undecided → Critical
Changed in ubuntu-power-systems:
assignee: nobody → OpenStack Charmers (openstack-charmers)
Revision history for this message
Ryan Beisner (1chb1n) wrote : Re: Juju deploy of openstack on ppc64el fails due to mysqld crash

Can we please get the bundle yaml and juju status output for the deployment?

Changed in ubuntu-power-systems:
status: New → Incomplete
Changed in juju (Ubuntu):
status: New → Invalid
Changed in ubuntu-power-systems:
assignee: OpenStack Charmers (openstack-charmers) → Ryan Beisner (1chb1n)
Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Juju status attached.

Revision history for this message
Manoj Kumar (manojnkumar) wrote :

bundles.yaml attached

Revision history for this message
Manoj Kumar (manojnkumar) wrote :

I tried to put

innodb_force_recovery=3

as a workaround in the mysqld.cnf file in the container. Unfortunately, that change does not persist, because juju overwrites the file with the default.

I also tried to set the performance_schema value in the mysql charm to true (default is false), but that did not help.

Narinder had earlier suggested bumping the innodb-buffer-pool-size value up from the default (1G), so I have set it to 64G:

  innodb-buffer-pool-size:
    description: |
      By default this value will be set according to 50% of system total
      memory but also can be set to any specific value for the system.
      Supported suffixes include K/M/G/T. If suffixed with %, one will get that
      percentage of system total memory allocated.
    type: string
    value: 64G
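For readers unsure how a value such as 64G or 50% would translate into bytes, here is a minimal Python sketch of the suffix rules described in the option text above. It is not the charm's actual code: the function name `parse_pool_size` is hypothetical, and the use of 1024-based multiples for K/M/G/T is an assumption.

```python
# Hypothetical helper illustrating the option description; not the charm's code.
# Assumption: K/M/G/T are 1024-based multiples.
_MULTIPLIERS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_pool_size(value, total_memory_bytes):
    """Convert a size string such as '64G' or '50%' into a byte count."""
    value = value.strip()
    if not value:
        raise ValueError("empty size value")
    suffix = value[-1].upper()
    if suffix == "%":
        # Percentage of total system memory.
        return int(total_memory_bytes * float(value[:-1]) / 100)
    if suffix in _MULTIPLIERS:
        return int(value[:-1]) * _MULTIPLIERS[suffix]
    # No recognised suffix: treat the value as a plain byte count.
    return int(value)
```

Under these assumptions, `parse_pool_size("64G", total)` gives 68719476736 bytes regardless of `total`, which makes it easy to compare the requested pool size against the memory actually available to the container.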

Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Let me know if you need anything else.

Changed in ubuntu-power-systems:
status: Incomplete → In Progress
Revision history for this message
Manoj Iyer (manjo) wrote :

Can you please try the following and restart nova-compute?

sudo myisamchk -r -q /var/lib/percona-xtradb-cluster/mysql/db
sudo myisamchk -r -q /var/lib/percona-xtradb-cluster/mysql/user

Changed in juju (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Ryan Beisner (1chb1n)
importance: Undecided → Critical
status: Invalid → Incomplete
status: Incomplete → Triaged
Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Manoj:

I ran the two commands on the one container where mysqld had crashed.

ubuntu@juju-6db2c9-0-lxd-5:~$ sudo myisamchk -r -q /var/lib/percona-xtradb-cluster/mysql/db
- check record delete-chain
- recovering (with sort) MyISAM-table '/var/lib/percona-xtradb-cluster/mysql/db'
Data records: 22
- Fixing index 1
- Fixing index 2
ubuntu@juju-6db2c9-0-lxd-5:~$ sudo myisamchk -r -q /var/lib/percona-xtradb-cluster/mysql/user
- check record delete-chain
- recovering (with sort) MyISAM-table '/var/lib/percona-xtradb-cluster/mysql/user'
Data records: 26
- Fixing index 1

Then I stopped and started that container.

The daemon did not stay up long. It went down soon after, with this in the error.log:

170829 21:58:15 mysqld_safe Starting mysqld daemon with databases from /var/lib/percona-xtradb-cluster
170829 21:58:15 mysqld_safe Skipping wsrep-recover for ee3cf6b7-89c8-11e7-80c5-2aec0eea34fa:28447 pair
170829 21:58:15 mysqld_safe Assigning ee3cf6b7-89c8-11e7-80c5-2aec0eea34fa:28447 to wsrep_start_position
2017-08-29 21:58:15 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2017-08-29 21:58:15 0 [Note] /usr/sbin/mysqld (mysqld 5.6.34-79.1-79.1) starting as process 1279 ...
2017-08-29 21:58:15 1279 [Note] WSREP: Read nil XID from storage engines, skipping position init
2017-08-29 21:58:15 1279 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
2017-08-29 21:58:15 1279 [Note] WSREP: wsrep_load(): Galera 3.19(rXXXX) by Codership Oy <email address hidden> loaded successfully.
2017-08-29 21:58:15 1279 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
2017-08-29 21:58:15 1279 [Note] WSREP: Found saved state: ee3cf6b7-89c8-11e7-80c5-2aec0eea34fa:28447, safe_to_bootsrap: 0
2017-08-29 21:58:15 1279 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/percona-xtradb-cluster/; base_host = 172.29.239.12; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/percona-xtradb-cluster/; gcache.keep_pages_count = 0; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/percona-xtradb-cluster//galera.cache; gcache.page_size = 128M; gcache.recover = no; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segme
2017-08-29 21:58:15 1279 [Note] WSREP: GCache history reset: old(ee3cf6b7-89c8-11e7-80c5-2aec0eea34fa:0) -> new(ee3cf6b7-89c8-11e7-80c5-2aec0eea34fa:28447)
2017-08-29 21:58:15 1279 [Note] WSREP: Assign initial position for certification: 28447, protocol version: -1
2017-08-29 21:58:15 1279 [Note] WSREP: wsre...


Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Manoj: Any other suggestions?

Revision history for this message
Manoj Iyer (manjo) wrote :

Manoj, the bug is now assigned to Ryan, who is our expert on OpenStack. Let's see what his thoughts are on this issue.

Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Ryan: Any update on this issue? We are not able to run an HA configuration of Openstack on Power as a result of it. In our current configuration we deployed Openstack with JOID. However, we have also seen this issue when deploying Openstack directly.

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: In Progress → Triaged
Revision history for this message
Ryan Beisner (1chb1n) wrote :

See related bug https://bugs.launchpad.net/charm-test-infra/+bug/1657256 which may actually be the right bug to track.

I can confirm that percona xtradb cluster crashes on ppc64el Xenial. We're working on validating the proposed fixes in the other bug.

Changed in juju (Ubuntu):
status: Triaged → Invalid
Revision history for this message
Ryan Beisner (1chb1n) wrote :

For ppc64el validation, we temporarily substitute the mysql charm (cs:mysql) in place of the percona charm. Unfortunately, that charm doesn't appear to have HA features enabled. We will need the percona packaging fixes and/or explore mysql-proper HA options.

Changed in juju (Ubuntu):
assignee: Ryan Beisner (1chb1n) → nobody
importance: Critical → Undecided
Changed in ubuntu-power-systems:
importance: Critical → High
Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Ryan: The comment in the other bug asks whether we can try the percona fix. Can you tell me how to get the charm to pick up this fix?

You can update your system with unsupported packages from this untrusted PPA by adding ppa:niedbalski/lp1657256 to your system's Software Sources.

Manoj Iyer (manjo)
tags: added: triage-g
Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Ryan: I set juju config mysql source=ppa:niedbalski/lp1657256
But that location does not seem to have the right packages:

2017-09-19 21:12:09 DEBUG install Err:7 http://ppa.launchpad.net/niedbalski/lp1657256/ubuntu xenial/main ppc64el Packages
2017-09-19 21:12:09 DEBUG install 404 Not Found
2017-09-19 21:12:09 DEBUG install Ign:8 http://ppa.launchpad.net/niedbalski/lp1657256/ubuntu xenial/main all Packages
2017-09-19 21:12:09 DEBUG install Ign:9 http://ppa.launchpad.net/niedbalski/lp1657256/ubuntu xenial/main Translation-en
2017-09-19 21:12:11 DEBUG install Reading package lists...
2017-09-19 21:12:11 DEBUG install W: The repository 'http://ppa.launchpad.net/niedbalski/lp1657256/ubuntu xenial Release' does not have a Release file.
2017-09-19 21:12:11 DEBUG install E: Failed to fetch http://ppa.launchpad.net/niedbalski/lp1657256/ubuntu/dists/xenial/main/binary-ppc64el/Packages 404 Not Found
2017-09-19 21:12:11 DEBUG install E: Some index files failed to download. They have been ignored, or old ones used instead.
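The 404 above is for the per-architecture Packages index. As a rough aid for checking whether a PPA publishes anything for a given series and architecture before pointing the charm at it, here is a small Python sketch. The function names are hypothetical; the URL layout simply mirrors the path in the log above, and apt may in practice request compressed variants of the index, so treat a negative result as a hint rather than proof.

```python
from urllib.request import Request, urlopen


def ppa_packages_url(ppa, series, arch):
    """Build the Packages index URL for a 'ppa:owner/name' source.

    The layout mirrors the URL that failed in the install log.
    """
    if not ppa.startswith("ppa:"):
        raise ValueError("expected a source of the form ppa:owner/name")
    owner, name = ppa[len("ppa:"):].split("/", 1)
    return (
        f"http://ppa.launchpad.net/{owner}/{name}/ubuntu"
        f"/dists/{series}/main/binary-{arch}/Packages"
    )


def index_is_published(url, timeout=10.0):
    """Return True if a HEAD request for the index succeeds (needs network)."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # HTTPError and URLError are OSError subclasses; a 404 lands here.
        return False
```

For example, `ppa_packages_url("ppa:niedbalski/lp1657256", "xenial", "ppc64el")` reproduces the URL from the "Failed to fetch" line.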

Revision history for this message
James Page (james-page) wrote :

Marking charm task as invalid as this is an application issue.

Changed in charm-percona-cluster:
status: New → Invalid
Revision history for this message
Manoj Iyer (manjo) wrote :

Manoj,

There is a PPA build of that package at https://launchpad.net/~niedbalski/+archive/ubuntu/lp1657256/. Could you give it a try and report the results here: https://bugs.launchpad.net/charm-test-infra/+bug/1657256

Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Seeing this issue now:

2017-09-21 20:07:06 DEBUG install Some packages could not be installed. This may mean that you have
2017-09-21 20:07:06 DEBUG install requested an impossible situation or if you are using the unstable
2017-09-21 20:07:06 DEBUG install distribution that some required packages have not yet been created
2017-09-21 20:07:06 DEBUG install or been moved out of Incoming.
2017-09-21 20:07:06 DEBUG install The following information may help to resolve the situation:
2017-09-21 20:07:06 DEBUG install
2017-09-21 20:07:06 DEBUG install The following packages have unmet dependencies:
2017-09-21 20:07:06 DEBUG install percona-xtradb-cluster-server-5.6 : Depends: libevent-2.1-6 (>= 2.1.8-stable) but it is not installable
2017-09-21 20:07:06 DEBUG install E: Unable to correct problems, you have held broken packages.
2017-09-21 20:07:06 DEBUG worker.uniter.jujuc server.go:178 running hook tool "juju-log"
2017-09-21 20:07:06 INFO juju-log Couldn't acquire DPKG lock. Will retry in 10 seconds
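When scanning long juju unit logs for this kind of failure, a short script can pull out the offending dependency. The Python sketch below is only an illustration: the function name `find_uninstallable` is hypothetical, and the pattern is written against the exact wording of the apt message shown above, so other phrasings would not match.

```python
import re

# Matches lines such as:
#   pkg : Depends: dep (>= version) but it is not installable
_UNMET = re.compile(
    r"(?P<package>\S+) : Depends: (?P<dependency>\S+)"
    r"(?: \((?P<constraint>[^)]*)\))? but it is not installable"
)


def find_uninstallable(log_text):
    """Return one dict per 'Depends ... not installable' line in the log."""
    return [m.groupdict() for m in _UNMET.finditer(log_text)]
```

Run against the log above, this reports `percona-xtradb-cluster-server-5.6` depending on `libevent-2.1-6` with the constraint `>= 2.1.8-stable`.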

Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Manoj, can you build a ppa for Xenial?

Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Manoj: Any update? We cannot make progress on Power with Openstack using Juju until this is resolved.

Revision history for this message
Manoj Iyer (manjo) wrote :

Manoj, Please refresh the PPA mentioned in comment #18 and you should see the package for Xenial published on 9/25.

Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Manoj: Tried it again, but seeing the same libevent dependency issue as the package posted for artful:

2017-09-27 21:48:01 DEBUG worker.uniter.jujuc server.go:178 running hook tool "config-get"
2017-09-27 21:48:02 DEBUG install Hit:1 http://ports.ubuntu.com/ubuntu-ports xenial InRelease
2017-09-27 21:48:02 DEBUG install Hit:2 http://ports.ubuntu.com/ubuntu-ports xenial-updates InRelease
2017-09-27 21:48:02 DEBUG install Get:3 http://ppa.launchpad.net/niedbalski/lp1657256/ubuntu xenial InRelease [18.0 kB]
2017-09-27 21:48:02 DEBUG install Hit:4 http://ports.ubuntu.com/ubuntu-ports xenial-backports InRelease
2017-09-27 21:48:03 DEBUG install Hit:5 http://ports.ubuntu.com/ubuntu-ports xenial-security InRelease
2017-09-27 21:48:03 DEBUG install Ign:3 http://ppa.launchpad.net/niedbalski/lp1657256/ubuntu xenial InRelease
2017-09-27 21:48:03 DEBUG install Get:6 http://ppa.launchpad.net/niedbalski/lp1657256/ubuntu xenial/main ppc64el Packages [2288 B]
2017-09-27 21:48:03 DEBUG install Get:7 http://ppa.launchpad.net/niedbalski/lp1657256/ubuntu xenial/main Translation-en [708 B]
2017-09-27 21:48:05 DEBUG install Fetched 21.0 kB in 1s (13.8 kB/s)
2017-09-27 21:48:06 DEBUG install Reading package lists...
2017-09-27 21:48:06 DEBUG install W: GPG error: http://ppa.launchpad.net/niedbalski/lp1657256/ubuntu xenial InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 9819BABD0C9E4D43
2017-09-27 21:48:06 DEBUG install W: The repository 'http://ppa.launchpad.net/niedbalski/lp1657256/ubuntu xenial InRelease' is not signed.
2017-09-27 21:48:06 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-get"
2017-09-27 21:48:06 DEBUG worker.uniter.jujuc server.go:178 running hook tool "is-leader"
2017-09-27 21:48:06 DEBUG worker.uniter.jujuc server.go:178 running hook tool "config-get"
2017-09-27 21:48:06 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"
2017-09-27 21:48:06 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-get"
2017-09-27 21:48:06 DEBUG worker.uniter.jujuc server.go:178 running hook tool "is-leader"
2017-09-27 21:48:06 DEBUG worker.uniter.jujuc server.go:178 running hook tool "config-get"
2017-09-27 21:48:06 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-set"
2017-09-27 21:48:06 DEBUG worker.uniter.jujuc server.go:178 running hook tool "leader-get"
2017-09-27 21:48:06 DEBUG juju.worker.uniter.remotestate watcher.go:354 got leader settings change: ok=true
2017-09-27 21:48:06 DEBUG worker.uniter.jujuc server.go:178 running hook tool "juju-log"
2017-09-27 21:48:06 DEBUG juju-log Generating new password file '/var/lib/charm/mysql/mysql.passwd'
2017-09-27 21:48:06 DEBUG worker.uniter.jujuc server.go:178 running hook tool "juju-log"
2017-09-27 21:48:06 INFO juju-log Making dir /var/lib/charm/mysql root:root 770
2017-09-27 21:48:07 DEBUG worker.uniter.jujuc server.go:178 running hook tool "juju-log"
2017-09-27 21:48:07 DEBUG juju-log Writing file /var/lib/charm/mysql/mysql.passwd root:root 660
2017-09-27 21:48:07 DEBUG worker.uniter.jujuc server.go:178 running hook tool "is-leader"
2017-09-27 21:48:07 DEBUG worker.uniter.jujuc server.go:...
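Besides the libevent dependency, the log above also carries a GPG warning that is easy to miss among the hook-tool lines. A minimal Python sketch for extracting the missing key IDs from such a log follows; the function name `missing_pubkeys` is hypothetical and the pattern assumes the `NO_PUBKEY <hex id>` wording shown in the log.

```python
import re

_NO_PUBKEY = re.compile(r"NO_PUBKEY ([0-9A-Fa-f]+)")


def missing_pubkeys(log_text):
    """Return the sorted, de-duplicated key IDs reported as NO_PUBKEY."""
    return sorted({key.upper() for key in _NO_PUBKEY.findall(log_text)})
```

For the log above, this would return the single key ID `9819BABD0C9E4D43`.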


dann frazier (dannf)
summary: - Juju deploy of openstack on ppc64el fails due to mysqld crash
+ percona server crashes on !x86 archs
summary: - percona server crashes on !x86 archs
+ percona server crashes on Power, ARM
Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Manoj: The real issue on the PPA seems to be the dependency on libevent-2.1-6. Is there a way to remove that dependency?

Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Any update on this fix? We still do not have a functional patch that I can validate on ppc64le

Revision history for this message
Jorge Niedbalski (niedbalski) wrote :

@Manoj, the build on the PPA was created for artful testing, and copied into xenial series.
I will rebuild the xenial version with the patch applied on top for you to test it.

I will let you know once this is ready.

Changed in percona-xtradb-cluster-5.6 (Ubuntu):
status: New → In Progress
importance: Undecided → High
assignee: nobody → Jorge Niedbalski (niedbalski)
Revision history for this message
Manoj Iyer (manjo) wrote :

Jorge has built a Xenial version of the package in this PPA: ppa:niedbalski/xenial-lp1657256. Please give it a try.

Revision history for this message
Manoj Kumar (manojnkumar) wrote :

This might have installed standalone, but I could not try it out, because what I am testing is a juju charm bundle to deploy openstack. And the charm fails with:

2017-10-08 15:41:09 DEBUG install WARNING: The following packages cannot be authenticated!
2017-10-08 15:41:09 DEBUG install percona-xtradb-cluster-server-5.6
2017-10-08 15:41:09 DEBUG install E: There were unauthenticated packages and -y was used without --allow-unauthenticated

Revision history for this message
Manoj Kumar (manojnkumar) wrote :

Jorge: When I installed it manually, the openstack cluster came up fine.

Revision history for this message
Narinder Gupta (narindergupta) wrote :

It seems the team has tested and verified the package from the PPA, and it works fine in their current testing. Can we start the SRU process?

Revision history for this message
dann frazier (dannf) wrote :

Is this a duplicate of LP: #1657256 ?
