Pacemaker doesn't wait for innobackupex-apply to finish

Bug #1660275 reported by Alexander Rubtsov
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Fix Committed | High | Vladimir Kuklin |
  (Nominated for Ocata by Oleksiy Molchanov)
Mitaka | Fix Released | High | Gabor Orosz |
Newton | Fix Committed | High | Vladimir Kuklin |

Bug Description

=== Environment ===
Release: MOS 9.0 (build #507, 12/12-2016)
Plugins: Zabbix
Features: Reduced Footprint

=== Description ===
After a reboot or failure of one Controller node, MySQL database synchronization starts.
Under some circumstances (a large database, slow hardware, etc.), applying the InnoDB log files may take longer than usual.
If it takes longer than the resource timeout, Pacemaker kills the MySQL process.

=== Actual behavior ===
Pacemaker kills the MySQL process before the apply stage completes, so innobackupex-apply fails:
2017-01-05T12:04:03.984764+01:00 cic-1 -innobackupex-apply[12576]: InnoDB: Setting log file ./ib_logfile101 size to 886 MB
2017-01-05T12:04:04.631371+01:00 cic-1 -innobackupex-apply[12576]: InnoDB: Progress in MB: 100 200 300 400 500 600 700 800
2017-01-05T12:04:10.427842+01:00 cic-1 -innobackupex-apply[12576]: InnoDB: Setting log file ./ib_logfile1 size to 886 MB
2017-01-05T12:04:10.815098+01:00 cic-1 ocf-mysql-wss[10509]: WARNING: p_mysqld: proc_stop(): pid param /var/run/resource-agents/mysql-wss/mysql-wss.pid is not a file or a number, try match by mysqld.*/var/lib/mysql
2017-01-05T12:04:10.818016+01:00 cic-1 ocf-mysql-wss[10513]: INFO: p_mysqld: proc_stop(): Stopping mysqld.*/var/lib/mysql by PID none
2017-01-05T12:04:10.871082+01:00 cic-1 -innobackupex-apply[12576]: InnoDB: Progress in MB: 100 200 300 400 500 600 700innobackupex: Error:
2017-01-05T12:04:10.871098+01:00 cic-1 -innobackupex-apply[12576]: innobackupex: xtrabackup (2nd execution) failed at /usr//bin/innobackupex line 2572.

Eventually, MySQL on this node is left with an inconsistent InnoDB log state and can no longer start without manual intervention:
2017-01-05T12:06:01.892540+01:00 cic-1 mysqld[17011]: 2017-01-05 12:06:01 17010 [ERROR] InnoDB: Only one log file found.
2017-01-05T12:06:01.892554+01:00 cic-1 mysqld[17011]: 2017-01-05 12:06:01 17010 [ERROR] Plugin 'InnoDB' init function returned error.
2017-01-05T12:06:01.892558+01:00 cic-1 mysqld[17011]: 2017-01-05 12:06:01 17010 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2017-01-05T12:06:01.892761+01:00 cic-1 mysqld[17011]: 2017-01-05 12:06:01 17010 [ERROR] Unknown/unsupported storage engine: innodb
2017-01-05T12:06:01.892773+01:00 cic-1 mysqld[17011]: 2017-01-05 12:06:01 17010 [ERROR] Aborting
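
For triage, the failure signature shown above can be searched for in collected logs. The following is a minimal Python sketch, assuming the logs are available as a list of text lines. The helper name find_innodb_startup_failures is made up for illustration, and the marker strings are copied from the mysqld lines in this report.

```python
# Marker strings copied from the mysqld error lines in this report.
FAILURE_MARKERS = (
    "InnoDB: Only one log file found.",
    "Unknown/unsupported storage engine: innodb",
)


def find_innodb_startup_failures(lines):
    """Return the log lines that contain one of the failure markers."""
    return [ln for ln in lines if any(m in ln for m in FAILURE_MARKERS)]


# Two lines taken from the excerpt above, used as sample input.
sample = [
    "2017-01-05T12:06:01.892540+01:00 cic-1 mysqld[17011]: "
    "2017-01-05 12:06:01 17010 [ERROR] InnoDB: Only one log file found.",
    "2017-01-05T12:06:01.892773+01:00 cic-1 mysqld[17011]: "
    "2017-01-05 12:06:01 17010 [ERROR] Aborting",
]

hits = find_innodb_startup_failures(sample)
print(len(hits))
```

A non-empty result suggests the node hit this bug and may need manual recovery, but it is only a heuristic and not a replacement for inspecting the data directory.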

=== Expected behavior ===
The OCF script should check whether innobackupex-apply is still in progress and, if so, prevent Pacemaker from killing the MySQL process.
(Something similar is already implemented for the SST process: https://bugs.launchpad.net/fuel/+bug/1478310)
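
The requested guard could look roughly like the Python sketch below. It is an illustration of the idea only, not the OCF script's actual logic. It assumes the apply stage is visible as an innobackupex process started with --apply-log, and that the caller already has the list of running command lines. The function names and the sample process list are hypothetical.

```python
import re

# Assumption: the apply stage runs as "innobackupex ... --apply-log ...".
# Adjust the pattern to match the real command line on the node.
APPLY_PATTERN = re.compile(r"innobackupex.*--apply-log")


def apply_in_progress(cmdlines):
    """Return True if any command line looks like a running apply stage."""
    return any(APPLY_PATTERN.search(cmd) for cmd in cmdlines)


def may_stop_mysql(cmdlines):
    """Allow a stop only when no apply stage is running."""
    return not apply_in_progress(cmdlines)


# Hypothetical snapshot of running processes on the recovering node.
procs = [
    "/usr/sbin/mysqld --datadir=/var/lib/mysql",
    "/usr/bin/innobackupex --apply-log /var/lib/mysql",
]

print(may_stop_mysql(procs))
```

Taking the process list as an argument keeps the check easy to test. A real agent would have to collect the command lines itself and decide what status to report to Pacemaker while the apply stage is running.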

=== Attachments ===
I am not allowed to upload the customer's log files here.
If MySQL or Pacemaker logs are required, I can share them privately.

Changed in fuel:
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
Changed in fuel:
status: New → Confirmed
milestone: 9.2 → 11.0
tags: added: area-library
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/429787

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/435352

Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Vladimir Kuklin (vkuklin)
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/435353

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/mitaka)

Reviewed: https://review.openstack.org/429787
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=525461a88aa4521485c832e86644953741172980
Submitter: Jenkins
Branch: stable/mitaka

commit 525461a88aa4521485c832e86644953741172980
Author: Gabor Orosz <email address hidden>
Date: Mon Feb 6 17:20:00 2017 +0100

    Handle the InnoDB restore phase as part of SST

    Previously, the backup restoration phase was not considered part of the
    State Snapshot Transfer, as only the backup creation and transportation
    processes are checked for this purpose. To cover the missing phase as
    well, it is more reasonable to monitor the appropriate process that
    controls the entire State Snapshot Transfer.

    Closes-Bug: 1660275

    Change-Id: Ie98af501c1cd130098381a8463452f892898470b
    Signed-off-by: Gabor Orosz <email address hidden>
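
The difference described in the commit message above can be illustrated with a small Python sketch. It is not the code from the fix. All process names (backup-tool, transport-tool, restore-tool, sst-controller) are hypothetical placeholders, because the report does not name the real SST helper processes.

```python
# All process names below are hypothetical placeholders.
CREATE_AND_TRANSPORT = ("backup-tool", "transport-tool")
CONTROLLER = "sst-controller"


def running(procs, names):
    """True if any process command line contains one of the names."""
    return any(name in proc for proc in procs for name in names)


def sst_active_old(procs):
    """Old-style check: looks only at backup creation and transport."""
    return running(procs, CREATE_AND_TRANSPORT)


def sst_active_new(procs):
    """New-style check: looks at the process controlling the whole SST."""
    return running(procs, (CONTROLLER,))


# Illustrative process lists for two phases on a joining node.
transfer_phase = ["sst-controller --role joiner", "transport-tool"]
restore_phase = ["sst-controller --role joiner", "restore-tool --apply"]

for name, procs in (("transfer", transfer_phase), ("restore", restore_phase)):
    print(name, sst_active_old(procs), sst_active_new(procs))
```

In this model the old-style check reports no SST during the restore phase, which is the window in which Pacemaker could time out and kill MySQL. The controller-based check stays true for both phases.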

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/435352
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=7f81f3aea2412ee703c8d20894b731859b0e5243
Submitter: Jenkins
Branch: master

commit 7f81f3aea2412ee703c8d20894b731859b0e5243
Author: Gabor Orosz <email address hidden>
Date: Mon Feb 6 17:20:00 2017 +0100

    Handle the InnoDB restore phase as part of SST

    Previously, the backup restoration phase was not considered part of the
    State Snapshot Transfer, as only the backup creation and transportation
    processes are checked for this purpose. To cover the missing phase as
    well, it is more reasonable to monitor the appropriate process that
    controls the entire State Snapshot Transfer.

    Closes-Bug: 1660275

    Change-Id: Ie98af501c1cd130098381a8463452f892898470b
    Signed-off-by: Gabor Orosz <email address hidden>

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/newton)

Reviewed: https://review.openstack.org/435353
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=aed70b97674f418a1d6509ddee889c34e98ffd2c
Submitter: Jenkins
Branch: stable/newton

commit aed70b97674f418a1d6509ddee889c34e98ffd2c
Author: Gabor Orosz <email address hidden>
Date: Mon Feb 6 17:20:00 2017 +0100

    Handle the InnoDB restore phase as part of SST

    Previously, the backup restoration phase was not considered part of the
    State Snapshot Transfer, as only the backup creation and transportation
    processes are checked for this purpose. To cover the missing phase as
    well, it is more reasonable to monitor the appropriate process that
    controls the entire State Snapshot Transfer.

    Closes-Bug: 1660275

    Change-Id: Ie98af501c1cd130098381a8463452f892898470b
    Signed-off-by: Gabor Orosz <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-library 11.0.0.0rc1

This issue was fixed in the openstack/fuel-library 11.0.0.0rc1 release candidate.

Ekaterina Shutova (eshutova) wrote :

Verified on 9.2 + mu1 updates.

tags: added: on-verification
Sergey Novikov (snovikov) wrote :

Verified on MOS 10.0 (RC #2)

tags: removed: on-verification