MySQL failed to start on remaining node after stop deploying added controllers and adding new controllers

Bug #1515249 reported by Vasily Gorin
This bug affects 1 person
Affects            | Status    | Importance | Assigned to               | Milestone
Fuel for OpenStack | Invalid   | High       | Fuel Library (Deprecated) |
7.0.x              | Won't Fix | High       | Fuel Sustaining           |

Bug Description

Scenario:
1. Deploy any environment with 1 controller and NeutronTUN or NeutronVLAN
2. Add 2 controllers to the cluster
3. Deploy changes
4. Stop the deployment midway through deploying the added controllers
5. Remove these added controllers from the cluster
6. Add 2 controllers to the cluster once again
7. Deploy changes

Puppet log (node-1, controller):
2015-11-11 11:59:07 +0000 Puppet (err): /usr/bin/mysql -uwsrep_sst -p3bU596Qu -Nbe "show status like 'wsrep_local_state_comment'" | /bin/grep -q Synced && sleep 10 returned 1 instead of one of [0]
.....
....
2015-11-11 11:59:07 +0000 /Stage[main]/Galera/Exec[wait-for-synced-state]/returns (err): change from not run to 0 failed: /usr/bin/mysql -uwsrep_sst -p3bU596Qu -Nbe "show status like 'wsrep_local_state_comment'" | /bin/grep -q Synced && sleep 10 returned 1 instead of one of [0]
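The failing `wait-for-synced-state` exec pipes the `mysql` output into `grep -q Synced`. In a POSIX shell without `pipefail`, the pipeline's exit status is grep's, so "returned 1" does not distinguish a node whose `wsrep_local_state_comment` is something other than `Synced` from a server that could not be reached at all (mysql writes its error to stderr and grep sees empty input). The Python sketch below only models that exit-code logic; the function name and the sample outputs (assumed to be one tab-separated row, header suppressed by `-N`) are illustrative assumptions, not Fuel code, and the trailing `sleep 10` is not modelled.

```python
def synced_check_exit_code(mysql_stdout: str) -> int:
    """Model the exit status of `mysql ... | grep -q Synced && sleep 10`.

    Hypothetical helper: assumes no `pipefail`, so the status is grep's:
    0 if any output line contains "Synced" (case-sensitive), else 1.
    The `sleep 10` only runs on success and is not modelled here.
    """
    return 0 if any("Synced" in line for line in mysql_stdout.splitlines()) else 1


# Assumed output shapes: one "name<TAB>value" row, no header row (-N).
samples = {
    "synced": "wsrep_local_state_comment\tSynced\n",
    "initialized": "wsrep_local_state_comment\tInitialized\n",
    "donor": "wsrep_local_state_comment\tDonor/Desynced\n",
    "server unreachable": "",  # e.g. error 2013 goes to stderr; stdout stays empty
}

for label, out in samples.items():
    print(f"{label}: exit {synced_check_exit_code(out)}")
```

Because `grep` is case-sensitive, a state such as `Donor/Desynced` also yields 1; the plain `in` test in the sketch behaves the same way.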

Vasily Gorin (vgorin) wrote :
Artem Roma (aroma-x)
Changed in fuel:
status: New → Confirmed
assignee: nobody → Fuel Library Team (fuel-library)
tags: added: area-library
Michael Polenchuk (mpolenchuk) wrote :

This looks similar to bug #1513401.

tags: added: galera
Changed in fuel:
status: Confirmed → New
milestone: 7.0-updates → 8.0
tags: added: life-cycle-management
Bogdan Dobrelya (bogdando) wrote :

This needs to be confirmed or declined for 8.0 as well.

Bogdan Dobrelya (bogdando) wrote :

First deployment:
 2015-11-10T16:09:20 ./astute/astute info: [670] Processing RPC call 'granular_deploy'
 2015-11-10T16:54:20 ./astute/astute info: [670] 76027fbf-6b7e-4d01-9be3-e421ae734828: Spent 14.594449397 seconds on puppet run for following nodes(uids): 1
Time jump due to the snapshot revert:
 2015-11-11T09:34:19 ./astute/astute info: [691] Processing RPC call 'image_provision'
 2015-11-11T09:38:36 ./astute/astute info: [691] Processing RPC call 'granular_deploy'
First occurrence of error 2013:
 2015-11-11T09:58:24.026146+00:00 node-1 neutron-server err: 2015-11-11 09:58:24.021 4006 ERROR oslo_messaging.rpc.dispatcher [req-3497b289-ff7a-4599-acf1-525c8747acde ] Exception during message handling: (OperationalError) (2013, "Lost connection to MySQL server at 'reading initial communication packet', system error: 0") None None
Stop and redeploy (MySQL is already broken at this point; see the errors matching 2013.*initial):
 2015-11-11T10:42:31 ./astute/astute info: [691] Processing RPC call 'stop_deploy_task'
 2015-11-11T11:04:38 ./astute/astute info: [681] Processing RPC call 'granular_deploy'
 2015-11-11T11:59:13 ./astute/astute err: [681] Error running RPC method granular_deploy: Deployment failed on nodes 1, trace:

The failure time frame is from 2015-11-11T09:38:36 to 2015-11-11T09:58:25, which is close to the revert time. This is likely an environment-specific issue.
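The failure window above can be reproduced mechanically: take the timestamp of the `granular_deploy` call issued after the revert and the first log line carrying MySQL client error 2013, and subtract. The helper below is a hypothetical sketch; the function names are invented, the sample lines are shortened copies of the ones quoted above, and both astute and syslog timestamps are assumed to be UTC (astute lines carry no offset).

```python
import re
from datetime import datetime, timezone

# Leading ISO-8601 timestamp; fractional seconds and "+00:00" offsets are ignored.
TS = re.compile(r"^\s*(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def ts(line: str) -> datetime:
    """Parse the leading timestamp of a log line, assumed to be UTC."""
    m = TS.match(line)
    if m is None:
        raise ValueError(f"no timestamp in: {line!r}")
    return datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def first_2013(lines):
    """Return the first line reporting MySQL client error 2013, or None."""
    for line in lines:
        if "(2013," in line:
            return line
    return None


deploy_start = " 2015-11-11T09:38:36 ./astute/astute info: [691] Processing RPC call 'granular_deploy'"
logs = [
    " 2015-11-11T09:34:19 ./astute/astute info: [691] Processing RPC call 'image_provision'",
    " 2015-11-11T09:58:24.026146+00:00 node-1 neutron-server err: ... (OperationalError) "
    "(2013, \"Lost connection to MySQL server at 'reading initial communication packet', "
    "system error: 0\") None None",
]

err = first_2013(logs)
window = ts(err) - ts(deploy_start)
print(window)  # 0:19:48
```

Running the same scan over the full node-1 syslog, rather than the two sample lines, would confirm whether 09:58:24 is really the first 2013 error.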

Changed in fuel:
status: New → Incomplete
Bogdan Dobrelya (bogdando) wrote :

According to node-1's ocf-mysql-wss.log, MySQL was expected to restart on the single remaining node, node-1, at 2015-11-11T09:58, but it seems to have failed to start, which in turn failed the deployment:

 2015-11-11T09:58:48.351564+00:00 info: INFO: MySQL started
 2015-11-11T10:42:55.199364+00:00 err: ERROR: MySQL lost quorum or uninitialized
 2015-11-11T10:42:55.466521+00:00 info: INFO: Sending SIGTERM to PID: 4622

And the pattern repeats endlessly:
 2015-11-11T10:43:06.518454+00:00 info: INFO: PIDFile /var/run/resource-agents/mysql-wss/mysql-wss.pid of MySQL server not found. Sleeping for 5 seconds. 0 retries left
 2015-11-11T10:43:06.525678+00:00 info: INFO: MySQL is not running
 2015-11-11T10:43:11.594361+00:00 info: INFO: PIDFile /var/run/resource-agents/mysql-wss/mysql-wss.pid of MySQL server not found. Sleeping for 5 seconds. 0 retries left
 2015-11-11T10:43:11.604084+00:00 info: INFO: MySQL is not running
 2015-11-11T10:43:16.652601+00:00 info: INFO: PIDFile /var/run/resource-agents/mysql-wss/mysql-wss.pid of MySQL server not found. Sleeping for 5 seconds. 0 retries left
 2015-11-11T10:43:16.656860+00:00 err: ERROR: MySQL is not running
 2015-11-11T10:43:16.667398+00:00 info: INFO: GTID OK: bcf11d51-87c7-11e5-9e59-e20967ba0e6b:55799
 2015-11-11T10:43:16.671592+00:00 info: INFO: Galera GTID: bcf11d51-87c7-11e5-9e59-e20967ba0e6b:55799
 2015-11-11T10:43:16.717483+00:00 info: INFO: Starting MySQL
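The repeating "PIDFile ... not found. Sleeping for 5 seconds" lines show the resource agent polling for the PID file of a MySQL process that never comes up, then reporting "MySQL is not running" and trying to start it again. The real mysql-wss agent is a shell OCF script and is not reproduced here; the Python below is only a hypothetical sketch of such a bounded poll. `wait_for_pidfile` and its parameters are invented for illustration, and the sleep function is injectable so the loop can be exercised without waiting.

```python
import os
import tempfile
import time
from typing import Callable


def wait_for_pidfile(path: str, retries: int, delay: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll for `path`; return True once it exists, False when retries run out.

    Hypothetical sketch: the message format mimics the agent log line
    "PIDFile ... not found. Sleeping for 5 seconds. N retries left".
    """
    while True:
        if os.path.exists(path):
            return True
        if retries <= 0:
            return False  # caller would then log "MySQL is not running"
        print(f"PIDFile {path} of MySQL server not found. "
              f"Sleeping for {delay:g} seconds. {retries - 1} retries left")
        sleep(delay)
        retries -= 1


# Usage: a PID file that never appears, with sleeps recorded instead of performed.
pid_path = os.path.join(tempfile.mkdtemp(), "mysql-wss.pid")
slept = []
print(wait_for_pidfile(pid_path, retries=1, sleep=slept.append), slept)  # False [5.0]
```

With `retries=1` the sketch emits exactly one "0 retries left" line before giving up, which matches the shape of each cycle in the log; whether the real agent counts retries the same way is an assumption.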

tags: added: tricky
summary:
- Deploy failed after stop deploying added controllers and adding new controllers
+ MySQL failed to start on remaining node after stop deploying added controllers and adding new controllers
tags: added: team-bugfix
Stanislaw Bogatkin (sbogatkin) wrote :

Moved to Invalid because the bug has stalled in the Incomplete state for 4 weeks. Reopen it if the issue is reproduced.

Changed in fuel:
status: Incomplete → Invalid
Denis Meltsaykin (dmeltsaykin) wrote :

Setting this to Won't Fix for 7.0, as there has been no progress for 6 months.
