[BVT] Deployment failed due to mysql connectivity problems

Bug #1561468 reported by Timur Nurlygayanov
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Committed
Importance: Critical
Assigned to: Alex Schultz
Milestone: 9.0

Bug Description

This issue reproduces on 9.0 build #78, which we currently use on CI for gating MOS packages.

Detailed bug description:
Sometimes (in about 30% of cases) the deployment fails with the error:

Task 'deploy' has incorrect status. error != ready, 'Deployment has failed. Critical nodes failed: Node[1], Node[3], Node[2], Node[5], Node[4], Node[6]. Stopping the deployment process!'

Steps to reproduce:
1. Run bvt_2 SWARM tests:
python run_system_test.py run -q --nologcapture --with-xunit --group=bvt_2

Expected results:
All tests passed

Actual result:
Tests failed with the error:
Task 'deploy' has incorrect status. error != ready, 'Deployment has failed. Critical nodes failed: Node[1], Node[3], Node[2], Node[5], Node[4], Node[6]. Stopping the deployment process!'

Reproducibility:
About 30% of cases; the root cause is not clear yet.

Workaround:
We don't know the workaround yet.

Impact:
BVT gating jobs are affected. This is a blocker issue that prevents making this job voting.

More information:
https://packaging-ci.infra.mirantis.net/job/9.0-pkg-systest-ubuntu/169/console

A diagnostic snapshot is attached.

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :
summary: - Deployment failed with unclear error about failed critical nodes
+ Deployment failed with unclear error about failed critical nodes:
+ Critical nodes failed: Node[1], Node[3], Node[2], Node[5], Node[4],
+ Node[6]
summary: - Deployment failed with unclear error about failed critical nodes:
+ [BVT] Deployment failed with unclear error about failed critical nodes:
Critical nodes failed: Node[1], Node[3], Node[2], Node[5], Node[4],
Node[6]
tags: added: swarm-blocker
Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote : Re: [BVT] Deployment failed with unclear error about failed critical nodes: Critical nodes failed: Node[1], Node[3], Node[2], Node[5], Node[4], Node[6]

Added the 'swarm-blocker' tag because this issue causes the BVT test failures in the MOS gating job.

Changed in fuel:
assignee: nobody → Fuel Library Team (fuel-library)
milestone: none → 9.0
Revision history for this message
Alex Schultz (alex-schultz) wrote :

It appears that it failed because of the MySQL issues. A fix for it was merged this morning: https://review.openstack.org/#/c/296712/. You'll probably need to make sure this fix is applied to the packaging CI.

summary: - [BVT] Deployment failed with unclear error about failed critical nodes:
- Critical nodes failed: Node[1], Node[3], Node[2], Node[5], Node[4],
- Node[6]
+ [BVT] Deployment failed due to mysql connectivity problems
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Alex Schultz (alex-schultz)
status: Confirmed → Fix Committed
Revision history for this message
Alex Schultz (alex-schultz) wrote :

For additional information:

From astute.log:
2016-03-23 14:20:30 DEBUG [28220] Task time summary: primary-keystone with status failed on node 2 took 00:12:22

From keystone-manage.log:
2016-03-23T14:08:56.271997+00:00 crit: 2016-03-23 14:08:56.261 27482 CRITICAL keystone [-] DBConnectionError: (_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
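When triaging similar BVT failures, a small scan of these two logs can surface both signals at once: which deployment task failed on which node, and whether the database client hit MySQL error 2006. The sketch below is a hypothetical helper (the names `failed_tasks` and `gone_away_count` are illustrative and not part of Fuel or Astute); its `astute.log` pattern is inferred from the single line quoted above and may not match every release.

```python
import re

# Inferred from the one astute.log line quoted above; other releases may differ.
TASK_SUMMARY_RE = re.compile(
    r"Task time summary: (?P<task>\S+) with status (?P<status>\S+) "
    r"on node (?P<node>\d+) took (?P<duration>\d{2}:\d{2}:\d{2})"
)

# MySQL client error 2006 is "MySQL server has gone away".
GONE_AWAY_RE = re.compile(r"\(2006, 'MySQL server has gone away'\)")


def failed_tasks(lines):
    """Return (task, node, duration) for each task summary whose status is 'failed'."""
    result = []
    for line in lines:
        m = TASK_SUMMARY_RE.search(line)
        if m and m.group("status") == "failed":
            result.append((m.group("task"), int(m.group("node")), m.group("duration")))
    return result


def gone_away_count(lines):
    """Count log lines that report MySQL error 2006."""
    return sum(1 for line in lines if GONE_AWAY_RE.search(line))


if __name__ == "__main__":
    astute = [
        "2016-03-23 14:20:30 DEBUG [28220] Task time summary: "
        "primary-keystone with status failed on node 2 took 00:12:22",
    ]
    print(failed_tasks(astute))  # [('primary-keystone', 2, '00:12:22')]
```

Matching on the error text rather than the `DBConnectionError` class name keeps the check independent of which Python MySQL driver raised it.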

This occurred because the Galera cluster was still forming, so the MySQL servers were not yet stable while the task attempted to write to the databases. The previously mentioned commit addresses this by waiting until the databases have synced before starting the OpenStack deployment tasks.
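The idea behind the fix (do not start OpenStack tasks until Galera reports the node as synced) can be sketched as a polling loop. This is not the code from review 296712; it is a minimal illustration assuming the standard Galera status variables `wsrep_ready`, `wsrep_local_state_comment` and `wsrep_cluster_status`. The `read_status` hook, the function names, and the timeout and interval defaults are hypothetical.

```python
import time
from typing import Callable, Dict


def galera_node_synced(status: Dict[str, str]) -> bool:
    """True when the node is ready, synced, and part of the primary component."""
    return (
        status.get("wsrep_ready", "").upper() == "ON"
        and status.get("wsrep_local_state_comment") == "Synced"
        and status.get("wsrep_cluster_status") == "Primary"
    )


def wait_for_galera_sync(
    read_status: Callable[[], Dict[str, str]],
    timeout: float = 600.0,  # hypothetical default, in seconds
    interval: float = 5.0,   # hypothetical polling interval, in seconds
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll read_status() until the node is synced or the timeout elapses.

    read_status is a hypothetical hook returning the output of
    SHOW STATUS LIKE 'wsrep_%' as a name -> value dict.
    Returns True if the node synced in time, False otherwise.
    """
    deadline = clock() + timeout
    while True:
        try:
            if galera_node_synced(read_status()):
                return True
        except ConnectionError:
            # The server may drop connections while the cluster is forming.
            # A real client would catch its driver's OperationalError here.
            pass
        if clock() >= deadline:
            return False
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without a live cluster; a deployment task would call `wait_for_galera_sync` first and abort if it returns False rather than writing to a database that is still joining.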
