Fuel for OpenStack

Retry of neutron-db-sync doesn't work if execution fails during tables creation

Bug #1769860 reported by Alexander Rubtsov on 2018-05-08

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Fuel for OpenStack	Won't Fix	High	Alexander Rubtsov	Fuel for OpenStack 9.x-updates

Bug Description

Release: MOS 9.2

The corresponding excerpts from the Puppet log file:
/Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns (debug): Exec try 1/10
Exec[neutron-db-sync](provider=posix) (debug): Executing 'neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini upgrade head'
Puppet (debug): Executing 'neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini upgrade head'
/Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns (debug): Sleeping for 5.0 seconds between tries
.....
/Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns (debug): Exec try 10/10
Exec[neutron-db-sync](provider=posix) (debug): Executing 'neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini upgrade head'
Puppet (debug): Executing 'neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini upgrade head'
/Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns (debug): Sleeping for 5.0 seconds between tries
/Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns (debug): Sleeping for 5.0 seconds between tries
.....
/Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync]/returns (notice): sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1050, "Table 'agents' already exists") [SQL: u"\nCREATE TABLE agents (\n\tid VARCHAR(36) NOT NULL, \n\tagent_type VARCHAR(255) NOT NULL,.....]
/Stage[main]/Neutron::Db::Sync/Exec[neutron-db-sync] (err): Failed to call refresh: neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugin.ini upgrade head returned 1 instead of one of [0]
...
http://paste.openstack.org/show/x8rHSl9ErSvD9tivzj4h/

This failure doesn't cause Fuel to mark the entire deployment as failed, which is wrong because Neutron is actually unable to operate.
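
Until Fuel reports such failures itself, one possible workaround is to scan the Puppet log for failed refresh events after the deployment. The following is only a minimal sketch: the function name `find_failed_refreshes` is hypothetical, and the pattern assumes log lines shaped like the excerpt above (resource path, then `(err): Failed to call refresh:`). Real log files may carry timestamps or other prefixes, which the pattern tolerates because it searches anywhere in the line.

```python
import re

# Assumed line shape, taken from the excerpt in this report:
#   /Stage[main]/.../Exec[neutron-db-sync] (err): Failed to call refresh: <detail>
FAILED_REFRESH = re.compile(
    r"(?P<resource>/Stage\S+) \(err\): Failed to call refresh: (?P<detail>.*)"
)


def find_failed_refreshes(log_lines):
    """Return (resource, detail) pairs for every failed refresh event found."""
    failures = []
    for line in log_lines:
        match = FAILED_REFRESH.search(line.strip())
        if match:
            failures.append((match.group("resource"), match.group("detail")))
    return failures
```

A non-empty result would indicate that at least one refresh (for example a db-sync) failed even though the deployment itself may have been reported as successful.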

This issue is rarely reproducible.
It seems that it occurs only if creation/population of MySQL tables was interrupted in the middle of the process.
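
The suspected mechanism can be illustrated with a small, self-contained sketch. It uses an in-memory SQLite database instead of MySQL and a hand-written list of statements instead of the real Neutron migrations, so it is an analogy under stated assumptions, not a reproduction: the `networks` table, the `run_migration` helper, and the simulated interruption are all hypothetical. Only the `agents` table name comes from the log above.

```python
import sqlite3

# Hypothetical two-step "migration"; plain CREATE TABLE is not idempotent.
MIGRATION_STEPS = [
    "CREATE TABLE agents (id VARCHAR(36) NOT NULL, agent_type VARCHAR(255) NOT NULL)",
    "CREATE TABLE networks (id VARCHAR(36) NOT NULL)",
]


def run_migration(conn, steps, fail_after=None):
    """Run each statement in order; optionally stop before step `fail_after`."""
    for index, statement in enumerate(steps):
        if fail_after is not None and index == fail_after:
            raise RuntimeError("simulated interruption")
        conn.execute(statement)
        conn.commit()


def retry_after_interruption():
    """Interrupt the first run midway, retry, and return the retry's error text."""
    conn = sqlite3.connect(":memory:")
    try:
        # First attempt: 'agents' is created, then the run is interrupted.
        run_migration(conn, MIGRATION_STEPS, fail_after=1)
    except RuntimeError:
        pass
    try:
        # Retry starts from the beginning and hits the existing table.
        run_migration(conn, MIGRATION_STEPS)
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
    return None
```

Calling `retry_after_interruption()` returns an "already exists" error for `agents`, which mirrors the error 1050 in the log: once a table has been created and the run is interrupted, every subsequent retry of the same sequence fails at the first statement.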

Alexander Rubtsov (arubtsov) wrote on 2018-05-08:

#1

sla2 for 9.0-updates

Changed in fuel:
importance:	Undecided → Medium
assignee:	nobody → MOS Maintenance (mos-maintenance)
milestone:	none → 9.x-updates
tags:	added: customer-found sla2

Oleksiy Molchanov (omolchanov) on 2018-05-08

Changed in fuel:
status:	New → Confirmed
assignee:	MOS Maintenance (mos-maintenance) → Oleksiy Molchanov (omolchanov)
assignee:	Oleksiy Molchanov (omolchanov) → Alexander Rubtsov (arubtsov)

Oleksiy Molchanov (omolchanov) wrote on 2018-05-08:

#2

Alexander,

1) Is this the initial Fuel deployment?
2) Can we have a full diagnostic snapshot?

Alexander Rubtsov (arubtsov) wrote on 2018-05-10:

#3

Oleksiy,

1) I will ask the customer about that
2) Unfortunately, the log files from the problematic deployment are not available anymore.

Alexander Rubtsov (arubtsov) wrote on 2018-06-06:

#4

The customer was not able to reproduce the issue again to collect the diagnostic snapshot.

Changed in fuel:
status:	Confirmed → Incomplete

Alexander Rubtsov (arubtsov) wrote on 2018-06-15:

#5

The issue has appeared again, and this time the customer was able to collect the diagnostic snapshot. Please contact me so that I can provide you with the log files directly.

Changed in fuel:
status:	Incomplete → New
assignee:	Alexander Rubtsov (arubtsov) → Oleksiy Molchanov (omolchanov)

Alexander Rubtsov (arubtsov) wrote on 2018-06-18:

#6

sla1 for 9.0-updates

Changed in fuel:
importance:	Medium → High
tags:	added: sla1 removed: sla2

Denis Meltsaykin (dmeltsaykin) on 2018-06-18

Changed in fuel:
milestone:	9.x-updates → 9.2-mu-7

Alexander Rubtsov (arubtsov) wrote on 2018-06-26:

#7

It seems the issue is wider than just neutron-db-sync. The same customer has hit a similar incident with the Cinder database.

Denis Meltsaykin (dmeltsaykin) wrote on 2018-06-29:

#8

Looks like this is a rarely occurring issue which is not that easy to fix, see also https://bugs.launchpad.net/fuel/+bug/1641136.

Denis Meltsaykin (dmeltsaykin) wrote on 2018-07-02:

#9

Ok, so it seems that the root cause of the issue is some kind of high CPU load during deployment, which caused MySQL to break. This leads to failures of the db syncs, which are mostly executed as "refreshonly = true" events, and a well-known bug in Puppet lets the flow continue even if refreshonly events fail. But it shouldn't be a big problem, since with a non-working MySQL the deployment will still fail.

Regarding the possible fixes: db-syncs are known to be non-idempotent in older versions, so in Mitaka it's just dangerous to run *-db-sync over an already prepared database, and these events *must be* "refreshonly". On the other hand, we cannot change the Puppet code itself or update Puppet to version 5.x, since it would take an enormous amount of testing, which is not available for the stable products.

Given all the above and the fact that the issue normally occurs very rarely (no occurrences for the last year, ~100 SWARM runs with ~150 full cluster deployments in each SWARM run) and is mostly connected with overloaded hardware or VM resources, I'm marking it as Won't Fix. Please add more resources to your deployments and troubleshoot high load on your environments.

Changed in fuel:
status:	New → Won't Fix
assignee:	Oleksiy Molchanov (omolchanov) → Alexander Rubtsov (arubtsov)

Denis Meltsaykin (dmeltsaykin) on 2018-07-16

Changed in fuel:
milestone:	9.2-mu-7 → 9.x-updates
