[SRU] race between nova-compute and neutron-ovs-cleanup
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
| nova (Ubuntu) |
High
|
Edward Hope-Morley | ||
| Trusty |
High
|
Edward Hope-Morley | ||
| Utopic |
High
|
Edward Hope-Morley | ||
| Vivid |
High
|
Edward Hope-Morley |
Bug Description
[Impact]
This issue appears to be a consequence of https:/
I have started to spot, however, that on some hosts (metal only) there is now a race between the two whereby nova-compute sometimes fails to start on system boot/reboot with the following in /var/log/
...
libvirt-bin stop/waiting
wait-for-state stop/waiting
neutron-ovs-cleanup start/pre-start, process 3084
start: Job failed to start
If I manually restart nova-compute all is fine. So this looks like a race between nova-compute's wait-for-state and neutron-
The proposed solution here is add some retry logic to nova-compute upstart job to tolerate neutron-ovs-cleanup not being able to start yet. We, therefore, allow a certain number of retries, every other with an incremented delay, before giving up and allowing nova-compute to start anyway. If ovs-cleanup failed to start after what is a failry liberal retry period, it is assumed to have failed altogether thus making is safe(ish) to start nova-compute.
[Test Case]
In one terminal (as root) do:
service neutron-ovs-cleanup stop; service openvswitch-switch stop; service nova-compute restart
In another do:
sudo tail -F /var/log/
Observe the retries occurring
Then do 'sudo service openvswitch-switch start' and observe nova-compute retry and succeed.
[Regression Potential]
If openvswitch-switch does not start within the max retries and intervals nova-compute will start anyway and of ovs-cleanup were at some point to run one would see the behaviour that LP 1420572 was intended to resolve. It does not seem to make sense to wait indefinitely for ovs-cleanup to be up and the coded interval is pretty liberal and should be plenty enough.
affects: | nova (Ubuntu) → neutron (Ubuntu) |
description: | updated |
description: | updated |
Changed in neutron (Ubuntu Trusty): | |
importance: | Undecided → High |
Changed in neutron (Ubuntu Utopic): | |
importance: | Undecided → High |
Changed in neutron (Ubuntu Vivid): | |
importance: | Undecided → High |
Changed in neutron (Ubuntu): | |
status: | New → In Progress |
assignee: | nobody → Edward Hope-Morley (hopem) |
Edward Hope-Morley (hopem) wrote : | #2 |
So it does appear that this a result of the openvswitch-switch service taking a while to startup on boot/reboot. I have added some retry logic to nova-compute and it appears that openvswitch is taking around 80s to start:
neutron-ovs-cleanup start/pre-start, process 3055
neutron-ovs-cleanup start/pre-start, process 3055
Attempting to start neutron-ovs-cleanup
start: Job is already running: neutron-ovs-cleanup
Recheck neutron-ovs-cleanup status in 1s
neutron-ovs-cleanup start/pre-start, process 3055
Attempting to start neutron-ovs-cleanup
start: Job is already running: neutron-ovs-cleanup
Recheck neutron-ovs-cleanup status in 3s
neutron-ovs-cleanup start/pre-start, process 3055
Attempting to start neutron-ovs-cleanup
start: Job is already running: neutron-ovs-cleanup
Recheck neutron-ovs-cleanup status in 3s
neutron-ovs-cleanup start/pre-start, process 3055
Attempting to start neutron-ovs-cleanup
start: Job is already running: neutron-ovs-cleanup
Recheck neutron-ovs-cleanup status in 5s
neutron-ovs-cleanup start/pre-start, process 3055
Attempting to start neutron-ovs-cleanup
start: Job is already running: neutron-ovs-cleanup
Recheck neutron-ovs-cleanup status in 5s
neutron-ovs-cleanup start/pre-start, process 3055
Attempting to start neutron-ovs-cleanup
start: Job is already running: neutron-ovs-cleanup
Recheck neutron-ovs-cleanup status in 7s
neutron-ovs-cleanup start/pre-start, process 3055
Attempting to start neutron-ovs-cleanup
start: Job is already running: neutron-ovs-cleanup
Recheck neutron-ovs-cleanup status in 7s
neutron-ovs-cleanup start/pre-start, process 3055
Attempting to start neutron-ovs-cleanup
start: Job is already running: neutron-ovs-cleanup
Recheck neutron-ovs-cleanup status in 9s
neutron-ovs-cleanup start/pre-start, process 3055
Attempting to start neutron-ovs-cleanup
start: Job is already running: neutron-ovs-cleanup
Recheck neutron-ovs-cleanup status in 9s
neutron-ovs-cleanup start/pre-start, process 3055
Attempting to start neutron-ovs-cleanup
start: Job is already running: neutron-ovs-cleanup
Recheck neutron-ovs-cleanup status in 11s
neutron-ovs-cleanup start/pre-start, process 3055
Attempting to start neutron-ovs-cleanup
start: Job is already running: neutron-ovs-cleanup
Recheck neutron-ovs-cleanup status in 11s
neutron-ovs-cleanup start/pre-start, process 3055
Attempting to start neutron-ovs-cleanup
start: Job is already running: neutron-ovs-cleanup
Recheck neutron-ovs-cleanup status in 13s
neutron-ovs-cleanup start/running
2015-07-03 17:51:40.660 15690 DEBUG nova.servicegro
2015-07-03 17:51:43.053 15690 INFO nova.openstack.
2015-07-03 17:51:44.320 15690 DEBUG stevedore.extension [-] found extension EntryPoint.
2015-07-03 17:51:44.602 15690 DEBUG stevedore.extension [-] found extension EntryPoint.
Edward Hope-Morley (hopem) wrote : | #4 |
Edward Hope-Morley (hopem) wrote : | #5 |
description: | updated |
Changed in nova (Ubuntu Trusty): | |
status: | New → In Progress |
Changed in nova (Ubuntu Utopic): | |
status: | New → In Progress |
Changed in nova (Ubuntu Vivid): | |
status: | New → In Progress |
Changed in nova (Ubuntu Trusty): | |
assignee: | nobody → Edward Hope-Morley (hopem) |
Changed in nova (Ubuntu Utopic): | |
assignee: | nobody → Edward Hope-Morley (hopem) |
Changed in nova (Ubuntu Vivid): | |
assignee: | nobody → Edward Hope-Morley (hopem) |
description: | updated |
description: | updated |
The attachment "trusty.debdiff" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.
[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]
tags: | added: patch |
Edward Hope-Morley (hopem) wrote : | #7 |
To clarify, the reason I am not using wait-for-state (http://
Edward Hope-Morley (hopem) wrote : | #8 |
Launchpad Janitor (janitor) wrote : | #9 |
This bug was fixed in the package nova - 2:12.0.
---------------
nova (2:12.0.
* d/nova-
neutron-
-- Edward Hope-Morley <email address hidden> Mon, 06 Jul 2015 11:53:51 +0100
Changed in nova (Ubuntu): | |
status: | In Progress → Fix Released |
Edward Hope-Morley (hopem) wrote : | #10 |
Hello Edward, or anyone else affected,
Accepted nova into trusty-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
Changed in nova (Ubuntu Trusty): | |
status: | In Progress → Fix Committed |
tags: | added: verification-needed |
Changed in nova (Ubuntu Utopic): | |
status: | In Progress → Fix Committed |
Chris J Arges (arges) wrote : | #12 |
Hello Edward, or anyone else affected,
Accepted nova into utopic-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
Chris J Arges (arges) wrote : | #13 |
Hello Edward, or anyone else affected,
Accepted nova into vivid-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
Changed in nova (Ubuntu Vivid): | |
status: | In Progress → Fix Committed |
Edward Hope-Morley (hopem) wrote : | #14 |
T/U/V verified *-proposed builds. Deployed Openstack I/J/K with these packages and all seems good.
tags: |
added: verification-done removed: verification-needed |
Launchpad Janitor (janitor) wrote : | #15 |
This bug was fixed in the package nova - 1:2014.
---------------
nova (1:2014.
[ Edward Hope-Morley ]
- d/nova-
and neutron-ovs-cleanup (LP: #1471022)
-- Edward Hope-Morley <email address hidden> Wed, 08 Jul 2015 09:44:18 -0500
Changed in nova (Ubuntu Trusty): | |
status: | Fix Committed → Fix Released |
Chris J Arges (arges) wrote : Update Released | #16 |
The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
Launchpad Janitor (janitor) wrote : | #17 |
This bug was fixed in the package nova - 1:2014.
---------------
nova (1:2014.
* d/nova-
and neutron-ovs-cleanup (LP: #1471022).
-- Edward Hope-Morley <email address hidden> Mon, 06 Jul 2015 12:06:00 +0100
Changed in nova (Ubuntu Utopic): | |
status: | Fix Committed → Fix Released |
Launchpad Janitor (janitor) wrote : | #18 |
This bug was fixed in the package nova - 1:2015.
---------------
nova (1:2015.
* d/nova-
and neutron-ovs-cleanup (LP: #1471022).
-- Edward Hope-Morley <email address hidden> Fri, 03 Jul 2015 18:57:59 +0100
Changed in nova (Ubuntu Vivid): | |
status: | Fix Committed → Fix Released |
So, this is possibly a result of neutron-ovs-cleaunp failing to start at the time nova-compute does the wait-for-state (and implicitly tries to start neutron- ovs-cleanup) due to the fact that openvswitch is not ready to start at that very moment. I am going to attempt to resolve this by making the nova-compute wait-for-state logic more accommodating of the fact that neutron-ovs-cleanup may not be ready to start at the time of the check.