test_reassign_port_between_servers failing with tap device is busy errors in neutron xenial jobs since 7/28
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
OpenStack Compute (nova) | | High | Matt Riedemann | |
Newton | | Medium | Tony Breeds | |
Ocata | | Medium | Matt Riedemann | |
Bug Description
Recent failures showing up in the tempest tests for 'test_reassign_
From n-cpu.log (http://
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
2016-07-29 07:25:47.948 20203 ERROR nova.virt.
Stuart McLaren (stuart-mclaren) wrote : | #1 |
Matt Riedemann (mriedem) wrote : Re: interface attach tests failing with tap device is busy errors in neutron xenial jobs since 7/28 | #2 |
logstash shows it starting on 7/28: http://
summary: |
- failure attaching interface
+ interface attach tests failing with tap device is busy errors in neutron
+ xenial jobs since 7/28
Changed in nova: | |
status: | New → Confirmed |
importance: | Undecided → High |
Matt Riedemann (mriedem) wrote : | #3 |
This change switched the neutron gate jobs over to xenial:
Matt Riedemann (mriedem) wrote : | #4 |
Looks like it might just be failing in this test:
tempest.
Matt Riedemann (mriedem) wrote : | #5 |
Looks like possibly an issue with glean?
attach device xml:
<interface type="bridge">
<mac address=
<model type="virtio"/>
<driver name="qemu"/>
<source bridge=
<target dev="tap953222b
</interface>
Jul 29 12:59:21 ubuntu-
Jul 29 12:59:21 ubuntu-
Jul 29 12:59:21 ubuntu-
Jul 29 12:59:21 ubuntu-
Jul 29 12:59:21 ubuntu-
Jul 29 12:59:21 ubuntu-
Jul 29 12:59:21 ubuntu-
Jul 29 12:59:21 ubuntu-
Jul 29 12:59:21 ubuntu-
Jul 29 12:59:21 ubuntu-
summary: |
- interface attach tests failing with tap device is busy errors in neutron
- xenial jobs since 7/28
+ test_reassign_port_between_servers failing with tap device is busy
+ errors in neutron xenial jobs since 7/28
Matt Riedemann (mriedem) wrote : | #6 |
The device detach from the guest in libvirt is asynchronous, so this must be much slower in libvirt 1.3.1 on xenial nodes. As a result, neutron tells us that the port is detached (device_id is None on the port), which is what the tempest test is polling on, before the interface is actually detached from the guest.
So we probably need a retry loop in detach_interface in the libvirt driver (like we have for detach_volume) to retry until timeout for the interface to be gone from the guest and consider the detach successful.
Matt Riedemann (mriedem) wrote : | #7 |
However, detach_interface is a cast from compute API to the compute manager, so the Tempest test doesn't really have a way to poll that the interface is actually detached from the guest (beyond doing something like ssh'ing into the guest to verify the interface with the given mac is gone).
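To make the limitation concrete, here is a minimal sketch of the kind of polling a client can do against the port. It is not the Tempest implementation: `wait_for_port_unbound` and the `show_port` callable are hypothetical names, and the port is assumed to be a dict with a `device_id` key. Note that it only observes what neutron reports, not whether the interface is actually gone from the guest.

```python
import time


def wait_for_port_unbound(show_port, port_id, timeout=60.0, interval=1.0,
                          sleep=time.sleep):
    """Poll a port until its device_id is empty, or raise TimeoutError.

    show_port is a hypothetical callable returning the port as a dict;
    it stands in for whatever client call fetches the port from neutron.
    """
    deadline = time.monotonic() + timeout
    while True:
        port = show_port(port_id)
        # An empty string or None both mean "no device bound".
        if not port.get("device_id"):
            return port
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "port %s still bound to %s" % (port_id, port["device_id"]))
        sleep(interval)
```

A caller that returns as soon as this succeeds can still race with the guest if the port is unbound before the asynchronous detach has finished.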
Matt Riedemann (mriedem) wrote : | #8 |
Never mind, it's the compute manager in nova that's telling neutron that the port is no longer bound:
We first call the virt driver to detach the interface (which is async) and then update the port telling neutron that the device_id is '', which is what tempest is waiting for.
So if we add a poll / retry in the libvirt guest module for the detach, we can delay the port update which tempest is waiting for and we should be good.
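The ordering described above could be sketched roughly as follows. This is an illustration only, not the nova code: `request_detach`, `is_attached` and `unbind_port` are hypothetical callables standing in for the virt driver and neutron calls, and the retry count and interval are arbitrary.

```python
import time


class DeviceStillAttached(Exception):
    """The device was still on the guest after the last retry."""


def detach_with_retry(request_detach, is_attached, max_attempts=5,
                      interval=0.5, sleep=time.sleep):
    """Ask for a detach, then check whether the device is gone; retry if not.

    Returns the number of attempts it took. Both callables are
    hypothetical stand-ins for the guest operations.
    """
    for attempt in range(1, max_attempts + 1):
        request_detach()          # asynchronous: returns before the guest acts
        sleep(interval)
        if not is_attached():
            return attempt
    raise DeviceStillAttached(
        "device still attached after %d attempts" % max_attempts)


def detach_interface(request_detach, is_attached, unbind_port, **retry_kwargs):
    """Unbind the port only after the guest has released the device."""
    attempts = detach_with_retry(request_detach, is_attached, **retry_kwargs)
    unbind_port()                 # this is the update a poller would wait on
    return attempts
```

Because the port update happens after the wait, anything polling the port's device_id sees it cleared only once the interface is gone from the guest. If the retries run out, the exception propagates and the port is left bound.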
Changed in nova: | |
status: | Confirmed → Triaged |
assignee: | nobody → Matt Riedemann (mriedem) |
Matt Riedemann (mriedem) wrote : | #9 |
Skipping the test in tempest for now: https:/
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
status: | Triaged → In Progress |
Sean Dague (sdague) wrote : | #11 |
Not seen in the gate any more; the fixing patch is in merge conflict and really old.
Changed in nova: | |
status: | In Progress → Invalid |
assignee: | Matt Riedemann (mriedem) → nobody |
Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https:/
Reason: This review is > 6 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
Changed in nova: | |
assignee: | nobody → Matt Riedemann (mriedem) |
status: | Invalid → In Progress |
Related fix proposed to branch: master
Review: https:/
Matt Riedemann (mriedem) wrote : | #14 |
It's going to be hard getting this back to Newton because it depends on:
https:/
Which isn't in Newton. It's in stable/ocata though.
As noted in that patch, its tests are broken too and being fixed here:
tags: | added: libvirt neutron |
Matt Riedemann (mriedem) wrote : | #15 |
So to get this to Newton, we'd need to backport:
https:/
And then:
https:/
And then:
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit 1ecd71b08d14450
Author: Matt Riedemann <email address hidden>
Date: Thu Feb 9 18:41:11 2017 -0500
libvirt: fix and break up _test_attach_
The detach_interface flow in this test was broken because
it wasn't mocking out domain.
it was expecting to be passed to that method wasn't actually
being verified. The same thing is broken in test
test_
it copies the other broken test code.
This change breaks apart the monster attach/detach test method
and converts the detach_interface portion to mock and fixes
the broken assertion.
test_
fixed, not converted to mock.
Change-Id: I6d9a975876c565
Related-Bug: #1607714
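For readers unfamiliar with the pattern the commit message refers to, the following is a generic sketch of mocking a collaborator and verifying the argument passed to it, using Python's standard `unittest.mock`. It does not reproduce the nova test: `detach_device` and the `domain.detach_device` method are hypothetical names chosen for illustration.

```python
from unittest import mock


def detach_device(domain, device_xml, flags=0):
    """Hypothetical code under test: forwards the XML to the domain."""
    domain.detach_device(device_xml, flags=flags)


def check_detach_passes_expected_xml():
    """Mock the domain and assert on the exact arguments it received."""
    domain = mock.Mock()
    expected_xml = '<interface type="bridge"/>'

    detach_device(domain, expected_xml, flags=1)

    # Without this assertion the test would pass even if the wrong
    # XML (or nothing at all) had been handed to the domain.
    domain.detach_device.assert_called_once_with(expected_xml, flags=1)
    return domain
```

The point of the final assertion is that simply replacing the method with a mock is not enough; the expected argument has to be checked explicitly.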
Related fix proposed to branch: stable/ocata
Review: https:/
Fix proposed to branch: stable/ocata
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit a3b3e8d8314b0ce
Author: Matt Riedemann <email address hidden>
Date: Thu Feb 9 15:54:41 2017 -0500
libvirt: wait for interface detach from the guest
The test_reassign_
a port in neutron and two servers. It attaches the port to the
first server and then quickly detaches it and waits for the
port.device_id to be unbound from the server. Then it repeats
that for the second server.
The interface detach from the guest is asynchronous and happens
before nova unbinds the port, so there is a race where the port's
device_id is unset but the interface is still on the first guest
when we try to attach to the second guest, which fails.
This is a latent bug, but apparently has been tickled by the
move to our neutron CI jobs to use ubuntu xenial nodes.
The fix is to add a detach and retry loop on the interface detach
on the guest so that we can wait until the interface is gone
from the guest before nova unbinds the port in neutron, which is
what the Tempest test is waiting for. Then the device should be
available for attaching to the second guest.
This is similar to what we do with detaching volumes.
Closes-Bug: #1607714
Change-Id: Ic04aad8923ea2e
Changed in nova: | |
status: | In Progress → Fix Released |
Related fix proposed to branch: stable/newton
Review: https:/
Fix proposed to branch: stable/newton
Review: https:/
This issue was fixed in the openstack/nova 16.0.0.0b2 development milestone.
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: stable/ocata
commit c0820944ea8554f
Author: Matt Riedemann <email address hidden>
Date: Thu Feb 9 18:41:11 2017 -0500
libvirt: fix and break up _test_attach_
The detach_interface flow in this test was broken because
it wasn't mocking out domain.
it was expecting to be passed to that method wasn't actually
being verified. The same thing is broken in test
test_
it copies the other broken test code.
This change breaks apart the monster attach/detach test method
and converts the detach_interface portion to mock and fixes
the broken assertion.
test_
fixed, not converted to mock.
Change-Id: I6d9a975876c565
Related-Bug: #1607714
(cherry picked from commit 1ecd71b08d14450
tags: | added: in-stable-ocata |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: stable/ocata
commit 02ad4f862a7c5b5
Author: Matt Riedemann <email address hidden>
Date: Thu Feb 9 15:54:41 2017 -0500
libvirt: wait for interface detach from the guest
The test_reassign_
a port in neutron and two servers. It attaches the port to the
first server and then quickly detaches it and waits for the
port.device_id to be unbound from the server. Then it repeats
that for the second server.
The interface detach from the guest is asynchronous and happens
before nova unbinds the port, so there is a race where the port's
device_id is unset but the interface is still on the first guest
when we try to attach to the second guest, which fails.
This is a latent bug, but apparently has been tickled by the
move to our neutron CI jobs to use ubuntu xenial nodes.
The fix is to add a detach and retry loop on the interface detach
on the guest so that we can wait until the interface is gone
from the guest before nova unbinds the port in neutron, which is
what the Tempest test is waiting for. Then the device should be
available for attaching to the second guest.
This is similar to what we do with detaching volumes.
Closes-Bug: #1607714
Change-Id: Ic04aad8923ea2e
(cherry picked from commit a3b3e8d8314b0ce
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: stable/newton
commit 70c44831a5cdf60
Author: Matt Riedemann <email address hidden>
Date: Thu Feb 9 18:41:11 2017 -0500
libvirt: fix and break up _test_attach_
The detach_interface flow in this test was broken because
it wasn't mocking out domain.
it was expecting to be passed to that method wasn't actually
being verified. The same thing is broken in test
test_
it copies the other broken test code.
This change breaks apart the monster attach/detach test method
and converts the detach_interface portion to mock and fixes
the broken assertion.
test_
fixed, not converted to mock.
Conflicts:
NOTE(mriedem): The conflict is due to change
I5c461a8242
Change-Id: I6d9a975876c565
Related-Bug: #1607714
(cherry picked from commit 1ecd71b08d14450
(cherry picked from commit c0820944ea8554f
tags: | added: in-stable-newton |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: stable/newton
commit 1e66b034eb2c717
Author: Matt Riedemann <email address hidden>
Date: Thu Feb 9 15:54:41 2017 -0500
libvirt: wait for interface detach from the guest
The test_reassign_
a port in neutron and two servers. It attaches the port to the
first server and then quickly detaches it and waits for the
port.device_id to be unbound from the server. Then it repeats
that for the second server.
The interface detach from the guest is asynchronous and happens
before nova unbinds the port, so there is a race where the port's
device_id is unset but the interface is still on the first guest
when we try to attach to the second guest, which fails.
This is a latent bug, but apparently has been tickled by the
move to our neutron CI jobs to use ubuntu xenial nodes.
The fix is to add a detach and retry loop on the interface detach
on the guest so that we can wait until the interface is gone
from the guest before nova unbinds the port in neutron, which is
what the Tempest test is waiting for. Then the device should be
available for attaching to the second guest.
This is similar to what we do with detaching volumes.
Closes-Bug: #1607714
Conflicts:
NOTE(mriedem): The conflict is due to change
I5c461a8242
Change-Id: Ic04aad8923ea2e
(cherry picked from commit a3b3e8d8314b0ce
(cherry picked from commit ca0a46e36615f22
This issue was fixed in the openstack/nova 15.0.7 release.
This issue was fixed in the openstack/nova 14.0.8 release.
Another instance:
http://logs.openstack.org/52/347352/8/check/gate-tempest-dsvm-neutron-full-ubuntu-xenial/1f82cb6/console.html