DVR routers attached to shared networks aren't being unscheduled from a compute node after deleting the VMs using the shared net
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Undecided | Oleg Bondarev |
Juno | | Undecided | Unassigned |
Kilo | | Undecided | Unassigned |
Bug Description
As the administrator, create a DVR router and attach it to a shared network that the administrator also created.
As a non-admin tenant, boot a VM with a port on that shared network; it is the only VM using the shared network and is scheduled to a compute node. When the VM is deleted, the qrouter namespace of the DVR router is expected to be removed from the compute node, but it is not. The problem does not occur with routers attached to non-shared networks.
The environment consists of 1 controller node and 1 compute node.
The routers exhibiting the problem are created by the administrator and attached to shared networks that are also owned by the administrator:
As the administrator, do the following commands on a setup having 1 compute node and 1 controller node:
1. neutron net-create shared-net -- --shared True
Shared net's uuid is f9ccf1f9-
2. neutron subnet-create --name shared-subnet shared-net 10.0.0.0/16
3. neutron router-create shared-router
Router's UUID is ab78428a-
4. neutron router-interface-add shared-router shared-subnet
5. neutron router-gateway-set shared-router public
As a non-admin tenant (tenant-id: 95cd5d9c61cf45c...), do the following:
1. neutron net-show shared-net
+-----------------+--------------------+
| Field           | Value              |
+-----------------+--------------------+
| admin_state_up  | True               |
| id              | f9ccf1f9-...       |
| name            | shared-net         |
| router:external | False              |
| shared          | True               |
| status          | ACTIVE             |
| subnets         | c4fd4279-...       |
| tenant_id       | 2a54d6758fab47f... |
+-----------------+--------------------+
At this point, there are no VMs using the shared-net network running in the environment.
2. Boot a VM that uses the shared-net network: nova boot ... --nic net-id=<shared-net id> vm_sharednet
3. Assign a floating IP to the VM "vm_sharednet"
4. Delete "vm_sharednet". On the compute node, the qrouter namespace of the shared router (qrouter-
stack@DVR-
qrouter-
...
This is consistent with the output of "neutron l3-agent-list-hosting-router": the router is still scheduled to the l3 agent on the compute node.
$ neutron l3-agent-list-hosting-router <router id>
+--------------+------+----------------+-------+
| id           | host | admin_state_up | alive |
+--------------+------+----------------+-------+
| 42f12eb0-... | ...  | ...            | ...   |
| ff869dc5-... | ...  | ...            | ...   |
+--------------+------+----------------+-------+
Running the "neutron l3-agent-
$ neutron l3-agent-
Removed router ab78428a-
stack@DVR-
stack@DVR-
This is a workaround to get the qrouter namespace deleted from the compute node; the L3 agent scheduler should have removed the router from the compute node when the VM was deleted.
Changed in neutron: | |
assignee: | nobody → Stephen Ma (stephen-ma) |
Changed in neutron: | |
status: | New → In Progress |
tags: | added: l3-dvr-backlog |
Stephen Ma (stephen-ma) wrote: #2
Explanation of why this problem is happening:
In this case, the VM is created by a non-admin tenant, but it uses a shared network created by the admin tenant. The subnet's interface is attached to an admin-created router, so the qr- port is also owned by the admin. When a tenant creates a VM using the shared network, the tenant owns the VM's port. The VM's port and the qr- ports therefore have different tenant ids.
If the VM is created by the admin, the qrouter namespace on the compute node is removed when the VM is removed. However, when the VM is created by a non-admin user, the qrouter namespace stays on the compute node. This shows that the neutron API server runs as the owner of the VM, not as the admin, during the VM port deletion.
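A minimal sketch of the visibility problem (illustrative Python, not the actual neutron code; the context class and port list are made up for the example):

    # Illustrative only: a tenant-scoped query filters ports by tenant_id,
    # so the admin-owned qr- port of the shared network is invisible to
    # the non-admin tenant's context.
    def visible_ports(context, ports):
        if context.is_admin:
            return ports
        return [p for p in ports if p['tenant_id'] == context.tenant_id]

    class Ctx:
        def __init__(self, tenant_id, is_admin=False):
            self.tenant_id, self.is_admin = tenant_id, is_admin

    ports = [
        {'device_owner': 'compute:nova', 'tenant_id': 'tenant-1'},  # VM port
        {'device_owner': 'network:router_interface_distributed',
         'tenant_id': 'admin'},                                     # qr- port
    ]
    print(visible_ports(Ctx('tenant-1'), ports))              # VM port only
    print(visible_ports(Ctx('admin', is_admin=True), ports))  # both ports

With the tenant's context the admin-owned qr- port is filtered out, so the scheduler never associates the deleted VM port with the admin's router.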
The decision to delete a namespace is made in dvr_deletens_if_no_port() (neutron/db/l3_dvrscheduler_db.py). Its database queries run with the caller's context, so when a non-admin tenant deletes the last VM port, the admin-owned router ports are not returned and the router is never unscheduled.
The admin context is also needed for the other queries made by dvr_deletens_if_no_port().
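The fix follows this pattern (a hedged sketch, not the literal diff; Context.elevated() and get_ports() are real neutron APIs, while the function name and filter below are illustrative):

    # Hedged sketch: before deciding whether a router can be unscheduled,
    # run the port queries under an elevated (admin) context so that ports
    # owned by the admin and by other tenants are all visible.
    def dvr_serviceable_ports_on_host(context, core_plugin, host):
        admin_ctx = context.elevated()  # returns an admin-scoped copy
        return core_plugin.get_ports(
            admin_ctx, filters={'binding:host_id': [host]})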
On a cloud setup with only 1 compute node, given that dvr_deletens_if_no_port() queries with the caller's context, it also cannot see other tenants' ports on the shared network. That leads to a worse failure, reproduced as follows:
0. Create the shared network subnet, and router as described in the description.
1. As tenant 1, create a VM using the shared network. When the VM boots up assign a floating IP to the VM
2. As tenant 2, repeat (1).
3. As tenant 2, ping tenant 2's VM via its floating IP. The ping should work; keep it running.
4. As tenant 1, delete the VM.
5. Now the pings to tenant 2's VM fail.
The reason for the ping failure after step 4 is that the router namespace on the compute node was deleted as a result of deleting tenant 1's VM, for the reason described above.
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit edbade486102a21
Author: Stephen Ma <email address hidden>
Date: Tue Feb 24 23:31:33 2015 +0000
Router is not unscheduled when the last port is deleted
When checking for ports that are still in use on a DVR router,
the L3 agent scheduler makes the assumption that a port's
network must be owned by the same tenant. This isn't always
true as the admin could have created a shared network that
other tenants may use. The result of this assumption is that
the router associated with the shared network may not be
unscheduled from a VM host when the last VM (created by a
non-admin tenant) using the shared network is deleted from
the compute node.
The owner of a VM may not own all the ports of a shared
network. Other tenants may have VMs using the same shared
network running on the same compute node. Also the VM owner
may not own the router ports. The check of whether a
router can be unscheduled from a node has to be run with
the admin context so that all the ports associated with the
router are returned from the database queries.
This patch fixes this problem by using the admin context to
make the queries needed for the DVR scheduler to make the
correct unschedule decision.
Change-Id: I45477713d7ce16
Closes-bug: #1424096
Changed in neutron: | |
status: | In Progress → Fix Committed |
tags: | added: juno-backport-potential |
Fix proposed to branch: stable/juno
Review: https:/
Fix proposed to branch: stable/kilo
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: stable/juno
commit 036133f932dd86f
Author: Stephen Ma <email address hidden>
Date: Tue Feb 24 23:31:33 2015 +0000
Router is not unscheduled when the last port is deleted
(commit message identical to the master commit above)
(cherry picked from commit edbade486102a21
Conflicts:
Closes-bug: #1424096
Change-Id: I45477713d7ce16
tags: | added: in-stable-juno |
Fix proposed to branch: neutron-pecan
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: stable/kilo
commit 1813da49aded224
Author: Stephen Ma <email address hidden>
Date: Tue Feb 24 23:31:33 2015 +0000
Router is not unscheduled when the last port is deleted
(commit message identical to the master commit above)
Change-Id: I45477713d7ce16
Closes-bug: #1424096
(cherry picked from commit edbade486102a21
tags: | added: in-stable-kilo |
Changed in neutron: | |
milestone: | none → liberty-1 |
status: | Fix Committed → Fix Released |
Changed in neutron: | |
milestone: | liberty-1 → 7.0.0 |
Oleg Bondarev (obondarev) wrote: #9
I faced the bug while reworking unit tests into functional tests: when performing the steps described in the description I get:
2015-12-15 17:41:23,484 ERROR [neutron.
Traceback (most recent call last):
File "neutron/
File "neutron/
context, router['agent_id'], router[
File "neutron/
router = self.get_
File "neutron/
router = self._get_
File "neutron/
raise l3.RouterNotFound(router_id=router_id)
RouterNotFound: Router 7d52836b-... could not be found
and the router is not removed from the host, which has no more DVR-serviceable ports.
Looks like we also need the admin context in order to remove an admin-owned router from a host when a non-admin tenant removes the last DVR-serviceable port on a shared network.
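A hedged sketch of what that follow-up fix amounts to (the wrapper function is illustrative; remove_router_from_l3_agent() and Context.elevated() are real neutron APIs):

    # Hedged sketch: the unscheduling call itself must also run under an
    # elevated context, otherwise the router lookup inside it raises
    # RouterNotFound for a router the tenant's context cannot see.
    def unschedule_router_from_host(context, l3_plugin, agent_id, router_id):
        admin_ctx = context.elevated()
        l3_plugin.remove_router_from_l3_agent(admin_ctx, agent_id, router_id)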
Changed in neutron: | |
status: | Fix Released → Confirmed |
Fix proposed to branch: master
Review: https:/
Changed in neutron: | |
assignee: | Stephen Ma (stephen-ma) → Oleg Bondarev (obondarev) |
status: | Confirmed → In Progress |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit 96ba199d733944e
Author: Oleg Bondarev <email address hidden>
Date: Tue Dec 15 17:58:51 2015 +0300
Use admin context when removing DVR router on vm port deletion
In case a non-admin tenant removes the last VM on a shared network (owned
by the admin) connected to a DVR router (also owned by the admin), we need
to remove the router from the host where there are no more DVR-serviceable
ports. Commit edbade486102a21
fixed the logic that determines the routers that should be removed from a
host. However, in order to actually remove the router we also need the
admin context.
This was not caught by the unit tests; one reason for that is the so-called
'mock everything' approach, which is evil and generally useless.
This patch replaces the unit tests with functional tests that were able
to catch the bug.
Closes-Bug: #1424096
Change-Id: Ia6cdf2294562c2
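As an aside, a toy illustration of the 'mock everything' pitfall mentioned above (purely illustrative, not neutron code): when a test stubs out the database query itself, the context passed to it is never exercised, so a tenant-scoped context that should have been elevated goes unnoticed.

    import unittest
    from unittest import mock

    # The unit under test simply forwards the caller's context to a query.
    def ports_left_on_host(context, db_query):
        return db_query(context)

    class MockEverythingTest(unittest.TestCase):
        def test_passes_even_with_wrong_context(self):
            # The stub returns [] no matter what context it receives, so
            # the test passes whether or not the context was elevated.
            db_query = mock.Mock(return_value=[])
            self.assertEqual([], ports_left_on_host('tenant-ctx', db_query))

    if __name__ == '__main__':
        unittest.main()

A functional test, by contrast, runs the real query against a real database, so a tenant-scoped context returns fewer rows and the wrong unscheduling decision becomes visible.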
Changed in neutron: | |
status: | In Progress → Fix Released |
This issue was fixed in the openstack/neutron 8.0.0.0b2 development milestone.
tags: | removed: juno-backport-potential |
Fix proposed to branch: stable/liberty
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: stable/liberty
commit 69a384a9af4f0fe
Author: Oleg Bondarev <email address hidden>
Date: Tue Dec 15 17:58:51 2015 +0300
Use admin context when removing DVR router on vm port deletion
(commit message identical to the master commit above)
Closes-Bug: #1424096
Change-Id: Ia6cdf2294562c2
(cherry picked from commit 96ba199d733944e
tags: | added: in-stable-liberty |
This issue was fixed in the openstack/neutron 7.1.0 release.
Fix proposed to branch: master
Review: https://review.openstack.org/159296