Comment 2 for bug 1424096

Stephen Ma (stephen-ma) wrote :

Explanation of why this problem is happening.

In this case, the VM is created by a non-admin tenant and uses a shared network created by the admin tenant. The subnet's interface is attached to an admin-created router, so the qr port device is also owned by the admin. When a tenant creates a VM on the shared network, the tenant owns the VM's port. As a result, the VM's port and the qr port do not have the same tenant ID.

If the VM is created by the admin, the qrouter namespace on the compute node is removed when the VM is removed. However, when the VM is created by a non-admin user, the qrouter namespace stays on the compute node. This shows that the Neutron API server handles the VM port deletion in the context of the VM's owner, not the admin.
The decision to delete a namespace is made in dvr_deletens_if_no_port, which makes 5 database queries to reach that decision. The first query is a call to get_dvr_routers_by_portid to retrieve the IDs of the routers affected by the VM port removal. To do this, it has to find the ports on the subnet whose device owner is 'network:router_interface_distributed'. In this case, the owner of the router-interface port is the admin. Because the context is that of the VM owner only, no such port is found and the router list comes back empty, so no router is removed from any node. This is why the admin context is needed: without it, the query does not reflect the true situation.
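
The effect of the tenant-scoped context on that first query can be illustrated with a small stand-alone sketch. This is not Neutron code: the Context and Port classes and the query_ports and router_ids_for_subnet helpers are hypothetical stand-ins that only mimic how a non-admin context hides rows owned by other tenants, and the 'compute:nova' device owner is just an illustrative value for the VM port.

```python
from dataclasses import dataclass

ROUTER_INTF = "network:router_interface_distributed"


@dataclass(frozen=True)
class Context:
    """Hypothetical stand-in for a request context."""
    tenant_id: str
    is_admin: bool = False


@dataclass(frozen=True)
class Port:
    """Hypothetical, simplified port record."""
    id: str
    tenant_id: str
    subnet_id: str
    device_owner: str
    device_id: str


def query_ports(context, ports, **filters):
    """Toy tenant-scoped query: non-admin contexts only see their own ports."""
    visible = [p for p in ports
               if context.is_admin or p.tenant_id == context.tenant_id]
    return [p for p in visible
            if all(getattr(p, key) == value for key, value in filters.items())]


def router_ids_for_subnet(context, ports, subnet_id):
    """Collect router IDs from the router-interface ports on a subnet."""
    matches = query_ports(context, ports,
                          subnet_id=subnet_id, device_owner=ROUTER_INTF)
    return {p.device_id for p in matches}


PORTS = [
    # Router interface port, owned by the admin tenant.
    Port("qr-port", "admin", "shared-subnet", ROUTER_INTF, "router-1"),
    # VM port, owned by the non-admin tenant that booted the VM.
    Port("vm-port", "tenant-1", "shared-subnet", "compute:nova", "vm-1"),
]

tenant_view = router_ids_for_subnet(Context("tenant-1"), PORTS, "shared-subnet")
admin_view = router_ids_for_subnet(Context("admin", is_admin=True),
                                   PORTS, "shared-subnet")

print(tenant_view)  # set(): the admin-owned qr port is invisible to the tenant
print(admin_view)   # {'router-1'}
```

With the tenant's context the router list is empty, which matches the behaviour described above; only the admin view finds the router whose namespace should be considered for cleanup.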

The admin context is also needed for the other queries made by dvr_deletens_if_no_port. To determine whether a namespace on a compute node needs to be deleted, it has to find out whether other ports on the compute node are using the same network and subnet. Because the network is shared, other tenants may also have VMs using the same network on that compute node. Without the admin context, the queries return only the ports owned by the requesting tenant. Since that tenant has already deleted its port, no ports are found and the namespace is judged safe to remove, even though other tenants' VMs may still depend on it. For this reason, the following test fails if the other database queries in dvr_deletens_if_no_port do not use the admin context as well:

On a cloud setup with only 1 compute node, where dvr_deletens_if_no_port calls get_dvr_routers_by_portid with the admin context but makes the other queries without it:

  0. Create the shared network, subnet, and router as described in the description.

  1. As tenant 1, create a VM using the shared network. When the VM boots up, assign a floating IP to the VM.
  2. As tenant 2, repeat (1).
  3. As tenant 2, ping tenant 2's VM using its floating IP. The ping should work. Continue to ping.
  4. As tenant 1, delete the VM.
  5. Now the pings to tenant 2's VM fail.

The reason for the ping failure after step 4 is that the router namespace on the compute node was deleted as a result of deleting tenant 1's VM, for the reason described above.
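
The failure in this test can be modelled with a rough stand-alone sketch of the "are there other ports left on this host?" check. Again, this is not the Neutron implementation: Context, Port, query_ports and namespace_can_be_deleted are hypothetical names, and the single check stands in for the remaining queries that dvr_deletens_if_no_port makes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    """Hypothetical stand-in for a request context."""
    tenant_id: str
    is_admin: bool = False


@dataclass(frozen=True)
class Port:
    """Hypothetical, simplified VM port record."""
    id: str
    tenant_id: str
    subnet_id: str
    host: str


def query_ports(context, ports, **filters):
    """Toy tenant-scoped query: non-admin contexts only see their own ports."""
    visible = [p for p in ports
               if context.is_admin or p.tenant_id == context.tenant_id]
    return [p for p in visible
            if all(getattr(p, key) == value for key, value in filters.items())]


def namespace_can_be_deleted(context, ports, subnet_id, host):
    """Delete the namespace only if no port on the host still uses the subnet."""
    remaining = query_ports(context, ports, subnet_id=subnet_id, host=host)
    return not remaining


# State after step 4: tenant 1's port is gone, tenant 2's VM is still running.
ports_after_delete = [
    Port("vm2-port", "tenant-2", "shared-subnet", "compute-1"),
]

tenant_decision = namespace_can_be_deleted(
    Context("tenant-1"), ports_after_delete, "shared-subnet", "compute-1")
admin_decision = namespace_can_be_deleted(
    Context("admin", is_admin=True), ports_after_delete,
    "shared-subnet", "compute-1")

print(tenant_decision)  # True: tenant 1 cannot see tenant 2's port
print(admin_decision)   # False: the namespace is still in use
```

In this model the tenant-scoped check wrongly concludes that the namespace is unused, which corresponds to the ping failure in step 5, while the admin-scoped check keeps the namespace in place.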