Cross-tenant network pollution after upgrade to Havana RC2
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Expired | Undecided | Unassigned |
Bug Description
We've been running Havana RC1 without problems since last week on the internal OpenStack deployment we use for QA'ing OpenStack on Ubuntu; before that it was running b3. This morning I bumped all of the packages to RC2 as they became available (including neutron and nova) and promptly saw a whole raft of tenant network access issues, which I think may share the same underlying cause.
We run the Neutron Open vSwitch plugin with GRE overlay networks.
We run multiple tenants with the same IP address ranges, and access their servers via assigned floating IPs. I noticed that I kept getting bumped from my access server, so I dug in a bit further in the L3 router namespace on the gateway node: the server's ARP entry kept switching to the MAC address of a port assigned to another tenant's instance, indicating some sort of cross-tenant L2 network pollution.
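One way to confirm this kind of flapping is to sample the router namespace's neighbour table repeatedly (for example with `ip netns exec qrouter-<router-id> ip neigh show`) and flag any IP whose MAC changes between samples. The following is a minimal sketch, assuming the samples have already been captured as text; the function names, interface name, and addresses are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_neigh(output: str) -> Dict[str, str]:
    """Map IP -> MAC from `ip neigh show` style lines.

    Lines without an 'lladdr' field (e.g. FAILED or INCOMPLETE entries)
    are skipped.
    """
    table: Dict[str, str] = {}
    for line in output.splitlines():
        fields = line.split()
        if "lladdr" in fields:
            idx = fields.index("lladdr")
            if idx + 1 < len(fields):
                table[fields[0]] = fields[idx + 1].lower()
    return table


def find_flapping(samples: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (ip, old_mac, new_mac) for every MAC change seen across samples."""
    last_seen: Dict[str, str] = {}
    changes: List[Tuple[str, str, str]] = []
    for sample in samples:
        for ip, mac in parse_neigh(sample).items():
            previous = last_seen.get(ip)
            if previous is not None and previous != mac:
                changes.append((ip, previous, mac))
            last_seen[ip] = mac
    return changes


if __name__ == "__main__":
    # Hypothetical captures taken a few seconds apart.
    first = "10.0.0.5 dev qr-1a2b3c4d-5e lladdr fa:16:3e:aa:aa:aa REACHABLE"
    second = "10.0.0.5 dev qr-1a2b3c4d-5e lladdr fa:16:3e:bb:bb:bb STALE"
    for ip, old, new in find_flapping([first, second]):
        print(f"{ip}: {old} -> {new}")
```

For the two hypothetical samples this prints `10.0.0.5: fa:16:3e:aa:aa:aa -> fa:16:3e:bb:bb:bb`; on a healthy router the list of changes should stay empty.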
I appear to have cleaned this up by running:
sudo neutron-ovs-cleanup
on the compute host that had the other tenant's instance, and then hard rebooting all of the instances running on that host so that they were reconnected.
I noticed a lot of cruft on the integration bridge; this is taken from a host where I have not done the cleanup steps:
ubuntu@ciguapa:~$ sudo ovs-vsctl show
8aa44160-
    Bridge br-int
        Port "qvoff030e8d-73"
            tag: 4095
        Port "tap15d5f03d-af"
            tag: 1
        Port patch-tun
        Port "qvo15d5f03d-af"
        Port "tapc143c034-e0"
            tag: 3
        Port "qvo1b3f5a5f-60"
            tag: 4095
        Port "qvod43a627c-a0"
        Port "tapd43a627c-a0"
            tag: 2
        Port "qvo8162d068-ce"
            tag: 4095
        Port "qvoc143c034-e0"
        Port br-int
        Port "qvoc2e6f8a5-56"
            tag: 4095
I guess this might be an artifact of upgrading from b3 -> RC1 -> RC2, but it feels pretty nasty to me.
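To triage output like the listing above, it can help to group br-int ports by VLAN tag. The Neutron OVS agent uses tag 4095 as its "dead VLAN" for ports it has taken out of service. The sketch below is a hypothetical helper, not part of Neutron; it assumes the text covers a single bridge, as in the posted excerpt, and reproduces that excerpt verbatim (including the truncated UUID).

```python
import re
from typing import Dict, List, Optional

# Tag the Neutron OVS agent assigns to ports it has marked dead.
DEAD_VLAN_TAG = 4095

# The `ovs-vsctl show` excerpt as posted in this report.
POSTED = """\
8aa44160-
    Bridge br-int
        Port "qvoff030e8d-73"
            tag: 4095
        Port "tap15d5f03d-af"
            tag: 1
        Port patch-tun
        Port "qvo15d5f03d-af"
        Port "tapc143c034-e0"
            tag: 3
        Port "qvo1b3f5a5f-60"
            tag: 4095
        Port "qvod43a627c-a0"
        Port "tapd43a627c-a0"
            tag: 2
        Port "qvo8162d068-ce"
            tag: 4095
        Port "qvoc143c034-e0"
        Port br-int
        Port "qvoc2e6f8a5-56"
            tag: 4095
"""


def parse_ports(output: str) -> Dict[str, Optional[int]]:
    """Map each port name to its VLAN tag, or None if no tag line follows it."""
    ports: Dict[str, Optional[int]] = {}
    current: Optional[str] = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Bridge "):
            current = None  # a new bridge begins; no port is open yet
            continue
        port = re.match(r'Port\s+"?([^"\s]+)"?', line)
        if port:
            current = port.group(1)
            ports[current] = None
            continue
        tag = re.match(r"tag:\s*(\d+)", line)
        if tag and current is not None:
            ports[current] = int(tag.group(1))
    return ports


def dead_ports(ports: Dict[str, Optional[int]]) -> List[str]:
    """Ports parked on the dead VLAN."""
    return sorted(name for name, tag in ports.items() if tag == DEAD_VLAN_TAG)


def untagged_ports(ports: Dict[str, Optional[int]]) -> List[str]:
    """Ports with no VLAN tag at all."""
    return sorted(name for name, tag in ports.items() if tag is None)


if __name__ == "__main__":
    parsed = parse_ports(POSTED)
    print("dead VLAN:", dead_ports(parsed))
    print("untagged:", untagged_ports(parsed))
```

On the posted output this reports four ports on VLAN 4095 and five untagged ones. `patch-tun` and `br-int` are normally untagged. Per the Open vSwitch database documentation, however, a port with neither `tag` nor `trunks` set behaves as a trunk for every VLAN, so the untagged `qvo` ports may be worth a closer look in a case like this.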
tags: added: havana-rc-potential
Changed in neutron:
assignee: nobody → Mark McClain (markmcclain)
assignee: Mark McClain (markmcclain) → Kyle Mestery (mestery)
tags: added: havana-backport-potential; removed: havana-rc-potential
Changed in neutron:
status: New → Incomplete
Changed in neutron:
assignee: Kyle Mestery (mestery) → nobody
To fix bug 12240001, I've disabled arping by default because it was crashing the kernel under load.
It can easily be re-enabled in the agent configuration. I don't know whether that's the root cause, but the unsolicited ARP updates the ARP caches in the logical router's broadcast domain.
Many of the br-int interfaces you've posted have been placed by the agent on the 'dead VLAN' (4095). Could they belong to VMs that did not shut down properly?
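The unsolicited ARP mentioned in the comment above is a "gratuitous" ARP. A host broadcasts its own IP-to-MAC binding, and listeners in the broadcast domain typically overwrite any existing cache entry for that IP. The sketch below only builds such a frame in Python so that its layout is visible. It sends nothing, and it does not reproduce the tool the comment refers to (arping). The function names, MAC, and IP (192.0.2.10, a documentation address) are hypothetical, and this uses one common form: an ARP request whose sender and target protocol addresses are equal and whose target hardware address is zeroed.

```python
import socket
import struct

ETH_BROADCAST = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' to 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def gratuitous_arp_frame(mac: str, ip: str) -> bytes:
    """Build (but do not send) an unsolicited ARP announcement.

    The Ethernet destination is broadcast so every host in the L2 domain
    receives it; sender and target protocol addresses are both `ip`.
    """
    src_mac = mac_to_bytes(mac)
    ip_bytes = socket.inet_aton(ip)
    ethernet = ETH_BROADCAST + src_mac + struct.pack("!H", ETHERTYPE_ARP)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=request
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARP_REQUEST)
    # sender MAC, sender IP, zeroed target MAC, target IP (same as sender)
    arp += src_mac + ip_bytes + b"\x00" * 6 + ip_bytes
    return ethernet + arp


if __name__ == "__main__":
    frame = gratuitous_arp_frame("fa:16:3e:12:34:56", "192.0.2.10")
    print(len(frame), frame.hex())
```

The result is the 14-byte Ethernet header plus the 28-byte ARP payload (42 bytes before padding to the Ethernet minimum). Because every listener trusts the sender's claim, a misdirected announcement can repoint traffic until a later one corrects it.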