VM receives incorrect routing information

Bug #1683261 reported by Zhiyuan Cai
Affects: Tricircle · Status: Invalid · Importance: Undecided · Assigned to: Unassigned · Milestone: none

Bug Description

For a cross-region network we create only one DHCP port shared by all regions. This simplification works because a VM port does not span regions: only the local Neutron server in the same region as the VM has the IP/MAC information of that VM port. So even though several Dnsmasq daemons may receive the same DHCP request, only one will respond.

However, the implementation of the cross-region VxLAN network introduced shadow ports. We found that the DHCP agent also adds the IP/MAC information of shadow ports to the Dnsmasq addn_hosts file. In this case more than one Dnsmasq daemon may respond to the same DHCP request, and the VM may receive incorrect routing information.
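One way to confirm this situation is to check whether the same IP appears in the addn_hosts files of DHCP agents in more than one region. A minimal sketch of that check, using inline sample data (the file paths and line format are illustrative assumptions, not copied from a real deployment):

```shell
#!/bin/sh
# Sample addn_hosts contents as collected from two regions' DHCP agents.
# The "<ip> <hostname>" format here is illustrative sample data.
cat > /tmp/addn_hosts.region1 <<'EOF'
10.0.0.5 host-10-0-0-5.openstacklocal
10.0.0.7 host-10-0-0-7.openstacklocal
EOF
cat > /tmp/addn_hosts.region2 <<'EOF'
10.0.0.5 host-10-0-0-5.openstacklocal
10.0.0.9 host-10-0-0-9.openstacklocal
EOF

# An IP listed by more than one agent means more than one Dnsmasq
# daemon may answer the same DHCP request.
awk '{print $1}' /tmp/addn_hosts.region1 /tmp/addn_hosts.region2 \
  | sort | uniq -d
# -> 10.0.0.5
```

Any IP printed by the last pipeline is served by more than one Dnsmasq instance and is a candidate for the bad-routing symptom described above.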

Revision history for this message
Zhiyuan Cai (luckyvega-g) wrote :

This problem may occur when both local and non-local networks are attached to a non-local router and that router is not used for north-south networking.

I don't have a complete solution yet. The real core plugin (ML2) creates the port in the DB, binds the port, and then sends the notification, all inside its "create_port" method. The DHCP agent adds the shadow port to its addn_hosts file after it receives that notification, so there is no easy hooking point for us.

One workaround is to deploy the DHCP agent on a host that runs no VMs (a dedicated network node). Since we don't create shadow ports for DHCP ports, a compute node in one OpenStack cloud will not create a tunnel to the dedicated network node in another OpenStack cloud, so VMs on that compute node will not receive DHCP responses from DHCP agents in other OpenStack clouds. The L3 agent can also be deployed on the dedicated network node: because we allocate a different gateway IP for each OpenStack cloud, VMs only need to talk to the router in their own OpenStack cloud.
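For the record, pinning DHCP duty to the dedicated node can be done with the standard agent-scheduling commands. A sketch under assumed names (the agent IDs and network name are placeholders; the commands are printed rather than executed so the sequence is clear):

```shell
#!/bin/sh
# Move DHCP duty for a network from an agent on a compute node to the
# agent on the dedicated network node. All IDs below are hypothetical.
NET=net1                   # network to reschedule (placeholder)
AGENT_COMPUTE=1111-aaaa    # DHCP agent on a compute node (placeholder)
AGENT_NETNODE=2222-bbbb    # DHCP agent on the network node (placeholder)

run() { echo "+ $*"; }     # dry-run; change to 'run() { "$@"; }' to apply

run openstack network agent remove network --dhcp "$AGENT_COMPUTE" "$NET"
run openstack network agent add network --dhcp "$AGENT_NETNODE" "$NET"
```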

Revision history for this message
Mathieu Goessens (gebura) wrote :

Hi,

As I understand it, the DHCP server may hand VMs a shadow port IP as their next hop; the VMs then cannot route packets because that IP is unreachable.

Thus, a simple workaround is to add those IPs to the router so that VMs can reach them and route traffic through them as expected. That is what I am doing with the following script. While not ideal, especially from a design point of view, it seems to work fine, at least on stable/queens (I did not test stable/rocky or master).

-- snip --
source $HOME/devstack/openrc admin admin
unset OS_REGION_NAME
export R=--os-region-name=CentralRegion
export R1=--os-region-name=RegionOne
export R2=--os-region-name=RegionTwo
export R3=--os-region-name=RegionThree

# Aliases to work on specific ip netns; may depend on your configuration / topology
netns1=$(sudo ip netns | grep qrouter | head -n1)
netns1exec="sudo ip netns exec $netns1"
netns2=$(sudo ip netns | grep qrouter | tail -n1)
netns2exec="sudo ip netns exec $netns2"

# Find the shadow port IPs. More filtering would be better, but it seems to be fine.
nexthops=$(openstack port list $R | grep interface | grep -v 100 | awk -F" " '{print $8}' | cut -d\' -f2)

# Add all those IPs to the relevant interface in the proper ip netns
for nexthop in $nexthops; do
 interface=$($netns1exec ip route show $nexthop/24 | cut -d" " -f 3);
 $netns1exec ip addr add $nexthop/24 dev $interface 2>/dev/null
done

-- snip --
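After the script runs, the added addresses should be visible inside the router namespace and the advertised next hops should answer pings from VMs. A quick spot check, with the namespace and IP as placeholders (commands printed rather than executed):

```shell
#!/bin/sh
# Verify the workaround: the shadow-port IP should now be configured in
# the router namespace and reachable. Names below are hypothetical.
NETNS=qrouter-0000          # router namespace (placeholder)
NEXTHOP=10.0.1.5            # a shadow-port IP found by the script (placeholder)

run() { echo "+ $*"; }      # dry-run; change to 'run() { "$@"; }' to apply

run sudo ip netns exec "$NETNS" ip addr show
run sudo ip netns exec "$NETNS" ping -c 1 "$NEXTHOP"
```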

baisen (song1)
Changed in tricircle:
status: New → Invalid