Some newly created VMs are not able to reach "outside" resources (e.g. apt repositories) in an l3ha + dvr env. I can easily reproduce this problem as long as the VM and the main router are not on the same host: 'apt update' cannot be run inside the VM, so north-south traffic is broken.
Here are the steps to reproduce it.
1, set up a wallaby or ussuri vrrp + dvr env (it works on train, but not on ussuri or wallaby)
2, create a test VM and query its host with: nova show <VM> |grep host
3, query the main router with: neutron l3-agent-list-hosting-router $(openstack router show provider-router -fvalue -cid)
4, make sure the VM and the main router are not on the same host
5, on the main router's host, this command fails: ip netns exec snat-xxx ping <VM-IP> -c1
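The checks in steps 2, 4 and 5 can be sketched as small shell helpers. This is only a rough sketch: the function names are made up for illustration, the `OS-EXT-SRV-ATTR:host` row is an assumption about what the `nova show` table looks like, and `RUN=echo` is a dry-run switch added here so the ping command can be printed instead of executed.

```shell
#!/bin/sh
# Sketch of steps 2, 4 and 5; assumes admin credentials are already sourced.

# Step 2: read `nova show <VM>` table output on stdin and print the VM's host.
# Assumes the usual "| OS-EXT-SRV-ATTR:host | <host> |" row.
vm_host() {
    awk -F'|' '$2 ~ /^ *OS-EXT-SRV-ATTR:host *$/ { gsub(/ /, "", $3); print $3 }'
}

# Step 4: succeed only when both hosts are known and differ.
hosts_differ() {
    [ -n "$1" ] && [ -n "$2" ] && [ "$1" != "$2" ]
}

# Step 5: ping the VM from the SNAT namespace on the main router's host.
# $1 = router id (the xxx in snat-xxx), $2 = VM IP.
# Set RUN=echo to print the command instead of running it.
snat_ping() {
    ${RUN:-} ip netns exec "snat-$1" ping "$2" -c1
}
```

For example, `nova show <VM> | vm_host` prints the compute host, and `RUN=echo snat_ping <router-id> <VM-IP>` shows the exact ping that step 5 runs.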
I've done some bisecting and found:
15.3.4 (bionic-train) - no problem
1c2e10f859 - no problem
16.4.0 (bionic-ussuri) - has problem
16.0.0-0ubuntu3 - has problem, and also has the multiple active routers problem
16.0.0~b3~git2020041516.5f42488a9a-0ubuntu2 - BAD version, all routers are in standby state so we can't run any tests
16.1.0 (focal) - has problem, and also has the multiple active routers problem
16.2.0 (focal) - has problem
16.3.0 (focal) - has problem
16.4.0 (focal-ussuri) - has problem
focal-wallaby - has problem
Because I often hit the multiple standby issue with some commit ids (e.g. 14dd3e95ca), I can't continue the bisect.
I also used 'ovs-appctl ofproto/trace' and tcpdump to do some debugging; the results are as follows.
train - works
sg-xxx -> vm - https://pastebin.ubuntu.com/p/MHNVf8wXtb/
tcpdump on sg-xxx - https://pastebin.ubuntu.com/p/Fqxp4mvkgV/
tcpdump on vm's tap - https://pastebin.ubuntu.com/p/YppWc2Pg33/
tcpdump on qr-xxx - https://pastebin.ubuntu.com/p/MPmQ5xbnT2/ - can get icmp reply
ussuri - does not work
sg-xxx -> vm - https://pastebin.ubuntu.com/p/hKfSB9gmd9/
tcpdump on sg-xxx - https://pastebin.ubuntu.com/p/NCcnGS4gdj/ - sg-xxx can't get icmp reply
tcpdump on vm's tap - https://pastebin.ubuntu.com/p/DHdVbB66NT/ - VM can't get sg-xxx's arp reply
tcpdump on qr-xxx - https://pastebin.ubuntu.com/p/4hJ7vdRRC4/ - can't get arp reply
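For anyone who wants to repeat the tcpdump captures, something along these lines should work. It is a sketch, not the exact commands behind the pastes: the helper names are made up, the `arp or icmp` filter is my assumption about what is worth watching, and the interface names and router id are placeholders. `RUN=echo` is a dry-run switch added for illustration.

```shell
#!/bin/sh
# Sketch only: interface names and the router id are placeholders.
# Set RUN=echo to print the commands instead of executing them.

# capture_in_ns NAMESPACE IFACE: watch ARP and ICMP on IFACE inside NAMESPACE.
capture_in_ns() {
    ${RUN:-} ip netns exec "$1" tcpdump -n -e -l -i "$2" arp or icmp
}

# capture_tap IFACE: watch ARP and ICMP on the VM's tap device (root namespace).
capture_tap() {
    ${RUN:-} tcpdump -n -e -l -i "$1" arp or icmp
}

# On the main router's host:  capture_in_ns "snat-$ROUTER_ID" sg-xxx
# On the VM's compute host:   capture_in_ns "qrouter-$ROUTER_ID" qr-xxx
#                             capture_tap tap-xxx
```

The `-e` flag prints the MAC addresses, which helps when checking who answers (or doesn't answer) the ARP requests.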
It looks like the VM can't get an arp reply for the sg-xxx interface,