Activity log for bug #1945306

Date Who What changed Old value New value Message
2021-09-28 09:41:34 Hua Zhang bug added bug
2021-09-28 09:45:08 Hua Zhang description

Old value:
Some newly created VM's are not able to reach "outside" resources (e.g. apt repositories) on then l3ha + dvr env, I can easily reproduce this problem as long as VM and main router are not on the same host, and 'apt update' command can not be run inside VM, so the north-south traffic is broken. Here are steps to easily reproduce it.
1, set up wallaby or ussuri vrrp + dvr env (it works on train, not work on ussuri and wallaby)
2, create a test vm, query host by: nova show <VM> |grep host
3, query main router by: neutron l3-agent-list-hosting-router $(openstack router show provider-router -fvalue -cid)
4, make sure VM and main router are not on the same host
5, on main router host, it will fail to run: ip netns exec snat-xxx ping <VM-IP> -c1
I've done some bisect, I found:
15.3.4 (bionic-train) - no problem
1c2e10f859 - no problem
16.4.0 (bionic-ussuri) - has problem
16.0.0-0ubuntu3 - has problem, and also have multiple active routers problem
16.0.0~b3~git2020041516.5f42488a9a-0ubuntu2 - BAD version, all routers are in standby state so we can't do any test
16.1.0 (focal) - has problem, and also have multiple active routers problem
16.2.0 (focal) - has problem
16.3.0 (focal) - has problem
16.4.0 (focal-ussuri) - has problem
focal-wallaby - has problem
Because I often have multiple standby issue with some commit id (eg: 14dd3e95ca) so that I can't continue bisect.
I also used 'ovs-appctl ofproto/trace' and tcpdump to do some debugs, the results are as follows.
train - works
sg-xxx -> vm - https://pastebin.ubuntu.com/p/MHNVf8wXtb/
tcpdump on sg-xxx - https://pastebin.ubuntu.com/p/Fqxp4mvkgV/
tcpdump on vm's tap - https://pastebin.ubuntu.com/p/YppWc2Pg33/
tcpdump on qr-xxx - https://pastebin.ubuntu.com/p/MPmQ5xbnT2/ - can get icmp reply
ussuri - not work
sg-xxx -> vm - https://pastebin.ubuntu.com/p/hKfSB9gmd9/
tcpdump on sg-xxx - https://pastebin.ubuntu.com/p/NCcnGS4gdj/ - sg-xxx can't get icmp reply
tcpdump on vm's tap - https://pastebin.ubuntu.com/p/DHdVbB66NT/ - VM can't get sg-xxx's arp reply
tcpdump on qr-xxx - https://pastebin.ubuntu.com/p/4hJ7vdRRC4/ - can't get arp reply
It looks like VM can't get arp reply for sg-xxx interface,

New value:
Some newly created VM's are not able to reach "outside" resources (e.g. apt repositories) on the l3ha + dvr env, this problem can be easily reproduced as long as VM and main router are not on the same host, and 'apt update' command can not be run inside VM, so the north-south traffic is broken. Here are steps to easily reproduce it.
1, set up wallaby or ussuri vrrp + dvr env (it works on train, not work on ussuri and wallaby)
2, create a test vm, query host by: nova show <VM> |grep host
3, query main router by: neutron l3-agent-list-hosting-router $(openstack router show provider-router -fvalue -cid)
4, make sure VM and main router are not on the same host
5, on main router host, it will fail to run: ip netns exec snat-xxx ping <VM-IP> -c1
I've done some bisect, I found:
15.3.4 (bionic-train) - no problem
1c2e10f859 - no problem
16.4.0 (bionic-ussuri) - has problem
16.0.0-0ubuntu3 - has problem, and also have multiple active routers problem
16.0.0~b3~git2020041516.5f42488a9a-0ubuntu2 - BAD version, all routers are in standby state so we can't do any test
16.1.0 (focal) - has problem, and also have multiple active routers problem
16.2.0 (focal) - has problem
16.3.0 (focal) - has problem
16.4.0 (focal-ussuri) - has problem
focal-wallaby - has problem
Because I often have multiple standby issue with some commit id (eg: 14dd3e95ca) so that I can't continue bisect.
I also used 'ovs-appctl ofproto/trace' and tcpdump to do some debugs, the results are as follows.
train - works
sg-xxx -> vm - https://pastebin.ubuntu.com/p/MHNVf8wXtb/
tcpdump on sg-xxx - https://pastebin.ubuntu.com/p/Fqxp4mvkgV/
tcpdump on vm's tap - https://pastebin.ubuntu.com/p/YppWc2Pg33/
tcpdump on qr-xxx - https://pastebin.ubuntu.com/p/MPmQ5xbnT2/ - can get icmp reply
ussuri - not work
sg-xxx -> vm - https://pastebin.ubuntu.com/p/hKfSB9gmd9/
tcpdump on sg-xxx - https://pastebin.ubuntu.com/p/NCcnGS4gdj/ - sg-xxx can't get icmp reply
tcpdump on vm's tap - https://pastebin.ubuntu.com/p/DHdVbB66NT/ - VM can't get sg-xxx's arp reply
tcpdump on qr-xxx - https://pastebin.ubuntu.com/p/4hJ7vdRRC4/ - can't get arp reply
It looks like VM can't get arp reply for sg-xxx interface,
2021-09-29 13:07:28 Bence Romsics neutron: status New Triaged
2021-09-29 13:07:35 Bence Romsics neutron: importance Undecided High
2021-09-29 13:08:23 Bence Romsics tags l3-dvr-backlog l3-ha
2021-09-30 11:29:59 Nobuto Murata bug added subscriber Nobuto Murata
2021-10-04 10:21:14 Hemanth Nakkina tags l3-dvr-backlog l3-ha l3-dvr-backlog l3-ha sts
2021-10-04 11:57:41 Bence Romsics summary north-south traffic not working when VM and main router are not on the same host [dvr+l3ha] north-south traffic not working when VM and main router are not on the same host