Activity log for bug #1783654

Date Who What changed Old value New value Message
2018-07-25 23:22:53 Arjun Baindur bug added bug
2018-07-26 00:26:34 Dmitrii Shcherbakov bug added subscriber Dmitrii Shcherbakov
2018-07-30 21:45:33 Brian Haley bug added subscriber Brian Haley
2018-08-10 22:20:34 Arjun Baindur neutron: assignee Arjun Baindur (abaindur)
2018-08-16 15:06:19 Oleg Bondarev neutron: status New Confirmed
2018-08-16 15:06:29 Oleg Bondarev neutron: importance Undecided Critical
2018-08-23 05:57:45 OpenStack Infra neutron: status Confirmed In Progress
2018-08-23 05:57:45 OpenStack Infra neutron: assignee Arjun Baindur (abaindur) Swaminathan Vasudevan (swaminathan-vasudevan)
2018-08-24 08:45:25 OpenStack Infra neutron: status In Progress Fix Released
2018-08-24 09:18:32 Dmitrii Shcherbakov bug task added cloud-archive
2018-08-24 09:24:25 Dmitrii Shcherbakov bug task added neutron (Ubuntu)
2018-09-21 09:22:31 Bernard Cafarelli tags l3-dvr-backlog ovs l3-dvr-backlog neutron-proactive-backport-potential ovs
2018-09-21 12:02:25 OpenStack Infra tags l3-dvr-backlog neutron-proactive-backport-potential ovs in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs
2018-10-02 18:26:50 Corey Bryant nominated for series Ubuntu Cosmic
2018-10-02 18:26:50 Corey Bryant bug task added neutron (Ubuntu Cosmic)
2018-10-02 18:26:50 Corey Bryant nominated for series Ubuntu Bionic
2018-10-02 18:26:50 Corey Bryant bug task added neutron (Ubuntu Bionic)
2018-10-02 18:27:02 Corey Bryant neutron (Ubuntu Bionic): status New Triaged
2018-10-02 18:27:06 Corey Bryant neutron (Ubuntu Cosmic): importance Undecided High
2018-10-02 18:27:09 Corey Bryant neutron (Ubuntu Bionic): importance Undecided High
2018-10-02 18:27:13 Corey Bryant neutron (Ubuntu Cosmic): status New Triaged
2018-10-02 18:27:26 Corey Bryant nominated for series cloud-archive/pike
2018-10-02 18:27:26 Corey Bryant bug task added cloud-archive/pike
2018-10-02 18:27:26 Corey Bryant nominated for series cloud-archive/rocky
2018-10-02 18:27:26 Corey Bryant bug task added cloud-archive/rocky
2018-10-02 18:27:26 Corey Bryant nominated for series cloud-archive/queens
2018-10-02 18:27:26 Corey Bryant bug task added cloud-archive/queens
2018-10-02 18:27:36 Corey Bryant cloud-archive/pike: status New Triaged
2018-10-02 18:27:39 Corey Bryant cloud-archive/queens: status New Triaged
2018-10-02 18:27:42 Corey Bryant cloud-archive/rocky: status New Triaged
2018-10-02 18:27:44 Corey Bryant cloud-archive/queens: importance Undecided High
2018-10-02 18:27:51 Corey Bryant cloud-archive/pike: importance Undecided High
2018-10-02 18:27:53 Corey Bryant cloud-archive/rocky: importance Undecided High
2018-10-02 21:25:06 Corey Bryant cloud-archive/rocky: importance High Critical
2018-10-02 21:25:15 Corey Bryant neutron (Ubuntu Cosmic): importance High Critical
2018-10-02 21:25:20 Corey Bryant neutron (Ubuntu Bionic): importance High Critical
2018-10-02 21:26:00 Corey Bryant cloud-archive/queens: importance High Critical
2018-10-02 21:26:04 Corey Bryant cloud-archive/pike: importance High Critical
2018-10-03 12:12:11 Corey Bryant cloud-archive/pike: status Triaged Invalid
2018-10-03 12:12:13 Corey Bryant cloud-archive/pike: importance Critical Undecided
2018-10-03 12:17:40 Corey Bryant description

Old value:

Seems like collateral from https://bugs.launchpad.net/neutron/+bug/1751396

In DVR, the distributed gateway port's IP and MAC are shared in the qrouter across all hosts. The dvr_process_flow on the physical bridge (which replaces the shared router_distributed MAC address with the unique per-host MAC when it is the source) is missing, and so is the drop rule which instructs the bridge to drop all traffic destined for the shared distributed MAC.

Because of this, we are seeing the router MAC on the network infrastructure, causing it to flap on br-int on every compute host:

root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 1
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 2
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1 4 fa:16:3e:42:a2:ec 1
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 0

Where port 1 is phy-br-vlan, connecting to the physical bridge, and port 11 is the correct local qr-interface. Because these dvr flows are missing on br-vlan, packets with that source MAC ingress into the host and br-int learns it upstream.

The symptom is that when pinging a VM's floating IP, we see occasional packet loss (10-30%), and sometimes the responses are sent upstream by br-int instead of the qrouter, so the ICMP replies come with the fixed IP of the replier since no NAT'ing took place, and on the tenant network rather than the external network.

When I force net_shared_only to False here, the problem goes away: https://github.com/openstack/neutron/blob/stable/pike/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L436

It should be noted we *ONLY* need to do this on our dvr_snat host. The dvr process flows are missing on every compute host. But if we shut qrouter on the snat host, FIP functionality works and DVR mac stops flapping on others. Or if we apply the fix only to the snat host, it works. Perhaps there is something on the SNAT node that is unique

New value:

Seems like collateral from https://bugs.launchpad.net/neutron/+bug/1751396

In DVR, the distributed gateway port's IP and MAC are shared in the qrouter across all hosts. The dvr_process_flow on the physical bridge (which replaces the shared router_distributed MAC address with the unique per-host MAC when it is the source) is missing, and so is the drop rule which instructs the bridge to drop all traffic destined for the shared distributed MAC.

Because of this, we are seeing the router MAC on the network infrastructure, causing it to flap on br-int on every compute host:

root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 1
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 2
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
    1 4 fa:16:3e:42:a2:ec 1
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
   11 4 fa:16:3e:42:a2:ec 0

Where port 1 is phy-br-vlan, connecting to the physical bridge, and port 11 is the correct local qr-interface. Because these dvr flows are missing on br-vlan, packets with that source MAC ingress into the host and br-int learns it upstream.

The symptom is that when pinging a VM's floating IP, we see occasional packet loss (10-30%), and sometimes the responses are sent upstream by br-int instead of the qrouter, so the ICMP replies come with the fixed IP of the replier since no NAT'ing took place, and on the tenant network rather than the external network.

When I force net_shared_only to False here, the problem goes away: https://github.com/openstack/neutron/blob/stable/pike/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L436

It should be noted we *ONLY* need to do this on our dvr_snat host. The dvr process flows are missing on every compute host. But if we shut qrouter on the snat host, FIP functionality works and DVR mac stops flapping on others. Or if we apply the fix only to the snat host, it works. Perhaps there is something on the SNAT node that is unique

Ubuntu SRU details:
-------------------
[Impact]
See above

[Test Case]
Deploy OpenStack with dvr enabled and then follow the steps above.

[Regression Potential]
The patches that are backported have already landed upstream in the corresponding stable branches, helping to minimize any regression potential.
2018-10-03 12:17:50 Corey Bryant bug added subscriber Ubuntu Stable Release Updates Team
2018-10-04 11:43:21 Launchpad Janitor neutron (Ubuntu Cosmic): status Triaged Fix Released
2018-10-05 13:14:36 Timo Aaltonen neutron (Ubuntu Bionic): status Triaged Fix Committed
2018-10-05 13:14:39 Timo Aaltonen bug added subscriber SRU Verification
2018-10-05 13:14:45 Timo Aaltonen tags in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-needed verification-needed-bionic
2018-10-05 18:01:49 Corey Bryant cloud-archive/queens: status Triaged Fix Committed
2018-10-05 18:01:51 Corey Bryant tags in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-needed verification-needed-bionic in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-needed verification-needed-bionic verification-queens-needed
2018-10-08 21:52:19 OpenStack Infra cloud-archive/rocky: status Triaged Fix Committed
2018-10-10 14:03:38 Corey Bryant tags in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-needed verification-needed-bionic verification-queens-needed in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-needed
2018-10-10 15:41:58 Corey Bryant cloud-archive/rocky: status Fix Committed Fix Released
2018-10-10 15:42:32 Corey Bryant tags in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-needed in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-done
2018-10-17 06:58:31 OpenStack Infra tags in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-done in-stable-ocata in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-done
2018-10-17 19:38:10 Brian Murray tags in-stable-ocata in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-done in-stable-ocata in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-needed-bionic verification-queens-done
2018-10-23 03:52:31 OpenStack Infra cloud-archive/pike: status Invalid Fix Committed
2018-11-06 16:52:01 Corey Bryant cloud-archive/pike: status Fix Committed Invalid
2018-11-19 15:49:21 Corey Bryant tags in-stable-ocata in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-needed-bionic verification-queens-done in-stable-ocata in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-done
2019-01-16 19:44:04 Corey Bryant neutron (Ubuntu Bionic): status Fix Committed Fix Released
2019-01-16 19:44:10 Corey Bryant cloud-archive/queens: status Fix Committed Fix Released
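
Note on the 2018-10-03 description entry: the MAC flapping it reports can be checked mechanically from repeated fdb/show samples. The following is a minimal sketch, not part of the bug report or of neutron. It assumes each sample line has the four columns seen in the report (port, VLAN, MAC, age), and the names FdbEntry, parse_fdb_line and count_port_moves are hypothetical helpers introduced only for illustration. The sample data is copied from the description.

```python
from typing import List, NamedTuple


class FdbEntry(NamedTuple):
    """One learned-MAC row, in the column order shown in the bug description."""
    port: int
    vlan: int
    mac: str
    age: int


def parse_fdb_line(line: str) -> FdbEntry:
    # Assumed layout: "<port> <vlan> <mac> <age>", whitespace separated.
    port, vlan, mac, age = line.split()
    return FdbEntry(int(port), int(vlan), mac.lower(), int(age))


def count_port_moves(entries: List[FdbEntry]) -> int:
    # A "move" is two consecutive samples where the MAC was learned on different ports.
    return sum(1 for prev, cur in zip(entries, entries[1:]) if prev.port != cur.port)


# The twelve samples from the description, in the order they were captured.
SAMPLES = """
11 4 fa:16:3e:42:a2:ec 1
11 4 fa:16:3e:42:a2:ec 2
1 4 fa:16:3e:42:a2:ec 0
11 4 fa:16:3e:42:a2:ec 0
11 4 fa:16:3e:42:a2:ec 0
1 4 fa:16:3e:42:a2:ec 0
1 4 fa:16:3e:42:a2:ec 0
1 4 fa:16:3e:42:a2:ec 0
1 4 fa:16:3e:42:a2:ec 1
11 4 fa:16:3e:42:a2:ec 0
11 4 fa:16:3e:42:a2:ec 0
11 4 fa:16:3e:42:a2:ec 0
"""

entries = [parse_fdb_line(line) for line in SAMPLES.splitlines() if line.strip()]
moves = count_port_moves(entries)
ports = sorted({entry.port for entry in entries})

print(f"{moves} port moves across ports {ports}")
```

For the samples in the description this reports four moves between ports 1 and 11. On a host where the DVR flows on the physical bridge are in place, the expectation stated in the report is that the router MAC stays on the local qr-interface port, so the count should be zero.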