2018-07-25 23:22:53 |
Arjun Baindur |
bug |
|
|
added bug |
2018-07-26 00:26:34 |
Dmitrii Shcherbakov |
bug |
|
|
added subscriber Dmitrii Shcherbakov |
2018-07-30 21:45:33 |
Brian Haley |
bug |
|
|
added subscriber Brian Haley |
2018-08-10 22:20:34 |
Arjun Baindur |
neutron: assignee |
|
Arjun Baindur (abaindur) |
|
2018-08-16 15:06:19 |
Oleg Bondarev |
neutron: status |
New |
Confirmed |
|
2018-08-16 15:06:29 |
Oleg Bondarev |
neutron: importance |
Undecided |
Critical |
|
2018-08-23 05:57:45 |
OpenStack Infra |
neutron: status |
Confirmed |
In Progress |
|
2018-08-23 05:57:45 |
OpenStack Infra |
neutron: assignee |
Arjun Baindur (abaindur) |
Swaminathan Vasudevan (swaminathan-vasudevan) |
|
2018-08-24 08:45:25 |
OpenStack Infra |
neutron: status |
In Progress |
Fix Released |
|
2018-08-24 09:18:32 |
Dmitrii Shcherbakov |
bug task added |
|
cloud-archive |
|
2018-08-24 09:24:25 |
Dmitrii Shcherbakov |
bug task added |
|
neutron (Ubuntu) |
|
2018-09-21 09:22:31 |
Bernard Cafarelli |
tags |
l3-dvr-backlog ovs |
l3-dvr-backlog neutron-proactive-backport-potential ovs |
|
2018-09-21 12:02:25 |
OpenStack Infra |
tags |
l3-dvr-backlog neutron-proactive-backport-potential ovs |
in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs |
|
2018-10-02 18:26:50 |
Corey Bryant |
nominated for series |
|
Ubuntu Cosmic |
|
2018-10-02 18:26:50 |
Corey Bryant |
bug task added |
|
neutron (Ubuntu Cosmic) |
|
2018-10-02 18:26:50 |
Corey Bryant |
nominated for series |
|
Ubuntu Bionic |
|
2018-10-02 18:26:50 |
Corey Bryant |
bug task added |
|
neutron (Ubuntu Bionic) |
|
2018-10-02 18:27:02 |
Corey Bryant |
neutron (Ubuntu Bionic): status |
New |
Triaged |
|
2018-10-02 18:27:06 |
Corey Bryant |
neutron (Ubuntu Cosmic): importance |
Undecided |
High |
|
2018-10-02 18:27:09 |
Corey Bryant |
neutron (Ubuntu Bionic): importance |
Undecided |
High |
|
2018-10-02 18:27:13 |
Corey Bryant |
neutron (Ubuntu Cosmic): status |
New |
Triaged |
|
2018-10-02 18:27:26 |
Corey Bryant |
nominated for series |
|
cloud-archive/pike |
|
2018-10-02 18:27:26 |
Corey Bryant |
bug task added |
|
cloud-archive/pike |
|
2018-10-02 18:27:26 |
Corey Bryant |
nominated for series |
|
cloud-archive/rocky |
|
2018-10-02 18:27:26 |
Corey Bryant |
bug task added |
|
cloud-archive/rocky |
|
2018-10-02 18:27:26 |
Corey Bryant |
nominated for series |
|
cloud-archive/queens |
|
2018-10-02 18:27:26 |
Corey Bryant |
bug task added |
|
cloud-archive/queens |
|
2018-10-02 18:27:36 |
Corey Bryant |
cloud-archive/pike: status |
New |
Triaged |
|
2018-10-02 18:27:39 |
Corey Bryant |
cloud-archive/queens: status |
New |
Triaged |
|
2018-10-02 18:27:42 |
Corey Bryant |
cloud-archive/rocky: status |
New |
Triaged |
|
2018-10-02 18:27:44 |
Corey Bryant |
cloud-archive/queens: importance |
Undecided |
High |
|
2018-10-02 18:27:51 |
Corey Bryant |
cloud-archive/pike: importance |
Undecided |
High |
|
2018-10-02 18:27:53 |
Corey Bryant |
cloud-archive/rocky: importance |
Undecided |
High |
|
2018-10-02 21:25:06 |
Corey Bryant |
cloud-archive/rocky: importance |
High |
Critical |
|
2018-10-02 21:25:15 |
Corey Bryant |
neutron (Ubuntu Cosmic): importance |
High |
Critical |
|
2018-10-02 21:25:20 |
Corey Bryant |
neutron (Ubuntu Bionic): importance |
High |
Critical |
|
2018-10-02 21:26:00 |
Corey Bryant |
cloud-archive/queens: importance |
High |
Critical |
|
2018-10-02 21:26:04 |
Corey Bryant |
cloud-archive/pike: importance |
High |
Critical |
|
2018-10-03 12:12:11 |
Corey Bryant |
cloud-archive/pike: status |
Triaged |
Invalid |
|
2018-10-03 12:12:13 |
Corey Bryant |
cloud-archive/pike: importance |
Critical |
Undecided |
|
2018-10-03 12:17:40 |
Corey Bryant |
description |
Seems like collateral from https://bugs.launchpad.net/neutron/+bug/1751396
In DVR, the distributed gateway port's IP and MAC are shared in the qrouter across all hosts.
The dvr_process_flow on the physical bridge (which replaces the shared router_distributed MAC address with the unique per-host MAC when it's the source) is missing, and so is the drop rule which instructs the bridge to drop all traffic destined for the shared distributed MAC.
Because of this, we are seeing the router MAC on the network infrastructure, causing it to flap on br-int on every compute host:
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 1
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 2
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 1
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
Where port 1 is phy-br-vlan, connecting to the physical bridge, and port 11 is the correct local qr-interface. Because these DVR flows are missing on br-vlan, packets with the shared router MAC as their source ingress into the host and br-int learns that MAC on the upstream port.
The symptom is that when pinging a VM's floating IP, we see occasional packet loss (10-30%). Sometimes the responses are sent upstream by br-int instead of through the qrouter, so the ICMP replies arrive with the fixed IP of the replier, since no NAT took place, and on the tenant network rather than the external network.
When I force net_shared_only to False here, the problem goes away: https://github.com/openstack/neutron/blob/stable/pike/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L436
It should be noted that we *ONLY* need to do this on our dvr_snat host. The dvr_process_flow rules are missing on every compute host, but if we shut down the qrouter on the SNAT host, FIP functionality works and the DVR MAC stops flapping on the others. Alternatively, if we apply the fix only to the SNAT host, it works. Perhaps there is something unique about the SNAT node. |
Seems like collateral from https://bugs.launchpad.net/neutron/+bug/1751396
In DVR, the distributed gateway port's IP and MAC are shared in the qrouter across all hosts.
The dvr_process_flow on the physical bridge (which replaces the shared router_distributed MAC address with the unique per-host MAC when it's the source) is missing, and so is the drop rule which instructs the bridge to drop all traffic destined for the shared distributed MAC.
Because of this, we are seeing the router MAC on the network infrastructure, causing it to flap on br-int on every compute host:
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 1
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 2
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
1 4 fa:16:3e:42:a2:ec 1
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
root@milhouse:~# ovs-appctl fdb/show br-int | grep fa:16:3e:42:a2:ec
11 4 fa:16:3e:42:a2:ec 0
Where port 1 is phy-br-vlan, connecting to the physical bridge, and port 11 is the correct local qr-interface. Because these DVR flows are missing on br-vlan, packets with the shared router MAC as their source ingress into the host and br-int learns that MAC on the upstream port.
The symptom is that when pinging a VM's floating IP, we see occasional packet loss (10-30%). Sometimes the responses are sent upstream by br-int instead of through the qrouter, so the ICMP replies arrive with the fixed IP of the replier, since no NAT took place, and on the tenant network rather than the external network.
When I force net_shared_only to False here, the problem goes away: https://github.com/openstack/neutron/blob/stable/pike/neutron/plugins/ml2/drivers/openvswitch/agent/ovs_dvr_neutron_agent.py#L436
It should be noted that we *ONLY* need to do this on our dvr_snat host. The dvr_process_flow rules are missing on every compute host, but if we shut down the qrouter on the SNAT host, FIP functionality works and the DVR MAC stops flapping on the others. Alternatively, if we apply the fix only to the SNAT host, it works. Perhaps there is something unique about the SNAT node.
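The flapping shown in the fdb/show output above can be checked mechanically rather than by eye. The following is a minimal sketch, not part of the original report: it assumes the captured rows have the four columns seen above (port, VLAN, MAC, age), and the function names parse_fdb_ports and count_flaps are hypothetical helpers introduced only for illustration.

```python
def parse_fdb_ports(lines, mac):
    """Return the ports (as strings) on which `mac` was learned, in capture order.

    Assumes each FDB row has exactly four whitespace-separated fields:
    port, VLAN, MAC, age. Shell prompt lines and anything else are skipped.
    """
    ports = []
    for line in lines:
        fields = line.split()
        if len(fields) == 4 and fields[2].lower() == mac.lower():
            ports.append(fields[0])
    return ports


def count_flaps(ports):
    """Count how many times the learned port changes between consecutive samples."""
    return sum(1 for prev, cur in zip(ports, ports[1:]) if prev != cur)


if __name__ == "__main__":
    # Three samples in the same shape as the captured output above.
    sample = [
        "11 4 fa:16:3e:42:a2:ec 1",
        "1 4 fa:16:3e:42:a2:ec 0",
        "11 4 fa:16:3e:42:a2:ec 0",
    ]
    ports = parse_fdb_ports(sample, "fa:16:3e:42:a2:ec")
    print(count_flaps(ports))  # prints 2
```

A non-zero flap count for the router_distributed MAC on br-int would be consistent with the symptom described here, where the MAC alternates between the local qr- port and phy-br-vlan.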
Ubuntu SRU details:
-------------------
[Impact]
See above
[Test Case]
Deploy OpenStack with dvr enabled and then follow the steps above.
[Regression Potential]
The patches that are backported have already landed upstream in the corresponding stable branches, helping to minimize any regression potential. |
|
2018-10-03 12:17:50 |
Corey Bryant |
bug |
|
|
added subscriber Ubuntu Stable Release Updates Team |
2018-10-04 11:43:21 |
Launchpad Janitor |
neutron (Ubuntu Cosmic): status |
Triaged |
Fix Released |
|
2018-10-05 13:14:36 |
Timo Aaltonen |
neutron (Ubuntu Bionic): status |
Triaged |
Fix Committed |
|
2018-10-05 13:14:39 |
Timo Aaltonen |
bug |
|
|
added subscriber SRU Verification |
2018-10-05 13:14:45 |
Timo Aaltonen |
tags |
in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs |
in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-needed verification-needed-bionic |
|
2018-10-05 18:01:49 |
Corey Bryant |
cloud-archive/queens: status |
Triaged |
Fix Committed |
|
2018-10-05 18:01:51 |
Corey Bryant |
tags |
in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-needed verification-needed-bionic |
in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-needed verification-needed-bionic verification-queens-needed |
|
2018-10-08 21:52:19 |
OpenStack Infra |
cloud-archive/rocky: status |
Triaged |
Fix Committed |
|
2018-10-10 14:03:38 |
Corey Bryant |
tags |
in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-needed verification-needed-bionic verification-queens-needed |
in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-needed |
|
2018-10-10 15:41:58 |
Corey Bryant |
cloud-archive/rocky: status |
Fix Committed |
Fix Released |
|
2018-10-10 15:42:32 |
Corey Bryant |
tags |
in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-needed |
in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-done |
|
2018-10-17 06:58:31 |
OpenStack Infra |
tags |
in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-done |
in-stable-ocata in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-done |
|
2018-10-17 19:38:10 |
Brian Murray |
tags |
in-stable-ocata in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-done |
in-stable-ocata in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-needed-bionic verification-queens-done |
|
2018-10-23 03:52:31 |
OpenStack Infra |
cloud-archive/pike: status |
Invalid |
Fix Committed |
|
2018-11-06 16:52:01 |
Corey Bryant |
cloud-archive/pike: status |
Fix Committed |
Invalid |
|
2018-11-19 15:49:21 |
Corey Bryant |
tags |
in-stable-ocata in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-needed-bionic verification-queens-done |
in-stable-ocata in-stable-queens l3-dvr-backlog neutron-proactive-backport-potential ovs verification-done verification-done-bionic verification-queens-done |
|
2019-01-16 19:44:04 |
Corey Bryant |
neutron (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2019-01-16 19:44:10 |
Corey Bryant |
cloud-archive/queens: status |
Fix Committed |
Fix Released |
|