bgpvpn router fallback broken by change in neutron openvswitch firewall

Bug #1789878 reported by Thomas Morin
16
This bug affects 2 people
Affects Status Importance Assigned to Milestone
BaGPipe
Fix Released
High
Thomas Morin
networking-bgpvpn
Fix Released
High
Thomas Morin
neutron
Fix Released
High
Nguyen Phuong An

Bug Description

This issue impacts current master, stable/rocky and stable/queens.

The first symptom is that we have seen failures of many tests from legacy-tempest-dsvm-networking-bgpvpn-bagpipe since the merge of [1] in neutron code (August 14th).

Background:

networking-bagpipe code for BGPVPN has a "router fallback" mechanism: in cases where a network is at the same time connected to a Router and associated to a BGPVPN, the traffic sent by a VM to its gateway is redirected to br-mpls to attempt BGPVPN route matching, before eventually being sent, as a fallback, to the neutron netns router if it did no VPN route was matched in br-mpls.

For this mechanism to work, a rule is in place in table 91 to override the NORMAL action (which would result in flood/learn) for the traffic destinated to the gateway MAC address, with a higher priority rule that sends the traffic to br-tun instead (br-tun is where the redirection to br-mpls takes place):

 cookie=0x8b0cf47ac991c941, duration=5371.870s, table=91, n_packets=217, n_bytes=21266, priority=2,reg6=0x18,dl_dst=fa:16:3e:c5:89:72 actions=mod_vlan_vid:24,output:"patch-tun"
 cookie=0x89f8a81c314f2696, duration=71265.896s, table=91, n_packets=338, n_bytes=27094, priority=1 actions=NORMAL

(above, fa:16:3e:c5:89:72 is the gateway MAC address for the network with vlan_id 24)

Analysis of the issue:

Change [1] modified some rules that were resubmiting to table 91, to instaead use a NORMAL action, resulting in only the first packets (from a conntrack standpoint) to reach table 91.

This prevents the redirection of traffic to br-tun,br-mpls.

The tricky thing is that the issue does not always occurs: when there is no entry in the MAC leaning table (ovs-appctl fdb/show br-int) for the gateway MAC, the traffic is flooded and eventually reaches br-tun,br-mpls . This explains why some tests, but not all tests, fail.
(not also that the tests where no Router is used in the destination network do not seem to fail.)

[1] https://review.openstack.org/#/q/Ib6ced838a7ec6d5c459a8475318556001c31bdf

Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

I think that the following fix would work: in the places where resubmit(,91) was replaced by NORMAL, we could do a resubmit(,99) (table 99 would be a new table). In this table 99, we would put the NORMAL action.

Then networking-bagpipe BGPVPN "router fallback" code would put its override rule in table 99 instead of table 91.

Nguyen Phuong An (annp)
Changed in neutron:
assignee: nobody → Nguyen Phuong An (annp)
description: updated
tags: added: ovs-fw
Changed in neutron:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

Thanks for assigning yourself annp :)

Do you think my suggested solution can work good enough to not regress on bug 1782576 ?
(I can't tell, because I'm not sure what the performance question was in that context)

Don't hesitate to ping me on IRC to discuss the solution.

Revision history for this message
Nguyen Phuong An (annp) wrote :

Hi Thomas,

> Do you think my suggested solution can work good enough to not regress on bug 1782576 ?

Yes, I think your suggested solution will work.

> (I can't tell, because I'm not sure what the performance question was in that context):

For security group log, we only want to log first packet of each connection session. So we have changed that to ovsfw only forward first packet to table=91. Then sg log will add flow log with action=NORMAL,CONTROLLER.

Changed in bgpvpn:
status: New → Confirmed
importance: Undecided → High
Changed in neutron:
assignee: Nguyen Phuong An (annp) → Thomas Morin (tmmorin-orange)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/598593

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-bgpvpn (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/599310

Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

Related fix proposed to networking-bagpipe (branch: master)
Review: https://review.openstack.org/#/c/598677/

Changed in bgpvpn:
assignee: nobody → Thomas Morin (tmmorin-orange)
Changed in networking-bagpipe:
assignee: nobody → Thomas Morin (tmmorin-orange)
Changed in networking-bagpipe:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.openstack.org/599321

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-bgpvpn (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/599431

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-bgpvpn (master)

Reviewed: https://review.openstack.org/599310
Committed: https://git.openstack.org/cgit/openstack/networking-bgpvpn/commit/?id=4897b5c7d0d14a84b9c44aba3fa8d88d1f2d811c
Submitter: Zuul
Branch: master

commit 4897b5c7d0d14a84b9c44aba3fa8d88d1f2d811c
Author: Thomas Morin <email address hidden>
Date: Mon Sep 3 10:08:52 2018 +0200

    tempest: temporarily disable some tests until bug 1789878 is fixed

    This change temporarily disable some tempests tests until
    bug 1789878 is fixed.

    It also puts an upper constraints to our networking-odl dependency,
    which now draws networking-odl 13+ which requires ceilometer
    with a "-e git..." that does not play nice with "pip install -c ...".

    Change-Id: I67e8834c09f475b997edf9bf095fa1ac6595e4e0
    Related-Bug: 1789878

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-bgpvpn (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/599528

Changed in neutron:
assignee: Thomas Morin (tmmorin-orange) → Nguyen Phuong An (annp)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/599622

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-bgpvpn (stable/rocky)

Reviewed: https://review.openstack.org/599431
Committed: https://git.openstack.org/cgit/openstack/networking-bgpvpn/commit/?id=c24cf38ab81876d504dd719da4ddaba55351bda2
Submitter: Zuul
Branch: stable/rocky

commit c24cf38ab81876d504dd719da4ddaba55351bda2
Author: Thomas Morin <email address hidden>
Date: Mon Sep 3 10:08:52 2018 +0200

    tempest: temporarily disable some tests until bug 1789878 is fixed

    This change temporarily disable some tempests tests until
    bug 1789878 is fixed.

    It also puts an upper constraints to our networking-odl dependency,
    which now draws networking-odl 13+ which requires ceilometer
    with a "-e git..." that does not play nice with "pip install -c ...".

    Change-Id: I67e8834c09f475b997edf9bf095fa1ac6595e4e0
    Related-Bug: 1789878
    (cherry-picked from 4897b5c7d0d14a84b9c44aba3fa8d88d1f2d811c)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/598593
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=9feb5db61c68c7b95846e80449505f4f1617dd2a
Submitter: Zuul
Branch: master

commit 9feb5db61c68c7b95846e80449505f4f1617dd2a
Author: Thomas Morin <email address hidden>
Date: Fri Aug 31 12:34:39 2018 +0200

    ovs fw: apply the NORMAL action on egress traffic in a single table

    This change is a follow-up to Ib6ced838a7ec6d5c459a8475318556001c31bdf,
    reintroducing a single place for applying the NORMAL action to
    egress traffic, which is necessary to fix a regression introduced
    by Ib6ced838a7ec6d5c459a8475318556001c31bdf.

    Change-Id: I60d299275effd9ef35c8007773d3c9fcabfa50fa
    Partial-Bug: 1789878

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-bagpipe (master)

Reviewed: https://review.openstack.org/598677
Committed: https://git.openstack.org/cgit/openstack/networking-bagpipe/commit/?id=3867733d640fff4d70288ce845173c86b2b0b8af
Submitter: Zuul
Branch: master

commit 3867733d640fff4d70288ce845173c86b2b0b8af
Author: Thomas Morin <email address hidden>
Date: Fri Aug 31 14:11:12 2018 +0200

    bgpvpn: override NORMAL action in ACCEPTED_EGRESS_TRAFFIC_NORMAL_TABLE

    Bug 1789878 describes why the current implementation of traffic
    redirection for router fallback has been defeated by change
    Ib6ced838a7ec6d5c459a8475318556001c31bdf which was brought in the neutron
    ovs firewalling code.

    This change moves the place where we put a higher priority
    rule to override the NORMAL action to the ACCEPTED_EGRESS_TRAFFIC_NORMAL_TABLE,
    which is added by I60d299275effd9ef35c8007773d3c9fcabfa50fa to
    re-introduce a single table where NORMAL happens for egress traffic.

    To allow this change to pass the lower-constraints requirement
    check job is temporarily set to non-voting, because this job
    does not seem to be able to take into account the fact that the
    master branch needs neutron master.

    Change-Id: I8923ca9583b3337556aaf98b4f25732ea6b7bb1c
    Depends-On: https://review.openstack.org/598593
    Closes-Bug: 1789878

Changed in networking-bagpipe:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/600362

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-bagpipe (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/600370

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to networking-bagpipe (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/600377

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-bgpvpn (master)

Reviewed: https://review.openstack.org/599321
Committed: https://git.openstack.org/cgit/openstack/networking-bgpvpn/commit/?id=aa96e0e1a9316a24976db03a3897b7f7e123a1c3
Submitter: Zuul
Branch: master

commit aa96e0e1a9316a24976db03a3897b7f7e123a1c3
Author: Thomas Morin <email address hidden>
Date: Mon Sep 3 10:46:19 2018 +0200

    tempest: reenable tests now that bug 1789878 is fixed

    Change-Id: Ibd5389d333e46e3f30dd29022932d78c89c85410
    Depends-On: https://review.openstack.org/598677
    Related-Bug: 1789878

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-bgpvpn (stable/queens)

Reviewed: https://review.openstack.org/599528
Committed: https://git.openstack.org/cgit/openstack/networking-bgpvpn/commit/?id=036b177ac0a2712207c9b2dccbb7df015ebaaef7
Submitter: Zuul
Branch: stable/queens

commit 036b177ac0a2712207c9b2dccbb7df015ebaaef7
Author: Thomas Morin <email address hidden>
Date: Mon Sep 3 10:08:52 2018 +0200

    tempest: temporarily disable some tests until bug 1789878 is fixed

    This change temporarily disable some tempests tests until
    bug 1789878 is fixed.

    Note well: this is a cherry-pick with modifications.
    On stable/queens, bug 1789878 prevents all scenarios
    tests from succeeding, or at least reliably so. This
    is why this change skips all scenario tests.

    Change-Id: I67e8834c09f475b997edf9bf095fa1ac6595e4e0
    Related-Bug: 1789878
    (cherry-picked from 4897b5c7d0d14a84b9c44aba3fa8d88d1f2d811c)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/599622
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=116e73ba9b9e86d26e4f10be6cfd436b46390143
Submitter: Zuul
Branch: stable/rocky

commit 116e73ba9b9e86d26e4f10be6cfd436b46390143
Author: Thomas Morin <email address hidden>
Date: Fri Aug 31 12:34:39 2018 +0200

    ovs fw: apply the NORMAL action on egress traffic in a single table

    This change is a follow-up to Ib6ced838a7ec6d5c459a8475318556001c31bdf,
    reintroducing a single place for applying the NORMAL action to
    egress traffic, which is necessary to fix a regression introduced
    by Ib6ced838a7ec6d5c459a8475318556001c31bdf.

    Change-Id: I60d299275effd9ef35c8007773d3c9fcabfa50fa
    Partial-Bug: 1789878

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/600362
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=02e9a3c4513b04d9f2ef9f1fb8de44c0344bb6f2
Submitter: Zuul
Branch: stable/queens

commit 02e9a3c4513b04d9f2ef9f1fb8de44c0344bb6f2
Author: Thomas Morin <email address hidden>
Date: Fri Aug 31 12:34:39 2018 +0200

    ovs fw: apply the NORMAL action on egress traffic in a single table

    This change is a follow-up to Ib6ced838a7ec6d5c459a8475318556001c31bdf,
    reintroducing a single place for applying the NORMAL action to
    egress traffic, which is necessary to fix a regression introduced
    by Ib6ced838a7ec6d5c459a8475318556001c31bdf.

    Change-Id: I60d299275effd9ef35c8007773d3c9fcabfa50fa
    Partial-Bug: 1789878
    (cherry picked from commit 9feb5db61c68c7b95846e80449505f4f1617dd2a)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-bagpipe (stable/queens)

Reviewed: https://review.openstack.org/600377
Committed: https://git.openstack.org/cgit/openstack/networking-bagpipe/commit/?id=e045b524224946fba69669a0be5931dcf548e447
Submitter: Zuul
Branch: stable/queens

commit e045b524224946fba69669a0be5931dcf548e447
Author: Thomas Morin <email address hidden>
Date: Fri Aug 31 14:11:12 2018 +0200

    bgpvpn: override NORMAL action in ACCEPTED_EGRESS_TRAFFIC_NORMAL_TABLE

    Bug 1789878 describes why the current implementation of traffic
    redirection for router fallback has been defeated by change
    Ib6ced838a7ec6d5c459a8475318556001c31bdf which was brought in the neutron
    ovs firewalling code.

    This change moves the place where we put a higher priority
    rule to override the NORMAL action to the ACCEPTED_EGRESS_TRAFFIC_NORMAL_TABLE,
    which is added by I60d299275effd9ef35c8007773d3c9fcabfa50fa to
    re-introduce a single table where NORMAL happens for egress traffic.

    To allow this change to pass the lower-constraints requirement
    check job is temporarily set to non-voting, because this job
    does not seem to be able to take into account the fact that the
    master branch needs neutron master.

    Change-Id: I8923ca9583b3337556aaf98b4f25732ea6b7bb1c
    Depends-On: https://review.openstack.org/600362
    Closes-Bug: 1789878
    (cherry-picked from 3867733d640fff4d70288ce845173c86b2b0b8af)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to networking-bgpvpn (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/601230

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to networking-bagpipe (stable/rocky)

Reviewed: https://review.openstack.org/600370
Committed: https://git.openstack.org/cgit/openstack/networking-bagpipe/commit/?id=6150dccbc763a8d5b5ff22c2bbbcec8527e7213d
Submitter: Zuul
Branch: stable/rocky

commit 6150dccbc763a8d5b5ff22c2bbbcec8527e7213d
Author: Thomas Morin <email address hidden>
Date: Fri Aug 31 14:11:12 2018 +0200

    bgpvpn: override NORMAL action in ACCEPTED_EGRESS_TRAFFIC_NORMAL_TABLE

    Bug 1789878 describes why the current implementation of traffic
    redirection for router fallback has been defeated by change
    Ib6ced838a7ec6d5c459a8475318556001c31bdf which was brought in the neutron
    ovs firewalling code.

    This change moves the place where we put a higher priority
    rule to override the NORMAL action to the ACCEPTED_EGRESS_TRAFFIC_NORMAL_TABLE,
    which is added by I60d299275effd9ef35c8007773d3c9fcabfa50fa to
    re-introduce a single table where NORMAL happens for egress traffic.

    This change anticipates the release of neutron 13.0.1 which
    will include https://review.openstack.org/599622 . A simple
    recheck will be sufficient to progress this change once the
    actual release has been made.

    Change-Id: I8923ca9583b3337556aaf98b4f25732ea6b7bb1c
    Depends-On: https://review.openstack.org/599622
    Closes-Bug: 1789878
    (cherry picked from commit 3867733d640fff4d70288ce845173c86b2b0b8af)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to networking-bgpvpn (stable/rocky)

Reviewed: https://review.openstack.org/601230
Committed: https://git.openstack.org/cgit/openstack/networking-bgpvpn/commit/?id=d406efdf423af7f0dc72b11553e9255b10b99759
Submitter: Zuul
Branch: stable/rocky

commit d406efdf423af7f0dc72b11553e9255b10b99759
Author: Thomas Morin <email address hidden>
Date: Mon Sep 3 10:46:19 2018 +0200

    tempest: reenable tests now that bug 1789878 is fixed

    Change-Id: Ibd5389d333e46e3f30dd29022932d78c89c85410
    Depends-On: https://review.openstack.org/600370
    Related-Bug: 1789878
    (cherry picked from commit aa96e0e1a9316a24976db03a3897b7f7e123a1c3)

tags: added: neutron-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-bagpipe 9.0.1

This issue was fixed in the openstack/networking-bagpipe 9.0.1 release.

Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hi Thomas. Do we need to backport aa96e0e1a9316a24976db03a3897b7f7e123a1c3 (reenable tempest tests) to Queens?

Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

Yes, good catch!

Revision history for this message
Thomas Morin (tmmorin-orange) wrote :

ah, in fact, no for stable/queens a plain revert was sufficient and done in https://review.openstack.org/#/c/601229/

thanks

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-bagpipe 8.0.1

This issue was fixed in the openstack/networking-bagpipe 8.0.1 release.

Changed in neutron:
status: In Progress → Fix Released
Changed in bgpvpn:
status: Confirmed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/networking-bagpipe 10.0.0.0b1

This issue was fixed in the openstack/networking-bagpipe 10.0.0.0b1 development milestone.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.