ip6tables rules for PD subnets not fully recreated at l3-agent restart

Bug #1789403 reported by Eigil Obrestad
This bug affects 2 people
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Brian Haley
Milestone: (none listed)

Bug Description

Currently, ip6tables in the qrouter namespace has the following rule, which causes unmarked packets to be dropped:

-A neutron-l3-agent-scope -o qr-f4eceee5-a4 -m mark ! --mark 0x4000000/0xffff0000 -j DROP
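
To see why this rule drops unmarked traffic, the mark arithmetic can be worked through by hand. The sketch below is only an illustration, assuming the usual netfilter semantics: `-m mark --mark value/mask` matches when `(mark & mask) == value`, and `-j MARK --set-xmark value/mask` clears the mask bits and XORs in the value. The function names are hypothetical and are not part of neutron or iptables.

```python
# Value and mask taken from the rule quoted in this report.
VALUE = 0x4000000
MASK = 0xFFFF0000


def mark_matches(packet_mark: int, value: int = VALUE, mask: int = MASK) -> bool:
    """Mimic '-m mark --mark value/mask' on a 32-bit packet mark."""
    return (packet_mark & mask) == value


def is_dropped(packet_mark: int) -> bool:
    """Mimic '-m mark ! --mark 0x4000000/0xffff0000 -j DROP'."""
    return not mark_matches(packet_mark)


def set_xmark(packet_mark: int, value: int = VALUE, mask: int = MASK) -> int:
    """Mimic '-j MARK --set-xmark value/mask': zero the mask bits, XOR in value."""
    return (packet_mark & ~mask & 0xFFFFFFFF) ^ value


if __name__ == "__main__":
    unmarked = 0x0
    print("unmarked packet dropped:", is_dropped(unmarked))
    print("marked packet dropped:", is_dropped(set_xmark(unmarked)))
```

A packet arriving with mark 0 fails the match and hits DROP, while the same packet passes once a MARK rule has set the upper 16 bits.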

In a related bug (#1570122), prefix-delegated subnets did not get a rule setting this mark on traffic arriving on the gateway port, so that traffic was dropped. This now seems to work correctly when a user creates a subnet with IPv6 from PD. The problem arises when the l3-agent restarts, or when the router moves to another l3-agent, because the rule marking the traffic is not recreated in these cases. The symptoms are the same as in bug #1570122.

Adding the rule manually makes traffic flow again, for instance with the line:
$ ip6tables -t mangle -A neutron-l3-agent-scope -i qg-28f7e259-d2 -j MARK --set-xmark 0x4000000/0xffff0000
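
Checking for the missing rule could also be scripted. The following is a rough sketch, assuming the mangle table has been dumped as text (for example with `ip6tables-save -t mangle` inside the router namespace) and is passed in as a string. `has_gateway_mark_rule` is a hypothetical helper name, and the sample text is hand-written from the rule above, not captured output.

```python
from typing import List, Optional


def _option_value(tokens: List[str], flag: str) -> Optional[str]:
    """Return the token that follows `flag`, or None if the flag is absent."""
    if flag in tokens:
        idx = tokens.index(flag)
        if idx + 1 < len(tokens):
            return tokens[idx + 1]
    return None


def has_gateway_mark_rule(rules_text: str, gw_device: str,
                          chain: str = "neutron-l3-agent-scope") -> bool:
    """True if `chain` has a MARK rule for traffic entering on `gw_device`."""
    for line in rules_text.splitlines():
        tokens = line.split()
        if tokens[:2] != ["-A", chain]:
            continue  # not an append to the chain we care about
        if (_option_value(tokens, "-i") == gw_device
                and _option_value(tokens, "-j") == "MARK"):
            return True
    return False


# Hand-written sample built from the rule quoted in this report.
SAMPLE = """\
*mangle
-A neutron-l3-agent-scope -i qg-28f7e259-d2 -j MARK --set-xmark 0x4000000/0xffff0000
COMMIT
"""

if __name__ == "__main__":
    print(has_gateway_mark_rule(SAMPLE, "qg-28f7e259-d2"))
```

A check like this could run after an agent restart or router failover to flag routers whose gateway port lacks the marking rule.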

We are currently running the Queens release:
 - neutron-l3-agent 2:12.0.2-0ubuntu1~cloud0

These bugs are a major obstacle for IPv6 in our clouds, as we cannot deliver reliable transport of IPv6 packets when this rule suddenly goes missing.

tags: added: l3-ipam-dhcp
Changed in neutron:
importance: Undecided → High
status: New → Confirmed
Revision history for this message
Brian Haley (brian-haley) wrote :

Can you test a patch if I post one? I think there's one line of code missing.

https://review.openstack.org/597710

Changed in neutron:
assignee: nobody → Brian Haley (brian-haley)
Revision history for this message
Eigil Obrestad (obrestad) wrote :

I manually patched it into our test installation, and it seems to do the trick. When we turn off the network node currently hosting the router, the router is recreated on another network node, and a random VM behind this router starts responding to pings after a little while.

Just for reference, this is the router_info.py which we currently have there with the patch at line 565:
http://paste.ubuntu.com/p/vtfzY8wgJw/

The rest of the file is from version "12.0.2-0ubuntu1~cloud0"

Revision history for this message
Brian Haley (brian-haley) wrote :

I think the patch is going to need an update. When I do that, can you retry with the changes to the l3-agent?

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
Eigil Obrestad (obrestad) wrote :

Sure. Just notify me when you want us to test the patch.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/597710
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=d19dcf1ef2f8e4b837e57dfef4ed1580c5d1e7b7
Submitter: Zuul
Branch: master

commit d19dcf1ef2f8e4b837e57dfef4ed1580c5d1e7b7
Author: Brian Haley <email address hidden>
Date: Wed Aug 29 17:06:59 2018 -0400

    Fix IPv6 prefix delegation issue on agent restart

    On l3-agent restart, prefix delegation subnets weren't always
    inserted into the local router_info cache, leading to a missing
    ip6tables rule. Add it when the internal network is configured
    if the prefix has already been assigned.

    Change-Id: Ic045e2763ba2772bcaf037591821501e84e40878
    Closes-bug: #1789403
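
The behaviour described in this commit message can be pictured with a toy model. This is only a sketch under stated assumptions and is not neutron's code: `ToyRouterInfo`, `pd_subnets`, `internal_network_added` and the `add_on_configure` switch are all hypothetical names, and a prefix of `None` stands in for "not yet assigned".

```python
from typing import Dict, Optional


class ToyRouterInfo:
    """Toy stand-in for an agent-side router cache (not neutron's RouterInfo)."""

    def __init__(self, add_on_configure: bool) -> None:
        # True models the fixed behaviour, False the behaviour before the fix.
        self.add_on_configure = add_on_configure
        # In-memory cache: subnet id -> delegated prefix. A fresh object
        # models an agent that has just restarted with an empty cache.
        self.pd_subnets: Dict[str, str] = {}

    def internal_network_added(self, subnet_id: str,
                               prefix: Optional[str]) -> None:
        """Called when an internal port is (re)configured on the router."""
        if self.add_on_configure and prefix is not None:
            # The prefix was assigned before the restart, so cache it now.
            self.pd_subnets[subnet_id] = prefix

    def needs_mark_rule(self) -> bool:
        """The marking rule is only generated for subnets present in the cache."""
        return bool(self.pd_subnets)


if __name__ == "__main__":
    for fixed in (False, True):
        ri = ToyRouterInfo(add_on_configure=fixed)
        ri.internal_network_added("subnet-1", "2001:db8:1::/64")
        print("fixed" if fixed else "unfixed", "->", ri.needs_mark_rule())
```

In the unfixed variant the restarted agent never learns about the already-delegated prefix, so no marking rule is produced. In the fixed variant the prefix is cached while the internal network is configured.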

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
Lars Erik Pedersen (pedersen-larserik) wrote :

Can this be backported to stable/queens and released in UCA?

Revision history for this message
Brian Haley (brian-haley) wrote :

Yes, it should be backported to stable/rocky and stable/queens, and even stable/pike if it applies.

tags: added: rocky-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/604118

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/604119

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/604118
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b2cd92241fab17fe9dcf418f51aee959c9842a51
Submitter: Zuul
Branch: stable/rocky

commit b2cd92241fab17fe9dcf418f51aee959c9842a51
Author: Brian Haley <email address hidden>
Date: Wed Aug 29 17:06:59 2018 -0400

    Fix IPv6 prefix delegation issue on agent restart

    On l3-agent restart, prefix delegation subnets weren't always
    inserted into the local router_info cache, leading to a missing
    ip6tables rule. Add it when the internal network is configured
    if the prefix has already been assigned.

    Change-Id: Ic045e2763ba2772bcaf037591821501e84e40878
    Closes-bug: #1789403
    (cherry picked from commit d19dcf1ef2f8e4b837e57dfef4ed1580c5d1e7b7)

tags: added: in-stable-rocky
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/604119
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=1e7230b7b9746779b6d6633e4b49ea5b60a38de4
Submitter: Zuul
Branch: stable/queens

commit 1e7230b7b9746779b6d6633e4b49ea5b60a38de4
Author: Brian Haley <email address hidden>
Date: Wed Aug 29 17:06:59 2018 -0400

    Fix IPv6 prefix delegation issue on agent restart

    On l3-agent restart, prefix delegation subnets weren't always
    inserted into the local router_info cache, leading to a missing
    ip6tables rule. Add it when the internal network is configured
    if the prefix has already been assigned.

    Change-Id: Ic045e2763ba2772bcaf037591821501e84e40878
    Closes-bug: #1789403
    (cherry picked from commit d19dcf1ef2f8e4b837e57dfef4ed1580c5d1e7b7)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.2

This issue was fixed in the openstack/neutron 13.0.2 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.5

This issue was fixed in the openstack/neutron 12.0.5 release.

Revision history for this message
Lars Erik Pedersen (pedersen-larserik) wrote :

Can anyone "guess" when this will be released in UCA? UCA doesn't even have 12.0.4...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.0.0b1

This issue was fixed in the openstack/neutron 14.0.0.0b1 development milestone.
