Netlink solution not enough mature for Ocata

Bug #1664294 reported by Cedric Brandily
16
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
Low
Cuong Nguyen

Bug Description

neutron-fwaas replaced conntrack-tools by netlink + pyroute2[1] in order to delete faster conntrack entries when the rules associated to a firewall (through a policy) are updated.

This change highlights many warnings:
1) the code readability should be improved:
** it embeds dead code/unused constants[2]
** it uses non-meaningful variable names[3] ("h", "ct")
** it uses hardcoded values instead of existing constants[3]
** it miss docstrings/comments for tricky parts[2][3]

2) the change tricky part[3] is untested

3) the code seems unhealthy,
** nothing ensures that file descriptors/objects are correctly closed/destroyed if an exception is raised[3]
** pyroute2.netns.setns use[3] seems not (green)thread-safe. It seems a (green)thread can move the process in $netns2 when a concurrent greenthread is "working" in $netns1. It can be solved by a lock.

4) the change should highlight if and why nfct and libc C libraries used in the change are eventlet-friendly.

5) the change uses pyroute2.netns.setns in [3] which moves the current process to a specific netns
** it seems that when we enter the netns, the process seems to never go back in the root netns[9]. Is it an acceptable limitation?
** When the process is in a netns, every network action is done in the netns including rpc calls, remote sylog which won't work because they should be done in the root namespace.

1) => we can live on the short term with it (easy to address)
2) => we can address it (costly)
3) => we can address it (easy to solve)
4) => i don't know how to get an answer to this question
5) => it seems to imply that we have to dedicate specific(s) process(es) moving between netns to list/kill/flush_entries like as it's done in oslo.rootwrap daemon mode (very costly by required)

These warnings are based on my understanding of the netlink feature, perhaps i miss something.

[1] https://review.openstack.org/389654
[2] https://github.com/openstack/neutron-fwaas/blob/master/neutron_fwaas/common/netlink_constants.py
[3] https://github.com/openstack/neutron-fwaas/blob/master/neutron_fwaas/privileged/netlink_lib.py
[9] Code used to test pyroute2:
import os, pyroute2, pyroute2.netns

root_ifs = {x.get_attr('IFLA_IFNAME') for x in pyroute2.IPRoute().get_links()}
fd = pyroute2.netns.setns('taz')
ns_ifs = {x.get_attr('IFLA_IFNAME') for x in pyroute2.IPRoute().get_links()}
os.close(fd)
last_ifs = {x.get_attr('IFLA_IFNAME') for x in pyroute2.IPRoute().get_links()}

if root_ifs == ns_ifs:
    print 'Add a new interface in root netns with ip tuntap a mode tap'
elif last_ifs == root_ifs:
    print 'OK: we ended in root netns'
elif root_ifs != last_ifs == ns_ifs:
    print 'KO: we ended in taz netns'
else:
    print 'Something is wrong'

Revision history for this message
Ha Van Tu (tuhv) wrote :

5) the change uses pyroute2.netns.setns in [3] which moves the current process to a specific netns
** it seems that when we enter the netns, the process seems to never go back in the root netns[9]. Is it an acceptable limitation?
** When the process is in a netns, every network action is done in the netns including rpc calls, remote sylog which won't work because they should be done in the root namespace.

Actually the netns we set is only on the oslo.rootwrap process mode. So it doest not affect to rpc calls, remote syslog.

Revision history for this message
Ha Van Tu (tuhv) wrote :

@ Cedric Brandily,
Thank you for pointing out the problem in the Netlink feature.
My planning to improve the Netlink feature as good as possible, and also make the conntrack configurable (we can switch between conntrack-tools and Netlink), make conntrack-tools as default.
In my opinion, keeping the netlink_lib module is necessary because it can be a good reference of improving conntrack performance

Ha Van Tu (tuhv)
Changed in neutron:
status: New → Confirmed
Revision history for this message
Kevin Benton (kevinbenton) wrote :

@Cedric,

As Ha Van Tu pointed out, all of this code is executed in the privileged privsep process, which doesn't make use of eventlet. How much does that change your analysis?

Revision history for this message
Cedric Brandily (cbrandily) wrote :

@tuhv,
>> Actually the netns we set is only on the oslo.rootwrap process mode.
> So it doest not affect to rpc calls, remote syslog.

iiuc, logging is also done inside the privileged privsep process through a LOG.exception, which could be a trouble because of [1] but perhaps the logging handlers in the privileged privsep process are specific/constrained?

@Kevin,

4) solved, should be documented
5) seems still to be a concern as it moves the privileged privsep process from a netns to an other so we cannot ensure in which netns a command will be runned

[1] https://docs.python.org/3/howto/logging-cookbook.html#logging-to-a-single-file-from-multiple-processes

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-fwaas (master)

Fix proposed to branch: master
Review: https://review.openstack.org/433598

Changed in neutron:
assignee: Cedric Brandily (cbrandily) → Ha Van Tu (tuhv)
status: Confirmed → In Progress
Revision history for this message
Kevin Benton (kevinbenton) wrote :

@tuhv,

Can we add a context manager that does the following?

On entrance, set the namespace to whatever it needs to be.
On exit, set the namespace back to root with pyroute2.netns.setns('/proc/1/ns/net')
If it fails to set the namespace back to root, completely kill the process with an ugly error.

Then anything that needs to be performed in a namespace will be executed inside a context manager. That will let us easily reason about when a process is in a particular namespace.

Revision history for this message
Cedric Brandily (cbrandily) wrote :

@Kevin, @tuhv, i am working on the netns context manager

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/433633

Changed in neutron:
assignee: Ha Van Tu (tuhv) → Cedric Brandily (cbrandily)
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Not sure if there's time to address this safely within the RC window.

Changed in neutron:
milestone: none → pike-1
tags: added: ocata-backport-potential
removed: ocata-rc-potential
Changed in neutron:
importance: Undecided → Medium
importance: Medium → Low
Changed in neutron:
assignee: Cedric Brandily (cbrandily) → Ha Van Tu (tuhv)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/434136

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/434181

Changed in neutron:
assignee: Ha Van Tu (tuhv) → Cedric Brandily (cbrandily)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/434504

Changed in neutron:
assignee: Cedric Brandily (cbrandily) → Ha Van Tu (tuhv)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/434593

Changed in neutron:
assignee: Ha Van Tu (tuhv) → Cedric Brandily (cbrandily)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-fwaas (master)

Reviewed: https://review.openstack.org/434504
Committed: https://git.openstack.org/cgit/openstack/neutron-fwaas/commit/?id=01ae5e7ec0df29bb07a27ae2f0aad1f5b563cf9d
Submitter: Jenkins
Branch: master

commit 01ae5e7ec0df29bb07a27ae2f0aad1f5b563cf9d
Author: Cedric Brandily <email address hidden>
Date: Wed Feb 15 20:09:52 2017 +0100

    Revert "Netlink solution to improve FWaaS performance"

    This reverts commit 6d5afd1a6fb2c88ae5b8f2e9f3067bd604943a37.

    Because Netlink solution moves privileged privsep process into netns
    without moving it back to root netns which could break other methods
    using privileged process in neutron.

    A follow-up change will rethink Netlink solution to correct its netns
    management and improves code documentation/coverage/safety.

    This change doesn't revert requirements.txt. It will allow to backport
    the revert of this change (ie: allow to backport the rethinked Netlink
    solution).

    Partial-Bug: #1664294
    Change-Id: I0c2d3b90a5799ce8d3baf4d20e95b352d12dbdc7

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-fwaas (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/435150

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-fwaas (stable/ocata)

Reviewed: https://review.openstack.org/435150
Committed: https://git.openstack.org/cgit/openstack/neutron-fwaas/commit/?id=0ee81f541fa0445c64e104730ddee97128cde885
Submitter: Jenkins
Branch: stable/ocata

commit 0ee81f541fa0445c64e104730ddee97128cde885
Author: Cedric Brandily <email address hidden>
Date: Wed Feb 15 20:09:52 2017 +0100

    Revert "Netlink solution to improve FWaaS performance"

    This reverts commit 6d5afd1a6fb2c88ae5b8f2e9f3067bd604943a37.

    Because Netlink solution moves privileged privsep process into netns
    without moving it back to root netns which could break other methods
    using privileged process in neutron.

    A follow-up change will rethink Netlink solution to correct its netns
    management and improves code documentation/coverage/safety.

    This change doesn't revert requirements.txt. It will allow to backport
    the revert of this change (ie: allow to backport the rethinked Netlink
    solution).

    Partial-Bug: #1664294
    Change-Id: I0c2d3b90a5799ce8d3baf4d20e95b352d12dbdc7
    (cherry picked from commit 01ae5e7ec0df29bb07a27ae2f0aad1f5b563cf9d)

tags: added: in-stable-ocata
Changed in neutron:
assignee: Cedric Brandily (cbrandily) → Ha Van Tu (tuhv)
Changed in neutron:
assignee: Ha Van Tu (tuhv) → Cedric Brandily (cbrandily)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-fwaas (master)

Change abandoned by Ha Van Tu (<email address hidden>) on branch: master
Review: https://review.openstack.org/432183
Reason: This patch is included in the Netlink library patch: https://review.openstack.org/#/c/437311/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Ha Van Tu (<email address hidden>) on branch: master
Review: https://review.openstack.org/434593
Reason: This patch is moved to https://review.openstack.org/438445

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Ha Van Tu (<email address hidden>) on branch: master
Review: https://review.openstack.org/434181
Reason: This patch is moved to https://review.openstack.org/#/c/433598/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Ha Van Tu (<email address hidden>) on branch: master
Review: https://review.openstack.org/434136
Reason: This patch is moved to https://review.openstack.org/#/c/438445/

Changed in neutron:
assignee: Cedric Brandily (cbrandily) → Ha Van Tu (tuhv)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-fwaas (master)

Reviewed: https://review.openstack.org/433633
Committed: https://git.openstack.org/cgit/openstack/neutron-fwaas/commit/?id=094d2d5f01914fbebd4e16cc2c41390a72b5874d
Submitter: Jenkins
Branch: master

commit 094d2d5f01914fbebd4e16cc2c41390a72b5874d
Author: Cedric Brandily <email address hidden>
Date: Tue Feb 14 13:36:02 2017 +0100

    Define in_namespace contextmanager

    This change defines the contextmanager in_namespace[1]. It moves current
    process in a network namespace (in __enter__) and moves it back in its
    original network namespace (in _exit__) or kills current process if
    __exit__ fails in order to ensure following commands will be executed
    in the correct network namespace.

    This change is an enabler to the Netlink solution to clean conntrack
    entries.

    [1] neutron_fwaas.privileged.utils

    Partial-Bug: #1664294
    Change-Id: I587257db8e1fce56a95f0db3dc4e0752751fdd81

Changed in neutron:
assignee: Ha Van Tu (tuhv) → Cuong Nguyen (cuongnv)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/433598
Committed: https://git.openstack.org/cgit/openstack/neutron-fwaas/commit/?id=8f6a1b90af7fdf3b865b700c09fbb415f4647812
Submitter: Jenkins
Branch: master

commit 8f6a1b90af7fdf3b865b700c09fbb415f4647812
Author: Cuong Nguyen <email address hidden>
Date: Tue Feb 14 19:21:24 2017 +0700

    Enable to configure conntrack driver

    This patch enables to configure conntrack driver.
    Initially, "conntrack-tools" is being used to manage connection,
    however, it's costly and down performance[1]. The alternative can be
    found here[2] with need to improve reliability and stability.

    [1] https://bugs.launchpad.net/neutron/+bug/1630832
    [2] https://review.openstack.org/#/c/389654/

    Partial-Bug: #1664294
    Co-Authored-By: Nguyen Phuong An <email address hidden>
    Change-Id: Id0597f74bef67b85776445e7bc591eb085f55acc

Changed in neutron:
assignee: Cuong Nguyen (cuongnv) → Reedip (reedip-banerjee)
Changed in neutron:
assignee: Reedip (reedip-banerjee) → Cuong Nguyen (cuongnv)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/437311
Committed: https://git.openstack.org/cgit/openstack/neutron-fwaas/commit/?id=b03caed15afca5cd372f8af429731069968c881a
Submitter: Jenkins
Branch: master

commit b03caed15afca5cd372f8af429731069968c881a
Author: Cuong Nguyen <email address hidden>
Date: Thu Feb 23 18:06:29 2017 +0700

    Netlink library to delete conntrack entries

    This patch proposes Netlink library to manage conntrack
    entries. Netlink library will be used by ConntrackNetlink Driver.

    Partial-Bug: #1664294
    Co-Authored-By: Nguyen Phuong An <email address hidden>
    Change-Id: Ied00e1badaac87c017a77f782bcd8f3bab072c46

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/438445
Committed: https://git.openstack.org/cgit/openstack/neutron-fwaas/commit/?id=ec9d37183a085293f72f4c653e5c67e155c15be8
Submitter: Jenkins
Branch: master

commit ec9d37183a085293f72f4c653e5c67e155c15be8
Author: Cuong Nguyen <email address hidden>
Date: Mon Feb 27 19:20:58 2017 +0700

    ConntrackNetlink driver to delete conntrack entries

    This patch proposes ConntrackNetlink driver to delete conntrack
    entries.
    Using Netlink will save about 90 percent of time that used by conntrack-tools.
    For detail information, visit: https://goo.gl/3tm9Fx

    Partial-Bug: #1664294
    Change-Id: Id9a3b7c3f8fedbd91ced1a5b359dbf568cd26653
    Co-Authored-By: Nguyen Phuong An <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-fwaas (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/458435

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/458436

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/458437

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/458438

Changed in neutron:
milestone: pike-1 → pike-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-fwaas (stable/ocata)

Reviewed: https://review.openstack.org/458435
Committed: https://git.openstack.org/cgit/openstack/neutron-fwaas/commit/?id=a08e30539a34f8ec20ab807ea1630d99d7adfb4e
Submitter: Jenkins
Branch: stable/ocata

commit a08e30539a34f8ec20ab807ea1630d99d7adfb4e
Author: Cedric Brandily <email address hidden>
Date: Tue Feb 14 13:36:02 2017 +0100

    Define in_namespace contextmanager

    This change defines the contextmanager in_namespace[1]. It moves current
    process in a network namespace (in __enter__) and moves it back in its
    original network namespace (in _exit__) or kills current process if
    __exit__ fails in order to ensure following commands will be executed
    in the correct network namespace.

    This change is an enabler to the Netlink solution to clean conntrack
    entries.

    [1] neutron_fwaas.privileged.utils

    Partial-Bug: #1664294
    Change-Id: I587257db8e1fce56a95f0db3dc4e0752751fdd81
    (cherry picked from commit 094d2d5f01914fbebd4e16cc2c41390a72b5874d)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/458436
Committed: https://git.openstack.org/cgit/openstack/neutron-fwaas/commit/?id=86e6857be0258bff4d32b302733b1bef854091eb
Submitter: Jenkins
Branch: stable/ocata

commit 86e6857be0258bff4d32b302733b1bef854091eb
Author: Cuong Nguyen <email address hidden>
Date: Tue Feb 14 19:21:24 2017 +0700

    Enable to configure conntrack driver

    This patch enables to configure conntrack driver.
    Initially, "conntrack-tools" is being used to manage connection,
    however, it's costly and down performance[1]. The alternative can be
    found here[2] with need to improve reliability and stability.

    [1] https://bugs.launchpad.net/neutron/+bug/1630832
    [2] https://review.openstack.org/#/c/389654/

    Partial-Bug: #1664294
    Co-Authored-By: Nguyen Phuong An <email address hidden>
    Change-Id: Id0597f74bef67b85776445e7bc591eb085f55acc
    (cherry picked from commit 8f6a1b90af7fdf3b865b700c09fbb415f4647812)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/458437
Committed: https://git.openstack.org/cgit/openstack/neutron-fwaas/commit/?id=a54e42a3a313f2d89c5d6da6d222555160b8b2ad
Submitter: Jenkins
Branch: stable/ocata

commit a54e42a3a313f2d89c5d6da6d222555160b8b2ad
Author: Cuong Nguyen <email address hidden>
Date: Thu Feb 23 18:06:29 2017 +0700

    Netlink library to delete conntrack entries

    This patch proposes Netlink library to manage conntrack
    entries. Netlink library will be used by ConntrackNetlink Driver.

    Partial-Bug: #1664294
    Co-Authored-By: Nguyen Phuong An <email address hidden>
    Change-Id: Ied00e1badaac87c017a77f782bcd8f3bab072c46
    (cherry picked from commit b03caed15afca5cd372f8af429731069968c881a)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/458438
Committed: https://git.openstack.org/cgit/openstack/neutron-fwaas/commit/?id=d80832c2084ec7b70328a603b2bcc4e2776f0123
Submitter: Jenkins
Branch: stable/ocata

commit d80832c2084ec7b70328a603b2bcc4e2776f0123
Author: Cuong Nguyen <email address hidden>
Date: Mon Feb 27 19:20:58 2017 +0700

    ConntrackNetlink driver to delete conntrack entries

    This patch proposes ConntrackNetlink driver to delete conntrack
    entries.
    Using Netlink will save about 90 percent of time that used by conntrack-tools.
    For detail information, visit: https://goo.gl/3tm9Fx

    Partial-Bug: #1664294
    Change-Id: Id9a3b7c3f8fedbd91ced1a5b359dbf568cd26653
    Co-Authored-By: Nguyen Phuong An <email address hidden>
    (cherry picked from commit ec9d37183a085293f72f4c653e5c67e155c15be8)

Revision history for this message
Cao Xuan Hoang (hoangcx) wrote :

I think we can close this bug. Though?

Revision history for this message
Cao Xuan Hoang (hoangcx) wrote :
Revision history for this message
songminglong (songminglong) wrote :

No one noticed that the step "_apply_synchronized()" at neutron/agent/linux/iptables_manager.py was time consuming when there were many firewall rules(>70), especially in function "changes = _generate_path_between_rules(old_rules, new_rules)" and "_generate_chain_diff_iptables_commands()"?
I have test the performance for function "_generate_path_between_rules()", it consumed about 30s when generating param "changes" for command iptable-restore with 130 firewall rules. So I think this is where we need to improve.

Revision history for this message
Cuong Nguyen (cuongnv) wrote :

this ticket is for FWaaS, we track work for neutron sec group here https://bugs.launchpad.net/neutron/+bug/1630832. Pls discuss there, I mark this as Fix Committed.

Changed in neutron:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-fwaas (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/489980

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-fwaas (master)

Reviewed: https://review.openstack.org/489980
Committed: https://git.openstack.org/cgit/openstack/neutron-fwaas/commit/?id=24b25ccfbe541b4699d88a819ecffd4b82b091ef
Submitter: Zuul
Branch: master

commit 24b25ccfbe541b4699d88a819ecffd4b82b091ef
Author: Hunt Xu <email address hidden>
Date: Wed Aug 2 16:56:13 2017 +0800

    Enable to use conntrack driver in fwaas_v2

    The conntrack entries management codes were identical in fwaas and
    fwaas_v2 drivers. However [1] didn't handle the fwaas_v2 driver's part.

    [1] Id0597f74bef67b85776445e7bc591eb085f55acc

    Related-Bug: #1664294
    Change-Id: I4a444fb313bbef2d3a410703ea06d2a0605df772

Changed in neutron:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.