[RFE] FWaaS: Using Netlink instead of conntrack-tools to improve performance

Bug #1630832 reported by Ha Van Tu
20
This bug affects 2 people
Affects Status Importance Assigned to Milestone
neutron
Won't Fix
Medium
Unassigned

Bug Description

Updating firewall with a large number of firewall rules needs improving performance.

When the Firewall is updated, the conntrack entries will be deleted by conntrack-tools ("conntrack -D" commands) with each rule associated with this firewall. The problem is inside a cloud system with a large number of firewall rules applied. Updating so much rules will call a large number of subprocesses to implement the "conntrack -D" commands. That will consume the system resource and it will take a long time to finish updating firewall[1]. The client example to delete rules is in [2].
By using Netlink, we can call the subprocess only one time [3], so as to reduce the system resource and time to update firewall[4].

There should be some points need to be discussed:
- The standard Netlink interface for Python. There are 2 sources: [5] and [6] on github, but I don't know these resources are acceptable or not. If there is not standard one, we may need to write conntrack library for OpenStack from scratch.
- The "conntrack -D" needs *root privilege*. My solution is using oslo_privsep for instead.[7]

[1] For example: With the developer system (Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz and 16GiB memory) and using "conntrack-tools", it take average 429s to finish removing 10.000 rules.
[2] http://paste.openstack.org/show/584602/
[3] http://paste.openstack.org/show/584603/
[4] For example: With the developer system (Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz and 16GiB memory) and using "Netlink", it take average 33s to finish removing 10.000 rules.
[5] https://github.com/ei-grad/python-conntrack
[6] https://github.com/regit/pynetfilter_conntrack
[7] https://review.openstack.org/#/c/389654/

Ha Van Tu (tuhv)
summary: - Using Netlink to improve performance of conntrack-tools
+ [RFE] FWaaS: Using Netlink instead of conntrack-tools to improve
+ performance
Ha Van Tu (tuhv)
description: updated
Revision history for this message
Brian Haley (brian-haley) wrote :

python-conntrack is unmaintained
pynetfilter_conntrack seems more complete just not very active (does that mean it's stable?)

Either way, are there other options?

1) Are we duplicating any conntrack calls?
2) Are we running a lot of commands when a different, one-liner will work?

But feel free to create a patch and send it out for review based on what you wrote above as well.

Ha Van Tu (tuhv)
description: updated
Changed in neutron:
status: New → Confirmed
importance: Undecided → Wishlist
Revision history for this message
Ha Van Tu (tuhv) wrote :

@Brian Haley:
Thanks for discussing, I would like to response as below:
1) Are we duplicating any conntrack calls?
>> The conntrack command in FWaaS filter conntrack entries by "protocol", "ip version", "src_port" and "dst_port". In my test case, it is not duplicated.
2) Are we running a lot of commands when a different, one-liner will work?
>> Running one-liner may affect to other unexpected connection.

I will create the patch for my concept. Thank for your attention.

Revision history for this message
Ha Van Tu (tuhv) wrote :

@Brian Haley:
Thanks for considering, I have uploaded my patch as PoC in [1]. Could you please review it? Would you mind if I add you as reviewer.

[1] https://review.openstack.org/389654

description: updated
Ha Van Tu (tuhv)
description: updated
Revision history for this message
Kevin Benton (kevinbenton) wrote :

I think I'd like to see this implemented and improved upon in FWaaS before we adopt it for our SG implementation for a few reasons.

1. It's significantly more code for us to maintain, even without the functional tests the patch is missing.
2. It depends on privsep, which we haven't actually adopted yet in Neutron.
3. It's not clear to me if this ctype use is too runtime specific for deployer flexibility. https://review.openstack.org/#/c/389654/10/neutron_fwaas/privileged/nl_lib.py@86

Revision history for this message
Ha Van Tu (tuhv) wrote :

Hi Kevin Benton,
I have seen that the privsep has already been adopted in Neutron in [1]. My FWaaS patch [2] with Netlink solution is depended on [3] within Neutron. Could you please take a look at it?

[1] https://review.openstack.org/394741
[2] https://review.openstack.org/389654
[3] https://review.openstack.org/392014

Revision history for this message
shihanzhang (shihanzhang) wrote :

hi kevin, this patch has merged, what do you think adopting it for our SG in Pike?

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Let's bring this for discussion.

Changed in neutron:
status: Confirmed → Triaged
Revision history for this message
Ha Van Tu (tuhv) wrote :

Dear all,
This patch has been reverted[1], we are proposing another patch for it [2][3][4]. They are enhanced with namespace solution [5], unittests and functional tests.
[1]: https://review.openstack.org/434181
[2]: https://review.openstack.org/#/c/433598/
[3]: https://review.openstack.org/#/c/437311/
[4]: https://review.openstack.org/#/c/438445/
[5]: https://review.openstack.org/433633

Changed in neutron:
milestone: none → pike-1
assignee: nobody → Ha Van Tu (tuhv)
tags: added: rfe-approved
removed: rfe
tags: added: loadimpact
removed: needs-attention rfe-approved
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

My suggestion is to track this as a regular bug. Even though this is similar to bug 1492714, in reality we spearheaded efforts in neutron, so fwaas can just mirror what has been done already for adopting privsep.

Changed in neutron:
milestone: pike-1 → pike-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/469028

Changed in neutron:
assignee: Ha Van Tu (tuhv) → Cuong Nguyen (cuongnv)
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Cuong Nguyen (<email address hidden>) on branch: master
Review: https://review.openstack.org/469028
Reason: will upload other ps

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/470912

Revision history for this message
songminglong (songminglong) wrote :

No one noticed that the step "_apply_synchronized()" at neutron/agent/linux/iptables_manager.py was time consuming when there were many firewall rules(>70), especially in function "changes = _generate_path_between_rules(old_rules, new_rules)" and "_generate_chain_diff_iptables_commands()"?
I have test the performance for function "_generate_path_between_rules()", it consumed about 30s when generating param "changes" for command iptable-restore with 130 firewall rules. So I think this is where we need to improve.

Revision history for this message
Cuong Nguyen (cuongnv) wrote :

Can you elaborate your steps/tools used for that performance test?
And we welcome any code contribution for this bug on gerrit.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/483864

Changed in neutron:
assignee: Cuong Nguyen (cuongnv) → songminglong (songminglong)
Changed in neutron:
assignee: songminglong (songminglong) → nobody
Revision history for this message
songminglong (songminglong) wrote :

hi cuongnv, I only tested func performance of "_generate_chain_diff_iptables_commands" and tool difflib in my local vm, and I found the reason of time consuming is the different sequences of iptables params between old_chain_rules and new_chain_rules, for example:
one rule in old_chain_rules(which extracting from router namespace using iptables-save cmd) is:
-A neutron-vpn-agen-ov4c70fcfc5 -p tcp -m tcp --dport 220 --sport 220 -s 192.168.1.0/24 -d 192.168.3.0/24 -j ACCEPT
but the same rule in new_chain_rules may be is:
-A neutron-vpn-agen-ov4c70fcfc5 -m tcp --dport 220 --sport 220 -p tcp -s 192.168.1.0/24 -d 192.168.3.0/24 -j ACCEPT

and tool difflib could not able to identify it as a same rule. when there are many rules(>70 or even more),the process of "_generate_chain_diff_iptables_commands"(time complexity is O(n^2)) may consume lots of times.

I don't know whether you can figure out my explations, and please forgive me my bad english, have a good day.

Changed in neutron:
assignee: nobody → songminglong (songminglong)
Changed in neutron:
assignee: songminglong (songminglong) → nobody
Changed in neutron:
assignee: nobody → songminglong (songminglong)
Changed in neutron:
assignee: songminglong (songminglong) → nobody
Revision history for this message
Brian Haley (brian-haley) wrote :

Hi Song,

I'm not sure that the thread you started at comment #13 belongs in this bug - can you please open another with that information if you haven't already?

If there is a difference in two iptables rules generated - '-p tcp -m tcp' versus '-m tcp' that could just be a bug somewhere in the code as it shouldn't happen.

Thanks, Brian

Changed in neutron:
assignee: nobody → Cuong Nguyen (cuongnv)
Revision history for this message
Kevin Benton (kevinbenton) wrote :

Hi Song,

There is a specific option you can turn on to detect when the internal representation of rules does not match the system representation. debug_iptables_rules http://git.openstack.org/cgit/openstack/neutron/tree/neutron/conf/agent/common.py#n73

As Brian said, it shouldn't happen and if you find a case like that we need to fix the way we form the rule.

Changed in neutron:
importance: Wishlist → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/483864
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron:
assignee: Cuong Nguyen (cuongnv) → Cao Xuan Hoang (hoangcx)
Revision history for this message
Cao Xuan Hoang (hoangcx) wrote :

The below is the info for performance testing:

Number of changed rules Time (s) Conntrack-tool times Performance improvement
500 1.040600538 19.11228633 94.555332
1000 2.515271902 31.07810974 91.90661233
5000 88.87204766 141.193157 37.05640589
10000 114.2987695 269.072845 57.52125434

Detail can be found here: https://docs.google.com/spreadsheets/d/1-eSf_NblcHqE2YqKSSe_vtTitQ1ouvNf5x8oW-qbsRQ/edit#gid=0

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/537654

Revision history for this message
Brian Haley (brian-haley) wrote :

I changed the bug in the related fix I proposed in the last comment.

Changed in neutron:
assignee: Cao Xuan Hoang (hoangcx) → Brian Haley (brian-haley)
Changed in neutron:
assignee: Brian Haley (brian-haley) → Slawek Kaplonski (slaweq)
Revision history for this message
Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: Slawek Kaplonski (slaweq) → nobody
status: In Progress → New
tags: added: timeout-abandon
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.openstack.org/470912
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron:
status: New → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.