cProfile - fix Security Groups hotfunctions

Bug #1492456 reported by Ramu Ramamurthy
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
neutron
Fix Released
High
Unassigned

Bug Description

I used cProfile to profile neutron-ovs-agent (from neutron kilo 2015.1.0) as VMs are provisioned (see code sample below to reproduce).

I find a couple of functions in the IptablesManager scaling poorly with # of VMs (_modify_rules, and its callee find_last_entry).
As the # of current VMs doubles, the time spent in these functions to provision 10 new VMs also roughly doubles,
While we wait for the new IptablesTables firewall driver:
https://blueprints.launchpad.net/neutron/+spec/new-iptables-driver
Can we improve the performance of the current iptables firewall on those 2 functions which do a lot of string processing on iptables rule strings checking for dups ?

Current: #VMs: 20, # iptables rules: 657,
provision 10 new VMs

  ncalls tottime percall cumtime percall filename:lineno(function)
       60 0.143 0.002 3.979 0.066 iptables_manager.py:511(_modify_rules)
    25989 2.752 0.000 3.332 0.000 iptables_manager.py:504(_find_last_entry)

Cumulative time spent in _find_last_entry: 3.3 sec

Current #VMs: 40, # iptables rules: 1277 ,
provision 10 new VMs

       65 0.220 0.003 7.974 0.123 iptables_manager.py:511(_modify_rules)
    38891 5.782 0.000 6.986 0.000 iptables_manager.py:504(_find_last_entry)

Cumulative time spent in _find_last_entry: 6.9 sec

Current #VMs: 80, # iptables rules: 2517 ,
provision 10 new VMs

       30 0.274 0.009 20.496 0.683 iptables_manager.py:511(_modify_rules)
    43862 15.920 0.000 19.292 0.000 iptables_manager.py:504(_find_last_entry)

Cumulative time spent in _find_last_entry: 19.2 sec

current #VMs: 160, # iptables rules: 4997,
provision 10 new VMs

       20 0.375 0.019 49.255 2.463 iptables_manager.py:511(_modify_rules)
    56478 39.275 0.001 47.629 0.001 iptables_manager.py:504(_find_last_entry)

Cumulative time spent in _find_last_entry: 47.6 sec

--------------------

To Reproduce: THis is one way where we can control start/stop of profiling
based on presence of a file (/tmp/cprof) in the file-system
-------
Make following change to neutron_ovs_agent.py to enable/disable cProfile for a given scenario.

import cProfile
import os.path
pr_enabled = False
pr = None

In OVSNeutronAgent add method:

     def toggle_cprofile(self):
         global pr, pr_enabled
         start = False
         data = ""
         fname = "vm.profile"
         try:
             if os.path.isfile("/tmp/cprof"):
                 start = True
         except IOError as e:
             LOG.warn("Error %s", e.strerror)

         if start and not pr_enabled:
             pr = cProfile.Profile()
             pr.enable()
             pr_enabled = True
             LOG.warn("enabled cprofile")
         if not start and pr_enabled:
             pr.disable()
             pr.create_stats()
             pr.dump_stats("/tmp/%s"%fname)
             pr_enabled = False
             LOG.warn("disabled cprofile")

In polling loop:
             self.toggle_cprofile()

-------------
This is another way to run the cProfile, but here, there is no way to control start/stop of profiling, and the profile includes the initialization also.

Run, neutron-ovs-agent as follows:

sudo -u neutron bash -c "/usr/bin/python -m cProfile -o /tmp/vm_1.profile /usr/bin/neutron-openvswitch-agent --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini --log-file /var/log/neutron/openvswitch-agent.log"

description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
description: updated
Revision history for this message
Kyle Mestery (mestery) wrote :

Assigning to Miguel for an initial look.

Changed in neutron:
assignee: nobody → Miguel Angel Ajo (mangelajo)
status: New → Triaged
importance: Undecided → High
description: updated
description: updated
Revision history for this message
Brian Haley (brian-haley) wrote :

Can you retry this on the latest Liberty code? There was a change that probably fixed this, https://review.openstack.org/#/c/183719/

Revision history for this message
Ramu Ramamurthy (ramu-ramamurthy) wrote :

I applied the fix suggested above, and it improves things significantly (2x) as also noted in the patch comments.
However the problem still exists to a lesser extent, and further improvements are needed. As such I can close this as fix released - or leave it open for someone clever to take a next crack at this problem.

The following show the times to provision 10 VMs when there already N existing VMs.

                                        before patch after patch
# existing VMs _modify_rules _modify_rules
                                     cumulative time cumulative time
                                                  in sec in sec

                            0 0.34 0.20
                          10 1.68 0.94
                          20 4.27 2.12
                          40 11.8 6.44
                          80 20.2 13.6
                         160 50 23.2

Revision history for this message
Ramu Ramamurthy (ramu-ramamurthy) wrote :

The data formatted poorly in the previous comment. Reformatting it:

#VMs = 0, provision 10 VMs
cumulative time in _modify_rules : Before 0.34 After: 0.20

#VMs = 10, provision 10 VMs
cumulative time in _modify_rules : Before 1.68 After: 0.94

#VMs = 20, provision 10 VMs
cumulative time in _modify_rules : Before 4.27 After: 2.12

#VMs = 40, provision 10 VMs
cumulative time in _modify_rules : Before 11.8 After: 6.44

#VMs = 80, provision 10 VMs
cumulative time in _modify_rules : Before 20.2 After: 13.6

#VMs = 160, provision 10 VMs
cumulative time in _modify_rules : Before 50 After: 23.2

Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

I think this should be handled as an RFE. I have plans to collaborate with Jakub Libosvar on moving security groups to OVS/CT during this cycle as pre previous POC.

This is IMO a (desirable) enhancement, more important in the container world where density is higher.

Changed in neutron:
assignee: Miguel Angel Ajo (mangelajo) → nobody
Revision history for this message
Miguel Angel Ajo (mangelajo) wrote :

BTW, I loved the python profiling, I'm wondering if we could also RFE to have an option to start agents with profiling, so we can *easily* gather profiling information, as we do with the debug flag.

Revision history for this message
Ramu Ramamurthy (ramu-ramamurthy) wrote :

As per the comment above, I will close this as fix-released (the patch noted above)
and create a RFE bug to track further improvements needed.

I will also create a RFE bug to provide an option to run neutron agents in a "profiled" mode.

Changed in neutron:
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.