Activity log for bug #1453264

Date Who What changed Old value New value Message
2015-05-08 20:21:35 Brian Haley bug added bug
2015-05-08 20:21:35 Brian Haley attachment added Script to add 1000 security group rules https://bugs.launchpad.net/bugs/1453264/+attachment/4393817/+files/big-sec-rules.sh
2015-05-08 20:21:56 Brian Haley neutron: assignee Brian Haley (brian-haley)
2015-05-19 23:45:02 OpenStack Infra neutron: status New In Progress
2015-05-19 23:45:02 OpenStack Infra neutron: assignee Brian Haley (brian-haley) Kevin Benton (kevinbenton)
2015-05-22 18:17:16 OpenStack Infra neutron: assignee Kevin Benton (kevinbenton) Brian Haley (brian-haley)
2015-05-26 07:59:14 OpenStack Infra neutron: assignee Brian Haley (brian-haley) Kevin Benton (kevinbenton)
2015-05-28 22:58:28 OpenStack Infra neutron: status In Progress Fix Committed
2015-06-24 20:18:35 Thierry Carrez neutron: status Fix Committed Fix Released
2015-06-24 20:18:35 Thierry Carrez neutron: milestone liberty-1
2015-06-30 02:35:07 OpenStack Infra tags in-feature-pecan
2015-06-30 02:35:09 OpenStack Infra bug watch added http://bugs.python.org/issue21239
2015-09-19 20:06:10 OpenStack Infra tags in-feature-pecan in-feature-pecan in-stable-kilo
2015-10-11 18:31:50 Chuck Short nominated for series neutron/kilo
2015-10-11 18:31:50 Chuck Short bug task added neutron/kilo
2015-10-11 18:32:00 Chuck Short neutron/kilo: status New Fix Committed
2015-10-11 18:32:04 Chuck Short neutron/kilo: milestone 2015.1.2
2015-10-13 19:25:41 Chuck Short neutron/kilo: status Fix Committed Fix Released
2015-10-15 12:18:32 Thierry Carrez neutron: milestone liberty-1 7.0.0
2015-11-12 15:00:13 Evan Stoner bug added subscriber Evan Stoner
2016-08-29 22:11:15 Billy Olsen description We have customers that typically add a few hundred security group rules or more. We also typically run 30+ VMs per compute node. When about 10+ VMs with a large SG set all get scheduled to the same node, the L2 agent (OVS) can spend many minutes in the iptables_manager.apply() code, so much so that by the time all the rules are updated, the VM has already tried DHCP and failed, leaving it in an unusable state.
While there have been some patches that tried to address this in Juno and Kilo, they've either not helped as much as necessary, or broken SGs completely due to re-ordering of the iptables rules.
I've been able to show some pretty bad scaling with just a handful of VMs running in devstack based on today's code (May 8th, 2015) from upstream OpenStack.
Here's what I tested:
1. I created a security group with 1000 TCP port rules (you could alternately have a smaller number of rules and more VMs, but it's quicker this way)
2. I booted VMs, specifying both the default and "large" SGs, and timed from the second it took Neutron to "learn" about the port until it completed its work
3. I got a :( pretty quickly
And here's some data:
1-3 VM - didn't time, less than 20 seconds
4th VM - 0:36
5th VM - 0:53
6th VM - 1:11
7th VM - 1:25
8th VM - 1:48
9th VM - 2:14
While it's busy adding the rules, the OVS agent is consuming pretty close to 100% of a CPU for most of this time (from top):
  PID USER   PR NI   VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
25767 stack  20  0 157936 76572 4416 R 89.2  0.5 50:14.28 python
And this is with only ~10K rules at this point! When we start crossing the 20K point, VM boot failures start to happen.
I'm filing this bug since we need to take a closer look at this in Liberty and fix it; it's been this way since Havana and needs some TLC.
I've attached a simple script I've used to recreate this, and will start taking a look at options here.
[Impact] We have customers that typically add a few hundred security group rules or more. We also typically run 30+ VMs per compute node. When about 10+ VMs with a large SG set all get scheduled to the same node, the L2 agent (OVS) can spend many minutes in the iptables_manager.apply() code, so much so that by the time all the rules are updated, the VM has already tried DHCP and failed, leaving it in an unusable state.
While there have been some patches that tried to address this in Juno and Kilo, they've either not helped as much as necessary, or broken SGs completely due to re-ordering of the iptables rules.
I've been able to show some pretty bad scaling with just a handful of VMs running in devstack based on today's code (May 8th, 2015) from upstream OpenStack.
[Test Case] Here's what I tested:
1. I created a security group with 1000 TCP port rules (you could alternately have a smaller number of rules and more VMs, but it's quicker this way)
2. I booted VMs, specifying both the default and "large" SGs, and timed from the second it took Neutron to "learn" about the port until it completed its work
3. I got a :( pretty quickly
And here's some data:
1-3 VM - didn't time, less than 20 seconds
4th VM - 0:36
5th VM - 0:53
6th VM - 1:11
7th VM - 1:25
8th VM - 1:48
9th VM - 2:14
While it's busy adding the rules, the OVS agent is consuming pretty close to 100% of a CPU for most of this time (from top):
  PID USER   PR NI   VIRT   RES  SHR S %CPU %MEM    TIME+ COMMAND
25767 stack  20  0 157936 76572 4416 R 89.2  0.5 50:14.28 python
And this is with only ~10K rules at this point! When we start crossing the 20K point, VM boot failures start to happen.
I'm filing this bug since we need to take a closer look at this in Liberty and fix it; it's been this way since Havana and needs some TLC.
I've attached a simple script I've used to recreate this, and will start taking a look at options here.
[Regression Potential] Minimal since this has been running in upstream stable for several releases now (Kilo, Liberty, Mitaka).
2016-08-29 22:11:52 Billy Olsen bug task added neutron (Ubuntu)
2016-08-29 22:13:26 Billy Olsen attachment added trusty patch based on -proposed https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1453264/+attachment/4730270/+files/lp1453264.debdiff
2016-08-29 22:14:39 Billy Olsen bug added subscriber Ubuntu Sponsors Team
2016-08-29 22:16:58 Billy Olsen bug task added cloud-archive
2016-08-29 22:17:10 Billy Olsen nominated for series cloud-archive/icehouse
2016-08-30 16:44:16 Mathew Hodson tags in-feature-pecan in-stable-kilo in-feature-pecan in-stable-kilo trusty
2016-08-30 16:48:16 Mathew Hodson bug watch removed http://bugs.python.org/issue21239
2016-08-30 16:48:47 Mathew Hodson neutron (Ubuntu): importance Undecided Medium
2016-08-31 01:51:28 Yoshi Kadokawa bug added subscriber Yoshi Kadokawa
2016-08-31 12:11:14 Corey Bryant bug task added cloud-archive/icehouse
2016-08-31 12:45:20 Launchpad Janitor branch linked lp:~ubuntu-server-dev/neutron/icehouse
2016-08-31 12:54:36 Corey Bryant cloud-archive/icehouse: status New Fix Committed
2016-08-31 12:55:53 Corey Bryant nominated for series Ubuntu Trusty
2016-08-31 12:55:53 Corey Bryant bug task added neutron (Ubuntu Trusty)
2016-08-31 12:56:18 Corey Bryant neutron (Ubuntu): status New Invalid
2016-08-31 12:56:25 Corey Bryant neutron (Ubuntu Trusty): status New Fix Committed
2016-08-31 12:56:31 Corey Bryant cloud-archive: status New Invalid
2016-08-31 12:56:41 Corey Bryant summary iptables_manager can run very slowly when a large number of security group rules are present [SRU] iptables_manager can run very slowly when a large number of security group rules are present
2016-08-31 20:18:33 Mathew Hodson neutron (Ubuntu Trusty): importance Undecided Medium
2016-08-31 20:22:50 Mathew Hodson neutron (Ubuntu): status Invalid Fix Released
2016-08-31 20:23:21 Mathew Hodson cloud-archive: status Invalid Fix Released
2016-08-31 20:25:07 Mathew Hodson neutron (Ubuntu Trusty): status Fix Committed In Progress
2016-09-06 12:41:44 Martin Pitt neutron (Ubuntu Trusty): status In Progress Fix Committed
2016-09-06 12:41:48 Martin Pitt bug added subscriber Ubuntu Stable Release Updates Team
2016-09-06 12:41:52 Martin Pitt bug added subscriber SRU Verification
2016-09-06 12:42:00 Martin Pitt tags in-feature-pecan in-stable-kilo trusty in-feature-pecan in-stable-kilo trusty verification-needed
2016-09-06 12:42:15 Martin Pitt removed subscriber Ubuntu Sponsors Team
2016-09-07 16:42:43 Billy Olsen tags in-feature-pecan in-stable-kilo trusty verification-needed in-feature-pecan in-stable-kilo trusty verification-done
2016-09-07 16:43:02 Billy Olsen tags in-feature-pecan in-stable-kilo trusty verification-done in-feature-pecan in-stable-kilo verification-done
2016-09-14 11:57:45 Martin Pitt removed subscriber Ubuntu Stable Release Updates Team
2016-09-14 12:02:01 Launchpad Janitor neutron (Ubuntu Trusty): status Fix Committed Fix Released
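
The boot timings quoted in the bug description (4th through 9th VM) are easier to compare once converted to seconds. The snippet below is only a minimal sketch for that arithmetic: the names TIMINGS, to_seconds, and growth are illustrative and are not part of Neutron or of the attached big-sec-rules.sh script.

```python
# Timings copied from the bug description: VM ordinal -> "M:SS".
# VMs 1-3 are omitted because the report only says "less than 20 seconds".
TIMINGS = {4: "0:36", 5: "0:53", 6: "1:11", 7: "1:25", 8: "1:48", 9: "2:14"}


def to_seconds(mmss):
    """Convert an "M:SS" string to a whole number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def growth(timings):
    """Return (vm, seconds, delta_vs_previous_vm) tuples in VM order.

    The delta is None for the first VM in the table, since there is
    no earlier measurement to compare against.
    """
    rows = []
    previous = None
    for vm in sorted(timings):
        secs = to_seconds(timings[vm])
        delta = None if previous is None else secs - previous
        rows.append((vm, secs, delta))
        previous = secs
    return rows


if __name__ == "__main__":
    for vm, secs, delta in growth(TIMINGS):
        suffix = "" if delta is None else f" (+{delta}s vs previous VM)"
        print(f"VM {vm}: {secs}s{suffix}")
```

On these numbers, each additional VM carrying the 1000-rule group adds between 14 and 26 seconds to the time the agent needs for the next port, so the per-port time keeps climbing as the total rule count on the node grows.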