neutron

Bug #1253993
Comment #15

Sudhakar Gariganti (sudhakar-gariganti) wrote on 2014-03-27:

There were concerns about whether the patch ‘https://review.openstack.org/#/c/77549/’ has been properly tested, since it involves a logic change.

We have done significant testing with this patch and want to share a few results from our experiments.

We were basically trying to see how many VMs we could scale to with the OVS agent in use. With the default security group (which has a remote security group rule), VMs were no longer able to get DHCP IPs beyond 250-300 VMs. We had 16 compute nodes (CNs), with VMs uniformly distributed across them. The VM image had a wait period of 120 seconds to receive the DHCP response.
By the time we had around 18-19 VMs on each CN (around 6K iptables rules), each RPC loop was taking close to 140 seconds whenever there was an update. The reason VMs were not getting IPs was that the iptables rules required for a VM to send out its DHCP request were not in place before the 120-second wait period expired. Upon further investigation we discovered that the "for loop searching iptable rules" in the _modify_rules method of iptables_manager.py was eating a big chunk of the overall time spent.
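
To illustrate why a loop like that gets expensive as the rule count grows, here is a minimal sketch in Python. It is not the code from iptables_manager.py and not the change made in the patch; the function names and rule strings are hypothetical. It only shows the general difference between scanning the whole rule list for every rule and building a lookup table once.

```python
def find_positions_linear(current_rules, wanted_rules):
    """For each wanted rule, scan the current rule list from the start.

    Cost grows roughly with len(current_rules) * len(wanted_rules).
    """
    positions = []
    for wanted in wanted_rules:
        found = -1
        for index, rule in enumerate(current_rules):
            if rule == wanted:
                found = index
                break
        positions.append(found)
    return positions


def find_positions_indexed(current_rules, wanted_rules):
    """Build a rule -> first index map once, then look each rule up.

    Cost grows roughly with len(current_rules) + len(wanted_rules).
    """
    first_index = {}
    for index, rule in enumerate(current_rules):
        # Keep the first occurrence, matching the linear scan above.
        first_index.setdefault(rule, index)
    return [first_index.get(wanted, -1) for wanted in wanted_rules]


# Hypothetical rule strings, only for illustration.
current = ["-A chain-%d -j RETURN" % i for i in range(2000)]
wanted = current[::2] + ["-A missing -j DROP"]

# Both approaches give the same answer; only the amount of work differs.
assert find_positions_linear(current, wanted) == find_positions_indexed(current, wanted)
```

With a few thousand rules the nested scan does millions of string comparisons per pass, while the indexed version does one pass to build the map plus one lookup per rule.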

After this patch, close to 680 VMs were able to get IPs. The number of iptables rules at this point was close to 20K, with around 40 VMs per CN.

To summarize, we were able to increase the processing capacity of a compute node from 6K iptables rules to 20K iptables rules, which helped more VMs get a DHCP IP within the 120-second wait period. You can imagine the situation when the wait time is shorter than 120 seconds.
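
As a rough cross-check of the figures above, the arithmetic can be written out in a few lines of Python. All inputs are the approximate numbers quoted in this comment; the variable names and the midpoint of 18.5 VMs per CN are assumptions made only for this sketch.

```python
# Approximate figures quoted in this comment.
compute_nodes = 16
wait_period_s = 120          # DHCP wait period in the VM image
rpc_loop_before_s = 140      # RPC loop time before the patch

rules_per_cn_before = 6000
vms_per_cn_before = 18.5     # assumed midpoint of "18-19"
rules_per_cn_after = 20000
vms_per_cn_after = 40        # "around 40"

# Total VMs across all compute nodes.
total_vms_before = compute_nodes * vms_per_cn_before
total_vms_after = compute_nodes * vms_per_cn_after

# Average iptables rules per VM on one compute node.
rules_per_vm_before = rules_per_cn_before / vms_per_cn_before
rules_per_vm_after = rules_per_cn_after / vms_per_cn_after

# How far the RPC loop overran the DHCP wait period before the patch.
overrun_s = rpc_loop_before_s - wait_period_s

print("VMs before/after:", total_vms_before, total_vms_after)
print("rules per VM before/after:", round(rules_per_vm_before), round(rules_per_vm_after))
print("RPC loop overrun before patch (s):", overrun_s)
```

The totals land near the observed 250-300 and roughly 680 VMs, given that the per-CN counts are approximate. The rules-per-VM average also rises as VMs are added, which would be expected if the remote security group rule adds entries for every other member of the group.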

Carl and Salvatore suggested that we create a new bug/enhancement and link the patch to it. That makes sense, as this patch doesn't solve this bug completely. At the same time, I want to state that this patch is not totally unrelated to this bug. :)