Comment 4 for bug 1399168

Alexander Ignatov (aignatov) wrote:

Nastya, this is the original description from the bug filed upstream.

The root cause of this bug is the algorithm used to modify iptables rules on each compute node. The main factors in this issue are the number of VMs per compute and the number of security group rules; a sketch below illustrates the scaling.

If we read the comments on the upstream issue, we can see that the bug submitter is talking about approximately 100 VMs per compute:
"We have got 11 hosts, one is the controller, other 10 are compute nodes." (c)

So I did several checks to verify the patch:

TestCase1 (measuring the time to update iptables rules):

Prerequisites:
- ~500 rules in the security group "test-sg" to attach to VMs
- 96 VMs already deployed on the same compute node, using an availability zone that pins each VM to that compute
- the flavor is not important here; a CirrOS image is used

1. When all pre-deployed VMs are Active, check that the openvswitch-agent does not consume 100% CPU.
2. Boot 1 new VM with the "test-sg" security group and measure how long the ovs-agent consumes 100% CPU (a measurement sketch follows the note below).

Expected result: 100% CPU usage for ~10 seconds; the VM boots successfully.
Note: before this fix, the VM took far longer to boot (8-10 minutes, and could end up in the Error state), and CPU usage stayed at about 100% for the whole boot period.
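
Here is a minimal sketch of how that 100% CPU window can be timed, assuming psutil is available and that the agent can be found by the process name neutron-openvswitch-agent (both assumptions, not part of the original test notes):

    import psutil

    # Assumption: the agent is identifiable by this command-line substring.
    AGENT_NAME = "neutron-openvswitch-agent"

    def find_agent():
        for proc in psutil.process_iter(["cmdline"]):
            cmdline = " ".join(proc.info["cmdline"] or [])
            if AGENT_NAME in cmdline:
                return proc
        raise RuntimeError("agent process not found")

    def busy_window(threshold=90.0, poll=1.0):
        """Seconds the agent stays above `threshold` percent CPU."""
        agent = find_agent()
        agent.cpu_percent(None)  # prime the per-process counter
        busy = 0.0
        while True:
            if agent.cpu_percent(interval=poll) >= threshold:
                busy += poll
            elif busy > 0:
                return busy  # the busy window has ended

    print("agent busy for ~%.0f s" % busy_window())

Run it just before booting the new VM; with the fix it should report on the order of 10 seconds instead of several minutes.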

The second check is simple (a boot-loop sketch follows the note below):

1. Deploy 10 VMs on a dedicated compute, attaching "test-sg" to each instance.
2. Deploy 20 VMs on a dedicated compute, attaching "test-sg" to each instance.
3. Deploy 50 VMs on a dedicated compute, attaching "test-sg" to each instance.

Expected result: all 80 VMs are up and running.
Note: before the fix, the share of failed VMs was about 30%-90%.
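
A minimal sketch of the boot loop, assuming the nova CLI, a CirrOS image named cirros, the m1.tiny flavor, and hypothetical host names compute-1..compute-3 (none of these names come from the original notes):

    import subprocess

    # Hypothetical batches: (host, number of VMs). Adjust for the
    # actual environment before running.
    BATCHES = [("compute-1", 10), ("compute-2", 20), ("compute-3", 50)]

    for host, count in BATCHES:
        for i in range(count):
            # Pin each VM to its compute via the availability zone and
            # attach the heavy "test-sg" security group.
            subprocess.check_call([
                "nova", "boot",
                "--flavor", "m1.tiny",
                "--image", "cirros",
                "--security-groups", "test-sg",
                "--availability-zone", "nova:%s" % host,
                "vm-%s-%02d" % (host, i),
            ])

Afterwards, count how many of the 80 instances reach the Active state versus the Error state.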

All my tests passed, so I assume this bug is fixed correctly and propose to merge it.