instance creation fails on compute nodes with > 175 instances with neutron security groups enabled

Bug #1314189 reported by James Page on 2014-04-29
This bug affects 5 people
Affects           Status  Importance  Assigned to  Milestone
neutron                   Medium      Unassigned
neutron (Ubuntu)          Medium      Unassigned

Bug Description

Setup: Neutron ML2 plugin, OpenvSwitch mechanism driver, Neutron Security Groups enabled.

When compute nodes get busy with lots of instances (> 175), the neutron-openvswitch-agents start to take a long time to parse and update iptables firewall rules. We see the agent state go inactive, and after a while nova gives up waiting for the notification from neutron that everything is set up and tears down the instance (which, I think, was in a paused state awaiting port creation).

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: neutron-plugin-openvswitch-agent 1:2014.1-0ubuntu1
ProcVersionSignature: User Name 3.13.0-24.46-generic 3.13.9
Uname: Linux 3.13.0-24-generic x86_64
ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
Date: Tue Apr 29 12:49:49 2014
PackageArchitecture: all
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: neutron
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.neutron.rootwrap.d.openvswitch.plugin.filters: [deleted]

James Page (james-page) wrote :

Excerpt from agent log:

2014-04-29 12:32:47.261 2503 INFO neutron.agent.securitygroups_rpc [req-9440b702-7bfb-4e55-ae19-c8213c6d9f35 None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:32:47.432 2503 INFO neutron.agent.securitygroups_rpc [req-b628d8d3-198c-4cfe-a685-977256aa302c None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:32:48.152 2503 INFO neutron.agent.securitygroups_rpc [-] Refresh firewall rules
2014-04-29 12:35:24.912 2503 INFO neutron.agent.securitygroups_rpc [req-bd4577ec-c360-44df-bdc1-2fa6a5b1904e None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:35:24.913 2503 INFO neutron.agent.securitygroups_rpc [req-5779bb3f-053e-4348-9658-6c2dd6fcb9c8 None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:35:24.915 2503 INFO neutron.agent.securitygroups_rpc [req-d5e828ba-e06f-4248-bb2c-fda3d335e690 None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:35:24.915 2503 INFO neutron.agent.securitygroups_rpc [req-4e0f44e2-db23-4411-9ba8-fcdfb7d2b91e None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:35:24.916 2503 INFO neutron.agent.securitygroups_rpc [req-d0e230f5-f8c3-47c7-be66-47be3741a719 None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:35:24.917 2503 INFO neutron.agent.securitygroups_rpc [req-5975fd3e-6984-454d-9272-ab3fa0389455 None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:35:24.918 2503 INFO neutron.agent.securitygroups_rpc [req-d76365ad-19a2-40c3-8c33-4fe1fec9cfce None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.919 2503 INFO neutron.agent.securitygroups_rpc [req-d1fcd223-adb7-44b8-b693-55f0d6639604 None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.920 2503 INFO neutron.agent.securitygroups_rpc [req-ff06e8ab-10f2-4650-a077-016076aaee94 None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.921 2503 INFO neutron.agent.securitygroups_rpc [req-1420521e-17f4-4b48-8d4a-b5e21c4bead4 None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.922 2503 INFO neutron.agent.securitygroups_rpc [req-728af0ac-cbe9-4d8c-b777-126dadc03abc None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.922 2503 INFO neutron.agent.securitygroups_rpc [req-a6d4a096-f35c-4373-8275-572a6c46e27c None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.923 2503 INFO neutron.agent.securitygroups_rpc [req-a2de92e0-8223-4346-a052-e429ac3b1755 None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.924 2503 INFO neutron.agent.securitygroups_rpc [req-13199ed9-4115-40e4-9245-a6a8c3c5eb1a None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:30.378 2503 INFO neutron.agent.securitygroups_rpc [-] Preparing filters ...
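The stall is visible in the timestamps above: roughly 157 seconds pass between "Refresh firewall rules" and the next log entry. A minimal sketch for spotting such gaps in an agent log follows. The function names and the 60-second threshold are illustrative choices, not part of neutron; the sample lines are copied from the excerpt.

```python
from datetime import datetime

# Four lines copied from the excerpt above; a real run would read the agent log file.
LOG_EXCERPT = """\
2014-04-29 12:32:47.432 2503 INFO neutron.agent.securitygroups_rpc [req-b628d8d3-198c-4cfe-a685-977256aa302c None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:32:48.152 2503 INFO neutron.agent.securitygroups_rpc [-] Refresh firewall rules
2014-04-29 12:35:24.912 2503 INFO neutron.agent.securitygroups_rpc [req-bd4577ec-c360-44df-bdc1-2fa6a5b1904e None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:35:30.378 2503 INFO neutron.agent.securitygroups_rpc [-] Preparing filters ...
"""


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.mmm' field of a log line."""
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")


def find_gaps(lines, threshold_seconds=60.0):
    """Return (gap_seconds, message_before_gap) for each gap >= threshold."""
    entries = [(parse_timestamp(l), l) for l in lines if l.strip()]
    gaps = []
    for (t_prev, l_prev), (t_next, _) in zip(entries, entries[1:]):
        delta = (t_next - t_prev).total_seconds()
        if delta >= threshold_seconds:
            # Keep only the text after the "[request-id ...] " field.
            gaps.append((delta, l_prev.split("] ", 1)[-1]))
    return gaps


gaps = find_gaps(LOG_EXCERPT.splitlines())
for seconds, message in gaps:
    print(f"{seconds:.3f}s of silence after: {message}")
```

The message printed before each gap names the step the agent was in when it stopped logging, which here is the firewall refresh.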

James Page (james-page) wrote :

Disabling neutron security groups in the openvswitch agent works around this, with an obvious side effect. I can push the same compute nodes well past 300 instances; there is some degradation in time to provision, but it seems to stabilize.

Changed in neutron:
importance: Undecided → Medium
James Page (james-page) on 2014-05-11
tags: added: sm15k
James Page (james-page) on 2014-05-11
Changed in neutron (Ubuntu):
importance: Undecided → Medium
summary: - instance creation fails on compute nodes with > 175 instances
+ instance creation fails on compute nodes with > 175 instances with
+ neutron security groups enabled

Hello James,

Have you had a chance to look at https://review.openstack.org/#/c/77549/ ?

Looks similar.

Changed in neutron (Ubuntu):
assignee: nobody → Sudhakar (sudhakar-gariganti)
Changed in neutron:
assignee: nobody → Sudhakar (sudhakar-gariganti)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in neutron (Ubuntu):
status: New → Confirmed
tags: added: loadimpact
Changed in neutron:
status: New → Confirmed

This is a known issue in how neutron passes security group rules through the messaging queue. There are several spec proposals for Juno to solve the issue:

https://review.openstack.org/#/c/104522/
https://review.openstack.org/#/c/100761/

James Page (james-page) wrote :

This might be better in Juno due to the use of ipset to manage groups within iptables rulebases.

We'll re-test on the next big-ish load test we do and see how things fare!
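A rough, simplified model of why ipset could help. It assumes that every port on the host has a rule referencing a remote security group, and that without ipset the agent writes one iptables rule per member IP for each such rule, whereas with ipset a single rule matches against a shared set. The function names and the numbers are hypothetical and ignore the fixed per-port chains and rules.

```python
def rules_without_ipset(ports_on_host, group_members, remote_group_rules=1):
    """Each port gets one iptables rule per member IP for each remote-group rule."""
    return ports_on_host * group_members * remote_group_rules


def rules_with_ipset(ports_on_host, remote_group_rules=1):
    """Each port gets one rule per remote-group rule; member IPs live in a shared set."""
    return ports_on_host * remote_group_rules


# Hypothetical host: 175 ports, all in one group of 175 members that references itself.
ports = 175
members = 175
print("without ipset:", rules_without_ipset(ports, members))
print("with ipset:   ", rules_with_ipset(ports))
```

Under these assumptions the rule count grows with ports times members without ipset, but only with the number of ports when the members are held in a set. This is consistent with the agent slowing down as the host fills up.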

Changed in neutron:
assignee: Sudhakar Gariganti (sudhakar-gariganti) → nobody
Changed in neutron (Ubuntu):
assignee: Sudhakar Gariganti (sudhakar-gariganti) → nobody