instance creation fails on compute nodes with > 175 instances with neutron security groups enabled

Bug #1314189 reported by James Page
This bug affects 6 people
Affects           Status   Importance  Assigned to  Milestone
neutron           Expired  Medium      Unassigned
neutron (Ubuntu)  Expired  Medium      Unassigned

Bug Description

Setup: Neutron ML2 plugin, OpenvSwitch mechanism driver, Neutron Security Groups enabled.

When compute nodes get busy with lots of instances (> 175), the neutron-openvswitch-agents start to take a long time to parse and update the iptables firewall rules. The agent state goes inactive, and after a while nova gives up waiting for the notification from neutron saying that everything is set up and tears down the instance (which I think was in a paused state, awaiting port creation).

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: neutron-plugin-openvswitch-agent 1:2014.1-0ubuntu1
ProcVersionSignature: Ubuntu 3.13.0-24.46-generic 3.13.9
Uname: Linux 3.13.0-24-generic x86_64
ApportVersion: 2.14.1-0ubuntu3
Architecture: amd64
Date: Tue Apr 29 12:49:49 2014
PackageArchitecture: all
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: neutron
UpgradeStatus: No upgrade log present (probably fresh install)
modified.conffile..etc.neutron.rootwrap.d.openvswitch.plugin.filters: [deleted]

James Page (james-page) wrote :

Excerpt from agent log:

2014-04-29 12:32:47.261 2503 INFO neutron.agent.securitygroups_rpc [req-9440b702-7bfb-4e55-ae19-c8213c6d9f35 None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:32:47.432 2503 INFO neutron.agent.securitygroups_rpc [req-b628d8d3-198c-4cfe-a685-977256aa302c None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:32:48.152 2503 INFO neutron.agent.securitygroups_rpc [-] Refresh firewall rules
2014-04-29 12:35:24.912 2503 INFO neutron.agent.securitygroups_rpc [req-bd4577ec-c360-44df-bdc1-2fa6a5b1904e None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:35:24.913 2503 INFO neutron.agent.securitygroups_rpc [req-5779bb3f-053e-4348-9658-6c2dd6fcb9c8 None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:35:24.915 2503 INFO neutron.agent.securitygroups_rpc [req-d5e828ba-e06f-4248-bb2c-fda3d335e690 None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:35:24.915 2503 INFO neutron.agent.securitygroups_rpc [req-4e0f44e2-db23-4411-9ba8-fcdfb7d2b91e None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:35:24.916 2503 INFO neutron.agent.securitygroups_rpc [req-d0e230f5-f8c3-47c7-be66-47be3741a719 None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:35:24.917 2503 INFO neutron.agent.securitygroups_rpc [req-5975fd3e-6984-454d-9272-ab3fa0389455 None] Security group member updated [u'92232d3d-170a-49e1-8f70-f83187ddf38f']
2014-04-29 12:35:24.918 2503 INFO neutron.agent.securitygroups_rpc [req-d76365ad-19a2-40c3-8c33-4fe1fec9cfce None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.919 2503 INFO neutron.agent.securitygroups_rpc [req-d1fcd223-adb7-44b8-b693-55f0d6639604 None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.920 2503 INFO neutron.agent.securitygroups_rpc [req-ff06e8ab-10f2-4650-a077-016076aaee94 None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.921 2503 INFO neutron.agent.securitygroups_rpc [req-1420521e-17f4-4b48-8d4a-b5e21c4bead4 None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.922 2503 INFO neutron.agent.securitygroups_rpc [req-728af0ac-cbe9-4d8c-b777-126dadc03abc None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.922 2503 INFO neutron.agent.securitygroups_rpc [req-a6d4a096-f35c-4373-8275-572a6c46e27c None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.923 2503 INFO neutron.agent.securitygroups_rpc [req-a2de92e0-8223-4346-a052-e429ac3b1755 None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:24.924 2503 INFO neutron.agent.securitygroups_rpc [req-13199ed9-4115-40e4-9245-a6a8c3c5eb1a None] Security group member updated [u'bd7d68e6-6598-4a65-afd4-395f7fd7aff9']
2014-04-29 12:35:30.378 2503 INFO neutron.agent.securitygroups_rpc [-] Preparing filters ...
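To put a number on the stall visible in this excerpt, a small Python helper can report the widest gap between consecutive log timestamps. This is a sketch: `largest_gap` is a hypothetical helper, and the lines below are a subset of the excerpt, abbreviated to timestamp and message only.

```python
from datetime import datetime

# Timestamp layout of the agent log lines; "%f" accepts the 3-digit milliseconds.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TIMESTAMP_LEN = len("2014-04-29 12:32:48.152")  # 23 characters


def largest_gap(lines):
    """Return (seconds, earlier_line, later_line) for the widest gap between
    consecutive lines, or (0.0, None, None) if there are fewer than two lines."""
    stamped = [(datetime.strptime(line[:TIMESTAMP_LEN], TIMESTAMP_FORMAT), line)
               for line in lines]
    best = (0.0, None, None)
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, l0, l1)
    return best


# Subset of the excerpt above (timestamp and message only).
excerpt = [
    "2014-04-29 12:32:47.261 Security group member updated",
    "2014-04-29 12:32:47.432 Security group member updated",
    "2014-04-29 12:32:48.152 Refresh firewall rules",
    "2014-04-29 12:35:24.912 Security group member updated",
    "2014-04-29 12:35:24.924 Security group member updated",
    "2014-04-29 12:35:30.378 Preparing filters ...",
]

gap, before, after = largest_gap(excerpt)
print(f"{gap:.2f} s between:\n  {before}\n  {after}")
```

On these lines the widest gap is about 156.76 s, between "Refresh firewall rules" and the next burst of member updates, which is presumably the time the agent spent rebuilding iptables rules.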


James Page (james-page) wrote :

Disabling neutron security groups in the openvswitch agent works around this, with the obvious side effect of losing security group filtering. I can push the same compute nodes well past 300 instances; there is some degradation in provisioning time, but it seems to stabilize.

Changed in neutron:
importance: Undecided → Medium
James Page (james-page)
tags: added: sm15k
James Page (james-page)
Changed in neutron (Ubuntu):
importance: Undecided → Medium
summary: - instance creation fails on compute nodes with > 175 instances
+ instance creation fails on compute nodes with > 175 instances with
+ neutron security groups enabled
Sudhakar Gariganti (sudhakar-gariganti) wrote :

Hello James,

Have you had a chance to look at https://review.openstack.org/#/c/77549/ ?

Looks similar.

Changed in neutron (Ubuntu):
assignee: nobody → Sudhakar (sudhakar-gariganti)
Changed in neutron:
assignee: nobody → Sudhakar (sudhakar-gariganti)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in neutron (Ubuntu):
status: New → Confirmed
tags: added: loadimpact
Changed in neutron:
status: New → Confirmed
Ihar Hrachyshka (ihar-hrachyshka) wrote :

This is a known issue in how neutron passes security group rules through the message queue. There are several spec proposals for Juno to solve the issue:

https://review.openstack.org/#/c/104522/
https://review.openstack.org/#/c/100761/

James Page (james-page) wrote :

This might be better in Juno due to the use of ipset to manage groups within iptables rulebases.

We'll re-test on the next big-ish load test we do and see how things fare!
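A rough back-of-the-envelope model of why ipset could help, written in Python. It assumes (as a simplification, not the exact rule layout of the neutron firewall driver) that without ipset a rule referencing a remote security group, such as the default group's rule allowing ingress from its own members, is expanded into one iptables rule per member IP in every local port's chain, while with ipset each port's chain carries a single rule matching a set that holds the member IPs. Both function names are hypothetical.

```python
def rules_without_ipset(ports_on_host: int, group_members: int) -> int:
    # Hypothetical model: each local port's chain gets one rule per member IP
    # of the remote group, so the total grows as ports x members.
    return ports_on_host * group_members


def rules_with_ipset(ports_on_host: int, group_members: int) -> int:
    # Hypothetical model: each local port's chain gets a single rule that
    # matches an ipset; member IPs live in the set, so group_members does
    # not change the iptables rule count.
    return ports_on_host


# Worst case for this model: every instance on the host shares one group.
for n in (50, 175, 300):
    print(f"{n:>3} ports: {rules_without_ipset(n, n):>6} rules without ipset, "
          f"{rules_with_ipset(n, n):>3} with ipset")
```

Under these assumptions 175 instances in one group mean 30625 expanded rules versus 175 with ipset, and every "Security group member updated" event forces the larger rule set to be regenerated.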

Changed in neutron:
assignee: Sudhakar Gariganti (sudhakar-gariganti) → nobody
Changed in neutron (Ubuntu):
assignee: Sudhakar Gariganti (sudhakar-gariganti) → nobody
Ryan Moats (rmoats) wrote :

This report has gone ~1 year without activity. Marking it as Incomplete to start the 60-day cleanup timer; if this issue is still valid, please update the defect.

Changed in neutron:
status: Confirmed → Incomplete
Changed in neutron (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for neutron (Ubuntu) because there has been no activity for 60 days.]

Changed in neutron (Ubuntu):
status: Incomplete → Expired
Launchpad Janitor (janitor) wrote :

[Expired for neutron because there has been no activity for 60 days.]

Changed in neutron:
status: Incomplete → Expired