Port creation is slow due to reconnections in Rabbit

Bug #1410797 reported by Ilya Shakhat
36
This bug affects 6 people
Affects Status Importance Assigned to Milestone
Mirantis OpenStack
Fix Released
High
Eugene Nikanorov
5.0.x
Invalid
High
MOS Oslo
5.1.x
Invalid
High
MOS Oslo
6.0.x
Fix Released
High
Alexey Khivin
6.1.x
Fix Released
High
Eugene Nikanorov

Bug Description

Neutron port creation takes 12 seconds, out of this 10 seconds are spent by Rabbit waiting for reconnection. Steps to reproduce:
 0. Have freshly installed MOS
 1. Do "neutron port-create net04"

While the command itself works fast the overall process takes ~12 sec (on virtual environment). There are 2 message-related warnings in neutron-server.log:
 * Failed to publish message to topic 'q-agent-notifier-security_group-update': [Errno 32] Broken pipe
 * Failed to publish message to topic 'reply_6c04138e198b4259b2dc8f5dd6bec697': [Errno 32] Broken pipe

Revision history for this message
Ilya Shakhat (shakhat) wrote :

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.0"
  api: "1.0"
  build_number: "56"
  build_id: "2014-12-18_01-32-01"
  astute_sha: "16b252d93be6aaa73030b8100cf8c5ca6a970a91"
  fuellib_sha: "73332192a257ea02c40a39885c502ad1ebdf3eda"
  ostf_sha: "a9afb68710d809570460c29d6c3293219d3624d4"
  nailgun_sha: "5f91157daa6798ff522ca9f6d34e7e135f150a90"
  fuelmain_sha: "45caacadb878abfbd9d60e134d72229698b469c9"

DEPLOYMENT: HA + Ubuntu + no services

(.venv)developer@fuel:stack$ fuel node
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---|----------|------------------|---------|-----------|-------------------|------------|---------------|--------|---------
7 | ready | Untitled (42:eb) | 2 | 10.20.1.9 | 4e:2a:0d:fd:7d:45 | compute | | True | 2
6 | ready | Untitled (f0:b6) | 2 | 10.20.1.6 | ba:3d:63:41:ae:47 | controller | | True | 2
5 | ready | Untitled (14:c5) | 2 | 10.20.1.5 | 96:f2:db:f9:0c:41 | controller | | True | 2

Revision history for this message
Ilya Shakhat (shakhat) wrote :
Download full text (3.3 KiB)

The timeline of the issue is following:

node-5
---------
12:49:00.085 20403 INFO neutron.wsgi [-] (20403) accepted ('192.168.0.2', 59527) -- request from the client is accepted
12:49:00.333 20403 ERROR oslo.messaging._drivers.impl_rabbit [req-d1ffc34f-0920-4471-8138-25317c81ee40 ] Failed to publish message to topic 'q-agent-notifier-security_group-update': [Errno 32] Broken pipe -- server tries to notify the agent on node-7
12:49:00.343 20403 INFO oslo.messaging._drivers.impl_rabbit [req-d1ffc34f-0920-4471-8138-25317c81ee40 ] Delaying reconnect for 5.0 seconds ...

~~~~ 5 seconds ~~~~~~

12:49:05.366 20403 INFO oslo.messaging._drivers.impl_rabbit [req-d1ffc34f-0920-4471-8138-25317c81ee40 ] Connected to AMQP server on 127.0.0.1:5673

node-7
---------
12:49:05.374 556 DEBUG neutron.agent.securitygroups_rpc [req-d1ffc34f-0920-4471-8138-25317c81ee40 None] Security group member updated on remote: [u'78067271-205f-4204-a896-9ed4de2ab3b1'] security_groups_member_updated /usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py:134 -- got message from the server
12:49:05.376 556 DEBUG neutron.agent.securitygroups_rpc [req-d1ffc34f-0920-4471-8138-25317c81ee40 None] Adding [u'e1259852-4a2b-4b67-9f9f-9a8f598e44a1'] devices to the list of devices for which firewall needs to be refreshed _security_group_updated /usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py:246

waiting for the next rpc_loop

12:49:07.036 556 DEBUG neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-73968cf7-74ca-4c73-b734-05000a5255d0 None] Agent rpc_loop - iteration:4057 - port information retrieved. Elapsed:0.412 rpc_loop /usr/lib/python2.7/dist-packages/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py:1396
12:49:07.040 556 INFO neutron.agent.securitygroups_rpc [req-73968cf7-74ca-4c73-b734-05000a5255d0 None] Refresh firewall rules
12:49:07.041 556 DEBUG neutron.agent.securitygroups_rpc [req-73968cf7-74ca-4c73-b734-05000a5255d0 None] Get security group information for devices via rpc set([u'e1259852-4a2b-4b67-9f9f-9a8f598e44a1']) security_group_info_for_devices /usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py:96

node-6
---------
12:49:07.056 16444 DEBUG neutron.plugins.ml2.db [req-73968cf7-74ca-4c73-b734-05000a5255d0 None] get_port_and_sgs() called for port_id e1259852-4a2b-4b67-9f9f-9a8f598e44a1 get_port_and_sgs /usr/lib/python2.7/dist-packages/neutron/plugins/ml2/db.py:212
12:49:07.308 16444 ERROR oslo.messaging._drivers.impl_rabbit [-] Failed to publish message to topic 'reply_6c04138e198b4259b2dc8f5dd6bec697': [Errno 32] Broken pipe
12:49:07.325 16444 INFO oslo.messaging._drivers.impl_rabbit [-] Delaying reconnect for 5.0 seconds ...

~~~~ 5 seconds ~~~~~~

12:49:12.349 16444 INFO oslo.messaging._drivers.impl_rabbit [-] Connected to AMQP server on 192.168.0.3:5673

node-7
---------
12:49:12.358 556 DEBUG neutron.agent.securitygroups_rpc [req-73968cf7-74ca-4c73-b734-05000a5255d0 None] Update port filter for e1259852-4a2b-4b67-9f9f-9a8f598e44a1 refresh_firewall /usr/lib/python2.7/dist-packages/neutron/agent/securitygroups_rpc.py:291
12:49:12.844 556 DEBUG neutron.plugins.openvswitch.agent.ovs_neutron_agent [req-73968...

Read more...

description: updated
tags: added: neutron oslo
Changed in mos:
milestone: none → 6.1
importance: Undecided → Medium
Revision history for this message
Ilya Shakhat (shakhat) wrote :
Ilya Shakhat (shakhat)
Changed in mos:
assignee: nobody → MOS Oslo (mos-oslo)
Revision history for this message
Ilya Shakhat (shakhat) wrote :
Changed in mos:
importance: Medium → High
Revision history for this message
Ilya Pekelny (i159) wrote :

We are building a simple stress testing tool for RabbitMQ and Oslo Messaging RabbitMQ driver. The analysis and fixing processes of the bug will be much easier with the tool. If we could postpone fixing in a few days, maybe around a week, we will test the situation with our tool. Now we have lack of analysis tools and we have no proofs that the problem is in Rabbit itself or in Messaging or in the environment.

Revision history for this message
Alexander Ignatov (aignatov) wrote :

Any updates on this issue?

Revision history for this message
Ilya Shakhat (shakhat) wrote :

Any update after a month of investigations?

Revision history for this message
Dmitry Savenkov (dsavenkov) wrote :

It's very likely to be the case that it won't be fixed by any of 6.0.x releases due to a number of systemic issues that we hope to resolve when there are obtained some results of stress-testing.

tags: added: heartbeats
Revision history for this message
Alexander Ignatov (aignatov) wrote :

This is a problem in 6.x releases only with unworking olso.messaging heartbeats with Neutron.

Revision history for this message
OSCI Robot (oscirobot) wrote :

package neutron has been built for project openstack/neutron
Package version == 2014.2.2, package release == fuel6.1.mira18.git.2d2b62c.77552d2

Changeset: https://review.fuel-infra.org/4768
project: openstack/neutron
branch: openstack-ci/fuel-6.1/2014.2
author: Eugene Nikanorov
committer: Eugene Nikanorov
subject: Start rabbitmq driver heartbeat thread after a fork
status: patchset-created

Files placed on repository:

NOTE: Changeset is not merged, created temporary package repository.
 repository URL: /centos-fuel-6.1-stable-4768/

Revision history for this message
OSCI Robot (oscirobot) wrote :

package neutron has been built for project openstack/neutron
Package version == 2014.2.2, package release == fuel6.1~mira16+git.2d2b62c.77552d2

Changeset: https://review.fuel-infra.org/4768
project: openstack/neutron
branch: openstack-ci/fuel-6.1/2014.2
author: Eugene Nikanorov
committer: Eugene Nikanorov
subject: Start rabbitmq driver heartbeat thread after a fork
status: patchset-created

Files placed on repository:

NOTE: Changeset is not merged, created temporary package repository.
 repository URL: /trusty-fuel-6.1-stable-4768/

Revision history for this message
Timur Nurlygayanov (tnurlygayanov) wrote :

we have finished testing of MOS ISO with the custom fix for Neutron heartbeats (from Evgeny Nikonorov) and it works well under the manual destructive tests, looks like we are ready to merge the fix into MOS 6.1.

By the way, heartbeats will be disabled by default in MOS 6.1, but if users will face some issues on HA environments with Neutron and RabbitMQ, they can enable heartbeats and it will probably help to fix some issues and debug them.
(in MOS 6.0 heartbeats were enabled, but don't work for Neutron components and Neutron team found the way to fix it in 6.1).

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Shouldn't we just reduce the kombu reconnect delay from 5 seconds? Why disabling the heartbeats instead?

Revision history for this message
Alexander Ignatov (aignatov) wrote :

Bogdan this issue is not related to kombu_reconnect_delay parameter at all. Issue is that Neutron can't work with current rabbit heartbeats implementation. That's why port creation and other Neutron api calls were very slow.

Revision history for this message
Kristina Berezovskaia (kkuznetsova) wrote :
Download full text (4.7 KiB)

Verify 6.1

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  openstack_version: "2014.2.2-6.1"
  api: "1.0"
  build_number: "395"
  build_id: "2015-05-08_11-08-49"
  nailgun_sha: "46f55c293e4540d31bcaa6ca3fba77235fb27537"
  python-fuelclient_sha: "af6c9c3799b9ec107bcdc6dbf035cafc034526ce"
  astute_sha: "6a4dcd11c67af2917815f3678fb594c7412a4c97"
  fuel-library_sha: "f385d6a58298c702f8d4f14c452dcffdc0b1e2a3"
  fuel-ostf_sha: "740ded337bb2a8a9b3d505026652512257375c01"
  fuelmain_sha: "3eca5e8f7ca6a83faff5feeca92c21cff31c0af1"

ubuntu, neutron + vlan, 3 controllers, 1 compute

Steps:
1) Deploy env
2) Create test_network
3) time port-create test_network

time neutron port-create test_net
Created a new port:
+-----------------------+--------------------------------------+
| Field | Value |
+-----------------------+--------------------------------------+
| admin_state_up | True |
| allowed_address_pairs | |
| binding:host_id | |
| binding:profile | {} |
| binding:vif_details | {} |
| binding:vif_type | unbound |
| binding:vnic_type | normal |
| device_id | |
| device_owner | |
| fixed_ips | |
| id | baeee48f-00fa-43e8-ae59-4d994c654d78 |
| mac_address | fa:16:3e:ab:5c:06 |
| name | |
| network_id | efc0b265-7a74-4522-aa45-ceffefc5f202 |
| security_groups | f68728aa-e5be-4c71-a4cc-f523781dd30d |
| status | DOWN |
| tenant_id | 9b5f71a86aa64c5599dcbd522d65d8c0 |
+-----------------------+--------------------------------------+

real 0m1.824s
user 0m0.594s
sys 0m0.171s
4) time neutron port-create net04 --allowed-address-pairs type=dict list=true mac_address=fa:16:3e:48:10:a1,ip_address=10.100.0.0/16
Created a new port:
+-----------------------+--------------------------------------------------------------------------------------+
| Field | Value |
+-----------------------+--------------------------------------------------------------------------------------+
| admin_state_up | True |
| allowed_address_pairs | {"ip_address": "10.100.0.0/16", "mac_address": "fa:16:3e:48:10:a1"} |
| binding:host_id | |
| binding:profile | {} |
| binding:vif_details | {} ...

Read more...

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Eugene Nikanorov <email address hidden>
Review: https://review.fuel-infra.org/8031

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/neutron (openstack-ci/fuel-7.0/2015.1.0)

Change abandoned by Eugene Nikanorov <email address hidden> on branch: openstack-ci/fuel-7.0/2015.1.0
Review: https://review.fuel-infra.org/8031
Reason: not applicable with current oslo.messaging

Revision history for this message
René Schultz Madsen (p-rm-l) wrote :

Is there any progress on fixing this for 6.0?

Any patches that can be tested?

I'm having the same issue for a long time now.

Revision history for this message
Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 6.0.x and nominated for 6.0-updates to include into maint updates for 6.0

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/neutron (openstack-ci/fuel-6.0-updates/2014.2)

Fix proposed to branch: openstack-ci/fuel-6.0-updates/2014.2
Change author: Eugene Nikanorov <email address hidden>
Review: https://review.fuel-infra.org/11439

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/neutron (openstack-ci/fuel-6.0-updates/2014.2)

Reviewed: https://review.fuel-infra.org/11439
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-6.0-updates/2014.2

Commit: 64af864bd8e7a2219eea60ed408987226519953d
Author: Eugene Nikanorov <email address hidden>
Date: Fri Sep 11 09:45:54 2015

Start rabbitmq driver heartbeat thread after a fork

Do it after child process's eventlet has been reset

Closes-Bug: #1410797
Change-Id: I5c7197b4c46b05fffcece0a70646b2f92d04d5b0
(cherry picked from commit 2d2b62ca4d4a083f294dd8ce8dc9c35d90ce5681)

Revision history for this message
Vadim Rovachev (vrovachev) wrote :

Verified on 6.0 Ubuntu:
Packets:
neutron-common,neutron-dhcp-agent,neutron-l3-agent,neutron-metadata-agent,neutron-plugin-ml2,neutron-plugin-openvswitch-agent,neutron-server
Version:
1:2014.2-fuel6.0~mira27

Revision history for this message
Vadim Rovachev (vrovachev) wrote :

Verified on 6.0 Ubuntu:
Packets:
neutron-common,neutron-dhcp-agent,neutron-l3-agent,neutron-metadata-agent,neutron-plugin-ml2,neutron-plugin-openvswitch-agent,neutron-server
Version:
1:2014.2-fuel6.0~mira28

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.