Rally tests failed with "NeutronClientException: <html><body><h1>504 Gateway Time-out</h1>"

Bug #1398777 reported by Sergey Galkin
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Mirantis OpenStack
Invalid
High
Ilya Pekelny

Bug Description

astute_sha: 1da516b88d1a8d0014d78ab0d796e5b08379a59b
auth_required: true
build_id: 2014-11-30_11-15-26
build_number: '24'
feature_groups:
- mirantis
fuellib_sha: bbf26b499bf47ca41302ba6f62c3ebc5a493013d
fuelmain_sha: f324b592399c544eace2f64cb499564da01ab38c
nailgun_sha: 58e5f47457a0e832c005ce350e01b75a0c01b90a
ostf_sha: dc66fd39d4d035bb972e4c0225591290593c459d
production: docker
release: '6.0'

Steps to reproduce
1. Deploy cluster with 3 controllers in HA with 97 compute nodes on Ubuntu with cinder + neiutron gre
2. Run rally scenario
{
  "kw": {
    "runner": {
      "type": "constant",
      "concurrency": 20,
      "times": 97
    },
    "args": {
      "network_create_args": {},
      "ports_per_network": 50,
      "port_create_args": {}
    },
    "context": {
      "quotas": {
        "neutron": {
          "network": -1,
          "port": -1
        }
      },
      "users": {
        "project_domain": "default",
        "users_per_tenant": 2,
        "user_domain": "default",
        "tenants": 2,
        "resource_management_workers": 30
      }
    }
  },
  "name": "NeutronNetworks.create_and_delete_ports",
  "pos": 0
}

During the last run there were 3 failures in Neutron create_and_delete_ports [97 iterations, 20 threads]. All failures have the same symptoms: "NeutronClientException: <html><body><h1>504 Gateway Time-out</h1>" meaning that haproxy terminated the request due to timeout.

The issue is caused by failure in connection to rabbit. Neutron tried to restore the connection, but it took 87 seconds to do this. Haproxy waited only 60 seconds and replied the client with 504 time out.

Log details:
~~~~~~~~~
1. Rally sends request to delete the port:
REQ: curl -i http://172.16.44.16:9696//v2.0/ports/2a64353b-4c56-4798-993f-31dc89ca0ca8.json -X DELETE -H "X-Auth-Token: 0ecdccedee894651af6a30a542219f89
" -H "User-Agent: python-neutronclient"
 http_log_req /opt/stack/.venv/lib/python2.7/site-packages/neutronclient/common/utils.py:140
2014-12-03 05:23:30.172 21874 ERROR rally.benchmark.runners.base [-] <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base Traceback (most recent call last):
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/benchmark/runners/base.py",
line 73, in _run_scenario_once
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base method_name)(**kwargs) or scenario_output
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/benchmark/scenarios/neutron/
network.py", line 293, in create_and_delete_ports
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base self._delete_port(port)
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/benchmark/scenarios/base.py"
, line 256, in func_atomic_actions
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base f = func(self, *args, **kwargs)
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/benchmark/scenarios/neutron/
utils.py", line 248, in _delete_port
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base self.clients("neutron").delete_port(port['port']['id'])
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py", lin
e 98, in with_params
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base ret = self.function(instance, *args, **kwargs)
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py", lin
e 332, in delete_port
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base return self.delete(self.port_path % (port))
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 1316, in delete
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base headers=headers, params=params)
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 1297, in retry_request
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base headers=headers, params=params)
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 1251, in do_request
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base self._handle_fault_response(status_code, replybody)
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 1216, in _handle_fault_response
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base exception_handler_v20(status_code, des_error_body)
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/neutronclient/v2_0/client.py", line 79, in exception_handler_v20
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base message=message)
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base NeutronClientException: <html><body><h1>504 Gateway Time-out</h1>
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base The server didn't respond in time.
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base </body></html>
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base
2014-12-03 05:23:30.172 21874 TRACE rally.benchmark.runners.base
2014-12-03 05:23:30.174 21874 INFO rally.benchmark.runners.base [-] Task 64177c4e-b85f-45c6-8d1d-a006a9a9fb12 | ITER: 36 END: Error NeutronClientException: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

2. Neutron fails to publish message into queue:
2014-12-03T05:22:30.964094+00:00 debug: 2014-12-03 05:22:30.966 8564 DEBUG neutron.plugins.ml2.plugin [req-3d4c5a65-c521-4c94-8183-b29122336bb9 None] Calling delete_port for acd1371c-7d20-40dd-9450-dddadd21473f owned by delete_port /usr/lib/python2.7/dist-packages/neutron/plugins/ml2/plugin.py:1018
2014-12-03T05:22:30.982134+00:00 err: 2014-12-03 05:22:30.983 8564 ERROR oslo.messaging._drivers.impl_rabbit [req-3d4c5a65-c521-4c94-8183-b29122336bb9] Failed to publish message to topic 'q-agent-notifier-security_group-update': [Errno 32] Broken pipe
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit Traceback (most recent call last):
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py",
 line 681, in ensure
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit return method()
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py",
 line 784, in _publish
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit publisher = cls(self.conf, self.channel, topic=topic, **kwargs)
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 410, in __init__
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit None, type='fanout', **options)
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 340, in __init__
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit self.reconnect(channel)
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/oslo/messaging/_drivers/impl_rabbit.py", line 348, in reconnect
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit routing_key=self.routing_key)
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 84, in __init__
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit self.revive(self._channel)
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 218, in revive
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit self.declare()
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/messaging.py", line 104, in declare
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit self.exchange.declare()
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/kombu/entity.py", line 166, in declare
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit nowait=nowait, passive=passive,
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/channel.py", line 613, in exchange_declare
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit self._send_method((40, 10), args)
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/abstract_channel.py", line 56, in _send_method
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit self.channel_id, method_sig, args, content,
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/method_framing.py", line 221, in write_method
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit write_frame(1, channel, payload)
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/amqp/transport.py", line 177, in write_frame
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit frame_type, channel, size, payload, 0xce,
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 357, in sendall
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit tail = self.send(data, flags)
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit File "/usr/lib/python2.7/dist-packages/eventlet/greenio.py", line 340, in send
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit total_sent += fd.send(data[total_sent:], flags)
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit error: [Errno 32] Broken pipe
2014-12-03 05:22:30.983 8564 TRACE oslo.messaging._drivers.impl_rabbit

3. Tries to reconnect:
2014-12-03T05:22:30.999167+00:00 info: 2014-12-03 05:22:30.989 8564 INFO oslo.messaging._drivers.impl_rabbit [req-3d4c5a65-c521-4c94-8183-b29122336bb9 ] Delaying reconnect for 5.0 seconds ...
2014-12-03T05:22:35.991066+00:00 info: 2014-12-03 05:22:35.992 8564 INFO oslo.messaging._drivers.impl_rabbit [req-3d4c5a65-c521-4c94-8183-b29122336bb9 ] Connecting to AMQP server on 127.0.0.1:5673

4. Haproxy timeout exceeded
2014-12-03T05:23:30.084202+00:00 info: 172.16.44.10:58698 [03/Dec/2014:05:22:30.085] neutron neutron/node-32 0/0/0/-1/60000 504 194 - - sH-- 17/17/17/8/0 0/0 "DELETE //v2.0/ports/acd1371c-7d20-40dd-9450-dddadd21473f.json HTTP/1.1"

5. Connection to rabbit restored, but haproxy already replied client with 504 error:
2014-12-03T05:23:57.802205+00:00 info: 2014-12-03 05:23:57.802 8564 INFO oslo.messaging._drivers.impl_rabbit [req-3d4c5a65-c521-4c94-8183-b29122336bb9 ] Connected to AMQP server on 127.0.0.1:5673
2014-12-03T05:23:57.821790+00:00 debug: 2014-12-03 05:23:57.820 8564 DEBUG neutron.policy [req-3d4c5a65-c521-4c94-8183-b29122336bb9 None] Enforcing rules: ['get_port:binding:host_id'] _build_match_rule /usr/lib/python2.7/dist-packages/neutron/policy.py:221
2014-12-03T05:23:57.821790+00:00 debug: 2014-12-03 05:23:57.821 8564 DEBUG neutron.policy [req-3d4c5a65-c521-4c94-8183-b29122336bb9 None] Enforcing rules: ['get_port:binding:profile'] _build_match_rule /usr/lib/python2.7/dist-packages/neutron/policy.py:221
2014-12-03T05:23:57.821790+00:00 debug: 2014-12-03 05:23:57.822 8564 DEBUG neutron.policy [req-3d4c5a65-c521-4c94-8183-b29122336bb9 None] Enforcing rules: ['get_port:binding:vif_details'] _build_match_rule /usr/lib/python2.7/dist-packages/neutron/policy.py:221
2014-12-03T05:23:57.821790+00:00 info: 2014-12-03 05:23:57.819 8572 INFO neutron.wsgi [req-e98983f1-681b-400a-9c14-1d38c4650ccf None] 172.16.44.16 - - [03/Dec/2014 05:23:57] "DELETE //v2.0/ports/159ab018-ddc4-4e64-8520-8dff27bcc112.json HTTP/1.1" 204 168 0.380729

Tags: scale
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Rabbit is considered to be the root cause, hence assigning to MOS Linux

Changed in mos:
assignee: nobody → MOS Linux (mos-linux)
Changed in mos:
milestone: none → 6.1
importance: Undecided → High
assignee: MOS Linux (mos-linux) → Pavel Boldin (pboldin)
Revision history for this message
Pavel Boldin (pboldin) wrote :

We require rabbit logs. Can you provide them?

Changed in mos:
status: New → Incomplete
assignee: Pavel Boldin (pboldin) → Sergey Galkin (sgalkin)
Sergey Galkin (sgalkin)
Changed in mos:
assignee: Sergey Galkin (sgalkin) → Aleksander Mogylchenko (amogylchenko)
Changed in mos:
assignee: Aleksander Mogylchenko (amogylchenko) → Ilya Pekelny (i159)
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

The bug is hanging in incomplete for more than a month, hence moving it to invalid

Changed in mos:
status: Incomplete → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.