nova can't create neutron security group

Bug #1427345 reported by Leontii Istomin
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Mirantis OpenStack
Invalid
High
Dina Belova

Bug Description

[root@fuel ~]# fuel --fuel-version
api: '1.0'
astute_sha: 16b252d93be6aaa73030b8100cf8c5ca6a970a91
auth_required: true
build_id: 2014-12-26_14-25-46
build_number: '58'
feature_groups:
- mirantis
fuellib_sha: fde8ba5e11a1acaf819d402c645c731af450aff0
fuelmain_sha: 81d38d6f2903b5a8b4bee79ca45a54b76c1361b8
nailgun_sha: 5f91157daa6798ff522ca9f6d34e7e135f150a90
ostf_sha: a9afb68710d809570460c29d6c3293219d3624d4
production: docker
release: '6.0'
release_versions:
  2014.2-6.0:
    VERSION:
      api: '1.0'
      astute_sha: 16b252d93be6aaa73030b8100cf8c5ca6a970a91
      build_id: 2014-12-26_14-25-46
      build_number: '58'
      feature_groups:
      - mirantis
      fuellib_sha: fde8ba5e11a1acaf819d402c645c731af450aff0
      fuelmain_sha: 81d38d6f2903b5a8b4bee79ca45a54b76c1361b8
      nailgun_sha: 5f91157daa6798ff522ca9f6d34e7e135f150a90
      ostf_sha: a9afb68710d809570460c29d6c3293219d3624d4
      production: docker
      release: '6.0'

Baremetal,Ubuntu, HA, Neutron-gre,Ceph-all, Debug, nova-quotas,6.0_release
Controllers:3 Computes:96

deployment was successful, but during rally tests I I got 483 errors from 970 iterations:

REQ: curl -i 'http://172.16.44.16:8774/v2/4b047b6ac9c742dc82ae84f5c5715880/os-security-group-rules' -X POST -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Token: {SHA1}466c7
07bfbcc961dd44285e89fa894af53828955" -d '{"security_group_rule": {"from_port": 19, "ip_protocol": "tcp", "to_port": 19, "parent_group_id": "619b87e6-4b13-484b-af8e-02229381032e", "cidr": "0.0.0.0/0", "group_id": null}}'
RESP: [504] {'connection': 'close', 'content-type': 'text/html', 'cache-control': 'no-cache'}
RESP BODY: null

2015-02-27 13:44:57.845 12963 ERROR rally.benchmark.runners.base [-] Unknown Error (HTTP 504)
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base Traceback (most recent call last):
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/benchmark/runners/base.py", line 77, in _run_scenario_once
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base method_name)(**kwargs) or scenario_output
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/benchmark/scenarios/nova/security_group.py", line 51, in create_and_delete_secgroups
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base security_group_count)
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/rally/benchmark/scenarios/nova/utils.py", line 603, in _create_security_groups
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base sg_name)
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/novaclient/v1_1/security_groups.py", line 49, in create
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base return self._create('/os-security-groups', body, 'security_group')
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/novaclient/base.py", line 100, in _create
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base _resp, body = self.api.client.post(url, body=body)
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/novaclient/client.py", line 490, in post
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base return self._cs_request(url, 'POST', **kwargs)
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/novaclient/client.py", line 465, in _cs_request
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base resp, body = self._time_request(url, method, **kwargs)
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/novaclient/client.py", line 439, in _time_request
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base resp, body = self.request(url, method, **kwargs)
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base File "/opt/stack/.venv/lib/python2.7/site-packages/novaclient/client.py", line 433, in request
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base raise exceptions.from_response(resp, body, url, method)
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base ClientException: Unknown Error (HTTP 504)
2015-02-27 13:44:57.845 12963 TRACE rally.benchmark.runners.base
2015-02-27 13:44:57.847 12963 INFO rally.benchmark.runners.base [-] Task 417ade1f-4e6c-4ae6-a40a-a55b1049bb69 | ITER: 880 END: Error ClientException: Unknown Error (HTTP 504)

root@node-32:~# grep nova- /var/log/haproxy.log.2 | awk '$11 == 504' | wc -l
482

root@node-32:~# grep nova- /var/log/haproxy.log.2 | awk '$11 == 504' | tail
<134>Feb 27 13:44:25 node-32 haproxy[15009]: 172.16.44.10:48822 [27/Feb/2015:13:43:25.197] nova-api-2 nova-api-2/node-85 1/0/0/-1/60003 504 194 - - sH-- 732/9/9/9/0 0/0 "POST /v2/4b047b6ac9c742dc82ae84f5c5715880/os-security-group-rules HTTP/1.1"
<134>Feb 27 13:44:35 node-32 haproxy[15009]: 172.16.44.10:48975 [27/Feb/2015:13:43:35.895] nova-api-2 nova-api-2/node-85 0/0/0/-1/60000 504 194 - - sH-- 754/19/19/13/0 0/0 "POST /v2/a6dba437a6414f81a3f456a4642b5402/os-security-groups HTTP/1.1"
<134>Feb 27 13:44:36 node-32 haproxy[15009]: 172.16.44.10:48992 [27/Feb/2015:13:43:36.402] nova-api-2 nova-api-2/node-85 0/0/0/-1/60000 504 194 - - sH-- 754/19/19/11/0 0/0 "POST /v2/064fd0de2969481f97d553ef5ff8cb9c/os-security-groups HTTP/1.1"
<134>Feb 27 13:44:37 node-32 haproxy[15009]: 172.16.44.10:49016 [27/Feb/2015:13:43:37.221] nova-api-2 nova-api-2/node-85 0/0/0/-1/60000 504 194 - - sH-- 754/19/19/10/0 0/0 "POST /v2/5017f17004194497b095efdfad3455a0/os-security-groups HTTP/1.1"
<134>Feb 27 13:44:40 node-32 haproxy[15009]: 172.16.44.10:49065 [27/Feb/2015:13:43:40.232] nova-api-2 nova-api-2/node-85 0/0/0/-1/60000 504 194 - - sH-- 754/19/19/9/0 0/0 "POST /v2/064fd0de2969481f97d553ef5ff8cb9c/os-security-group-rules HTTP/1.1"
<134>Feb 27 13:44:42 node-32 haproxy[15009]: 172.16.44.10:49096 [27/Feb/2015:13:43:42.487] nova-api-2 nova-api-2/node-85 0/0/0/-1/60001 504 194 - - sH-- 729/5/5/4/0 0/0 "POST /v2/cd279d5c0a7a4b3883d8cc131bcd228a/os-security-groups HTTP/1.1"
<134>Feb 27 13:44:51 node-32 haproxy[15009]: 172.16.44.10:49228 [27/Feb/2015:13:43:51.557] nova-api-2 nova-api-2/node-85 0/0/1/-1/60002 504 194 - - sH-- 728/4/4/3/0 0/0 "POST /v2/5017f17004194497b095efdfad3455a0/os-security-group-rules HTTP/1.1"
<134>Feb 27 13:44:57 node-32 haproxy[15009]: 172.16.44.10:49329 [27/Feb/2015:13:43:57.841] nova-api-2 nova-api-2/node-85 0/0/0/-1/60001 504 194 - - sH-- 736/12/10/6/0 0/0 "POST /v2/228d1ea6058e4500925f18bf5c584cd6/os-security-groups HTTP/1.1"
<134>Feb 27 13:44:59 node-32 haproxy[15009]: 172.16.44.10:49362 [27/Feb/2015:13:43:59.045] nova-api-2 nova-api-2/node-85 0/0/0/-1/60000 504 194 - - sH-- 761/19/19/8/0 0/0 "POST /v2/064fd0de2969481f97d553ef5ff8cb9c/os-security-group-rules HTTP/1.1"
<134>Feb 27 13:45:06 node-32 haproxy[15009]: 172.16.44.10:49553 [27/Feb/2015:13:44:06.600] nova-api-2 nova-api-2/node-85 0/0/0/-1/60000 504 194 - - sH-- 763/19/19/5/0 0/0 "POST /v2/11aec4e9e43942039f96b8a195166267/os-security-group-rules HTTP/1.1"

DG is coming

Tags: nova scale
Revision history for this message
Leontii Istomin (listomin) wrote :
Changed in mos:
importance: Undecided → High
milestone: none → 6.1
status: New → Confirmed
tags: added: nova
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

This is really weird: from haproxy logs it's clear that nova-api doesn't respond in time, so haproxy closes a connection and returns 504 to a client. At the same time, there are no errors in nova-api logs. Moreover, there is no corresponding POST entry in nova-api log, like if the request never happened or, more precisely, never completed.

The interesting fact is that nova-api (which didn't respond in time) process has a lot of open FDs:

  root@node-200:~# lsof -p 11529 |grep "can't identify protocol" | wc -l
  625

and, curiously, memcache connections:

  root@node-200:~# lsof -p 11529 |grep ":11211" | wc -l
  171

My current understanding is that in some rare case eventlet green threads get stuck and never finish, which results in timeout on haproxy side.

Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

We haven't managed to reproduce it on virtualized environment since then. If you guys (scale team) will reproduce it, please ping us, so that we can debug this on the lab.

Changed in mos:
status: Confirmed → Incomplete
assignee: MOS Nova (mos-nova) → Dina Belova (dbelova)
Revision history for this message
Roman Podoliaka (rpodolyaka) wrote :

This looks like a sporadic failure, let's try to reproduce and fix it. And if we can't reproduce it again, it should probably not be a big deal.

Changed in mos:
status: Incomplete → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.