FAIL: tempest test_large_ops_scenario - failed to get to expected status. In ERROR state.

Bug #1232303 reported by Nachi Ueno
This bug affects 5 people
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Critical
Assigned to: Unassigned

Bug Description

http://logs.openstack.org/13/48513/2/check/check-tempest-devstack-vm-large-ops/8ce3344/console.html

2013-09-27 21:08:58.498 | ======================================================================
2013-09-27 21:08:58.498 | FAIL: tempest.scenario.test_large_ops.TestLargeOpsScenario.test_large_ops_scenario[compute,image]
2013-09-27 21:08:58.498 | tags: worker-0
2013-09-27 21:08:58.498 | ----------------------------------------------------------------------
2013-09-27 21:08:58.498 | Empty attachments:
2013-09-27 21:08:58.498 | stderr
2013-09-27 21:08:58.499 | stdout
2013-09-27 21:08:58.499 |
2013-09-27 21:08:58.499 | pythonlogging:'': {{{
2013-09-27 21:08:58.499 | 2013-09-27 21:04:54,743 Starting new HTTP connection (1): 127.0.0.1
2013-09-27 21:08:58.499 | 2013-09-27 21:04:54,864 Starting new HTTP connection (1): 127.0.0.1
2013-09-27 21:08:58.499 | }}}
2013-09-27 21:08:58.500 |
2013-09-27 21:08:58.500 | Traceback (most recent call last):
2013-09-27 21:08:58.500 | File "tempest/scenario/test_large_ops.py", line 105, in test_large_ops_scenario
2013-09-27 21:08:58.500 | self.nova_boot()
2013-09-27 21:08:58.500 | File "tempest/scenario/test_large_ops.py", line 98, in nova_boot
2013-09-27 21:08:58.500 | self._wait_for_server_status('ACTIVE')
2013-09-27 21:08:58.501 | File "tempest/scenario/test_large_ops.py", line 42, in _wait_for_server_status
2013-09-27 21:08:58.501 | self.compute_client.servers, server.id, status)
2013-09-27 21:08:58.501 | File "tempest/scenario/manager.py", line 290, in status_timeout
2013-09-27 21:08:58.501 | self._status_timeout(things, thing_id, expected_status=expected_status)
2013-09-27 21:08:58.501 | File "tempest/scenario/manager.py", line 338, in _status_timeout
2013-09-27 21:08:58.501 | self.config.compute.build_interval):
2013-09-27 21:08:58.502 | File "tempest/test.py", line 237, in call_until_true
2013-09-27 21:08:58.502 | if func():
2013-09-27 21:08:58.502 | File "tempest/scenario/manager.py", line 329, in check_status
2013-09-27 21:08:58.502 | raise exceptions.BuildErrorException(message)
2013-09-27 21:08:58.502 | BuildErrorException: Server %(server_id)s failed to build and is in ERROR status
2013-09-27 21:08:58.503 | Details: <Server: scenario-server-1949702596-c60de1b1-d03e-4214-b08b-458cb26bb7e8> failed to get to expected status. In ERROR state.
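
For anyone unfamiliar with the trace, the exception comes out of tempest's generic status polling. The snippet below is a rough, simplified sketch of that pattern (not the actual tempest code), assuming a novaclient-style servers manager with a get() call and a status attribute:

import time


class BuildErrorException(Exception):
    """Raised when the server lands in ERROR instead of the expected state."""


def wait_for_server_status(servers, server_id, expected_status='ACTIVE',
                           build_timeout=400, build_interval=3):
    # Poll until the server reaches expected_status, failing fast on ERROR
    # (the BuildErrorException in the trace above) and timing out otherwise.
    start = time.time()
    while time.time() - start < build_timeout:
        server = servers.get(server_id)
        if server.status == expected_status:
            return server
        if server.status == 'ERROR':
            raise BuildErrorException(
                'Server %s failed to build and is in ERROR status' % server_id)
        time.sleep(build_interval)
    raise RuntimeError('Timed out waiting for server %s to become %s'
                       % (server_id, expected_status))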

Matt Riedemann (mriedem) wrote : Re: FAIL: tempest.scenario.test_large_ops.TestLargeOpsScenario.test_large_ops_scenario[compute,image]

A little different but hitting a failure in the same test here:

http://logs.openstack.org/77/59277/2/check/gate-tempest-dsvm-neutron-large-ops/4150461/console.html

This time it's a timeout:

2013-12-03 03:56:03.327 | Traceback (most recent call last):
2013-12-03 03:56:03.327 | File "tempest/scenario/test_large_ops.py", line 105, in test_large_ops_scenario
2013-12-03 03:56:03.327 | self.nova_boot()
2013-12-03 03:56:03.328 | File "tempest/scenario/test_large_ops.py", line 98, in nova_boot
2013-12-03 03:56:03.328 | self._wait_for_server_status('ACTIVE')
2013-12-03 03:56:03.328 | File "tempest/scenario/test_large_ops.py", line 42, in _wait_for_server_status
2013-12-03 03:56:03.328 | self.compute_client.servers, server.id, status)
2013-12-03 03:56:03.328 | File "tempest/scenario/manager.py", line 304, in status_timeout
2013-12-03 03:56:03.328 | not_found_exception=not_found_exception)
2013-12-03 03:56:03.329 | File "tempest/scenario/manager.py", line 361, in _status_timeout
2013-12-03 03:56:03.329 | raise exceptions.TimeoutException(message)
2013-12-03 03:56:03.329 | TimeoutException: Request timed out
2013-12-03 03:56:03.329 | Details: Timed out waiting for thing dbfa26dd-b91c-4061-96f2-7b05dbe8705d to become ACTIVE

I noticed that this merged around the same time; not sure if it had any impact:

https://review.openstack.org/#/c/56260/

I do see this in the n-api log around the time of the failure:

2013-12-03 03:56:02.844 INFO nova.api.openstack.wsgi [req-1a7878d0-5a88-48c9-b804-0acd1fc40162 demo demo] HTTP exception thrown: 404-{u'NeutronError': {u'message': u'Security group 2eebf16f-5578-4a48-92bd-f6b1aeaba6dd does not exist', u'type': u'SecurityGroupNotFound', u'detail': u''}}

And this is in the q-svc log around that same time:

2013-12-03 03:56:02.839 1301 ERROR neutron.api.v2.resource [req-228780c3-ceca-48e7-a5f8-1ca7bd357619 6edcc50ab776423d91db1138830921ca e24c6992e2f0408c920ba65ef6e567ed] show failed
2013-12-03 03:56:02.839 1301 TRACE neutron.api.v2.resource Traceback (most recent call last):
2013-12-03 03:56:02.839 1301 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/api/v2/resource.py", line 84, in resource
2013-12-03 03:56:02.839 1301 TRACE neutron.api.v2.resource result = method(request=request, **args)
2013-12-03 03:56:02.839 1301 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/api/v2/base.py", line 296, in show
2013-12-03 03:56:02.839 1301 TRACE neutron.api.v2.resource parent_id=parent_id),
2013-12-03 03:56:02.839 1301 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/api/v2/base.py", line 264, in _item
2013-12-03 03:56:02.839 1301 TRACE neutron.api.v2.resource obj = obj_getter(request.context, id, **kwargs)
2013-12-03 03:56:02.839 1301 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/db/securitygroups_db.py", line 180, in get_security_group
2013-12-03 03:56:02.839 1301 TRACE neutron.api.v2.resource context, id), fields)
2013-12-03 03:56:02.839 1301 TRACE neutron.api.v2.resource File "/opt/stack/new/neutron/neutron/db/securitygroups_db.py", line 194, in _get_security_group
2013-12...


tags: added: network testing
Joe Gordon (jogo) wrote :

It looks like this bug is hitting both nova-network and neutron.

Joe Gordon (jogo) wrote :

This looks like the nova-network bug: http://logs.openstack.org/13/48513/2/check/check-tempest-devstack-vm-large-ops/8ce3344/logs/screen-n-cpu.txt.gz?level=TRACE#_2013-09-27_21_06_35_145

2013-09-27 21:06:35.145 23052 TRACE nova.compute.manager Timeout: Timeout while waiting on RPC response - topic: "network", RPC method: "allocate_for_instance" info: "<unknown>"
2013-09-27 21:06:35.145 23052 TRACE nova.compute.manager
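
For context, that trace is nova-compute giving up on a blocking RPC call to nova-network. A rough, hypothetical sketch of the pattern, written against oslo.messaging naming rather than the RPC library nova used at the time, and with placeholder arguments:

from oslo_config import cfg
import oslo_messaging as messaging

# Hypothetical sketch only: the topic matches the trace, but the argument
# names are placeholders, not nova's real RPC signature.
transport = messaging.get_rpc_transport(cfg.CONF)
target = messaging.Target(topic='network')
client = messaging.RPCClient(transport, target, timeout=60)

# call() blocks until the network service replies; when nova-network is too
# busy to answer within the timeout, a MessagingTimeout is raised, which is
# what surfaces as the Timeout in the n-cpu trace above.
nw_info = client.call({}, 'allocate_for_instance',
                      instance_uuid='placeholder-uuid')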

Joe Gordon (jogo) wrote :
summary: - FAIL:
- tempest.scenario.test_large_ops.TestLargeOpsScenario.test_large_ops_scenario[compute,image]
+ FAIL: tempest test_large_ops_scenario
Matt Riedemann (mriedem) wrote : Re: FAIL: tempest test_large_ops_scenario

Just hit it again here, but the patch author did a 'recheck no bug' :(

http://logs.openstack.org/25/54925/11/check/gate-tempest-dsvm-large-ops/03e2b4e/

Matt Riedemann (mriedem) wrote :

Wondering if this recent change to security groups in nova has anything to do with these failures:

https://review.openstack.org/#/c/47651/

Matt Riedemann (mriedem) wrote :

You'll also notice that the above patch itself failed this tempest job here:

http://logs.openstack.org/51/47651/10/check/gate-tempest-devstack-vm-neutron-large-ops/701402d/console.html.gz

But rather than investigating it or rechecking against a specific bug, it got a 'recheck no bug'...

Matt Riedemann (mriedem) wrote :

Never mind; it looks more like 11/20 is when this started showing up. I changed my logstash query to go back to 11/17, and the hits start on 11/20.

Matt Riedemann (mriedem) wrote :

This neutron patch was merged on 11/20, but I thought that one was meant to fix a different gate issue: https://review.openstack.org/#/c/57290/

This neutron patch also merged on 11/20 and relies on minimize_polling=True, which was enabled by the previous patch the same day: https://review.openstack.org/57475

Looking at Tempest commits for 11/20, not much related happened that day; it's when we were disabling test_create_backup and the nova V3 API tests to try to get a handle on bug 1251920. That's also when this nova patch was merged: https://review.openstack.org/#/c/57509/

Davanum Srinivas (DIMS) (dims-v) wrote :

Please see https://bugs.launchpad.net/nova/+bug/1254890, which is a separate bug for the "Timed out waiting for thing" fingerprint from comment #2.

Davanum Srinivas (DIMS) (dims-v) wrote :

jog0,

the following query has only one hit in the last 7 days:

"failed to get to expected status. In ERROR state." AND build_name:"gate-tempest-devstack-vm-neutron-large-ops"

summary: - FAIL: tempest test_large_ops_scenario
+ FAIL: tempest test_large_ops_scenario - failed to get to expected
+ status. In ERROR state.
Davanum Srinivas (DIMS) (dims-v) wrote :

Let's leave this bug for "failed to get to expected status. In ERROR state." and use bug 1254890 for "Timed out waiting for thing".

-- dims

Sean Dague (sdague)
no longer affects: tempest
Matt Riedemann (mriedem) wrote :

If we're only going to keep this one for gate-tempest-dsvm-large-ops, then we should remove neutron and mark it critical for nova, because it's puking on nova-network:

http://logs.openstack.org/58/57358/7/check/gate-tempest-dsvm-large-ops/19f1639/logs/screen-n-cpu.txt.gz?level=TRACE#_2014-01-08_20_34_33_226

no longer affects: neutron
Changed in nova:
importance: Undecided → Critical
tags: added: gate-failure
Joe Gordon (jogo)
Changed in nova:
status: New → Confirmed
milestone: none → icehouse-2