tempest.thirdparty.boto tests fail with TearDownException -- Invalid bug report

Bug #1182679 reported by melanie witt
This bug affects 14 people
Affects   Status    Importance   Assigned to      Milestone
neutron   Invalid   Undecided    Unassigned
tempest   Invalid   High         Attila Fazekas

Bug Description

Excerpt of log:

======================================================================
ERROR: test suite for <class 'tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest'>
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/suite.py", line 227, in run
    self.tearDown()
  File "/usr/lib/python2.7/dist-packages/nose/suite.py", line 350, in tearDown
    self.teardownContext(ancestor)
  File "/usr/lib/python2.7/dist-packages/nose/suite.py", line 366, in teardownContext
    try_run(context, names)
  File "/usr/lib/python2.7/dist-packages/nose/util.py", line 478, in try_run
    return func()
  File "/opt/stack/new/tempest/tempest/thirdparty/boto/test.py", line 263, in tearDownClass
    raise exceptions.TearDownException(num=fail_count)
TearDownException: %(num)d cleanUp operation failed
Details: 2 cleanUp operation failed

Full log:

http://logs.openstack.org/29469/3/check/gate-tempest-devstack-vm-quantum/24106/console.html

Tags: ec2
melanie witt (melwitt)
description: updated
Sean Dague (sdague)
Changed in tempest:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Attila Fazekas (afazekas)
Revision history for this message
Attila Fazekas (afazekas) wrote :

The volume got stuck in the 'creating' status.
Adding nova and cinder as affected components.

tags: added: ec2
Revision history for this message
Attila Fazekas (afazekas) wrote :

2013-05-21 21:32:49.739 | ======================================================================
2013-05-21 21:32:49.739 | FAIL: tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest.test_integration_1[gate,smoke]
2013-05-21 21:32:49.739 | ----------------------------------------------------------------------
2013-05-21 21:32:49.739 | _StringException: Traceback (most recent call last):
2013-05-21 21:32:49.739 | File "/opt/stack/new/tempest/tempest/thirdparty/boto/test_ec2_instance_run.py", line 180, in test_integration_1
2013-05-21 21:32:49.739 | self.assertInstanceStateWait(instance, "running")
2013-05-21 21:32:49.739 | File "/opt/stack/new/tempest/tempest/thirdparty/boto/test.py", line 349, in assertInstanceStateWait
2013-05-21 21:32:49.739 | state = self.waitInstanceState(lfunction, wait_for)
2013-05-21 21:32:49.739 | File "/opt/stack/new/tempest/tempest/thirdparty/boto/test.py", line 334, in waitInstanceState
2013-05-21 21:32:49.739 | self.valid_instance_state)
2013-05-21 21:32:49.739 | File "/opt/stack/new/tempest/tempest/thirdparty/boto/test.py", line 324, in state_wait_gone
2013-05-21 21:32:49.739 | state = state_wait(lfunction, final_set, valid_set)
2013-05-21 21:32:49.740 | File "/opt/stack/new/tempest/tempest/thirdparty/boto/utils/wait.py", line 58, in state_wait
2013-05-21 21:32:49.740 | final_set, status))
2013-05-21 21:32:49.740 | AssertionError: State change timeout exceeded!(400s) While waitingfor set(['running', '_GONE']) at "pending"

Revision history for this message
Attila Fazekas (afazekas) wrote :

n-api

2013-05-21 21:32:29.882 ERROR nova.api.ec2 [req-1bb3e9a2-285a-4631-84ca-34b2b4ffa746 demo demo] Unexpected error raised: Invalid input received: Invalid volume: Volume status must be available or error
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 Traceback (most recent call last):
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 File "/opt/stack/new/nova/nova/api/ec2/__init__.py", line 485, in __call__
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 result = api_request.invoke(context)
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 File "/opt/stack/new/nova/nova/api/ec2/apirequest.py", line 78, in invoke
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 result = method(context, **args)
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 File "/opt/stack/new/nova/nova/api/ec2/cloud.py", line 861, in delete_volume
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 self.volume_api.delete(context, volume_id)
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 File "/opt/stack/new/nova/nova/volume/cinder.py", line 181, in wrapper
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 res = method(self, ctx, volume_id, *args, **kwargs)
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 File "/opt/stack/new/nova/nova/volume/cinder.py", line 313, in delete
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 cinderclient(context).volumes.delete(volume_id)
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 File "/opt/stack/new/python-cinderclient/cinderclient/v1/volumes.py", line 189, in delete
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 self._delete("/volumes/%s" % base.getid(volume))
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 File "/opt/stack/new/python-cinderclient/cinderclient/base.py", line 158, in _delete
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 resp, body = self.api.client.delete(url)
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 File "/opt/stack/new/python-cinderclient/cinderclient/client.py", line 205, in delete
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 return self._cs_request(url, 'DELETE', **kwargs)
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 File "/opt/stack/new/python-cinderclient/cinderclient/client.py", line 164, in _cs_request
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 **kwargs)
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 File "/opt/stack/new/python-cinderclient/cinderclient/client.py", line 147, in request
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 raise exceptions.from_response(resp, body)
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2 InvalidInput: Invalid input received: Invalid volume: Volume status must be available or error
2013-05-21 21:32:29.882 19783 TRACE nova.api.ec2
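
The trace shows nova's EC2 layer calling cinder's delete while the volume is still in a transitional ('creating') state; cinder only accepts a delete when the volume status is 'available' or 'error'. A minimal sketch of a cleanup that polls before deleting, using classic boto as the thirdparty tests do (the helper name and timeouts are hypothetical, not tempest code):

    import time

    DELETABLE = ('available', 'error')

    def delete_volume_when_ready(ec2_conn, volume, timeout=400, interval=10):
        """Poll the volume status until cinder would accept the delete."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            volume.update()  # boto refreshes volume.status from the EC2 API
            if volume.status in DELETABLE:
                return ec2_conn.delete_volume(volume.id)
            time.sleep(interval)
        raise AssertionError('volume %s stuck in %r' % (volume.id, volume.status))

Note that in this bug the volume apparently never leaves 'creating', so even a polling cleanup would time out; the point is only that the InvalidInput above is a symptom, not the fault.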

Revision history for this message
Attila Fazekas (afazekas) wrote :

http://logs.openstack.org/29469/3/check/gate-tempest-devstack-vm-quantum/24106/logs/screen-c-vol.txt.gz

After
 2013-05-21 21:12:15 DEBUG [cinder.volume.drivers.lvm] Updating volume status

there is no further 'Updating volume status' message, even though two volume creation commands were issued.

c-vol does not log anything after 2013-05-21 21:25:41.
The periodic task is expected to run frequently (every 10 seconds).

2013-05-21 21:06:28 DEBUG [cinder.manager] Running periodic task VolumeManager._report_driver_status
2013-05-21 21:06:28 INFO [cinder.volume.manager] Updating volume status
2013-05-21 21:06:28 DEBUG [cinder.volume.drivers.lvm] Updating volume status
2013-05-21 21:06:28 DEBUG [cinder.utils] Running cmd (subprocess): sudo cinder-rootwrap /etc/cinder/rootwrap.conf vgs --noheadings --nosuffix --unit=G -o name,size,free stack-volumes
2013-05-21 21:06:38 DEBUG [cinder.manager] Running periodic task VolumeManager._publish_service_capabilities
2013-05-21 21:06:38 DEBUG [cinder.manager] Notifying Schedulers of capabilities ...
2013-05-21 21:06:38 DEBUG [cinder.openstack.common.rpc.amqp] Making asynchronous fanout cast...
2013-05-21 21:06:38 DEBUG [cinder.openstack.common.rpc.amqp] UNIQUE_ID is aab394557d6e4886a20e53fe26df7182.

Before the silence, log messages of the above type appeared every 10 seconds.

The cinder volume process was expected to be alive until 2013-05-21 21:33.

That is roughly 8 minutes of silence, instead of a log entry every 10 seconds.
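
A gap like this is easy to confirm mechanically. A rough sketch (the file name and threshold are illustrative, not from the report) that parses the leading timestamps in a c-vol log and flags any silence longer than expected:

    from datetime import datetime

    FMT = '%Y-%m-%d %H:%M:%S'

    def find_gaps(path, threshold_sec=60):
        prev = None
        with open(path) as log:
            for line in log:
                try:
                    ts = datetime.strptime(line[:19], FMT)
                except ValueError:
                    continue  # line does not start with a timestamp
                if prev is not None and (ts - prev).total_seconds() > threshold_sec:
                    print('gap: %s -> %s' % (prev, ts))
                prev = ts

    find_gaps('screen-c-vol.txt')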

Revision history for this message
Sean Dague (sdague) wrote :

Attila, this is currently the #1 cause of gate resets, with 55 rechecks on it in the last week. Can you make it a priority to get to the bottom of this?

Changed in tempest:
importance: Medium → Critical
Revision history for this message
Attila Fazekas (afazekas) wrote :

YES

Revision history for this message
Chmouel Boudjnah (chmouel) wrote :

How does that affect swift?

Revision history for this message
Attila Fazekas (afazekas) wrote :

Where are the logs for the other 55 rechecks?

Revision history for this message
Jeremy Stanley (fungi) wrote :

http://status.openstack.org/rechecks is a good place to start. You can also query http://logstash.openstack.org/

Revision history for this message
Attila Fazekas (afazekas) wrote :

These are multiple different situations.

Not all of them are related to volumes.

The TearDownException is not the root cause.
It exists so that teardown attempts to delete everything and reports all of the failures, instead of leaking resources after the first one.

It points to multiple other exceptions in the log.

When you search the log for 'Traceback (most recent call last):', the second hit is usually the first real exception.

I'll try to find the real offender type.
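
The pattern described above looks roughly like this (a sketch of the idea, not the actual tempest/thirdparty/boto/test.py code): every queued cleanup runs, failures are counted and logged rather than stopping the loop, and one summary exception is raised at the end, which is why the root cause is always an earlier exception in the log:

    import logging

    LOG = logging.getLogger(__name__)

    class TearDownException(Exception):
        pass

    def run_cleanups(cleanups):
        fail_count = 0
        for func, args, kwargs in reversed(cleanups):
            try:
                func(*args, **kwargs)
            except Exception:
                fail_count += 1
                LOG.exception('cleanUp operation failed')  # the real failure lands here
        if fail_count:
            raise TearDownException('%d cleanUp operation failed' % fail_count)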

Revision history for this message
Attila Fazekas (afazekas) wrote :
Revision history for this message
Attila Fazekas (afazekas) wrote :

Many recheck issues are not related to this issue at all, some are not even ec2 related; they are not correctly categorized here.

Mostly the instance vs. security group issues are visible, and cinder behaves strangely.

The common factors are:
- the quantum SMOKE job
- they are related to security groups

tempest.scenario.test_minimum_basic.TestMinimumBasicScenario.test_minimum_basic_scenario is the closest nova test in functionality.
It has servers, volumes and security groups, but it looks like it does not add the security group to the server; it changes the default security group.
NOT SMOKE

tempest/scenario/test_network_basic_ops.py
Adds a security group, but it does not have a volume.
NOT SMOKE

tempest.api.compute.security_groups.test_security_groups.SecurityGroupsTestJSON.test_server_security_groups
Adds a security group without rules to the server.
NOT SMOKE

The volume test (tempest.thirdparty.boto.test_ec2_volumes.EC2VolumesTest.test_create_get_delete) fails after
tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest.test_integration_1.

I did not find any other test case with the 'smoke' attribute which touches the security groups.
Only the ec2 test does it as SMOKE.

On the tempest side we could wait between several steps, but where is the missing wait? (See the sketch below.)
1. create a security group (no wait)
2. add rules (no wait)
3. start a server with the security group (hangs with a timeout before reaching the ACTIVE state)
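
For reference, the three steps in classic boto, as the thirdparty tests use it (the connection setup, group name and image ID are placeholders):

    from boto.ec2.connection import EC2Connection

    # Credentials/endpoint are hypothetical; tempest reads them from its config.
    conn = EC2Connection(aws_access_key_id='...', aws_secret_access_key='...')

    group = conn.create_security_group('bug1182679', 'repro group')    # step 1, no wait
    group.authorize(ip_protocol='tcp', from_port=22, to_port=22,
                    cidr_ip='0.0.0.0/0')                               # step 2, no wait
    reservation = conn.run_instances('ami-00000001',
                                     security_groups=[group.name])     # step 3, hangs in 'pending'
    instance = reservation.instances[0]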

The meaning of this line:
AssertionError: State change timeout exceeded!(400s) While waitingfor set(['running', '_GONE']) at "pending"

The special _GONE state means the listing no longer shows the resource.
'running' is the nova ACTIVE status.
'pending' is a general transitional state; in this case it is the nova BUILD status.
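
In other words, the waiter polls the EC2 state until it reaches one of the acceptable final states, and gives up after the timeout. A minimal sketch of that semantics (a hypothetical helper, not the tempest wait.py code):

    import time

    def state_wait(get_status, final_set, timeout=400, interval=5):
        start = time.time()
        status = get_status()            # e.g. 'pending' (nova BUILD)
        while status not in final_set:   # e.g. {'running', '_GONE'}
            if time.time() - start > timeout:
                raise AssertionError('State change timeout exceeded! (%ds) '
                                     'while waiting for %s at %r'
                                     % (timeout, sorted(final_set), status))
            time.sleep(interval)
            status = get_status()
        return status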

Revision history for this message
Attila Fazekas (afazekas) wrote :

Most likely it is Critical in quantum.
Most likely it is Invalid in tempest and cinder.
It can be a nova bug as well.

Changed in tempest:
status: Triaged → Incomplete
Revision history for this message
Giulio Fidente (gfidente) wrote :

I submitted a simple stress test for cinder, but it doesn't seem to be showing anything bad on that side.

Instead, the only thing I'm noticing is that this _only_ seems to fail on the quantum VM.

no longer affects: cinder
no longer affects: swift
Revision history for this message
Attila Fazekas (afazekas) wrote :

tempest.thirdparty.boto.test_ec2_volumes.EC2VolumesTest.test_create_get_delete does not FAIL on its own, even after 6097 representative runs.

After a continuous run,
tempest.thirdparty.boto.test_ec2_instance_run.InstanceRunTest.setUpClass can fail in cls.ec2_client.register_image even on a non-quantum system.
It is a nova-objectstore -> glance related issue.

Glance is not responsive: the endpoint's port 9292 seems open, but there is no response.

Revision history for this message
Attila Fazekas (afazekas) wrote :

So far I have not been able to reproduce the exact issue.

The glance issue will be handled as a separate issue.
The test case is skipped for the time being.

Another possible issue: this test uploads and registers images to nova-objectstore -> glance -> swift before the server starts.

Changed in tempest:
importance: Critical → High
Revision history for this message
Attila Fazekas (afazekas) wrote :

Now, https://bugs.launchpad.net/tempest/+bug/1205344 is the blocker for re-enabling this test case.

Revision history for this message
Joe Gordon (jogo) wrote :

This bug has been confused for https://bugs.launchpad.net/nova/+bug/1223161

Revision history for this message
ZhiQiang Fan (aji-zqfan) wrote :
Revision history for this message
Joe Gordon (jogo) wrote :

There are too many reasons why a test can fail in TearDown, and those cases are covered by more specific bugs. Marking this as invalid.

no longer affects: nova
Changed in neutron:
status: New → Invalid
Changed in tempest:
status: Incomplete → Invalid
summary: - tempest.thirdparty.boto tests fail with TearDownException
+ tempest.thirdparty.boto tests fail with TearDownException -- Invalid
+ bug report