tempest.api.compute.servers.test_server_rescue.ServerRescueTestJSON.test_rescued_vm_attach_volume is nondeterministic

Bug #1240728 reported by James E. Blair
This bug affects 9 people
Affects                    Status    Importance  Assigned to  Milestone
Cinder                     Invalid   High        Unassigned
OpenStack Compute (nova)   Invalid   High        Unassigned

Bug Description

Traceback (most recent call last):
  File "tempest/api/compute/servers/test_server_rescue.py", line 111, in _unrescue
    self.servers_client.wait_for_server_status(server_id, 'ACTIVE')
  File "tempest/services/compute/json/servers_client.py", line 156, in wait_for_server_status
    return waiters.wait_for_server_status(self, server_id, status)
  File "tempest/common/waiters.py", line 80, in wait_for_server_status
    raise exceptions.TimeoutException(message)
TimeoutException: Request timed out
Details: Server 802897a6-6793-4af2-9d84-8750be518380 failed to reach ACTIVE status within the required time (400 s). Current status: SHUTOFF.
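
For context on the traceback: the waiter that raises this exception polls the compute API until the server reaches the expected status or a deadline passes, then fails with the last observed status (here SHUTOFF instead of ACTIVE). The following is a minimal sketch of that polling pattern, not the actual tempest implementation; the client object and its show_server call are assumptions for illustration:

import time


class TimeoutException(Exception):
    pass


def wait_for_server_status(client, server_id, status, timeout=400, interval=1):
    # Poll until the server reports the expected status, or give up after
    # `timeout` seconds and raise with whatever status was seen last.
    start = time.time()
    while True:
        current = client.show_server(server_id)['server']['status']
        if current == status:
            return
        if time.time() - start >= timeout:
            raise TimeoutException(
                'Server %s failed to reach %s status within the required '
                'time (%s s). Current status: %s.'
                % (server_id, status, timeout, current))
        time.sleep(interval)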

Sample failure: http://logs.openstack.org/51/52151/1/gate/gate-tempest-devstack-vm-full/6b393f5/

Basic query for the failure string:

http://logstash.openstack.org/#eyJzZWFyY2giOiJAbWVzc2FnZTpcIkZBSUw6IHRlbXBlc3QuYXBpLmNvbXB1dGUuc2VydmVycy50ZXN0X3NlcnZlcl9yZXNjdWUuU2VydmVyUmVzY3VlVGVzdEpTT04udGVzdF9yZXNjdWVkX3ZtX2F0dGFjaF92b2x1bWVcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiYWxsIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM4MTk2MTIyMjkwMSwibW9kZSI6IiIsImFuYWx5emVfZmllbGQiOiIifQ==

James E. Blair (corvus)
description: updated
Giulio Fidente (gfidente) wrote :

This seems to affect other tests operating on rescued VMs too, such as test_rescued_vm_associate_dissociate_floating_ip.

see http://logs.openstack.org/52/50052/4/check/check-tempest-devstack-vm-full/605e680/console.html

John Griffith (john-griffith) wrote :

I'm not seeing anything to indicate a failure from Cinder. The only thing I'm seeing is the timeout waiting for the instance to become available, here:
http://logs.openstack.org/51/52151/1/gate/gate-tempest-devstack-vm-full/6b393f5/logs/screen-n-cpu.txt.gz#_2013-10-16_19_53_06_176

The "attach failures" in the cinder API log are from the test_attach_volumes_with_nonexistent_volume_id

Changed in cinder:
status: New → Invalid
Matt Riedemann (mriedem) wrote :

This was merged in tempest on 10/14:

https://github.com/openstack/tempest/commit/da30650ff6857c0807a56d8f00e6156909862966

The test in question is a negative test involving multiple servers in the setup of the test class, so if this started showing up around that time, that commit might be a suspect. The suspicion for bug 1251920 is that the root cause is parallel tests in Tempest not cleaning up after themselves properly (not waiting for resources to be removed before the next tests start), and since the Tempest commit above is about re-using servers for negative tests, it makes me wonder if the two are related.

Looking in the n-cpu log for this failure, there are 436 occurrences of 802897a6-6793-4af2-9d84-8750be518380, so the server is being re-used across the tests. Toward the end, when it fails, I'm seeing this:

2013-10-16 19:50:08.327 DEBUG nova.virt.libvirt.driver [req-0653871c-9d92-43af-a5e8-65f5001a6838 ServerRescueTestJSON433538631-user ServerRescueTestJSON433538631-tenant] [instance: 802897a6-6793-4af2-9d84-8750be518380] Could not determine fibre channel world wide port names get_volume_connector /opt/stack/new/nova/nova/virt/libvirt/driver.py:1006

"Could not determine fibre channel world wide port names" shows up 12 times in the n-cpu log, I'm not sure if that's normal or not.

Matt Riedemann (mriedem)
tags: added: libvirt testing volumes
Sean Dague (sdague)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Changed in cinder:
importance: Undecided → High
status: Invalid → Confirmed
Sean Dague (sdague) wrote :

It's really unclear to me whether a tempest change is needed here. This seems to be another one of the state race bugs in OpenStack services where the resource doesn't reach its intended state. Tempest is doing exactly its job by exposing these.

Changed in tempest:
status: New → Incomplete
Sean Dague (sdague)
no longer affects: tempest
Joe Gordon (jogo) wrote :

No hits on this bug, and it hasn't been updated in some time. If it still exists, we are probably tracking it via another bug.

Changed in nova:
status: Confirmed → Incomplete
Sean Dague (sdague)
Changed in nova:
status: Incomplete → Invalid
Boris Pavlovic (boris-42) wrote :

Here is a way to reproduce it in the gate:

https://review.openstack.org/#/c/137885/

*Read the commit message for clarification*

OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by Boris Pavlovic (<email address hidden>) on branch: master
Review: https://review.openstack.org/137885
Reason: No more need in this patch

Sean McGinnis (sean-mcginnis) wrote :

Doesn't appear to be valid for cinder anymore.

Changed in cinder:
status: Confirmed → Invalid