tempest.api.compute.servers.test_server_rescue.ServerRescueTestJSON.test_rescued_vm_attach_volume is nondeterministic

Bug #1240728 reported by James E. Blair
This bug affects 9 people
Affects                    Status    Importance  Assigned to  Milestone
Cinder                     Invalid   High        Unassigned
OpenStack Compute (nova)   Invalid   High        Unassigned

Bug Description

Traceback (most recent call last):
  File "tempest/api/compute/servers/test_server_rescue.py", line 111, in _unrescue
    self.servers_client.wait_for_server_status(server_id, 'ACTIVE')
  File "tempest/services/compute/json/servers_client.py", line 156, in wait_for_server_status
    return waiters.wait_for_server_status(self, server_id, status)
  File "tempest/common/waiters.py", line 80, in wait_for_server_status
    raise exceptions.TimeoutException(message)
TimeoutException: Request timed out
Details: Server 802897a6-6793-4af2-9d84-8750be518380 failed to reach ACTIVE status within the required time (400 s). Current status: SHUTOFF.
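
For context on the traceback: the waiter that raises this exception polls the compute API until the server reaches the expected status or a deadline passes, then fails with the last observed status (here SHUTOFF instead of ACTIVE). The following is a minimal sketch of that polling pattern, not the actual tempest implementation; the client object and its show_server call are assumptions for illustration:

import time


class TimeoutException(Exception):
    pass


def wait_for_server_status(client, server_id, status, timeout=400, interval=1):
    # Poll until the server reports the expected status, or give up after
    # `timeout` seconds and raise with whatever status was seen last.
    start = time.time()
    while True:
        current = client.show_server(server_id)['server']['status']
        if current == status:
            return
        if time.time() - start >= timeout:
            raise TimeoutException(
                'Server %s failed to reach %s status within the required '
                'time (%s s). Current status: %s.'
                % (server_id, status, timeout, current))
        time.sleep(interval)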

Sample failure: http://logs.openstack.org/51/52151/1/gate/gate-tempest-devstack-vm-full/6b393f5/

Basic query for the failure string:

http://logstash.openstack.org/#eyJzZWFyY2giOiJAbWVzc2FnZTpcIkZBSUw6IHRlbXBlc3QuYXBpLmNvbXB1dGUuc2VydmVycy50ZXN0X3NlcnZlcl9yZXNjdWUuU2VydmVyUmVzY3VlVGVzdEpTT04udGVzdF9yZXNjdWVkX3ZtX2F0dGFjaF92b2x1bWVcIiIsImZpZWxkcyI6W10sIm9mZnNldCI6MCwidGltZWZyYW1lIjoiYWxsIiwiZ3JhcGhtb2RlIjoiY291bnQiLCJ0aW1lIjp7InVzZXJfaW50ZXJ2YWwiOjB9LCJzdGFtcCI6MTM4MTk2MTIyMjkwMSwibW9kZSI6IiIsImFuYWx5emVfZmllbGQiOiIifQ==

James E. Blair (corvus)
description: updated
Giulio Fidente (gfidente) wrote :

This seems to affect other tests operating on rescued VMs too, such as test_rescued_vm_associate_dissociate_floating_ip.

see http://logs.openstack.org/52/50052/4/check/check-tempest-devstack-vm-full/605e680/console.html

John Griffith (john-griffith) wrote :

I'm not seeing anything to indicate a failure from Cinder. The only thing I'm seeing is the timeout waiting for the instance to become available, here:
http://logs.openstack.org/51/52151/1/gate/gate-tempest-devstack-vm-full/6b393f5/logs/screen-n-cpu.txt.gz#_2013-10-16_19_53_06_176

The "attach failures" in the cinder API log are from the test_attach_volumes_with_nonexistent_volume_id

Changed in cinder:
status: New → Invalid
Matt Riedemann (mriedem) wrote :

This was merged in tempest on 10/14:

https://github.com/openstack/tempest/commit/da30650ff6857c0807a56d8f00e6156909862966

The test in question is a negative test involving multiple servers in the setup of the test class, so if this started showing up around that time, that commit might be a suspect. The suspicion for bug 1251920 is that the root cause is parallel tests in Tempest not cleaning up after themselves properly (not waiting for resources to be removed before the next tests start), and since the Tempest commit above is about re-using servers for negative tests, it makes me wonder if the two are related.

Looking in the n-cpu log for this failure, there are 436 occurrences of 802897a6-6793-4af2-9d84-8750be518380, so the server is being re-used across the tests. Toward the end, when it fails, I'm seeing this:

2013-10-16 19:50:08.327 DEBUG nova.virt.libvirt.driver [req-0653871c-9d92-43af-a5e8-65f5001a6838 ServerRescueTestJSON433538631-user ServerRescueTestJSON433538631-tenant] [instance: 802897a6-6793-4af2-9d84-8750be518380] Could not determine fibre channel world wide port names get_volume_connector /opt/stack/new/nova/nova/virt/libvirt/driver.py:1006

"Could not determine fibre channel world wide port names" shows up 12 times in the n-cpu log, I'm not sure if that's normal or not.

Matt Riedemann (mriedem)
tags: added: libvirt testing volumes
Sean Dague (sdague)
Changed in nova:
status: New → Confirmed
importance: Undecided → High
Changed in cinder:
importance: Undecided → High
status: Invalid → Confirmed
Sean Dague (sdague) wrote :

It's really unclear to me whether a tempest change is needed here. This seems to be another one of the state race bugs in OpenStack services where the resource doesn't reach its intended state. Tempest is doing exactly its job by exposing these.

Changed in tempest:
status: New → Incomplete
Sean Dague (sdague)
no longer affects: tempest
Joe Gordon (jogo) wrote :

No hits on this bug, and it hasn't been updated in some time. If it still exists, we are probably tracking it via another bug.

Changed in nova:
status: Confirmed → Incomplete
Sean Dague (sdague)
Changed in nova:
status: Incomplete → Invalid
Boris Pavlovic (boris-42) wrote :

Here is a way to reproduce it in the gate:

https://review.openstack.org/#/c/137885/

*Read the commit message for clarification*

OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by Boris Pavlovic (<email address hidden>) on branch: master
Review: https://review.openstack.org/137885
Reason: No more need in this patch

Sean McGinnis (sean-mcginnis) wrote :

Doesn't appear to be valid for cinder anymore.

Changed in cinder:
status: Confirmed → Invalid