test_rescue_unrescue_instance fails intermittently

Bug #1191417 reported by Matt Riedemann
Affects: tempest
Status: In Progress
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Tempest compute test is failing intermittently in Jenkins jobs:

2013-06-15 21:14:11.748 | ======================================================================
2013-06-15 21:14:11.749 | ERROR: tempest.api.compute.servers.test_server_rescue.ServerRescueTestJSON.test_rescue_unrescue_instance[gate,smoke]
2013-06-15 21:14:11.749 | ----------------------------------------------------------------------
2013-06-15 21:14:11.750 | _StringException: Traceback (most recent call last):
2013-06-15 21:14:11.750 | File "/opt/stack/new/tempest/tempest/api/compute/servers/test_server_rescue.py", line 118, in test_rescue_unrescue_instance
2013-06-15 21:14:11.750 | self.servers_client.wait_for_server_status(self.server_id, 'RESCUE')
2013-06-15 21:14:11.750 | File "/opt/stack/new/tempest/tempest/services/compute/json/servers_client.py", line 174, in wait_for_server_status
2013-06-15 21:14:11.750 | raise exceptions.TimeoutException(message)
2013-06-15 21:14:11.750 | TimeoutException: Request timed out
2013-06-15 21:14:11.750 | Details: Server aaa6a580-2818-42d8-887e-59e1ef364664 failed to reach RESCUE status within the required time (400 s). Current status: ACTIVE.

http://logs.openstack.org/33171/1/check/gate-tempest-devstack-vm-quantum/31018/console.html
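For context, the failing call follows the standard status-polling pattern. Below is a minimal, self-contained sketch of that pattern, not tempest's actual implementation; the function and parameter names here are hypothetical, and the error message is modeled on the one in the traceback above:

```python
import time


class TimeoutException(Exception):
    """Raised when the server does not reach the target status in time."""


def wait_for_server_status(get_status, server_id, target_status,
                           timeout=400, interval=3, sleep=time.sleep):
    """Poll get_status(server_id) until it returns target_status.

    Raises TimeoutException carrying the last observed status, mirroring
    the "failed to reach RESCUE status" error seen in the console log.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(server_id)
        if status == target_status:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutException(
                "Server %s failed to reach %s status within the required "
                "time (%s s). Current status: %s."
                % (server_id, target_status, timeout, status))
        sleep(interval)
```

With a server that never leaves ACTIVE (as in this failure), the loop exhausts the timeout and raises with "Current status: ACTIVE", which is exactly the shape of the error above.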

Matt Riedemann (mriedem)
affects: nova → tempest
Revision history for this message
Attila Fazekas (afazekas) wrote :

We have several fault types that only happen on the quantum gate.
For example, the cinder volume service stops working: https://bugs.launchpad.net/tempest/+bug/1182679

Please think about what could be blocked by the network (iptables, ebtables, virt filtering, a temporarily incorrect interface configuration) that could cause this kind of issue.

The nova status/task_state duality is not handled correctly on the tempest side, but that seems like a different issue.
https://bugs.launchpad.net/tempest/+bug/1170118
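On the duality mentioned above: Nova exposes both a coarse server status and a transient task_state, and a waiter that checks only status can declare an operation settled while a task is still in flight. A hedged sketch of a check that considers both (the server dict shape follows Nova's API responses; the helper itself is hypothetical, not tempest code):

```python
def server_is_settled(server, target_status):
    """Return True only when the server has reached target_status AND Nova
    reports no transient task (task_state is None).

    Assumes a server dict shaped like a Nova API response, where the
    extended task state is exposed as 'OS-EXT-STS:task_state'.
    """
    return (server.get('status') == target_status and
            server.get('OS-EXT-STS:task_state') is None)
```

For example, a server with status ACTIVE but task_state 'rescuing' is mid-transition and should not yet be treated as having finished the rescue operation.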

Revision history for this message
Matt Riedemann (mriedem) wrote :

I did a recheck on this bug and the tests passed:

https://review.openstack.org/#/c/33171/

The only changes in this patch are to the powervm virt driver in nova, which tempest isn't exercising in these gate jobs, so I doubt this failure is caused by my change.

Revision history for this message
Attila Fazekas (afazekas) wrote :

I do not think so either.

The quantum jobs are now generally unstable; for example:
https://bugs.launchpad.net/nova/+bug/1185834
This could be a nova-side issue as well.

Revision history for this message
Sean Dague (sdague) wrote :

An important difference between the quantum jobs and the nova-network jobs is that one runs the smoke suite and the other runs the full suite, so the tests have different timings. That has tripped up cinder quota issues in the past.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/33500

Changed in tempest:
assignee: nobody → Attila Fazekas (afazekas)
status: New → In Progress
Revision history for this message
Hans Lindgren (hanlind) wrote :

This is in fact a duplicate of bug 1185834 (there are KeyErrors for floating IPs reported in the n-cpu log).

This is one of several ways this bug is shown in console.log.

- For spawning of new instances, the failure causes an abort and the instance is set to ERROR state (see original bug 1185834).
- For rescue operations, the end result is a timeout while the instance stays in RESCUE state (see bug 1186261).
- For delete operations, something similar with a timeout, see bug 1187916.

Changed in tempest:
assignee: Attila Fazekas (afazekas) → nobody
