test_stamp_pattern timing out waiting for attached device to show up in guest

Bug #1664793 reported by Matt Riedemann
This bug affects 5 people
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Confirmed     Undecided   Unassigned
tempest                   Fix Released  Undecided   Attila Fazekas

Bug Description

The Tempest scenario test TestStampPattern was unskipped on 2/13:

https://review.openstack.org/#/c/431800/

Since then the ceph jobs have been failing, e.g.:

http://logs.openstack.org/25/433825/1/check/gate-tempest-dsvm-full-devstack-plugin-ceph-ubuntu-trusty/4a58a2e/console.html#_2017-02-14_23_40_42_737153

2017-02-14 23:40:42.736898 | tempest.scenario.test_stamp_pattern.TestStampPattern.test_stamp_pattern[compute,id-10fd234a-515c-41e5-b092-8323060598c5,image,network,volume]
2017-02-14 23:40:42.736948 | ---------------------------------------------------------------------------------------------------------------------------------------------
2017-02-14 23:40:42.736959 |
2017-02-14 23:40:42.736975 | Captured traceback:
2017-02-14 23:40:42.736992 | ~~~~~~~~~~~~~~~~~~~
2017-02-14 23:40:42.737013 | Traceback (most recent call last):
2017-02-14 23:40:42.737037 | File "tempest/test.py", line 103, in wrapper
2017-02-14 23:40:42.737061 | return f(self, *func_args, **func_kwargs)
2017-02-14 23:40:42.737094 | File "tempest/scenario/test_stamp_pattern.py", line 112, in test_stamp_pattern
2017-02-14 23:40:42.737115 | keypair['private_key'])
2017-02-14 23:40:42.737153 | File "tempest/scenario/test_stamp_pattern.py", line 89, in _wait_for_volume_available_on_the_system
2017-02-14 23:40:42.737176 | raise lib_exc.TimeoutException
2017-02-14 23:40:42.737203 | tempest.lib.exceptions.TimeoutException: Request timed out
2017-02-14 23:40:42.737218 | Details: None

They fail while waiting for the attached volume to show up on the guest, and it never does.
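
For anyone not familiar with the failing helper: the wait is essentially a poll-until-deadline loop. Below is a minimal sketch of that pattern, not the actual tempest code. The names `wait_for_device`, `list_devices` and the local `TimeoutException` are hypothetical and only stand in for whatever the test really uses.

```python
import time


class TimeoutException(Exception):
    """Local stand-in for tempest.lib.exceptions.TimeoutException."""


def wait_for_device(list_devices, device_name, timeout=60.0, interval=1.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll list_devices() until device_name appears or the timeout expires.

    list_devices is a hypothetical callable that returns the block device
    names currently visible in the guest (for example, read over SSH).
    """
    deadline = clock() + timeout
    while True:
        if device_name in list_devices():
            return True
        if clock() >= deadline:
            raise TimeoutException(
                "%s did not appear within %s seconds" % (device_name, timeout))
        sleep(interval)
```

If the guest never sees the device, or the test looks for the wrong name, a loop like this can only end in the timeout shown in the traceback.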

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22_wait_for_volume_available_on_the_system%5C%22%20AND%20tags%3A%5C%22console%5C%22%20AND%20build_name%3A*ceph*&from=7d

There are only 11 hits in the ceph jobs in the last 24 hours, but that's still pretty high. The ceph xenial job, which runs on newton, ocata and master, is non-voting, so people probably aren't noticing. The ceph trusty job, however, runs on stable/mitaka, is voting, and is failing.

Revision history for this message
Matt Riedemann (mriedem) wrote :

We can probably pull some debug code out of this older patch I had:

https://review.openstack.org/#/c/218355/23/tempest/scenario/test_stamp_pattern.py

Also, in that change I'm trying to be smarter about the device name used for the volume attachment rather than rely on the tempest.conf since the libvirt driver in nova completely ignores the device name passed in when attaching the volume.
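
One way to avoid depending on a configured device name is to compare the guest's block devices before and after the attach. This is only a sketch of that idea and not necessarily what the linked change does. It assumes the test can capture the text of /proc/partitions from the guest; `parse_partitions` and `newly_attached` are hypothetical helper names.

```python
def parse_partitions(text):
    """Return the set of device names listed in /proc/partitions output."""
    names = set()
    for line in text.splitlines():
        fields = line.split()
        # Data rows have four columns: major, minor, #blocks, name.
        # The header row also has four fields but does not start with a digit.
        if len(fields) == 4 and fields[0].isdigit():
            names.add(fields[3])
    return names


def newly_attached(before_text, after_text):
    """Return device names present after the attach but not before, sorted."""
    return sorted(parse_partitions(after_text) - parse_partitions(before_text))
```

The test would then wait for `newly_attached(...)` to become non-empty instead of waiting for a fixed name from tempest.conf.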

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

FYI: This issue also hits Quobyte CIs (e.g. http://78.46.57.153:8081/refs-changes-07-433707-1/console.log.out)

Silvan Kaiser (2-silvan)
tags: added: quobyte
Revision history for this message
Matt Riedemann (mriedem) wrote :

This isn't just in ceph jobs.

summary: - test_stamp_pattern failing in ceph jobs
+ test_stamp_pattern timing out waiting for attached device to show up in
+ guest
Revision history for this message
Matt Riedemann (mriedem) wrote :

I also wonder if we could be hitting something related to bug 1633236 where the detach fails.

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

Note: As Matthew pointed out in the duplicate bug #1665053, this seems to be a long-standing issue.

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

Further note: I found this issue in more third-party CIs (Tintri): http://openstack-ci.tintri.com/tintri/refs-changes-60-434660-2/

So definitely a more general issue.

Revision history for this message
Mikhail S Medvedev (msmedved) wrote :

IBM PowerKVM CI was affected, with the same signature. We found that it was due to the wrong attach device name. After correcting the name, the test no longer times out. In our case we had to set it explicitly to sdb. In tempest.conf:

  [compute]
  volume_device_name = sdb

Revision history for this message
Silvan Kaiser (2-silvan) wrote :

FYI: Setting volume_device_name explicitly yielded no improvement in my tests with Quobyte.

Revision history for this message
Michal Ptacek (michalx-ptacek) wrote :

FYI: the hint about explicitly setting volume_device_name didn't work for us on intel-nfv-ci. From what I observed, the volume is in state "attaching", then goes to "in use", and then disappears. I didn't find any error or warning in the cinder logs that might be related.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tempest (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/437379

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tempest (master)

Change abandoned by Masayuki Igawa (<email address hidden>) on branch: master
Review: https://review.openstack.org/437379
Reason: oops, thanks

Revision history for this message
chandan kumar (chkumar246) wrote :

Related fix https://review.openstack.org/#/c/437216/ to skip this test until the bug is fixed.

Changed in tempest:
status: New → Confirmed
tags: added: upstream-gate
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/218355

Changed in tempest:
assignee: nobody → Attila Fazekas (afazekas)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master)

Reviewed: https://review.opendev.org/615434
Committed: https://git.openstack.org/cgit/openstack/tempest/commit/?id=ba18426fd990fad19f429e0aa1673f549f2c77e8
Submitter: Zuul
Branch: master

commit ba18426fd990fad19f429e0aa1673f549f2c77e8
Author: Attila Fazekas <email address hidden>
Date: Sun Nov 4 13:54:30 2018 +0100

    Unskip test_stamp_pattern

    test_stamp_pattern had issues before because the test attached volumes
    while the VM was in a state in which it does not detect hotplug events.

    This change has the test ssh into the machine first;
    alternatively, a PCI rescan could be forced.

    Notes:
    https://docs.google.com/presentation/d/1Im-iYVzroKwXKP23p12Q5vsUGdk2V26SPpLWF3I5dbA/edit#slide=id.p

    Closes-bug: #1664793
    Change-Id: Iaff1e01dd7ffab238ec73668ae4eee0683f70ffd
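
The ordering described in the commit message can be sketched roughly as follows. This is an illustration only, not the merged change. The function and all three callables are hypothetical and supplied by the caller. The optional rescan writes to the standard Linux sysfs path /sys/bus/pci/rescan, which requires root in the guest and assumes the volume is attached as a PCI device.

```python
def attach_after_guest_ready(wait_for_ssh, attach_volume, run_on_guest=None):
    """Attach a volume only after the guest answers over SSH.

    wait_for_ssh() blocks until a login succeeds, attach_volume() performs
    the attachment, and run_on_guest(cmd) runs a shell command in the guest.
    All three are hypothetical callables used for illustration.
    """
    # A successful SSH login is taken as a sign that the guest has booted
    # far enough to notice hotplug events.
    wait_for_ssh()
    attach_volume()
    if run_on_guest is not None:
        # Alternative mentioned in the commit message: force a PCI rescan.
        run_on_guest("echo 1 > /sys/bus/pci/rescan")
```

Waiting for SSH before the attach keeps the test free of guest-side commands that need root, which is presumably why it was preferred over the rescan.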

Changed in tempest:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tempest 22.0.0

This issue was fixed in the openstack/tempest 22.0.0 release.
