mkfs error in test_stamp_pattern

Bug #1205344 reported by Brian Elliott
48
This bug affects 10 people
Affects Status Importance Assigned to Milestone
Cinder
Fix Released
Medium
John Griffith
tempest
Fix Released
Medium
Paras Babbar

Bug Description

http://logs.openstack.org/09/38709/5/check/gate-tempest-devstack-vm-postgres-full/35583/console.html

Error during gate-tempest-devstack-vm-postgres-full:

======================================================================
2013-07-26 14:03:32.073 | ERROR: tempest.scenario.test_stamp_pattern.TestStampPattern.test_stamp_pattern
2013-07-26 14:03:32.073 | ----------------------------------------------------------------------
2013-07-26 14:03:32.110 | _StringException: Traceback (most recent call last):
2013-07-26 14:03:32.110 | File "/opt/stack/new/tempest/tempest/scenario/test_stamp_pattern.py", line 218, in test_stamp_pattern
2013-07-26 14:03:32.110 | self._create_timestamp(ip_for_server)
2013-07-26 14:03:32.110 | File "/opt/stack/new/tempest/tempest/scenario/test_stamp_pattern.py", line 188, in _create_timestamp
2013-07-26 14:03:32.110 | ssh_client.exec_command('sudo /usr/sbin/mkfs.ext4 /dev/vdb')
2013-07-26 14:03:32.110 | File "/opt/stack/new/tempest/tempest/common/ssh.py", line 138, in exec_command
2013-07-26 14:03:32.110 | strerror=''.join(err_data))
2013-07-26 14:03:32.110 | SSHExecCommandFailed: Command 'sudo /usr/sbin/mkfs.ext4 /dev/vdb', exit status: 1, Error:
2013-07-26 14:03:32.110 | mke2fs 1.42.2 (27-Mar-2012)
2013-07-26 14:03:32.110 | Could not stat /dev/vdb --- No such file or directory
2013-07-26 14:03:32.110 |
2013-07-26 14:03:32.110 | The device apparently does not exist; did you specify it correctly?

Revision history for this message
Sean Dague (sdague) wrote :
Changed in tempest:
status: New → Confirmed
importance: Undecided → Critical
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/38841

Changed in tempest:
assignee: nobody → Attila Fazekas (afazekas)
status: Confirmed → In Progress
Revision history for this message
Attila Fazekas (afazekas) wrote :

Looks like not a tempest issue.

The "SCSI Initiator over TCP/IP" appears in the syslog.txt.gz , but the volume not visible in the guest even after a longer wait.

Changed in tempest:
assignee: Attila Fazekas (afazekas) → nobody
status: In Progress → Incomplete
Revision history for this message
Attila Fazekas (afazekas) wrote :

http://logs.openstack.org/41/38841/2/gate/gate-tempest-devstack-vm-full/36336/

- The volume reported as 'in-use' (2013-07-27 03:38:42)
- The host system has a syslog entry about the iSCSI initiator attachment as sda (Jul 27 03:38:40)
- The volume not visible on the guest even after 400 sec wait (2013-07-27 03:38:54 .. 2013-07-27 03:45:33)

The issue cannot be fixed and/or workaround on tempest side.
My attempt was based on the guest system may need several ms to realize it has a new device.

This is very likely a nova issue or an issue in a component related to nova.

Revision history for this message
Sean Dague (sdague) wrote :

So this bug was exposed by a new test in tempest, that tries to verify that the volume can be attached to a guest. Turns out, in some low single digit set of cases, it can't. I think this is a real race in nova / cinder that needs fixing.

Changed in nova:
status: New → Confirmed
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/38923

Changed in tempest:
assignee: nobody → Sean Dague (sdague)
status: Incomplete → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tempest (master)

Reviewed: https://review.openstack.org/38923
Committed: http://github.com/openstack/tempest/commit/fe8a609481f87ad8ef4c29047c2c17e6b4cd3c2c
Submitter: Jenkins
Branch: master

commit fe8a609481f87ad8ef4c29047c2c17e6b4cd3c2c
Author: Sean Dague <email address hidden>
Date: Sat Jul 27 08:15:55 2013 -0400

    skip stamp test until race in nova can be debugged

    This test is causing a lot of gate resets, and as far as I can
    tell they are really nova races in volume attachments. However
    we should skip the test for now until these can be sorted out.

    Related to bug #1205344

    Change-Id: I983779841ac3d3f79442e96b37a0cc8da85ae55b

Changed in tempest:
status: In Progress → Fix Released
Revision history for this message
John Griffith (john-griffith) wrote :

Looking at this I'm wondering if the attach is possibly being made on a different dev? Regardless, this brings up an issue of not having any good logging info on the cinder-attach side. I'll improve the logging in Cinder's attach calls to help with this sort of thing and I'll look at this a bit more. We might be better off after improving the logging if we turn the test back on.

I'll update after I have some time to fix the logging and look at this a bit closer.

Revision history for this message
Sean Dague (sdague) wrote :

Great, thanks John.

jiji (zzww-666)
information type: Public → Public Security
Revision history for this message
Jeremy Stanley (fungi) wrote :

jiji, what's exploitable security vulnerability are you seeing here?

Revision history for this message
Thierry Carrez (ttx) wrote :

Setting back to public until some explanation is given.

information type: Public Security → Public
Joe Gordon (jogo)
Changed in nova:
milestone: none → havana-rc1
Joe Gordon (jogo)
Changed in nova:
importance: High → Critical
Changed in nova:
importance: Critical → Medium
Revision history for this message
David Kranz (david-kranz) wrote :

There seems to be a disagreement about what to do about this bug. Given that there are currently more than 100 nova bugs marked High or Critical, I think setting it to Medium means it will not get fixed any time soon. Russell, is that correct?

There is a proposed patch to stop skipping this test in tempset https://review.openstack.org/#/c/47434/1. I am not sure if I approve of that or not.

Revision history for this message
Russell Bryant (russellb) wrote :

Yes, unless someone takes a specific interest to it, I don't expect anyone to jump on it. Part of it is because it's not obvious what the problem is. It looks like there's still a good amount of work to do to figure out where the problem is.

Changed in nova:
milestone: havana-rc1 → none
tags: added: havana-rc-potential
tags: added: havana-rc-proposed
removed: havana-rc-potential
tags: added: havana-rc-potential
removed: havana-rc-proposed
Sean Dague (sdague)
Changed in tempest:
milestone: none → havana-3
Revision history for this message
Adam Spiers (adam.spiers) wrote :

For the convenience of future reference, after the skipping of this test was introduced as per #7 above, it was removed (i.e. "unskipped") on 2013/09/19 via:

    https://review.openstack.org/47434

and then re-introduced on 2013/10/01 by:

    https://review.openstack.org/49181

So at the time of writing this, the number of rechecks should not be increasing (IIUC).

Revision history for this message
Zhikun Liu (zhikunliu) wrote : AUTO: Zhi Kun ZK Liu is on vacation

I am out of the office until 10/07/2013.

I will take vacation from 28th Sep to 7th Oct . If have any urgent, please
call 13910806810

Note: This is an automated response to your message "[Bug 1205344] Re:
mkfs error in test_stamp_pattern" sent on 10/03/2013 23:44:07.

This is the only notification you will receive while this person is away.

Revision history for this message
John Griffith (john-griffith) wrote :

Root cause in Cinder should be same as #1226337, and addressed by: Change-Id: I3cabab0d214c051267638a627664df2b673236e3

Changed in cinder:
status: New → Fix Committed
Thierry Carrez (ttx)
Changed in cinder:
status: Fix Committed → Fix Released
tags: removed: havana-rc-potential
Revision history for this message
Attila Fazekas (afazekas) wrote :

The bug is not fully fixed, the test cannot be unskipped: https://review.openstack.org/#/c/62886/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tempest (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/152224

Revision history for this message
Matt Riedemann (mriedem) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tempest (master)

Change abandoned by Sean M. Collins (<email address hidden>) on branch: master
Review: https://review.openstack.org/196198

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.openstack.org/218355

Matt Riedemann (mriedem)
Changed in tempest:
status: Fix Released → In Progress
assignee: Sean Dague (sdague) → Matt Riedemann (mriedem)
importance: Critical → Medium
milestone: havana-3 → none
Changed in cinder:
assignee: nobody → John Griffith (john-griffith)
importance: Undecided → Medium
no longer affects: nova
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to cinder (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/218664

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to cinder (master)

Reviewed: https://review.openstack.org/218664
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=a661f96b72d88c80f75470551975c6ca761c963e
Submitter: Jenkins
Branch: master

commit a661f96b72d88c80f75470551975c6ca761c963e
Author: Matt Riedemann <email address hidden>
Date: Sun Aug 30 08:27:45 2015 -0700

    Add debug logging before attaching volume in driver

    It'd be helpful to log the volume/instance/mount/host
    information in attach_volume before calling the volume
    driver to do the attachment to know we have the right
    values.

    Change-Id: I8e1111e900032d19d5c46bad266b79b050b40851
    Related-Bug: #1205344

Revision history for this message
Benny Kopilov (bkopilov) wrote :
Download full text (5.1 KiB)

Hi ,
I managed to reproduce it with tempest run , attaching logs and timestamp .

Hi ,
I managed to reproduce this issue locally with tempest .

Here are the steps :
#1 Create a volume 1G size
#2 Create an instance from image
#3 attach the volume to the instance immediatly after server is up .
Nova triggers attach request to cinder.

From nova.api:
2015-11-18 06:58:04.631 get attach request

2015-11-18 06:58:04.631 3734 INFO nova.api.openstack.compute.contrib.volumes [req-fad984bb-317e-4ae3-b723-4554d0a7a318 72fe1c96402a48dc9e9279c281580021 8f0b9aaa1c084e8e94a27a3de81a5f73 - - -] Attach volume 2b7ae21e-4528-46eb-af21-a821b90aa6c1 to instance cbd09d10-d629-4d56-b15b-12b33a340ada at /dev/vdb
2015-11-18 06:58:05.133 3734 INFO nova.osapi_compute.wsgi.server [req-fad984bb-317e-4ae3-b723-4554d0a7a318 72fe1c96402a48dc9e9279c281580021 8f0b9aaa1c084e8e94a27a3de81a5f73 - - -] 192.168.100.1 "POST /v2/8f0b9aaa1c084e8e94a27a3de81a5f73/servers/cbd09d10-d629-4d56-b15b-12b33a340ada/os-volume_attachments HTTP/1.1" status: 200 len: 386 time: 0.5057290

From nova.compute logs :
2015-11-18 06:58:04.808 - attach request
2015-11-18 06:58:08.259 2334 - volume not found

2015-11-18 06:58:02.459 2334 INFO nova.compute.manager [req-e15172cf-5414-46cd-a187-984fc84b3fdc - - - - -] [instance: cbd09d10-d629-4d56-b15b-12b33a340ada] During sync_power_state the instance has a pending task (spawning). Skip.
2015-11-18 06:58:02.460 2334 INFO nova.compute.manager [req-e15172cf-5414-46cd-a187-984fc84b3fdc - - - - -] [instance: cbd09d10-d629-4d56-b15b-12b33a340ada] VM Resumed (Lifecycle Event)
2015-11-18 06:58:02.581 2334 INFO nova.compute.manager [-] [instance: cbd09d10-d629-4d56-b15b-12b33a340ada] During sync_power_state the instance has a pending task (spawning). Skip.
2015-11-18 06:58:02.770 2334 INFO nova.compute.resource_tracker [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Auditing locally available compute resources for node compute1
2015-11-18 06:58:03.080 2334 INFO nova.compute.resource_tracker [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Total usable vcpus: 2, total allocated vcpus: 1
2015-11-18 06:58:03.081 2334 INFO nova.compute.resource_tracker [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Final resource view: name=compute1 phys_ram=3791MB used_ram=1024MB phys_disk=6GB used_disk=1GB total_vcpus=2 used_vcpus=1 pci_stats=<nova.pci.stats.PciDeviceStats object at 0x5063e10>
2015-11-18 06:58:03.104 2334 INFO nova.scheduler.client.report [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Compute_service record updated for ('compute1', 'compute1')
2015-11-18 06:58:03.104 2334 INFO nova.compute.resource_tracker [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Compute_service record updated for compute1:compute1
2015-11-18 06:58:04.808 2334 INFO nova.compute.manager [req-fad984bb-317e-4ae3-b723-4554d0a7a318 72fe1c96402a48dc9e9279c281580021 8f0b9aaa1c084e8e94a27a3de81a5f73 - - -] [instance: cbd09d10-d629-4d56-b15b-12b33a340ada] Attaching volume 2b7ae21e-4528-46eb-af21-a821b90aa6c1 to /dev/vdb
2015-11-18 06:58:08.259 2334 WARNING nova.virt.libvirt.volume [req-fad984bb-317e-4ae3-b723-4554d0a7a318 72fe1c96402a48dc9e9279c281580021 8f0b9aaa1c084e...

Read more...

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tempest (master)

Change abandoned by Matthew Treinish (<email address hidden>) on branch: master
Review: https://review.openstack.org/152224

Revision history for this message
Imran Hayder (hayderimran7) wrote :

Hello, we have this test passing on our system using scaleIO backend of cinder.

"""
tempest.scenario.test_stamp_pattern.TestStampPattern
    test_stamp_pattern[compute,id-10fd234a-515c-41e5-b092-8323060598c5,image,network,volume]OK 579.78

Slowest 1 tests took 579.78 secs:
tempest.scenario.test_stamp_pattern.TestStampPattern
    test_stamp_pattern[compute,id-10fd234a-515c-41e5-b092-8323060598c5,image,network,volume] 579.78

Ran 1 test in 602.344s

OK
"""
can this test be unskipped?

I have manually tested too, login to cirros and run 'sudo /usr/sbin/mkfs.ext4 /dev/vdc' and it completes successfully.
followed by taking snapshot, attaching and verifying the data written persists -

Im not sure its good idea to skip this test as a whole, it seems to be a bug with default cinder backend used with upstream.
however when run with scaleIO backend , this test surely passes .

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tempest (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/299085

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tempest (master)

Change abandoned by Imran Hayder Malik (<email address hidden>) on branch: master
Review: https://review.openstack.org/299085

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Ken'ichi Ohmichi (<email address hidden>) on branch: master
Review: https://review.openstack.org/52740
Reason: As http://lists.openstack.org/pipermail/openstack-dev/2016-May/096204.html , we start abandoning patches have gotten old without any updating after negative feedback. Please restore if necessary to restart this again.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/218355
Reason: Apparently this was much easier than I was trying to do:

https://review.openstack.org/#/c/431800/

Changed in tempest:
assignee: Matt Riedemann (mriedem) → nobody
status: In Progress → Incomplete
Paras Babbar (pbabbar)
Changed in tempest:
assignee: nobody → Paras Babbar (pbabbar)
Revision history for this message
Paras Babbar (pbabbar) wrote :

This has been fixed in https://review.opendev.org/#/c/698598/.

Thanks

Changed in tempest:
status: Incomplete → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.