mkfs error in test_stamp_pattern

Bug #1205344 reported by Brian Elliott on 2013-07-26
This bug affects 10 people
Affects   Status         Importance   Assigned to      Milestone
Cinder    Fix Released   Medium       John Griffith    (none)
tempest   Incomplete     Medium       Unassigned       (none)

Bug Description

http://logs.openstack.org/09/38709/5/check/gate-tempest-devstack-vm-postgres-full/35583/console.html

Error during gate-tempest-devstack-vm-postgres-full:

======================================================================
2013-07-26 14:03:32.073 | ERROR: tempest.scenario.test_stamp_pattern.TestStampPattern.test_stamp_pattern
2013-07-26 14:03:32.073 | ----------------------------------------------------------------------
2013-07-26 14:03:32.110 | _StringException: Traceback (most recent call last):
2013-07-26 14:03:32.110 | File "/opt/stack/new/tempest/tempest/scenario/test_stamp_pattern.py", line 218, in test_stamp_pattern
2013-07-26 14:03:32.110 | self._create_timestamp(ip_for_server)
2013-07-26 14:03:32.110 | File "/opt/stack/new/tempest/tempest/scenario/test_stamp_pattern.py", line 188, in _create_timestamp
2013-07-26 14:03:32.110 | ssh_client.exec_command('sudo /usr/sbin/mkfs.ext4 /dev/vdb')
2013-07-26 14:03:32.110 | File "/opt/stack/new/tempest/tempest/common/ssh.py", line 138, in exec_command
2013-07-26 14:03:32.110 | strerror=''.join(err_data))
2013-07-26 14:03:32.110 | SSHExecCommandFailed: Command 'sudo /usr/sbin/mkfs.ext4 /dev/vdb', exit status: 1, Error:
2013-07-26 14:03:32.110 | mke2fs 1.42.2 (27-Mar-2012)
2013-07-26 14:03:32.110 | Could not stat /dev/vdb --- No such file or directory
2013-07-26 14:03:32.110 |
2013-07-26 14:03:32.110 | The device apparently does not exist; did you specify it correctly?

Sean Dague (sdague) wrote :
Changed in tempest:
status: New → Confirmed
importance: Undecided → Critical

Fix proposed to branch: master
Review: https://review.openstack.org/38841

Changed in tempest:
assignee: nobody → Attila Fazekas (afazekas)
status: Confirmed → In Progress
Attila Fazekas (afazekas) wrote :

This looks like it is not a tempest issue.

The "SCSI Initiator over TCP/IP" message appears in syslog.txt.gz, but the volume is not visible in the guest even after a longer wait.

Changed in tempest:
assignee: Attila Fazekas (afazekas) → nobody
status: In Progress → Incomplete
Attila Fazekas (afazekas) wrote :

http://logs.openstack.org/41/38841/2/gate/gate-tempest-devstack-vm-full/36336/

- The volume is reported as 'in-use' (2013-07-27 03:38:42)
- The host system has a syslog entry about the iSCSI initiator attachment as sda (Jul 27 03:38:40)
- The volume is not visible in the guest even after a 400-second wait (2013-07-27 03:38:54 .. 2013-07-27 03:45:33)

The issue cannot be fixed or worked around on the tempest side.
My attempt was based on the assumption that the guest system may need several milliseconds to realize it has a new device.

This is very likely a nova issue or an issue in a component related to nova.
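
For illustration, the guest-side wait that this attempt refers to can be sketched roughly as follows. This is a minimal sketch, not the proposed patch: the ssh_client is assumed to behave like the tempest.common.ssh client in the traceback above (exec_command raises on a non-zero exit status), and the helper name, timeout, and poll interval are made up. As noted above, in the failing runs the device never shows up, so a retry like this alone does not fix the race.

    import time

    def wait_for_block_device(ssh_client, device='/dev/vdb',
                              timeout=60, interval=2):
        """Poll the guest until the attached volume's block device appears."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                # 'test -b' exits 0 only once the block device node exists.
                ssh_client.exec_command('test -b %s' % device)
                return
            except Exception:  # e.g. SSHExecCommandFailed on non-zero exit
                time.sleep(interval)
        # List the virtio disks to spot an attach under an unexpected name.
        disks = ssh_client.exec_command('ls /dev/vd* 2>/dev/null || true')
        raise RuntimeError('%s never appeared; guest disks: %s'
                           % (device, disks))

    # The failing step would then become roughly:
    #   wait_for_block_device(ssh_client, '/dev/vdb')
    #   ssh_client.exec_command('sudo /usr/sbin/mkfs.ext4 /dev/vdb')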

Sean Dague (sdague) wrote :

So this bug was exposed by a new test in tempest that tries to verify that a volume can be attached to a guest. It turns out that, in a low single-digit percentage of cases, it can't. I think this is a real race in nova / cinder that needs fixing.

Changed in nova:
status: New → Confirmed
importance: Undecided → High

Fix proposed to branch: master
Review: https://review.openstack.org/38923

Changed in tempest:
assignee: nobody → Sean Dague (sdague)
status: Incomplete → In Progress

Reviewed: https://review.openstack.org/38923
Committed: http://github.com/openstack/tempest/commit/fe8a609481f87ad8ef4c29047c2c17e6b4cd3c2c
Submitter: Jenkins
Branch: master

commit fe8a609481f87ad8ef4c29047c2c17e6b4cd3c2c
Author: Sean Dague <email address hidden>
Date: Sat Jul 27 08:15:55 2013 -0400

    skip stamp test until race in nova can be debugged

    This test is causing a lot of gate resets, and as far as I can
    tell they are really nova races in volume attachments. However
    we should skip the test for now until these can be sorted out.

    Related to bug #1205344

    Change-Id: I983779841ac3d3f79442e96b37a0cc8da85ae55b
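
For reference, skipping a test against a tracked bug looks roughly like the sketch below. It is a minimal, self-contained example using plain unittest.skip rather than the exact content of the merged change; tempest normally uses its own skip_because(bug=...) decorator for this.

    import unittest

    class TestStampPattern(unittest.TestCase):

        # tempest's skip_because(bug="1205344") would be used in the real
        # test; unittest.skip keeps this sketch self-contained and runnable.
        @unittest.skip("Skipped until the nova/cinder volume attach race in "
                       "bug 1205344 is sorted out")
        def test_stamp_pattern(self):
            self.fail("never runs while the skip is in place")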

Changed in tempest:
status: In Progress → Fix Released
John Griffith (john-griffith) wrote :

Looking at this, I'm wondering if the attach is possibly being made on a different device. Regardless, this brings up the issue of not having any good logging info on the cinder attach side. I'll improve the logging in Cinder's attach calls to help with this sort of thing, and I'll look at this a bit more. We might be better off turning the test back on after improving the logging.

I'll update after I have had some time to fix the logging and look at this more closely.

Sean Dague (sdague) wrote :

Great, thanks John.

jiji (zzww-666) on 2013-07-31
information type: Public → Public Security
Jeremy Stanley (fungi) wrote :

jiji, what exploitable security vulnerability are you seeing here?

Thierry Carrez (ttx) wrote :

Setting back to public until some explanation is given.

information type: Public Security → Public
Joe Gordon (jogo) on 2013-09-18
Changed in nova:
milestone: none → havana-rc1
Joe Gordon (jogo) on 2013-09-18
Changed in nova:
importance: High → Critical
Changed in nova:
importance: Critical → Medium
David Kranz (david-kranz) wrote :

There seems to be disagreement about what to do about this bug. Given that there are currently more than 100 nova bugs marked High or Critical, I think setting it to Medium means it will not get fixed any time soon. Russell, is that correct?

There is a proposed patch to stop skipping this test in tempest: https://review.openstack.org/#/c/47434/1. I am not sure whether I approve of that or not.

Russell Bryant (russellb) wrote :

Yes, unless someone takes a specific interest in it, I don't expect anyone to jump on it. Part of it is that it's not obvious what the problem is. It looks like there's still a good amount of work to do to figure out where the problem is.

Changed in nova:
milestone: havana-rc1 → none
tags: added: havana-rc-potential
tags: added: havana-rc-proposed
removed: havana-rc-potential
tags: added: havana-rc-potential
removed: havana-rc-proposed
Sean Dague (sdague) on 2013-09-23
Changed in tempest:
milestone: none → havana-3
Adam Spiers (adam.spiers) wrote :

For ease of future reference: after the skip of this test was introduced as per #7 above, it was removed (i.e. the test was "unskipped") on 2013/09/19 via:

    https://review.openstack.org/47434

and then re-introduced on 2013/10/01 by:

    https://review.openstack.org/49181

So at the time of writing this, the number of rechecks should not be increasing (IIUC).

John Griffith (john-griffith) wrote :

The root cause in Cinder should be the same as in bug #1226337, and is addressed by: Change-Id: I3cabab0d214c051267638a627664df2b673236e3

Changed in cinder:
status: New → Fix Committed
Thierry Carrez (ttx) on 2013-10-14
Changed in cinder:
status: Fix Committed → Fix Released
tags: removed: havana-rc-potential
Attila Fazekas (afazekas) wrote :

The bug is not fully fixed; the test cannot be unskipped: https://review.openstack.org/#/c/62886/

Change abandoned by Sean M. Collins (<email address hidden>) on branch: master
Review: https://review.openstack.org/196198

Matt Riedemann (mriedem) on 2015-08-28
Changed in tempest:
status: Fix Released → In Progress
assignee: Sean Dague (sdague) → Matt Riedemann (mriedem)
importance: Critical → Medium
milestone: havana-3 → none
Changed in cinder:
assignee: nobody → John Griffith (john-griffith)
importance: Undecided → Medium
no longer affects: nova

Reviewed: https://review.openstack.org/218664
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=a661f96b72d88c80f75470551975c6ca761c963e
Submitter: Jenkins
Branch: master

commit a661f96b72d88c80f75470551975c6ca761c963e
Author: Matt Riedemann <email address hidden>
Date: Sun Aug 30 08:27:45 2015 -0700

    Add debug logging before attaching volume in driver

    It'd be helpful to log the volume/instance/mount/host
    information in attach_volume before calling the volume
    driver to do the attachment to know we have the right
    values.

    Change-Id: I8e1111e900032d19d5c46bad266b79b050b40851
    Related-Bug: #1205344
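
The shape of such a change is roughly the following. This is a hedged sketch only, not the actual diff: the class name, method signature, and field names are assumptions, and only the pre-attach LOG.debug call illustrates the idea.

    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    class VolumeManager(object):
        def attach_volume(self, context, volume_id, instance_uuid,
                          host_name, mountpoint, mode):
            # Log the attachment details before delegating to the volume
            # driver, so failed attaches leave a trace of the values used.
            LOG.debug('Attaching volume %(vol)s to instance %(inst)s at '
                      '%(mnt)s on host %(host)s (mode=%(mode)s)',
                      {'vol': volume_id, 'inst': instance_uuid,
                       'mnt': mountpoint, 'host': host_name, 'mode': mode})
            # ... the real manager then calls the configured volume driver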

Benny Kopilov (bkopilov) wrote :

Hi,
I managed to reproduce this issue locally with a tempest run; attaching logs and timestamps.

Here are the steps (a rough reproduction sketch follows):
#1 Create a 1 GB volume.
#2 Create an instance from an image.
#3 Attach the volume to the instance immediately after the server is up.
Nova then triggers the attach request to cinder.
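
A rough reproduction of these three steps using the openstacksdk cloud layer is sketched below; the reporter used tempest, so this is only an approximation, and the cloud, image, and flavor names are placeholders.

    import openstack

    conn = openstack.connect(cloud='devstack')

    # 1. Create a 1 GB volume.
    volume = conn.create_volume(size=1, wait=True)

    # 2. Create an instance from an image.
    server = conn.create_server(name='stamp-repro', image='cirros',
                                flavor='m1.tiny', wait=True, auto_ip=False)

    # 3. Attach the volume immediately after the server is up; nova then
    #    sends the attach request to cinder, which is where the race shows.
    conn.attach_volume(server, volume, wait=True)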

From nova.api:
2015-11-18 06:58:04.631 - attach request received

2015-11-18 06:58:04.631 3734 INFO nova.api.openstack.compute.contrib.volumes [req-fad984bb-317e-4ae3-b723-4554d0a7a318 72fe1c96402a48dc9e9279c281580021 8f0b9aaa1c084e8e94a27a3de81a5f73 - - -] Attach volume 2b7ae21e-4528-46eb-af21-a821b90aa6c1 to instance cbd09d10-d629-4d56-b15b-12b33a340ada at /dev/vdb
2015-11-18 06:58:05.133 3734 INFO nova.osapi_compute.wsgi.server [req-fad984bb-317e-4ae3-b723-4554d0a7a318 72fe1c96402a48dc9e9279c281580021 8f0b9aaa1c084e8e94a27a3de81a5f73 - - -] 192.168.100.1 "POST /v2/8f0b9aaa1c084e8e94a27a3de81a5f73/servers/cbd09d10-d629-4d56-b15b-12b33a340ada/os-volume_attachments HTTP/1.1" status: 200 len: 386 time: 0.5057290

From nova.compute logs :
2015-11-18 06:58:04.808 - attach request
2015-11-18 06:58:08.259 2334 - volume not found

2015-11-18 06:58:02.459 2334 INFO nova.compute.manager [req-e15172cf-5414-46cd-a187-984fc84b3fdc - - - - -] [instance: cbd09d10-d629-4d56-b15b-12b33a340ada] During sync_power_state the instance has a pending task (spawning). Skip.
2015-11-18 06:58:02.460 2334 INFO nova.compute.manager [req-e15172cf-5414-46cd-a187-984fc84b3fdc - - - - -] [instance: cbd09d10-d629-4d56-b15b-12b33a340ada] VM Resumed (Lifecycle Event)
2015-11-18 06:58:02.581 2334 INFO nova.compute.manager [-] [instance: cbd09d10-d629-4d56-b15b-12b33a340ada] During sync_power_state the instance has a pending task (spawning). Skip.
2015-11-18 06:58:02.770 2334 INFO nova.compute.resource_tracker [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Auditing locally available compute resources for node compute1
2015-11-18 06:58:03.080 2334 INFO nova.compute.resource_tracker [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Total usable vcpus: 2, total allocated vcpus: 1
2015-11-18 06:58:03.081 2334 INFO nova.compute.resource_tracker [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Final resource view: name=compute1 phys_ram=3791MB used_ram=1024MB phys_disk=6GB used_disk=1GB total_vcpus=2 used_vcpus=1 pci_stats=<nova.pci.stats.PciDeviceStats object at 0x5063e10>
2015-11-18 06:58:03.104 2334 INFO nova.scheduler.client.report [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Compute_service record updated for ('compute1', 'compute1')
2015-11-18 06:58:03.104 2334 INFO nova.compute.resource_tracker [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Compute_service record updated for compute1:compute1
2015-11-18 06:58:04.808 2334 INFO nova.compute.manager [req-fad984bb-317e-4ae3-b723-4554d0a7a318 72fe1c96402a48dc9e9279c281580021 8f0b9aaa1c084e8e94a27a3de81a5f73 - - -] [instance: cbd09d10-d629-4d56-b15b-12b33a340ada] Attaching volume 2b7ae21e-4528-46eb-af21-a821b90aa6c1 to /dev/vdb
2015-11-18 06:58:08.259 2334 WARNING nova.virt.libvirt.volume [req-fad984bb-317e-4ae3-b723-4554d0a7a318 72fe1c96402a48dc9e9279c281580021 8f0b9aaa1c084e...


Change abandoned by Matthew Treinish (<email address hidden>) on branch: master
Review: https://review.openstack.org/152224

Imran Hayder (hayderimran7) wrote :

Hello, we have this test passing on our system using the ScaleIO backend for Cinder.

"""
tempest.scenario.test_stamp_pattern.TestStampPattern
    test_stamp_pattern[compute,id-10fd234a-515c-41e5-b092-8323060598c5,image,network,volume]OK 579.78

Slowest 1 tests took 579.78 secs:
tempest.scenario.test_stamp_pattern.TestStampPattern
    test_stamp_pattern[compute,id-10fd234a-515c-41e5-b092-8323060598c5,image,network,volume] 579.78

Ran 1 test in 602.344s

OK
"""
Can this test be unskipped?

I have manually tested this too: logging in to the CirrOS guest and running 'sudo /usr/sbin/mkfs.ext4 /dev/vdc' completes successfully, followed by taking a snapshot, attaching it, and verifying that the data written persists.

I'm not sure it's a good idea to skip this test as a whole; it seems to be a bug in the default Cinder backend used upstream. However, when run with the ScaleIO backend, this test passes reliably.

Change abandoned by Imran Hayder Malik (<email address hidden>) on branch: master
Review: https://review.openstack.org/299085

Change abandoned by Ken'ichi Ohmichi (<email address hidden>) on branch: master
Review: https://review.openstack.org/52740
Reason: As per http://lists.openstack.org/pipermail/openstack-dev/2016-May/096204.html, we are starting to abandon patches that have gotten old without any update after negative feedback. Please restore it if necessary to restart this work.

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/218355
Reason: Apparently this was much easier than what I was trying to do:

https://review.openstack.org/#/c/431800/

Changed in tempest:
assignee: Matt Riedemann (mriedem) → nobody
status: In Progress → Incomplete