mkfs error in test_stamp_pattern

Bug #1205344 reported by Brian Elliott on 2013-07-26
This bug affects 10 people
Affects   Status         Importance   Assigned to      Milestone
Cinder    Fix Released   Medium       John Griffith    (none)
tempest   Incomplete     Medium       Unassigned       (none)

Bug Description

http://logs.openstack.org/09/38709/5/check/gate-tempest-devstack-vm-postgres-full/35583/console.html

Error during gate-tempest-devstack-vm-postgres-full:

======================================================================
2013-07-26 14:03:32.073 | ERROR: tempest.scenario.test_stamp_pattern.TestStampPattern.test_stamp_pattern
2013-07-26 14:03:32.073 | ----------------------------------------------------------------------
2013-07-26 14:03:32.110 | _StringException: Traceback (most recent call last):
2013-07-26 14:03:32.110 | File "/opt/stack/new/tempest/tempest/scenario/test_stamp_pattern.py", line 218, in test_stamp_pattern
2013-07-26 14:03:32.110 | self._create_timestamp(ip_for_server)
2013-07-26 14:03:32.110 | File "/opt/stack/new/tempest/tempest/scenario/test_stamp_pattern.py", line 188, in _create_timestamp
2013-07-26 14:03:32.110 | ssh_client.exec_command('sudo /usr/sbin/mkfs.ext4 /dev/vdb')
2013-07-26 14:03:32.110 | File "/opt/stack/new/tempest/tempest/common/ssh.py", line 138, in exec_command
2013-07-26 14:03:32.110 | strerror=''.join(err_data))
2013-07-26 14:03:32.110 | SSHExecCommandFailed: Command 'sudo /usr/sbin/mkfs.ext4 /dev/vdb', exit status: 1, Error:
2013-07-26 14:03:32.110 | mke2fs 1.42.2 (27-Mar-2012)
2013-07-26 14:03:32.110 | Could not stat /dev/vdb --- No such file or directory
2013-07-26 14:03:32.110 |
2013-07-26 14:03:32.110 | The device apparently does not exist; did you specify it correctly?

Sean Dague (sdague) wrote :
Changed in tempest:
status: New → Confirmed
importance: Undecided → Critical

Fix proposed to branch: master
Review: https://review.openstack.org/38841

Changed in tempest:
assignee: nobody → Attila Fazekas (afazekas)
status: Confirmed → In Progress
Attila Fazekas (afazekas) wrote :

This looks like it is not a tempest issue.

The "SCSI Initiator over TCP/IP" message appears in syslog.txt.gz, but the volume is not visible in the guest even after a longer wait.

Changed in tempest:
assignee: Attila Fazekas (afazekas) → nobody
status: In Progress → Incomplete
Attila Fazekas (afazekas) wrote :

http://logs.openstack.org/41/38841/2/gate/gate-tempest-devstack-vm-full/36336/

- The volume is reported as 'in-use' (2013-07-27 03:38:42)
- The host system has a syslog entry about the iSCSI initiator attachment as sda (Jul 27 03:38:40)
- The volume is not visible in the guest even after a 400-second wait (2013-07-27 03:38:54 .. 2013-07-27 03:45:33)

The issue cannot be fixed or worked around on the tempest side.
My attempt was based on the assumption that the guest system may need several milliseconds to realize it has a new device.

This is very likely a nova issue or an issue in a component related to nova.
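
For illustration, the guest-side wait that this attempt refers to can be sketched roughly as follows. This is a minimal sketch, not the proposed patch: the ssh_client is assumed to behave like the tempest.common.ssh client in the traceback above (exec_command raises on a non-zero exit status), and the helper name, timeout, and poll interval are made up. As noted above, in the failing runs the device never shows up, so a retry like this alone does not fix the race.

    import time

    def wait_for_block_device(ssh_client, device='/dev/vdb',
                              timeout=60, interval=2):
        """Poll the guest until the attached volume's block device appears."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                # 'test -b' exits 0 only once the block device node exists.
                ssh_client.exec_command('test -b %s' % device)
                return
            except Exception:  # e.g. SSHExecCommandFailed on non-zero exit
                time.sleep(interval)
        # List the virtio disks to spot an attach under an unexpected name.
        disks = ssh_client.exec_command('ls /dev/vd* 2>/dev/null || true')
        raise RuntimeError('%s never appeared; guest disks: %s'
                           % (device, disks))

    # The failing step would then become roughly:
    #   wait_for_block_device(ssh_client, '/dev/vdb')
    #   ssh_client.exec_command('sudo /usr/sbin/mkfs.ext4 /dev/vdb')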

Sean Dague (sdague) wrote :

So this bug was exposed by a new test in tempest that tries to verify that a volume can be attached to a guest. It turns out that, in a low single-digit percentage of cases, it can't. I think this is a real race in nova / cinder that needs fixing.

Changed in nova:
status: New → Confirmed
importance: Undecided → High

Fix proposed to branch: master
Review: https://review.openstack.org/38923

Changed in tempest:
assignee: nobody → Sean Dague (sdague)
status: Incomplete → In Progress

Reviewed: https://review.openstack.org/38923
Committed: http://github.com/openstack/tempest/commit/fe8a609481f87ad8ef4c29047c2c17e6b4cd3c2c
Submitter: Jenkins
Branch: master

commit fe8a609481f87ad8ef4c29047c2c17e6b4cd3c2c
Author: Sean Dague <email address hidden>
Date: Sat Jul 27 08:15:55 2013 -0400

    skip stamp test until race in nova can be debugged

    This test is causing a lot of gate resets, and as far as I can
    tell they are really nova races in volume attachments. However
    we should skip the test for now until these can be sorted out.

    Related to bug #1205344

    Change-Id: I983779841ac3d3f79442e96b37a0cc8da85ae55b
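
For reference, skipping a test against a tracked bug looks roughly like the sketch below. It is a minimal, self-contained example using plain unittest.skip rather than the exact content of the merged change; tempest normally uses its own skip_because(bug=...) decorator for this.

    import unittest

    class TestStampPattern(unittest.TestCase):

        # tempest's skip_because(bug="1205344") would be used in the real
        # test; unittest.skip keeps this sketch self-contained and runnable.
        @unittest.skip("Skipped until the nova/cinder volume attach race in "
                       "bug 1205344 is sorted out")
        def test_stamp_pattern(self):
            self.fail("never runs while the skip is in place")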

Changed in tempest:
status: In Progress → Fix Released
John Griffith (john-griffith) wrote :

Looking at this, I'm wondering if the attach is possibly being made on a different device. Regardless, this brings up the issue of not having any good logging info on the cinder attach side. I'll improve the logging in Cinder's attach calls to help with this sort of thing, and I'll look at this a bit more. We might be better off turning the test back on after improving the logging.

I'll update after I have had some time to fix the logging and look at this more closely.

Sean Dague (sdague) wrote :

Great, thanks John.

jiji (zzww-666) on 2013-07-31
information type: Public → Public Security
Jeremy Stanley (fungi) wrote :

jiji, what exploitable security vulnerability are you seeing here?

Thierry Carrez (ttx) wrote :

Setting back to public until some explanation is given.

information type: Public Security → Public
Joe Gordon (jogo) on 2013-09-18
Changed in nova:
milestone: none → havana-rc1
Joe Gordon (jogo) on 2013-09-18
Changed in nova:
importance: High → Critical
Changed in nova:
importance: Critical → Medium
David Kranz (david-kranz) wrote :

There seems to be disagreement about what to do about this bug. Given that there are currently more than 100 nova bugs marked High or Critical, I think setting it to Medium means it will not get fixed any time soon. Russell, is that correct?

There is a proposed patch to stop skipping this test in tempest: https://review.openstack.org/#/c/47434/1. I am not sure whether I approve of that or not.

Russell Bryant (russellb) wrote :

Yes, unless someone takes a specific interest in it, I don't expect anyone to jump on it. Part of it is that it's not obvious what the problem is. It looks like there's still a good amount of work to do to figure out where the problem is.

Changed in nova:
milestone: havana-rc1 → none
tags: added: havana-rc-potential
tags: added: havana-rc-proposed
removed: havana-rc-potential
tags: added: havana-rc-potential
removed: havana-rc-proposed
Sean Dague (sdague) on 2013-09-23
Changed in tempest:
milestone: none → havana-3
Adam Spiers (adam.spiers) wrote :

For ease of future reference: after the skip of this test was introduced as per #7 above, it was removed (i.e. the test was "unskipped") on 2013/09/19 via:

    https://review.openstack.org/47434

and then re-introduced on 2013/10/01 by:

    https://review.openstack.org/49181

So at the time of writing this, the number of rechecks should not be increasing (IIUC).

John Griffith (john-griffith) wrote :

The root cause in Cinder should be the same as in bug #1226337, and is addressed by: Change-Id: I3cabab0d214c051267638a627664df2b673236e3

Changed in cinder:
status: New → Fix Committed
Thierry Carrez (ttx) on 2013-10-14
Changed in cinder:
status: Fix Committed → Fix Released
tags: removed: havana-rc-potential
Attila Fazekas (afazekas) wrote :

The bug is not fully fixed; the test cannot be unskipped: https://review.openstack.org/#/c/62886/

Change abandoned by Sean M. Collins (<email address hidden>) on branch: master
Review: https://review.openstack.org/196198

Matt Riedemann (mriedem) on 2015-08-28
Changed in tempest:
status: Fix Released → In Progress
assignee: Sean Dague (sdague) → Matt Riedemann (mriedem)
importance: Critical → Medium
milestone: havana-3 → none
Changed in cinder:
assignee: nobody → John Griffith (john-griffith)
importance: Undecided → Medium
no longer affects: nova

Reviewed: https://review.openstack.org/218664
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=a661f96b72d88c80f75470551975c6ca761c963e
Submitter: Jenkins
Branch: master

commit a661f96b72d88c80f75470551975c6ca761c963e
Author: Matt Riedemann <email address hidden>
Date: Sun Aug 30 08:27:45 2015 -0700

    Add debug logging before attaching volume in driver

    It'd be helpful to log the volume/instance/mount/host
    information in attach_volume before calling the volume
    driver to do the attachment to know we have the right
    values.

    Change-Id: I8e1111e900032d19d5c46bad266b79b050b40851
    Related-Bug: #1205344
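
The shape of such a change is roughly the following. This is a hedged sketch only, not the actual diff: the class name, method signature, and field names are assumptions, and only the pre-attach LOG.debug call illustrates the idea.

    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    class VolumeManager(object):
        def attach_volume(self, context, volume_id, instance_uuid,
                          host_name, mountpoint, mode):
            # Log the attachment details before delegating to the volume
            # driver, so failed attaches leave a trace of the values used.
            LOG.debug('Attaching volume %(vol)s to instance %(inst)s at '
                      '%(mnt)s on host %(host)s (mode=%(mode)s)',
                      {'vol': volume_id, 'inst': instance_uuid,
                       'mnt': mountpoint, 'host': host_name, 'mode': mode})
            # ... the real manager then calls the configured volume driver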

Benny Kopilov (bkopilov) wrote :

Hi,
I managed to reproduce this issue locally with a tempest run; attaching logs and timestamps.

Here are the steps (a rough reproduction sketch follows):
#1 Create a 1 GB volume.
#2 Create an instance from an image.
#3 Attach the volume to the instance immediately after the server is up.
Nova then triggers the attach request to cinder.
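
A rough reproduction of these three steps using the openstacksdk cloud layer is sketched below; the reporter used tempest, so this is only an approximation, and the cloud, image, and flavor names are placeholders.

    import openstack

    conn = openstack.connect(cloud='devstack')

    # 1. Create a 1 GB volume.
    volume = conn.create_volume(size=1, wait=True)

    # 2. Create an instance from an image.
    server = conn.create_server(name='stamp-repro', image='cirros',
                                flavor='m1.tiny', wait=True, auto_ip=False)

    # 3. Attach the volume immediately after the server is up; nova then
    #    sends the attach request to cinder, which is where the race shows.
    conn.attach_volume(server, volume, wait=True)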

From nova.api:
2015-11-18 06:58:04.631 - attach request received

2015-11-18 06:58:04.631 3734 INFO nova.api.openstack.compute.contrib.volumes [req-fad984bb-317e-4ae3-b723-4554d0a7a318 72fe1c96402a48dc9e9279c281580021 8f0b9aaa1c084e8e94a27a3de81a5f73 - - -] Attach volume 2b7ae21e-4528-46eb-af21-a821b90aa6c1 to instance cbd09d10-d629-4d56-b15b-12b33a340ada at /dev/vdb
2015-11-18 06:58:05.133 3734 INFO nova.osapi_compute.wsgi.server [req-fad984bb-317e-4ae3-b723-4554d0a7a318 72fe1c96402a48dc9e9279c281580021 8f0b9aaa1c084e8e94a27a3de81a5f73 - - -] 192.168.100.1 "POST /v2/8f0b9aaa1c084e8e94a27a3de81a5f73/servers/cbd09d10-d629-4d56-b15b-12b33a340ada/os-volume_attachments HTTP/1.1" status: 200 len: 386 time: 0.5057290

From nova.compute logs :
2015-11-18 06:58:04.808 - attach request
2015-11-18 06:58:08.259 2334 - volume not found

2015-11-18 06:58:02.459 2334 INFO nova.compute.manager [req-e15172cf-5414-46cd-a187-984fc84b3fdc - - - - -] [instance: cbd09d10-d629-4d56-b15b-12b33a340ada] During sync_power_state the instance has a pending task (spawning). Skip.
2015-11-18 06:58:02.460 2334 INFO nova.compute.manager [req-e15172cf-5414-46cd-a187-984fc84b3fdc - - - - -] [instance: cbd09d10-d629-4d56-b15b-12b33a340ada] VM Resumed (Lifecycle Event)
2015-11-18 06:58:02.581 2334 INFO nova.compute.manager [-] [instance: cbd09d10-d629-4d56-b15b-12b33a340ada] During sync_power_state the instance has a pending task (spawning). Skip.
2015-11-18 06:58:02.770 2334 INFO nova.compute.resource_tracker [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Auditing locally available compute resources for node compute1
2015-11-18 06:58:03.080 2334 INFO nova.compute.resource_tracker [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Total usable vcpus: 2, total allocated vcpus: 1
2015-11-18 06:58:03.081 2334 INFO nova.compute.resource_tracker [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Final resource view: name=compute1 phys_ram=3791MB used_ram=1024MB phys_disk=6GB used_disk=1GB total_vcpus=2 used_vcpus=1 pci_stats=<nova.pci.stats.PciDeviceStats object at 0x5063e10>
2015-11-18 06:58:03.104 2334 INFO nova.scheduler.client.report [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Compute_service record updated for ('compute1', 'compute1')
2015-11-18 06:58:03.104 2334 INFO nova.compute.resource_tracker [req-09ce08b2-1862-4ca6-b0e3-89e70afb5898 - - - - -] Compute_service record updated for compute1:compute1
2015-11-18 06:58:04.808 2334 INFO nova.compute.manager [req-fad984bb-317e-4ae3-b723-4554d0a7a318 72fe1c96402a48dc9e9279c281580021 8f0b9aaa1c084e8e94a27a3de81a5f73 - - -] [instance: cbd09d10-d629-4d56-b15b-12b33a340ada] Attaching volume 2b7ae21e-4528-46eb-af21-a821b90aa6c1 to /dev/vdb
2015-11-18 06:58:08.259 2334 WARNING nova.virt.libvirt.volume [req-fad984bb-317e-4ae3-b723-4554d0a7a318 72fe1c96402a48dc9e9279c281580021 8f0b9aaa1c084e...


Change abandoned by Matthew Treinish (<email address hidden>) on branch: master
Review: https://review.openstack.org/152224

Imran Hayder (hayderimran7) wrote :

Hello, we have this test passing on our system using the ScaleIO backend for Cinder.

"""
tempest.scenario.test_stamp_pattern.TestStampPattern
    test_stamp_pattern[compute,id-10fd234a-515c-41e5-b092-8323060598c5,image,network,volume]OK 579.78

Slowest 1 tests took 579.78 secs:
tempest.scenario.test_stamp_pattern.TestStampPattern
    test_stamp_pattern[compute,id-10fd234a-515c-41e5-b092-8323060598c5,image,network,volume] 579.78

Ran 1 test in 602.344s

OK
"""
Can this test be unskipped?

I have manually tested this too: logging in to the CirrOS guest and running 'sudo /usr/sbin/mkfs.ext4 /dev/vdc' completes successfully, followed by taking a snapshot, attaching it, and verifying that the data written persists.

I'm not sure it's a good idea to skip this test as a whole; it seems to be a bug in the default Cinder backend used upstream. However, when run with the ScaleIO backend, this test passes reliably.

Change abandoned by Imran Hayder Malik (<email address hidden>) on branch: master
Review: https://review.openstack.org/299085

Change abandoned by Ken'ichi Ohmichi (<email address hidden>) on branch: master
Review: https://review.openstack.org/52740
Reason: As per http://lists.openstack.org/pipermail/openstack-dev/2016-May/096204.html, we are starting to abandon patches that have gotten old without any update after negative feedback. Please restore it if necessary to restart this work.

Change abandoned by Matt Riedemann (<email address hidden>) on branch: master
Review: https://review.openstack.org/218355
Reason: Apparently this was much easier than what I was trying to do:

https://review.openstack.org/#/c/431800/

Changed in tempest:
assignee: Matt Riedemann (mriedem) → nobody
status: In Progress → Incomplete