test_volume_boot_pattern tempest test failure for glusterfs backend

Bug #1441050 reported by Deepak C Shetty
This bug affects 2 people
Affects    Status         Importance   Assigned to              Milestone
Cinder     Fix Released   Undecided    Bharat Kumar Kobagana    7.0.0

Bug Description

1) The test_volume_boot_pattern tempest test case fails for the glusterfs backend.

NOTE that this test was disabled until the LVM/generic driver issue was fixed; once it was re-enabled, it has been failing for both the Ceph and GlusterFS backends, though the failures are different.

For the Ceph case, refer to LP bug https://bugs.launchpad.net/cinder/+bug/1439371
For the glusterfs case, this is the place to be :)

2) Per my debugging, for the glusterfs backend this test fails at the last leg, where a new volume is created from a snapshot and Nova is asked to boot an instance using boot-from-volume (BFV) semantics, passing this new volume as the source.
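
For context, here is a paraphrased outline of what the scenario test does; the client objects and helper names (create_volume, boot_server, create_snapshot) are placeholders for this sketch, not the actual tempest API:

# Paraphrased outline of tempest's test_volume_boot_pattern (from memory,
# not the real test code); only the sequence of steps matters here.
def volume_boot_pattern_outline(nova, cinder, image):
    vol1 = cinder.create_volume(image=image)      # bootable volume from an image
    vm1 = nova.boot_server(boot_volume=vol1)      # boot instance 1 from that volume
    # ... write a marker file inside vm1, then shut it down ...
    snap = cinder.create_snapshot(vol1)           # snapshot the volume
    vol2 = cinder.create_volume(snapshot=snap)    # new volume from the snapshot
    vm2 = nova.boot_server(boot_volume=vol2)      # <-- the "last leg" that fails here
    # ... verify the marker file is visible inside vm2 ...
    return vm2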

Changed in cinder:
assignee: nobody → Deepak C Shetty (dpkshetty)
Changed in cinder:
assignee: Deepak C Shetty (dpkshetty) → Bharat Kumar Kobagana (bharat-kobagana)
Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

nova cli error:

Server building... 0% complete
Error building server
ERROR (InstanceInErrorState): Build of instance 595b6b3e-5460-496d-ad53-49fb92fcff95 aborted: Failure prepping block device.

[stack@devstack-f21 ~]$ [admin] nova list
+--------------------------------------+------------------------------------+--------+----------------------+-------------+------------------+
| ID                                   | Name                               | Status | Task State           | Power State | Networks         |
+--------------------------------------+------------------------------------+--------+----------------------+-------------+------------------+
| bd5e17bd-1ddb-4e5a-ac2c-959e169af66e | vm1                                | ACTIVE | -                    | Running     | private=10.0.0.3 |
| 30cf5a36-f259-47c8-9da4-97f0d647a620 | vm2                                | ACTIVE | -                    | Running     | private=10.0.0.4 |
| 595b6b3e-5460-496d-ad53-49fb92fcff95 | vm3-using-volume-created-from-snap | ERROR  | block_device_mapping | NOSTATE     |                  |
+--------------------------------------+------------------------------------+--------+----------------------+-------------+------------------+

c-vol exception:

2015-04-07 08:22:18.926 ERROR cinder.volume.manager [-] Unable to fetch connection information from backend: md5() argument 1 must be string or buffer, not None
2015-04-07 08:22:18.927 ERROR oslo_messaging.rpc.dispatcher [-] Exception during message handling: Bad or unexpected response from the storage volume backend API: Unable to fetch connection information from backend: md5() argument 1 must be string or buffer, not None
2015-04-07 08:22:18.927 TRACE oslo_messaging.rpc.dispatcher Traceback (most recent call last):
2015-04-07 08:22:18.927 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 142, in _dispatch_and_reply
2015-04-07 08:22:18.927 TRACE oslo_messaging.rpc.dispatcher executor_callback))
2015-04-07 08:22:18.927 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 186, in _dispatch
2015-04-07 08:22:18.927 TRACE oslo_messaging.rpc.dispatcher executor_callback)
2015-04-07 08:22:18.927 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 130, in _do_dispatch
2015-04-07 08:22:18.927 TRACE oslo_messaging.rpc.dispatcher result = func(ctxt, **new_args)
2015-04-07 08:22:18.927 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 105, in wrapper
2015-04-07 08:22:18.927 TRACE oslo_messaging.rpc.dispatcher return f(*args, **kwargs)
2015-04-07 08:22:18.927 TRACE oslo_messaging.rpc.dispatcher File "/opt/stack/cinder/cinder/volume/manager.py", line 1076, in initialize_connection
2015-04-07 08:22:18.927 TRACE oslo_messaging.rpc.dispatcher raise exception.VolumeBackendAPIException(data=err_msg)
2015-04-07 08:22:18.927 TRACE oslo_messaging.rpc.dispatcher VolumeBackendAPIException: Bad or unexpected response from the storage volume backend API: Unable to fet...

description: updated
Revision history for this message
John Griffith (john-griffith) wrote :

Have you looked in glusterfs.py (on the Cinder side) to see what the None argument it's complaining about is? The driver.initialize_connection call appears to be an issue here.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/171482

Changed in cinder:
status: New → In Progress
Revision history for this message
Bharat Kumar Kobagana (bharat-kobagana) wrote :

This issue occurs because the "create_volume_from_snapshot" method in glusterfs.py does not return the provider location of the volume created from the snapshot.

Because of this, when the volume is attached to a VM, Cinder fails to determine the mount path of that volume.

I have sent a patch to resolve this issue: https://review.openstack.org/171482
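
To make the failure mode concrete, here is a simplified sketch of how the RemoteFS-based drivers (glusterfs included) derive the local mount path from provider_location; the method names follow my reading of cinder/volume/drivers/remotefs.py at the time and should be treated as approximations, not the exact code:

import hashlib
import os

def _get_hash_str(base_str):
    # Hash of the share string (e.g. "host:/volname"), used as the mount
    # directory name under the configured mount point base.
    return hashlib.md5(base_str).hexdigest()

def _local_volume_dir(mount_point_base, volume):
    # volume['provider_location'] is expected to hold the gluster share the
    # volume lives on.  If create_volume_from_snapshot never set it, this is
    # None and md5() fails with exactly the error in the c-vol trace above:
    # "md5() argument 1 must be string or buffer, not None" (Python 2).
    return os.path.join(mount_point_base,
                        _get_hash_str(volume['provider_location']))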

Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

So it looks like with the above patch we still fail the tempest test case, but for a different reason now:

1) The above patch fixes the provider_location issue, so with it we no longer see the error we used to get from initialize_connection.

2) Now we are seeing another issue: the tempest test case deletes a snapshot (of a volume that is attached to an instance) while the instance is in the OFF state, which causes Nova/libvirt to throw:

2015-04-08 07:22:44.376 ERROR nova.virt.libvirt.driver [req-1c865254-dafc-4f76-bee3-5ccc075e1034 nova service] Error occurred during volume_snapshot_delete, sending error status to Cinder.
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver Traceback (most recent call last):
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1953, in volume_snapshot_delete
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver snapshot_id, delete_info=delete_info)
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 1893, in _volume_snapshot_delete
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver rebase_bw, rebase_flags)
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver result = proxy_call(self._autowrap, f, *args, **kwargs)
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver rv = execute(f, *args, **kwargs)
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver six.reraise(c, e, tb)
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver rv = meth(*args, **kwargs)
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver File "/usr/local/lib/python2.7/dist-packages/libvirt.py", line 792, in blockRebase
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver if ret == -1: raise libvirtError ('virDomainBlockRebase() failed', dom=self)
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver libvirtError: Requested operation is not valid: domain is not running
2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver

3) Note that glusterfs (unlike other fs drivers) uses hypervisor-assisted snapshot creation/deletion, hence the test case should ideally delete the snapshot, detach the volume, and only then shut down the instance. It probably doesn't perform those steps in that order, hence the issue (see the ordering sketch below).

@Bharat is investigating more.
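
To spell out the ordering constraint, here is a hedged sketch; the client objects and method names below are placeholders (not the tempest or python-*client APIs), and only the order of the calls matters:

# Placeholder clients; the point is only the ordering of operations.
def order_needed_for_hyp_assisted_snapshots(nova, cinder, server, volume, snapshot):
    cinder.delete_snapshot(snapshot)     # 1. delete the snap while the instance is ACTIVE
    nova.detach_volume(server, volume)   # 2. then detach the volume
    nova.stop_server(server)             # 3. only now power the instance off

def order_the_test_appears_to_use(nova, cinder, server, volume, snapshot):
    nova.stop_server(server)             # instance is SHUTOFF ...
    cinder.delete_snapshot(snapshot)     # ... so virDomainBlockRebase() fails with
                                         # "Requested operation is not valid: domain is not running"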

Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

Patch for the 2nd issue posted at
https://review.openstack.org/#/c/171597/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/171482
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=ff50c28386e27654688a7ef255f70f2fc3bcde37
Submitter: Jenkins
Branch: master

commit ff50c28386e27654688a7ef255f70f2fc3bcde37
Author: Bharat Kumar Kobagana <email address hidden>
Date: Wed Apr 8 12:02:32 2015 +0530

    GlusterFS: Returning provider location of volume from snapshot

    This patch returns the provider location of volume, which is
    created from snapshot.

    Partial-Bug: 1441050
    Reported-By: Deepak C Shetty <email address hidden>
    Change-Id: I5994f2f6c125888d442055d9e65355f0af1b22f4
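
For readers who don't want to open the review, the shape of the change is roughly as below. This is a minimal sketch based on my reading of the RemoteFS/GlusterFS driver helpers of that era (_find_share, _do_create_volume, _copy_volume_from_snapshot); see the review above for the actual committed diff:

def create_volume_from_snapshot(self, volume, snapshot):
    volume['provider_location'] = self._find_share(volume['size'])
    self._do_create_volume(volume)
    self._copy_volume_from_snapshot(snapshot, volume, volume['size'])
    # The essential change: return the share so the volume manager persists it
    # as the new volume's provider_location instead of leaving it NULL.
    return {'provider_location': volume['provider_location']}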

Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

For the problem seen in item #2 of comment #5, I started a discussion thread on openstack-dev,
which can be followed here:

http://lists.openstack.org/pipermail/openstack-dev/2015-April/060983.html

Per eblake of libvirt, libvirt does not yet support offline snapshot deletion, so I propose skipping
this test case for the affected fs-backed Cinder drivers (e.g. glusterfs), so that the CI job doesn't
keep failing just for this one test case (a sketch of such a skip follows below).
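
For the record, such a skip could look roughly like the sketch below; the capability flag is made up for illustration, and in tempest it would have to come from a real config option rather than a module-level constant:

import unittest

# Hypothetical flag: whether the backend can delete a snapshot of a volume
# that is attached to a stopped instance.
BACKEND_DELETES_SNAPS_OF_STOPPED_INSTANCES = False

class TestVolumeBootPattern(unittest.TestCase):

    @unittest.skipUnless(BACKEND_DELETES_SNAPS_OF_STOPPED_INSTANCES,
                         'backend cannot delete a snapshot of a volume '
                         'attached to a stopped instance')
    def test_volume_boot_pattern(self):
        pass  # the real scenario lives in tempest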

Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

A related patch fixing an alleged race condition in this test case was posted; it has some interesting discussion around this test case, so I'm noting the link here as an FYI:

https://review.openstack.org/#/c/171569/

Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

Will open another bug to track the libvirt issue that glusterfs is hitting, since it is easier to track one issue per bug.

Changed in cinder:
status: In Progress → Fix Committed
Revision history for this message
Deepak C Shetty (dpkshetty) wrote :

Opened https://bugs.launchpad.net/cinder/+bug/1444806 to track the Part 2 of the issue.

Thierry Carrez (ttx)
Changed in cinder:
milestone: none → liberty-1
status: Fix Committed → Fix Released
Revision history for this message
Einst Crazy (yu-changcai) wrote :

This bug affects the stable/juno branch too: attaching a volume that was created from a snapshot to a VM fails there as well.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/224404

Thierry Carrez (ttx)
Changed in cinder:
milestone: liberty-1 → 7.0.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (stable/juno)

Change abandoned by Alan Pevec (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/224404
