Attach volume sometimes incorrectly reports success

Bug #673756 reported by Devin Carlen
This bug affects 2 people
Affects                    Status         Importance   Assigned to   Milestone
OpenStack Compute (nova)   Fix Released   Medium       Unassigned

Bug Description

Recently while I was refactoring the smoke tests, I found that attach volume is reporting success even when the operation fails. I was able to work around this in the smoke tests by waiting longer before attaching. My initial sense is that the volume is reporting its state as "available" some small amount of time (5-15 seconds?) before it is really available.

Here is a snippet from the volume smoke tests that shows how I was forced to work around the issue:

http://paste.openstack.org/show/100/

Tags: attach volume
Soren Hansen (soren) wrote :

The volume driver's create_export method should block until the device is actually ready. There must be some way to check when vblade (assuming you're using AoE) has actually gotten its act together and exported the device?

Vish Ishaya (vishvananda) wrote :

This is apparently happening with iSCSI. I'd love to get some tracebacks because I'm not sure what exactly is going too quickly. I guess it is possible that the iSCSI target shell command is returning before the target is actually created, but this seems unlikely.

Vish


Soren Hansen (soren)
Changed in nova:
status: New → Confirmed
Thierry Carrez (ttx)
Changed in nova:
importance: High → Medium
Mark McLoughlin (markmc) wrote :

OK, so there are two things allegedly going on here:

  1) A volume is marked as 'available' before it is actually possible to attach it

  2) Attaching a volume is sometimes reported as successful even when the attach actually failed because the volume wasn't ready

Now, in VolumeManager.create_volume(), we mark the volume as available just before returning successfully. That implies to me that the call shouldn't return until the volume is available. And, as Soren points out, that means create_export() should block until it's ready.

That means that this loop in test_002_can_attach_volume() doesn't make sense:

        for x in xrange(10):
            volume.update()
            if volume.status.startswith('available'):
                break
            time.sleep(1)
        else:
            self.fail('cannot attach volume with state %s' % volume.status)

That is, create_volume() has allegedly already succeeded, so the volume should already have 'available' status by the time the loop runs.

To reproduce the bug, I think you just need to remove the two five second sleeps between create_volume() and attach_volume() in the test. I haven't tried reproducing yet, so it'd be nice to know what cases are failing - iSCSI only, or all volume drivers? tgtadm or ietadm? Xen/KVM? With an instance running on the volumes host, or remotely, or both?

In the case of ISCSIDriver, the last thing we do in create_export() is run 'tgtadm --op new --mode=logicalunit ...'. So, we need to understand the semantics of that. Is it supposed to block until the LUN is available? If not, how do we poll for completion?
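
If the command turns out not to block, polling could look roughly like the sketch below. This is only an illustration, not nova code: wait_until and its predicate argument are hypothetical names, and how to actually ask the target daemon whether the LUN is exported is exactly the open question above. The default timeout of 15 seconds is just the upper end of the delay guessed at in the bug description.

```python
import time


def wait_until(predicate, timeout=15.0, interval=0.5,
               sleep=time.sleep, clock=time.monotonic):
    """Poll predicate() until it returns True or the timeout expires.

    `predicate` is a hypothetical callable supplied by the caller, e.g.
    something that checks whether the export is visible yet.
    Returns True if the predicate succeeded, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With something like this, create_export() would only return (and the volume would only be marked 'available') after wait_until() reports True, and would raise on False instead of silently continuing.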

And we also need to figure out why attach_volume returns successfully when it has failed.
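
One way to make that failure visible, sketched under assumptions: after the attach call returns, check that the device really appeared and raise if it did not. AttachFailed, attach_and_verify, and the two callables are hypothetical names for illustration only; they are not part of nova's API.

```python
class AttachFailed(Exception):
    """Raised when an attach call returns but the device never appears."""


def attach_and_verify(attach, device_present, device):
    # `attach` and `device_present` are hypothetical callables supplied by
    # the caller; they stand in for whatever the compute driver really does.
    attach(device)
    if not device_present(device):
        # Surface the failure instead of reporting success.
        raise AttachFailed('%s did not appear after attach' % device)
    return device
```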

yong sheng gong (gongysh) wrote :

I am trying to reproduce it with ISCSIDriver.

Ilya Pekelny (i159)
Changed in nova:
status: Confirmed → In Progress
Ilya Pekelny (i159)
Changed in nova:
assignee: nobody → Ilya Pekelny (i159)
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → havana-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-1 → 2013.2
Ilya Pekelny (i159)
Changed in nova:
assignee: Ilya Pekelny (i159) → nobody