Fibre Channel Multipath attach race condition

Bug #1175366 reported by Walt Boring
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Medium
Assigned to: Walt Boring

Bug Description

When the system is under load and a Fibre Channel attach happens with multipath installed, sometimes only one device shows up in the output of the multipath -l command. The libvirt volume code runs multipath -l at attach time, and not all devices show up at that point due to system load and/or FC fabric traffic. As a result, the later detach fails to remove all of the devices.

We need to rediscover all of the devices at detach time and ensure we are removing all of those devices. The devices should exist at detach time.

Walt Boring (walter-boring) wrote :

Here is example output for multipath -l /dev/sdl at different times.

ATTACH TIME
350002ac110e2383d dm-5 3PARdata,VV
size=4.8G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
  `- 2:0:0:5 sdl 8:176 active undef running

DETACH TIME
350002ac110e2383d dm-5 3PARdata,VV
size=4.8G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
  |- 2:0:0:5 sdl 8:176 active undef running
  `- 1:0:0:5 sdm 8:192 active undef running
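
The difference between the two outputs above can be made concrete with a small parser. This is only a minimal sketch of re-reading the path list at detach time, not the code from the merged fix: the function name parse_multipath_paths and the regular expression are hypothetical, and the sample strings are copied from the output above. It assumes path lines always follow the "H:C:T:L name major:minor" layout shown here.

```python
import re

# Matches path lines such as "  |- 2:0:0:5 sdl 8:176 active undef running".
# The layout (tree marker, H:C:T:L, device name, major:minor) is assumed
# from the sample output in this report.
PATH_LINE = re.compile(
    r"^[\s|`]*[|`]-\s+(\d+:\d+:\d+:\d+)\s+(\S+)\s+(\d+:\d+)\s"
)


def parse_multipath_paths(output):
    """Return the /dev paths listed in `multipath -l` style output.

    Hypothetical helper for illustration only.
    """
    devices = []
    for line in output.splitlines():
        match = PATH_LINE.match(line)
        if match:
            devices.append("/dev/" + match.group(2))
    return devices


ATTACH_OUTPUT = """350002ac110e2383d dm-5 3PARdata,VV
size=4.8G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
  `- 2:0:0:5 sdl 8:176 active undef running
"""

DETACH_OUTPUT = """350002ac110e2383d dm-5 3PARdata,VV
size=4.8G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=-1 status=active
  |- 2:0:0:5 sdl 8:176 active undef running
  `- 1:0:0:5 sdm 8:192 active undef running
"""

seen_at_attach = parse_multipath_paths(ATTACH_OUTPUT)
seen_at_detach = parse_multipath_paths(DETACH_OUTPUT)

# Devices that only became visible after the attach completed.
missed_at_attach = sorted(set(seen_at_detach) - set(seen_at_attach))
print(missed_at_attach)
```

A detach that relies only on the list captured at attach time would leave the devices in missed_at_attach behind, which is why the list needs to be rebuilt at detach time.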

affects: cinder → nova
Changed in nova:
assignee: nobody → Walt Boring (walter-boring)
Michael Still (mikal)
Changed in nova:
status: New → Triaged
importance: Undecided → Medium
Changed in nova:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/29320
Committed: http://github.com/openstack/nova/commit/080476b2d383b148f6fc8d202c3b0509f9bb1d66
Submitter: Jenkins
Branch: master

commit 080476b2d383b148f6fc8d202c3b0509f9bb1d66
Author: Walter A. Boring IV <email address hidden>
Date: Wed May 15 16:00:43 2013 -0700

    Fix dangling LUN issue under load with multipath

    This fixes an issue where not all of the LUNs are seen
    by the kernel at attach time, but later become available.
    We now rescan the list of devices seen by multipath at
    detach time.

    Also added another unit test case.

    Fixes Bug: #1175366

    Change-Id: Id5b313b17454ec32672373b7b564b9450466b7a2

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → havana-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-2 → 2013.2