mount nbd hangs because of previous umount failure

Bug #973413 reported by Unmesh Gurjar
This bug affects 3 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Davanum Srinivas (DIMS)

Bug Description

Scenario:
While injecting data into an instance, if the umount command fails, the
qemu-nbd disconnect command also fails and the device becomes unusable.
However, the device (/dev/nbd15) is still added back to the list of available devices.
So, if another instance is launched (with some injected data), it tries to use the
same nbd device (/dev/nbd15), which still holds a stale connection.

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Pádraig Brady (p-draigbrady) wrote :

Why does the umount command fail (any errors in logs)?
Looking at the code, if "qemu-nbd -d" fails, then an exception is thrown
and we will _not_ add that nbd device into the available list.
Similarly, if the umount command fails, we won't even try
to run the "qemu-nbd -d" command.

So both "umount" and "qemu-nbd -d" would have to fail silently for this to happen?

What kernel/distro was this on?

Michael Still (mikal) wrote :

Note that we no longer keep a list of available devices, so that bit at least is now fixed.

Michael Still (mikal)
Changed in nova:
status: Confirmed → Incomplete
importance: Medium → Undecided
Davanum Srinivas (DIMS) (dims-v) wrote :

Unmesh,
We cannot solve the issue you reported without more information. Could you please provide the requested information?

Unmesh Gurjar (unmesh-gurjar) wrote :

AFAIR, the issue was seen on Ubuntu. Since then, I have not come across such an issue. So, IMO, we can mark this bug as Invalid now.

Changed in nova:
status: Incomplete → Invalid
Davanum Srinivas (DIMS) (dims-v) wrote :

Seeing a bunch of failures in the gate. See https://bugs.launchpad.net/nova/+bug/1254890: the behavior we see there is the same as what was originally reported in this bug.

summary: - refactoring of qemu-nbd error handling
+ mount nbd hangs because of previous umount failure
Changed in nova:
status: Invalid → Confirmed
importance: Undecided → High
Changed in nova:
assignee: nobody → Davanum Srinivas (DIMS) (dims-v)
Davanum Srinivas (DIMS) (dims-v) wrote :
Changed in nova:
status: Confirmed → In Progress
Davanum Srinivas (DIMS) (dims-v) wrote :

A possible cause of this issue is that the file system is not synced:
http://markmail.org/message/hqinrqnw5eex7wue

So adding an explicit 'sync' command seems to work well.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/64383
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dd3f96e91581f465a52d10a212eb51b147dc38b5
Submitter: Jenkins
Branch: master

commit dd3f96e91581f465a52d10a212eb51b147dc38b5
Author: Davanum Srinivas <email address hidden>
Date: Sun Dec 29 19:18:04 2013 -0500

    Fix for qemu-nbd hang

    NBD device once used seem to run into intermittent trouble when
    used with mount repeatedly. Adding code to explicitly flush the
    device buffers using 'blockdev --flushbufs'.

    Closes-Bug: #973413
    Partial-Bug: #1254890
    Change-Id: I2b7053b9a069d6e82f6f6baf9ad480efa4388d91
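
As a rough illustration of the flush described in this commit message, the sketch below builds a teardown sequence that issues `blockdev --flushbufs` before the device is disconnected. The function names, the `execute` parameter, and the exact ordering of the three commands are assumptions made for this example; the report does not show where nova places the call.

```python
import subprocess


def teardown_commands(device, mount_dir):
    """Build the teardown argv lists, flushing buffers before the disconnect.

    The ordering is an assumption for illustration; only the
    'blockdev --flushbufs' call itself comes from the commit message.
    """
    return [
        ["blockdev", "--flushbufs", device],  # flush the device's buffers
        ["umount", mount_dir],                # unmount the file system
        ["qemu-nbd", "-d", device],           # disconnect the nbd device
    ]


def flush_and_teardown(device, mount_dir, execute=subprocess.check_call):
    """Run each command in order; check_call raises if one exits non-zero."""
    for argv in teardown_commands(device, mount_dir):
        execute(argv)
```

Passing a different `execute` callable makes it possible to inspect the sequence without root privileges or a real nbd device.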

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
milestone: none → icehouse-2
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/67662

Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/66740
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a0891ad0af316f5423c106e11ca2af7b17b76dd3
Submitter: Jenkins
Branch: master

commit a0891ad0af316f5423c106e11ca2af7b17b76dd3
Author: Davanum Srinivas <email address hidden>
Date: Tue Jan 14 19:29:46 2014 -0500

    Additional check for qemu-nbd hang

    /sys/block/*device*/pid check is not enough. I see that the unix
    socket used by the device my be stuck as well, so let's add another
    check for the path to the unix socket for the device as well to
    figure out if the device is free. Complain loud and clear that the
    qemu-nbd is leaking resources.

    Change-Id: I28cedffba7a9915ef6f7888989e40e4d0de475c6
    Closes-Bug: #973413
    Partial-Bug: #1254890
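
The two-part check described in this commit message might look something like the sketch below. It is not the merged code: `is_nbd_free` is a hypothetical helper, and the socket location (`/var/lock/qemu-nbd-<name>`) is an assumption, since the report only refers to "the path to the unix socket for the device".

```python
import logging
import os

LOG = logging.getLogger(__name__)


def is_nbd_free(name, sys_block="/sys/block", socket_dir="/var/lock"):
    """Return True only if neither the pid file nor the socket path exists.

    `socket_dir` and the 'qemu-nbd-<name>' file name are assumptions made
    for this sketch.
    """
    pid_file = os.path.join(sys_block, name, "pid")
    socket_path = os.path.join(socket_dir, "qemu-nbd-" + name)
    if os.path.exists(pid_file):
        # A pid file means the device is currently connected.
        return False
    if os.path.exists(socket_path):
        # No pid file but a leftover socket: the previous disconnect leaked it.
        LOG.error("qemu-nbd is leaking resources: %s still exists", socket_path)
        return False
    return True
```

Taking both directories as parameters keeps the check easy to exercise against a temporary directory instead of the real `/sys/block`.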

Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-2 → 2014.1
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/67662
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1681b957bfa236c69dc233bc7661fff9b5113ad0
Submitter: Jenkins
Branch: stable/havana

commit 1681b957bfa236c69dc233bc7661fff9b5113ad0
Author: Davanum Srinivas <email address hidden>
Date: Sun Dec 29 19:18:04 2013 -0500

    Fix for qemu-nbd hang

    NBD device once used seem to run into intermittent trouble when
    used with mount repeatedly. Adding code to explicitly flush the
    device buffers using 'blockdev --flushbufs'.

    NOTE(dmllr): Also merged regression fix:

        Consolidate the blockdev related filters

        Seeing a "/usr/local/bin/nova-rootwrap: Unauthorized command" error
        in the logs when "blockdev --flushbufs" is executed because of the
        existing blockdev in compute.filters. We need to merge both into a
        single RegExpFilter

        Change-Id: Ic323ff00e5c23786a6e376e67a4ad08f92708aef

    (cherry-picked from dd3f96e91581f465a52d10a212eb51b147dc38b5)
    (cherry-picked from 31b2791e6be1768c410de5fa32283fcb637bda57)

    Closes-Bug: #973413
    Partial-Bug: #1254890
    Change-Id: I2b7053b9a069d6e82f6f6baf9ad480efa4388d91
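
To illustrate why one merged filter is enough for both invocations, here is a simplified stand-in for a per-argument regular-expression filter. It is not the rootwrap implementation, and the patterns are assumptions: only `--flushbufs` appears in this report, while `--getsize64` and the `/dev/.*` pattern are placeholders for whatever the existing blockdev filter allowed.

```python
import re

# Hypothetical per-argument patterns for a single merged "blockdev" filter.
BLOCKDEV_ARG_PATTERNS = ["blockdev", r"(--getsize64|--flushbufs)", r"/dev/.*"]


def is_authorized(argv, patterns=BLOCKDEV_ARG_PATTERNS):
    """Return True if every argument fully matches its positional pattern."""
    if len(argv) != len(patterns):
        return False
    return all(re.fullmatch(p, a) for p, a in zip(patterns, argv))
```

With an alternation in the option position, `blockdev --flushbufs /dev/nbd15` and the previously permitted form are both accepted by the same entry, so two separate filters are not needed for the same command.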

tags: added: in-stable-havana