OpenStack Compute (nova)

mount nbd hangs because of previous umount failure

Bug #973413 reported by Unmesh Gurjar on 2012-04-04

This bug affects 3 people

Affects	Status	Importance	Assigned to	Milestone
OpenStack Compute (nova)	Fix Released	High	Davanum Srinivas (DIMS)	OpenStack Compute (nova) 2014.1 "icehouse"

Bug Description

Scenario:
While injecting data into an instance, if the umount command fails, the
qemu-nbd disconnect command also fails and the device becomes unusable.
However, the device (/dev/nbd15) gets added back to the available device list.
So, if another instance is launched (with some injected data), it tries to use the
same nbd device (/dev/nbd15) which contains a stale connection.
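
The scenario above can be illustrated with a toy model. This is a minimal sketch, not nova code: `DevicePool`, `release_buggy`, `release_safe`, and the `disconnect` callback are hypothetical names, and the disconnect failure is simulated rather than produced by a real `qemu-nbd -d` call.

```python
class DevicePool:
    """Toy model of a pool of NBD device paths (hypothetical, not nova code)."""

    def __init__(self, devices):
        self.available = list(devices)
        self.stale = set()  # devices whose disconnect failed

    def acquire(self):
        return self.available.pop(0)

    def release_buggy(self, device, disconnect):
        # Bug pattern: the device goes back into the pool regardless of
        # whether the disconnect succeeded.
        if not disconnect(device):
            self.stale.add(device)
        self.available.append(device)

    def release_safe(self, device, disconnect):
        # Safer pattern: only recycle the device after a clean disconnect.
        if disconnect(device):
            self.available.append(device)
        else:
            self.stale.add(device)


def failing_disconnect(device):
    """Stand-in for a disconnect command that fails."""
    return False


buggy = DevicePool(["/dev/nbd15"])
dev = buggy.acquire()
buggy.release_buggy(dev, failing_disconnect)
reused = buggy.acquire()  # the next instance is handed the same device
print(reused, reused in buggy.stale)
```

With `release_safe` in place of `release_buggy`, the pool would simply run out of devices instead of handing out one with a stale connection.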

Russell Bryant (russellb) on 2012-04-21

Changed in nova:
status:	New → Confirmed
importance:	Undecided → Medium

Pádraig Brady (p-draigbrady) wrote on 2012-09-07:

Why does the umount command fail (any errors in logs)?
Looking at the code, if "qemu-nbd -d" fails, then an exception is thrown
and we will _not_ add that nbd device into the available list.
Similarly, if the umount command fails, we won't even try
to run the "qemu-nbd -d" command.

So both "umount" and "qemu-nbd -d" would have to fail silently for this to happen?

What kernel/distro was this on?

Michael Still (mikal) wrote on 2012-12-13:

Note that we no longer keep a list of available devices, so that bit at least is now fixed.

Michael Still (mikal) on 2013-02-28

Changed in nova:
status:	Confirmed → Incomplete
importance:	Medium → Undecided

Davanum Srinivas (DIMS) (dims-v) wrote on 2013-03-18:

Unmesh,
We cannot solve the issue you reported without more information. Could you please provide the requested information?

Unmesh Gurjar (unmesh-gurjar) wrote on 2013-03-18:

AFAIR, the issue was seen on Ubuntu. Since then, I have not come across such an issue. So, IMO, we can mark this bug as Invalid now.

Davanum Srinivas (DIMS) (dims-v) on 2013-03-18

Changed in nova:
status:	Incomplete → Invalid

Davanum Srinivas (DIMS) (dims-v) wrote on 2013-12-20:

Seeing a bunch of failures in the gate. See https://bugs.launchpad.net/nova/+bug/1254890, which shows the same behavior that was originally reported in this bug.

summary:	- refactoring of qemu-nbd error handling
	+ mount nbd hangs because of previous umount failure
Changed in nova:
status:	Invalid → Confirmed
importance:	Undecided → High

Davanum Srinivas (DIMS) (dims-v) on 2013-12-21

Changed in nova:
assignee:	nobody → Davanum Srinivas (DIMS) (dims-v)

Davanum Srinivas (DIMS) (dims-v) wrote on 2013-12-27:

Review is here: https://review.openstack.org/#/c/64177/

Changed in nova:
status:	Confirmed → In Progress

Davanum Srinivas (DIMS) (dims-v) wrote on 2013-12-28:

A possible cause of this issue is that the file system is not synced:
http://markmail.org/message/hqinrqnw5eex7wue

So adding an explicit 'sync' command seems to work well.

OpenStack Infra (hudson-openstack) wrote on 2013-12-31: Fix merged to nova (master)

Reviewed: https://review.openstack.org/64383
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=dd3f96e91581f465a52d10a212eb51b147dc38b5
Submitter: Jenkins
Branch: master

commit dd3f96e91581f465a52d10a212eb51b147dc38b5
Author: Davanum Srinivas <email address hidden>
Date: Sun Dec 29 19:18:04 2013 -0500

Fix for qemu-nbd hang

    NBD devices, once used, seem to run into intermittent trouble when
    used with mount repeatedly. Adding code to explicitly flush the
    device buffers using 'blockdev --flushbufs'.

    Closes-Bug: #973413
    Partial-Bug: #1254890
    Change-Id: I2b7053b9a069d6e82f6f6baf9ad480efa4388d91
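
A minimal sketch of the flush step named in this commit message. The helper names `flush_command` and `flush_device` are hypothetical, and where exactly the flush belongs in the mount and unmount sequence is not stated here; only the `blockdev --flushbufs` command itself comes from the commit message. Running it against a real device would normally need root privileges.

```python
import subprocess


def flush_command(device):
    """Build the argv for flushing a block device's buffers."""
    return ["blockdev", "--flushbufs", device]


def flush_device(device, run=subprocess.check_call):
    """Flush buffers for `device` using the given runner.

    `run` defaults to subprocess.check_call, which raises
    CalledProcessError if the command exits with a non-zero status.
    """
    run(flush_command(device))
```

Passing a fake `run` (for example a list's `append` method) makes it possible to check the generated command without touching a real device.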

Changed in nova:
status:	In Progress → Fix Committed

Russell Bryant (russellb) on 2014-01-13

Changed in nova:
milestone:	none → icehouse-2

OpenStack Infra (hudson-openstack) wrote on 2014-01-18: Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/67662

Thierry Carrez (ttx) on 2014-01-22

Changed in nova:
status:	Fix Committed → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2014-01-25: Fix merged to nova (master)

Reviewed: https://review.openstack.org/66740
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a0891ad0af316f5423c106e11ca2af7b17b76dd3
Submitter: Jenkins
Branch: master

commit a0891ad0af316f5423c106e11ca2af7b17b76dd3
Author: Davanum Srinivas <email address hidden>
Date: Tue Jan 14 19:29:46 2014 -0500

Additional check for qemu-nbd hang

    /sys/block/*device*/pid check is not enough. I see that the unix
    socket used by the device may be stuck as well, so let's add another
    check for the path to the unix socket for the device as well to
    figure out if the device is free. Complain loud and clear that the
    qemu-nbd is leaking resources.

    Change-Id: I28cedffba7a9915ef6f7888989e40e4d0de475c6
    Closes-Bug: #973413
    Partial-Bug: #1254890
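
The two-step "is this device free?" check described in this commit message might look roughly like the sketch below. The function name `find_unused`, its parameters, and in particular the default socket location `/var/lock/qemu-nbd-<device>` are assumptions made for illustration; the commit message only says that the path to the device's unix socket is checked in addition to `/sys/block/*device*/pid`.

```python
import os
import tempfile


def find_unused(devices, sys_block="/sys/block", lock_dir="/var/lock"):
    """Return the first device name that looks free, or None.

    A device is skipped if it has a pid file under sys_block, or if a
    leftover socket named qemu-nbd-<name> exists in lock_dir (assumed path).
    """
    for name in devices:
        if os.path.exists(os.path.join(sys_block, name, "pid")):
            continue  # a process is still attached to this device
        if os.path.exists(os.path.join(lock_dir, "qemu-nbd-" + name)):
            # Leftover socket: a previous disconnect did not clean up.
            print("warning: stale qemu-nbd socket for", name)
            continue
        return name
    return None


# Demonstration against a temporary directory tree instead of the real
# /sys/block and /var/lock.
with tempfile.TemporaryDirectory() as root:
    sys_block = os.path.join(root, "sys_block")
    lock_dir = os.path.join(root, "lock")
    for name in ("nbd0", "nbd1", "nbd2"):
        os.makedirs(os.path.join(sys_block, name))
    os.makedirs(lock_dir)
    open(os.path.join(sys_block, "nbd0", "pid"), "w").close()   # nbd0 is busy
    open(os.path.join(lock_dir, "qemu-nbd-nbd1"), "w").close()  # nbd1 is stale
    chosen = find_unused(["nbd0", "nbd1", "nbd2"], sys_block, lock_dir)

print(chosen)
```

Taking the directories as parameters keeps the check testable without root access or real NBD devices.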

Thierry Carrez (ttx) on 2014-04-17

Changed in nova:
milestone:	icehouse-2 → 2014.1

OpenStack Infra (hudson-openstack) wrote on 2014-07-31: Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/67662
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1681b957bfa236c69dc233bc7661fff9b5113ad0
Submitter: Jenkins
Branch: stable/havana

commit 1681b957bfa236c69dc233bc7661fff9b5113ad0
Author: Davanum Srinivas <email address hidden>
Date: Sun Dec 29 19:18:04 2013 -0500

Fix for qemu-nbd hang

    NBD devices, once used, seem to run into intermittent trouble when
    used with mount repeatedly. Adding code to explicitly flush the
    device buffers using 'blockdev --flushbufs'.

    NOTE(dmllr): Also merged regression fix:

        Consolidate the blockdev related filters

        Seeing a "/usr/local/bin/nova-rootwrap: Unauthorized command" error
        in the logs when "blockdev --flushbufs" is executed because of the
        existing blockdev in compute.filters. We need to merge both into a
        single RegExpFilter

        Change-Id: Ic323ff00e5c23786a6e376e67a4ad08f92708aef

    (cherry-picked from dd3f96e91581f465a52d10a212eb51b147dc38b5)
    (cherry-picked from 31b2791e6be1768c410de5fa32283fcb637bda57)

    Closes-Bug: #973413
    Partial-Bug: #1254890
    Change-Id: I2b7053b9a069d6e82f6f6baf9ad480efa4388d91
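
The regression note above is about a command filter that did not accept the new `blockdev --flushbufs` invocation. The sketch below is a rough emulation of per-argument regular-expression matching, not rootwrap's actual implementation: `argv_allowed` and `BLOCKDEV_PATTERNS` are hypothetical, and the pre-existing `--getsize64` use of blockdev is an assumption chosen for illustration.

```python
import re


def argv_allowed(patterns, argv):
    """Return True if every argument fully matches its corresponding pattern."""
    if len(patterns) != len(argv):
        return False
    return all(re.fullmatch(p, a) for p, a in zip(patterns, argv))


# Hypothetical single pattern list covering both blockdev uses.
BLOCKDEV_PATTERNS = ["blockdev", r"(--getsize64|--flushbufs)", r"/dev/.*"]

print(argv_allowed(BLOCKDEV_PATTERNS, ["blockdev", "--flushbufs", "/dev/nbd15"]))
print(argv_allowed(BLOCKDEV_PATTERNS, ["blockdev", "--setrw", "/dev/nbd15"]))
```

Merging the alternatives into one pattern means a single rule decides every blockdev call, so the new flag cannot be shadowed by an older, narrower rule for the same command.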

tags:	added: in-stable-havana
