yakkety: behavior change in `qemu-nbd -c $DEV $FILENAME`: doesn't automatically create partition devices

Bug #1630341 reported by Jason Gerard DeRose
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
qemu (Debian) | Fix Released | Unknown | |
qemu (Ubuntu) | Invalid | Undecided | Unassigned |

Bug Description

On Ubuntu 12.10 through Ubuntu 16.04, something like this worked with 100% reliability in my experience:

sudo modprobe nbd
sudo qemu-nbd -c /dev/nbd0 /my/vm/image.qcow2
sudo mount /dev/nbd0p1 /mnt

But on Yakkety, this *almost* always fails because the /dev/nbd0p1 device doesn't exist by the time the `sudo mount /dev/nbd0p1 /mnt` command is run.

I found an existing Debian bug about this:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=824553

Unfortunately, I feel the bug reporter got the brush off there, but I can confirm that `qemu-nbd` previously worked exactly as the bug reporter claims (on Ubuntu anyway, not sure about Debian). I have automated tooling that has done this exact thing many thousands of times on Ubuntu 12.10 through 16.04 without issue. Although in defense of the Debian maintainer, my hunch is this behavior change likely isn't the result of changes in the `qemu-img` package itself.

A curiosity of note is that although this usually fails on Yakkety, it did work for me a few times (in my testing thus far, I'd say it succeeds less than 5% of the time). I haven't yet found a way to reproduce these successful cases, nor have I yet spotted any pattern in the cases in which it works vs. cases in which it doesn't work.

On Yakkety I can *mostly* work around this by calling `sudo partprobe /dev/nbd0` prior to trying to mount the device:

sudo modprobe nbd
sudo qemu-nbd -c /dev/nbd0 /my/vm/image.qcow2
sudo partprobe /dev/nbd0
sudo mount /dev/nbd0p1 /mnt

At first I thought this was a 100% reliable solution on Yakkety, but it still sometimes fails. I haven't had as much time testing this yet, but I'd say it probably succeeds 80% of the time or so, fails the remaining 20%.

I hate to say it, but on the surface this feels like a systemd bug as I've encountered systemd + qemu-nbd problems in the past:

https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1435428

Plus whenever I hear of "weird problems mounting or unmounting devices" plus "non-determinism"... systemd comes to mind :P

So my hunch is this isn't actually a `qemu-img` bug. My best guess is systemd, but it could also be the Linux 4.4 vs 4.8 kernel or something else I haven't thought of. Still, without stronger evidence, to me it seems `qemu-img` is the best package to initially file this bug against.

Also, I'm not yet sure how best to debug this, but the most helpful logging I've found thus far is in /var/log/syslog.

Thanks!

ProblemType: Bug
DistroRelease: Ubuntu 16.10
Package: qemu-utils 1:2.6.1+dfsg-0ubuntu5
ProcVersionSignature: Ubuntu 4.8.0-17.19-generic 4.8.0-rc7
Uname: Linux 4.8.0-17-generic x86_64
NonfreeKernelModules: nvidia_uvm nvidia_drm nvidia_modeset nvidia
ApportVersion: 2.20.3-0ubuntu7
Architecture: amd64
CurrentDesktop: Unity:Unity7
Date: Tue Oct 4 10:44:58 2016
KvmCmdLine:
 COMMAND STAT EUID RUID PID PPID %CPU COMMAND
 kvm-irqfd-clean S< 0 0 18420 2 0.0 [kvm-irqfd-clean]
MachineType: System76, Inc. Serval WS
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.8.0-17-generic.efi.signed root=UUID=4b57bce3-eed0-426b-adfe-4f7a834e2ddf ro net.ifnames=0 quiet splash vt.handoff=7
SourcePackage: qemu
UpgradeStatus: Upgraded to yakkety on 2016-10-03 (0 days ago)
dmi.bios.date: 06/09/2015
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1.03.03RSY2
dmi.board.asset.tag: Tag 12345
dmi.board.name: Serval WS
dmi.board.vendor: System76, Inc.
dmi.board.version: serw8-17g
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 9
dmi.chassis.vendor: System76, Inc.
dmi.chassis.version: Serval WS
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1.03.03RSY2:bd06/09/2015:svnSystem76,Inc.:pnServalWS:pvrserw8-17g:rvnSystem76,Inc.:rnServalWS:rvrserw8-17g:cvnSystem76,Inc.:ct9:cvrServalWS:
dmi.product.name: Serval WS
dmi.product.version: serw8-17g
dmi.sys.vendor: System76, Inc.

Jason Gerard DeRose (jderose) wrote :
Changed in qemu (Debian):
status: Unknown → Fix Released
Robie Basak (racb) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better.

I don't think this is a bug. qemu-nbd's job is to create /dev/nbd0, which it is doing. It is the kernel's job to create /dev/nbd0p1 in your case, which it is doing. There is nothing to say that it must do this before you expect it to exist in your mount call. Similarly, partprobe is not defined to block until the kernel has finished re-reading devices. It only requests that the kernel start doing so.

If you need to block until /dev/nbd0p1 exists, you will need to wait for it to appear yourself, or arrange to listen to the kernel's defined interfaces for notification of it appearing, or just poll.
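
A minimal sketch of the "just poll" option, assuming a POSIX shell. The helper name `wait_for_path` and the 10-second default timeout are illustrative choices, not part of qemu-nbd or any package mentioned in this report:

```shell
# wait_for_path PATH [TIMEOUT_SECONDS]
# Hypothetical helper: poll once per second until PATH exists,
# and give up with a non-zero status once the timeout is reached.
wait_for_path() {
    path=$1
    timeout=${2:-10}
    elapsed=0
    while [ ! -e "$path" ]; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "timed out waiting for $path" >&2
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}
```

It could be slotted between connecting and mounting, for example: `sudo qemu-nbd -c /dev/nbd0 /my/vm/image.qcow2 && wait_for_path /dev/nbd0p1 && sudo mount /dev/nbd0p1 /mnt`.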

You might be interested in the mount-image-callback command from the cloud-image-utils package, which does something very similar to what you are doing. That would be the correct place to have handling for this kind of blocking if it doesn't do it correctly already.

Or you could hook into udev, or use something like inotifywait to wait for /dev/nbd0p1 to appear.
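
As a rough sketch of the inotifywait idea (not a tested recipe for nbd devices), the helper below waits for a name to appear in a directory. The function name `wait_for_node` and the attempt count are hypothetical. It assumes `inotifywait` from the inotify-tools package and falls back to plain sleeping when that tool is missing or reports an error. Whether create events are delivered for kernel-created nodes under /dev is also an assumption here, which is why the loop re-checks the path on every pass instead of trusting the event alone:

```shell
# wait_for_node DIR NAME [ATTEMPTS]
# Hypothetical helper: wait until DIR/NAME exists. Each attempt blocks for
# at most about one second, either in inotifywait or in sleep.
wait_for_node() {
    dir=$1
    name=$2
    attempts=${3:-10}
    while [ ! -e "$dir/$name" ]; do
        [ "$attempts" -gt 0 ] || return 1
        attempts=$((attempts - 1))
        if command -v inotifywait >/dev/null 2>&1; then
            # Exit status 0 = an entry was created, 2 = the 1-second timeout
            # expired, 1 = error. Only on error do we sleep instead.
            inotifywait -qq -t 1 -e create "$dir" >/dev/null 2>&1 \
                || [ $? -ne 1 ] || sleep 1
        else
            sleep 1
        fi
    done
    return 0
}
```

Usage would look like `wait_for_node /dev nbd0p1 10 && sudo mount /dev/nbd0p1 /mnt`. If the node appears between the existence check and the watch being established, the one-second timeout bounds the extra delay.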

Since I don't think this is a valid bug, I'm marking this as Invalid. But I'll subscribe to this bug and welcome further discussion.

Changed in qemu (Ubuntu):
status: New → Invalid
Jason Gerard DeRose (jderose) wrote :

Thanks for the feedback, Robie! I believe the underlying problem was this kernel bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1628336

The /dev/nbd0p1, etc., partition devices were never getting created automatically after connecting the device, no matter how long you waited. So you had to call partprobe, whereas on Xenial and older you didn't.

And you bring up some good points about my scripts being fragile unless I do some sort of polling or use inotify (although in practice I haven't had problems with this).

But I agree this shouldn't really be treated as a bug and should be marked invalid.

Thanks again!
