curtin fails with "Device is busy" error during unmount
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
curtin | Fix Released | Medium | Unassigned |
curtin (Ubuntu) | Fix Released | Medium | Unassigned |
Trusty | Fix Released | Medium | Unassigned |
Vivid | Fix Released | Medium | Unassigned |
Bug Description
=== Begin SRU Template ===
[Description]
A race condition can occur when invoking grub-install inside the target environment.
The failure case shows logs like this:
| Installing for i386-pc platform.
| Installation finished. No error reported.
| umount: /tmp/tmpM4R1dI/
| (In some cases useful info about processes that use
| the device is found by lsof(8) or fuser(1))
| Unexpected error while running command.
| Command: ['umount', '/tmp/tmpM4R1dI
| Exit code: 1
| Reason: -
| Stdout: ''
| Stderr: ''
| Installation failed with exception: Unexpected error while running command.
| Command: ['curtin', 'curthooks']
This is believed to be because some process (likely spawned by udev) still had open file handles on /dev when curtin went to clean up the target mounts.
The solution is to run 'udevadm settle' before unmounting '/dev/' from the target.
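A minimal sketch of that ordering, for illustration only. The helper name `settle_then_unmount` and its `run` parameter are hypothetical and are not curtin's actual code; the only real commands used are `udevadm settle` and `umount`.

```python
import subprocess


def settle_then_unmount(mountpoints, run=subprocess.check_call):
    """Unmount in reverse mount order, settling udev before any /dev unmount.

    Hypothetical helper: 'run' is injectable so the ordering can be
    checked without root privileges or real mounts.
    """
    for mp in reversed(mountpoints):
        if mp.rstrip("/").endswith("/dev"):
            # Wait for the udev event queue to drain so that helpers
            # spawned by udev no longer hold file handles under /dev.
            run(["udevadm", "settle"])
        run(["umount", mp])
```

Called as `settle_then_unmount(["/target/dev", "/target/proc"])` (example paths), this would unmount /target/proc first, then settle udev, then unmount /target/dev.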
[Impact]
The impact is transient failure to install. This race condition is very rarely seen on hardware, but was somewhat easily reproduced in a heavily loaded vmware environment.
[Test Case]
In the original bug-opener's environment it fails fairly reliably under heavy host load using vmware. He would do a deploy to several guests on the same host at the same time and this would reproduce. Unfortunately I was unable to come up with a test case in a less complex environment.
[Regression Potential]
Regression potential should be very low here. The most likely fallout is just additional time for the install as a result of running 'udevadm settle'. A system that did not exhibit this bug will install a small fraction of a second slower.
$ sudo bash -c 'time for x in "$@"; do udevadm settle; done' -- $(seq 1 100)
real 0m0.214s
user 0m0.012s
sys 0m0.008s
As shown above, 100 calls took 0.214s in total, so the added cost is roughly 2 ms per call, well under 1/100th of a second.
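The same measurement can be sketched in Python when a per-call figure is wanted directly. The function name `mean_runtime` and its `run` parameter are hypothetical, chosen here for illustration.

```python
import subprocess
import time


def mean_runtime(cmd, runs=100, run=subprocess.check_call):
    """Return the average wall-clock seconds per invocation of cmd.

    Hypothetical helper: 'run' is injectable so the loop can be
    exercised without actually invoking udevadm.
    """
    start = time.monotonic()
    for _ in range(runs):
        run(cmd)
    return (time.monotonic() - start) / runs
```

For example, `mean_runtime(["udevadm", "settle"])` run as root would report the average cost of one settle on the machine under test.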
=== End SRU Template ===
Here's the relevant part of the curtin output:
Installing for i386-pc platform.
Installation finished. No error reported.
umount: /tmp/tmpM4R1dI/
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))
Unexpected error while running command.
Command: ['umount', '/tmp/tmpM4R1dI
Exit code: 1
Reason: -
Stdout: ''
Stderr: ''
Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'curthooks']
I spoke with Scott Moser on IRC about this, and he suggested the attached patch, which fixed the problem 100%.
I checked the installation output of a few of my MAAS nodes, and didn't see any lsof output. So I assume the "udevadm settle" command is the fix.
(Thanks Scott! I hope we can land this in time for MAAS 1.8.)
Related branches
- curtin developers: Pending requested
- Diff: 10 lines (+1/-0), 1 file modified: curtin/util.py (+1/-0)
affects: | ubuntu (Ubuntu) → curtin (Ubuntu) |
Changed in curtin (Ubuntu): | |
importance: | Undecided → Medium |
status: | New → Confirmed |
Changed in curtin (Ubuntu Trusty): | |
status: | New → Confirmed |
Changed in curtin (Ubuntu Vivid): | |
status: | New → Confirmed |
Changed in curtin (Ubuntu Trusty): | |
importance: | Undecided → Medium |
Changed in curtin (Ubuntu Vivid): | |
importance: | Undecided → Medium |
description: | updated |
Changed in curtin (Ubuntu): | |
status: | Confirmed → Fix Released |
tags: |
added: verification-done removed: verification-needed |
Changed in curtin: | |
importance: | Undecided → Medium |
I was unable to reproduce this bug in testing, so I'm unfortunately not 100% certain the fix is doing more than effectively a 'sleep'.
Here's what I did:
wget http://cloud-images.ubuntu.com/daily/server/trusty/current/trusty-server-cloudimg-amd64-root.tar.gz
mkdir x
tar -C x -Sxpzf trusty-server-cloudimg-amd64-root.tar.gz
$ cat go.py
#!/usr/bin/python
from curtin import util
import sys, os
# number of times to repeat the command (override with RUNS=N)
runs = int(os.environ.get("RUNS", "10"))
target = sys.argv[1]
cmd = sys.argv[2:]
print("target: %s" % target)
print("cmd: %s" % ' '.join(cmd))
for run in range(0, runs):
    print("run: %s" % run)
    # mount and unmount the target's /dev, /proc, /sys around each run
    with util.ChrootableTarget(target=target, allow_daemons=False):
        util.subp(['chroot', target] + cmd)
# install grub-pc in the target
$ sudo RUNS=1 PYTHONPATH=$PWD ./go.py x apt-get install grub-pc
# set up /dev/vdb (kvm / openstack guest) with partition
$ sudo wipefs --all /dev/vdb; echo "2048," | sudo sfdisk --force --unit=S /dev/vdb
$ sudo RUNS=100 PYTHONPATH=$PWD ./go.py x grub-install /dev/vdb
I was unable to see that fail, even though it is essentially what I believe was happening in Mike's failure case.
I also tried with trusty OS + trusty root.tar.gz and vivid OS + vivid root.tar.gz.
That said, udevadm monitor *does* show udev events, so it's quite possible that something was responding to those events.
Worst case, this is a 'sleep'. Best case, it's a fix.