ubuntu-vm-builder crashes when trying to umount tmp directory.

Bug #879710 reported by Shane Harbour
102
This bug affects 20 people
Affects Status Importance Assigned to Milestone
vm-builder (Ubuntu)
Confirmed
Medium
Unassigned

Bug Description

Description: Ubuntu 11.10
Release: 11.10

ubuntu-vm-builder:
  Installed: 0.12.4+bzr469-0ubuntu1
  Candidate: 0.12.4+bzr469-0ubuntu1
  Version table:
 *** 0.12.4+bzr469-0ubuntu1 0
        500 http://us.archive.ubuntu.com/ubuntu/ oneiric/universe amd64 Packages
        100 /var/lib/dpkg/status

I've tried creating VM 10-12 times in the last 24 hours with different configurations and I've only had it work once.

Command line:
sudo ubuntu-vm-builder kvm ubuntu --suite oneiric --flavour virtual --arch amd64 -o --libvirt qemu:///system --hostname ldaptest --bridge br0 --user shane --pass changeme --addpkg acpid --addpkg openssh-server

Error:
2011-10-21 16:38:48,590 INFO : umount: /tmp/tmpAyJUcY/dev: device is busy.
2011-10-21 16:38:48,590 INFO : (In some cases useful info about processes that use
2011-10-21 16:38:48,591 INFO : the device is found by lsof(8) or fuser(1))
2011-10-21 16:38:48,591 INFO : Cleaning up
2011-10-21 16:38:48,591 ERROR : Process (['umount', '/tmp/tmpAyJUcY/dev']) returned 1. stdout: , stderr: umount: /tmp/tmpAyJUcY/dev: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))

Traceback (most recent call last):
  File "/usr/bin/ubuntu-vm-builder", line 24, in <module>
    uvb.main()
  File "/usr/lib/python2.7/dist-packages/VMBuilder/contrib/cli.py", line 216, in main
    distro.build_chroot()
  File "/usr/lib/python2.7/dist-packages/VMBuilder/distro.py", line 84, in build_chroot
    self.call_hooks('configure_os')
  File "/usr/lib/python2.7/dist-packages/VMBuilder/distro.py", line 67, in call_hooks
    call_hooks(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/VMBuilder/util.py", line 165, in call_hooks
    getattr(context, func, log_no_such_method)(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/VMBuilder/plugins/ubuntu/distro.py", line 155, in configure_os
    self.suite.unmount_dev()
  File "/usr/lib/python2.7/dist-packages/VMBuilder/plugins/ubuntu/dapper.py", line 118, in unmount_dev
    run_cmd('umount', '%s/dev' % self.context.chroot_dir)
  File "/usr/lib/python2.7/dist-packages/VMBuilder/util.py", line 120, in run_cmd
    raise VMBuilderException, "Process (%s) returned %d. stdout: %s, stderr: %s" % (args.__repr__(), status, mystdout.buf, mystderr.buf)
VMBuilder.exception.VMBuilderException: Process (['umount', '/tmp/tmpAyJUcY/dev']) returned 1. stdout: , stderr: umount: /tmp/tmpAyJUcY/dev: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))

ProblemType: Bug
DistroRelease: Ubuntu 11.10
Package: ubuntu-vm-builder 0.12.4+bzr469-0ubuntu1
ProcVersionSignature: Ubuntu 3.0.0-12.20-generic 3.0.4
Uname: Linux 3.0.0-12-generic x86_64
NonfreeKernelModules: nvidia
ApportVersion: 1.23-0ubuntu3
Architecture: amd64
Date: Fri Oct 21 16:39:08 2011
InstallationMedia: Xubuntu 11.10 "Oneiric Ocelot" - Release amd64 (20111011)
PackageArchitecture: all
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: vm-builder
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Shane Harbour (shaneharbour) wrote :
Revision history for this message
Dave Walker (davewalker) wrote :

Confirmed, I was able to reproduce this error.

Changed in vm-builder (Ubuntu):
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
heechul yun (heechul) wrote :

A simple workaround is simply add some sleep as follows.

    def unmount_dev(self):
        self.context.cancel_cleanup(self.unmount_dev)
        run_cmd('sleep', '10'); // <--- add this line.
        run_cmd('umount', '%s/dev' % self.context.chroot_dir)

Above

Revision history for this message
Shane Harbour (shaneharbour) wrote :

Added the fix and tested. Works great!

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Whoever is able to reproduce this, to figure out what is keeping the device pinned, could you replace the
    run_cmd('sleep', '10');
above with
   run_cmd('ps', '-ef');
   run_cmd('lsof', '%s/dev')
   run_cmd('mount')

and post the results here?

Revision history for this message
alexis bellido (vz-alexis) wrote :
Download full text (5.6 KiB)

Hey Serge, I can reproduce this and I'm using the same versions mentioned in the bug description.

I added the three lines you suggested to /usr/lib/python2.7/dist-packages/VMBuilder/plugins/ubuntu/dapper.py and this is what I got:

vmbuilder kvm ubuntu --suite=oneiric --flavour=virtual --arch=i386 -o --libvirt=qemu:///system --ip=192.168.0.181 --gw=192.168.0.1 --part=vmbuilder.partition --templates=mytemplates --user=alexis --name=Alexis --pass=changeme --addpkg=vim-nox --addpkg=unattended-upgrades --addpkg=acpid --firstboot=/var/lib/libvirt/images/vm1/boot.sh --mem=256 --hostname=vm1 --bridge=br0
2011-12-03 22:10:10,852 INFO : Calling hook: preflight_check
2011-12-03 22:10:10,872 INFO : Calling hook: set_defaults
2011-12-03 22:10:10,873 INFO : Calling hook: bootstrap
2011-12-03 22:25:20,818 INFO : Calling hook: configure_os
2011-12-03 22:31:23,887 INFO : invoke-rc.d: policy-rc.d denied execution of start.
2011-12-03 22:31:25,977 INFO : update-rc.d: warning: unattended-upgrades start runlevel arguments (2 3 4 5) do not match LSB Default-Start values (none)
2011-12-03 22:31:25,977 INFO : update-rc.d: warning: unattended-upgrades stop runlevel arguments (0 1 6) do not match LSB Default-Stop values (0 6)
2011-12-03 22:31:25,990 INFO : invoke-rc.d: policy-rc.d denied execution of start.
2011-12-03 22:31:30,825 INFO :
2011-12-03 22:31:30,829 INFO : Current default time zone: 'Etc/UTC'
2011-12-03 22:31:30,829 INFO : Local time is now: Sun Dec 4 00:31:30 UTC 2011.
2011-12-03 22:31:30,829 INFO : Universal Time is now: Sun Dec 4 00:31:30 UTC 2011.
2011-12-03 22:31:30,829 INFO :
2011-12-03 22:31:49,387 INFO :
2011-12-03 22:31:49,388 INFO : Current default time zone: 'Etc/UTC'
2011-12-03 22:31:49,390 INFO : Local time is now: Sun Dec 4 00:31:49 UTC 2011.
2011-12-03 22:31:49,390 INFO : Universal Time is now: Sun Dec 4 00:31:49 UTC 2011.
2011-12-03 22:31:49,390 INFO : Run 'dpkg-reconfigure tzdata' if you wish to change it.
2011-12-03 22:31:49,391 INFO :
2011-12-03 22:31:58,682 INFO : lsof: WARNING: can't stat() fuse.gvfs-fuse-daemon file system /home/alexis/.gvfs
2011-12-03 22:31:58,682 INFO : Output information may be incomplete.
2011-12-03 22:31:58,682 INFO : lsof: status error on %s/dev: No such file or directory
2011-12-03 22:31:58,682 INFO : lsof 4.81
2011-12-03 22:31:58,682 INFO : latest revision: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/
2011-12-03 22:31:58,683 INFO : latest FAQ: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/FAQ
2011-12-03 22:31:58,683 INFO : latest man page: ftp://lsof.itap.purdue.edu/pub/tools/unix/lsof/lsof_man
2011-12-03 22:31:58,683 INFO : usage: [-?abhlnNoOPRtUvVX] [+|-c c] [+|-d s] [+D D] [+|-f[gG]]
2011-12-03 22:31:58,683 INFO : [-F [f]] [-g [s]] [-i [i]] [+|-L [l]] [+m [m]] [+|-M] [-o [o]] [-p s]
2011-12-03 22:31:58,683 INFO : [+|-r [t]] [-s [p:s]] [-S [t]] [-T [t]] [-u s] [+|-w] [-x [fl]] [--] [names]
2011-12-03 22:31:58,683 INFO : Use the ``-h'' option to get more help information.
2011-12-03 22:31:58,683 INFO : Cleaning up
2011-12-03 22:31:58,684 ERROR : Process (['lsof', '%s/dev']) retur...

Read more...

Revision history for this message
Serge Hallyn (serge-hallyn) wrote :

Gah! Thanks. Unfortunately my snipped was bad. That was supposed to read:

   run_cmd('ps', '-ef');
   run_cmd('lsof', '%s/dev' % self.context.chroot_dir)
   run_cmd('mount')

Revision history for this message
alexis bellido (vz-alexis) wrote :

Ouch. I replaced the code with the new lines and ran vmbuilder twice. "Unfortunately" everything worked fine this way and I couldn't reproduce the problem again (obviosuly I removed the run_cmd('sleep', '10') fix too).

I experienced something similar with another server, at first vmbuilder crashed but after making some other changes (like apt-get installing something else, totally unrelated) vmbuilder was able to complete its job and never saw the problem again in that server.

I'm wondering if this could be related to getting stuck in some hard disk sector but that doesn't make much sense. The other server I'm referring to is a 4-years-old Core 2 Duo laptop and my current server is a brand new Core i5 desktop, so hardware seems totally unrelated.

I'll let you know if I can reproduce the problem again, sorry I can't help more for now.

Revision history for this message
Serge Hallyn (serge-hallyn) wrote : Re: [Bug 879710] Re: ubuntu-vm-builder crashes when trying to umount tmp directory.

Hm, thanks. It's also possible that the race is just so tight that the
1-2 seconds to run lsof and mount allowed it to proceed.

Revision history for this message
Nobuto Murata (nobuto) wrote :

I will attach the debug log vmbuilder with --addpkg ubuntu-desktop and debug lines:
  run_cmd('lsof', '%s/proc' % self.context.chroot_dir)
  run_cmd('lsof', '%s/dev/pts' % self.context.chroot_dir)
  run_cmd('lsof', '%s/dev' % self.context.chroot_dir)

With run_cmd('ps', '-ef') or without --addpkg ubuntu-desktop, I cannot reproduce it.

Revision history for this message
Casey Hilliard (rchilliard) wrote :

I just wanted to bump this issue as I've been experiencing it without fail for the last two days and the posted workaround has not resolved the issue. Currently running 11.10 x64. My command line has been:

sudo vmbuilder kvm ubuntu -v --debug --suite=lucid --flavour=virtual --rootsize=4096 --swapsize=1024 --mem=388 --arch=i386 --user=system --pass=changemenow --components=main,universe,restricted --addpkg=acpid --addpkg=openssh-server -o --libvirt qemu:///system

The error noted before crashing out has consistently been:

2012-03-16 13:59:53,564 ERROR : Process (['umount', '/tmp/tmptaOMHF/dev']) returned 1. stdout: , stderr: umount: /tmp/tmptaOMHF/dev: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))

Traceback (most recent call last):
  File "/usr/bin/vmbuilder", line 24, in <module>
    cli.main()
  File "/usr/lib/python2.7/dist-packages/VMBuilder/contrib/cli.py", line 216, in main
    distro.build_chroot()
  File "/usr/lib/python2.7/dist-packages/VMBuilder/distro.py", line 84, in build_chroot
    self.call_hooks('configure_os')
  File "/usr/lib/python2.7/dist-packages/VMBuilder/distro.py", line 67, in call_hooks
    call_hooks(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/VMBuilder/util.py", line 165, in call_hooks
    getattr(context, func, log_no_such_method)(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/VMBuilder/plugins/ubuntu/distro.py", line 155, in configure_os
    self.suite.unmount_dev()
  File "/usr/lib/python2.7/dist-packages/VMBuilder/plugins/ubuntu/dapper.py", line 125, in unmount_dev
    run_cmd('umount', '%s/dev' % self.context.chroot_dir)
  File "/usr/lib/python2.7/dist-packages/VMBuilder/util.py", line 120, in run_cmd
    raise VMBuilderException, "Process (%s) returned %d. stdout: %s, stderr: %s" % (args.__repr__(), status, mystdout.buf, mystderr.buf)
VMBuilder.exception.VMBuilderException: Process (['umount', '/tmp/tmptaOMHF/dev']) returned 1. stdout: , stderr: umount: /tmp/tmptaOMHF/dev: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))

Prior to the crash, mount indicates indeed, the file (/tmp/tmptaOMHF/dev in this case) is still mounted, the file is also notable in /etc/mtab post-crash.

I've noted that in the lsof results inserted just prior to the mount call, there appear to be three cron processes still accessing /dev/null within the mounted environment (i.e. /tmp/tmptaOMHF/dev/null). I'm attaching a log which contains the results output from the inserted:
"run_cmd('lsof', '%s/dev' % self.context.chroot_dir)" line onward until vmbuilder exits, crashing.

I have successfully built several VMs on this particular machine in the past, using the exact same command, so I was a bit surprised to be running into this issue now, any ideas?

Revision history for this message
Casey Hilliard (rchilliard) wrote :

Missed the attach, should be here.

Revision history for this message
Casey Hilliard (rchilliard) wrote :

Wanted to add a note -- Resolved my own recurring issue by switching from lucid to essentially any other release. Can confirm that both maverick and oneiric work for me, having changed only the release in the call to vmbuilder.
.i.e:

sudo vmbuilder kvm ubuntu -v --debug --suite=maverick --flavour=virtual --rootsize=4096 --swapsize=1024 --mem=388 --arch=i386 --user=system --pass=changemenow --components=main,universe,restricted --addpkg=acpid --addpkg=openssh-server -o --libvirt qemu:///system

works, where

sudo vmbuilder kvm ubuntu -v --debug --suite=lucid --flavour=virtual --rootsize=4096 --swapsize=1024 --mem=388 --arch=i386 --user=system --pass=changemenow --components=main,universe,restricted --addpkg=acpid --addpkg=openssh-server -o --libvirt qemu:///system

fails

Considering that I had the command line string working in early Feb (2012-02-03) and that Lucid . 4 (10.04.4) was released ~ Feb 16, I'd have to guess that changes involved therein are leading to issues with vmbuilder?

Revision history for this message
amcs (isobaric2x) wrote :

I see this every time on one KVM host machine and never on my other KVM
host machine.

Good one is:

   Fully patched 11.10 Server (no desktop) acting as KVM host using
vmbuilder to create guest instances of 11.10 server.

Bad one is:

   Fully patched 11.10 Workstation with Virtualization packages added
executing the exact same command line.

I created a shell script to contain my vmbuilder command line and copied
it to both host machines to be sure that I was performing identical
activities on each.

I did dig through the Python traceback enough to locate the line
throwing the exception.
Adding a simple "sleep" for 5 seconds makes it work every time.
I tried to add the suggested debug statements regarding the
determination of who is holding the mount point busy.
But, for the life of me, I could not figure out where the log file went
to -- where is the output of the debug being written?

I'm willing to investigate more if it's of service to anyone.

On 03/16/2012 02:31 PM, Casey Hilliard wrote:
> Wanted to add a note -- Resolved my own recurring issue by switching from lucid to essentially any other release. Can confirm that both maverick and oneiric work for me, having changed only the release in the call to vmbuilder.
> .i.e:
>
> sudo vmbuilder kvm ubuntu -v --debug --suite=maverick --flavour=virtual
> --rootsize=4096 --swapsize=1024 --mem=388 --arch=i386 --user=system
> --pass=changemenow --components=main,universe,restricted --addpkg=acpid
> --addpkg=openssh-server -o --libvirt qemu:///system
>
> works, where
>
> sudo vmbuilder kvm ubuntu -v --debug --suite=lucid --flavour=virtual
> --rootsize=4096 --swapsize=1024 --mem=388 --arch=i386 --user=system
> --pass=changemenow --components=main,universe,restricted --addpkg=acpid
> --addpkg=openssh-server -o --libvirt qemu:///system
>
> fails
>
> Considering that I had the command line string working in early Feb
> (2012-02-03) and that Lucid . 4 (10.04.4) was released ~ Feb 16, I'd
> have to guess that changes involved therein are leading to issues with
> vmbuilder?
>

Revision history for this message
Beau Croteau (crobe03) wrote :

I added this corresponding line:

   def unmount_dev(self):
        self.context.cancel_cleanup(self.unmount_dev)
        run_cmd('chroot', '%s' % self.context.chroot_dir, '/etc/init.d/cron','stop') // <-- HERE
        run_cmd('umount', '%s/dev' % self.context.chroot_dir)

That seems to be a temporary workaround for the problem I was seeing (same as @rchilliard -- lsof showed a cron open file to /dev/null).

I also had this working up until several days ago without anything else changing and then it suddenly broke.

Revision history for this message
Sylvester Rajasekaran (sylvesterson) wrote :

Hi

Based on some of your comments I tried to find what was the issue with the "umount"
The below command helped me to get the information who is holding the lock so based on Serge Hallyn (serge-hallyn) request the error shows as follows, (lsof didn't help me much here so I used fuser)

-----------------------------------------------------------------------------------------------------------------
# fuser -f -v /tmp/tmp4WlrLItmpfs

                                           USER PID ACCESS COMMAND
/tmp/tmp4WlrLItmpfs: root kernel mount /tmp/tmp4WlrLItmpfs
-----------------------------------------------------------------------------------------------------------------

Based on the above information where I can not kill the PID becuase "kernel" is holding the lock.

So what would be my option here?

Thanks

Revision history for this message
lhotari (lartsa) wrote :

This bug duplicates https://bugs.launchpad.net/vmbuilder/+bug/966439 . The workaround is to add "--removepkg=cron" to the vmbuilder command line.

Revision history for this message
martin suchanek (martin-suc) wrote :

Well, that workaround did not help me.

Revision history for this message
Alex Gottschalk (agottschalk) wrote :

I just wanted to bump this issue because we're now experiencing it in an automated deployment system, and it's been open almost a year and half with no resolution.

Revision history for this message
martin suchanek (martin-suc) wrote :

I have even tried couple of days ago ... /usr/share/pyshared/VMBuilder/util.py
umount_cmd = ["umount", "-l", "-t", "tmpfs", mount_point ]
just tried added 'lazy umount option' - but no luck.

Revision history for this message
Stefan Puiu (stefanpuiuro) wrote :
Download full text (3.8 KiB)

I'm seeing a bug that looks very similar in Ubuntu 14.04 (vmbuilder 0.12.4). In my case, umount is called from disk.py.

2017-02-09 04:15:57,625 INFO : Unmounting target filesystem
2017-02-09 04:15:57,645 INFO : umount: /tmp/tmpGsaKCo: device is busy.
2017-02-09 04:15:57,645 INFO : (In some cases useful info about processes that use
2017-02-09 04:15:57,645 INFO : the device is found by lsof(8) or fuser(1))
2017-02-09 04:15:57,645 INFO : Cleaning up
2017-02-09 04:16:00,721 INFO : device-mapper: remove ioctl on loop0p1 failed: Device or resource busy
2017-02-09 04:16:03,732 INFO : device-mapper: remove ioctl on loop0p1 failed: Device or resource busy
2017-02-09 04:16:06,741 INFO : device-mapper: remove ioctl on loop0p1 failed: Device or resource busy
2017-02-09 04:16:09,745 INFO : Could not unmap '/tmp/tmpTb5h7u' after '3' attempts. Final attempt
2017-02-09 04:16:09,798 ERROR : Process (['umount', '/tmp/tmpGsaKCo/']) returned 1. stdout: , stderr: umount: /tmp/tmpGsaKCo: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))

Traceback (most recent call last):
  File "/usr/bin/vmbuilder", line 24, in <module>
    cli.main()
  File "/usr/lib/python2.7/dist-packages/VMBuilder/contrib/cli.py", line 228, in main
    hypervisor.install_os()
  File "/usr/lib/python2.7/dist-packages/VMBuilder/hypervisor.py", line 72, in install_os
    self.call_hooks('unmount_partitions')
  File "/usr/lib/python2.7/dist-packages/VMBuilder/distro.py", line 67, in call_hooks
    call_hooks(self, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/VMBuilder/util.py", line 170, in call_hooks
    getattr(context, func)(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/VMBuilder/hypervisor.py", line 103, in unmount_partitions
    fs.umount()
  File "/usr/lib/python2.7/dist-packages/VMBuilder/disk.py", line 409, in umount
    run_cmd('umount', self.mntpath)
  File "/usr/lib/python2.7/dist-packages/VMBuilder/util.py", line 120, in run_cmd
    raise VMBuilderException, "Process (%s) returned %d. stdout: %s, stderr: %s" % (args.__repr__(), status, mystdout.buf, mystderr.buf)
VMBuilder.exception.VMBuilderException: Process (['umount', '/tmp/tmpGsaKCo/']) returned 1. stdout: , stderr: umount: /tmp/tmpGsaKCo: device is busy.
        (In some cases useful info about processes that use
         the device is found by lsof(8) or fuser(1))

Even after this run got stuck, I noticed there were users of the dir still left:

$ sudo lsof +D /tmp/tmpGsaKCo/
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/112/gvfs
      Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
dirmngr 5223 usbmux cwd DIR 7,1 4096 2 /tmp/tmpGsaKCo
dirmngr 5223 usbmux rtd DIR 7,1 4096 2 /tmp/tmpGsaKCo
dirmngr 5223 usbmux txt REG 7,1 227608 658461 /tmp/tmpGsaKCo/usr/bin/dirmngr
dirmngr 5223 usbmux mem REG 7,1 1840928 527857 /tmp/tmpGsaKCo/lib/x86_64-linux-gnu/libc-2.19.so
dirmngr 5223 usbmux mem REG 7,1 76408 790007 /tmp/tmpGsaKCo/usr/lib/x86_64-linux-gnu/li...

Read more...

Revision history for this message
Stefan Puiu (stefanpuiuro) wrote :

This happens while building a VMWare image of trusty (14.04) (vmbuilder vmw6 ubuntu --suite=trusty). This happened repeatedly in the last 2 days.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers