Bug #1680997 “Container file system corruption on libvirtd resta...” : Bugs : libvirt package : Ubuntu

Eugen Rieck (w-eugen) on 2017-04-08

affects:

udev (Ubuntu) → libvirt (Ubuntu)

Joshua Powers (powersj) on 2017-04-10

Changed in libvirt (Ubuntu):
status:	New → Triaged
importance:	Undecided → High

Serge Hallyn (serge-hallyn) wrote on 2017-04-10:

#1

Hi,

This would be deemed a high-priority bug for upstream libvirt, but Ubuntu has always, back to 2010, supported lxc and then lxd rather than libvirt-lxc. (So it's not that libvirt-lxc is deprecated; rather, it was never supported in Ubuntu.)

Which version of Ubuntu are you using?

Can you reliably reproduce it? If you can give a recipe along the lines of "start with a clean Ubuntu cloud VM image; set up a container like this; do that; then it dies", then we may be able to nail down the cause and/or talk to upstream.

If it is at all possible for you to migrate to lxd containers, that would be ideal, but I assume you have control software written around libvirt's API, making that untenable.

Eugen Rieck (w-eugen) wrote on 2017-04-10:

#2

The steps outlined in the initial bug report reliably (100%) reproduce the problem for me on Ubuntu 16.04. I have tested this in different environments (1x AMD, ca. 10x Intel).
Here's the short way to get there:

- Install a basic Ubuntu 16.04 Server
- apt-get install virt-manager (installing the GUI pulls in the heavy lifting components)
- Create a libvirt/lxc container with a definition along these lines:
<domain type='lxc'>
  <name>AnyName</name>
  <memory unit='KiB'>2097152</memory>
  <currentMemory unit='KiB'>2097152</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64'>exe</type>
    <init>/sbin/init</init>
  </os>
  <features>
    <privnet/>
  </features>
  <clock offset='utc'/>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <devices>
    <filesystem type='file' accessmode='passthrough'>
      <driver type='loop' format='raw'/>
      <source file='/path/to/image.raw'/>
      <target dir='/'/>
    </filesystem>
    <interface type='bridge'>
      <mac address='00:16:3e:34:ea:4b'/>
      <source bridge='br1'/>
      <target dev='vnet2'/>
      <guest dev='eth0'/>
    </interface>
    <console type='pty' tty='/dev/pts/3'>
      <source path='/dev/pts/3'/>
      <target type='lxc' port='0'/>
      <alias name='console0'/>
    </console>
    <hostdev mode='capabilities' type='misc'>
      <source>
        <char>/dev/net/tun</char>
      </source>
    </hostdev>
  </devices>
</domain>

(I have experimented quite a lot, and it boils down to the loop-mounted file system)
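
Since the trigger appears to be the loop-mounted file system, it may help to list which container definitions use one. The following is a minimal sketch, not part of the original report: it assumes the domain XML has been obtained with something like `virsh -c lxc:/// dumpxml AnyName`, and the helper name `loop_backed_images` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List


def loop_backed_images(domain_xml: str) -> List[str]:
    """Return the source files of <filesystem> devices that use the loop driver.

    Hypothetical helper: it only inspects the XML text it is given and
    does not talk to libvirt itself.
    """
    root = ET.fromstring(domain_xml)
    images = []
    for fs in root.findall("./devices/filesystem"):
        driver = fs.find("driver")
        source = fs.find("source")
        if driver is None or source is None:
            continue
        # Only loop-driver file systems are of interest here.
        if driver.get("type") == "loop" and source.get("file"):
            images.append(source.get("file"))
    return images
```

For the definition above, this should return a list containing only `/path/to/image.raw`; file systems without a loop driver are skipped.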

- Start the container via virsh or virt-manager
- Restart libvirtd
- Examine state of the container in virsh or virt-manager vs. the state of the loop device via losetup

The important parts are:
- The container is shown as stopped
- The container doesn't reply to network requests or console connection requests (i.e. it seems truly dead)
- The loop device doesn't show up in host-side "mount | grep loop"

- libvirtd allows the container to be (re-)started, ending up with a double-mounted file system
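
The comparison between the container state and the loop device state could be scripted roughly as follows. This is only a sketch under assumptions: it treats "domain not reported as running, but a loop device is still attached to the image" as the suspicious state, it assumes `losetup -j <file>` prints nothing when no loop device is associated with the file, and the function names `looks_stale` and `check` are hypothetical. `losetup` will typically need root.

```python
import subprocess


def run(cmd):
    """Run a command and return its stdout as text (raises on a non-zero exit)."""
    result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout


def looks_stale(domstate_output, losetup_output):
    """True if the domain is not running but a loop device is still attached.

    domstate_output: text printed by `virsh domstate <domain>`
    losetup_output:  text printed by `losetup -j <image>`
    """
    running = domstate_output.strip() == "running"
    attached = bool(losetup_output.strip())
    return attached and not running


def check(domain, image, uri="lxc:///"):
    """Query virsh and losetup for one container (assumed URI: lxc:///)."""
    state = run(["virsh", "-c", uri, "domstate", domain])
    loops = run(["losetup", "-j", image])
    return looks_stale(state, loops)
```

Running `check("AnyName", "/path/to/image.raw")` before (re-)starting a container would flag the case where a second start is likely to mount the image twice.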

Migrating to lxd is not feasible in many environments. In addition, I am fully aware (and not criticizing!) that libvirt-lxc was/is unsupported. For me the real bug is that this scenario is possible at all: if Ubuntu were to simply exclude libvirt's lxc driver, that would not be ideal, but it would at least be fool-proof.

The blocker to lxd adoption is not on the admin side (me), but on the end-user side: virt-manager is the favorite toy for SMB/NGO local admins, typically run via XQuartz on a Mac or Xming on Windows.

Please let me know if and when I can be of further help. I am willing to test and have quite a few testbeds at hand where I can easily create throw-away containers and ruin them. Since I tripped over this, I have migrated containers around so that every single customer has one node running no containers, just to be able to do exactly that.