Snapd fails to upgrade package that uses layouts to bind mount parent directory of another mount point

Bug #1831010 reported by James Henstridge on 2019-05-30
Affects: snapd
Status: Fix Committed
Importance: High
Assigned to: Zygmunt Krynicki
Milestone: 2.40

Bug Description

For a snap I'm working on I wanted to mount some data shared via the content interface to a location outside of $SNAP or $SNAP_COMMON, so I made use of layouts to bind mount it elsewhere. This worked as I expected, but I was unable to upgrade the snap due to failures in updating the mount namespace.

Attached is a small snapcraft.yaml that reproduces the problem.

Running the following commands triggers the bug for me using snapd 2.39. The layout-content-test command simply runs a bash shell:

    $ snap install --dangerous layout-content-test_0.1_amd64.snap
    layout-content-test 0.1 installed
    $ layout-content-test
    To run a command as administrator (user "root"), use "sudo <command>".
    See "man sudo_root" for details.

    $ ls $SNAP/content-mount
    Adwaita Arc Breeze HighContrast elementary
    Adwaita-dark Arc-Dark Breeze-Dark Radiance
    Ambiance Arc-Darker Communitheme Yaru
    $ ls $SNAP/layout-mount
    Adwaita Arc Breeze HighContrast elementary
    Adwaita-dark Arc-Dark Breeze-Dark Radiance
    Ambiance Arc-Darker Communitheme Yaru
    $ exit
    $ snap install --dangerous layout-content-test_0.1_amd64.snap
    error: cannot perform the following tasks:
    - Setup snap "layout-content-test" (unset) security profiles (cannot setup mount for snap "layout-content-test": cannot update mount namespace of snap "layout-content-test": cannot update preserved namespace of snap "layout-content-test": cannot update snap namespace: device or resource busy)
    - Setup snap "layout-content-test" (unset) security profiles (cannot update mount namespace of snap "layout-content-test": cannot update preserved namespace of snap "layout-content-test": cannot update snap namespace: device or resource busy)

James Henstridge (jamesh) wrote :

Here is a simplified version of the snapcraft.yaml that doesn't involve the fancy multi-mounts of gtk-common-themes.

Instead, I'm doing a content interface mount to $SNAP/content/mount, then using layouts to bind mount $SNAP/content (the parent dir of the mount) to $SNAP/layout.

When installing the snap, it generates the following /var/lib/snapd/mount/snap.layout-content-test.fstab:

    /snap/layout-content-test/x1/content /snap/layout-content-test/x1/layout none rbind,rw,x-snapd.origin=layout 0 0
    /snap/gnome-3-28-1804/40 /snap/layout-content-test/x1/content/mount none bind,ro 0 0

When I run the snap's app, /run/snapd/ns/snap.layout-content-test.fstab has the following content:

    /snap/gnome-3-28-1804/40 /snap/layout-content-test/x1/content/mount none bind,ro 0 0
    /snap/layout-content-test/x1/content /snap/layout-content-test/x1/layout none rbind,rw,x-snapd.origin=layout 0 0

... and the following in the tail of its mountinfo:

    7793 7697 7:48 / /snap/layout-content-test/x1 ro,nodev,relatime master:1776 - squashfs /dev/loop48 ro
    7794 7793 7:97 / /snap/layout-content-test/x1/content/mount ro,relatime master:1967 - squashfs /dev/loop97 ro
    7795 7793 7:48 /content /snap/layout-content-test/x1/layout ro,nodev,relatime master:1776 - squashfs /dev/loop48 ro
    7796 7795 7:97 / /snap/layout-content-test/x1/layout/mount ro,relatime master:1967 - squashfs /dev/loop97 ro

When I try to install the same snap over the top, I get the error mentioned in the original post.
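The mountinfo excerpt above can be read as a tree: the first field of each line is the mount ID and the second is the parent ID. Here is a rough Python sketch that parses the four quoted lines and groups them by parent. The names MountEntry and parse_mountinfo_line are made up for illustration and are not part of snapd.

```python
from collections import defaultdict
from typing import NamedTuple


class MountEntry(NamedTuple):
    mount_id: int
    parent_id: int
    root: str         # path inside the source filesystem that is mounted
    mount_point: str  # where it appears in this mount namespace


def parse_mountinfo_line(line: str) -> MountEntry:
    # Leading fields per proc(5): mount ID, parent ID, major:minor,
    # root, mount point. Spaces in paths are escaped, so split() is safe.
    fields = line.split()
    return MountEntry(int(fields[0]), int(fields[1]), fields[3], fields[4])


# The four lines quoted in the comment above.
SAMPLE = """\
7793 7697 7:48 / /snap/layout-content-test/x1 ro,nodev,relatime master:1776 - squashfs /dev/loop48 ro
7794 7793 7:97 / /snap/layout-content-test/x1/content/mount ro,relatime master:1967 - squashfs /dev/loop97 ro
7795 7793 7:48 /content /snap/layout-content-test/x1/layout ro,nodev,relatime master:1776 - squashfs /dev/loop48 ro
7796 7795 7:97 / /snap/layout-content-test/x1/layout/mount ro,relatime master:1967 - squashfs /dev/loop97 ro
"""

entries = [parse_mountinfo_line(line) for line in SAMPLE.splitlines()]

# Group entries by the mount they sit on top of.
children = defaultdict(list)
for entry in entries:
    children[entry.parent_id].append(entry)

for entry in entries:
    kids = [child.mount_point for child in children[entry.mount_id]]
    print(entry.mount_id, entry.mount_point, "children:", kids)
```

Read this way, the layout mount (7795) has a child of its own (7796): the copy of the content mount that rbind carried along to layout/mount.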

James Henstridge (jamesh) wrote :

Here's an even simpler version that only involves layout mounts, rather than external snaps connected through the content interface.

summary: - Snapd fails to upgrade package that uses layouts to remap a content plug
- mount
+ Snapd fails to upgrade package that uses layouts to bind mount parent
+ directory of another mount point
Zygmunt Krynicki (zyga) on 2019-05-30
Changed in snapd:
assignee: nobody → Zygmunt Krynicki (zyga)
Zygmunt Krynicki (zyga) wrote :

Hey James.

I used your pastebin to craft this small exploration script:

    #!/bin/bash
    set -x
    case "${1:-}" in
      '')
        rm -rf /tmp/bug/
        unshare -m "$0" slave
        ;;
      slave)
        findmnt --poll -o+PROPAGATION --noheadings &
        pid=$!

        mkdir /tmp/bug
        cd /tmp/bug
        mkdir A
        touch A/file
        mkdir B
        mkdir B/dir
        mkdir C

        mount --bind A B/dir
        sleep 1
        mount --make-slave B/dir
        mount --make-shared B/dir
        sleep 1
        mount --rbind B C
        sleep 1
        umount --detach C

        kill $pid
        wait $pid
        bash
        ;;
    esac

I'm trying to wrap my head around this now.

Zygmunt Krynicki (zyga) wrote :

With that script's interactive shell one can inspect the system. There are three mount points at play:

    1011 914 8:2 /tmp/bug/A /tmp/bug/B/dir rw,relatime shared:509 - ext4 /dev/sda2 rw
    1012 914 8:2 /tmp/bug/B /tmp/bug/C rw,relatime - ext4 /dev/sda2 rw
    1013 1012 8:2 /tmp/bug/A /tmp/bug/C/dir rw,relatime shared:509 - ext4 /dev/sda2 rw

Attempts to detach C fail. I wonder if this is because C has a view of B, which in turn has a view of A.

If we first detach B/dir then we can successfully detach C. I'm trying to understand what happens in the kernel to give us the EBUSY return code now.

Zygmunt Krynicki (zyga) wrote :

We can construct a tree of mount points, based on their mount-id and parent-id numbers, and recursively detach the leaves until we reach the mount point that we wanted to unmount.

In pseudo python code:

    def detach(path, mount_id=None):
        # Get a fresh mountinfo table.
        # Due to mount event propagation, using a cached copy
        # is probably impossible without correctly re-implementing
        # kernel propagation logic.
        mi = mountinfo()
        # If we don't know the mount_id of the path we want to detach
        # we must scan the mount table in reverse, since the last entry
        # is the most recent one.
        if mount_id is None:
            for mie in reversed(mi.entries):
                if mie.mount_point == path:
                    mount_id = mie.mount_id
                    break
            else:
                raise Exception(f"{path} is not a mount point")
        # Construct a tree where each mount info entry knows about
        # its children, based on the parent-id, mount-id association.
        tree = treeify(mi)
        # Find the node we want to detach.
        node = tree.find(mount_id=mount_id)
        assert node.mount_point == path
        # Detach each child; those will only be the mount entries
        # that are immediately underneath this mount point.
        for child in node.children():
            detach(child.mount_point, child.mount_id)
        umount2(node.mount_point, MNT_DETACH)
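For anyone who wants to experiment with the idea, here is a rough runnable Python sketch of the ordering part only. It is an illustration under assumptions, not what snapd does: read_mountinfo and detach_order are hypothetical helper names, the order is computed from a single snapshot rather than re-reading the table on every step as the pseudocode suggests, and nothing is actually unmounted.

```python
from typing import List, Optional, Tuple

# (mount_id, parent_id, mount_point) triples, as found in mountinfo.
Entry = Tuple[int, int, str]


def read_mountinfo(path: str = "/proc/self/mountinfo") -> List[Entry]:
    """Read mount ID, parent ID and mount point for every mount (Linux only)."""
    entries = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            entries.append((int(fields[0]), int(fields[1]), fields[4]))
    return entries


def detach_order(entries: List[Entry], path: str,
                 mount_id: Optional[int] = None) -> List[str]:
    """Return mount points in the order they would be detached, leaves first."""
    if mount_id is None:
        # Scan in reverse: the last matching entry is the most recent mount.
        for mid, _, mount_point in reversed(entries):
            if mount_point == path:
                mount_id = mid
                break
        else:
            raise ValueError(f"{path} is not a mount point")
    order: List[str] = []
    for mid, parent, mount_point in entries:
        # Skip an entry that lists itself as parent (possible for the root).
        if parent == mount_id and mid != mount_id:
            order.extend(detach_order(entries, mount_point, mid))
    order.append(path)
    return order


# The three mounts seen in the exploration script's namespace.
entries = [
    (1011, 914, "/tmp/bug/B/dir"),
    (1012, 914, "/tmp/bug/C"),
    (1013, 1012, "/tmp/bug/C/dir"),
]
print(detach_order(entries, "/tmp/bug/C"))
```

For the three mounts from the exploration script this prints ['/tmp/bug/C/dir', '/tmp/bug/C'], so the child mount under C comes before C itself. On a live system the list would come from read_mountinfo() instead of the hard-coded entries.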

Zygmunt Krynicki (zyga) wrote :

Uh, I just realised I passed --detach to umount(8), which happily IGNORES IT WITHOUT ERROR. The user space tool calls this option --lazy; the system call calls it MNT_DETACH. Oh well, back to analysis.

Zygmunt Krynicki (zyga) wrote :

So, MNT_DETACH works as intended but ... we're not using it.
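For reference, the system call can be exercised directly, without going through umount(8). This is a minimal Python sketch assuming Linux, where umount2 is exported by the C library and MNT_DETACH has the value 2. The lazy_umount name is made up for illustration, and this is not the snapd change itself.

```python
import ctypes
import os

# Value of MNT_DETACH from <sys/mount.h> on Linux (assumed platform).
MNT_DETACH = 2


def lazy_umount(target: str) -> None:
    """Call umount2(target, MNT_DETACH) and raise OSError on failure."""
    # CDLL(None) resolves symbols from the already loaded C library.
    libc = ctypes.CDLL(None, use_errno=True)
    libc.umount2.argtypes = [ctypes.c_char_p, ctypes.c_int]
    libc.umount2.restype = ctypes.c_int
    if libc.umount2(os.fsencode(target), MNT_DETACH) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)
```

Called as root on a real mount point, for example lazy_umount("/tmp/bug/C") inside the exploration script's namespace, this is the equivalent of umount --lazy. It raises OSError when the caller lacks privileges or the path is not a mount point.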

I've applied a simple fix to snapd and confirmed I can refresh the test snap. I will propose a PR shortly.

Changed in snapd:
status: New → In Progress
importance: Undecided → High
Zygmunt Krynicki (zyga) wrote :

This is addressed by the following pull request: https://github.com/snapcore/snapd/pull/6937

Zygmunt Krynicki (zyga) on 2019-05-31
Changed in snapd:
status: In Progress → Fix Committed
milestone: none → 2.40