Snapd `cannot update snap namespace` when connecting / disconnecting interfaces

Bug #1871189 reported by Joseph Borg on 2020-04-06
This bug affects 1 person
Affects: snapd
Importance: High
Assigned to: Zygmunt Krynicki

Bug Description

When trying to connect interfaces:

```
error: cannot perform the following tasks:
- Connect microk8s:docker-privileged to snapd:docker-support (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:docker-support to snapd:docker-support (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:kubernetes-support to snapd:kubernetes-support (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:k8s-kubelet to snapd:kubernetes-support (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:k8s-kubeproxy to snapd:kubernetes-support (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:dot-kube to snapd:personal-files (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:network-control to snapd:network-control (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:network-observe to snapd:network-observe (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:firewall-control to snapd:firewall-control (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:process-control to snapd:process-control (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:kernel-module-observe to snapd:kernel-module-observe (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:kernel-module-control to snapd:kernel-module-control (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:mount-observe to snapd:mount-observe (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:hardware-observe to snapd:hardware-observe (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:system-observe to snapd:system-observe (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:k8s-journald to snapd:kubernetes-support (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
```

Then, when trying to remove the snap / disconnect interfaces:

```
- Disconnect microk8s:network from snapd:network (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: device or resource busy)
```

Changed in snapd:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Ian Johnson (anonymouse67)
Ian Johnson (anonymouse67) wrote :

I tried to make a minimal reproducer for this snap. Unfortunately, the minimal reproducer I have starts working again when I enable robust-mount-namespace-updates. With the original snap, however, I can still reproduce this even with robust mount namespace updates on.

Zygmunt, I'm assigning this to you, and I can provide you with the full snap tomorrow. There's probably a smaller reproducer, but I wasn't able to build one. I think part of it might be that there are services in the snap that are actually using some files that are part of the layout.

Changed in snapd:
assignee: Ian Johnson (anonymouse67) → Zygmunt Krynicki (zyga)
Changed in snapd:
importance: Medium → High
Zygmunt Krynicki (zyga) wrote :

Can you please provide information on how to reproduce this?

I looked at various channels but I was unable to find a version with the interfaces mentioned here.

Changed in snapd:
status: Triaged → Incomplete
Joseph Borg (joeborg) wrote :

Hey Zyga, sure.

1) Download this snap: https://github.com/ubuntu/microk8s/actions/runs/69256416
    For reference the snapcraft.yaml is: https://github.com/ubuntu/microk8s/blob/feature/jdb/strict/snap/snapcraft.yaml

2) Bring snapd, core, core18 to edge.
    sudo snap install snapd --edge
    sudo snap refresh core --edge
    sudo snap install core18 --edge

3) Install the snap
    sudo snap install ./microk8s.snap --dangerous

4) Connect the interfaces (this fails):
    for i in docker-privileged docker-support kubernetes-support k8s-kubelet k8s-kubeproxy dot-kube network network-bind network-control network-observe firewall-control process-control kernel-module-observe kernel-module-control mount-observe hardware-observe system-observe home opengl k8s-journald; do sudo snap connect microk8s:$i; done

5) Try to remove the snap (this fails):
    sudo snap remove microk8s

6) Ask for Markdown in launchpad :)

Let me know if I can help.
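
Step 4 can be wrapped in a small function that collects the plugs whose connection failed, which makes the failing set easier to compare between runs. This is only a sketch: `connect_all` and the `SNAP_CMD` override are hypothetical names introduced here for illustration, not part of snapd.

```shell
# Hypothetical helper: try to connect every listed plug of a snap and
# print the plugs that failed on one line (empty if all succeeded).
# SNAP_CMD is an assumption made for dry runs; it defaults to "sudo snap".
connect_all() {
    snap_name=$1
    shift
    failed=""
    for plug in "$@"; do
        # Word splitting of SNAP_CMD is intentional ("sudo snap" is two words).
        if ! ${SNAP_CMD:-sudo snap} connect "$snap_name:$plug" >/dev/null; then
            failed="$failed $plug"
        fi
    done
    echo "${failed# }"
}
```

For example, `connect_all microk8s docker-privileged docker-support kubernetes-support` prints the plugs that could not be connected, while snapd's own error text still goes to stderr.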

Zygmunt Krynicki (zyga) wrote :

Thank you for the details. I will debug this tomorrow.

I can relate to 6. Can we please somehow get Markdown? :-)

Zygmunt Krynicki (zyga) on 2020-04-11
Changed in snapd:
status: Incomplete → In Progress
Zygmunt Krynicki (zyga) wrote :

I've stopped all the services, discarded the mount namespace, connected all of the interfaces and managed to start a shell successfully.

The log of that is attached. I haven't investigated the details yet (there are *plenty* of layouts in this snap).

When I connect interfaces sequentially things indeed break. Looking at details.

Zygmunt Krynicki (zyga) wrote :

Some small advice unrelated to the bug (I hope). In the log I attached above you can see where snapd creates a "writable mimic": it prints "create-writable-mimic" followed by a path. This entry is interesting:

utils.go:456: DEBUG: create-writable-mimic "/snap/microk8s/x1/var/lib"

The snap needs a writable mimic in $SNAP/var/lib/snapd/lib/gl - you can avoid that by putting a layout entry that mounts a tmpfs there explicitly.

layout:
  $SNAP/var/lib/snapd/lib/gl:
    type: tmpfs

In addition, ship $SNAP/var/lib/snapd/lib/gl as an empty mount point in your snap; this will save a lot of redundant operations.
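
One way to ship that empty directory is to create it while the snap is built. A minimal sketch, assuming it runs inside a snapcraft part's build step where `SNAPCRAFT_PART_INSTALL` points at the part's install directory; the function name `make_gl_mount_point` is hypothetical.

```shell
# Hypothetical build-step helper: create the empty directory that will
# serve as the mount point at $SNAP/var/lib/snapd/lib/gl in the built snap.
# Assumes SNAPCRAFT_PART_INSTALL is set by snapcraft; fails loudly if not.
make_gl_mount_point() {
    mkdir -p "${SNAPCRAFT_PART_INSTALL:?must be set}/var/lib/snapd/lib/gl"
}
```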

Zygmunt Krynicki (zyga) wrote :

Looking at the actual bug I think we're tripping over:

2020/04/11 20:32:23.099745 change.go:353: DEBUG: mount --make-rprivate "/var/log/pods" (error: no such file or directory)

I'm looking at *why* this happens now.
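
Lines like this one can be pulled out of a long debug log by keeping only the operations whose reported error is not `<nil>`. A minimal sketch, assuming the `(error: ...)` suffix format shown in this report; `failed_ops` is a hypothetical helper name.

```shell
# Hypothetical filter: read a debug log on stdin and keep only the
# operations whose reported error is something other than <nil>.
failed_ops() {
    grep -F '(error: ' | grep -vF '(error: <nil>)'
}
```

Usage would be along the lines of `failed_ops < update-ns.log`, where the file name is a placeholder for wherever the log was saved.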

Zygmunt Krynicki (zyga) wrote :

I've traced it to an interesting observation: a repeated "update" of the mount namespace, with no changes at all, fails when robust mount namespace updates are enabled.

With this symlink layout item removed I can no longer reproduce the problem.

none /usr/libexec none x-snapd.kind=symlink,x-snapd.symlink=/var/snap/microk8s/common/usr/libexec,x-snapd.origin=layout 0 0

I suspect that the required mimic at /usr is somehow affecting the rest, investigating.

Zygmunt Krynicki (zyga) wrote :

It is the interplay of the two mimics:

/usr/lib/x86_64-linux-gnu for various nvidia bind mounts
/usr for the /usr/libexec symlink

We first create /usr/lib/x86_64-linux-gnu and then /usr. The fact that we have both is now causing problems. I'm looking for a smaller reproducer to analyze the algorithm we employ.
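
The two mimic directories named above are nested, one inside the other. A minimal sketch for spotting such pairs in a list of directories; `nested_dirs` is a hypothetical helper, and the list is assumed to have been collected by hand from the "create-writable-mimic" log lines.

```shell
# Hypothetical helper: read one absolute directory per line and print
# "outer inner" for every pair where inner lies strictly below outer.
nested_dirs() {
    awk '{ d[NR] = $0 }
    END {
        for (i = 1; i <= NR; i++)
            for (j = 1; j <= NR; j++)
                # d[j] is below d[i] when it starts with d[i] plus a slash.
                if (i != j && index(d[j], d[i] "/") == 1)
                    print d[i], d[j]
    }'
}
```

Feeding it the two directories from this comment reports `/usr` as the outer directory of `/usr/lib/x86_64-linux-gnu`.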

Zygmunt Krynicki (zyga) wrote :

The minimal reproducer is this mount profile:

/snap/microk8s/x1/var/lib/snapd/lib/gl/libEGL_nvidia.so.0 /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 none bind,rw,x-snapd.kind=file,x-snapd.origin=layout 0 0
none /usr/libexec none x-snapd.kind=symlink,x-snapd.symlink=/var/snap/microk8s/common/usr/libexec,x-snapd.origin=layout 0 0

I've attached a log of two consecutive executions of snap-update-ns with that mount profile. The first one passes, the second one fails.
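
Each line of the mount profile follows the fstab layout (source, target, type, options, two numeric fields), and the `x-snapd.*` options carry the layout metadata. A minimal sketch that prints the `x-snapd.kind` and target of each entry; `entry_kinds` is a hypothetical helper, and an entry without an `x-snapd.kind` option is printed with `-`.

```shell
# Hypothetical helper: read mount profile lines on stdin and print
# "<kind> <target>" per entry, where kind is the x-snapd.kind option.
entry_kinds() {
    awk '{
        kind = "-"
        n = split($4, opts, ",")          # field 4 holds the mount options
        for (i = 1; i <= n; i++) {
            if (index(opts[i], "x-snapd.kind=") == 1) {
                kind = substr(opts[i], length("x-snapd.kind=") + 1)
            }
        }
        print kind, $2                    # field 2 is the mount target
    }'
}
```

Run against the two-line profile above, this shows one `file` entry and one `symlink` entry, which is the combination that matters in the analysis that follows.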

Zygmunt Krynicki (zyga) wrote :

This really shows what's wrong. From the part of the log where we explain the planned changes:

unmount (none /usr/libexec none x-snapd.kind=symlink,x-snapd.symlink=/var/snap/microk8s/common/usr/libexec,x-snapd.origin=layout 0 0)

This really means: remove the symlink at /usr/libexec

unmount (/snap/microk8s/x1/var/lib/snapd/lib/gl/libEGL_nvidia.so.0 /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 none bind,rw,x-snapd.kind=file,x-snapd.origin

This means: umount the bind-mount at /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 and unlink the placeholder file we created.

What really happens:

remove "/usr/libexec" (error: <nil>)

This is as I explained above.

umount "/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0" UMOUNT_NOFOLLOW (error: <nil>)

This is also as I explained above. Then:

remove "/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0" (error: remove /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0: device or resource busy)

This fails: we also have a bigger writable mimic for *all of* /usr! In other words, /usr is still a mount point.

This is very surprising because we have robust-mount-namespace-updates enabled, and they were implemented to handle *exactly* this situation, so what gives?

Well, this is a *file* bind mount, and that case is not accounted for in the code. https://github.com/snapcore/snapd/blob/master/cmd/snap-update-ns/change.go#L450 lacks a check for kind=="file".

Zygmunt Krynicki (zyga) wrote :

This pull request fixes the problem: https://github.com/snapcore/snapd/pull/8481

I will send additional patches with regression and unit tests on Tuesday.

Ian Johnson (anonymouse67) wrote :

Thanks for the investigative work Zygmunt!
