Snapd `cannot update snap namespace` when connecting / disconnecting interfaces

Bug #1871189 reported by Joseph Borg
This bug affects 1 person
Affects: snapd
Status: Fix Released
Importance: High
Assigned to: Zygmunt Krynicki
Milestone: 2.45

Bug Description

When trying to connect interfaces:

```
error: cannot perform the following tasks:
- Connect microk8s:docker-privileged to snapd:docker-support (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:docker-support to snapd:docker-support (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:kubernetes-support to snapd:kubernetes-support (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:k8s-kubelet to snapd:kubernetes-support (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:k8s-kubeproxy to snapd:kubernetes-support (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:dot-kube to snapd:personal-files (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:network-control to snapd:network-control (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:network-observe to snapd:network-observe (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:firewall-control to snapd:firewall-control (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:process-control to snapd:process-control (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:kernel-module-observe to snapd:kernel-module-observe (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:kernel-module-control to snapd:kernel-module-control (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:mount-observe to snapd:mount-observe (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:hardware-observe to snapd:hardware-observe (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:system-observe to snapd:system-observe (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
error: cannot perform the following tasks:
- Connect microk8s:k8s-journald to snapd:kubernetes-support (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: invalid argument)
```

Then, when trying to remove the snap / disconnect interfaces:

```
- Disconnect microk8s:network from snapd:network (cannot update mount namespace of snap "microk8s": cannot update preserved namespace of snap "microk8s": cannot update snap namespace: device or resource busy)
```

Changed in snapd:
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → Ian Johnson (anonymouse67)
Ian Johnson (anonymouse67) wrote :

I tried to build a minimal reproducer for this snap, but unfortunately the one I have starts working again once I enable robust-mount-namespace-updates. Even with robust mount namespace updates on, however, I can still reproduce the problem with the original snap.

Zygmunt, I'm assigning this to you, and I can provide you with the full snap tomorrow. There is probably a smaller reproducer, but I wasn't able to build one. I think part of it might be that services in the snap are actually using some files that are part of the layout.

Changed in snapd:
assignee: Ian Johnson (anonymouse67) → Zygmunt Krynicki (zyga)
Changed in snapd:
importance: Medium → High
Zygmunt Krynicki (zyga) wrote :

Can you please provide information on how to reproduce this?

I looked at various channels but I was unable to find a version with the interfaces mentioned here.

Changed in snapd:
status: Triaged → Incomplete
Joseph Borg (joeborg) wrote :

Hey Zyga, sure.

1) Download this snap: https://github.com/ubuntu/microk8s/actions/runs/69256416
    For reference the snapcraft.yaml is: https://github.com/ubuntu/microk8s/blob/feature/jdb/strict/snap/snapcraft.yaml

2) Bring snapd, core, core18 to edge.
    sudo snap install snapd --edge
    sudo snap refresh core --edge
    sudo snap install core18 --edge

3) Install the snap
    sudo snap install ./microk8s.snap --dangerous

4) Connect the interfaces (this fails):
    for i in docker-privileged docker-support kubernetes-support k8s-kubelet k8s-kubeproxy dot-kube network network-bind network-control network-observe firewall-control process-control kernel-module-observe kernel-module-control mount-observe hardware-observe system-observe home opengl k8s-journald; do sudo snap connect microk8s:$i; done

5) Try to remove the snap (this fails):
    sudo snap remove microk8s

6) Ask for Markdown in launchpad :)

Let me know if I can help.

Zygmunt Krynicki (zyga) wrote :

Thank you for the details. I will debug this tomorrow.

I can relate to 6. Can we please somehow get Markdown? :-)

Zygmunt Krynicki (zyga)
Changed in snapd:
status: Incomplete → In Progress
Zygmunt Krynicki (zyga) wrote :

I've stopped all the services, discarded the mount namespace, connected all of the interfaces and managed to start a shell successfully.

The log of that is attached. I haven't investigated the details yet (there are *plenty* of layouts in this snap).

When I connect interfaces sequentially things indeed break. Looking at details.

Zygmunt Krynicki (zyga) wrote :

Some small advice unrelated to the bug (I hope). In the log I attached above, you can see where snapd creates a writable mimic: it prints "create-writable-mimic" followed by a path. This entry is interesting:

utils.go:456: DEBUG: create-writable-mimic "/snap/microk8s/x1/var/lib"

The snap needs a writable mimic in $SNAP/var/lib/snapd/lib/gl; you can avoid that by adding a layout entry that explicitly mounts a tmpfs there.

layout:
  $SNAP/var/lib/snapd/lib/gl:
    type: tmpfs

In addition, ship $SNAP/var/lib/snapd/lib/gl as an empty mount point in your snap; this will save a lot of redundant operations.

Zygmunt Krynicki (zyga) wrote :

Looking at the actual bug I think we're tripping over:

2020/04/11 20:32:23.099745 change.go:353: DEBUG: mount --make-rprivate "/var/log/pods" (error: no such file or directory)

I'm looking at *why* this happens now.

Zygmunt Krynicki (zyga) wrote :

I've traced it to an interesting observation:

A repeated "update" of the mount namespace, with no changes at all and with robust mount namespace updates enabled, causes a failure.

With the following symlink layout item removed, I can no longer reproduce the problem:

none /usr/libexec none x-snapd.kind=symlink,x-snapd.symlink=/var/snap/microk8s/common/usr/libexec,x-snapd.origin=layout 0 0

I suspect that the required mimic at /usr is somehow affecting the rest, investigating.

Zygmunt Krynicki (zyga) wrote :

It is the interplay of the two mimics:

/usr/lib/x86_64-linux-gnu for various nvidia bind mounts
/usr for the /usr/libexec symlink

We first create the mimic at /usr/lib/x86_64-linux-gnu and then the one at /usr; the fact that we have both is now causing problems. I'm looking for a smaller reproducer to analyze the algorithm we employ.
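
To make the overlap concrete, here is a minimal Python sketch that flags mimic directories nested inside one another. It is only an illustration of the situation described above, not snapd's algorithm; the helper name `nested_pairs` is hypothetical.

```python
from pathlib import PurePosixPath


def nested_pairs(mimic_dirs):
    """Return (outer, inner) pairs where one mimic directory contains another."""
    paths = [PurePosixPath(d) for d in mimic_dirs]
    pairs = []
    for outer in paths:
        for inner in paths:
            # inner.parents lists every ancestor directory of inner.
            if outer != inner and outer in inner.parents:
                pairs.append((str(outer), str(inner)))
    return pairs


# The two mimics from this bug, in the order they are created.
print(nested_pairs(["/usr/lib/x86_64-linux-gnu", "/usr"]))
```

Unrelated directories such as /usr and /opt produce no pairs, so only layouts that combine a deep mimic with one of its ancestors would show up here.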

Zygmunt Krynicki (zyga) wrote :

The minimal reproducer is this mount profile:

/snap/microk8s/x1/var/lib/snapd/lib/gl/libEGL_nvidia.so.0 /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 none bind,rw,x-snapd.kind=file,x-snapd.origin=layout 0 0
none /usr/libexec none x-snapd.kind=symlink,x-snapd.symlink=/var/snap/microk8s/common/usr/libexec,x-snapd.origin=layout 0 0

I've attached a log of two consecutive executions of snap-update-ns with that mount profile. The first one passes, the second one fails.
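
For readers unfamiliar with the profile format: each entry follows the six-field fstab layout, and the snapd-specific details travel in the x-snapd.* options. Below is a minimal Python sketch for reading those options. The function name `parse_profile_line` is hypothetical and not part of snapd, and the sketch assumes paths contain no escaped whitespace.

```python
def parse_profile_line(line):
    """Split one fstab-style mount profile line into named fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    name, mount_dir, fs_type, raw_options, _dump, _passno = fields
    options = {}
    for opt in raw_options.split(","):
        key, _, value = opt.partition("=")
        options[key] = value  # bare flags such as "bind" map to ""
    return {"name": name, "dir": mount_dir, "type": fs_type, "options": options}


profile = """\
/snap/microk8s/x1/var/lib/snapd/lib/gl/libEGL_nvidia.so.0 /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 none bind,rw,x-snapd.kind=file,x-snapd.origin=layout 0 0
none /usr/libexec none x-snapd.kind=symlink,x-snapd.symlink=/var/snap/microk8s/common/usr/libexec,x-snapd.origin=layout 0 0
"""

entries = [parse_profile_line(line) for line in profile.splitlines() if line.strip()]
for entry in entries:
    # Show each mount point together with its x-snapd.kind value.
    print(entry["dir"], entry["options"].get("x-snapd.kind"))
```

Run against the two-line reproducer, this shows one entry of kind "file" and one of kind "symlink", which is the combination that matters in this bug.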

Zygmunt Krynicki (zyga) wrote :

This really shows what's wrong, from the part when we are explaining:

unmount (none /usr/libexec none x-snapd.kind=symlink,x-snapd.symlink=/var/snap/microk8s/common/usr/libexec,x-snapd.origin=layout 0 0)

This really means: remove the symlink at /usr/libexec

unmount (/snap/microk8s/x1/var/lib/snapd/lib/gl/libEGL_nvidia.so.0 /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 none bind,rw,x-snapd.kind=file,x-snapd.origin

This means: umount the bind-mount at /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0 and unlink the placeholder file we created.
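
The two intended undo sequences just described can be modelled as a tiny planning function. This is an illustrative Python model of the description in this comment, not snapd's Go code; `plan_unmount` and the operation tuples are hypothetical, and only the two entry kinds discussed here are covered.

```python
def plan_unmount(mount_dir, kind):
    """Return the low-level steps that undo one mount profile entry."""
    if kind == "symlink":
        # A symlink entry is undone by removing the symlink itself.
        return [("remove", mount_dir)]
    if kind == "file":
        # A file bind mount is unmounted first, then its placeholder is unlinked.
        return [("umount", mount_dir), ("remove", mount_dir)]
    raise ValueError(f"kind {kind!r} is outside the scope of this sketch")


for mount_dir, kind in [
    ("/usr/libexec", "symlink"),
    ("/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0", "file"),
]:
    print(kind, plan_unmount(mount_dir, kind))
```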

What really happens:

remove "/usr/libexec" (error: <nil>)

This is as I explained above.

umount "/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0" UMOUNT_NOFOLLOW (error: <nil>)

This is also as I explained above.

remove "/usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0" (error: remove /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.0: device or resource busy)

This fails: we also have a bigger writable mimic for *all of* /usr! In other words, /usr is still a mount point.

This is very surprising because we have robust-mount-namespace-updates enabled, and they were implemented to handle *exactly* this situation. So what gives?

Well, this is a *file* bind mount, and that case is not accounted for in the code. https://github.com/snapcore/snapd/blob/master/cmd/snap-update-ns/change.go#L450 lacks a check for kind=="file".

Zygmunt Krynicki (zyga) wrote :

This pull request fixes the problem: https://github.com/snapcore/snapd/pull/8481

I will send additional patches with regression and unit tests on Tuesday.

Ian Johnson (anonymouse67) wrote :

Thanks for the investigative work Zygmunt!

Changed in snapd:
status: In Progress → Fix Released
milestone: none → 2.45