stale copy of plug and slot attributes is kept in connection state

Bug #1825883 reported by Ken VanDine on 2019-04-22
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
snapd
High
Zygmunt Krynicki

Bug Description

The gtk-common-themes snap provides a number of themes exported under the gtk-3-themes plug. I was fixing a bug reporting that Yaru-dark wasn't included, I noticed on refresh the snap using the content doesn't get the new mount point.

gnome-calculator is a good example. It connects to gtk-common-themes:gtk-3-themes which provides a number of directories to make available to snaps that need them. After adding the Yaru-dark directory under themes and refreshing, the dark variant isn't available.

To reproduce:
snap run --shell gnome-calculator
ls -ltr $SNAP/data-dir/themes/ |grep Yaru

You'll notice there is only one directory there, "Yaru"

Now refresh gtk-common-themes from candidate and repeat those steps. You should see Yaru and Yaru-dark, which I'm only seeing "Yaru".

snap disconnect gnome-calculator:gtk-3-themes
snap connect gnome-calculator:gtk-3-themes gtk-common-themes:gtk-3-themes
snap run --shell gnome-calculator
ls -ltr $SNAP/data-dir/themes/ |grep Yaru

Now you'll notice Yaru and Yaru-dark are both available to the gnome-calculator snap.

I'm guessing this is a bug related to the mount namespace.

Zygmunt Krynicki (zyga) wrote :

This is curious because I would expect this to work. I will write a spread test tomorrow and see what happens.

Changed in snapd:
assignee: nobody → Zygmunt Krynicki (zyga)
Zygmunt Krynicki (zyga) on 2019-04-23
Changed in snapd:
status: New → In Progress
Zygmunt Krynicki (zyga) wrote :

I can reproduce this error according to the instructions given. I will now write an integration test that does the same in a reduced and reproducible environment.

Quick code inspection does not yet reveal where the issue may be occurring.

Changed in snapd:
importance: Undecided → High
milestone: none → 2.39
Zygmunt Krynicki (zyga) wrote :

This is also easily reproducible with a regression test. I will now inspect what the heck is wrong.

Zygmunt Krynicki (zyga) wrote :

We managed to debug this to the following observation:

- snapd maintains a list of connections between plugs and slots in the state
- the per-connection-state tracks static and dynamic plug and slot attributes
- the static slot attributes are stale

I'm working on a regression test here:

https://github.com/snapcore/snapd/compare/master...zyga:fix/lp-1825883?expand=1

Experiments with that branch consisting of installing the app snap, installing v1 of the content snap and inspecting the state file shows that we take a snapshot of the slot attributes

zyga@fyke:/var/lib/snapd$ sudo cat state.json | jq '.data.conns["app-snap:things content-snap:things"]'
{
  "interface": "content",
  "plug-static": {
    "content": "things",
    "target": "$SNAP/things"
  },
  "slot-static": {
    "content": "things",
    "source": {
      "read": [
        "$SNAP/things/a",
        "$SNAP/things/b"
      ]
    }
  }
}

That connection state is *NOT* updated when we refresh to content snap v2. This results in stale data being presented to the rest of the snapd system, ultimately resulting in the mount profile not presenting the new theme.

summary: - content interface mounts not updating on refresh
+ stale copy of plug and slot attributes is kept in connection state
Zygmunt Krynicki (zyga) wrote :

We understand the bug now but fixing it will require some discussion inside the team that we cannot do today due to partial presence of key members.

Zygmunt Krynicki (zyga) wrote :

I was thinking about this bug some more and I'd like to work through a mental scenario:

- there are two snaps A and B
- A and B have an interface connection
- The interface endpoint (either plug or slot, irrelevant) has an attribute that depends on the snap revision
  - A simple example is a content interface using $SNAP for either the source or the target
- The snap that has the endpoint that contains the attribute uses interface hooks to keep track of the value at all times
- Upon initial connection or auto-connection the appropriate interface hook fires, the snap can use snapctl to read the attribute value, so far so good
- The second snap (the one providing the attribute) refreshes
- The first snap's hook are not fired, the snap no longer knows the correct value because the $SNAP on the remote side has changed without notification

The tracking case is not something that matters to any snap that I'm aware of today but shows that there's fundamentally no way to reliably track what a snap is connected to. Regardless of all the hook complexity we have, it is not enough and the information can get out of sync.

While this adds some more complexity I think we need to add a new hook type: connection-updated. Unlike eraser proposals that had a reconnect hook, this time the purpose of the hook is simple and unambiguous: to let a snap know that the far end of the connection was changed. The hook would not allow a snap to re-negotiate anything (it would be only the post-connection notification hook, aka connect-{plug,slot}-${PLUG,SLOT}_NAME. It might even be *that* hook pair exactly.

There is one fundamental difference between a connection-updated and the earlier proposal consisting of the reconnection hook. The interface was never disconnected. There was never a moment when the snap could observe the connection to be severed.

As such, I would like to propose that when a snap refreshes and we perform the security setup, we _afterwards_ add tasks for interface hooks on all the snaps that have an interface connection to the snap that was refreshed, to let them known that a connection was updated.

As an important observation, because the dynamic interface attributes are only controllable during the prepare-{plug,slot}-${PLUG,SLOT}_NAME hook, we don't have to change dynamic attributes during a snap refresh.

During a snap refresh, we must however refresh the static attributes, this may run afoul with validation (sanitisation) which may abort the refresh but that is unavoidable.

Zygmunt Krynicki (zyga) wrote :

I proposed an halfway there fix that addresses enough of the problem to fix the content interface https://github.com/snapcore/snapd/pull/6770

Zygmunt Krynicki (zyga) wrote :

I've proposed a 2nd version of the patch fixing this as https://github.com/snapcore/snapd/pull/6781

Paweł Stołowski (stolowski) wrote :

This is being re-worked again, the 2nd fix wasn't correct after all and gets reverted.

Michael Vogt (mvo) wrote :
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers