services in a snap with dependencies aren't started in correct order on install

Bug #1796125 reported by Ian Johnson
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
snapd
Fix Released
Undecided
Maciej Borzecki

Bug Description

You can specify services that need to be run before/after a daemon in the snap.yaml using the `after` and `before` keywords. For example, in the below snap, the desired ordering is service1 runs, then service2, then service3; this is represented using the after spec in the snapcraft.yaml. However, there is a race condition here because snapd when installing the snap runs `systemctl start service1 service2 service3`, which sometimes leads to service3 being run before service1 finishes (and also even sometimes being run before service2).

The `after`/`before` keywords are implemented in systemd units using the corresponding `After` and `Before`, which works when booting the system up, but not when initially installing the snap. When you install the same systemd units, and restart your system, systemd correctly handles the ordering dependency every time when starting up the system. (I tested this by installing the following snap in a VM, and restarting the VM and checking the systemd service startup order, restarting and checking again, etc. 10 times in a row)

The system booting case is different than what snapd does when it installs a snap because when starting the system up, systemd isn't asked explicitly to start all of the services in one transaction as snapd does when installing a snap. The correct remedy here then I think is to add all `after` dependencies to the `Requires` setting in the systemd unit, and do the inverse for `before`.

This can be reproduced with the linked snapcraft project: https://github.com/anonymouse64/snapd-services-ordering-test. As this is somewhat of a race condition, you may need to install the snaps multiple times to see the invalid ordering show up. The project has a run-tests.sh script which checks the ordering of services using `snap logs` 10 times, so just run the following:

```
$ git clone https://github.com/anonymouse64/snapd-services-ordering-test
$ cd snapd-services-ordering-test
$ ./run-tests.sh
invalid service ordering for test test-services-16169
See log output:
2018-10-04T14:40:42Z systemd[1]: Starting Service for snap application test-services-16169.service1...
2018-10-04T14:40:42Z test-services-16169.service1[9598]: service1 started, sleeping
2018-10-04T14:40:42Z test-services-16169.service3[9584]: service3 started, sleeping
2018-10-04T14:40:47Z test-services-16169.service1[9598]: done sleeping, notifying
2018-10-04T14:40:47Z test-services-16169.service3[9584]: done sleeping, notifying
2018-10-04T14:40:47Z systemd[1]: Started Service for snap application test-services-16169.service3.
2018-10-04T14:40:47Z systemd[1]: Started Service for snap application test-services-16169.service1.
2018-10-04T14:40:47Z systemd[1]: Starting Service for snap application test-services-16169.service2...
2018-10-04T14:40:47Z test-services-16169.service2[9680]: service2 started
2018-10-04T14:40:47Z systemd[1]: Started Service for snap application test-services-16169.service2.

invalid service ordering for test test-services-32267
See log output:
2018-10-04T14:42:28Z systemd[1]: Starting Service for snap application test-services-32267.service1...
2018-10-04T14:42:28Z test-services-32267.service3[13257]: service3 started, sleeping
2018-10-04T14:42:28Z test-services-32267.service1[13280]: service1 started, sleeping
2018-10-04T14:42:33Z test-services-32267.service3[13257]: done sleeping, notifying
2018-10-04T14:42:33Z systemd[1]: Started Service for snap application test-services-32267.service3.
2018-10-04T14:42:33Z test-services-32267.service1[13280]: done sleeping, notifying
2018-10-04T14:42:33Z systemd[1]: Started Service for snap application test-services-32267.service1.
2018-10-04T14:42:33Z systemd[1]: Starting Service for snap application test-services-32267.service2...
2018-10-04T14:42:33Z test-services-32267.service2[13341]: service2 started
2018-10-04T14:42:33Z systemd[1]: Started Service for snap application test-services-32267.service2.
invalid service ordering for test test-services-10625
See log output:
2018-10-04T14:42:41Z systemd[1]: Starting Service for snap application test-services-10625.service1...
2018-10-04T14:42:42Z test-services-10625.service1[13869]: service1 started, sleeping
2018-10-04T14:42:42Z test-services-10625.service2[13848]: service2 started
2018-10-04T14:42:42Z systemd[1]: Started Service for snap application test-services-10625.service2.
2018-10-04T14:42:42Z systemd[1]: Starting Service for snap application test-services-10625.service3...
2018-10-04T14:42:42Z test-services-10625.service3[13911]: service3 started, sleeping
2018-10-04T14:42:47Z test-services-10625.service1[13869]: done sleeping, notifying
2018-10-04T14:42:47Z systemd[1]: Started Service for snap application test-services-10625.service1.
2018-10-04T14:42:47Z test-services-10625.service3[13911]: done sleeping, notifying
2018-10-04T14:42:47Z systemd[1]: Started Service for snap application test-services-10625.service3.
invalid service ordering for test test-services-21584
See log output:
2018-10-04T14:42:55Z systemd[1]: Starting Service for snap application test-services-21584.service1...
2018-10-04T14:42:55Z test-services-21584.service1[14474]: service1 started, sleeping
2018-10-04T14:42:55Z test-services-21584.service2[14454]: service2 started
2018-10-04T14:42:55Z systemd[1]: Started Service for snap application test-services-21584.service2.
2018-10-04T14:42:55Z systemd[1]: Starting Service for snap application test-services-21584.service3...
2018-10-04T14:42:55Z test-services-21584.service3[14521]: service3 started, sleeping
2018-10-04T14:43:00Z test-services-21584.service1[14474]: done sleeping, notifying
2018-10-04T14:43:00Z systemd[1]: Started Service for snap application test-services-21584.service1.
2018-10-04T14:43:00Z test-services-21584.service3[14521]: done sleeping, notifying
2018-10-04T14:43:00Z systemd[1]: Started Service for snap application test-services-21584.service3.
```

If you change the systemd units generated to also include the `Requires`, like the simple patch below, there are no errors starting the units up:
```
$ git diff wrappers/services.go
diff --git a/wrappers/services.go b/wrappers/services.go
index 57498d454..e7b9236fc 100644
--- a/wrappers/services.go
+++ b/wrappers/services.go
@@ -436,7 +436,7 @@ func genServiceFile(appInfo *snap.AppInfo) []byte {
        serviceTemplate := `[Unit]
 # Auto-generated, DO NOT EDIT
 Description=Service for snap application {{.App.Snap.InstanceName}}.{{.App.Name}}
-Requires={{.MountUnit}}
+Requires={{.MountUnit}}{{range .After}} {{.}}{{end}}
 Wants={{.PrerequisiteTarget}}
 After={{.MountUnit}} {{.PrerequisiteTarget}}{{range .After}} {{.}}{{end}}
 {{- if .Before}}
```

(note that the patch doesn't handle the Before case, but demonstrates the basic idea behind the fix)

description: updated
Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :
Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :
Changed in snapd:
assignee: nobody → Maciej Borzecki (maciek-borzecki)
Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

PR: https://github.com/snapcore/snapd/pull/6023

I've also forked the reference test case and updated it to use just `snap pack`: https://github.com/bboozzoo/snapd-services-ordering-test

Revision history for this message
Ian Johnson (anonymouse67) wrote :

Thanks for working on this. As mentioned in the PR, I am wondering about how we should handle disabled services? For example if in my snap I have the following dependencies:

* service2 depends on service1
* service1 depends on nothing

and I have an install hook that disables service1 with `snapctl stop --disable service1`, what should happen during the installation? I think there are a few options:

1. We ignore the missing dependency as implemented in the PR, and just start up the enabled services, and if those fail to start (likely because their dependency was never started), only then we abort the installation
2. We walk the dependency tree and only start services which have all their dependencies enabled (and again only abort the installation if starting these services fails)
3. We abort the installation if a service is not disabled, but it's dependencies are disabled

Note that the behavior of systemd with `After` specs is to *not* start the `After` services, so in this case when we run `systemctl start service2` (again with service1 disabled), service1 is not started at all.

I'm unsure about systemd's default behavior regarding starting services at boot which have their dependencies disabled, but I think that systemd will ignore that and just start these services (but not start the disabled service). I will confirm later...

Tony Espy (awe)
Changed in snapd:
status: New → Confirmed
Revision history for this message
Maciej Borzecki (maciek-borzecki) wrote :

The fix is available in snapd 2.36+

Changed in snapd:
status: Confirmed → Fix Committed
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.