snap 'core' broken/missing and causing autopkgtest failures

Bug #1824237 reported by Dan Streetman
This bug affects 1 person
Affects          Status         Importance   Assigned to        Milestone
snapd            Fix Released   Undecided    Zygmunt Krynicki
snapd (Ubuntu)   Fix Released   Undecided    Unassigned
  Cosmic         New            Undecided    Unassigned

Bug Description

[impact]

some packages, like 'snapd' and 'docker.io', rely on snaps in their autopkgtests. Recently something changed, and those autopkgtests no longer have a working 'core' snap: it is either not installed, or installed but 'broken'.

Specifically, there is no /snap/core/ so no snaps can run, since they rely on the core snap for their interpreter (/snap/core/current/lib64/ld-linux-x86-64.so.2).

[test case]

look at any of the recent snapd autopkgtests, e.g.:
https://objectstorage.prodstack4-5.canonical.com/v1/AUTH_77e2ada1e7a84929a74ba3b87153c0ac/autopkgtest-cosmic/cosmic/amd64/s/snapd/20190410_125933_3cfed@/log.gz

+ /snap/bin/go get -u github.com/snapcore/spread/cmd/spread
/snap/go/3540/gowrapper: line 3: /snap/go/3540/bin/go: No such file or directory

[regression potential]

TBD; fix unknown

[other info]

this appears to be limited to cosmic only; local autopkgtests for bionic and disco do not fail.

Dan Streetman (ddstreet) wrote :

for clarification, this is from the inside of an autopkgtest (cosmic) that's stopped on the failed test:

ubuntu@autopkgtest:~$ ls -l /snap/
total 16
drwxr-xr-x 2 root root 4096 Apr 10 17:23 bin
drwxr-xr-x 3 root root 4096 Apr 10 17:23 go
drwxr-xr-x 2 root root 4096 Apr 10 17:22 lxd
-r--r--r-- 1 root root 548 Apr 10 17:21 README
ubuntu@autopkgtest:~$ snap list
Name  Version  Rev    Tracking  Publisher   Notes
core           6673   stable    canonical✓  broken
go    1.12.2   3540   stable    mwhudson    classic
lxd            10343  stable/…  canonical✓  broken
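
A quick way to pick the affected snaps out of output like the above is to filter on the Notes column. This is only a sketch: the function name `broken_snaps` is made up for illustration, and it assumes that Notes is the last column of `snap list` output and that multiple notes are comma-separated.

```shell
# List the names of snaps whose Notes column (assumed to be the last
# field of `snap list` output) contains "broken".
# Reads `snap list` output on stdin; the first line is the header.
broken_snaps() {
    awk 'NR > 1 && $NF ~ /(^|,)broken(,|$)/ { print $1 }'
}
```

On an affected system this would be run as `snap list | broken_snaps`.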

as the description states, it failed here:

+ /snap/bin/go get -u github.com/snapcore/spread/cmd/spread
/snap/go/3540/gowrapper: line 3: /snap/go/3540/bin/go: No such file or directory

because its interpreter doesn't exist:

ubuntu@autopkgtest:~$ file /snap/go/3540/bin/go
/snap/go/3540/bin/go: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /snap/core/current/lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, BuildID[sha1]=492a07b258284c10c1260cf6662f283fac5a04a4, not stripped
ubuntu@autopkgtest:~$ ls -l /snap/core/current/lib64/ld-linux-x86-64.so.2
ls: cannot access '/snap/core/current/lib64/ld-linux-x86-64.so.2': No such file or directory
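
The two manual steps above (read the interpreter from `file`, then check that it exists) can be combined roughly as follows. This is a sketch, not a robust tool: the function names are hypothetical, and the parsing assumes `file` prints `interpreter <path>,` in the same form as shown above.

```shell
# Pull the "interpreter <path>" value out of `file` output read from stdin.
# Assumes the path is followed by a comma, as in the output above.
interp_from_file_output() {
    sed -n 's/.*, interpreter \([^,]*\),.*/\1/p'
}

# Print the interpreter a binary requests and whether it exists on disk.
check_interp() {
    interp=$(file "$1" | interp_from_file_output)
    if [ -z "$interp" ]; then
        echo "$1: no interpreter found in file output"
    elif [ -e "$interp" ]; then
        echo "$1: interpreter $interp is present"
    else
        echo "$1: interpreter $interp is MISSING"
        return 1
    fi
}
```

For the case in this bug, `check_interp /snap/go/3540/bin/go` would be expected to report the loader under /snap/core/current as missing.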

Dan Streetman (ddstreet)
description: updated
Zygmunt Krynicki (zyga) wrote :

Additional IRC log with some useful information:

<ddstreet> hey guys, just fyi, i opened lp #1824237 because the core snap in cosmic appears broken, and is causing autopkgtest failures for several pkgs (at least snapd and docker.io, maybe more)
23:30 <mup> Bug #1824237: snap 'core' broken/missing and causing autopkgtest failures <snapd (Ubuntu):New> <snapd (Ubuntu Cosmic):New> <https://launchpad.net/bugs/1824237>
23:41 <zyga> ddstreet: ack, thank you
23:41 I will discuss with the team tomorrow
23:41 <ddstreet> thnx!
23:41 <zyga> (I didn't mean to ping you ack)
23:41 <ddstreet> np, i haven't left yet :)
23:41 <zyga> ddstreet: is it reproducible?
23:41 <ddstreet> yep, every time
23:41 <zyga> perfect
23:41 well, at least it's not a heisenbug
23:46 ddstreet: do we have system log from an affected system?
23:46 ddstreet: did something unmount core but it is still present and can be mounted (the mount unit can be started)
23:46 <ddstreet> zyga i only see it in autopkgtests
23:46 <zyga> ddstreet: can we reproduce this with an interactive shell somehow?
23:46 <ddstreet> sure, you familiar with running autopkgtests?
23:46 <zyga> so-so
23:46 <zyga> not in a distro context
23:46 I only used qemu
23:47 <ddstreet> so if you get the latest snapd dsc, e.g. 'pull-lp-source snapd cosmic'
23:47 then install autopkgtest pkg
23:48 then you can do 'autopkgtest -s -U snapd_2.38+18.10.dsc -- lxd ubuntu:cosmic'
23:48 the -s will stop on the failed test, and let you ssh into it
23:48 <zyga> this is brilliant, let me try at once
23:48 <ddstreet> personally, i use qemu, not lxd, so not sure if the lxd test will let you ssh in
23:48 <zyga> I prefer qemu too
23:48 more easy to recover :)
23:48 <ddstreet> to run a test with qemu instead of lxd, you can just do 'autopkgtest-buildvm-ubuntu-cloud -r cosmic'
23:49 <ddstreet> and it will download and set up a .img file in your local dir
23:49 then, just replace the '-- lxd ubuntu:cosmic' with '-- qemu IMGFILE'
23:49 replacing IMGFILE of course
23:49 i'm about to eod but i'll be around tomorrow if i can help
23:49 <zyga> thank you
23:49 this is very useful
23:50 with this we can look around and have some theories as to what happened
23:50 I'm past EOD so I will only look deeper in the morning
23:50 <ddstreet> yep, hopefully it's clear once you get into the test env :)
23:50 ok sounds good, thnx!
23:50 <zyga> thank you, good night!

Steve Langasek (vorlon)
tags: added: regression-proposed
Zygmunt Krynicki (zyga)
Changed in snapd:
status: New → In Progress
assignee: nobody → Zygmunt Krynicki (zyga)
Zygmunt Krynicki (zyga) wrote :

I tried to reproduce it just now and ... nothing. It works:

I used a freshly built cosmic in qemu:

2019-04-11 11:30:21 Found /tmp/autopkgtest.ORoVMO/build.Xl5/src/spread.yaml.
2019-04-11 11:30:40 Project content is packed for delivery (102.95MB).
2019-04-11 11:30:40 Sequence of jobs produced with -seed=1554975040
2019-04-11 11:30:40 If killed, discard servers with: spread -reuse-pid=8180 -discard
2019-04-11 11:30:40 Allocating autopkgtest:ubuntu-18.10-amd64...
2019-04-11 11:30:40 Waiting for autopkgtest:ubuntu-18.10-amd64 to make SSH available at localhost:22...
2019-04-11 11:30:40 Allocated autopkgtest:ubuntu-18.10-amd64.
2019-04-11 11:30:40 Connecting to autopkgtest:ubuntu-18.10-amd64...
2019-04-11 11:30:40 Connected to autopkgtest:ubuntu-18.10-amd64 at localhost:22.
2019-04-11 11:30:40 Sending project content to autopkgtest:ubuntu-18.10-amd64...
2019-04-11 11:30:48 Preparing autopkgtest:ubuntu-18.10-amd64...
2019-04-11 11:35:47 Preparing autopkgtest:ubuntu-18.10-amd64:tests/smoke/...
2019-04-11 11:36:51 Preparing autopkgtest:ubuntu-18.10-amd64:tests/smoke/find-info...
2019-04-11 11:36:58 Executing autopkgtest:ubuntu-18.10-amd64:tests/smoke/find-info (1/4)...
2019-04-11 11:36:58 Restoring autopkgtest:ubuntu-18.10-amd64:tests/smoke/find-info...
2019-04-11 11:36:59 Preparing autopkgtest:ubuntu-18.10-amd64:tests/smoke/install...
2019-04-11 11:37:04 Executing autopkgtest:ubuntu-18.10-amd64:tests/smoke/install (2/4)...
2019-04-11 11:37:21 Restoring autopkgtest:ubuntu-18.10-amd64:tests/smoke/install...
2019-04-11 11:37:21 Preparing autopkgtest:ubuntu-18.10-amd64:tests/smoke/sandbox...
2019-04-11 11:37:28 Executing autopkgtest:ubuntu-18.10-amd64:tests/smoke/sandbox (3/4)...
2019-04-11 11:37:34 Restoring autopkgtest:ubuntu-18.10-amd64:tests/smoke/sandbox...
2019-04-11 11:37:34 Preparing autopkgtest:ubuntu-18.10-amd64:tests/smoke/remove...
2019-04-11 11:37:44 Executing autopkgtest:ubuntu-18.10-amd64:tests/smoke/remove (4/4)...
2019-04-11 11:37:47 Restoring autopkgtest:ubuntu-18.10-amd64:tests/smoke/remove...
2019-04-11 11:37:47 Restoring autopkgtest:ubuntu-18.10-amd64:tests/smoke/...
2019-04-11 11:37:54 Restoring autopkgtest:ubuntu-18.10-amd64...
2019-04-11 11:37:55 Discarding autopkgtest:ubuntu-18.10-amd64...
2019-04-11 11:37:55 Successful tasks: 4
2019-04-11 11:37:55 Aborted tasks: 0

Zygmunt Krynicki (zyga) wrote :

I ran several more iterations without reproducing the problem. I will run some more but it would help if the reporter can clarify if the issue is still observed in the wild.

Dan Streetman (ddstreet) wrote :

> I tried to reproduce it just now and ... nothing. It works:

hmm, that's strange...i just re-ran autopkgtest locally and it failed again.

What cmdline are you running for the autopkgtest?

This is the cmdline I'm using to reproduce:
$ autopkgtest -s -U snapd_2.38+18.10.dsc -- qemu /build/autopkgtest/autopkgtest-cosmic-amd64.img

and the img is created from autopkgtest-buildvm-ubuntu-cloud with -r cosmic.
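
For reference, the reproduction steps mentioned so far can be collected in one place. The helper below only prints the commands rather than running them; its name is hypothetical, and the image file name pattern `autopkgtest-<release>-amd64.img` is an assumption generalized from the cosmic image path above.

```shell
# Print (do not run) the reproduction commands for a release and a .dsc file.
# Assumption: the image built by autopkgtest-buildvm-ubuntu-cloud is named
# autopkgtest-<release>-amd64.img in the current directory.
print_repro_steps() {
    release=$1
    dsc=$2
    img="autopkgtest-$release-amd64.img"
    echo "pull-lp-source snapd $release"
    echo "autopkgtest-buildvm-ubuntu-cloud -r $release"
    echo "autopkgtest -s -U $dsc -- qemu ./$img"
}
```

For this bug that is `print_repro_steps cosmic snapd_2.38+18.10.dsc`; the `-s` in the last command stops on the failed test so the test environment can be inspected.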

Dan Streetman (ddstreet) wrote :

also - i get the same failure when testing with snapd_2.37.4+18.10.1.dsc so maybe this failure is coming from something outside the snapd package itself? Maybe the autopkgtest pulls in snaps and/or github code that's changed recently and is causing the problem?

Michael Vogt (mvo) wrote :

Looking at the ADT logs, the tests started to fail on 2019-04-03 around 22:00 UTC.

The output of `snap changes` on a failed system would be great.

Zygmunt Krynicki (zyga) wrote :

I used:

autopkgtest -s -U snapd_2.38+18.10.dsc -- qemu ./autopkgtest-cosmic-amd64.img

The image was built with:

autopkgtest-buildvm-ubuntu-cloud -r cosmic

The tests indeed pull stuff from the store (I bet the core snap at least) but even if we assume the store is unstable it does not explain how a snap that was installed can become broken.

In the snapd world, a snap becomes broken when its meta/snap.yaml cannot be loaded; that would be the case if, for example, the core snap was simply unmounted. The question is whether that really happened and, if so, why.
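
A rough approximation of that condition, usable from a shell on an affected system, is sketched below. It is not what snapd does internally: the function name is hypothetical, and it assumes the metadata of an installed snap lives at `<root>/<name>/current/meta/snap.yaml` and that "cannot be loaded" can be approximated by "is not readable".

```shell
# For each named snap, report whether its meta/snap.yaml is readable under
# the given snap mount root. Assumed layout: <root>/<name>/current/meta/snap.yaml
check_snap_metadata() {
    root=$1
    shift
    for name in "$@"; do
        if [ -r "$root/$name/current/meta/snap.yaml" ]; then
            echo "$name: ok"
        else
            echo "$name: meta/snap.yaml not readable"
        fi
    done
}
```

Example: `check_snap_metadata /snap core go lxd`. If a snap's metadata is not readable, checking whether its mount unit is active would be the next step.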

Michael Vogt (mvo) wrote :

I was just able to reproduce this on our travis instance. In the /var/log/apt/history.log I see:
"""
Start-Date: 2019-04-11 13:12:03
Commandline: apt-get remove -y --purge -y snapd
Purge: snapd:amd64 (2.37.4+18.04.1)
Error: Sub-process /usr/bin/dpkg returned an error code (1)
End-Date: 2019-04-11 13:12:12
"""
and in term.log:
"""
....
rm: cannot remove '/var/cache/snapd/aux': Is a directory
dpkg: error processing package snapd (--purge):
 installed snapd package post-removal script subprocess returned error exit status 1
Log ended: 2019-04-11 13:12:12
"""
so I strongly suspect that this is the issue.
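
The `rm` error itself is easy to demonstrate in a scratch directory. The sketch below assumes, based only on the error message above, that the post-removal script uses a non-recursive `rm` on something that turned out to be a directory; the actual script is not quoted here. The function name and scratch paths are made up for illustration.

```shell
# Show that a non-recursive `rm -f` fails on a directory while `rm -rf`
# removes it. Everything happens under a temporary scratch directory.
demo_rm_on_directory() {
    scratch=$(mktemp -d)
    mkdir -p "$scratch/cache/aux"
    if rm -f "$scratch/cache/aux" 2>/dev/null; then
        echo "rm -f: removed"
    else
        echo "rm -f: failed"
    fi
    if rm -rf "$scratch/cache/aux"; then
        echo "rm -rf: removed"
    fi
    rm -rf "$scratch"
}
```

If a maintainer script runs with `set -e`, a failing `rm` like this is enough to make it exit non-zero, which dpkg then reports as the error shown in term.log.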

Michael Vogt (mvo) wrote :

I looked into this a bit more and all the failures happen with 2.37.4 - the 2.38 SRU is actually fine and will fix this bug.

Steve Langasek (vorlon) wrote :

Thanks, Michael! Fixing the bug tags.

tags: removed: regression-proposed
Changed in snapd (Ubuntu):
status: New → Fix Released
Changed in snapd:
status: In Progress → Fix Released