snap can't be installed once pushed to the store

Bug #1610816 reported by Menno Finlay-Smits
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Snapcraft
Invalid
Undecided
Unassigned
Software Center Agent
Fix Released
High
Unassigned

Bug Description

After successfully pushing a snap to the store I was unable to install it.

Here's what I did:

1. Created a new snap called "juju-menno".
2. Registered the name (snapcraft register juju-menno)
3. Pushed the snap (snapcraft juju-menno_2.0-beta15_amd64.snap --release edge)
4. Logged in to the store and saw that the snap was there. It was listed Public and Published.
5. Attempted to install the snap:

$ snap install --edge --devmode juju-menno
error: cannot perform the following tasks:
- Download snap "juju-menno" from channel "edge" (snap not found)
$ sudo snap install --edge --devmode juju-menno.menno
error: cannot perform the following tasks:
- Download snap "juju-menno.menno" from channel "edge" (snap not found)

6. noodles ended up helping me. He clicked Reindex and then the snap was available.

<noodles> I didn't change anything about your revision. The only thing I did do was "Reindex" - not sure why it wouldn't have been indexed correctly the first time :/
<menn0> noodles: before, that was giving me a "snap not found"
<noodles> Yeah, I saw the same, verifyied that I couldn't see any results when querying the index directly, so tried reindexing.

I would have expected the snap to be installable without extra intervention.

description: updated
Revision history for this message
Michael Nelson (michael.nelson) wrote :

The rocketchat crew are hitting this occasionally as well. I verified that the push --released snap was displayed in myapps as being published in stable, but required a re-index before it showed up in the snap index:

http://paste.ubuntu.com/23381379/

The script being used to release there is at:

https://github.com/RocketChat/Rocket.Chat/blob/develop/.travis/snap.sh

I'm adding snapcraft to this just for notification, but it's almost certainly an intermittent issue in myapps.developer.ubuntu.com (most likely, not handling failure cases).

Changed in software-center-agent:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Michael Nelson (michael.nelson) wrote :

Checking logs, there are some non-200 hits to /snap-release/: https://pastebin.canonical.com/169472/ . But no 503s (which the code responds with if the publish to the index fails immediately), only 404s and 409s:

https://pastebin.canonical.com/169473/

Next: check if the 409s could be related. If not, check celery logs for the actual publishing to CPI.

Revision history for this message
Michael Nelson (michael.nelson) wrote :

So the data shows a race condition between snapcraft calling /snap-release/ (which publishes to the index with the specified channel) and the auto-review of the store (when a new revision is uploaded) triggering a publish to the index (without channels). That is, afaict,

1) snapcraft pushes a new snap, then polls for it to go into ready-to-publish

2) When the store finishes the autoreview it moves it to ready-to-publish, and for an upload where the corresponding snap is already published, a task is created (Task1) to update the index with the new revision and whatever the current list of channels is at the time (if snap-release is not yet called, an empty list, otherwise whatever the release channels are).

3) At the same time, snapcraft also detects that everything went fine (upload is ready-to-publish), and so calls /snap-release/ with the desired channel (stable), this also creates a task (Task2) to update the index with the new revision released to the specified channel.

So if Task1 is created before Task2, but Task2 is executed before Task1, the index is first updated with channels=['stable'], then a few seconds later with channels=[]. For example (private, sorry): https://pastebin.canonical.com/169629/
There you see:
02:42:44 - upload r143 begins auto review task
02:42:48 - CPI updated with r143 in stable (I assume because review completed, polling snapcraft detected and released)
02:42:51 - CPI updated with r143 in no channels.

We need to check, but I don't think we have single workers per celery queue (which would ensure tasks are always executed in created order, but could still theoretically result in a race due to the concurrent creation). Other possible fixes: push --release tells the store during the push which channels to which it should release (rather than a separate API call - check snapcraft). Another possibility would be for the auto-review to not publish with empty channels. I'll ask around in case there are other ideas (or mistakes in my understanding of the data above).

Revision history for this message
Michael Nelson (michael.nelson) wrote :

Another option which @wgrant mentioned, is that we can update the store review task so that it doesn't send a channels attribute (ie. it's ambivalent about channels, given that the upload was only just added), and the index to know that, if there is no channel attribute, it should not update any existing channels (and if there is no current doc in the index, the channel list is set empty).

Revision history for this message
Michael Nelson (michael.nelson) wrote :

As a work-around for people who hit this situation, releasing to a different channel and then back to the original will force an update (either via the webui, or via snapcraft release)

Revision history for this message
Michael Nelson (michael.nelson) wrote :

While looking at a separate issue, we've (well, wgrant :) ) has identified what is most likely the cause of the out-of-order tasks being run.

Celery was running with the default value for worker_prefetch_multiplier [1], effectively meaning that multiple tasks would be grabbed by each worker, so if worker one grabs [1, 2, 3, 4], while another concurrent process grabs [5, 6, 7, 8], if task 1 happens to be a longer-running task then 5, 6, 7 and 8 can potentially all complete before 2.

This has been updated and we're monitoring, but please let us know if you see the same behaviour again. Thanks.

[1] http://docs.celeryproject.org/en/latest/userguide/configuration.html#std:setting-worker_prefetch_multiplier

Changed in snapcraft:
status: New → Invalid
Changed in software-center-agent:
status: Triaged → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.