juju 3.0.0 unable to bootstrap on fresh machine

Bug #1988355 reported by Caner Derici
This bug affects 20 people
Affects         Status     Importance  Assigned to       Milestone
Canonical Juju  Triaged    High        Unassigned
snapd           Confirmed  Medium      Alberto Mardegan

Bug Description

This recently popped up in our pylibjuju CI tests with juju --channel=latest/edge.

The problem is that, after

`sudo snap install juju --channel latest/edge --classic`
Warning: flag --classic ignored for strictly confined snap juju

`juju bootstrap localhost test`

fails with:

ERROR cannot load ssh client keys: mkdir /home/jenkins/.local/share: permission denied

This is probably because juju latest/edge is a strictly confined snap, so it doesn't have permission to create the (parent) directories. @jnsgruk reported on MM that things start to work after manually running `mkdir -p ~/.local/share`.

tags: added: bootstrap
Changed in juju:
status: New → Triaged
milestone: none → 3.0-rc1
importance: Undecided → High
Alberto Mardegan (mardy) wrote :

Hi Caner, I created a branch in snapd which could fix the issue: https://github.com/snapcore/snapd/pull/12111

But I'm not sure it does, because it depends on the setup you have on the test machine (and how you create the jenkins user). Can you please check the output of the following command

    ps -fe | grep userd

on the test machine?

I can also try to create a snapd build for you to test. Which binary architecture are you using?

Alberto Mardegan (mardy)
Changed in snapd:
status: New → Incomplete
importance: Undecided → Medium
assignee: nobody → Alberto Mardegan (mardy)
Caner Derici (cderici) wrote :

Hi Alberto, thanks for looking into this!

So, I'm not entirely sure how the jenkins user is set up initially. This is on an ephemeral node (focal, amd64) on AWS running regular CI checks for a PR. Here's the whole output, which might be helpful:

https://pastebin.canonical.com/p/nSjC7tvk2c/plain/

I went ahead and manually created an ephemeral node and ran the same CI job and I was able to recreate the problem. So I may be able to test it manually if you could make a snapd binary with the fix. I also checked the user jenkins on the same machine:

jenkins@ip-172-31-23-203:~$ ps -fe | grep userd
jenkins 11843 1409 0 17:34 pts/0 00:00:00 grep --color=auto userd

Changed in juju:
milestone: 3.0-rc1 → none
John A Meinel (jameinel) wrote :

My understanding is that the interface will be updated to create ~/.local/share for us if it doesn't exist, so that juju itself can create ~/.local/share/juju.

I think it makes sense to have the confinement such that juju itself cannot create ~/.local/share (it shouldn't be able to muck around in any other directories in that path).

The short-term workaround is to add a "mkdir -p ~/.local/share" to your CI scripts, but that doesn't work very well for arbitrary end users.
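
For CI jobs driven from Python (as the pylibjuju tests are), the workaround above could be sketched as a small setup helper. This is only an illustration: the function name `ensure_local_share` is hypothetical, and it has to run as the CI user outside the snap, before `juju bootstrap` is invoked.

```python
from pathlib import Path


def ensure_local_share(home: Path) -> Path:
    """Create <home>/.local/share if it is missing and return its path.

    Hypothetical CI helper, not part of juju or pylibjuju.
    """
    target = home / ".local" / "share"
    # Same effect as `mkdir -p`: create missing parents, and do not fail
    # if the directory already exists.
    target.mkdir(parents=True, exist_ok=True)
    return target
```

Calling `ensure_local_share(Path.home())` in the job's setup step would leave the parent directory in place so that the confined juju snap only has to create `~/.local/share/juju` itself.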

John A Meinel (jameinel) wrote :

In *Juju* we could trap this error, and check if we are in a confined snap, and then prompt our users as a workaround, rather than just failing mysteriously on fresh installs.
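
A rough sketch of that idea follows, written in Python purely for illustration (Juju itself is written in Go, and this is not its actual code). The function names are hypothetical, and treating the `SNAP_NAME` environment variable as a sign of running inside a snap is an assumption; it does not by itself prove strict confinement.

```python
import errno
from typing import Mapping


def running_in_snap(env: Mapping[str, str]) -> bool:
    # Assumption: snapd exports SNAP_NAME to processes started from a snap.
    return bool(env.get("SNAP_NAME"))


def explain_mkdir_failure(err: OSError, env: Mapping[str, str]) -> str:
    """Turn a failed directory creation into a user-facing message.

    Hypothetical helper: adds a workaround hint only when the failure is
    a permission error and the process appears to run inside a snap.
    """
    base = f"cannot create {err.filename}: {err.strerror}"
    if err.errno == errno.EACCES and running_in_snap(env):
        return (
            f"{base}\n"
            "Juju appears to be running as a snap and may not be allowed "
            "to create this directory. "
            f"Try running: mkdir -p {err.filename}"
        )
    return base
```

The caller would pass in the `OSError` it caught together with `os.environ`. Other failures (for example a full disk) keep the plain message, so the hint only appears where the manual `mkdir -p` is likely to help.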

Changed in juju:
milestone: none → 3.0.1
Alberto Mardegan (mardy) wrote :

Sorry for not following up on this earlier! The approach I initially thought of might have fixed a few cases, but not all of them: it relied on the userd session daemon being running, which is not guaranteed, since we want snaps to work even when the user is not logged in to a graphical session.

The solution would be to make "snap run ..." create the directories: the /usr/bin/snap program is not subject to an AppArmor confinement, so it could create the directories before invoking snap-confine.

Changed in snapd:
status: Incomplete → Confirmed
John A Meinel (jameinel) wrote :

@mardy, sorry I missed your response. It is unclear to me whether you are saying that there should be something that the juju snap should be doing, or whether snapd itself should be doing this. I don't think we can inject a step before snap-confine, but if there is a way to do so, please let us know.

summary: - juju latest/edge unable to bootstrap on fresh machine
+ juju 3.0.0 unable to bootstrap on fresh machine
Alberto Mardegan (mardy) wrote :

Hi John, no, it was not a suggestion for you, but something for the snapd team. Sorry for not making this clear :-)

Changed in juju:
milestone: 3.0.1 → 3.0.2
Changed in juju:
milestone: 3.0.2 → 3.0.3
Juan M. Tirado (tiradojm) wrote :

I will set this as released because it is no longer impacting pylibjuju.

Changed in juju:
status: Triaged → Fix Released
John A Meinel (jameinel) wrote :

I think there is still something that juju itself can do, so not Fix Released.

Specifically, on a fresh install we should recognize that we are in a confined snap and give the user a better error message asking them to create the parent directory, rather than just having mkdir fail and fall over.

Changed in juju:
milestone: 3.0.3 → none
status: Fix Released → Triaged
Eric Chen (eric-chen) wrote :

Hi, may I confirm the current plan for this issue?
Will the snap create the folder, or should it be handled by the user?

tags: added: cdo-qa