Wrong content in cloudinit-userdata can crash Juju's controller

Bug #1978454 reported by DUFOUR Olivier
This bug affects 1 person
Affects: Canonical Juju
Status: Fix Committed
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none)

Bug Description

While experimenting with the cloudinit-userdata option in a model, I naively assumed that it accepted exactly the same syntax as cloud-init's runcmd.
However, after adding certain content and then adding a machine, the controller simply panics and restarts.

Versions :
- MAAS : 3.1
- Juju : 2.9.31
- single juju controller (not tested in HA environment)

Steps :
1) add to cloudinit-userdata :
"postruncmd:
 - [ systemctl, restart, snap.lxd.daemon.service ]"

2) add a machine to the model

3) the controller crashes:
https://pastebin.canonical.com/p/ySHfRSN84C/

3a) From MAAS's point of view, the machine is allocated.
3b) On restart, the controller attempts to allocate another machine and crashes again, looping until MAAS has no machines left.

Joseph Phillips (manadart) wrote :

Does this work if you have a YAML file with contents:

cloudinit-userdata: |
    postruncmd:
    - "systemctl restart snap.lxd.daemon.service"

And use it to set the model config?

DUFOUR Olivier (odufourc) wrote :

Yes, the format shown in your comment works as intended, and it is the solution I ended up using.

However, the example in the description is only there to demonstrate that the controller does crash.

My syntax is certainly wrong, but:
- the controller should not crash at all;
- or it should at least state more clearly that cloudinit-userdata is the source of the error before crashing.

Changed in juju:
status: New → Triaged
importance: Undecided → Wishlist
Brett Holman (holmanb) wrote :

> I thought naively that it was possible to use exactly the same syntax methods for runcmd of cloudinit.

This is a reasonable assumption to make, since the differences you see are undocumented.

> For sure my syntax is wrong but :
>- the controller shouldn't either crash at all ;

+1, agreed. I think this occurs because Juju inserts its own `runcmd`, which contains `set -xe`[1], so any command in the script that fails (such as the one you included) will cause a crash. For now, you should be able to work around Juju's behavior by starting your `postruncmd` with `set +xe;`.

> - or at least states a bit more clearly that the cloudinit-userdata is the source of the error before crashing.

Agreed, it is important for debugging to show the user the source of an error. A bug[2] for this issue already exists.

[1] https://github.com/juju/juju/blob/42f6a4bbb7818847c3fd3234d39da98369c4a03f/cloudconfig/userdatacfg_unix.go#L175

[2] https://bugs.launchpad.net/juju/+bug/1708676
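
As a minimal illustration of the `set -e` behaviour referred to above: this is a plain POSIX sh sketch, not Juju's generated script. The variable names `strict` and `relaxed` are made up for the example, and `false` stands in for any failing command. Tracing (`-x`) is left out to keep the output quiet.

```shell
#!/bin/sh
# Illustration only: plain POSIX sh, not taken from Juju's code.

# errexit on: `false` fails, the inner shell exits, and `echo` never runs.
strict=$(sh -c 'set -e; false; echo reached' || true)

# errexit turned back off with `set +e`: the failure is ignored and `echo` runs.
relaxed=$(sh -c 'set -e; set +e; false; echo reached')

printf 'with set -e:  [%s]\n' "$strict"
printf 'after set +e: [%s]\n' "$relaxed"
```

The first line prints empty brackets and the second prints `[reached]`, which is why prefixing commands with `set +xe;` would let later commands run even if an earlier one fails.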

Brett Holman (holmanb) wrote :

See this PR[1], which should fix the issue.

[1] https://github.com/juju/juju/pull/17180

Brett Holman (holmanb) wrote :

The PR containing a fix has been merged and should be available in the upcoming 3.5 release.

Changed in juju:
status: Triaged → Fix Committed