juju doesn't surface cloud-init failures

Bug #1708676 reported by Nicholas Skaggs
This bug affects 3 people
| Affects | Status | Importance | Assigned to | Milestone |
| Canonical Juju | Triaged | Medium | Unassigned | |

Bug Description

This bug is similar to bug 1644566, though the underlying cause doesn't seem to be the same. On a fresh artful instance (this seems to affect at least zesty as well), I am unable to bootstrap a new lxd container. I have tried purging config and reinstalling lxd and juju from debs and snaps to various degrees, with no luck. The end result is juju stuck waiting at "Attempting to connect to 10.249.132.39:22". See attached log. This is with lxd 2.16 and juju 2.2.2 or 2.3-alpha1.

$ lxc list
+---------------+---------+----------------------+------+------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+---------------+---------+----------------------+------+------------+-----------+
| juju-759e30-0 | RUNNING | 10.249.132.39 (eth0) | | PERSISTENT | 0 |
+---------------+---------+----------------------+------+------------+-----------+
$ lxc network list
+--------+----------+---------+-------------+---------+
| NAME | TYPE | MANAGED | DESCRIPTION | USED BY |
+--------+----------+---------+-------------+---------+
| eno1 | physical | NO | | 0 |
+--------+----------+---------+-------------+---------+
| lxdbr0 | bridge | YES | | 1 |
+--------+----------+---------+-------------+---------+
$ ssh ubuntu@10.249.132.39
Warning: Permanently added '10.249.132.39' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.11.0-10-generic x86_64)

 * Documentation: https://help.ubuntu.com
 * Management: https://landscape.canonical.com
 * Support: https://ubuntu.com/advantage

  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud

0 packages can be updated.
0 updates are security updates.

ubuntu@juju-759e30-0:~$ exit
logout
Connection to 10.249.132.39 closed.
$ lxc file pull juju-759e30-0/var/lib/cloud/seed/nocloud-net/meta-data -
#cloud-config
instance-id: juju-759e30-0
local-hostname: juju-759e30-0

Nicholas Skaggs (nskaggs) wrote :
Peter Matulis (petermatulis) wrote :

I have been bitten by this on Xenial as well. However, it works on bare metal (Zesty). It could have something to do with the cloud images.

tags: added: docteam
Peter Matulis (petermatulis) wrote :

This started working again for me today (on Xenial).

Nicholas Skaggs (nskaggs) wrote : Re: juju bootstrap lxd hangs if there is cloud-init failure

Thanks to Ian, who clued me in to the issue. Juju isn't bubbling up underlying cloud-init issues. In this specific case, apt was locking up while trying to install packages (no network connectivity). For those who encounter this before it is surfaced in juju, ssh into the machine manually and look at the cloud-init logs.

ssh ubuntu@IP
tail -f /var/log/cloud-init-*.log
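Once a copy of the log is at hand, the interesting lines can be filtered out rather than read end to end. The following is a minimal sketch in Python, assuming cloud-init's log lines carry a bracketed level tag such as [WARNING]. The function name find_problem_lines and the pattern are illustrative only and are not part of juju or cloud-init.

```python
import re

# Assumption: cloud-init log lines carry a bracketed level tag, for example
# "... - util.py[WARNING]: ...". Adjust the pattern if your logs differ.
PROBLEM_LEVEL = re.compile(r"\[(WARNING|ERROR|CRITICAL)\]")


def find_problem_lines(log_text):
    """Return the lines tagged WARNING, ERROR, or CRITICAL, plus any line
    that starts a Python traceback, in their original order."""
    hits = []
    for line in log_text.splitlines():
        if PROBLEM_LEVEL.search(line) or line.startswith("Traceback"):
            hits.append(line)
    return hits
```

The log text can be fetched the same way as the meta-data file above, for instance with `lxc file pull juju-759e30-0/var/log/cloud-init.log -`, and passed to the function as a string.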

summary: - juju bootstrap lxd hangs
+ juju bootstrap lxd hangs if there is cloud-init failure
Changed in juju:
status: New → Triaged
importance: Undecided → Medium
summary: - juju bootstrap lxd hangs if there is cloud-init failure
+ juju doesn't surface cloud-init failures
Nicholas Skaggs (nskaggs) wrote :

I discovered that when I manually launch a container using lxd and ssh into it, I have network connectivity. For juju-launched containers, however, I do not. In addition, cloud-init shouldn't hang forever when it attempts to install apt packages and apt fails (apt in this case doesn't seem to time out).
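To illustrate the "shouldn't hang forever" point, here is a minimal sketch in Python of running a command under a deadline, so that a stalled step is reported as a failure instead of blocking indefinitely. This is not how cloud-init or juju invoke apt; run_with_deadline is a hypothetical helper written for this example.

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_seconds):
    """Run cmd and return (ok, detail) instead of blocking indefinitely.

    ok is True only if the command exits with status 0 before the deadline.
    detail holds the combined stdout/stderr text, or a description of the
    failure.
    """
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return False, "timed out after {} seconds".format(timeout_seconds)
    output = result.stdout.strip()
    if result.returncode != 0:
        return False, "exit status {}: {}".format(result.returncode, output)
    return True, output


# Quick self-check with a harmless command.
print(run_with_deadline([sys.executable, "-c", "print('ok')"], 30))
```

A caller could wrap a package installation the same way, for example `run_with_deadline(["apt-get", "install", "-y", "some-package"], 300)`, where the package name and the 300-second limit are placeholders.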

Dan Watkins (oddbloke)
no longer affects: cloud-init
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 5 years, so we're marking it Expired. If you believe this is incorrect, please update the status.

Changed in juju:
status: Triaged → Expired
tags: added: expirebugs-bot
Jordan Barrett (barrettj12) wrote :

This issue still affects Juju today.

If an error occurs during bootstrap, we DON'T notify the user - instead the Juju client just hangs and the user is left wondering what's happening. They have to ssh into the machine, find the right log file, trawl through the logs...

This is terrible UX, and we should be making a best-effort here to surface any issues that come up during the bootstrap process.

Changed in juju:
status: Expired → Triaged
Brett Holman (holmanb) wrote :

I'm not sure how Juju works, but if it has access to the instance that it deploys, there are some user-facing tools that Juju could make use of.

Cloud-init surfaces status information via the command `cloud-init status --format json`.

Return codes:

0: means that cloud-init completed successfully.
1: means that cloud-init experienced a fatal error and was unable to complete.
2: means that cloud-init completed, but experienced an error.

Output contains the following keys:

extended_status: shows current cloud-init status
recoverable_errors: displays text describing errors seen by cloud-init
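As a rough illustration of how a client could use this, the sketch below (Python) combines the exit code and the JSON output into a single summary that could be shown to the user. It assumes the installed cloud-init is recent enough to support `--format json` and the exit codes listed above. The function name summarize_cloud_init_status is hypothetical, and the value of recoverable_errors is passed through unchanged because its shape is not described here.

```python
import json

# Exit code meanings for `cloud-init status`, as listed above.
EXIT_CODE_MEANING = {
    0: "cloud-init completed successfully",
    1: "cloud-init experienced a fatal error and was unable to complete",
    2: "cloud-init completed, but experienced an error",
}


def summarize_cloud_init_status(returncode, stdout_text):
    """Combine the exit code and the JSON output into one summary dict."""
    meaning = EXIT_CODE_MEANING.get(
        returncode, "unexpected exit code {}".format(returncode)
    )
    try:
        data = json.loads(stdout_text)
    except ValueError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    return {
        "ok": returncode == 0,
        "meaning": meaning,
        "extended_status": data.get("extended_status"),
        # Passed through as-is; None if the key is absent.
        "recoverable_errors": data.get("recoverable_errors"),
    }
```

The inputs would come from running `cloud-init status --format json` on the instance (for example over ssh) and capturing its exit code and standard output.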

Brett Holman (holmanb) wrote :

If access to the instance is not possible, cloud-init could potentially expose this status information via some new functionality.

For example, cloud-init already logs to the serial console, so it could log additional information there that Juju may need; the same information would likely help other users as well.
