[RFE] make machine status messages more informative

Bug #1849794 reported by Andrea Ieri
Affects: Canonical Juju
Status: Triaged
Importance: Low
Assigned to: Unassigned
Milestone: none

Bug Description

I am adding three pre-built MAAS nodes to a Juju model via `juju add-machine --debug ssh:ubuntu@<IP>`.
The first and third machines were added successfully, but the second one has been stuck in the pending state for several hours.
Removing the machine and re-adding it does not help.
No messages related to that machine are ever printed in the debug-log of the model it is being added to.
Adding --debug to the command shows that SSH authentication succeeds, but nothing further happens (for comparison, the other machines took about two minutes to go from 'Detecting series and characteristics' to completion).

$ juju add-machine --debug ssh:ubuntu@100.100.184.17
08:11:59 INFO juju.cmd supercommand.go:57 running juju [2.6.9 gc go1.10.4]
08:11:59 DEBUG juju.cmd supercommand.go:58 args: []string{"/snap/juju/9102/bin/juju", "add-machine", "--debug", "ssh:ubuntu@100.100.184.17"}
08:11:59 INFO juju.juju api.go:67 connecting to API addresses: [100.100.185.33:17070 100.100.185.104:17070 100.100.185.23:17070]
08:11:59 DEBUG juju.api apiclient.go:1092 successfully dialed "wss://100.100.185.104:17070/model/5ba2a092-c7ec-4fd2-8d90-70511fe9bf00/api"
08:11:59 INFO juju.api apiclient.go:624 connection established to "wss://100.100.185.104:17070/model/5ba2a092-c7ec-4fd2-8d90-70511fe9bf00/api"
08:11:59 INFO juju.cmd.juju.machine add.go:246 load config
08:11:59 INFO juju.juju api.go:67 connecting to API addresses: [100.100.185.104:17070 100.100.185.33:17070 100.100.185.23:17070]
08:11:59 DEBUG juju.api apiclient.go:1092 successfully dialed "wss://100.100.185.104:17070/model/5ba2a092-c7ec-4fd2-8d90-70511fe9bf00/api"
08:11:59 INFO juju.api apiclient.go:624 connection established to "wss://100.100.185.104:17070/model/5ba2a092-c7ec-4fd2-8d90-70511fe9bf00/api"
08:11:59 INFO juju.juju api.go:303 API endpoints changed from [100.100.185.104:17070 100.100.185.23:17070 100.100.185.33:17070] to [100.100.185.104:17070 100.100.185.33:17070 100.100.185.23:17070]
08:11:59 INFO cmd authkeys.go:114 Adding contents of "/home/jujumanage/.local/share/juju/ssh/juju_id_rsa.pub" to authorized-keys
08:11:59 INFO cmd authkeys.go:114 Adding contents of "/home/jujumanage/.ssh/id_rsa.pub" to authorized-keys
08:11:59 INFO juju.environs.manual.sshprovisioner sshprovisioner.go:43 initialising "100.100.184.17", user "ubuntu"
08:11:59 DEBUG juju.utils.ssh ssh.go:305 using OpenSSH ssh client
08:12:00 INFO juju.environs.manual.sshprovisioner sshprovisioner.go:54 ubuntu user is already initialised
08:12:00 INFO juju.environs.manual.sshprovisioner sshprovisioner.go:167 Checking if 100.100.184.17 is already provisioned
08:12:00 DEBUG juju.utils.ssh ssh.go:305 using OpenSSH ssh client
08:12:01 INFO juju.environs.manual.sshprovisioner sshprovisioner.go:102 Detecting series and characteristics on 100.100.184.17
08:12:01 DEBUG juju.utils.ssh ssh.go:305 using OpenSSH ssh client
08:12:02 INFO juju.environs.manual.sshprovisioner sshprovisioner.go:158 series: bionic, characteristics: arch=amd64 cores=40 mem=385606M

I don't see any process in the process table of 100.100.184.17 that would suggest Juju is doing anything on the machine.
Neither /var/lib/juju nor /var/log/juju ever gets created.

The only difference I can see between this stuck machine and the others is that the problematic one is currently the pgsql master and holds both the MAAS and the PostgreSQL VIPs. However, I don't see how that could be relevant, since 100.100.184.17 is the machine's private address, not a VIP.

How can I gather more information about what Juju is doing on 100.100.184.17?

Revision history for this message
Joseph Phillips (manadart) wrote :

See if there is anything pertinent in:
- /var/log/cloud-init.log
- /var/log/cloud-init-output.log
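In hindsight, a quick way to spot this class of failure is to scan cloud-init-output.log for apt error lines. A minimal sketch (the helper name and the "E:" prefix heuristic are illustrative, not part of Juju or cloud-init):

```python
# Sketch: flag apt/dpkg error lines in a cloud-init output log.
# apt prefixes fatal errors with "E:"; treat this as a heuristic.
def find_apt_errors(log_text: str) -> list[str]:
    return [
        line for line in log_text.splitlines()
        if line.startswith("E:") or "dpkg: error" in line
    ]

sample = """Calculating upgrade...
The following packages will be DOWNGRADED:
  cloud-init
E: Packages were downgraded and -y was used without --allow-downgrades.
"""
print(find_apt_errors(sample))
# → ['E: Packages were downgraded and -y was used without --allow-downgrades.']
```

The same check works when piped over ssh, e.g. against the log fetched from the stuck machine.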

Changed in juju:
status: New → Triaged
milestone: none → 2.7-rc1
Revision history for this message
Andrea Ieri (aieri) wrote :

Thank you! That really helped. I had cloud-init pinned and cloud-init-output.log was full of these:

Running apt-get upgrade
Reading package lists...
Building dependency tree...
Reading state information...
Calculating upgrade...
The following packages were automatically installed and are no longer required:
  dbconfig-common dbconfig-pgsql
Use 'sudo apt autoremove' to remove them.
The following packages will be upgraded:
  apt apt-utils base-files bsdutils dpkg dpkg-dev fdisk grep landscape-common
  libapt-inst2.0 libapt-pkg5.0 libblkid1 libbluetooth3 libdns-export1100
  libdpkg-perl libfdisk1 libisc-export169 libldap-2.4-2 libldap-common
  libmount1 libnss-systemd libpam-systemd libprocps6 librados2 librbd1
  libsmartcols1 libsystemd0 libudev1 libuuid1 libvirt-bin libvirt-clients
  libvirt-daemon libvirt-daemon-driver-storage-rbd libvirt-daemon-system
  libvirt0 maas-dns mount netplan.io nplan procps python3-software-properties
  qemu-block-extra qemu-kvm qemu-system-common qemu-system-x86 qemu-utils
  snapd software-properties-common sosreport systemd systemd-sysv thermald
  udev util-linux uuid-runtime xkb-data
The following packages will be DOWNGRADED:
  cloud-init
56 upgraded, 0 newly installed, 1 downgraded, 0 to remove and 0 not upgraded.
E: Packages were downgraded and -y was used without --allow-downgrades.

Removing the apt preferences file allowed cloud-init to complete and the node addition to succeed.

I think it would be very helpful if the machine status were more informative (e.g. even something as simple as "running cloud-init"). I'll amend the subject of this bug report and turn it into an RFE.

summary: - manually provisioned machine stuck in pending state
+ [RFE] make machine status messages more informative
Revision history for this message
Andrea Ieri (aieri) wrote :

OK, that's slightly incorrect, actually: having cloud-init pinned was not the problem by itself; the problem was having cloud-init pinned _and_ a newer cloud-init version already installed.

$ apt policy cloud-init
cloud-init:
  Installed: 19.1-1-gbaa47854-0ubuntu1~18.04.1
  Candidate: 18.2-14-g6d48d265-0ubuntu1
  Version table:
     19.2-36-g059d049c-0ubuntu2~18.04.1 500
        500 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
 *** 19.1-1-gbaa47854-0ubuntu1~18.04.1 100
        100 /var/lib/dpkg/status
     18.2-14-g6d48d265-0ubuntu1 1001
        500 http://archive.ubuntu.com/ubuntu bionic/main amd64 Packages

Anyway, that doesn't change the fact that after struggling with the Juju logs for a long time, a slight nudge towards the cloud-init logs was enough to let me solve this quickly.
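For reference, a pin like the following (a hypothetical /etc/apt/preferences.d/ fragment, reconstructed from the apt policy output above) is enough to produce this situation: a Pin-Priority above 1000 allows apt to select a version older than the one installed, so the candidate becomes a downgrade and an unattended `apt-get upgrade -y` aborts without `--allow-downgrades`.

```
# Hypothetical /etc/apt/preferences.d/cloud-init.pref (illustrative only).
# Priority > 1000 lets apt pick a version older than the installed one,
# turning the next unattended upgrade into a failing downgrade.
Package: cloud-init
Pin: version 18.2-14-g6d48d265-0ubuntu1
Pin-Priority: 1001
```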

Changed in juju:
importance: Undecided → Medium
milestone: 2.7-rc1 → none
Revision history for this message
Canonical Juju QA Bot (juju-qa-bot) wrote :

This bug has not been updated in 2 years, so we're marking it Low importance. If you believe this is incorrect, please update the importance.

Changed in juju:
importance: Medium → Low
tags: added: expirebugs-bot