[2.0 rc1] juju can't access vSphere VM deployed with Xenial, cloud-init fails to set SSH keys

Bug #1588041 reported by Larry Michel
This bug affects 2 people
Affects         Status        Importance  Assigned to  Milestone
Canonical Juju  Fix Released  High        Unassigned
cloud-init      Fix Released  Undecided   Unassigned

Bug Description

I tried to do a bootstrap with vsphere as a provider using vsphere 6.0 and juju 1.25.5.
-----------------------------------------------------------------
  vsphere:
    type: vsphere
    host: '**.***.*.***'
    user: '<email address hidden>'
    password: '**********'
    datacenter: 'dc0'
    bootstrap-timeout: 1800
    logging-config: "<root>=DEBUG;juju=DEBUG;golxc=TRACE;juju.container.lxc=TRACE"
    agent-stream: released
-----------------------------------------------------------------

Initially, I did not specify a default series, so the bootstrap VM was deployed with Xenial. However, juju could not connect to it after getting the address; it seemed stuck trying to connect, and I had to press CTRL-C:

-----------------------------------------------------------------
$ juju bootstrap -e vsphere
ERROR the "vsphere" provider is provisional in this version of Juju. To use it anyway, set JUJU_DEV_FEATURE_FLAGS="vsphere-provider" in your shell environment
$ export JUJU_DEV_FEATURE_FLAGS="vsphere-provider"
$ juju bootstrap -e vsphere
Bootstrapping environment "vsphere"
Starting new instance for initial state server
Launching instance
 - juju-e33e5800-edd9-4af7-8654-6d59b1e98eb9-machine-0
Installing Juju agent on bootstrap instance
Waiting for address
Attempting to connect to 10.245.39.94:22
Attempting to connect to fe80::250:56ff:fead:1b03:22
^CInterrupt signalled: waiting for bootstrap to exit
ERROR failed to bootstrap environment: interrupted
-----------------------------------------------------------------

When I specified the default series to be trusty, it worked:
-----------------------------------------------------------------
  vsphere:
    type: vsphere
    host: '**.***.*.***'
    user: '<email address hidden>'
    password: '**********'
    datacenter: 'dc0'
    default-series: trusty
    bootstrap-timeout: 1800
    logging-config: "<root>=DEBUG;juju=DEBUG;golxc=TRACE;juju.container.lxc=TRACE"
    agent-stream: released
-----------------------------------------------------------------

This was the output:

-----------------------------------------------------------------
$ juju bootstrap -e vsphere
Bootstrapping environment "vsphere"
Starting new instance for initial state server
Launching instance
 - juju-b157863b-3ed4-4ae5-8c3c-82ae7629bff7-machine-0
Installing Juju agent on bootstrap instance
Waiting for address
Attempting to connect to 10.245.45.153:22
Attempting to connect to fe80::250:56ff:fead:3fa2:22
Warning: Permanently added '10.245.45.153' (ECDSA) to the list of known hosts.
sudo: unable to resolve host ubuntuguest
Logging to /var/log/cloud-init-output.log on remote host
Running apt-get update
Running apt-get upgrade
Installing package: curl
Installing package: cpu-checker
Installing package: bridge-utils
Installing package: rsyslog-gnutls
Installing package: cloud-utils
Installing package: cloud-image-utils
Installing package: tmux
Fetching tools: curl -sSfw 'tools from %{url_effective} downloaded: HTTP %{http_code}; time %{time_total}s; size %{size_download} bytes; speed %{speed_download} bytes/s ' --retry 10 -o $bin/tools.tar.gz <[https://streams.canonical.com/juju/tools/agent/1.25.5/juju-1.25.5-trusty-amd64.tgz]>
Bootstrapping Juju machine agent
Starting Juju machine agent (jujud-machine-0)
Bootstrap agent installed
vsphere -> vsphere
Waiting for API to become available
Waiting for API to become available
Waiting for API to become available
Waiting for API to become available
Bootstrap complete
-----------------------------------------------------------------

Larry Michel (lmic)
description: updated
Larry Michel (lmic)
tags: added: vsphere
James Tunnicliffe (dooferlad) wrote :

Can you confirm that Xenial boots if you just launch it using vsphere as a host? If it does, can we have all of /var/log please? My hunch is that cloud-init is running something that hangs, which we may be responsible for.

Changed in juju-core:
status: New → Incomplete
Larry Michel (lmic) wrote :

James, Xenial does boot. I can see it on the console but I don't have the keys to ssh into it. Where do I get those?

Changed in juju-core:
status: Incomplete → New
Cheryl Jennings (cherylj) wrote :

Larry - you can try to bootstrap with xenial and use the --keep-broken flag for it to not tear down the machine upon failure. Then you should be able to juju ssh into it (as long as the machine got far enough along to import the keys).

Once you can ssh into the machine, it would be useful to have:
- /var/log/cloud-init-output.log
- /var/log/juju/* if anything exists.

Changed in juju-core:
status: New → Incomplete
Larry Michel (lmic) wrote :

Per our conversation, it does not look like cloud-init sets things up correctly, if it runs at all. Adding the pastebin output from the debug session that you requested, Cheryl: https://pastebin.canonical.com/158212/

Changed in juju-core:
status: Incomplete → New
Cheryl Jennings (cherylj) wrote :

This error is puzzling:
2016-06-07 16:37:40 DEBUG juju.provider.common bootstrap.go:326 connection attempt for fe80::250:56ff:fead:d9ff failed: ssh: connect to host fe80::250:56ff:fead:d9ff port 22: Invalid argument

Pinged Larry on IRC to run the bootstrap with TRACE to see the actual command being used for ssh.

Cheryl Jennings (cherylj) wrote :

Worked with Larry and determined that the cloud-config.txt file is empty on xenial and not on trusty. Going to take another look at how juju is putting this there before pulling in smoser.

Changed in juju-core:
status: New → Triaged
importance: Undecided → High
tags: added: oil-2.0
Anastasia (anastasia-macmood) wrote :

It's Critical for Larry's project

Changed in juju-core:
milestone: none → 2.0.0
David Britton (dpb)
tags: added: landscape
affects: juju-core → juju
Changed in juju:
milestone: 2.0.0 → none
milestone: none → 2.0.0
Changed in juju:
milestone: 2.0.0 → 2.0-beta18
Curtis Hovey (sinzui)
tags: added: ci
Curtis Hovey (sinzui) wrote :

This issue affects Juju CI.

$ cat vs.yaml
bootstrap-timeout: 1200
default-series: xenial
enable-os-upgrade: false
test-mode: true
authorized-keys: |
  ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCrXoztXs5X89N7SwxGjXYjL4M8Coclk+/blvzyiqZDaWsH234DETiv3Rwc2wJlEk3K1HSLnpBQHL6HQME7j/PpMaFGiJqD0tfF0JU2Kj6FEsgV43IR0YvXm0/2EvzO4NMplukmJVUPIAa++Tpl/72F+t8t2mSK73PwzeycpC+X9z/NC5EwsOMH87NYrM1HdwwZlz2GJswcG0IHDB/5oKV4nPMkm6EFweKt4N5HrRjA9l7y3tUbNhGMuEJVIbskfn6LTdir6CghRHY+OT70RrN+gqRVw3y/GvrfVE3m8ZqgbqAFE9UDBXDSI8AX2dYAlmvTI/B9X2QiMrNC0DXvnY/X oil-ci-bot
  ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9aoIHpnF4Y4w3PvUP0rhruNRaCFf0kPpYqM8V8uHvm/z//S2EE4wjIN3mxBv97kwGRmJWTYm8wu2ZepKxqsiH1LMpPiLHM2vOZyRkrz7wprHwxwH0ler6lZvM6hrWG7Pae6UcQuDASkV89hNdLPE20whyhIl07uPt+hjpMXJM82aSAhT57/rdOSRMgV518j1Aq/xNpbIKIaXA9P/TTHtLOxJvItKialx+MS0BSiE7O9dE8MeR4SyaAQr+yH81I5TlB1D2Qot80CB7eCiV0gSGt/azLhWTX/sDItQ2VVBxaYcEj9eoOUiVCDKx6eQ7tMJdWE/63wKYCs07sbu5G4Z3 juju-client-key
  ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDyEdd8eDyy2WA6Q+W4GV2NqKX+yfkkW5ogSu3/7DjlqjED7dCNkevq2qpdR1AMkbKTouihWdyc8QKl9lzuwn0zoocXJUVgIOPV+KFxSAhj1djGvryHDjYUwdrLYUMu3CUFeIsRao2cn7EgZs0w1Y1quqr9c8cEg7XsAs0ZMN9YksEjG000VupOIZJNtk+5EYJm/6vNFI83IOn7ctWfjXymBuh7XM8d8vszyYDRdeDXY5Q9VLqHOP7/CFteIvcdHnSC1ObQuKzXRWz+m9thgQnRQjvirdwvDUXhjjQk9MNJZj84EukB8HyAVSN863MfuVGoCsNn7iEdtT6W2nKTWyL3 abentley@speedy
agent-metadata-url: https://swift.canonistack.canonical.com/v1/AUTH_526ad877f3e3464589dc1145dfeaac60/juju-dist/parallel-testing/agents

$ juju --debug bootstrap curtis vsphere/dc0 --constraints mem=2G --config vs.yaml
16:31:55 INFO juju.cmd supercommand.go:63 running juju [2.0-beta17 gc go1.6]
16:31:55 DEBUG juju.cmd.juju.commands bootstrap.go:473 provider attrs: map[]
16:31:55 DEBUG juju.cmd.juju.commands bootstrap.go:522 preparing controller with config: map[enable-os-upgrade:false type:vsphere uuid:f4d002c5-ccc4-4373-8f1d-fb516597c67e test-mode:true authorized-keys:ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQCrXoztXs5X89N7SwxGjXYjL4M8Coclk+/blvzyiqZDaWsH234DETiv3Rwc2wJlEk3K1HSLnpBQHL6HQME7j/PpMaFGiJqD0tfF0JU2Kj6FEsgV43IR0YvXm0/2EvzO4NMplukmJVUPIAa++Tpl/72F+t8t2mSK73PwzeycpC+X9z/NC5EwsOMH87NYrM1HdwwZlz2GJswcG0IHDB/5oKV4nPMkm6EFweKt4N5HrRjA9l7y3tUbNhGMuEJVIbskfn6LTdir6CghRHY+OT70RrN+gqRVw3y/GvrfVE3m8ZqgbqAFE9UDBXDSI8AX2dYAlmvTI/B9X2QiMrNC0DXvnY/X oil-ci-bot
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC9aoIHpnF4Y4w3PvUP0rhruNRaCFf0kPpYqM8V8uHvm/z//S2EE4wjIN3mxBv97kwGRmJWTYm8wu2ZepKxqsiH1LMpPiLHM2vOZyRkrz7wprHwxwH0ler6lZvM6hrWG7Pae6UcQuDASkV89hNdLPE20whyhIl07uPt+hjpMXJM82aSAhT57/rdOSRMgV518j1Aq/xNpbIKIaXA9P/TTHtLOxJvItKialx+MS0BSiE7O9dE8MeR4SyaAQr+yH81I5TlB1D2Qot80CB7eCiV0gSGt/azLhWTX/sDItQ2VVBxaYcEj9eoOUiVCDKx6eQ7tMJdWE/63wKYCs07sbu5G4Z3 juju-client-key
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDyEdd8eDyy2WA6Q+W4GV2NqKX+yfkkW5ogSu3/7DjlqjED7dCNkevq2qpdR1AMkbKTouihWdyc8QKl9lzuwn0zoocXJUVgIOPV+KFxSAhj1djGvryHDjYUwdrLYUMu3CUFeIsRao2cn7EgZs0w1Y1quqr9c8cEg7XsAs0ZMN9YksEjG000VupOIZJNtk+5EYJm/6vNFI83IOn7ctWfjXymBuh7XM8d8vszyYDRdeDXY5Q9VLqHOP7/CFteIvcdHnSC1ObQuKzXRWz+m9thgQnRQjvirdwvDUXhjjQk9MNJZj84EukB8HyAVSN863MfuVGoCsNn7iEdtT6W2nKTWyL3 abentley@speedy
 agent-metadata-url:https://swift.canonistack.canonical.com/v1/AUTH_526ad877f3e3464589dc1145dfe...


tags: added: jujuqa
Curtis Hovey (sinzui) wrote :
Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → Medium
no longer affects: juju-core
Curtis Hovey (sinzui)
Changed in juju:
milestone: 2.0-beta18 → 2.0-beta19
Changed in juju:
milestone: 2.0-beta19 → 2.0-rc1
Changed in juju:
milestone: 2.0-rc1 → 2.0.0
Larry Michel (lmic) wrote :

I tried to deploy a bundle with mixed series: all the Trusty machines came up and the Xenial ones stayed pending. It looks related, as the machines are up but the keys are not set up.

jenkins@lmic-s9-instance:~/vmware$ juju status
MODEL CONTROLLER CLOUD/REGION VERSION
default vspherecontroller-beta18 vsphere/dc0 2.0-beta18

APP VERSION STATUS SCALE CHARM STORE REV OS NOTES
elasticsearch active 2 elasticsearch jujucharms 18 ubuntu
etcd unknown 0/3 etcd jujucharms 8 ubuntu
filebeat unknown 0 filebeat jujucharms 4 ubuntu
kibana active 1 kibana jujucharms 14 ubuntu exposed
kubernetes unknown 0/3 kubernetes jujucharms 8 ubuntu exposed
topbeat unknown 0 topbeat jujucharms 4 ubuntu

RELATION PROVIDES CONSUMES TYPE
peer elasticsearch elasticsearch peer
elasticsearch elasticsearch filebeat regular
rest elasticsearch kibana regular
elasticsearch elasticsearch topbeat regular
cluster etcd etcd peer
beats-host etcd filebeat subordinate
etcd etcd kubernetes regular
beats-host etcd topbeat subordinate
juju-info filebeat kubernetes regular
beats-host kubernetes filebeat subordinate
certificates kubernetes kubernetes peer
beats-host kubernetes topbeat subordinate

UNIT WORKLOAD AGENT MACHINE PUBLIC-ADDRESS PORTS MESSAGE
elasticsearch/0 active idle 0 10.245.61.122 9200/tcp Ready
elasticsearch/1 active idle 1 10.245.61.121 9200/tcp Ready
etcd/0 unknown allocating 2 10.245.61.123 Waiting for agent initialization to finish
etcd/1 unknown allocating 3 10.245.61.124 Waiting for agent initialization to finish
etcd/2 unknown allocating 4 10.245.61.125 Waiting for agent initialization to finish
kibana/0 active idle 5 10.245.61.129 80/tcp,9200/tcp ready
kubernetes/0 unknown allocating 6 10.245.61.126 Waiting for agent initialization to finish
kubernetes/1 unknown allocating 7 10.245.61.127 Waiting for agent initialization to finish
kubernetes/2 unknown allocating 8 10.245.61.128 Waiting for agent initialization to finish

MACHINE  STATE    DNS            INS-ID         SERIES  AZ
0        started  10.245.61.122  juju-efa5c6-0  trusty
1        started  10.245.61.121  juju-efa5c6-1  trusty
2        pending  10.245.61.123  juju-efa5c6-2  xenial
3        pending  10.245.61.124  juju-efa5c6-3  xenial
4        pending  10.245.61.125  juju-efa5c6-4  xenial
5        started  10.245.61.129  juju-efa5c6-5  trusty
6        pending  10.245.61.126  juju-efa5c6-6  xenial
7        pending  10.2...


Larry Michel (lmic)
summary: - juju bootstrap with vsphere provider hangs with xenial
+ juju bootstrap with vsphere provider hangs with xenial, cloud-init
+ doesn't set things up - keys not found
Larry Michel (lmic)
summary: - juju bootstrap with vsphere provider hangs with xenial, cloud-init
- doesn't set things up - keys not found
+ juju can't access vSphere VM deployed with Xenial, cloud-init fails to
+ set SSH keys
summary: - juju can't access vSphere VM deployed with Xenial, cloud-init fails to
- set SSH keys
+ [2.0 rc1] juju can't access vSphere VM deployed with Xenial, cloud-init
+ fails to set SSH keys
Robert C Jennings (rcj) wrote :

Looking at the juju vsphere provider code, I see that the metadata is passed in via the OVA during image import. I have confirmed with Larry, through cloud-init logs on trusty, that the OVF cloud-init data source is used there, via a virtual CD-ROM, to feed in the userdata and ssh keys.

Larry and I are working on grabbing the content of the virtual CD attached to the Xenial instance to ensure that it received user-data (previously there was a note in this bug that none was present and we need to confirm).

Robert C Jennings (rcj) wrote :

I have downloaded the xenial ova used by the provider and confirmed that cloud-init is configured to make use of the OVF datasource.

Robert C Jennings (rcj) wrote :

Verified that the user data from the ISO attached to the xenial instance looks decent (don't know that it'll work, but there's data)

Robert C Jennings (rcj) wrote :

I have downloaded a drive image of a xenial VM that Juju attempted to boot and found that cloud-init could not parse the user-data attached to the instance. Here is the relevant error from /var/log/cloud-init.log. Attaching /var/lib/cloud, /var/log, and /etc/cloud to this bug for further research.

Sep 22 18:47:54 ubuntu [CLOUDINIT] __init__.py[DEBUG]: {'Content-Type': 'text/x-not-multipart', 'Content-Disposition': 'attachment; filename="part-001"', 'MIME-Version': '1.0'}
Sep 22 18:47:54 ubuntu [CLOUDINIT] __init__.py[WARNING]: Unhandled non-multipart (text/x-not-multipart) userdata: 'b'I2Nsb3VkLWNvbmZpZwphcHRf'...'
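
The truncated payload in the warning above is informative on its own. A minimal sketch, using only the 24 characters visible in the log line (the rest is elided and is not reconstructed here), shows that they are base64 for the start of a `#cloud-config` document. That suggests the user-data was still base64-encoded when cloud-init tried to classify it:

```python
import base64

# Prefix copied verbatim from the cloud-init warning; the log elides the rest.
logged_prefix = "I2Nsb3VkLWNvbmZpZwphcHRf"

decoded = base64.b64decode(logged_prefix)
print(decoded)  # b'#cloud-config\napt_'
```

Plain cloud-config would have started with the literal text `#cloud-config` rather than `I2Ns...`.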

affects: cloud-images → cloud-init
Robert C Jennings (rcj) wrote :

Scott, can you review the user-data and advise?

Larry Michel (lmic) wrote :

One more data point is that Yakkety works. I was able to bootstrap a new controller:

juju bootstrap vspherecontroller-beta-yak vsphere/dc0 --debug --config default-series=yakkety --config image-stream=daily

It downloaded the ova from:

Downloading ova file from url: http://cloud-images.ubuntu.com/daily/server/yakkety/20160918/yakkety-server-cloudimg-amd64.ova

Robert C Jennings (rcj) wrote :

And this failure is occurring now with a xenial release image with serial 20160921 which contains cloud-init 0.7.7~bzr1256-0ubuntu1~16.04.1

http://cloud-images.ubuntu.com/releases/xenial/release/unpacked/ubuntu-16.04-server-cloudimg-amd64.manifest

Scott Moser (smoser) wrote :

I have a hunch that this is a dupe of bug 1619394, the fix for which landed in xenial 2 hours ago.
You'll need an image with the updated cloud-init inside (or patch an image) to try.

Robert C Jennings (rcj) wrote :

Larry, keep an eye on http://cloud-images.ubuntu.com/daily/server/xenial/ for a serial with a date of 20160923 or later. Then re-test by bootstrapping with "--config image-stream=daily". That should pick up the cloud-init package released today.
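
As a rough illustration of the "20160923 or later" check, the sketch below filters a list of serial directory names by their leading date. The function name and the sample names are hypothetical, and it assumes serials begin with an 8-digit YYYYMMDD date:

```python
def serials_with_fix(serials, minimum="20160923"):
    """Keep serials whose leading YYYYMMDD date is on or after `minimum`."""
    # Fixed-width digit strings compare correctly as plain strings.
    return [s for s in serials if s[:8].isdigit() and s[:8] >= minimum]


# Hypothetical directory names, for illustration only.
print(serials_with_fix(["20160921", "20160923", "20160925.1", "current"]))
# ['20160923', '20160925.1']
```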

Robert C Jennings (rcj) wrote :

You can confirm by looking at http://cloud-images.ubuntu.com/daily/server/xenial/<IMAGE_SERIAL>/unpacked/xenial-server-cloudimg-amd64.manifest to find the version of cloud-init; it will need to be 0.7.8-1-g3705bb5-0ubuntu1~16.04.1
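
The manifest check can be scripted. This is a minimal sketch, assuming each manifest line holds a package name and a version separated by whitespace. The helper name and the sample text are hypothetical, and the manifest itself would be downloaded separately:

```python
from typing import Optional

# Version named in the comment above as the one that is needed.
FIXED_VERSION = "0.7.8-1-g3705bb5-0ubuntu1~16.04.1"


def package_version(manifest_text: str, package: str = "cloud-init") -> Optional[str]:
    """Return the version listed for `package`, or None if it is absent."""
    for line in manifest_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == package:
            return fields[1]
    return None


# Hypothetical manifest excerpt, for illustration only.
sample = "example-package\t1.0\ncloud-init\t0.7.8-1-g3705bb5-0ubuntu1~16.04.1\n"
print(package_version(sample) == FIXED_VERSION)  # True
```

The sketch only tests for an exact match; deciding whether a different version is newer would need proper Debian version comparison, which is not attempted here.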

Larry Michel (lmic) wrote :

I tested this morning and it looks like it is fixed, as Juju is reporting the nodes as active now:

MACHINE  STATE    DNS            INS-ID         SERIES  AZ
0        started  10.245.61.193  juju-a64e14-0  trusty
1        started  10.245.61.194  juju-a64e14-1  trusty
2        started  fe80::1        juju-a64e14-2  xenial
3        started  fe80::1        juju-a64e14-3  xenial
4        started  fe80::1        juju-a64e14-4  xenial
5        started  10.245.61.197  juju-a64e14-5  trusty
6        started  fe80::1        juju-a64e14-6  xenial
7        started  fe80::1        juju-a64e14-7  xenial
8        started  fe80::1        juju-a64e14-8  xenial

But they have an IPv6 address, and that gives errors:

jenkins@lmic-s9-instance:~/kubernetes$ juju ssh 3
ssh: connect to host fe80::1 port 22: Invalid argument
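
For context, fe80::/10 addresses are link-local: they are only meaningful together with an interface scope, so connecting to a bare `fe80::1` typically fails with "Invalid argument". The sketch below uses Python's standard `ipaddress` module to tell such addresses apart from routable ones. The helper name is hypothetical, and this is not a description of how Juju selects addresses:

```python
import ipaddress


def usable_without_scope(addr: str) -> bool:
    """True if addr can be dialled without an interface scope (zone) ID."""
    return not ipaddress.ip_address(addr).is_link_local


# Addresses taken from the output quoted in this report.
addresses = ["10.245.61.193", "fe80::1", "fe80::250:56ff:fead:d9ff"]
print([a for a in addresses if usable_without_scope(a)])  # ['10.245.61.193']
```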

It's also not clear to me how that's affecting the charms. Most of them come up, but there are some errors, and I can't tell whether they're related to this:

jenkins@lmic-s9-instance:~/kubernetes$ juju status
MODEL CONTROLLER CLOUD/REGION VERSION
default vspherecontroller-beta18 vsphere/dc0 2.0-rc1

APP VERSION STATUS SCALE CHARM STORE REV OS NOTES
elasticsearch active 2 elasticsearch jujucharms 18 ubuntu
etcd error 3 etcd jujucharms 8 ubuntu
filebeat active 3 filebeat jujucharms 4 ubuntu
kibana active 1 kibana jujucharms 14 ubuntu exposed
kubernetes error 3 kubernetes jujucharms 8 ubuntu exposed
topbeat active 3 topbeat jujucharms 4 ubuntu

UNIT WORKLOAD AGENT MACHINE PUBLIC-ADDRESS PORTS MESSAGE
elasticsearch/0 active idle 0 10.245.61.193 9200/tcp Ready
elasticsearch/1 active idle 1 10.245.61.194 9200/tcp Ready
etcd/0 error idle 2 fe80::1 2379/tcp hook failed: "certificates-relation-changed"
  filebeat/0 active idle fe80::1 Filebeat ready
  topbeat/0 active idle fe80::1 Topbeat ready
etcd/1 maintenance idle 3 fe80::1 Attempting install of etcd from apt
  filebeat/2 active idle fe80::1 Filebeat ready
  topbeat/2 active idle fe80::1 Topbeat ready
etcd/2 active idle 4 fe80::1 2379/tcp cluster is healthy
  filebeat/1 active idle fe80::1 Filebeat ready
  topbeat/1 active idle fe80::1 Topbeat ready
kibana/0 active idle 5 10.245.61.197 80/tcp,9200/tcp ready
kubernetes/0 maintenance executing 6 fe80::1 (install) Installing docker-engine from apt
kubernetes/1 maintenance ex...


Larry Michel (lmic) wrote :

Also of note: the controller node is also Xenial, and that worked.

This is the model config:

jenkins@lmic-s9-instance:~/kubernetes$ juju model-config
ATTRIBUTE                   FROM     VALUE
agent-metadata-url          default  ""
agent-stream                default  released
agent-version               model    2.0-rc1
apt-ftp-proxy               default  ""
apt-http-proxy              default  ""
apt-https-proxy             default  ""
apt-mirror                  default  ""
automatically-retry-hooks   default  true
default-series              default  xenial
development                 default  false
disable-network-management  default  false
enable-os-refresh-update    default  true
enable-os-upgrade           default  true
external-network            model    ""
firewall-mode               default  instance
ftp-proxy                   default  ""
http-proxy                  default  ""
https-proxy                 default  ""
ignore-machine-addresses    default  false
image-metadata-url          default  ""
image-stream                model    daily
logforward-enabled          default  false
logging-config              model    <root>=DEBUG;unit=DEBUG
no-proxy                    default  ""
provisioner-harvest-mode    default  destroyed
proxy-ssh                   default  false
resource-tags               model    {}
ssl-hostname-verification   default  true
test-mode                   default  false
transmit-vendor-metrics     default  true

Changed in juju:
status: Triaged → Fix Released
Joshua Powers (powersj)
Changed in cloud-init:
status: New → Fix Released
James Falcon (falcojr) wrote :