'juju ssh <unit> ...' fails with Permission denied (publickey), for only one or two machines in a deployment

Bug #1635622 reported by Nell Jerram
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Canonical Juju
Expired
High
Unassigned

Bug Description

It appears that Juju fails to copy a public SSH key to one or two machines, even though it does to all of the other machines.

Here's a transcript from a failed test run that shows what I guess are the relevant details:

[...]
The following NEW packages will be installed:
[...]
  juju juju-2.0 libalgorithm-diff-perl libalgorithm-diff-xs-perl
[...]
Get:119 http://ppa.launchpad.net/juju/devel/ubuntu xenial/main amd64 juju all 1:2.0.0-0ubuntu1~16.04.2~juju1 [10.4 kB]
[...]
Setting up juju (1:2.0.0-0ubuntu1~16.04.2~juju1) ...
Processing triggers for libc-bin (2.23-0ubuntu3) ...
Since Juju 2 is being run for the first time, downloading latest cloud information.
Fetching latest public cloud list...
Updated your list of public clouds with 1 cloud region added:

    added cloud region:
        - aws/us-east-2
Credentials updated for cloud "google".
Creating Juju controller "bundle-fv" on google/us-central1
Looking for packaged Juju agent version 2.0.0 for amd64
Launching controller instance(s) on google/us-central1...
 - juju-2eb287-0 (arch=amd64 mem=1.7G cores=1)
Fetching Juju GUI 2.2.1
Waiting for address
Attempting to connect to 104.198.213.7:22
Attempting to connect to 10.240.0.29:22
Logging to /var/log/cloud-init-output.log on the bootstrap machine
Running apt-get update
Running apt-get upgrade
Installing curl, cpu-checker, bridge-utils, cloud-utils, tmux
Fetching Juju agent version 2.0.0 for amd64
Installing Juju machine agent
Starting Juju machine agent (service jujud-machine-0)
Bootstrap agent now started
Contacting Juju controller at 10.240.0.29 to verify accessibility...
Bootstrap complete, "bundle-fv" controller now available.
Controller machines are in the "controller" model.
Initial model "default" added.
[...]

+ juju set-model-constraints mem=3G

[ juju deploy cs:~project-calico/bundle/calico-newton-juju2 ]

Deploying charm "cs:~project-calico/bird-5"
Deploying charm "cs:xenial/cinder-3"
Deploying charm "cs:etcd-8"
Deploying charm "cs:xenial/glance-2"
Deploying charm "cs:xenial/keystone-2"
Deploying charm "cs:trusty/mysql-38"
Deploying charm "cs:~project-calico/neutron-api-11"
Deploying charm "cs:~project-calico/neutron-calico-18"
Deploying charm "cs:xenial/nova-cloud-controller-4"
Deploying charm "cs:xenial/nova-compute-3"
Deploying charm "cs:xenial/openstack-dashboard-2"
Deploying charm "cs:xenial/rabbitmq-server-5"
Related "nova-cloud-controller:image-service" and "glance:image-service"
Related "neutron-calico:etcd-proxy" and "etcd:proxy"
Related "neutron-api:etcd-proxy" and "etcd:proxy"
Related "neutron-calico:amqp" and "rabbitmq-server:amqp"
Related "neutron-calico:neutron-plugin-api" and "neutron-api:neutron-plugin-api"
Related "openstack-dashboard:identity-service" and "keystone:identity-service"
Related "neutron-calico:neutron-plugin" and "nova-compute:neutron-plugin"
Related "nova-compute:cloud-compute" and "nova-cloud-controller:cloud-compute"
Related "nova-compute:amqp" and "rabbitmq-server:amqp"
Related "nova-cloud-controller:identity-service" and "keystone:identity-service"
Related "cinder:cinder-volume-service" and "nova-cloud-controller:cinder-volume-service"
Related "cinder:amqp" and "rabbitmq-server:amqp"
Related "glance:image-service" and "cinder:image-service"
Related "neutron-api:neutron-api" and "nova-cloud-controller:neutron-api"
Related "neutron-api:amqp" and "rabbitmq-server:amqp"
Related "neutron-calico:bgp-route-reflector" and "bird:bgp-route-reflector"
Related "nova-cloud-controller:shared-db" and "mysql:shared-db"
Related "nova-cloud-controller:amqp" and "rabbitmq-server:amqp"
Related "nova-compute:image-service" and "glance:image-service"
Related "glance:identity-service" and "keystone:identity-service"
Related "mysql:shared-db" and "keystone:shared-db"
Related "nova-compute:shared-db" and "mysql:shared-db"
Related "glance:shared-db" and "mysql:shared-db"
Related "mysql:shared-db" and "cinder:shared-db"
Related "cinder:identity-service" and "keystone:identity-service"
Related "neutron-api:shared-db" and "mysql:shared-db"
Related "neutron-api:identity-service" and "keystone:identity-service"
Deploy of bundle completed.
+ juju add-unit nova-compute
+ wait_for_units 16 unit_ready 1800
[...]
+ juju status
       Unit Workload Agent Machine Public address Ports Message
READY: bird/0* unknown idle 0 104.198.253.104 9905/tcp,9906/tcp
READY: cinder/0* active idle 1 104.154.89.222 8776/tcp Unit is ready
READY: etcd/0* active idle 2 104.197.159.19 2379/tcp (leader) cluster is healthy
READY: glance/0* active idle 0 104.198.253.104 9292/tcp Unit is ready
READY: keystone/0* active idle 3 130.211.117.245 5000/tcp Unit is ready
READY: mysql/0* unknown idle 4 104.155.162.37 3306/tcp
READY: neutron-api/0* active idle 5 130.211.126.27 9696/tcp Unit is ready
READY: nova-cloud-controller/0* active idle 6 104.198.17.14 8774/tcp Unit is ready
READY: nova-compute/0* active idle 7 104.154.174.220 Unit is ready
READY: neutron-calico/0* unknown idle 104.154.174.220
READY: nova-compute/1 active idle 8 104.198.62.248 Unit is ready
READY: neutron-calico/1 unknown idle 104.198.62.248
READY: nova-compute/2 active idle 10 104.197.215.208 Unit is ready
READY: neutron-calico/2 unknown idle 104.197.215.208
READY: openstack-dashboard/0* active idle 9 104.198.62.168 80/tcp,443/tcp Unit is ready
READY: rabbitmq-server/0* active idle 0 104.198.253.104 5672/tcp Unit is ready

       Machine State DNS Inst id Series AZ
       0 started 104.198.253.104 juju-b51532-0 xenial us-central1-a
       1 started 104.154.89.222 juju-b51532-1 xenial us-central1-b
       2 started 104.197.159.19 juju-b51532-2 xenial us-central1-c
       3 started 130.211.117.245 juju-b51532-3 xenial us-central1-f
       4 started 104.155.162.37 juju-b51532-4 trusty us-central1-f
       5 started 130.211.126.27 juju-b51532-5 xenial us-central1-a
       6 started 104.198.17.14 juju-b51532-6 xenial us-central1-b
       7 started 104.154.174.220 juju-b51532-7 xenial us-central1-a
       8 started 104.198.62.248 juju-b51532-8 xenial us-central1-b
       9 started 104.198.62.168 juju-b51532-9 xenial us-central1-c
       10 started 104.197.215.208 juju-b51532-10 xenial us-central1-c
[...]
++ juju run --unit neutron-api/0 'unit-get private-address'
+ neutron_ip=10.240.0.51
++ juju run --unit keystone/0 'unit-get private-address'
+ keystone_ip=10.240.0.39
+ juju ssh openstack-dashboard/0 'sudo apt-get install -y python-openstackclient'
[...]
Connection to 104.198.62.168 closed.
+ juju ssh openstack-dashboard/0 'wget -nv https://storage.googleapis.com/calico-jenkins/dl-cirros-0.3.4-x86_64-disk.img -O - | openstack --os-username=admin --os-password=openstack --os-project-name=admin --os-project-domain-name= --os-auth-url=http://10.240.0.39:35357/v2.0 image create --disk-format=qcow2 --container-format=bare --public cirros-0.3.2-x86_64'
[...]
Connection to 104.198.62.168 closed.
+ juju ssh openstack-dashboard/0 'wget -nv https://storage.googleapis.com/calico-jenkins/dl-ubuntu_w_ipv6_asterisk_apache.img -O - | openstack --os-username=admin --os-password=openstack --os-project-name=admin --os-project-domain-name= --os-auth-url=http://10.240.0.39:35357/v2.0 image create --disk-format=qcow2 --container-format=bare --public ipv6_enabled_image'
[...]
Connection to 104.198.62.168 closed.
+ computes='nova-compute/0 nova-compute/1 nova-compute/2'
+ for unit in '${computes}'
+ juju ssh nova-compute/0 'sudo service nova-api-metadata restart'
Permission denied (publickey).
+ for unit in '${computes}'
+ juju ssh nova-compute/1 'sudo service nova-api-metadata restart'
Connection to 104.198.62.248 closed.
+ for unit in '${computes}'
+ juju ssh nova-compute/2 'sudo service nova-api-metadata restart'
Connection to 104.197.215.208 closed.
[...]

Revision history for this message
Dimiter Naydenov (dimitern) wrote :

To see the actual cause of the error, please run and paste the output of e.g `juju --debug ssh nova-compute/0 -v`

Changed in juju-core:
status: New → Incomplete
Revision history for this message
Nell Jerram (neil-jerram) wrote :

Hmm, I just had a test run in which this problem did _not_ occur, after adding

  --config agent-stream=devel

to my 'juju bootstrap' invocation.

Coincidence? Or would it make sense for that to be the fix?

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Neil,

Adding agent-stream=devel allows Juju to use newer agents, the ones that have not been released yet. These agents may have the fix needed to address this problem.

So, probably not a coincidence \o/ It looks like this issue has been fixed and just has not been promoted yet to "released"

Revision history for this message
Nell Jerram (neil-jerram) wrote :
Download full text (3.5 KiB)

@Dimiter @Anastasia - Here is an example failure from a test run where I removed the '--config agent-stream=devel' again, and added '--debug' and '-v' to the juju ssh invocations:

+ juju --debug ssh nova-compute/0 -v 'sudo service nova-api-metadata restart'
14:32:38 INFO juju.cmd supercommand.go:63 running juju [2.0.0 gc go1.6.2]
14:32:38 DEBUG juju.cmd supercommand.go:64 args: []string{"juju", "--debug", "ssh", "nova-compute/0", "-v", "sudo service nova-api-metadata restart"}
14:32:38 INFO juju.juju api.go:72 connecting to API addresses: [104.198.70.102:17070 10.240.0.35:17070]
14:32:38 INFO juju.api apiclient.go:519 dialing "wss://104.198.70.102:17070/model/fedec934-9bd8-4a2d-83dc-c24426e4621e/api"
14:32:38 INFO juju.api apiclient.go:309 connection established to "wss://104.198.70.102:17070/model/fedec934-9bd8-4a2d-83dc-c24426e4621e/api"
14:32:38 DEBUG juju.juju api.go:263 API hostnames unchanged - not resolving
14:32:38 DEBUG juju.cmd.juju.commands ssh_common.go:226 proxy-ssh is false
14:32:38 DEBUG juju.utils.ssh ssh.go:292 using OpenSSH ssh client
OpenSSH_7.2p2 Ubuntu-4ubuntu1, OpenSSL 1.0.2g-fips 1 Mar 2016
debug1: Reading configuration data /home/ubuntu/.ssh/config
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 19: Applying options for *
debug1: Connecting to 104.154.32.63 [104.154.32.63] port 22.
debug1: Connection established.
debug1: identity file /home/ubuntu/.local/share/juju/ssh/juju_id_rsa type 1
debug1: key_load_public: No such file or directory
debug1: identity file /home/ubuntu/.local/share/juju/ssh/juju_id_rsa-cert type -1
debug1: identity file /home/ubuntu/.ssh/id_rsa type 1
debug1: key_load_public: No such file or directory
debug1: identity file /home/ubuntu/.ssh/id_rsa-cert type -1
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu1
debug1: Remote protocol version 2.0, remote software version OpenSSH_7.2p2 Ubuntu-4ubuntu2.1
debug1: match: OpenSSH_7.2p2 Ubuntu-4ubuntu2.1 pat OpenSSH* compat 0x04000000
debug1: Authenticating to 104.154.32.63:22 as 'ubuntu'
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: <email address hidden>
debug1: kex: host key algorithm: ecdsa-sha2-nistp256
debug1: kex: server->client cipher: <email address hidden> MAC: <implicit> compression: none
debug1: kex: client->server cipher: <email address hidden> MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

debug1: Server host key: ecdsa-sha2-nistp256 SHA256:AfXRezEW5wJ2IEHCQRsmjVWesrtc4QfW6NzwbIM1Kc0
debug1: Host '104.154.32.63' is known and matches the ECDSA host key.
debug1: Found key in /tmp/ssh_known_hosts849103164:3
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: rekey after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS received
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_input_ext_info: server-sig-algs=<rsa-sha2-256,rsa-sha2-512>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey
debug1: Next authentication method: publickey
debug1: Offering RSA ...

Read more...

Revision history for this message
Nell Jerram (neil-jerram) wrote :

@Dimiter @Anastasia - I have just had another occurrence of this problem in a test run which _does_ have '--config agent-stream=devel', so it appears that adding that config does _not_ fix the problem.

Revision history for this message
Nell Jerram (neil-jerram) wrote :

I guess there must be some logs somewhere - on the controller? - which could show us if there was a problem with copying the SSH key onto the machine for which 'juju ssh' fails?

Do you know how I can get those logs?

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Neil

Can you ssh into the machine via normal means, i.e. by-passing Juju mechanism?

Revision history for this message
Nell Jerram (neil-jerram) wrote :

@Anastasia - No, the normal means fails also.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

Re-targeting to "juju", launchpad project where we are tracking Juju 2 concerns.

no longer affects: juju-core
Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.1.0
Changed in juju:
assignee: nobody → Alexis Bruemmer (alexis-bruemmer)
Changed in juju:
assignee: Alexis Bruemmer (alexis-bruemmer) → nobody
Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Neil

We have addressed a few issues related to ssh and keys. Could you please re-try with Juju 2.1-rc2?

I believe this is fixed as I cannot reproduce what you are seeing. Here is my output: https://pastebin.canonical.com/179825/

Changed in juju:
status: Triaged → Incomplete
milestone: 2.1-rc2 → none
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.