juju ssh doesn't use the API binding to SSH to machines, causing random connection freezes

Bug #1936740 reported by Miroslaw Malinowski
This bug affects 1 person
Affects         Status   Importance  Assigned to  Milestone
Canonical Juju  Triaged  Wishlist    Unassigned

Bug Description

I'm deploying OpenStack using MAAS + Juju. When I use juju ssh <machine>, Juju seems to pick the interface to SSH to at random, even when the machine has an interface on the Juju API network. In my scenario, the OAM network used for the Juju API is 172.16.130.0/24, and subnets 172.16.131.0/24 and 172.16.132.0/24 are for the OpenStack internal and public interfaces. I think Juju should know that it can reach the machine/container on the API network and use the shortest route possible, and only proxy when the Juju client can't communicate with the machine any other way.

In my setup, Juju and MAAS both run on VMs that only have interfaces on the Juju API network (172.16.130.0/24). So when Juju establishes an SSH connection to an address on 172.16.131.0/24 (an OpenStack VLAN), the traffic has to go through a few extra proxies, which makes the connection very unstable; it often freezes after a minute or so. I couldn't find exactly what causes the freeze, but as a workaround, running juju ssh --proxy <machine> to force Juju to go through the API network does the trick.

For example, without --proxy:
juju ssh 5
ip a
7: br-ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 172.16.130.73/24 brd 172.16.130.255 scope global br-ens18
8: br-ens19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 172.16.131.6/24 brd 172.16.131.255 scope global br-ens19
9: br-ens21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 172.16.170.53/24 brd 172.16.170.255 scope global br-ens21
10: br-ens20: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    inet 172.16.132.3/24 brd 172.16.132.255 scope global br-ens20

ps -aef | grep ssh
maas-ad+ 345097 251460 3 08:33 pts/0 00:00:00 /snap/juju/16736/bin/juju ssh 5
maas-ad+ 345121 345097 0 08:33 pts/0 00:00:00 ssh -o StrictHostKeyChecking yes -o PasswordAuthentication no -o ServerAliveInterval 30 -t -t -o UserKnownHostsFile /tmp/ssh_known_hosts109679527 -i /home/maas-admin/.local/share/juju/ssh/juju_id_rsa ubuntu@172.16.132.3

You can see that the connection was established on the 172.16.132 subnet (br-ens20), even though my client is on the Juju API subnet 172.16.130 and the machine also has an interface on the API subnet. After a few more tests, it looks like Juju picks the interface almost at random: sometimes it's 172.16.131, sometimes the API interface. With juju ssh --proxy, the connection goes to the correct interface and no longer freezes, but it seems odd that Juju adds a few extra hops for no reason when the right interface is available. For example, with --proxy:
maas-ad+ 345936 251460 4 08:44 pts/0 00:00:00 /snap/juju/16736/bin/juju ssh --proxy 5
maas-ad+ 345958 345936 0 08:44 pts/0 00:00:00 ssh -o StrictHostKeyChecking yes -o ProxyCommand /snap/juju/16736/bin/juju ssh --model=admin/csc-cloud --proxy=false --no-host-key-checks --pty=false ubuntu@172.16.130.51 -q "nc %h %p" -o PasswordAuthentication no -o ServerAliveInterval 30 -t -t -o UserKnownHostsFile /tmp/ssh_known_hosts615123961 -i /home/maas-admin/.local/share/juju/ssh/juju_id_rsa ubuntu@172.16.130.73
maas-ad+ 345959 345958 3 08:44 pts/0 00:00:00 /snap/juju/16736/bin/juju ssh --model=admin/csc-cloud --proxy=false --no-host-key-checks --pty=false ubuntu@172.16.130.51 -q nc 172.16.130.73 22
maas-ad+ 345965 345959 0 08:44 pts/0 00:00:00 ssh -o StrictHostKeyChecking no -o PasswordAuthentication no -o ServerAliveInterval 30 -o UserKnownHostsFile /dev/null -i /home/maas-admin/.local/share/juju/ssh/juju_id_rsa ubuntu@172.16.130.51 -q nc 172.16.130.73 22

Maybe there is a reason for this behaviour. If so, could we at least have a switch that tells Juju to use the API interface and simply fail with a connection error if it can't?
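The behaviour asked for in this report (prefer a machine address the client can reach directly, otherwise proxy through the controller, optionally fail hard) can be sketched as below. This is a minimal Python illustration under stated assumptions, not Juju's implementation or CLI: `choose_ssh_target`, `NoDirectRouteError` and the `strict` parameter are hypothetical names, and "directly reachable" is approximated as "inside one of the client's own subnets".

```python
import ipaddress
from typing import Iterable, Optional


class NoDirectRouteError(Exception):
    """Hypothetical error raised in strict mode when no address is directly reachable."""


def choose_ssh_target(
    client_subnets: Iterable[str],
    machine_addresses: Iterable[str],
    strict: bool = False,
) -> Optional[str]:
    """Return the first machine address that lies on one of the client's subnets.

    Returns None when nothing matches (the caller would then fall back to
    proxying via the controller, as `juju ssh --proxy` does), or raises
    NoDirectRouteError when strict=True (the "just fail" switch).
    """
    networks = [ipaddress.ip_network(s) for s in client_subnets]
    for addr in machine_addresses:
        ip = ipaddress.ip_address(addr)
        # Assumption: an address inside a client subnet needs no extra hops.
        if any(ip in net for net in networks):
            return addr
    if strict:
        raise NoDirectRouteError("no machine address is on a client subnet")
    return None


# Values taken from the report: the client only has 172.16.130.0/24,
# machine 5 has addresses on .130, .131, .170 and .132.
client = ["172.16.130.0/24"]
machine5 = ["172.16.132.3", "172.16.131.6", "172.16.130.73", "172.16.170.53"]
print(choose_ssh_target(client, machine5))  # prints 172.16.130.73
```

Returning None instead of picking an arbitrary address leaves the fallback decision (proxy or not) to the caller, while strict=True models the switch requested above.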

Tags: ssh
tags: added: ssh
Joseph Phillips (manadart) wrote:

The issue is that even though there is a space dedicated to *agent*->controller comms, that doesn't necessarily mean the *client*'s shortest path is via that space.
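The point about the client's path can be made concrete: only the client's own routing table knows how it reaches a given address. A minimal Python sketch, assuming an IPv4-only client (`local_source_for` is a hypothetical helper, not part of Juju), asks the kernel which local source address it would use for a target. Calling connect() on a UDP socket sends no packets; it only selects a route.

```python
import socket


def local_source_for(target_ip: str, port: int = 22) -> str:
    """Return the local IPv4 address the client's routing table picks for target_ip.

    connect() on a UDP socket sends nothing; the kernel just chooses a route
    and source address, which getsockname() reports. Raises OSError if the
    client has no route to target_ip at all.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect((target_ip, port))
        return s.getsockname()[0]
```

On a single-homed client like the reporter's, every reachable target resolves to the same 172.16.130.x source, so the source address alone does not tell a direct link from a routed one; it would have to be combined with the local interface's prefix, as in a subnet check.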

We'll see what we can do.

Changed in juju:
importance: Undecided → Wishlist
status: New → Triaged