Getting locked out of long running GCE instances using juju ssh & juju scp

Bug #1669501 reported by Matt Bruzek
This bug affects 6 people
Affects          Status         Importance   Assigned to        Milestone
Canonical Juju   Fix Released   Critical     Witold Krecicki
cloud-images     Invalid        Undecided    Unassigned

Bug Description

I encountered a problem connecting to a long-running GCE instance while using only juju scp and juju ssh to reach the VM. I believe Juju is doing something irregular that is getting my IP address blocked from connecting to the VM.

I was actively using a juju ssh session to jenkins/1 in one terminal and trying to scp a file up to jenkins/1 in another. Earlier I was able to connect and transfer a file down (using juju scp). When trying to upload the file I got disconnected, and it appears my IP address is blocked. Here are the actual commands I ran and what happened in the terminal:

$ juju scp juju_gce.tar.gz jenkins/1:/var/lib/jenkins/juju/juju_gce_new.tar.gz
ERROR exit status 1 (Timeout, server 104.197.80.216 not responding.
lost connection)
$ juju scp juju_gce.tar.gz jenkins/1:
ERROR cannot connect to any address: [104.197.80.216:22 10.240.0.3:22 172.17.0.1:22]
$ juju ssh jenkins/1
ERROR cannot connect to any address: [104.197.80.216:22 10.240.0.3:22 172.17.0.1:22]
$ ping 104.197.80.216
PING 104.197.80.216 (104.197.80.216) 56(84) bytes of data.
^C
--- 104.197.80.216 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1031ms

The terminal using juju ssh timed out at the same time I got this error:

Timeout, server 104.197.80.216 not responding.

- - - - -

After these error messages I am unable to juju scp or juju ssh to this server, and I cannot connect with ping or regular ssh either.

I was blocked at my home address, so I traveled to a coffee shop, where I was able to connect for a short time before getting blocked again. It appears to me that Juju is doing something irregular and getting blocked on GCE instances.

Please help with this problem.

Matt Bruzek (mbruzek)
description: updated
Revision history for this message
Matt Bruzek (mbruzek) wrote :

I am able to access the GCE VM using the Google console. I see my IP addresses in the iptables --list DROP rules! The coffee shop IP address is 96.33.226.42.

Chain sshguard (1 references)
target prot opt source destination
DROP all -- smtp1.orderz.com anywhere
DROP all -- hostby.planet-telecom.eu anywhere
DROP all -- 122.194.229.9 anywhere
DROP all -- 112.85.42.110 anywhere
DROP all -- v157-7-208-112.myvps.jp anywhere
DROP all -- 153.99.182.8 anywhere
DROP all -- 112.85.42.28 anywhere
DROP all -- 112.85.42.22 anywhere
DROP all -- 193.201.224.237 anywhere
DROP all -- 96-42-224-45.dhcp.roch.mn.charter.com anywhere
DROP all -- 123.183.209.136 anywhere
DROP all -- 106.226.86.109.triolan.net anywhere
DROP all -- 146.228.112.199 anywhere
DROP all -- 114.119.7.53 anywhere
DROP all -- fs.sip.gocheepmobile.com anywhere
DROP all -- 120.27.133.147 anywhere
DROP all -- 123.183.209.135 anywhere
DROP all -- 163-172-219-77.rev.poneytelecom.eu anywhere
DROP all -- 96-42-209-188.dhcp.roch.mn.charter.com anywhere
DROP all -- 117.54.13.180 anywhere
DROP all -- 91.224.160.131 anywhere
DROP all -- 46.148.18.163 anywhere

It looks like juju scp/ssh is doing something that sets off an sshguard rule on GCE VMs. Here is the output from systemctl status sshguard:

Mar 02 15:55:35 juju-25c78b-1 sshguard[1710]: Blocking 96.33.226.42:4 for >630secs: 40 danger in 4 attacks over 907 seconds (all: 40d in 1 abuses over 907s).

The only method I used to connect to this system was juju ssh/scp, and now my IP address is blocked!

Revision history for this message
Jay R. Wren (evarlast) wrote :

I ran into this recently and mistakenly thought this was an sshguard bug or GCE bug and filed bugs on those projects.

https://bitbucket.org/sshguard/sshguard/issues/65/blocks-a-source-ip-for-many-connections

"You should never see `Connection closed by IP port PORT [preauth]` from a well behaved client."

Why is juju doing this?

Revision history for this message
Matt Bruzek (mbruzek) wrote :

I have snap installed juju and was asked to update the bug with the version:

$ juju --version
2.2-alpha1-yakkety-amd64
$ which juju
/snap/bin/juju

Revision history for this message
Jay R. Wren (evarlast) wrote :

My juju version:

2.0.4-sierra-amd64

Revision history for this message
Jay R. Wren (evarlast) wrote :

juju ssh uses ReachableHostPort, which attempts to check whether the TCP port is reachable before trying to ssh to that port. This check is triggering sshguard.

https://github.com/juju/juju/blob/staging/cmd/juju/commands/ssh_common.go#L408

This file has changed since 2.0.3, but given mbruzek has the same behavior, it seems the behavior has changed.
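
For context, a bare TCP reachability probe of that kind amounts to something like the following, run from the client before the real ssh connection (a sketch of the idea only, not Juju's actual code; the address is the one from the bug description):

# Sketch only: connect to port 22 and close without completing an SSH
# login. Uses the OpenBSD netcat shipped on Ubuntu.
$ nc -z -w 5 104.197.80.216 22 && echo "port 22 reachable"

A probe like this never authenticates, so depending on how far it gets into the SSH handshake, sshd logs it as "Did not receive identification string" or "Connection closed ... [preauth]", both of which tend to look like scanning to tools such as sshguard.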

Revision history for this message
Anastasia (anastasia-macmood) wrote :

I am guessing that comment #5 has a typo and the intent was to say that the behavior did not change between 2.0.x and 2.2.x.

This is indeed an unfortunate side-effect :(

@Jay R. Wren (evarlast), @Matt Bruzek (mbruzek),
Do you have an approximate time that it takes to trip sshguard? I wonder if we can set up automated testing to ensure that the problem is fixed, once we fix it :)

Changed in juju:
status: New → Triaged
importance: Undecided → High
tags: added: network
Revision history for this message
Matt Bruzek (mbruzek) wrote :

@Anastasia the only reproduction scenario I know of is sending a file via juju scp while a juju ssh session is attached in another terminal. It took less than a few hours of use before I was blocked.

Revision history for this message
John A Meinel (jameinel) wrote :

So in 2.1 we changed from just doing a TCP Connect to port 22, to actually starting an SSH session to the point that we've negotiated an SSH key and know that the target machine is the host that we want to talk to. (otherwise things like LXD/KVM/etc addresses and the Canonical VPN cause us to find reachable machines that aren't the machine we want to talk to.)

We've had a few reports where people are using 10.* address spaces that end up colliding with other 10.* addresses that are reachable. (e.g. MAAS configured to hand out IP addresses on a second switch that happen to match addresses Canonical is using when you are on the Canonical VPN.)

We could try to take the probe through all the way to Auth, though that adds several round trips to a connection that we don't intend to keep.

We could try to have a way to not probe, for people that know that there won't be addresses that we might successfully route to, but aren't the target machine.

We could only probe in cases where we don't see what looks like an official public address (MAAS tends to be in a situation where all of the addresses are RFC1918, but not all of them are reachable.) If we *do* always favor non RFC1918, that means that from within the cloud we'll always be preferring public addresses, and the associated fees/slowdowns that it implies. (eg, AWS doesn't charge if one Instance talks to another on its private network, but *does* charge if you go via the Public address of that instance.)

Is SSH Guard running on the machine itself (where we could possibly configure it), or is it running on GCE infrastructure? If it is running on the machine itself, why are those machines different on GCE such that we aren't seeing this on other hosting platforms (Azure, AWS)? Is this something that you've expressly installed and configured?

We *could* do a bad auth (forced ubuntu/ubuntu sort of auth), but given that we are expressly 'probing for ssh keys' that is sort of indistinguishable from someone doing it for nefarious purposes, IMO. It seems SSH Guard doesn't care that we immediately follow up the probe by a fully valid SSH connection with a viable SSH exchange immediately after we try probing. (It seems to me that if an IP address is *successfully* connecting, that should offset times where it is probing.)
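
For anyone who wants to see the server-side signature of this kind of probe without Juju in the loop: ssh-keyscan behaves in a roughly similar way (it connects, completes the key exchange to read the host key, then disconnects without authenticating). On a test instance you control, something like the following should produce the same "[preauth]" entries in the auth log and, repeated a few times, trip sshguard (the address is the one from the bug description; this is an illustration, not Juju's implementation):

# Rough analogue of the post-2.1 probe: negotiate keys, then drop the
# connection before authentication.
$ for i in 1 2 3 4; do ssh-keyscan -t rsa 104.197.80.216 >/dev/null; done

# Corresponding sshd entries on the target look like:
#   sshd[...]: Connection closed by <client-ip> port <port> [preauth]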

Changed in juju:
status: Triaged → Incomplete
Revision history for this message
Curtis Hovey (sinzui) wrote :

It would be nice if Juju could configure sshguard or disable it. We presume Juju is configuring firewalls to manage the host. This issue affects Juju and charm testing. Long-running tests need sshguard disabled in order to complete.

tags: added: ci gce-provider
tags: added: jujuqa
Revision history for this message
John A Meinel (jameinel) wrote : Re: [Bug 1669501] Re: Getting locked out of long running GCE instances using juju ssh & juju scp

How do you disable sshguard? Is it possible to configure it to not react to Juju's particular behavior? (It's OK as long as you get a genuine connection from that IP.)

John
=:->

On Mar 31, 2017 5:21 PM, "Curtis Hovey" <email address hidden> wrote:

> It would be nice if Juju could configure sshguard or disable it. We
> presume Juju is configuring firewalls to manage the host. This issue
> affects Juju and charm testing. Long-running tests need sshguard
> disabled in order to complete.

Revision history for this message
Seman (sseman) wrote :

Here is a page that describes how sshguard recognizes an attack. Not sure what Juju is doing to trigger this attack signature.
https://www.sshguard.net/docs/reference/attack-signatures/

Optionally, sshguard supports address whitelisting:
https://www.sshguard.net/docs/whitelist/
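
For anyone who is blocked right now, a possible stop-gap (run on the affected unit, e.g. via the GCE serial console) is to whitelist the client address; the whitelist path and service name below are the Debian/Ubuntu defaults and may differ on other installs:

# Assumed workaround, not an official fix: whitelist the blocked client
# address and restart sshguard.
$ echo "96.33.226.42" | sudo tee -a /etc/sshguard/whitelist
$ sudo service sshguard restart

# Or drop the existing block immediately (chain name taken from the
# iptables output earlier in this bug):
$ sudo iptables -D sshguard -s 96.33.226.42 -j DROP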

Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

Disabling sshguard is pretty simple (sudo service sshguard stop), but it would have to be done as part of every charm's install routine (a sketch of what that might look like is at the end of this comment). I have some more logs that show sshguard is indeed to blame. I'm running bundletester on a host (162.213.34.190) which deploys the bundle and proceeds to test each charm. Notice the 4 '[preauth]' attempts from my host, followed by sshguard blocking my test host:

-----
Apr 5 14:31:14 juju-2f68d2-4 sshd[1070]: Connection closed by 162.213.34.190 port 40784 [preauth]
Apr 5 14:31:16 juju-2f68d2-4 sshd[1072]: Accepted publickey for ubuntu from 162.213.34.190 port 40788 ssh2: RSA SHA256:ytUIDf82rBLBr3ENPDmVY55E5JiK1/L8+VxAefdcqYo
Apr 5 14:31:16 juju-2f68d2-4 sshd[1141]: Received disconnect from 162.213.34.190 port 40788:11: disconnected by user
Apr 5 14:31:16 juju-2f68d2-4 sshd[1141]: Disconnected from 162.213.34.190 port 40788
Apr 5 14:31:18 juju-2f68d2-4 sshd[1240]: Connection closed by 162.213.34.190 port 40802 [preauth]
Apr 5 14:31:20 juju-2f68d2-4 sshd[1246]: Accepted publickey for ubuntu from 162.213.34.190 port 40804 ssh2: RSA SHA256:ytUIDf82rBLBr3ENPDmVY55E5JiK1/L8+VxAefdcqYo
Apr 5 14:31:22 juju-2f68d2-4 sshd[1325]: Received disconnect from 162.213.34.190 port 40804:11: disconnected by user
Apr 5 14:31:22 juju-2f68d2-4 sshd[1325]: Disconnected from 162.213.34.190 port 40804
Apr 5 14:31:24 juju-2f68d2-3 sshd[1309]: Connection closed by 162.213.34.190 port 60056 [preauth]
Apr 5 14:31:25 juju-2f68d2-3 sshd[1311]: Accepted publickey for ubuntu from 162.213.34.190 port 60060 ssh2: RSA SHA256:ytUIDf82rBLBr3ENPDmVY55E5JiK1/L8+VxAefdcqYo
Apr 5 14:31:26 juju-2f68d2-3 sshd[1369]: Received disconnect from 162.213.34.190 port 60060:11: disconnected by user
Apr 5 14:31:26 juju-2f68d2-3 sshd[1369]: Disconnected from 162.213.34.190 port 60060
Apr 5 14:31:27 juju-2f68d2-3 sshd[1387]: Connection closed by 162.213.34.190 port 60066 [preauth]
Apr 5 14:31:27 juju-2f68d2-5 sshguard[1637]: Blocking 162.213.34.190:4 for >630secs: 40 danger in 4 attacks over 13 seconds (all: 40d in 1 abuses over 13s).
-----

So, from comment #8, it sounds like juju is responsible for the [preauth] stuff. @jam: what kind of perf penalties are we looking at for doing a full connection probe instead of preauth?
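
If charms did have to carry this workaround themselves, the install-time change would be small. A minimal sketch of an install hook fragment (hypothetical, not taken from any published charm; hooks run as root, so no sudo is needed):

#!/bin/bash
# Hypothetical hooks/install fragment: stop and disable sshguard so the
# probes made by juju ssh/scp cannot get the client IP blocked.
set -e
if systemctl list-unit-files | grep -q '^sshguard\.service'; then
    systemctl stop sshguard
    systemctl disable sshguard
fi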

Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

One more thing to note: on all my GCE failures (yellow triangles here: http://bigtop.charm.qa/index.html), the failure is connecting to units with subordinates.

Perhaps juju does a couple of preauths for the principal unit and then another couple for the subordinate. sshguard's default threshold is "40 points", and each preauth counts as 10, so four preauths against the same machine is enough to trip a block. I suspect colocating applications on the same machine would also trigger it.

If you're looking for a GCE repro scenario, I suggest using a principal/subordinate pair to get your IP blocked (the default block time is 20 minutes).

Revision history for this message
Cory Johns (johnsca) wrote :

I was able to reproduce this by deploying the cs:ubuntu charm and then running:

for i in {0..4}; do juju ssh ubuntu/0 /bin/true; done

The fourth connection hung and eventually timed out.

Revision history for this message
Cory Johns (johnsca) wrote :

John, I don't understand why the preauth check is necessary. You mentioned dealing with broken networking in some MAAS setups, but how does doing the preauth help with that in any way? Is this just an optimization to use the internal address when possible (to save on cloud charges), given that Juju cannot reliably detect whether the private address reaches the right machine from wherever `juju ssh` is being called?

Revision history for this message
Matt Bruzek (mbruzek) wrote :

Please note juju scp also has this issue. I got locked out of a GCE instance today while trying to download some test results from a kubernetes-e2e test charm. This is really inconvenient for me and for any GCE users trying to download files from or ssh to their units.

Changed in juju:
importance: High → Critical
status: Incomplete → Triaged
assignee: nobody → John A Meinel (jameinel)
milestone: none → 2.2-beta3
milestone: 2.2-beta3 → 2.2-rc1
John A Meinel (jameinel)
Changed in juju:
assignee: John A Meinel (jameinel) → Witold Krecicki (wpk)
Changed in juju:
milestone: 2.2-beta4 → 2.2-rc1
Revision history for this message
Witold Krecicki (wpk) wrote :
Changed in juju:
status: Triaged → In Progress
Witold Krecicki (wpk)
Changed in juju:
status: In Progress → Fix Committed
Revision history for this message
Kevin W Monroe (kwmonroe) wrote :

Sweet! Works great on 2.2.0-xenial-amd64.

Changed in juju:
status: Fix Committed → Fix Released
Dan Watkins (oddbloke)
Changed in cloud-images:
status: New → Invalid