ssh fails to connect to VPN host - hangs at 'expecting SSH2_MSG_KEX_ECDH_REPLY'

Bug #1254085 reported by James Hunt
This bug affects 43 people
Affects: openssh (Ubuntu)
Status: Invalid
Importance: High
Assigned to: Unassigned

Bug Description

ssh -vvv <host> is failing for me where <host> is a VPN system.

VPN is configured and connected via network-manager. Last messages from ssh (hangs forever):

debug2: kex_parse_kexinit: none,<email address hidden>
debug2: kex_parse_kexinit: none,<email address hidden>
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: mac_setup: found hmac-md5
debug1: kex: server->client aes128-ctr hmac-md5 none
debug2: mac_setup: found hmac-md5
debug1: kex: client->server aes128-ctr hmac-md5 none
debug1: sending SSH2_MSG_KEX_ECDH_INIT
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

= Workaround =

$ sudo apt-get install putty
$ putty <host>

This works perfectly.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: openssh-client 1:6.4p1-1
ProcVersionSignature: Ubuntu 3.12.0-3.8-generic 3.12.0
Uname: Linux 3.12.0-3-generic i686
NonfreeKernelModules: nvidia
ApportVersion: 2.12.7-0ubuntu1
Architecture: i386
CurrentDesktop: Unity
Date: Fri Nov 22 15:37:18 2013
InstallationDate: Installed on 2010-10-21 (1128 days ago)
InstallationMedia: Ubuntu 10.10 "Maverick Meerkat" - Release i386 (20101007)
RelatedPackageVersions:
 ssh-askpass 1:1.2.4.1-9
 libpam-ssh N/A
 keychain 2.7.1-1
 ssh-askpass-gnome 1:6.4p1-1
SSHClientVersion: OpenSSH_6.4p1 Ubuntu-1, OpenSSL 1.0.1e 11 Feb 2013
SourcePackage: openssh
UpgradeStatus: Upgraded to trusty on 2013-11-01 (20 days ago)

James Hunt (jamesodhunt) wrote :

scp suffers from the same issue but, interestingly, so apparently does pscp, which hangs indefinitely after successfully connecting and starting to send the data.

Changed in openssh (Ubuntu):
importance: Undecided → High
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in openssh (Ubuntu):
status: New → Confirmed
sedlund (scott-edlund) wrote :

Same issue, 13.10 openssh-client 6.2p2-6ubuntu0.1 cannot connect to server 12.04.03 openssh-server 5.9p1-5ubuntu1.1.

Putty is working as stated above.

sedlund (scott-edlund) wrote :

I think this is the same as bug 708493, resurfacing again...

sedlund (scott-edlund) wrote :

FYI: on openssh-client 1:6.2p2-6ubuntu, specifying the cipher as:

ssh -c 3des-cbc <targethost>

works for me.

Scott Moore (scottbomb) wrote :

I have the exact same hang just trying to ssh into the host (remotely).

Scott Moore (scottbomb) wrote :

The next time, I removed the X forwarding (ssh -X) and it didn't hang, so X forwarding (or the attempted access thereof) may play a role in this bug.

Nikolay Bryskin (nikicat) wrote :

It's almost certainly an MTU problem. Just try
ip link set mtu 1200 dev tap0 (or tun0)
to test.
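
Before lowering the interface MTU, the theory can be checked by sending pings that are not allowed to fragment. The following is a minimal sketch, assuming Linux iputils ping (its -M do option sets the don't-fragment bit) and an IPv4 path. The function names are made up for illustration, and the 10-byte step and 576-byte floor are arbitrary choices.

```shell
# Payload size to pass to `ping -s` so the IPv4 packet is exactly $1 bytes:
# subtract the 20-byte IPv4 header and the 8-byte ICMP echo header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Succeeds if an unfragmented IPv4 packet of $2 bytes gets a reply from host $1.
# -M do forbids fragmentation; -W 2 waits at most 2 seconds for the reply.
probe_mtu() {
    ping -c 1 -W 2 -M do -s "$(icmp_payload_for_mtu "$2")" "$1" >/dev/null 2>&1
}

# Walk downward from a starting MTU (default 1500) in 10-byte steps and
# print the first size that gets through. Gives up below 576 bytes.
find_path_mtu() {
    host=$1
    mtu=${2:-1500}
    while [ "$mtu" -ge 576 ]; do
        if probe_mtu "$host" "$mtu"; then
            echo "$mtu"
            return 0
        fi
        mtu=$(( mtu - 10 ))
    done
    return 1
}
```

Running `find_path_mtu <host>` prints the largest probed size that got an echo reply. A result below the interface MTU suggests that something on the path silently drops larger packets. A host that ignores ICMP echo altogether makes every probe fail, so a negative result alone proves little.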

Oguz Yarimtepe (oguzy) wrote :

setting MTU to 1200 fixed my problem

mathew (meta23) wrote :

Same problem here. The MTU of the server is 1500; changing the client MTU to the same value doesn't fix it.

Testing with ping, 1300-byte pings make it through fine, so I tried setting the MTU to that on both client and server. No dice.

Server version is OpenSSH_5.9p1 Debian-5ubuntu1.4, OpenSSL 1.0.1 14 Mar 2012.
A client running OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013 can connect.
A client running OpenSSH_6.6.1p1 Ubuntu-2ubuntu2, OpenSSL 1.0.1f 6 Jan 2014 *cannot* connect.

Successful connection:

% ssh -vv [redacted]
OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: Applying options for *
debug2: ssh_connect: needpriv 0
debug1: Connecting to [redacted] port 22.
debug1: Connection established.
debug1: identity file /home/meta/.ssh/identity type -1
debug1: identity file /home/meta/.ssh/identity-cert type -1
debug1: identity file /home/meta/.ssh/id_rsa type -1
debug1: identity file /home/meta/.ssh/id_rsa-cert type -1
debug1: identity file /home/meta/.ssh/id_dsa type -1
debug1: identity file /home/meta/.ssh/id_dsa-cert type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.9p1 Debian-5ubuntu1.4
debug1: match: OpenSSH_5.9p1 Debian-5ubuntu1.4 pat OpenSSH*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.3
debug2: fd 3 setting O_NONBLOCK
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: <email address hidden>,<email address hidden>,<email address hidden>,<email address hidden>,ssh-rsa,ssh-dss
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,<email address hidden>
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,<email address hidden>
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,<email address hidden>,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,<email address hidden>,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,<email address hidden>,hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,<email address hidden>,hmac-sha1-96,hmac-md5-96
debug2: kex_parse_kexinit: none,<email address hidden>,zlib
debug2: kex_parse_kexinit: none,<email address hidden>,zlib
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit:
debug2: kex_parse_kexinit: first_kex_follows 0
debug2: kex_parse_kexinit: reserved 0
debug2: kex_parse_kexinit: ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1
debug2: kex_parse_kexinit: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256
debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,<email address hidden>
debug2: kex_parse_kexinit: aes1...

Danton Nunes (danton-nunes) wrote :

Setting the MTU to 1480 did it. This MTU problem also affects browsing to certain sites, e.g. Facebook via HTTPS. The older Ubuntu 12.04 LTS did not have this bug.

Danton Nunes (danton-nunes) wrote :

Besides setting the MTU to 1480, it is wise to do:
# /sbin/iptables -I OUTPUT -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
# /sbin/ip6tables -I OUTPUT -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu
because there are still firewalls out there that can't handle fragmentation.

tilo kremer (ubunteelo) wrote :

it works when turning off DTLS

Pablo Piaggio (papibe) wrote :

This affects me.

14.04 trying to connect through ssh to 12.04. Both 64-bit.

It stops at the line: debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

Please help.

Yury Krasouski (krasoffski) wrote :

Hello,
I am using a NetworkManager VPN (CHAP and MSCHAPv2 options are checked).

I have the same problem on a clean install of Xubuntu 14.04.2.
It stops at the line: debug1: expecting SSH2_MSG_KEX_ECDH_REPLY.
It has been about a year and there is still no news or workaround.

Yury Krasouski (krasoffski) wrote :

Hello again, the problem also appeared with openSUSE 13.2 (x32) as the client and Xubuntu 14.04.2 as the server.
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY

Seems like it is a server problem, not a client one.

Yury Krasouski (krasoffski) wrote :

Changing MTU does not solve the problem.

uwe (maabdulhaq) wrote :

I've had the same issue on 14.04, using the Cisco VPN client (horrific), connecting to a Linux machine or a FreeBSD machine. Setting the MTU to 1200 helped as well.

pedroleouf (pedroleouf) wrote :

I had the same issue. No VPN tho; client is `OpenSSH_6.6.1p1 Ubuntu-2ubuntu2, OpenSSL 1.0.1f 6 Jan 2014` and server is `OpenSSH_6.6.1p1 Ubuntu-2ubuntu2, OpenSSL 1.0.1f 6 Jan 2014` (yes, same). Of course scp, rsync, and anything else that connects through OpenSSH was broken as well.

Changing the MTU of my connection to 1200 fixed the issue, but yeah, it's a workaround...

fnkr (fnkr) wrote :

MTU 1200 did it for me too! Thanks!

Dave V (mindkeep) wrote :

Same issue when trying to connect from Arch ssh OpenSSH_6.9p1, OpenSSL 1.0.2d 9 Jul 2015, to Ubuntu 14.04.2 sshd.

Luke J Militello (kilahurtz) wrote :

Have there been any recent updates on this? I have a network set up with tunnels between Cisco routers as the egress points for each remote site, and I am having the same problems when attempting to SSH from a host inside one site to the other. Again, MTU related. I hope there is a permanent fix in the works, as it is cumbersome to have to change the MTU of each individual machine.

Luke J Militello (kilahurtz) wrote :

Although the DF bit is set and PMTUD is not being honored, I discovered that the MSS is honored. I therefore resolved my problems by doing the overhead math and setting the MSS adjust parameter on the LAN-facing interfaces of my routers. This workaround is definitely more scalable than changing the system MTU on all my machines.
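
The "overhead math" can be sketched as follows. This is an illustration under stated assumptions, not the commenter's actual numbers: it assumes IPv4 with a 20-byte IP header and a 20-byte TCP header (no options), and uses plain GRE over IPv4 (24 bytes: 20 for the outer IP header plus 4 for GRE) as the example tunnel. IPsec and other encapsulations add different, often larger, overheads. The function name is hypothetical.

```shell
# Largest TCP MSS that fits through a tunnel without fragmentation.
#   $1 = MTU of the physical link (e.g. 1500 for Ethernet)
#   $2 = bytes added by the tunnel encapsulation (assumed known to the caller)
mss_through_tunnel() {
    link_mtu=$1
    tunnel_overhead=$2
    ip_header=20    # IPv4 header without options
    tcp_header=20   # TCP header without options
    echo $(( link_mtu - tunnel_overhead - ip_header - tcp_header ))
}

mss_through_tunnel 1500 24   # GRE over IPv4: prints 1436
mss_through_tunnel 1500 0    # no tunnel:     prints 1460
```

The resulting value is what would go into the router's MSS adjust setting, so that both ends negotiate segments small enough to survive the tunnel even when PMTUD is broken.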

clickwir (clickwir) wrote :

Had the same hang. Client is Ubuntu 15.10 and server is 14.04, both up to date as of today. Fix was to change client MTU from 9000 to 1500.

derWalter (walter-derwalter) wrote :

Ubuntu 14.04 to Cisco Catalyst switches and Debian boxes.

Out of nowhere (or maybe with some update from around a week ago) I could not access my network anymore...
just to find out it's my box's fault... I had to RDP into Windows machines to ssh into my Linux/Unix boxes...

Rodney Beede (business2008+launchpad) wrote :

Might be your PMTU discovery is being blocked by a firewall somewhere.

http://mccltd.net/blog/?p=1577

Ricardo de Barros (stealthymarine) wrote :

Lowering the MTU to 1200 worked for me.

bs (bentzy-sagiv) wrote :

This solved the issue:
Append the following to /etc/sysctl.conf:
net.ipv4.tcp_mtu_probing = 1

After a restart you should see the value "1" in /proc/sys/net/ipv4/tcp_mtu_probing.

A temporary solution is:
echo 1 > /proc/sys/net/ipv4/tcp_mtu_probing
Caveat: this will be reset at boot.

You can also try the value "2" if it is still not working.

(See the explanation at: https://thesimplecomputer.info/pages/adventures-in-linux-tcp-tuning-page2)
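
To see which mode a machine is currently in, the value can be read back and translated. This is a small sketch, Linux only; the helper names are invented for illustration, and the three meanings are paraphrased from the kernel's ip-sysctl documentation.

```shell
# Translate a tcp_mtu_probing value into a short description.
describe_mtu_probing() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "off by default, turned on when an ICMP black hole is detected" ;;
        2) echo "always on, starting from tcp_base_mss" ;;
        *) echo "unexpected value: $1" ;;
    esac
}

# Read the live setting from /proc and explain it.
show_mtu_probing() {
    describe_mtu_probing "$(cat /proc/sys/net/ipv4/tcp_mtu_probing)"
}
```

Calling `show_mtu_probing` needs no root privileges. Running `sudo sysctl -w net.ipv4.tcp_mtu_probing=1` changes the running value in the same way as the echo above, and `sudo sysctl -p` reloads /etc/sysctl.conf without a reboot.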

Abhijit (abhijit86k) wrote :

Just noticed this after the last dist-upgrade on Ubuntu 14.04 LTS, installed packages are openssh-server 1:6.6p1-2ubuntu2.7 and openssh-client 1:6.6p1-2ubuntu2.7.

Radosław Warzocha (radoslaw-warzocha) wrote :

Looks like this is still present in version 1:6.6p1-2ubuntu2.8 of openssh-client and server.

Christian Ehrhardt (paelzer) wrote :

Hi everybody,
this keeps coming up over and over again, and not only on Ubuntu but on various distributions.

As outlined before, the error is an effect of broken path MTU discovery.
The cause could be a firewall, broken router software, a bad local MTU config, ... there are many potential sources.

It is nothing that "openssh" or Ubuntu's openssh packaging can really fix.

The real "fix" is to repair the network configuration wherever it is broken so that PMTU discovery works correctly (or to fix the local net/MTU configuration if that is the issue).

The mentioned workaround is nice (thank you, bs, for mentioning it), as it gives users who are unable to configure the network a way to work around the issue. It works by switching the MTU discovery to different modes (https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt).

That said, one might ask why the default mode is disabled, but look at how long this has been the default: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5d424d5a674f782d0659a3b66d951f412901faee
That was a decade ago and it has never changed (these days it is namespacified, but still 0).
So I think this default is set in stone as much as anything else that has survived that long.

With all that outlined, I think we have to mark the bug invalid/incomplete, as it should be considered a local configuration issue IMHO. If you object, please set it back to confirmed and explain why you think so, and if possible please also mention how you'd suggest approaching the case.

Changed in openssh (Ubuntu):
status: Confirmed → Incomplete
status: Incomplete → Invalid
sedlund (scott-edlund) wrote :

There is a problem with the 'OpenSSH' client not connecting while the 'Putty' client does work given the same network settings. This was my finding almost 3 years ago.

Given that another client does work, there is something the OpenSSH client can do to resolve the issue.

Ubuntu distributes the OpenSSH client in its core distribution. As such, this is a valid issue with Ubuntu and should remain open until one of the following conditions is met:

1) The issue is fixed in the OpenSSH client so that the connection succeeds as it does in Putty, and Ubuntu releases a package that resolves it.

2) The OpenSSH package inserts a sane timeout on 'expecting SSH2_MSG_KEX_ECDH_REPLY' and issues an error to the user instead of hanging on the connection, which leaves a poor user experience. It should inform the user that the software is not adequate in its ability to tolerate uncertain MTU settings, and suggest a more robust client such as Putty.

3) Ubuntu removes the OpenSSH client from its core distribution.
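
Until the client itself reports the stall, a user-side deadline can at least turn the silent hang into a visible error. The sketch below assumes GNU coreutils timeout, which exits with status 124 when the deadline expires; run_with_deadline is a hypothetical helper name. It only suits short non-interactive probes, because the deadline would also cut off a working interactive session.

```shell
# Run a command with a deadline and say so when the deadline was hit.
#   $1 = deadline in seconds, remaining arguments = command to run
run_with_deadline() {
    secs=$1
    shift
    rc=0
    timeout "$secs" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        # 124 is the status GNU timeout uses for "command timed out"
        echo "timed out after ${secs}s: $*" >&2
    fi
    return "$rc"
}
```

A probe could then look like `run_with_deadline 20 ssh -o BatchMode=yes <host> true`. Status 124 means the connection hung; any other status means ssh returned on its own, whether it succeeded or failed quickly.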

fhsm (fhsm) wrote :

I'm running into hung ssh connection and it is crazy making because I'm getting inconsistent behavior across 16.10 boxes.

I've got two newly installed 16.10 boxes ($ ssh -V: OpenSSH_7.3p1 Ubuntu-1, OpenSSL 1.0.2g 1 Mar 2016). I'll call them Box1 and Box2. They are in adjacent ports on a switch that has not had any changes in years and on which all other systems are functioning.

I've got four clients: one OS X 10.9 (OpenSSH_6.2p1,OSSLShim 0.9.8r 8 Dec 2011), OS X 10.11 (OpenSSH_6.9p1, LibreSSL 2.1.8), Box1, and Box2.

I can connect to Box1 from the other three clients without any problem. Works as expected.

I can ONLY connect to Box2 from OS X 10.9 and itself (i.e. ssh me@localhost). I cannot get to it from Box1, I cannot get to it from OS X 10.11. I have swapped the two OS X boxes around on the network and also the two 16.10 boxes such that I'm confident that the ability to connect is a function of the client-server combination not the network link between them. ssh -vvv into Box2 from all of the failing clients hangs as above, expecting SSH2_MSG_KEX_ECDH_REPLY.

This behavior / bug is so perplexing I'm unsure of its relevance. I see two factors of interest:
 - Two Ubuntu 16.10 sshds are behaving differently despite stock config across the two;
 - The Ubuntu 16.10 client is unable to connect to an sshd that one of two OS X ssh clients is able to connect to.

Since the two just got installed they haven't had much time to diverge. The only differences between box1 and box2 are: (1) box2 is a few updates behind and (2) although both have LXD installed only box2 has had a container launched on it. Unfortunately prior to racking only the 10.9 system had ssh-ed into these boxes.

I'm going to sit on this for now in case someone interested in trying to get the Ubuntu ssh client to function, or someone curious about the divergent behavior of the Ubuntu sshd, can give guidance on how to turn this from an odd +1 report into something more useful. If not, I'll do a little more troubleshooting (install the pending updates [initramfs-tools initramfs-tools-bin initramfs-tools-core isc-dhcp-client isc-dhcp-common libglib2.0-0 libglib2.0-data liblxc1 lxc-common], purge LXD) to see if I stumble onto something useful and, failing that, nuke and pave.

fhsm (fhsm) wrote :

Update to my comment above (#37):

I was able to connect to Box2 from OS X 10.11. I tried once more by mistake and was shocked that it suddenly worked (albeit with a long connection lag). With more testing I found I could connect from 10.11 to Box2 maybe 1/25 times (as I said, crazy making). I wasn't ever able to get to Box2 from Box1 despite trying numerous times. Long story short it looks like the Ubuntu SSH client is the most particular vs least able to win the race, followed by that in 10.11, followed by 10.9 which always connected without a problem. Interestingly when I switched to keys 10.11 was much more likely to connect (1/5 times). The switch to keys had no effect on Box1's inability to connect to Box2. So as with #36 I'd say Ubuntu's ssh client gave the worst experience here - vs - is actually the best but just needs an error message explaining why it's electing to protect me from (...?) vs "working" as my other clients did.

For those curious about the sshd aspect of the story, I did finally track down a fix for Box2. Box1 and Box2 are both dual-NIC systems. They were configured with eth0 as static and eth1 as DHCP. Plugging in eth1 and letting it obtain a DHCP lease, or taking that interface out of the config, fixed Box2. Having the interface configured but the jack unplugged produced the client-dependent connection issues I outlined in #37. Box1 had the exact same interface config (eth1 looking for a DHCP lease but not being plugged in) without any problem. They are both full Intel systems but totally different hardware. Seems like the kernel abstraction leaking hardware details up the stack, but understandable given both were arguably misconfigured. Imagine how annoying it would be in a dual-NIC system to have a port go out and suddenly, arbitrarily, be unable to connect to sshd with anything other than an old copy of OS X.

TL;DR - my 16.10 ssh client didn't work when others did and failed silently. This may be Ubuntu being smart. It may be Ubuntu being broken. Either repair or an informative error would make the experience better than / comparable to others.

Simon Clift (ssclift-gmail) wrote :

My $0.02. I don't disagree that this is an MTU problem on the path, and that OpenSSH could be smarter about this, but my fix was to specify the key exchange algo.

   ssh -o KexAlgorithms=ecdh-sha2-nistp521 <email address hidden>

replacing the algorithm with one that the server says it supports.

Not working:
  * MTU probing fix (#32) didn't work for me, with net.ipv4.tcp_mtu_probing set to 1 or 2.
  * Specification of the cipher (#6) as in ssh -c.

mdavidsaver (mdavidsaver) wrote :

I found this thread helpful, so I thought to add my experience.

In short, I have a dual band wifi router/DSL modem (Arris BGW210-700) which seems to mess with some traffic moving between devices connected at 2.4GHz (my SSH server) and 5GHz (my ssh client). I can avoid this by forcing the use of 2.4GHz by both devices.

The symptom I see is the same random stalled SSH sessions as the reporter. In my case, only about 1 in 20 attempts succeed. Adding the various CLI arguments mentioned seems to change the probability of a stall a little, but none eliminate it.

Running a packet capture on the client machine with wireshark, I see that the stall is followed by a frame labeled "TCP Spurious Retransmission" from server to client, and then some "TCP Dup ACK" from client to server. The frame being resent has a length of only 518 bytes, well below the 1500 byte MTU.

I could successfully 'ping -s 1458 <ip>' in both directions. Wireshark confirms that 1500 byte frames were being sent. Still, I tried changing the MTU on both machines to first 1400 and then 1200. This reduced the chance of a stall to the point where SSH was almost usable.
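
The sizes in that ping test can be worked out by hand. This is a small sketch, assuming IPv4 without options (20-byte header), an 8-byte ICMP echo header, and a 14-byte Ethernet header as shown in a typical capture (frame check sequence not included); the function names are illustrative.

```shell
# Size of the IPv4 packet produced by `ping -s $1`.
ip_packet_size() {
    echo $(( $1 + 8 + 20 ))
}

# Size of the Ethernet frame carrying that packet, as a capture would show it.
frame_size() {
    echo $(( $(ip_packet_size "$1") + 14 ))
}

ip_packet_size 1458   # prints 1486
frame_size 1458       # prints 1500
ip_packet_size 1472   # prints 1500
```

Under these assumptions, `ping -s 1458` yields a 1500-byte frame but only a 1486-byte IP packet, so a packet that completely fills a 1500-byte MTU would need `ping -s 1472`.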

I was puzzled at this point. I suspected the wifi router as I previously had these machines working through a different (older) router, but wasn't sure how the router could be involved between two local devices. Eventually I realized that the router was bridging traffic since the two machines were connecting to different radios.

I disabled the 5GHz radio on the router to force the client machine to 2.4GHz. At that point 20 of 20 connection attempts succeeded.

Running the client machine (my laptop) at the lower bit rate isn't a permanent solution for me. I doubt I'll make any headway with the router though. sigh... wonderful closed firmware.

I'll also mention a couple of other things I tried which made no difference.

Adding "UseDNS no" to the SSH server config changed nothing.

Disabling the offloading features of the server NIC with ethtool also changed nothing.
