pubkey auth hangs on high latency networks

Bug #1713248 reported by Jim Salter
This bug affects 1 person
Affects: openssh (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

This is a weird one. I use a small fleet of laptops to do professional network testing, and I wrote some tools that use SSH with pubkey auth to run simultaneous tests.

The problem is, if the wifi connection isn't superb, the auth times out more often than not. For example, if I've got a long-range 5 GHz connection which returns 100% of pings but has a median latency of 100ms or so, auth hangs right after "send packet: type 50" and the client never receives the type 51 reply that would let the auth proceed.

debug2: key: /root/.ssh/id_ecdsa ((nil))
debug2: key: /root/.ssh/id_ed25519 ((nil))
debug3: send packet: type 5
debug3: receive packet: type 6
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug3: send packet: type 50

This is where it hangs; after approximately a 30 second timeout (haven't clocked it precisely) it falls through to any remaining available authentication methods - other keys, hostbased, password, whatever hasn't been explicitly disabled.
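For reference, the packet numbers in the trace above are SSH message numbers: per RFC 4250 and RFC 4252, type 5 is SSH_MSG_SERVICE_REQUEST, 6 is SSH_MSG_SERVICE_ACCEPT, 50 is SSH_MSG_USERAUTH_REQUEST and 51 is SSH_MSG_USERAUTH_FAILURE. A minimal sketch of how a saved `ssh -vvv` log could be scanned for the last send that got no reply follows. The function names are hypothetical, and "nothing received afterwards" is only a heuristic, not proof that this particular packet was lost.

```python
import re

# Message numbers from RFC 4250 / RFC 4252; only the ones relevant to this trace.
# Type 60 is method-specific; for the publickey method it is SSH_MSG_USERAUTH_PK_OK.
SSH_MSG_NAMES = {
    5: "SSH_MSG_SERVICE_REQUEST",
    6: "SSH_MSG_SERVICE_ACCEPT",
    50: "SSH_MSG_USERAUTH_REQUEST",
    51: "SSH_MSG_USERAUTH_FAILURE",
    52: "SSH_MSG_USERAUTH_SUCCESS",
    60: "SSH_MSG_USERAUTH_PK_OK",
}

# Matches the debug3 lines that the OpenSSH client prints for each packet.
PACKET_RE = re.compile(r"debug3: (send|receive) packet: type (\d+)")


def last_unanswered_send(log_text):
    """Return the type of the last sent packet if nothing was received
    after it, otherwise None. (Hypothetical helper, heuristic only.)"""
    pending = None
    for line in log_text.splitlines():
        match = PACKET_RE.search(line)
        if not match:
            continue
        direction, ptype = match.group(1), int(match.group(2))
        # Any received packet clears the pending send; a later send replaces it.
        pending = ptype if direction == "send" else None
    return pending


def describe(ptype):
    """Map a message number to its RFC name, falling back to the raw number."""
    return SSH_MSG_NAMES.get(ptype, f"type {ptype}")


# The tail of the trace quoted in this report.
SAMPLE = """\
debug3: send packet: type 5
debug3: receive packet: type 6
debug2: service_accept: ssh-userauth
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug3: send packet: type 50
"""

if __name__ == "__main__":
    stalled = last_unanswered_send(SAMPLE)
    print(describe(stalled) if stalled is not None else "no unanswered send")
```

Run against the excerpt above, this reports the user-auth request as the packet still waiting for an answer, which matches where the client appears to stall.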

Yes, UseDNS no is set on both client and server. I've also disabled all non-essential PAM modules, and disabled HostBasedAuth, RSAAuth, and GSSAPI on both client and server.

The same two laptops in the same location will complete password authentication without a problem. If a pubkey is present for the user on the client side, the client has to time that out first before it falls through and asks for the password (which is promptly accepted and works normally); if there is no pubkey for the client user, it prompts for the password and accepts it immediately.

If I move the server laptop closer to the router, so that the median latency falls in something more like the 50ms range, the pubkey auth works fine. I want to reiterate here that we're talking about high latency, but we're *not* talking about dropped packets - pings between the laptops when they're having problems range from 100ms-900ms latency, but with 100% returns. (And, again, password auth works fine, it's only pubkey that has the issue - and only when latency is high.)

I can't find anything on the internets referring to a problem with high latency pubkey authentication on machines where pubkey auth works fine with lower latency, but here I am.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: openssh-server 1:7.2p2-4ubuntu2.1
ProcVersionSignature: Ubuntu 4.4.0-78.99-generic 4.4.62
Uname: Linux 4.4.0-78-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl nvidia_uvm nvidia_drm nvidia_modeset nvidia
ApportVersion: 2.20.1-0ubuntu2.10
Architecture: amd64
Date: Sat Aug 26 12:04:22 2017
InstallationDate: Installed on 2016-11-05 (293 days ago)
InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.3)
ProcEnviron:
 LANGUAGE=en_US
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: openssh
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Colin Watson (cjwatson) wrote : Re: [Bug 1713248] [NEW] pubkey auth hangs on high latency networks

I'm not sure your ping test proves that the network isn't dropping
packets. ping sends rather small packets by default, and you could have
a situation where only larger packets are dropped. Have you tried
asking ping to use larger packet sizes to try to narrow this down?

I'd also suggest using something like tshark on both ends to see exactly
what packets are being sent and received.
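The two checks suggested above could be set up roughly as follows. This is only a sketch that builds the command lines, assuming Linux iputils `ping` and `tshark` are installed; the host name `server-laptop`, the interface `wlan0`, and the output file name are placeholders, not values from this report. The default ping payload is 56 bytes, and 1472 bytes is the largest payload that fits a 1500-byte IPv4 MTU without fragmentation (1500 minus 20 bytes of IP header and 8 bytes of ICMP header).

```python
import shlex


def ping_argv(host, payload_bytes, count=5):
    """Build a ping command with a chosen payload size (hypothetical helper)."""
    # -c: number of echo requests to send
    # -s: ICMP payload size in bytes
    # -M do: set the don't-fragment bit, so oversized packets fail
    #        instead of being silently fragmented
    return ["ping", "-c", str(count), "-s", str(payload_bytes), "-M", "do", host]


def tshark_argv(interface, pcap_path, port=22):
    """Build a tshark command that captures SSH traffic to a file (hypothetical helper)."""
    # -i: capture interface
    # -f: capture filter (pcap filter syntax)
    # -w: write raw packets to this file for later comparison
    return ["tshark", "-i", interface, "-f", f"tcp port {port}", "-w", pcap_path]


# Payload sizes to sweep, from the ping default up to the 1500-byte MTU limit.
PAYLOAD_SIZES = [56, 512, 1024, 1472]

if __name__ == "__main__":
    # Print the commands rather than running them; capturing usually needs root.
    for size in PAYLOAD_SIZES:
        print(shlex.join(ping_argv("server-laptop", size)))
    print(shlex.join(tshark_argv("wlan0", "ssh-auth.pcap")))
```

Running the same capture on both laptops and comparing the two files would show whether the type 50 request leaves the client, reaches the server, and whether a reply is sent back.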

Jim Salter (jrssnet) wrote :

After logging in, my actual tests, which move several gigabytes of HTTP traffic (downloads of incompressible data in files ranging from 16KB to 16MB), complete without incident.

I've been resorting to establishing an ssh control channel while the test laptops are at short range, then taking them out to the longer range and running the actual tests over the already established control channel, which avoids the need for re-authentication.
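A workaround of this kind can be built on OpenSSH connection multiplexing. The sketch below only constructs the command lines; the host name and socket path are placeholders, and the exact options used in the original test tools are not given in this report, so treat this as one plausible arrangement rather than the reporter's actual setup.

```python
import shlex


def master_argv(host, socket_path):
    """Open a master connection while the link is still good (hypothetical helper)."""
    # -M: run as the multiplexing master
    # -S: path of the control socket
    # -N: do not run a remote command
    # -f: go to the background once authentication has completed
    return ["ssh", "-M", "-S", socket_path, "-N", "-f", host]


def reuse_argv(host, socket_path, remote_command):
    """Run a command over the existing master; no new authentication is needed."""
    return ["ssh", "-S", socket_path, host, remote_command]


def check_argv(host, socket_path):
    """Ask the master process whether it is still running."""
    return ["ssh", "-S", socket_path, "-O", "check", host]


if __name__ == "__main__":
    sock = "/tmp/ssh-test-ctl.sock"  # placeholder path
    print(shlex.join(master_argv("server-laptop", sock)))
    print(shlex.join(check_argv("server-laptop", sock)))
    print(shlex.join(reuse_argv("server-laptop", sock, "uptime")))
```

Because the master connection authenticates once at short range, later sessions opened through the socket skip the user-auth exchange that stalls at long range.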

Literally *everything* but SSH pubkey auth functions at long range. This behavior is consistent across at least five models of wireless interface, and thirty-plus models of wireless router or access point.

Colin Watson (cjwatson) wrote : Re: [Bug 1713248] Re: pubkey auth hangs on high latency networks

OK - but I think you're still going to need to attack this with
something like tshark at both ends in order to even get enough
information to make plausible guesses.

Seth Arnold (seth-arnold) wrote :

You could use tc(8) to add predictable latency for testing: http://bencane.com/2012/07/16/tc-adding-simulated-network-latency-to-your-linux-server/

Thanks
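As a rough illustration of the tc(8) suggestion, the netem queueing discipline can add a fixed delay with optional jitter to outgoing traffic on an interface. The sketch below only builds the commands; the interface name `eth0` is a placeholder, the commands need root to run, and the 500ms delay with 400ms jitter is just one way to approximate the 100ms-900ms range described in this report.

```python
import shlex


def netem_add_argv(dev, delay_ms, jitter_ms=0):
    """Build a tc command that adds delay to egress traffic on dev (hypothetical helper)."""
    argv = ["tc", "qdisc", "add", "dev", dev, "root", "netem", "delay", f"{delay_ms}ms"]
    if jitter_ms:
        # A second value after the delay is the random variation around it.
        argv.append(f"{jitter_ms}ms")
    return argv


def netem_del_argv(dev):
    """Build the tc command that removes the netem qdisc again."""
    return ["tc", "qdisc", "del", "dev", dev, "root", "netem"]


if __name__ == "__main__":
    # 500ms +/- 400ms roughly covers the 100ms-900ms range from the report.
    print(shlex.join(netem_add_argv("eth0", 500, 400)))
    print(shlex.join(netem_del_argv("eth0")))
```

Applying this on a wired link with no packet loss would help separate the effect of latency alone from whatever else the long-range wifi link is doing.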

Andreas Hasenack (ahasenack) wrote :

I'm going to set this to incomplete until there is more information.

Changed in openssh (Ubuntu):
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for openssh (Ubuntu) because there has been no activity for 60 days.]

Changed in openssh (Ubuntu):
status: Incomplete → Expired