bzr: ERROR: paramiko.SSHException: Key-exchange timed out; consistent after sending 1GB data

Bug #556132 reported by Chuck Wilder on 2010-04-06
30
This bug affects 5 people
Affects Status Importance Assigned to Milestone
Bazaar
Medium
Unassigned
Launchpad itself
Medium
Michael Hudson-Doyle
Twisted
Fix Released
Unknown

Bug Description

Session log:

Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\Chuck>bzr launchpad-login chuckw20

C:\Documents and Settings\Chuck>bzr get lp:widelands-media
Connected (version 2.0, client Twisted)
Authentication (publickey) successful!
Secsh channel 1 opened.
bzr: ERROR: paramiko.SSHException: Key-exchange timed out waiting for key negoti
ation

Traceback (most recent call last):
  File "bzrlib\commands.pyo", line 853, in exception_to_return_code
  File "bzrlib\commands.pyo", line 1055, in run_bzr
  File "bzrlib\commands.pyo", line 661, in run_argv_aliases
  File "bzrlib\commands.pyo", line 665, in run_direct
  File "bzrlib\cleanup.pyo", line 122, in run_simple
  File "bzrlib\cleanup.pyo", line 156, in _do_with_cleanups
  File "bzrlib\builtins.pyo", line 1232, in run
  File "bzrlib\bzrdir.pyo", line 1184, in sprout
  File "bzrlib\repository.pyo", line 1704, in fetch
  File "bzrlib\decorators.pyo", line 194, in write_locked
  File "bzrlib\repository.pyo", line 3432, in fetch
  File "bzrlib\fetch.pyo", line 83, in __init__
  File "bzrlib\fetch.pyo", line 109, in __fetch
  File "bzrlib\fetch.pyo", line 137, in _fetch_everything_for_search
  File "bzrlib\repository.pyo", line 4274, in insert_stream
  File "bzrlib\repository.pyo", line 4307, in _locked_insert_stream
  File "bzrlib\groupcompress.pyo", line 1598, in insert_record_stream
  File "bzrlib\groupcompress.pyo", line 1653, in _insert_record_stream
  File "bzrlib\versionedfile.pyo", line 1665, in read
  File "bzrlib\smart\repository.pyo", line 578, in iter_substream_bytes
  File "bzrlib\smart\repository.pyo", line 568, in iter_stream_decoder
  File "bzrlib\smart\message.pyo", line 336, in read_streamed_body
  File "bzrlib\smart\message.pyo", line 277, in _read_more
  File "bzrlib\smart\medium.pyo", line 464, in read_bytes
  File "bzrlib\smart\medium.pyo", line 476, in _read_bytes
  File "bzrlib\smart\medium.pyo", line 151, in read_bytes
  File "bzrlib\smart\medium.pyo", line 817, in _read_bytes
  File "bzrlib\osutils.pyo", line 1991, in until_no_eintr
  File "paramiko\file.pyo", line 153, in read
  File "paramiko\channel.pyo", line 1215, in _read
  File "paramiko\channel.pyo", line 595, in recv
  File "paramiko\transport.pyo", line 1424, in _send_user_message
SSHException: Key-exchange timed out waiting for key negotiation

bzr 2.1.0 on python 2.5.4 (Windows-XP-5.1.2600-SP3)
arguments: ['bzr', 'get', 'lp:widelands-media']
encoding: 'cp1252', fsenc: 'mbcs', lang: None
plugins:
  bzrtools C:\Program Files\Bazaar\plugins\bzrtools [2.1.0]
  explorer C:\Program Files\Bazaar\plugins\explorer [1.0.1]
  launchpad C:\Program Files\Bazaar\plugins\launchpad [2.1.0]
  netrc_credential_store C:\Program Files\Bazaar\plugins\netrc_credential_store
[2.1.0]
  news_merge C:\Program Files\Bazaar\plugins\news_merge [2.1.0]
  qbzr C:\Program Files\Bazaar\plugins\qbzr [0.18.3]
  rebase C:\Program Files\Bazaar\plugins\rebase [0.5.5]
  svn C:\Program Files\Bazaar\plugins\svn [1.0.2]
  upload C:\Program Files\Bazaar\plugins\upload [1.0.0dev]
  xmloutput C:\Program Files\Bazaar\plugins\xmloutput [0.8.6]

*** Bazaar has encountered an internal error. This probably indicates a
    bug in Bazaar. You can help us fix it by filing a bug report at
        https://bugs.launchpad.net/bzr/+filebug
    including this traceback and a description of the problem.

C:\Documents and Settings\Chuck>

Related branches

Chuck Wilder (chuckw20) wrote :

Error was encountered approximately 20:20 EDT on April 5, 2010

Andrew Bennetts (spiv) wrote :

Interesting. We should probably catch this and raise a ConnectionError in this case. (And in general perhaps we should trap SSHException and raise a TransportError instead of dumping a traceback). So this error would look like:

C:\Documents and Settings\Chuck>bzr get lp:widelands-media
Connected (version 2.0, client Twisted)
Authentication (publickey) successful!
Secsh channel 1 opened.
Connection error: SSH failure: Key-exchange timed out waiting for key negotiation

As for why this happened to you, perhaps port 22 is firewalled? Or maybe there was simply an intermittent network problem. Either way, I don't think bzr can do much about it other than report the error as clearly as possible.

Changed in bzr:
importance: Undecided → Medium
status: New → Confirmed
tags: added: paramiko

On Thu, Apr 8, 2010 at 8:52 PM, Andrew Bennetts <
<email address hidden>> wrote:

> Interesting. We should probably catch this and raise a ConnectionError
> in this case. (And in general perhaps we should trap SSHException and
> raise a TransportError instead of dumping a traceback). So this error
> would look like:
>
> C:\Documents and Settings\Chuck>bzr get lp:widelands-media
> Connected (version 2.0, client Twisted)
> Authentication (publickey) successful!
> Secsh channel 1 opened.
> Connection error: SSH failure: Key-exchange timed out waiting for key
> negotiation
>
> As for why this happened to you, perhaps port 22 is firewalled? Or
> maybe there was simply an intermittent network problem. Either way, I
> don't think bzr can do much about it other than report the error as
> clearly as possible.
>
I am able to do bzr branch and bzr get successfully with other trunks.
Could the size play a role? widelands-media tilts the scales at over
1.5Gb. Plus it is only on Windows boxes (XP and Vista Pro) that I
encountered the error. I can bring it down on my Ubuntu boxes without the
error, so I have a work-around for now.

It's darn frustrating because it seems to transfer just about all of the
data, then craps out in the final stages. I thought about setting up a
different SSH key with Launchpad, but haven't gotten around to trying it.

Thanks for your response.

Andrew Bennetts (spiv) wrote :

Chuck Wilder wrote:
[...]
> > C:\Documents and Settings\Chuck>bzr get lp:widelands-media
> > Connected (version 2.0, client Twisted)
> > Authentication (publickey) successful!
> > Secsh channel 1 opened.
> > Connection error: SSH failure: Key-exchange timed out waiting for key
> > negotiation
> >
> > As for why this happened to you, perhaps port 22 is firewalled? Or
> > maybe there was simply an intermittent network problem. Either way, I
> > don't think bzr can do much about it other than report the error as
> > clearly as possible.
> >
> I am able to do bzr branch and bzr get successfully with other trunks.
> Could the size play a role? widelands-media tilts the scales at over
> 1.5Gb. Plus it is only on Windows boxes (XP and Vista Pro) that I
> encountered the error. I can bring it down on my Ubuntu boxes without the
> error, so I have a work-around for now.
>
> It's darn frustrating because it seems to transfer just about all of the
> data, then craps out in the final stages. I thought about setting up a
> different SSH key with Launchpad, but haven't gotten around to trying it.
>
> Thanks for your response.

Huh, that's weird! I'd expect key-exchange to be something that happens very
early in the SSH conversation, well before bzr starts making requests about
specific branches or anything like that. But you're right, the traceback shows
it has actually gotten a lot further than I thought. I wonder if paramiko is
giving an incorrect error message?

How long does it run for, exactly? If it's an hour, then maybe you're
experiencing bug 537116.

The contents of bzr's log file ("bzr --version" will tell you where to find it)
after doing "bzr get -Dhpss -Dhpssdetail lp:widelands-media" would be
interesting.

As with bug 537116, a workaround is probably to fetch it incrementally, e.g.:

    bzr branch -r1000 lp:widelands-media
    cd widelands-media
    bzr pull -r2000
    bzr pull -r3000
    ...

Adjust the numbers to whatever works well for you.

Chuck Wilder (chuckw20) wrote :

On Fri, Apr 9, 2010 at 12:09 AM, Andrew Bennetts <
<email address hidden>> wrote:

>
> Huh, that's weird! I'd expect key-exchange to be something that happens
> very
> early in the SSH conversation, well before bzr starts making requests about
> specific branches or anything like that. But you're right, the traceback
> shows
> it has actually gotten a lot further than I thought. I wonder if paramiko
> is
> giving an incorrect error message?
>
> How long does it run for, exactly? If it's an hour, then maybe you're
> experiencing bug 537116.
>
It runs well over an hour, closer to two.

> The contents of bzr's log file ("bzr --version" will tell you where to find
> it)
> after doing "bzr get -Dhpss -Dhpssdetail lp:widelands-media" would be
> interesting.
>
> As with bug 537116, a workaround is probably to fetch it incrementally,
> e.g.:
>
> bzr branch -r1000 lp:widelands-media
> cd widelands-media
> bzr pull -r2000
> bzr pull -r3000
> ...
>
> Adjust the numbers to whatever works well for you.
>

Thanks for the hint I'll give it a try tomorrow, if I can.

I also encountered this error (using the same repository), I have the following additional comments:

1) For some reason the problem seems to only come up on Windows, and not on ubuntu (I could personally finally circumvent it by branching in a virtual machine ubuntu), this could mean the package in ubuntu doesn't have the problems yet, or hundreds of other things, you probably know better than me :)

2) For me making a Lightweight checkout works, only branching (or not lightweight checkout) had this problem. This might just be because that is smaller and thus finishes sooner, but I don't think the whole branching took 1 hour either.

3) Just wanted to make note that this repository recently had some mix-ups in it's history (some bad commits that removed revisions, that were later put back), and the problems started appearing since then. So, it might be some problem with launchpad's storage of this repository, or something like that, I don't know if that's possible. (Those changes also increased the size, and undoing them didn't decrease them I think, so there might be some artifacts remaining)

Other than that, I can confirm that yes, it downloads the more than 1GB data, and only drops out then (with all data erased, the repository stays at ~0 size). If I have time I might try the things you mentioned.

Thanks for your time!

Chuck Wilder (chuckw20) wrote :
Download full text (885.8 KiB)

Here are my latest results:

Traceback follows:

Microsoft Windows [Version 6.0.6002]
Copyright (c) 2006 Microsoft Corporation. All rights reserved.

C:\Users\Chuck>bzr get -Dhpssdetail lp:widelands-media
Connected (version 2.0, client Twisted)
Authentication (publickey) successful!
Secsh channel 1 opened.
bzr: ERROR: paramiko.SSHException: Key-exchange timed out waiting for key negoti
ation

Traceback (most recent call last):
  File "bzrlib\commands.pyo", line 853, in exception_to_return_code
  File "bzrlib\commands.pyo", line 1055, in run_bzr
  File "bzrlib\commands.pyo", line 661, in run_argv_aliases
  File "bzrlib\commands.pyo", line 665, in run_direct
  File "bzrlib\cleanup.pyo", line 122, in run_simple
  File "bzrlib\cleanup.pyo", line 156, in _do_with_cleanups
  File "bzrlib\builtins.pyo", line 1232, in run
  File "bzrlib\bzrdir.pyo", line 1184, in sprout
  File "bzrlib\repository.pyo", line 1704, in fetch
  File "bzrlib\decorators.pyo", line 194, in write_locked
  File "bzrlib\repository.pyo", line 3432, in fetch
  File "bzrlib\fetch.pyo", line 83, in __init__
  File "bzrlib\fetch.pyo", line 109, in __fetch
  File "bzrlib\fetch.pyo", line 137, in _fetch_everything_for_search
  File "bzrlib\repository.pyo", line 4274, in insert_stream
  File "bzrlib\repository.pyo", line 4307, in _locked_insert_stream
  File "bzrlib\groupcompress.pyo", line 1598, in insert_record_stream
  File "bzrlib\groupcompress.pyo", line 1653, in _insert_record_stream
  File "bzrlib\versionedfile.pyo", line 1665, in read
  File "bzrlib\smart\repository.pyo", line 578, in iter_substream_bytes
  File "bzrlib\smart\repository.pyo", line 568, in iter_stream_decoder
  File "bzrlib\smart\message.pyo", line 336, in read_streamed_body
  File "bzrlib\smart\message.pyo", line 277, in _read_more
  File "bzrlib\smart\medium.pyo", line 464, in read_bytes
  File "bzrlib\smart\medium.pyo", line 476, in _read_bytes
  File "bzrlib\smart\medium.pyo", line 151, in read_bytes
  File "bzrlib\smart\medium.pyo", line 817, in _read_bytes
  File "bzrlib\osutils.pyo", line 1991, in until_no_eintr
  File "paramiko\file.pyo", line 153, in read
  File "paramiko\channel.pyo", line 1215, in _read
  File "paramiko\channel.pyo", line 595, in recv
  File "paramiko\transport.pyo", line 1424, in _send_user_message
SSHException: Key-exchange timed out waiting for key negotiation

bzr 2.1.0 on python 2.5.4 (Windows-Vista-6.0.6002-SP2)
arguments: ['bzr', 'get', '-Dhpssdetail', 'lp:widelands-media']
encoding: 'cp1252', fsenc: 'mbcs', lang: None
plugins:
  bzrtools C:\Program Files (x86)\Bazaar\plugins\bzrtools [2.1.0]
  explorer C:\Program Files (x86)\Bazaar\plugins\explorer [1.0.1]
  launchpad C:\Program Files (x86)\Bazaar\plugins\launchpad [2.1.0]
  netrc_credential_store C:\Program Files (x86)\Bazaar\plugins\netrc_credential_
store [2.1.0]
  news_merge C:\Program Files (x86)\Bazaar\plugins\news_merge [2.1.0]
  qbzr C:\Program Files (x86)\Bazaar\plugins\qbzr [0.18.3]
  rebase C:\Program Files (x86)\Bazaar\plugins\rebase [0.5.5]
  svn C:\Program Files (x86)\Bazaar\plugins\svn [1.0.2]
  upload C:\P...

On 9 April 2010 14:09, Andrew Bennetts <email address hidden> wrote:
> Huh, that's weird!  I'd expect key-exchange to be something that happens very
> early in the SSH conversation, well before bzr starts making requests about
> specific branches or anything like that.  But you're right, the traceback shows
> it has actually gotten a lot further than I thought.  I wonder if paramiko is
> giving an incorrect error message?

Perhaps SSH changes keys after a certain amount of time or traffic in
a single session? Or perhaps this is just bug 537116.
>
> How long does it run for, exactly?  If it's an hour, then maybe you're
> experiencing bug 537116.

Andrew Bennetts (spiv) wrote :

Martin Pool wrote:
> Perhaps SSH changes keys after a certain amount of time or traffic in
> a single session? Or perhaps this is just bug 537116.

Co-incidentally, I've just bumped into the fact that SSH does indeed, by
default, re-key after 1G to 4G, and glancing at paramiko it does it after 1G —
which is how much data is being reported by the activity logging in this case.

So, I'm pretty sure this is either a bug in paramiko, or Launchpad's Conch-based
SSH server (or both!), but not in bzr proper. The prime suspect would be
Launchpad I think, so I'm reassigning to LP guys to investigate. I'm leaving
this open against bzr though, because we should not be treating SSHException
like an internal error.

 affects launchpad-code

It seems that conch does have a bug wrt rekeying, as I can make bzr commands hang by lowering the rekeying limit in both conch and paramiko.

My WAG theory is that conch just blithely goes on sending data from the bzr subprocess to the client even though it's supposed to be waiting for the key exchange.

Fixing this is not likely to be a barrel of laughs.

Oh, this also suggests a workaround: pull in batches, as Andrew recommends, that transfer safely less than a gigabyte each.

Changed in twisted:
status: Unknown → New
Paul Hummer (rockstar) on 2010-04-15
Changed in launchpad-code:
status: New → Triaged
importance: Undecided → Medium

FWIW, the bug in conch is reported as http://twistedmatrix.com/trac/ticket/4395

Martin Pool (mbp) on 2010-11-19
summary: - bzr: ERROR: paramiko.SSHException:
+ bzr: ERROR: paramiko.SSHException: Key-exchange timed out; consistent
+ after sending 1GB data
Changed in launchpad:
assignee: nobody → Michael Hudson-Doyle (mwhudson)
milestone: none → 11.01
status: Triaged → In Progress
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
Changed in twisted:
status: New → Fix Released
tags: added: qa-ok
removed: qa-needstesting
Martin Pool (mbp) wrote :

Thanks for fixing this, Exarkun. And thanks, mwh, for both the big review and for landing this.

http://tahoe-lafs.org/trac/tahoe-lafs/wiki/SftpFrontend?action=diff&version=61&old_version=57 points out you can work around this by setting the SSH option 'RekeyLimit 0'.

Do we need to change anything here in Bazaar? It seems to me that the bug affects paramiko when it receives a rekeying request. Rekeying can be initiated by either the client or the server.

So we don't need to change bzr itself, but we should use a new version of Conch when we next build the windows/mac installers, and we should ask for conch to be updated in Ubuntu. The latter is less important, because people there will very likely be using openssh, so it may not even be worth filing a bug for it.

Changed in bzr:
status: Confirmed → Invalid

> So we don't need to change bzr itself, but we should use a new version
> of Conch when we next build the windows/mac installers, and we should

We don't include Conch in the installers do we? Paramiko is the SSH
client we bundle, and it doesn't use Twisted.

We use Conch as the SSH server for Launchpad, not as a client anywhere
that I know of.

Martin Pool (mbp) wrote :

On 10 January 2011 17:51, Andrew Bennetts <email address hidden> wrote:
>> So we don't need to change bzr itself, but we should use a new version
>> of Conch when we next build the windows/mac installers, and we should
>
> We don't include Conch in the installers do we?  Paramiko is the SSH
> client we bundle, and it doesn't use Twisted.
>
> We use Conch as the SSH server for Launchpad, not as a client anywhere
> that I know of.

You're right, of course.

--
Martin

Martin Pool (mbp) wrote :

I tested fetching the orginal (2.2GB) branch from Launchpad (lpnet) over ssh, and it worked properly. So I can confirm this is fixed.

Changed in launchpad:
status: Fix Committed → Fix Released
Martin Pool (mbp) on 2011-01-12
Changed in launchpad:
status: Fix Released → Fix Committed
Martin Pool (mbp) wrote :

Tim pointed out that the ssh server has not yet been upgraded to the revision that fixes this.

I think I got a false pass because OpenSSH defaults to never initiating rekeying from the client. If I set 'RekeyLimit 50k' then it does hang, and presumably will eventually time out. On the other hand running the same command against qastaging, which does have this fix, succeeds.

So this should be fixed on the next major/with-downtime rollout of Launchpad, in a couple of weeks(?).

Changed in launchpad:
status: Fix Committed → Fix Released

Somehow I didn't read this when I marked the bug fix released.

On Wed, 12 Jan 2011 03:47:11 -0000, Martin Pool <email address hidden> wrote:
> Tim pointed out that the ssh server has not yet been upgraded to the
> revision that fixes this.

I'm now confused by timezones, but when did he say this? The SSH server
was upgraded to a revision that tried to fix this bug by the 11.01
update that happened during the rally.

> I think I got a false pass because OpenSSH defaults to never initiating
> rekeying from the client.

Er. This is not true, according to my understanding of things.

man ssh_config ->

     RekeyLimit
             Specifies the maximum amount of data that may be transmitted
             before the session key is renegotiated. The argument is the num‐
             ber of bytes, with an optional suffix of ‘K’, ‘M’, or ‘G’ to
             indicate Kilobytes, Megabytes, or Gigabytes, respectively. The
             default is between ‘1G’ and ‘4G’, depending on the cipher. This
             option applies to protocol version 2 only.

I'm not sure whether openssh's sshd ever initiates rekeys (I think it
may do so based on time rather than data, according to some mailing list
posts I found?) but Launchpad's conch-based one certainly does not.

> If I set 'RekeyLimit 50k' then it does hang, and presumably will
> eventually time out. On the other hand running the same command
> against qastaging, which does have this fix, succeeds.

Note that rekeying involves many round trips and so frequent rekeying
over a high latency link (such as over wifi at the rally) will be very
slow. How did you determine it had hung?

> So this should be fixed on the next major/with-downtime rollout of
> Launchpad, in a couple of weeks(?).

Did you write this before or after the 11.01 rollout?

Cheers,
mwh

Martin Pool (mbp) wrote :

On 17 January 2011 16:28, Michael Hudson-Doyle
<email address hidden> wrote:
>> I think I got a false pass because OpenSSH defaults to never initiating
>> rekeying from the client.
>
> Er.  This is not true, according to my understanding of things.

Sorry, I meant to say "not rekeying within the first 1GB".

>> If I set 'RekeyLimit 50k' then it does hang, and presumably will
>> eventually time out.  On the other hand running the same command
>> against qastaging, which does have this fix, succeeds.
>
> Note that rekeying involves many round trips and so frequent rekeying
> over a high latency link (such as over wifi at the rally) will be very
> slow.  How did you determine it had hung?

It was stuck for over a minute, with no IO.

>> So this should be fixed on the next major/with-downtime rollout of
>> Launchpad, in a couple of weeks(?).
>
> Did you write this before or after the 11.01 rollout?

Before. It really is FixReleased now.

--
Martin

Daira Hopwood (daira) wrote :

To avoid confusion for anyone finding this bug via a web search, the option 'RekeyLimit 0' to ssh does not appear to work to disable rekeying (the error message, only displayed when you enable debugging, is "RekeyLimit too small").

Changed in twisted:
status: Fix Released → New
piotr zimoch (ebytyes) on 2013-06-14
Changed in bzr:
status: Invalid → New
piotr zimoch (ebytyes) on 2013-06-14
Changed in bzr:
status: New → Incomplete
status: Incomplete → Opinion
status: Opinion → Invalid
status: Invalid → Confirmed
status: Confirmed → In Progress
status: In Progress → Fix Committed
status: Fix Committed → Fix Released

piotr zimoch could you please explain what are you doing?

Martin Packman (gz) on 2013-06-14
Changed in bzr:
status: Fix Released → Invalid
Changed in twisted:
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.