failure to fetch from 1.9 to a 2a over bzr+ssh (revision bdecode failure)

Bug #424444 reported by John A Meinel
This bug affects 1 person
Affects   Status     Importance  Assigned to      Milestone
Bazaar    Confirmed  Critical    Andrew Bennetts

Bug Description

I'm not sure what is failing here, but the situation is:

1) I have a local heavyweight checkout of a bzr+ssh://myserver branch
2) I'm trying to pull into that branch from lp (which is also then bzr+ssh)
3) I get a traceback with:
    readv_body=readv_body, body_stream=body_stream)
  File "C:\Users\jameinel\dev\bzr\bzr.dev\bzrlib\smart\client.py", line 42, in _send_request
    protocol_version)
  File "C:\Users\jameinel\dev\bzr\bzr.dev\bzrlib\smart\client.py", line 112, in _construct_protocol
    request = self._medium.get_request()
  File "C:\Users\jameinel\dev\bzr\bzr.dev\bzrlib\smart\medium.py", line 699, in get_request
    return SmartClientStreamMediumRequest(self)
  File "C:\Users\jameinel\dev\bzr\bzr.dev\bzrlib\smart\medium.py", line 904, in __init__
    raise errors.TooManyConcurrentRequests(self._medium)
TooManyConcurrentRequests: The medium 'SmartSSHClientMedium(connected=True, username=None, host='juju.arbash-meinel.com', port=None)' has reached its concurrent request limit. Be sure to finish_writing and finish_reading on the currently open request.

The branch I am pulling from is:
  lp:~johnf-inodes/bzr/ppa-doc

Which is a 1.9 format branch.

My best guess is that something about the conversion code is triggering a code path that is trying to open multiple connections to my master branch. I'm investigating now.

John A Meinel (jameinel) wrote :

I'll note that after the fetch fails, it leaves the master branch in a write-locked state.

So it is possible that the TooManyConcurrentRequests issue is just because we are getting a different exception while streaming, and that the code to unlock is the bit responsible for the TooManyConcurrentRequests failure, masking the real failure.
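The suspected masking can be sketched in a few lines. This is a toy model, not bzrlib code: ToyMedium, fetch and demo are hypothetical names, and the only assumption carried over from the traceback is that a medium refuses a new request while an earlier one is still open.

```python
class TooManyConcurrentRequests(Exception):
    """Toy stand-in for the bzrlib error of the same name."""


class ToyMedium:
    """Hypothetical medium that allows one in-flight request at a time."""

    def __init__(self):
        self._current_request = None

    def get_request(self):
        if self._current_request is not None:
            raise TooManyConcurrentRequests("previous request not finished")
        self._current_request = object()
        return self._current_request

    def finish_request(self):
        self._current_request = None


def fetch(medium, fail_while_streaming):
    # "Lock" the remote branch: one complete request.
    medium.get_request()
    medium.finish_request()
    try:
        # Start the streaming request.
        medium.get_request()
        if fail_while_streaming:
            # The real failure; the streaming request is left open.
            raise ValueError("bdecode failed")
        medium.finish_request()
    finally:
        # "Unlock" needs a fresh request, which the open one blocks.
        medium.get_request()
        medium.finish_request()


def demo():
    try:
        fetch(ToyMedium(), fail_while_streaming=True)
    except TooManyConcurrentRequests as exc:
        # The caller sees the unlock failure; the original error is only
        # reachable through __context__ (Python 3 exception chaining).
        return type(exc).__name__, type(exc.__context__).__name__
```

In this sketch the unlock never completes, which would also be consistent with the master branch being left write-locked.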

John A Meinel (jameinel) wrote :

So I fetched into a different branch, and found the real error:
 File "C:\Users\jameinel\dev\bzr\work\bzrlib\remote.py", line 1912, in missing_parents_rev_handler
   revision = self.serialiser.read_revision_from_string(revision_bytes)
 File "C:\Users\jameinel\dev\bzr\work\bzrlib\chk_serializer.py", line 104, in read_revision_from_string
   ret = bencode.bdecode(text)
 File "_bencode_pyx.pyx", line 218, in bzrlib._bencode_pyx.bdecode
 File "_bencode_pyx.pyx", line 83, in bzrlib._bencode_pyx.Decoder.decode
 File "_bencode_pyx.pyx", line 113, in bzrlib._bencode_pyx.Decoder._decode_object

So it seems there is something seriously wrong with johnf's ppa branch, but I don't quite understand what yet.

Now we have 2 bugs
1) something created an invalid bencode stream
2) getting an error during streaming can cause TooManyConcurrentRequests during unlock, suppressing the real error
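For the first item, here is a minimal sketch of how a bencode decoder rejects a damaged stream. It is a simplified pure-Python decoder written for illustration, not bzrlib's _bencode_pyx; the function names are hypothetical, it does not enforce that dictionary keys are byte strings, and the choice of ValueError is an assumption of this sketch.

```python
def bdecode(data):
    """Decode one bencoded value from bytes, rejecting trailing data."""
    value, pos = _decode(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data, pos):
    if pos >= len(data):
        raise ValueError("unexpected end of stream")
    lead = data[pos:pos + 1]
    if lead == b"i":
        # Integer: i<digits>e (index raises ValueError if 'e' is missing).
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":
        # List: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        # Dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = _decode(data, pos)
            result[key], pos = _decode(data, pos)
        return result, pos + 1
    if lead.isdigit():
        # Byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        end = start + length
        if end > len(data):
            raise ValueError("string runs past end of stream")
        return data[start:end], end
    raise ValueError("unknown type marker at offset %d" % pos)
```

A well-formed value such as b"d3:fooi42e4:spaml1:a1:bee" decodes to a dictionary, while the same bytes cut off partway through raise ValueError instead of returning partial data.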

Robert Collins (lifeless) wrote :

Do we get an actual exception here? Or is it perhaps a pyx parser bug?

Robert Collins (lifeless) wrote :

Looking at this I don't think it's directly tied to 2a, untargeting from 2.0: there is no reason to think it's going to be widespread at this point.

I've done the following:
branched ppa-doc to /tmp [succeeds]
branched my 2.0 branch to /tmp to get a clean environment
merged from ppa-doc [succeeds]
recreated /tmp/2.0
in ppa-doc done 'bzr serve' with bzr.dev
in 2.0 done bzr merge bzr://localhost [succeeds]

So - I can't reproduce the bug with current code, and the networking layer does seem able to work.

Perhaps it is a bug in the 1.17 codebase launchpad is using?

Changed in bzr:
milestone: 2.0 → none
John A Meinel (jameinel)
summary: - failure to fetch from 1.9 to a 2a heavy checkout of bzr+ssh
+ failure to fetch from 1.9 to a 2a over bzr+ssh (revision bdecode
+ failure)
Robert Collins (lifeless) wrote :

I think this was determined to be a bug in the 1.17 smart server verb; we probably need to stop using that verb to avoid the bug. Andrew, assigning to you to get your commentary, not to ask you to fix :)

Changed in bzr:
assignee: John A Meinel (jameinel) → Andrew Bennetts (spiv)
Andrew Bennetts (spiv) wrote :

Yes, I believe this is in the 1.17 verb. Newer servers will refuse this request in this situation (cross-format fetch to 2a, IIRC), and newer clients should be using a newer verb without this bug.

We perhaps should make newer clients also fall back to vfs rather than the potentially doomed verb in this situation, so that newer clients with older servers won't fail either. (At a glance RemoteStreamSource could check self.from_repository._format.network_name() and self.to_format.network_name() before deciding whether or not it is safe to try the older verb).
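A rough sketch of that check, under stated assumptions: choose_fetch_path and its return values are hypothetical and not part of bzrlib, the format names in the usage note are placeholders rather than real network names, and treating any mismatch between source and target format as unsafe is a simplification of "cross-format fetch to 2a".

```python
def choose_fetch_path(server_has_newer_verb, from_network_name, to_network_name):
    """Pick how to fetch: 'newer-verb', 'older-verb' or 'vfs'.

    Hypothetical helper: the network names are the strings that
    network_name() would return for the source and target formats.
    """
    if server_has_newer_verb:
        # Newer servers and clients use the verb without the bug.
        return "newer-verb"
    if from_network_name != to_network_name:
        # Assumption: a cross-format fetch is where the older verb can
        # produce a bad stream, so avoid it and use plain vfs access.
        return "vfs"
    # Same format on both sides: the older verb is assumed to be safe.
    return "older-verb"
```

For example, choose_fetch_path(False, "format-A", "format-B") gives "vfs", so a newer client talking to an older server would be slower but would not hit the doomed verb.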

Launchpad is now running 2.0.0 on the server though, so perhaps this isn't Critical importance anymore?

Andrew Bennetts (spiv) wrote :

This appears to be the same as bug 427736 (same branch, even!), which is now fixed.
