bzr: ERROR: Invalid range access when going through a SQUID proxy

Bug #57723 reported by Ian Jackson
Affects: Bazaar
Status: Fix Released
Importance: Medium
Assigned to: John A Meinel

Bug Description

On edgy:

ian@liberator:/work/Upstart$ bzr get http://www.netsplit.com/bzr/libnih
bzr: ERROR: Invalid range access.
ian@liberator:/work/Upstart$ bzr get http://www.netsplit.com/bzr/libnih
bzr: ERROR: Target directory "libnih" already exists.
ian@liberator:/work/Upstart$ rm -r libnih
ian@liberator:/work/Upstart$ bzr --version
bzr (bazaar-ng) 0.9.0

In case it was due to the weirdo intercepting proxy we have here at the distro sprint, I tried going via my VPN connection to my home network's squid (sarge; its parent is another sarge box in my colo), and that worked no better.

Revision history for this message
John A Meinel (jameinel) wrote :

Hmm.. I'm using Dapper, and I was able to get a copy of that branch.

How quickly does it fail for you?

Do you already have a copy of some of the ancestry? (In which case it might be trying to do a partial download, rather than a full download, and be placing an incorrect HTTP partial get request)

What server is running at http://www.netsplit.com/ ?
The directory listing doesn't look like Apache.

Can you post the traceback from ~/.bzr.log?

Judging by the error, it seems we might be trying to read past the end of the file, or something like that.

You might also try this patch, since it should give a slightly better error message:

=== modified file 'bzrlib/errors.py'
--- bzrlib/errors.py 2006-08-22 21:31:23 +0000
+++ bzrlib/errors.py 2006-08-25 15:44:35 +0000
@@ -776,11 +776,13 @@

 class InvalidRange(TransportError):
-    """Invalid range access."""
+    """Invalid range access in %(path)s at %(offset)s."""

     def __init__(self, path, offset):
         TransportError.__init__(self, ("Invalid range access in %s at %d"
                                        % (path, offset)))
+        self.path = path
+        self.offset = offset

Revision history for this message
andi5 (andi5) wrote :

I experience this too, see the attachment for details.
I have no clue about bazaar_ng though.

Revision history for this message
John A Meinel (jameinel) wrote :

Thanks for the traceback. I was able to glean a couple things from it.

1) This should be failing fairly early, since it seems to be failing while accessing the inventory.knit.
http readv of inventory.knit collapsed 165 offsets => [[0, 82622], [83092, 91023]]

2) We are making a ranged request, such that we should be requesting something like:

Range: bytes=0-82622,83092-91023

3) And then we get this:
InvalidRange: Invalid range access in http://www.netsplit.com/bzr/libnih/.bzr/repository/inventory.knit at 0
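
For reference, the ranged request in point 2 can be sketched roughly as follows. This is only an illustration, not the bzrlib code: `range_header` is a hypothetical helper, and it assumes the collapsed offsets in the log are inclusive (start, end) byte positions.

```python
def range_header(ranges):
    """Build an HTTP Range header value from inclusive (start, end) pairs.

    Hypothetical helper for illustration; assumes each pair holds the first
    and last byte position of one range.
    """
    return "bytes=" + ",".join("%d-%d" % (start, end) for start, end in ranges)


# The two collapsed ranges reported in the traceback.
print("Range: " + range_header([(0, 82622), (83092, 91023)]))
```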

The code is doing:
        i = bisect(self._ranges, self._pos) - 1
        if i < 0 or self._pos > self._ranges[i]._ent_end:
            raise errors.InvalidRange(self._path, self._pos)

So it might be an issue with how we are using bisect(). Note that bisect() returns the insertion point *after* any existing entries equal to the value, which is why the code subtracts 1.

Doing:
import bisect
print bisect.bisect([0,1,2], 0)
1

Can you use the attached patch to give us some more understanding about why it is failing?

Revision history for this message
andi5 (andi5) wrote :

Right now I do not have the time to do more, but here is an update:
Bisect for pos: 0 failed. Found offset: -1, ranges:[]
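
That output is what the lookup would produce if no ranges were ever registered. A minimal sketch, assuming the range list can be modelled as a sorted list of start offsets (the real code keeps range objects, and `find_range_index` is a hypothetical name):

```python
from bisect import bisect


def find_range_index(range_starts, pos):
    """Return the index of the last range starting at or before pos.

    bisect() gives the insertion point after any equal entries, so
    subtracting 1 yields the candidate range, or -1 if there is none.
    """
    return bisect(range_starts, pos) - 1


print(find_range_index([], 0))              # empty list: no range can match
print(find_range_index([0, 83092], 0))      # position 0 falls in the first range
print(find_range_index([0, 83092], 90000))  # falls in the second range
```

With an empty list the index is -1 for every position, so the `i < 0` check raises InvalidRange straight away.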

Revision history for this message
John A Meinel (jameinel) wrote :

So this seems like it could be 1 of 2 things.

1) The download failed and we have 0 bytes available, and we are failing with the wrong error message.

2) The download succeeded, but with a complete file, rather than a set of byte-ranges.

However, if the server returns a '200' complete response, we just return a StringIO file, not a RangeFile, so we shouldn't even be here.

If the server returns a 206 partial content file, then there should either be a 'Content-Range' header for a single range, or a Content-Type header indicating it is Multipart.

My best guess is that the server is returning a Multipart file, but we aren't detecting this correctly, so we try to parse it as a single range file, but the 'Content-Range' header is empty.
*Or* we are treating it as a multipart, but we are failing to find the content boundaries properly.
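
The cases above can be sketched as a small decision function. This is an assumption-laden illustration rather than the bzrlib logic: `classify_response` is a hypothetical name, and the headers are taken to be a plain dict with canonical header names.

```python
def classify_response(status, headers):
    """Decide how a response body should be interpreted.

    Hypothetical helper: 'headers' is assumed to be a plain dict keyed by
    canonical header names such as 'Content-Type' and 'Content-Range'.
    """
    if status == 200:
        # Complete file, no range handling needed.
        return "whole-file"
    if status == 206:
        content_type = headers.get("Content-Type", "")
        if content_type.lower().startswith("multipart/byteranges"):
            return "multipart"
        if headers.get("Content-Range"):
            return "single-range"
        raise ValueError("206 response without Content-Range or multipart type")
    raise ValueError("unexpected status %d" % status)
```

Checking the multipart type before the Content-Range header means a multipart reply is never mistaken for a single range with an empty header.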

Unfortunately, that specific branch seems to work just fine for me. So I'm not able to reproduce the problem here. I can probably work out some more patches to be tried. But I think it would be better to meet on IRC or something, so we can have a more interactive debugging session.

Revision history for this message
John A Meinel (jameinel) wrote :

Squid is setting the multipart boundary to something like:
'multipart/byteranges; boundary="squid/2.6.STABLE1:2F333CDABEAF766ABD9130F8A21BA7E1"'

Note that the boundary is surrounded by "double quotes"

This seems to be allowed according to this spec:
http://www.w3.org/Protocols/rfc1341/7_2_Multipart.html

Will attach a patch soon.

description: updated
Changed in bzr:
assignee: nobody → jameinel
importance: Untriaged → Medium
status: Unconfirmed → Confirmed
Revision history for this message
John A Meinel (jameinel) wrote :

The associated patch changes bzr to strip the surrounding double quotes from the boundary if they are present. This makes it work through a squid proxy.
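
A rough sketch of the idea, not the actual patch: `extract_boundary` is a hypothetical helper that pulls the boundary parameter out of a Content-Type value and drops one pair of surrounding double quotes. It splits naively on ';', so it assumes the boundary itself contains no semicolon.

```python
def extract_boundary(content_type):
    """Return the multipart boundary from a Content-Type value, or None.

    Hypothetical helper for illustration. Surrounding double quotes, as
    sent by squid, are removed; an unquoted boundary is returned as-is.
    """
    for param in content_type.split(";")[1:]:
        name, sep, value = param.strip().partition("=")
        if sep and name.strip().lower() == "boundary":
            value = value.strip()
            if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
                value = value[1:-1]
            return value
    return None


squid_type = ('multipart/byteranges; '
              'boundary="squid/2.6.STABLE1:2F333CDABEAF766ABD9130F8A21BA7E1"')
print(extract_boundary(squid_type))
```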

Revision history for this message
John A Meinel (jameinel) wrote :

Fixed in bzr-0.11.

Changed in bzr:
status: Confirmed → Fix Released