Invalid http response ... Expected a boundary

Bug #198646 reported by Stuart Bishop on 2008-03-05
This bug affects 5 people
Affects  Status        Importance  Assigned to       Milestone
Bazaar                 Medium      Martin Pool       2.3
Bazaar                 Medium      Martin Pool       2.4
Squid    Fix Released  Medium      Henrik Nordström

Bug Description

You probably want to flag this as invalid or wontfix, as the likely cause is that I am stuck behind the world's worst transparent HTTP proxy (True Thailand).

$ bzr branch lp:bzr dev
lp:bzr is redirected to http://bazaar.launchpad.net/~bzr/bzr/trunk/
bzr: ERROR: Invalid http response for http://bazaar.launchpad.net/%7Ebzr/bzr/trunk/.bzr/repository/packs/4dc271c654e1e17ae1d88cb28202a418.pack: Expected a boundary (pWIiv3GLm(uHCu:NYZOy) line, got ''

Stuart Bishop (stub) wrote :
John A Meinel (jameinel) wrote :

I actually got the same error from the Canonical squid proxy in the datacenter. It generally takes about 15min before it realizes there is a problem (it seems to copy the revision information, but when it comes to copying the inventory information it fails).

bzr: ERROR: Invalid http response for http://bazaar-vcs.org/bzr/.bzr/repository/packs/bf8c313b551d33e58f99ae1d130c3aa0.pack: Expected a boundary (squid/2.5.STABLE12:F0AD0F1A80E2FBA0AEE78B80B6E2369C) line, got ''

real 15m14.517s
user 0m10.920s
sys 0m0.370s

I'm not 100% positive how we are abusing the system. I'm guessing we are sending a complex range request, which makes squid decide it needs to grab the whole file and then parse it, but then it actually falls over when it tries to respond to the original request.

John A Meinel (jameinel) wrote :

It seems this is happening because of how we are reading pack files. We generally make partial requests for the file, which works fine. However, we also tack on a partial request at the beginning of the file to make sure that the target file has the right header, and that seems to "abuse" proxies.

To give a concrete example, this request hangs for 15min and then dies with the boundary failure:
14.969 > GET http://bazaar-vcs.org/bzr/.bzr/repository/packs/bf8c313b551d33e58f99ae1d130c3aa0.pack
14.969 > Connection: Keep-Alive
> Accept: */*
> User-agent: bzr/1.2.0.dev.0 (urllib)
> Host: bazaar-vcs.org
> Pragma: no-cache
> Cache-control: max-age=0
> Range: bytes=0-41,4084836-6202195,6204756-9312268,9356994-21260316

If I split the requests so that the first range in any readv is sent as a separate request, then this succeeds:

17.372 > GET http://bazaar-vcs.org/bzr/.bzr/repository/packs/bf8c313b551d33e58f99ae1d130c3aa0.pack
17.372 > Connection: Keep-Alive
> Accept: */*
> User-agent: bzr/1.2.0.dev.0 (urllib)
> Host: bazaar-vcs.org
> Pragma: no-cache
> Cache-control: max-age=0
> Range: bytes=4084836-6202195,6204756-9312268,9356994-21260316

(Note that a lot of the other requests that are 0-41, XXXX are succeeding; it seems that it requires reaching a certain level of complexity/bytes/something before this gets triggered.)
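John's split of the leading header probe into its own request can be sketched as a small helper over (offset, length) readv tuples (the function name and shapes here are illustrative, not bzr's actual readv API):

```python
def split_header_range(ranges):
    """Split a readv-style list of (offset, length) tuples so that a
    leading header probe at offset 0 is sent as its own request, with
    the remaining body ranges kept together in a second request.

    Illustrative sketch only; bzr's transport.readv works differently.
    """
    if ranges and ranges[0][0] == 0:
        return [ranges[0]], list(ranges[1:])
    return [], list(ranges)

# The failing request coalesced bytes=0-41 with the large body ranges;
# after the split, the 0-41 probe goes out alone:
header, body = split_header_range([(0, 42), (4084836, 2117360)])
```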

Changed in bzr:
importance: Undecided → Medium
status: New → Triaged
John A Meinel (jameinel) wrote :

This is a possible patch:

=== modified file 'bzrlib/pack.py'
--- bzrlib/pack.py	2007-11-10 15:09:09 +0000
+++ bzrlib/pack.py	2008-03-05 09:28:02 +0000
@@ -161,10 +161,14 @@
 class ReadVFile(object):
     """Adapt a readv result iterator to a file like protocol."""

-    def __init__(self, readv_result):
+    def __init__(self, readv_result, header=None):
         self.readv_result = readv_result
         # the most recent readv result block
-        self._string = None
+        if header is not None:
+            self._string = StringIO(header)
+            self._string_length = len(header)
+        else:
+            self._string = None

     def _next(self):
         if (self._string is None or
@@ -197,10 +201,12 @@
     :param requested_records: The record offset, length tuples as returned
         by add_bytes_record for the desired records.
     """
-    readv_blocks = [(0, len(FORMAT_ONE)+1)]
-    readv_blocks.extend(requested_records)
+    # readv_blocks = [(0, len(FORMAT_ONE)+1)]
+    # readv_blocks.extend(requested_records)
+    header = list(transport.readv(filename, [(0, len(FORMAT_ONE)+1)]))[0][1]
     result = ContainerReader(ReadVFile(
-        transport.readv(filename, readv_blocks)))
+        transport.readv(filename, requested_records),
+        header=header))
     return result

At the moment we always tack on the header read to the rest of the read, which is nice because we get one round-trip request to the server.

A *better* fix might understand the squid bug better, and then normally tack on the header, but when it would trigger the squid bug issue it separately.

Filing a bug at squid-cache.org/bugs/ with a wireshark trace of the failing traffic, including both client<->squid and squid<->server traffic, would help get the Squid bug identified and fixed, if it is a Squid bug.

Caroline Ford (secretlondon) wrote :

I have this trying to get a bzr branch. I'm using Vodafone UK's HSDPA service, which has web filtering etc. (not sure about the internals).

bzr: ERROR: Invalid http response for http://bazaar.launchpad.net/%7Eubuntu-bugcontrol/hugday-tools/main/.bzr/repository/packs/37bbe990e05b4a29b2cf422c2c1d0a4c.pack: Expected a boundary (h1XgKUvx:.Ym.:JdawV.) line, got ''

I obviously don't have access to Vodafone's internal network, but I can run wireshark at my end if it helps.

On Tue, 2008-03-18 at 21:43 +0000, Caroline Ford wrote:

> I obviously don't have access to vodafone's internal network but I can
> run wireshark at my end if it helps..

A trace from your end is better than nothing.

Regards
Henrik

There is a Squid bug/misfeature which may be relevant here depending on the size of the requested .pack file.

If the first range in a partial request is byte 0 then Squid attempts to prefetch the whole file to cache it for future reference. This may take a considerable amount of time if the requested file is large. Normally this just causes a delay, but depending on timeout settings in firewalls between the client and Squid I guess this delay may cause the request to get terminated while waiting for the next chunk.

This was fixed in the 2.6.STABLE19 release, made yesterday, so that this kind of prefetching is not done unless explicitly enabled in squid.conf.

A workaround is to use a non-linear range request, i.e. instead of the first requested range being 0-400, use 200-400,0-199 or something similar.

This Squid prefetching induced delay issue is only triggered if the following conditions are all met:

- A ranged request with the first range starting at 0 (range_offset_limit setting in squid.conf, default 0)
- The ranges are sorted linearly
- There is a huge gap in the ranges of data to be skipped

The first two are the conditions for this unintentional prefetch to trigger in Squid.

The third (last) is the condition for this to cause a significant delay while processing the request.

From reading the reported data in more detail I am fairly certain this is what is biting bzr.
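Henrik's non-linear workaround amounts to reordering the range list before the Range header is built, so that the first range no longer starts at byte 0 (hypothetical helper; neither Squid nor bzr code):

```python
def deprioritize_zero_range(ranges):
    """Rotate a list of (offset, length) tuples so the first requested
    range does not start at byte 0, defeating Squid's whole-file
    prefetch heuristic described above. For example, ranges equivalent
    to 0-199,200-400 become 200-400,0-199.

    Hypothetical sketch based on the workaround described in this bug.
    """
    if len(ranges) > 1 and ranges[0][0] == 0:
        return ranges[1:] + ranges[:1]
    return list(ranges)
```

The server returns the parts in the requested order, so the client only has to reassemble them; the cost is that the parts arrive out of file order.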

Caroline Ford (secretlondon) wrote :
Martin Pool (mbp) wrote :

The strongest workaround is probably to use ssh rather than http, thereby avoiding any alteration to your data.

To do this either:

1- use an explicit url like bzr+ssh://bazaar.launchpad.net/~bzr/bzr/trunk

2- run 'bzr launchpad-login' (in a recent bzr) and lp: urls should
automatically use ssh.

Martin

Caroline: It does not look like your bzr traffic is going via a Squid server at all, and it is quite different from the problem described by Stuart.

Your trace ends almost immediately with a short response from the server, only carrying the first of two requested ranges.

For Squid I consider this bug fixed with the release of 2.6.STABLE19. Stuart, you are welcome to confirm/reopen if still seeing the timeout after the Canonical data center Squid has been upgraded.

Squid patch: http://www.squid-cache.org/Versions/v2/2.6/changesets/11996.patch

Changed in squid:
assignee: nobody → henriknordstrom
importance: Undecided → Medium
status: New → Fix Released
Ace Suares (acesuares) wrote :

Maybe if bazaar.launchpad.net would run with https too, this problem could be avoided quickly and easily, since squid doesn't mess with https...?

Robert Collins (lifeless) wrote :

On Wed, 2008-05-07 at 18:44 +0000, Ace Suares wrote:
> Maybe if bazaar.launchpad.net would run with https too, this problem
> could be avoided quickly and easily, since squid doesn't mess with
> https...?

This isn't a Launchpad problem, it's a problem for any http server where
a buggy version of squid is present; also squid can do MITM of https
sessions where sysadmins have configured that (typically with enterprise
wildcard keys etc, or ssl acceleration).

-Rob
--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

Ace Suares (acesuares) wrote :


I agree.

Meanwhile, I can't check out my sources.
How hard could it be to configure lp to also do https !?

But yeah, in all its purity, it is not Launchpad's fault. It's the stupid firewall.

ace

This also seems to occur with FreeProxy (http://www.handcraftedsoftware.org/).

And also ISA (the Microsoft proxy).

We have both sorts of proxy in our network. The only way around this for me would appear to be some kind of nefarious SSH/SOCKS tunnel, and getting Bazaar to use SOCKS has so far proven difficult.

It would appear the implementation used by Squid is rather common (or both my proxies are using Squid in the background :-) )

John A Meinel (jameinel) wrote :

I'm attaching a potential workaround. Basically, it adds an extra round-trip request to get the first bytes, in order to avoid the problems with the http proxies.

It, unfortunately, adds an extra round trip for all requests. A slightly better fix would be to do this only if the ranges in question would trigger the bugs in the proxy.

golions (yazfan2003) wrote :

I am also having a similar problem trying to pull from a remote server; I get this error message:

bzr: ERROR: Invalid http response for http://bzr.savannah.gnu.org/r/gnash/.bzr/repository/indices/2c779c251220f8afce08bcec69277e32.rix: Expected a boundary ("b)LkBW?GfofYxLSueNd1", application/plain) line, got '--b)LkBW?GfofYxLSueNd1
'

The string after "Expected a boundary (" is different every time I try and pull.
I am using bzr 1.5, and I have attached my .bzr.log

John A Meinel (jameinel) wrote :

Something isn't right about the boundary line.

Specifically, the upstream server is quoting the boundary string and then following it up with ", application/plain", which I'm guessing is not valid according to the RFC, but I'm not well versed in it.

golions-
It would help if you could run "bzr pull -Dhttp" and attach that log, as it will write out more details about the specific HTTP requests being made.

I think this is actually different from the original bug. Namely the original bug is that we trigger some proxies to fail to return anything, while your problem is that the boundary string is not being parsed correctly.

My guess is that Savannah is returning:

Boundary: "b)LkBW?GfofYxLSueNd1", application/plain

Rather than

Boundary: "b)LkB..."
Content-type: application/plain

We probably could work around this in bzr, but it seems more like something that Savannah is doing wrong.
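For comparison, the stdlib's MIME parameter parsing extracts the boundary cleanly from a single well-formed Content-Type value (a sketch using Python's email package; this is not the parser bzr or pycurl uses):

```python
from email.message import Message

def boundary_from_content_type(value):
    """Extract the multipart boundary parameter from a Content-Type
    header value, with RFC 2045 quoting handled by the stdlib.

    Sketch only; bzr's own response handling differs.
    """
    msg = Message()
    msg['Content-Type'] = value
    return msg.get_param('boundary')

# A well-formed header yields the unquoted boundary string:
boundary_from_content_type('multipart/byteranges; boundary="b)LkBW?GfofYxLSueNd1"')
```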

golions (yazfan2003) wrote :

I tried to pull again using bzr pull -Dhttp.

Here is my Dhttp output.

golions (yazfan2003) wrote :

Here is my .bzr.log after using bzr pull -Dhttp.

John A Meinel (jameinel) wrote :

Well, to start with, it only seems to affect multipart requests, which isn't very surprising since there is no "boundary" for single-part requests.

If I'm reading the log correctly, though, I don't understand why we are failing. Specifically, it looks like Savannah is returning:
< Connection: Keep-Alive
< Content-Type: multipart/byteranges; boundary=":gn1T,l__lXvUYr73ia-"
< Date: Thu, 31 Jul 2008 16:12:17 GMT
< Server: Apache/2.2.3 (Debian) DAV/2
< Last-Modified: Thu, 26 Jun 2008 22:04:56 GMT
< ETag: "1c9c203-9e5b0-fbc3b200"
< Accept-Ranges: bytes
< Content-Type: application/plain
< Via: 1.0 hinet-C233.8

Which is perfectly clear about what the boundary should be. I wonder if it is because Savannah is returning 2 Content-type: strings, which is confusing pycurl.

Specifically, I'm guessing that pycurl is just appending the second value onto the first value, and then getting:

content_type = 'multipart/byteranges; boundary=";gn1...." application/plain'

And then it fails to parse the boundary="" portion. (It doesn't find the quotes to remove because of the trailing information.)

Could you try doing the same fetch without pycurl? Specifically try:

bzr pull http+urllib://original/path

We have 2 http implementations, and it is possible that urllib handles this better.

That said, is it legal to give 2 Content-Type values? (Certainly urllib might break under this condition as well, but at least we could try it.)

golions (yazfan2003) wrote :

I still get the same error when using urllib. Here is my log file.

I should note that I started having this problem a few weeks ago, but it went away a few days ago, and I was able to successfully pull from http://bzr.savannah.gnu.org/r/gnash. However, since yesterday, this problem has returned.

John A Meinel (jameinel) wrote :

Either savannah changed their system to not include Content-type more than once, *or* it worked because you happened to have a fetch that didn't need multiple ranges. (eg. bytes = 10-60, rather than bytes = 10-20,50-60. In the former case, we don't get a multipart response, so we don't receive multiple Content-type results.)

Ultimately vila or lifeless are the ones who would know the spec, to know whether Savannah is wrong for including 2 Content-Type fields, or if urllib and pycurl are wrong for failing to parse a boundary string under these circumstances.

John A Meinel (jameinel) wrote :

golions-

I opened bug #253745 for your specific error, which I believe is a different error from this bug. Can we switch future discussion over to that bug?

On Thu, 2008-07-31 at 17:03 +0000, John A Meinel wrote:

> Which is perfectly clear about what the boundary should be. I wonder if
> it is because Savannah is returning 2 Content-type: strings, which is
> confusing pycurl.

Quite likely. Having two Content-Type in the same HTTP message is a
clear protocol violation, and different implementations react very
differently when seeing this.

> That said, is it legal to give 2 ContentType values?

No. Content-Type is a single-valued header, not a list header.

Regards
Henrik

Martin Pool (mbp) on 2010-03-18
Changed in bzr:
status: Triaged → Confirmed
Jelmer Vernooij (jelmer) on 2011-02-01
tags: added: http
Dirk T (miriup) wrote :

I'm having this or a similar problem with bazaar 2.3.1 and 2.2.2. I'm getting this error:

-----8<-----
 * bzr branch start -->
 * repository: lp:rescuetime-linux-uploader => /home/portage/distfiles/bzr-src/rescuetime-linux-uploader
You have not informed bzr of your Launchpad ID, and you must do this to
write to Launchpad or access private data. See "bzr help launchpad-login".
bzr: ERROR: Invalid http response for http://bazaar.launchpad.net/~rescuetime-developers/rescuetime-linux-uploader/trunk/.bzr/repository/packs/131ce2d0f2883b17b081d43af081770d.pack: Expected a boundary (Q(f9?SXuX1RnXykU6w)h) line, got ''
-----8<-----

The breaking HTTP looks like this (wireshark output):

-----8<-----
GET /~rescuetime-developers/rescuetime-linux-uploader/trunk/.bzr/repository/packs/131ce2d0f2883b17b081d43af081770d.pack HTTP/1.1
Accept-Encoding: identity
Host: bazaar.launchpad.net
Accept: */*
User-Agent: bzr/2.3.1 (urllib)
Connection: Keep-Alive
Pragma: no-cache
Cache-Control: max-age=0
Range: bytes=0-41,448-790

HTTP/1.1 206 Partial content
Content-Type: multipart/byteranges; boundary="Q(f9?SXuX1RnXykU6w)h"
Date: Fri, 06 May 2011 17:34:20 GMT
Server: Apache/2.2.14 (Ubuntu)
Last-Modified: Tue, 22 Dec 2009 20:57:07 GMT
ETag: "154623e-477-47b5772e7aec0"
Accept-Ranges: bytes
Via: 1.1 bazaar.launchpad.net
Connection: close
Age: 0

--Q(f9?SXuX1RnXykU6w)h
Content-Type: text/plain
Content-Range: bytes 0-41/1143

Bazaar pack format 1 (introduced in 0.18)
-----8<-----

The above request is the only one in the stream with Content-Type: multipart/byteranges. Notice that there are not two Content-Type headers in the server response. I'm also behind a transparent proxy server.

Dirk T (miriup) wrote :

I did some more investigation. What made this transfer special wasn't the ranges themselves, it was the multiple ranges. The transfer above is complete. My bzr client requested the byte ranges 0-41 and 448-790 from the file 131ce2d0f2883b17b081d43af081770d.pack, but got only 0-41. Obviously it expects another boundary with the range 448-790 and, missing that, it dies.
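The truncation described here is mechanically detectable: per RFC 2046 a multipart body carries one "--boundary" delimiter per part and ends with a closing "--boundary--" line. A completeness check could look roughly like this (assumed names; bzr's real parser in http/response.py reads the body incrementally):

```python
def multipart_is_complete(body, boundary, expected_parts):
    """Return True if `body` contains `expected_parts` parts delimited
    by `boundary` and is terminated by the closing '--boundary--' line.

    Sketch with assumed names, operating on a fully buffered body.
    """
    delim = '--' + boundary
    part_count = body.count(delim + '\r\n')
    terminated = (delim + '--') in body
    return terminated and part_count == expected_parts
```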

I did the transfer once through my proxy and once through an SSH tunnel. After filtering out some headers, such as 'Host', that differ due to the method, I get the following difference at the failing transfer:

-----8<-----
@@ -1498,18 +1436,1967 @@
 Cache-Control: max-age=0^M
 Range: bytes=0-41,448-790^M
 ^M
-HTTP/1.1 206 Partial content^M
-Content-Type: multipart/byteranges; boundary="Q(f9?SXuX1RnXykU6w)h"^M
+HTTP/1.1 206 Partial Content^M
 Server: Apache/2.2.14 (Ubuntu)^M
 Last-Modified: Tue, 22 Dec 2009 20:57:07 GMT^M
 ETag: "154623e-477-47b5772e7aec0"^M
 Accept-Ranges: bytes^M
+Content-Length: 579^M
+Content-Type: multipart/byteranges; boundary=4a29fa408f5635eed^M
 Via: 1.1 bazaar.launchpad.net^M
-Connection: close^M
-Age: 0^M
+Connection: Keep-Alive^M
 ^M
---Q(f9?SXuX1RnXykU6w)h^M
-Content-Type: text/plain^M
-Content-Range: bytes 0-41/1143^M
+^M
+--4a29fa408f5635eed^M
+Content-type: text/plain^M
+Content-range: bytes 0-41/1143^M
+^M
+Bazaar pack format 1 (introduced in 0.18)
+^M
+--4a29fa408f5635eed^M
+Content-type: text/plain^M
+Content-range: bytes 448-790/1143^M
+^M
+B337
+
# Additional binary + lines cut out.
-----8<-----

Notice also that the proxy server actually changes "Connection: Keep-Alive" to "Connection: close", which it didn't do for the earlier transfers, so it has obviously also noticed that there's something wrong with this transfer. The former means to keep the TCP channel (which we use for the HTTP) open, and the latter means to close it after this transfer. So it's very apparent the fault lies with whatever is in between me and Launchpad.

As a big chunk of the "developing world" enforces proxy servers at a national level, this breakage will be widespread. Though the fault actually lies with the man in the middle, in most cases it will be next to impossible to get it fixed there. My suggested workaround is to detect the above circumstances and reissue the request for the second range. Maybe optionally an internal flag could also be set so that ranges are not pooled for all remaining requests.

I'd even implement this modification if the regulars here consider my suggestion useful. I definitely have a good testing ground. ;)
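Dirk's suggested workaround (reissue the request for whatever ranges are missing from a short multipart response) might look roughly like this; `fetch` is a hypothetical callable standing in for one HTTP request, mapping a list of (offset, length) tuples to the parts actually received:

```python
def readv_with_fallback(fetch, ranges):
    """Issue one coalesced multi-range request; if the response comes
    back with fewer parts than ranges, reissue each missing range as
    its own single-range request.

    Hypothetical sketch of the workaround proposed above, not bzr code.
    """
    parts = list(fetch(ranges))
    if len(parts) < len(ranges):
        for missing in ranges[len(parts):]:
            parts.extend(fetch([missing]))
    return parts
```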

Martin Pool (mbp) wrote :

Thanks for analyzing this. We'd be happy to mentor you to put in a
workaround for the problem.

Getting one range at a time may be somewhat slower so we probably don't want
to do it unless it's necessary.

I would be inclined to start by adding a config option to not send multiple
range requests, and then later see about automatically detecting the need to
use it. Just ask if you want help or a review.

Vincent Ladeuil (vila) wrote :

>>>>> Miriup writes:

    > I did some more investigation. What made this transfer special weren't
    > the ranges. It were the multiple ranges. The transfer above is complete.
    > My bzr client has requested the byte ranges 0-41 and 448-790 from the
    > file 131ce2d0f2883b17b081d43af081770d.pack, but got only 0-41. Obviously
    > it expects another boundary with the range 448-790 and missing that it
    > dies.

    > I did the transfer once through my proxy and once through a SSH tunnel
    > and filtering out some headers such as 'Host' that are different due to
    > the method I get the following difference at the failing transfer:

This means the culprit is your proxy, no?

<snip/>

    > Notice also that the proxy server actually changes "Connection:
    > Keep- Alive" to "Connection: close", which it didn't do for the
    > earlier transfers, so it obviously has also noticed that there's
    > something wrong with this transfer.

Well, wrong for the proxy ;)

This is also bad for the client and in the end for the user as it means
the connection will have to be re-established incurring latency penalty.

<snip/>

    > As the bigger chunk of the "developing world" enforces proxy
    > servers on a national level, this breakage will be widespread.

:-/

<snip/>

    > I'd even implement this modification if the regulars here consider my
    > suggestion useful. I definitely have a good testing ground. ;)

You're welcome to work on this !

I suggest you look at the bzr http client implementations, which already
implement a fallback mechanism for servers that don't implement
multiple range requests.

I'm even surprised it didn't trigger in this case.

Ideally, this should be turned into a config option that could be
activated upfront on a per-server basis (so you can explicitly turn it
on without waiting for the first error).

Thinking about it, the fallback turns a multiple-range request into a
single request merging all ranges into the single encompassing one
(which means we transfer more data than needed). But in your case, it
may be better to issue one request per range instead, while still
delivering the same result to the callers.
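The existing merge-everything fallback can be sketched as a hypothetical helper (not the bzrlib implementation):

```python
def encompassing_range(ranges):
    """Collapse (offset, length) tuples into the single range covering
    them all, as the existing fallback does; bytes in the gaps are
    transferred and then discarded.

    Hypothetical sketch, not bzrlib code.
    """
    start = min(offset for offset, _ in ranges)
    end = max(offset + length for offset, length in ranges)
    return (start, end - start)
```

For the failing request above, 0-41 plus 448-790 collapses to a single 791-byte range 0-790, needlessly fetching bytes 42-447; issuing one request per range avoids that at the cost of extra round trips.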

I'd be happy to help you work on this so feel free to create a merge
proposal as soon as you have questions or are blocked in any way.

The bzr client implementations are under bzrlib/transport/http and the
corresponding tests in bzrlib/tests/test_http*.

Amos Jeffries (yadi) wrote :

Indeed, all the signs are that it is the proxy behaviour triggering the problem. Since this is an obsolete Squid release with a fix available for more than 4 years, as stated by Henrik earlier and re-confirmed by myself today, I doubt this will have much of a negative impact for new rollouts. The major reasons for rolling out proxies are to mediate between modern IP protocols and to do content adaptation/filtering, capabilities the broken Squid releases are lacking, so it is even less likely that older versions will be installed in these critical places.

The major problem is the large legacy install base and non-proxy intermediaries cutting connections. Making bzr handle these old middleware or unexpected failures in general is a good thing.

Vincent Ladeuil (vila) wrote :

@Amos: /me nods

We didn't get many reports of this kind for a long time and I thought we never would.

But the last report made me realize that in some cases the proxy may just be out of reach (or even invisible) for many people.

Martin Pool (mbp) wrote :

I think definitely giving a response that indicates what the problem
is will be good, so people don't blame bzr.

Dirk T (miriup) wrote :

Vincent Ladeuil wrote:
> I suggest you look at the bzr http client implementations which already
> implement a fallback mechanism when encountering servers which doesn't
> implement multiple range requests.
> I'm even surprised it didn't trigger in this case.

I've just found these code bits and just wanted to put a quick update here. As far as I can see, this mechanism kicks in in the following circumstances (_readv in http/__init__.py):

- errors.ShortReadvError: short read; the range is actually incomplete
- errors.InvalidRange
- errors.InvalidHttpRange: when the range headers are malformed.

However, none of these apply when the boundary marker is missing (read_boundary in http/response.py). I will first attempt fitting the necessary adaptation there.

I'll see if I can use the existing mechanism for issues in Content-Range (_degrade_range_hint in http/__init__.py). According to its documentation, the degradation flag propagates whenever the connection object is cloned (not sure when this is done, though). Even if it doesn't propagate, and we're not doing requests in parallel, we'll be using the same HTTP request object for the remaining requests, and that's the one keeping the HTTP connection open.

> This is also bad for the client and in the end for the user as it means
> the connection will have to be re-established incurring latency penalty.

Regarding latency, I think flag-on-fail is a good solution. A user not affected by evil proxies will not take a performance hit. For the rest of us, this re-establishment will happen exactly once per HTTP client object; then the degraded mode gets activated, successive requests over the same connection will request individual ranges, and the connection will be kept alive. This is definitely preferable to a broken checkout. :)
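The flag-on-fail behaviour could be sketched as a small per-connection policy object (illustrative only; the names and structure here are assumptions, not bzr's actual _degrade_range_hint machinery):

```python
class RangeRequestPolicy:
    """After the first truncated multipart response, remember that the
    path to this server mishandles coalesced Range headers and plan one
    request per range from then on.

    Illustrative sketch; names and structure are assumptions, not bzr's.
    """

    def __init__(self):
        self.single_range_only = False

    def plan(self, ranges):
        """Group (offset, length) tuples into per-request batches."""
        if self.single_range_only:
            return [[r] for r in ranges]
        return [list(ranges)]

    def on_truncated_multipart(self):
        """Called when a multipart response is missing parts."""
        self.single_range_only = True
```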

Regarding a config option I have a few reasons to refrain from doing so:

1. For an ordinary user it will be very difficult to determine whether he's affected by this issue or not. He'd have to be aware of a transparent proxy being in place, and he'd have to be aware that those bazaar requests are quite uncommon. Most of the web works very well without HTTP multipart responses. ;)
2. Only users who fetch with bazaar over HTTP are affected. That means that among the people behind evil proxies, only those who are *not* fetching sources using typical developer transport connections (like https or ssh) are hit. Most projects offer .tar.gz's and you'd ordinarily download those. I only stumbled over the issue myself because I was writing a live ebuild for Gentoo that checks out another project using bazaar.
3. /me would have to fiddle with two probably quite distant (in terms of cross references) parts of bazaar. I leave that to you pros. :-P

Vincent Ladeuil (vila) wrote :

>>>>> Miriup writes:

<snip/>

    > I've just found these code bits and just wanted to put a quick update
    > here. As far as I can see, this mechanism kicks in in the following
    > cirumstances (_readv in http/__init.py):

    > - errors.ShortReadvError: short-read; the the range is actually incomplete
    > - errors.InvalidRange
    > - errors.InvalidHttpRange: when the range headers are malformed.

    > However, none of these are applying for reading the boundary marker
    > (read_boundary in http/response.py) when it is missing.

Correct.

    > I will first attempt fitting the necessary adaption there.

Great.

    > I'll see if I can use the existing mechanism for issues in Content-
    > Range's (_degrade_range_hint in http/__init__.py). According to their
    > documentation the degration flag propagates whenever the connection
    > object is cloned - not sure when this is done, though.

Hmm, yeah, it's a bit unclear and hairy (and there are edge cases where
it's bogus):

- it starts at None,
- it's set once the first multi-range request succeeds (in whatever form
  including retries),
- it propagates by cloning (see __init__ in HttpTransportBase),

So if cloning occurs before the first successful request, each clone
has to rediscover it.

A better way would be to make this attribute part of the connection
parameters (which contain only the authentication data so far). Don't
worry about that too much in a first step if you feel it's too complex.

Anyway, the place you should detect the bug and raise a specific
exception is probably in http/response.py, catching it should occur
where the hint is handled.

    > Even if it doesn't propagate and we're not doing requests in
    > parallel, we'll be using the same HTTP request object for the
    > remaining requests --- and that's the one who's keeping the HTTP
    > connection open.

You mean connection, not request, here, right? The connection attribute
in the request object is the one that is shared between the transports,
and this is achieved via the request objects, but we create a request
object for... each request.

    >> This is also bad for the client and in the end for the user as it
    >> means the connection will have to be re-established incurring
    >> latency penalty.

    > Regarding latency I think flag on fail is a good solution. A user
    > not affected by evil proxies will not take a performance hit. For
    > the rest of us this re-establishment will happen exactly once per
    > HTTP client object and then the degraded mode gets activated and
    > successive requests over the same connection will request
    > individual ranges and the connection will be kept alive. This is
    > definitely preferable over a broken checkout. :)

Yes, that's the idea.

    > Regarding a config option I have a few reasons to refrain from
    > doing so:

    > 1. For an ordinary user it will be very difficult to determine
    > whether he's affected by this issue or not. He'd have to be aware
    > of a transparent proxy being in place and he'd have to be aware
    > that those bazaar requests are quite uncommon. Most of the web
    > works very well without HTTP multipart respons...


Dirk T (miriup) on 2011-05-08
Changed in bzr:
assignee: nobody → Miriup (dirk-miriup)
status: Confirmed → In Progress
Dirk T (miriup) wrote :

Hi,

sorry it took so long. After having made all these discoveries there wasn't really much work left for the patch, but somehow I hadn't got around to fixing it any earlier.

I've tested this patch against a Bazaar built from trunk (branched around the time of my solution post) and against 2.3.1 from the Gentoo tree. In both cases the bug did not appear for me and I got a seemingly clean repository. I have tested completeness by checking out lp:rescuetime-linux-uploader/trunk once through SSH and once through HTTP with this fix applied and then comparing both copies with diff. The result:

-----8<-----
diff -ur /tmp/bug-674122/.bzr/branch/parent /tmp/bug2/.bzr/branch/parent
--- /tmp/bug-674122/.bzr/branch/parent 2011-05-28 22:11:56.000000000 +0400
+++ /tmp/bug2/.bzr/branch/parent 2011-05-28 22:29:22.000000000 +0400
@@ -1 +1 @@
-http://bazaar.launchpad.net/~rescuetime-developers/rescuetime-linux-uploader/trunk/
+bzr+ssh://bazaar.launchpad.net/%2Bbranch/rescuetime-linux-uploader/trunk/
diff -ur /tmp/bug-674122/.bzr/checkout/format /tmp/bug2/.bzr/checkout/format
--- /tmp/bug-674122/.bzr/checkout/format 2011-05-28 22:11:56.000000000 +0400
+++ /tmp/bug2/.bzr/checkout/format 2011-05-28 22:29:22.000000000 +0400
@@ -1 +1 @@
-Bazaar Working Tree Format 4 (bzr 0.15)
+Bazaar Working Tree Format 6 (bzr 1.14)
Only in /tmp/bug2/.bzr/checkout: views
-----8<-----

Not sure about the .bzr/checkout/views file. It seems empty.

Vincent Ladeuil (vila) wrote :

Thanks for the feedback and the patch.

We need tests before landing it, but your patch already captures quite clearly what the issue is about, and it's nice to know it works; I'm glad it works for *you* :)

I'll mark this bug with the patch-needswork tag so we know there is valuable work here that needs polish.

tags: added: patch-needswork
Dirk T (miriup) wrote :

Just tried a quick shot at it, but I can't get my head into your test suite that easily. I fail to see how to add a new test class to test_http.py. Are you guys hanging out on IRC somewhere?

The problem, as I understand it currently, should have been covered by TruncatedMultipleRangeRequestHandler. I don't understand why it didn't catch it.

Vincent Ladeuil (vila) wrote :

> Just tried a quick shot at it, but I can't get my head into your test suite that easily.

Hmm, yeah, test_http is not the easiest to start with :-/

> I fail to see how to add a new test class to test_http.py. Are you guys hanging out on IRC somewhere?

Yup, #bzr on freenode.

> The problem, as I understand it currently, should have been covered by TruncatedMultipleRangeRequestHandler. I don't understand why it didn't catch it.

Probably because it's different enough, or we would not have encountered the bug ;-)

If you're willing to keep working on it, I'd gladly help you. The best way to do that would be to push your branch and make a merge proposal so we can continue the discussion there instead of here. It also makes it easier to collaborate on this branch. Roughly, in the directory where you committed your changes:

  bzr push lp:~dirk-miriup/bzr/198646-missing-boundary

will create a branch on Launchpad

  bzr lp-propose

will help you create the merge proposal or just do:

  bzr lp-open

and click the 'Propose for merging' link.

Graham White (graham-alton) wrote :

I just had the "Invalid http response ... Expected a boundary" problem and found this bug report via a web search.

I applied the patch from comment #36 above to my bzr (version 2.2.4 on Fedora 14) and the problem appears to be remedied; the bzr command I was attempting to run now works perfectly.

Martin Pool (mbp) wrote :

On 27 July 2011 17:18, Graham White <email address hidden> wrote:
> I just had the "Invalid http response ... Expected a boundary" problem
> and found this bug report via a web search.
>
> I applied the patch from comment #36 above to my bzr (version 2.2.4 on
> Fedora 14) and the problem appears to have been remedied with the bzr
> command I was attempting to run now working perfectly.

Thanks for confirming that, Graham; this is very likely triggered by a
bad proxy so none of us can reproduce it.

I think we would just about be better off merging this as it is,
though it would be nice to add a test that simulates what we believe
to be the behaviour.

Martin Pool (mbp) wrote :

I would have thought that patch <https://launchpadlibrarian.net/72583524/bzr-boundary-missing.patch> would crash if the error is ever raised, because it tries to refer to an apparently unbound variable 'range'. But the patch does seem pretty plausible for Henrik's description of the squid bug we're trying to work around, so we should probably take it.
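The hazard Martin points out can be illustrated with a hypothetical fragment (not the patch's actual code): if the error path refers to a loop variable that was never bound, the intended exception is replaced by a NameError at the moment the error message is built.

```python
def parse_parts(parts):
    """Hypothetical parser illustrating the unbound-variable hazard."""
    for rng in parts:
        pass  # process each range part
    if not parts:
        # 'rng' was never bound because the loop body never ran, so
        # this line raises NameError instead of the intended ValueError.
        raise ValueError("no parts for range %r" % (rng,))

try:
    parse_parts([])
except NameError:
    print("crashed with NameError instead of the intended error")
```

That is why such a bug can hide until the error branch is actually exercised: the code imports and runs fine as long as the exceptional path is never taken.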

Martin Pool (mbp) wrote :

> The problem, as I understand it currently, should have been covered by TruncatedMultipleRangeRequestHandler. I don't understand why it didn't catch it.

The difference is that it tests truncation between the boundary line and the content block, whereas this is occurring before the boundary line.
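To show where the two truncation points sit, here is an illustrative sketch of a multipart/byteranges body (the boundary string is made up). The existing test cut the response at (B), between the boundary line and its content, while the squid bug cuts it at (A), before the boundary line itself, so the parser reads '' where it expects '--BOUNDARY'.

```python
boundary = 'THIS_STRING_SEPARATES'
body = (
    # (A) squid truncation: the boundary line below never arrives
    '--' + boundary + '\r\n'
    'Content-Type: application/octet-stream\r\n'
    'Content-Range: bytes 0-3/100\r\n'
    '\r\n'
    # (B) the existing test truncated here, after the part headers
    'DATA\r\n'
    '--' + boundary + '--\r\n'
)
first_line = body.split('\r\n', 1)[0]
print(first_line)  # the boundary line a well-behaved server sends first
```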

Martin Pool (mbp) wrote :
Changed in bzr:
assignee: Miriup (dirk-miriup) → Martin Pool (mbp)
John A Meinel (jameinel) on 2011-08-09
Changed in bzr:
status: In Progress → Fix Released
milestone: none → 2.5b1
Craig Weber (weberc2) wrote :

I'm still experiencing this bug in Ubuntu 10.04. This is the dump from `bzr version`:

Bazaar (bzr) 2.1.4
  Python interpreter: /usr/bin/python 2.6.5
  Python standard library: /usr/lib/python2.6
  Platform: Linux-2.6.32-45-generic-i686-with-Ubuntu-10.04-lucid
  bzrlib: /usr/lib/python2.6/dist-packages/bzrlib
  Bazaar configuration: /home/weberc2/.bazaar
  Bazaar log file: /home/weberc2/.bzr.log

Presumably, I'm using an older version of Bazaar; however, 10.04 is an LTS release (supported until this April), so it should get these fixes, right? If I'm mistaken, could someone please tell me how to get this fix in my Ubuntu installation?

Dirk T (miriup) wrote :

The bug report says the fix is released with version 2.3 forward, so with your 2.1 you're definitely lagging behind. ;-)

I'm not too much of an Ubuntu geek, but I think LTS only means they are going to release critical bug fixes. Strictly speaking, the bug here was in a third-party component; Bazaar was just adjusted to apply a workaround when it detects the condition.

You can download updated bazaar packages from here apparently: https://launchpad.net/~bzr/+archive/ppa?field.series_filter=lucid.
