Fail to parse boundary if multiple Content-Type headers are given

Bug #253745 reported by John A Meinel on 2008-07-31
Affects: Bazaar | Importance: High | Assigned to: Unassigned
Affects: Breezy | Importance: Medium | Assigned to: Unassigned

Bug Description

See the trailing discussion from bug #198646

If an HTTP server returns multiple Content-Type headers for a multi-part response, we fail to parse the boundary="..." parameter correctly. This happens with both the urllib and pycurl implementations.

It seems that bzr.savannah.org is returning headers like:

7.658 < HTTP/1.0 206 Partial content
7.658 < Connection: Keep-Alive
< Content-Type: multipart/byteranges; boundary="zrbUwOxpkyBKv'eb)M,s"
< Date: Thu, 31 Jul 2008 17:21:46 GMT
< Server: Apache/2.2.3 (Debian) DAV/2
< Last-Modified: Thu, 26 Jun 2008 22:04:56 GMT
< ETag: "1c9c203-9e5b0-fbc3b200"
< Accept-Ranges: bytes
< Content-Type: application/plain
< Via: 1.0 hinet-C233.10

The first one is a valid entry for multi-part content, and includes a clear description of the boundary string that will be used for parsing.

What seems to be happening is that both pycurl and urllib concatenate the Content-Type values to get:

content_type = 'multipart/byteranges; boundary="zrbUwOxpkyBKv'eb)M,s", application/plain'

And then finding boundary = '"zrbUwOxpkyBKv'eb)M,s", application/plain'
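
To make the failure concrete, here is a minimal sketch of how joining the two header values yields the bad boundary. The helper naive_boundary is hypothetical and only imitates a parser that takes everything after "boundary="; it is not the actual bzr, urllib, or pycurl code.

```python
def naive_boundary(content_type):
    """Return everything after 'boundary=' verbatim (hypothetical naive parser)."""
    marker = 'boundary='
    idx = content_type.find(marker)
    if idx == -1:
        return None
    return content_type[idx + len(marker):]


# The two Content-Type values from the trace above.
headers = [
    'multipart/byteranges; boundary="zrbUwOxpkyBKv\'eb)M,s"',
    'application/plain',
]

# Assumption: the client joins repeated headers with ', ' before parsing.
joined = ', '.join(headers)

# Parsing the joined value drags the second header into the boundary.
bad = naive_boundary(joined)

# Parsing only the first value gives the boundary the server intended.
good = naive_boundary(headers[0]).strip('"')

print(repr(bad))
print(repr(good))
```

Because the boundary itself contains a comma, splitting the joined value on ',' would not recover the original headers either.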

It is unknown at this time whether Savannah is incorrect to return multiple Content-Type fields, or whether urllib and pycurl are incorrect to concatenate them before parsing the boundary.

John A Meinel (jameinel) wrote :

There are http log traces available in bug #198646.

I'm marking this as 'High' because it means that bzr branches on Savannah cannot be reliably accessed. (It works as long as you don't need a multi-part request, otherwise it fails.)

Changed in bzr:
importance: Undecided → High
status: New → Triaged

On Thu, 2008-07-31 at 18:57 +0000, John A Meinel wrote:
> There are http log traces available in bug #198646.
>
> I'm marking this as 'High' because it means that bzr branches on
> Savannah cannot be reliably accessed. (It works as long as you don't
> need a multi-part request, otherwise it fails.)

We should clearly diagnose this to the user; however, it is fundamentally
a Savannah problem and not something we can reliably fix. We're getting
a vector where we expected a scalar; we can't tell reliably which one we
should take, and the complexity in trying to do so would be pure cruft.

-Rob

--
GPG key available at: <http://www.robertcollins.net/keys.txt>.

golions (yazfan2003) wrote :

I just tried to pull again, and it worked with no problems. This has happened before: I would get an error message for a few days, and then all of a sudden it would work again. Unfortunately, I did not do the pull with -Dhttp, but I have attached my log file anyway, in case that helps.

John A Meinel (jameinel) wrote :

Well, I see this:
15.562 http readv of 2c779c251220f8afce08bcec69277e32.rix offsets => 2 collapsed 2
...
21.370 http readv of 7c2a6579c7a01088577744c8b461dae4.pack offsets => 8 collapsed 3

Which means that we successfully issued several multi-part requests.

It would seem that Savannah isn't always broken in sending multiple Content-Type headers. I wonder if there is a transparent proxy somewhere that is breaking this.

On bug #198646 Henrik Nordström comments that Savannah is indeed wrong in giving 2 Content-Type headers.

Robert- How would you suggest diagnosing this? Change our urllib wrappers to recognize 2 Content-Type headers and abort with a message to the user? Do some sort of search to see:

  if found_boundary.startswith('--') and found_boundary[2:].rstrip() in expected_boundary

Which would be a likely indication that we misparsed the expected_boundary string.
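
A rough sketch of both ideas, for discussion only. The names MultipleContentTypeError, single_content_type and looks_misparsed are hypothetical, and the headers are assumed to arrive as a list of (name, value) pairs; none of this is existing bzr code.

```python
class MultipleContentTypeError(Exception):
    """Hypothetical error raised when a response repeats Content-Type."""


def single_content_type(headers):
    """Return the only Content-Type value, or abort if there are several.

    headers is assumed to be a list of (name, value) pairs.
    """
    values = [value for (name, value) in headers
              if name.lower() == 'content-type']
    if len(values) > 1:
        raise MultipleContentTypeError(
            'server sent %d Content-Type headers: %r' % (len(values), values))
    return values[0] if values else None


def looks_misparsed(found_boundary, expected_boundary):
    """Heuristic from the comment above: the boundary line read from the
    body is contained in the (too long) boundary we parsed from the header."""
    return (found_boundary.startswith('--')
            and found_boundary[2:].rstrip() in expected_boundary)


# Values taken from the trace in the bug description.
expected = '"zrbUwOxpkyBKv\'eb)M,s", application/plain'
found = "--zrbUwOxpkyBKv'eb)M,s\r\n"
print(looks_misparsed(found, expected))
```

The first function gives the user a clear diagnosis up front; the second only helps after the body has already failed to parse.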

Robert Collins (lifeless) wrote :

On Fri, 2008-08-01 at 16:05 +0000, John A Meinel wrote:
> Robert- How would you suggest diagnosing this? Change our urllib
> wrappers to recognize 2 Content-Type headers and abort with a message
> to the user?

I think this would be good.

> Do some sort of search to see:
>
>   if found_boundary.startswith('--') and found_boundary[2:].rstrip() in expected_boundary
>
> Which would be a likely indication that we misparsed the
> expected_boundary string.

I don't know - is that a recurring pattern?

-Rob

Martin Pool (mbp) on 2010-03-18
Changed in bzr:
status: Triaged → Confirmed
Jelmer Vernooij (jelmer) on 2017-11-08
tags: added: check-for-breezy
Jelmer Vernooij (jelmer) on 2017-11-11
tags: removed: check-for-breezy
Changed in brz:
status: New → Triaged
importance: Undecided → Medium