Fail to parse boundary if multiple Content-Type headers are given

Bug #253745 reported by John A Meinel
Affects  Status     Importance  Assigned to  Milestone
Bazaar   Confirmed  High        Unassigned
Breezy   Triaged    Medium      Unassigned

Bug Description

See the trailing discussion from bug #198646

If an HTTP Server returns multiple Content-Type entries for a multi-part response, we fail to properly parse the boundary="" string. This happens for both urllib and pycurl.

It seems that bzr.savannah.org is returning headers like:

7.658 < HTTP/1.0 206 Partial content
7.658 < Connection: Keep-Alive
< Content-Type: multipart/byteranges; boundary="zrbUwOxpkyBKv'eb)M,s"
< Date: Thu, 31 Jul 2008 17:21:46 GMT
< Server: Apache/2.2.3 (Debian) DAV/2
< Last-Modified: Thu, 26 Jun 2008 22:04:56 GMT
< ETag: "1c9c203-9e5b0-fbc3b200"
< Accept-Ranges: bytes
< Content-Type: application/plain
< Via: 1.0 hinet-C233.10

The first Content-Type header is valid for multi-part content and clearly specifies the boundary string that will be used for parsing.

What seems to be happening is that both pycurl and urllib are concatenating the Content-Type strings to get:

content_type = 'multipart/byteranges; boundary="zrbUwOxpkyBKv\'eb)M,s", application/plain'

and then finding boundary = '"zrbUwOxpkyBKv\'eb)M,s", application/plain'
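A minimal Python 3 sketch that reproduces the failure, assuming repeated fields are folded the way Python 2's httplib.HTTPMessage combined them: later values are appended with ", ". RFC 2616 section 4.2 only allows that folding for headers whose value is defined as a comma-separated list, which Content-Type is not. The helpers fold_headers and naive_boundary are hypothetical names, and naive_boundary only approximates the quote handling of a typical parameter parser; this is not bzr's actual code.

```python
# Hypothetical reproduction: fold repeated headers, then read the boundary.

def fold_headers(raw_headers):
    """Combine repeated header fields into one ", "-separated value."""
    folded = {}
    for name, value in raw_headers:
        key = name.lower()
        if key in folded:
            folded[key] = folded[key] + ", " + value
        else:
            folded[key] = value
    return folded


def naive_boundary(content_type):
    """Take everything after 'boundary=' and drop the surrounding double
    quotes only if the value both starts and ends with one."""
    _, _, value = content_type.partition("boundary=")
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] == '"':
        value = value[1:-1]
    return value


# A subset of the headers Savannah returned.
raw = [
    ("Content-Type", 'multipart/byteranges; boundary="zrbUwOxpkyBKv\'eb)M,s"'),
    ("Accept-Ranges", "bytes"),
    ("Content-Type", "application/plain"),
]
content_type = fold_headers(raw)["content-type"]
print(content_type)
# multipart/byteranges; boundary="zrbUwOxpkyBKv'eb)M,s", application/plain
print(naive_boundary(content_type))
# "zrbUwOxpkyBKv'eb)M,s", application/plain
```

Parsed on its own, the first header yields the intended boundary zrbUwOxpkyBKv'eb)M,s; the folded string keeps the opening quote and the trailing ", application/plain", so the quotes are never stripped.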

It is unknown at this time whether Savannah is incorrect to return multiple Content-Type fields, or whether urllib and pycurl are incorrect to concatenate them before parsing the boundary.

Tags: http
John A Meinel (jameinel) wrote :

There are http log traces available in bug #198646.

I'm marking this as 'High' because it means that bzr branches on Savannah cannot be reliably accessed. (It works as long as you don't need a multi-part request, otherwise it fails.)

Changed in bzr:
importance: Undecided → High
status: New → Triaged
Robert Collins (lifeless) wrote : Re: [Bug 253745] Re: Fail to parse boundary if multiple Content-Type headers are given

On Thu, 2008-07-31 at 18:57 +0000, John A Meinel wrote:
> There are http log traces available in bug #198646.
>
> I'm marking this as 'High' because it means that bzr branches on
> Savannah cannot be reliably accessed. (It works as long as you don't
> need a multi-part request, otherwise it fails.)

We should clearly diagnose this to the user; however it is fundamentally
a Savannah problem and not something we can reliably fix - we're getting
a vector where we expected a scalar; we can't tell reliably which one we
should take, and the complexity in trying to do so would be pure cruft.

-Rob

--
GPG key available at: <http://www.robertcollins.net/keys.txt>.
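Diagnosing the problem clearly, rather than guessing which value to use, could look roughly like the following Python 3 sketch. It assumes the response headers are available as an email.message.Message (http.client.HTTPMessage, the type of HTTPResponse.headers, is a subclass), whose get_all() returns every value of a repeated field instead of a folded string. MultipleContentTypeError and single_content_type are hypothetical names, not bzr APIs.

```python
from email.message import Message


class MultipleContentTypeError(Exception):
    """Hypothetical error raised when a response repeats Content-Type."""


def single_content_type(headers):
    """Return the only Content-Type value, or raise if there are several.

    `headers` is any email.message.Message; get_all() returns each
    occurrence of the field separately, or the default if it is absent.
    """
    values = headers.get_all("Content-Type", [])
    if len(values) > 1:
        raise MultipleContentTypeError(
            "Server sent %d Content-Type headers (%s); cannot tell which one "
            "describes the body. This is a server or proxy bug."
            % (len(values), "; ".join(repr(v) for v in values))
        )
    return values[0] if values else None


# Message() uses the compat32 policy, so assigning a header twice appends
# a second copy instead of replacing the first.
msg = Message()
msg["Content-Type"] = 'multipart/byteranges; boundary="zrbUwOxpkyBKv\'eb)M,s"'
msg["Accept-Ranges"] = "bytes"
msg["Content-Type"] = "application/plain"

try:
    single_content_type(msg)
except MultipleContentTypeError as e:
    print(e)
```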

golions (yazfan2003) wrote :

I just tried to pull again, and it worked with no problems. This has happened before: I would get error messages for a few days, and then all of a sudden it would work again. Unfortunately, I did not do the pull with -Dhttp, but I have attached my log file anyway, in case it helps.

John A Meinel (jameinel) wrote :

Well, I see this:
15.562 http readv of 2c779c251220f8afce08bcec69277e32.rix offsets => 2 collapsed 2
...
21.370 http readv of 7c2a6579c7a01088577744c8b461dae4.pack offsets => 8 collapsed 3

This means that we successfully issued several multi-part requests.
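A rough illustration of why these log lines imply multi-part requests, assuming "offsets => N collapsed M" means N requested (start, length) offsets were merged into M contiguous byte ranges. Any M greater than 1 produces a multi-range Range header, which a server normally answers with a 206 multipart/byteranges body. The coalesce and range_header helpers are a hypothetical sketch, not bzr's implementation.

```python
# Hypothetical sketch of offset coalescing and Range header construction.

def coalesce(offsets):
    """Merge (start, length) offsets that touch or overlap into
    inclusive (first_byte, last_byte) ranges."""
    ranges = []
    for start, length in sorted(offsets):
        end = start + length - 1
        if ranges and start <= ranges[-1][1] + 1:
            ranges[-1][1] = max(ranges[-1][1], end)  # extend previous range
        else:
            ranges.append([start, end])  # start a new range
    return [tuple(r) for r in ranges]


def range_header(ranges):
    """Format ranges as an HTTP Range header value."""
    return "bytes=" + ",".join("%d-%d" % (s, e) for s, e in ranges)


offsets = [(0, 100), (100, 50), (400, 10), (410, 20), (1000, 5)]
ranges = coalesce(offsets)
print(ranges)                # [(0, 149), (400, 429), (1000, 1004)]
print(range_header(ranges))  # bytes=0-149,400-429,1000-1004
print("multi-part response expected:", len(ranges) > 1)
```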

It would seem that Savannah doesn't always send multiple Content-Type headers. I wonder if there is a transparent proxy somewhere that is breaking this.

On bug #198646, Henrik Nordström comments that Savannah is indeed wrong to send two Content-Type headers.

Robert, how would you suggest diagnosing this? Should we change our urllib wrappers to recognize two Content-Type headers and abort with a message to the user? Or do some sort of search to see:

  if found_boundary.startswith('--') and found_boundary[2:].rstrip() in expected_boundary

That would be a likely indication that we mis-parsed the expected_boundary string.
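The heuristic above can be sketched as a check that runs only after the normal boundary comparison fails; applied unconditionally, a correct boundary would also match. The exception classes and check_boundary_line are hypothetical names, and the non-empty `candidate` guard is an addition to the proposed condition so that a bare "--" line does not match every string.

```python
class BoundaryMismatch(Exception):
    """Hypothetical: boundary line in the body differs from the header's."""


class LikelyMisparsedBoundary(BoundaryMismatch):
    """Hypothetical: the mismatch looks like a mangled Content-Type header."""


def check_boundary_line(boundary_line, expected_boundary):
    """Validate one boundary line read from a multipart/byteranges body.

    boundary_line is the raw line, e.g. "--abc\\r\\n"; expected_boundary is
    the value parsed from the Content-Type header.
    """
    if boundary_line == "--" + expected_boundary + "\r\n":
        return None  # the normal, well-formed case
    candidate = boundary_line[2:].rstrip()
    # The real boundary appears *inside* the mis-parsed one. Requiring a
    # non-empty candidate keeps a bare "--" line from matching everything.
    if boundary_line.startswith("--") and candidate and candidate in expected_boundary:
        raise LikelyMisparsedBoundary(
            "Boundary %r was found inside the parsed boundary %r; the server "
            "probably sent more than one Content-Type header."
            % (candidate, expected_boundary)
        )
    raise BoundaryMismatch(
        "Expected boundary %r, got line %r" % (expected_boundary, boundary_line)
    )


misparsed = '"zrbUwOxpkyBKv\'eb)M,s", application/plain'
try:
    check_boundary_line("--zrbUwOxpkyBKv'eb)M,s\r\n", misparsed)
except LikelyMisparsedBoundary as e:
    print(e)
```

Because LikelyMisparsedBoundary subclasses BoundaryMismatch, existing callers that only catch the generic error keep working, while the user-facing message can point at the duplicated header.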

Robert Collins (lifeless) wrote :

On Fri, 2008-08-01 at 16:05 +0000, John A Meinel wrote:
>
>
> Robert- How would you suggest diagnosing this? Change our urllib
> wrappers to recognize 2 Content-Type headers and abort with a message
> to the user?

I think this would be good.

> Do some sort of search to see:
>
>   if found_boundary.startswith('--') and found_boundary[2:].rstrip() in expected_boundary
>
> Which would be a likely indication that we miss-parsed the
> expected_boundary string.

I don't know - is that a recurring pattern?

-Rob

Martin Pool (mbp)
Changed in bzr:
status: Triaged → Confirmed
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
tags: removed: check-for-breezy
Changed in brz:
status: New → Triaged
importance: Undecided → Medium