REST API: Support the Range header for GET content of files

Bug #1103136 reported by Raymond Hill on 2013-01-22
This bug affects 1 person
Affects: Ubuntu One Servers
Importance: Medium
Assigned to: Sidnei da Silva

Bug Description

Excerpt from RFC2616, section 14.35.2, "Range Retrieval Requests":

"HTTP/1.1 origin servers and intermediate caches ought to support byte ranges when possible, since Range supports efficient recovery from partially failed transfers, and supports efficient partial retrieval of large entities."

I tried to request a Range from Ubuntu One server for a file stored on my Ubuntu One account, and got the following response header:

/=====
Content-Disposition: inline; filename=blah.txt
Date: Mon, 21 Jan 2013 23:30:06 GMT
X-Bzr-Revision-Number: 6696
Etag: "sha1:7f96fcfa002cf5791c976c46aaf733dc523d9041;gzip"
Content-Range: bytes 0-143/3909
Via: 1.0 calamansi.canonical.com:3128 (squid/2.7.STABLE7) 1.1 files.one.ubuntu.com
Content-Type: application/octet-stream
Vary: Accept-Encoding,Cookie
X-Cache: MISS from calamansi.canonical.com
Set-Cookie: sessionid=...; Domain=.one.ubuntu.com; httponly; Path=/; secure
Server: TwistedWeb/10.0.0
Last-Modified: Mon, 21 Jan 2013 23:17:05 GMT
\=====

It is as if the server is fulfilling my request, telling me that range 0-143 is returned, along with a success status code (206, which is Partial content), but the body of the response is empty, and there is no Content-Length header in the returned response.

Raymond Hill (rhill) on 2013-01-22
summary: - Support the Range header
+ REST API: Support the Range header for GET content of files
Sidnei da Silva (sidnei) on 2013-01-22
Changed in ubuntuone-servers:
assignee: nobody → Sidnei da Silva (sidnei)
importance: Undecided → Medium
information type: Proprietary → Public
Sidnei da Silva (sidnei) wrote :

Hi Raymond,

Could you include more details for debugging? As far as I can tell range requests are working properly, as witnessed by the session below, using curl against a public file.

Note it does include a Content-Length header, and indeed I've confirmed that it returns 144 bytes as expected.

Looking over our code, there are two places where a Content-Range header is set: one when a proper 206 response is generated, for which the code path forcibly sets a Content-Length header, and another when a Range request is not satisfiable, which generates a 416 response code with Content-Range: */<content-length> but no body. Neither matches the response you included in the bug description.
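
To illustrate the two code paths described above, here is a minimal sketch in Go. It is not the Ubuntu One server code: the function name rangeResponse is made up for this example, only a single inclusive first-last range is handled, and the 416 value is written in the RFC 2616 form "bytes */<length>".

```go
package main

import "fmt"

// rangeResponse is a hypothetical helper (not the actual server code).
// Given an inclusive byte range [first, last] and the total entity size,
// it returns the status code, the Content-Range value and the Content-Length.
func rangeResponse(first, last, total int64) (status int, contentRange string, contentLength int64) {
	// Simplification for this sketch: a range that starts past the end of
	// the entity, or that is inverted, is reported as not satisfiable.
	// A 416 response carries "bytes */<total>" and no body.
	if first < 0 || first >= total || last < first {
		return 416, fmt.Sprintf("bytes */%d", total), 0
	}
	// A last-byte position beyond the entity is clamped to the final byte.
	if last >= total {
		last = total - 1
	}
	// A 206 response always carries an explicit Content-Length.
	return 206, fmt.Sprintf("bytes %d-%d/%d", first, last, total), last - first + 1
}

func main() {
	fmt.Println(rangeResponse(0, 143, 103672))
	fmt.Println(rangeResponse(200000, 200100, 103672))
}
```

The first call mirrors the curl session below (bytes 0-143 of a 103672-byte file), the second one a range that lies entirely past the end of the file.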

I'm suspecting that Squid might be at fault here, but would need more information to confirm.

curl -s -v -o/dev/null -H "Range: bytes=0-143" http://ubuntuone.com/p/c9Y/
* About to connect() to ubuntuone.com port 80 (#0)
* Trying 91.189.89.205...
* connected
* Connected to ubuntuone.com (91.189.89.205) port 80 (#0)
> GET /p/c9Y/ HTTP/1.1
> User-Agent: curl/7.28.0
> Host: ubuntuone.com
> Accept: */*
> Range: bytes=0-143
>
< HTTP/1.1 206 Partial Content
< Date: Tue, 22 Jan 2013 21:12:31 GMT
< Server: TwistedWeb/10.0.0
< Content-Length: 144
< Content-Disposition: inline; filename=Selection_281.png
< Vary: Accept-Encoding
< Last-Modified: Tue, 08 Feb 2011 11:33:40 GMT
< Content-Range: bytes 0-143/103672
< ETag: "sha1:0d8a6d5947321c0a207e349f6a91b0d2dc445470"
< X-Bzr-Revision-Number: 6709
< Content-Type: image/png
< X-Cache: MISS from amatungulu.canonical.com
< X-Cache-Lookup: MISS from amatungulu.canonical.com:3128
< Via: 1.0 amatungulu.canonical.com:3128 (squid/2.7.STABLE7)
< Via: 1.1 www.ubuntuone.com
<
{ [data not shown]
* Connection #0 to host ubuntuone.com left intact
* Closing connection #0

Raymond Hill (rhill) on 2013-01-22
description: updated
Raymond Hill (rhill) wrote :

Hi Sidnei.

I receive a 206 status code for the case described in the bug report (I forgot to mention it at first, and have since added the status code to the description), but no Content-Length.

I tried your curl command on http://ubuntuone.com/p/c9Y/ , and indeed it worked fine. Just to be sure, I dumped the result to a file, and I could see a tiny part of the image, as expected.

However, if I try your URL in my code, it fails the same way it fails for my file.

To be sure that the code was fine, I had earlier checked with an image on Wikipedia (https://upload.wikimedia.org/wikipedia/commons/8/8c/K2%2C_Mount_Godwin_Austen%2C_Chogori%2C_Savage_Mountain.jpg), and there my code works fine, the partial content is properly returned to me.

Sidnei da Silva (sidnei) wrote :

Hi Raymond,

Could you share your code, either publicly or privately to sidnei at canonical dot com? Maybe there's some extra header being set that's throwing something off.

Raymond Hill (rhill) wrote :

Here is the Go code which fails for http://ubuntuone.com/p/c9Y/, but succeeds for https://upload.wikimedia.org/wikipedia/commons/8/8c/K2%2C_Mount_Godwin_Austen%2C_Chogori%2C_Savage_Mountain.jpg :

http://play.golang.org/p/YBTHYFmCZF

The snippet cannot be executed in the Go sandbox (it does not allow requesting URLs), but it is a test case for whoever wants to try it locally.

Raymond Hill (rhill) wrote :

By the way, the output of the above program when GETting http://ubuntuone.com/p/c9Y/ is a runtime/abort error "panic: unexpected EOF", as the Go code inside io.Copy() expects content and there is none.

Sidnei da Silva (sidnei) wrote :

Here's a snippet of the response as captured by tshark:

tshark 'port 80 and (dst ubuntuone.com or src ubuntuone.com)' -V -R "http.request || http.response"

"""
Hypertext Transfer Protocol
    HTTP/1.1 206 Partial Content\r\n
        [Expert Info (Chat/Sequence): HTTP/1.1 206 Partial Content\r\n]
            [Message: HTTP/1.1 206 Partial Content\r\n]
            [Severity level: Chat]
            [Group: Sequence]
        Request Version: HTTP/1.1
        Status Code: 206
        Response Phrase: Partial Content
    Date: Wed, 23 Jan 2013 00:03:32 GMT\r\n
    Server: TwistedWeb/10.0.0\r\n
    Content-Length: 1001\r\n
        [Content length: 1001]
    Content-Disposition: inline; filename=Selection_281.png\r\n
    Content-Encoding: gzip\r\n
    Vary: Accept-Encoding\r\n
    Last-Modified: Tue, 08 Feb 2011 11:33:40 GMT\r\n
    Content-Range: bytes 0-1000/103567\r\n
    ETag: "sha1:0d8a6d5947321c0a207e349f6a91b0d2dc445470;gzip"\r\n
    X-Bzr-Revision-Number: 6709\r\n
    Content-Type: image/png\r\n
    X-Cache: MISS from calamansi.canonical.com\r\n
    X-Cache-Lookup: HIT from calamansi.canonical.com:3128\r\n
    Via: 1.0 ip-10-38-193-141.ec2.internal:3128, 1.1 calamansi.canonical.com:3128 (squid/2.7.STABLE7)\r\n
    Via: 1.1 www.ubuntuone.com\r\n
    \r\n
    Content-encoded entity body (gzip): 1001 bytes -> 1008 bytes
"""

So a couple of things:

1. Your script (actually, the Go library that it uses) is declaring that it supports gzip encoding
2. The server is returning 1001 bytes, gzipped, which expand to 1008 bytes (one could argue this might be an issue)
3. Note there *is* a Content-Length header in the response. However, for some reason it is not printed by the script

I suspect what is happening is that Go is automatically handling the gzipped response, but it contains more bytes than Content-Length after being un-gzipped, or something along these lines.

Sidnei da Silva (sidnei) wrote :

Confirmed by changing the script to set 'Accept-Encoding: identity', which causes the server to send a non-compressed response, which then works.
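
For anyone wanting to reproduce the workaround, a request along these lines can be built with Go's net/http. This is only a sketch and not the original test script: the helper name newRangeRequest is made up here, and the example builds the request without sending it.

```go
package main

import (
	"fmt"
	"net/http"
)

// newRangeRequest builds a GET request for bytes first..last (inclusive)
// and explicitly asks for the unencoded ("identity") representation.
// The function name is hypothetical, chosen for this example.
func newRangeRequest(url string, first, last int64) (*http.Request, error) {
	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Range", fmt.Sprintf("bytes=%d-%d", first, last))
	// Setting Accept-Encoding ourselves means the library has no reason
	// to add its own value to the request.
	req.Header.Set("Accept-Encoding", "identity")
	return req, nil
}

func main() {
	req, err := newRangeRequest("http://ubuntuone.com/p/c9Y/", 0, 143)
	if err != nil {
		panic(err)
	}
	fmt.Println("Range:", req.Header.Get("Range"))
	fmt.Println("Accept-Encoding:", req.Header.Get("Accept-Encoding"))
	// To actually send it: resp, err := http.DefaultClient.Do(req)
}
```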

For the record, upload.wikimedia.org does not reply with a compressed response, that's why the script works with it.

Sidnei da Silva (sidnei) wrote :

Digging deeper, I think the issue here is the combination of 'gzip' and 'Range'. When handling a Range request, we truncate the body after the requested range has been satisfied. The problem is that this truncation ends up chopping off the gzip trailer from the transformed data (which is stored in zlib format, then transformed to gzip upon request).

As far as I can tell, the 'unexpected EOF' comes from the gzip.NewReader that wraps the response body choking when it doesn't find the gzip trailer, around http://golang.org/src/pkg/compress/flate/inflate.go#L637 apparently.

I'll confirm with the in-house golang experts, but it seems like we have a bug on the U1 side, which can temporarily be worked around by requesting Accept-Encoding: deflate (which won't transform the stream) or Accept-Encoding: identity (which will uncompress the stream instead).
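
The effect of a missing trailer can be reproduced locally without any server. The sketch below uses only Go's standard compress/gzip package: it compresses a short payload, removes the final 8 bytes (the gzip trailer holding the CRC-32 and the uncompressed size), and tries to decompress both versions. The helper names and the sample payload are made up for this example.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
)

// gzipBytes compresses data into a complete gzip stream.
func gzipBytes(data []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return nil, err
	}
	// Close flushes the compressed data and writes the 8-byte trailer.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzipBytes decompresses a gzip stream, returning whatever was decoded
// together with the first error encountered.
func gunzipBytes(stream []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(stream))
	if err != nil {
		return nil, err
	}
	return io.ReadAll(zr)
}

// isUnexpectedEOF reports whether err is io.ErrUnexpectedEOF.
func isUnexpectedEOF(err error) bool {
	return errors.Is(err, io.ErrUnexpectedEOF)
}

func main() {
	full, err := gzipBytes([]byte("some file content stored on the server"))
	if err != nil {
		panic(err)
	}
	_, err = gunzipBytes(full)
	fmt.Println("complete stream:", err)

	// Drop the last 8 bytes, i.e. the gzip trailer.
	_, err = gunzipBytes(full[:len(full)-8])
	fmt.Println("trailer removed:", err, isUnexpectedEOF(err))
}
```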

Changed in ubuntuone-servers:
status: New → Confirmed
Sidnei da Silva (sidnei) wrote :

Looking over many IETF discussion threads, it seems like the behaviour of mixing Range and Content-Encoding: gzip is undefined at best. I'm going to disable gzip responses when a Range request is present to prevent this issue.
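
As a rough illustration of that decision, the selection logic could look like the Go sketch below. It is not the actual server change: chooseEncoding is a hypothetical helper, and the Accept-Encoding check is a plain substring match that ignores q-values.

```go
package main

import (
	"fmt"
	"strings"
)

// chooseEncoding is a hypothetical helper (not the actual server code).
// It takes the raw values of the Range and Accept-Encoding request headers
// and returns the content-coding to use for the response.
func chooseEncoding(rangeHeader, acceptEncoding string) string {
	// Never compress a partial response: serve the requested bytes as-is.
	if rangeHeader != "" {
		return "identity"
	}
	// Simplified check: a real implementation would parse q-values.
	if strings.Contains(acceptEncoding, "gzip") {
		return "gzip"
	}
	return "identity"
}

func main() {
	fmt.Println(chooseEncoding("bytes=0-143", "gzip"))
	fmt.Println(chooseEncoding("", "gzip"))
}
```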

Raymond Hill (rhill) wrote :

Yes, I confirm that setting 'Accept-Encoding: identity' works.

Regarding the proper behavior, there is this:

"Byte range specifications in HTTP apply to the sequence of bytes in the entity-body"
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35.1

This:

"The entity-body (if any) sent with an HTTP request or response is in a format and encoding defined by the entity-header fields"
http://www.w3.org/Protocols/rfc2616/rfc2616-sec7.html#sec7.2

And this:

"The Content-Length entity-header field indicates the size of the entity-body"
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.13

To me it looks like the Go library might be wrong: as I read the passages above, the Range applies to the bytes after they have been encoded. I will try to look at the code in the library to see if and where the error is, and maybe file a bug for this with the Go devs.

In any case, requesting 'identity' works well, and in my specific case it is even more sensible to do so, given the small number of bytes requested.

Raymond Hill (rhill) wrote :

Alright, I would argue the problem is in the Go library.

I did not set "Accept-Encoding" header in my request, which as per http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3 :

"If the Accept-Encoding field-value is empty, then only the 'identity' encoding is acceptable"

However, the Go code will set Accept-Encoding to gzip if the client did not do it. This is surely nice in general, but unfortunately it breaks requests which have a Range header, as suddenly the request for bytes [0-n] of identity material made by the user is transformed into a request for bytes [0-n] of gzipped material by Go's internals. This is obviously wrong, and Go should not tamper with Accept-Encoding when a Range is requested, as it effectively changes the semantics of the request made by the user completely.

I will file a bug with the Go devs, and I believe the Ubuntu One server should not change a thing on its side, as it was doing the proper thing. This bug can be closed now.

Thanks for your help Sidnei, and sorry to have taken up your time for an issue which is unrelated to the Ubuntu One server.

Sidnei da Silva (sidnei) wrote :

For the record, I don't think the Go library is wrong, it's just choking on the lack of gzip trailer which is chopped off by us incorrectly implementing the Range when gzip is present, not leaving enough room for the gzip trailer.

In other words, if you request a Range X-Y, we need to return (Y-X)-Z bytes + gzip trailer, but instead we're returning (Y-X) bytes and leaving the gzip trailer off, and Go is then correctly pointing out the 'unexpected EOF', although it seems to hide the fact that that error happens inside the gzip reader.

Julien Funk (jaboing) on 2013-01-25
tags: added: u1-api u1-by-user u1-on-production
Raymond Hill (rhill) wrote :

"gzip trailer which is chopped off by us incorrectly implementing the Range when gzip is present"

Ah ok, so I won't file a bug with the Go devs. I re-read the RFC and now I understand it differently anyway.
