Static caching bug with IE6, gzip, and Vary

Bug #409466 reported by Paul Everitt
This bug affects 1 person

Affects: KARL3
Status: Invalid
Importance: Medium
Assigned to: Shane Hathaway

Bug Description

Duncan Booth helped diagnose an issue we have with caching of static resources on IE6.

Below is information from his two emails.

=========== First Email ================

The karl.oxfam.org.uk URL doesn't have any headers in the response to control caching, just a last modified date. That means IE will cache it for an indeterminate time based on how long ago it was last modified (if it hasn't been modified for a long time it won't check very often). It will however request it every time IE has been restarted and the request is unconditional so the 51k is always transferred.
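The "indeterminate time" Duncan describes is heuristic freshness. When a response carries only Last-Modified, RFC 2616 (section 13.2.4) allows a cache to treat it as fresh for some fraction of the time since the last modification, with 10% given as a typical value. The sketch below applies that 10% rule to the Date and Last-Modified values from the karl.oxfam.org.uk trace further down. Whether IE6 uses exactly this fraction is an assumption, and `heuristic_lifetime` is a hypothetical helper name.

```python
from datetime import timedelta
from email.utils import parsedate_to_datetime


def heuristic_lifetime(date_header, last_modified_header, fraction=0.1):
    """Heuristic freshness lifetime: a fraction of the time since Last-Modified.

    The 10% default is the "typical" value from RFC 2616 section 13.2.4;
    that IE6 uses this exact fraction is an assumption.
    """
    date = parsedate_to_datetime(date_header)
    last_modified = parsedate_to_datetime(last_modified_header)
    # Clamp at zero in case Last-Modified is (wrongly) later than Date.
    return max(date - last_modified, timedelta(0)) * fraction


# Values from the karl.oxfam.org.uk response headers in the trace.
lifetime = heuristic_lifetime("Wed, 05 Aug 2009 08:12:06 GMT",
                              "Thu, 14 May 2009 20:37:42 GMT")
print(lifetime)  # a little over 8 days
```

This only models how long a copy might be considered fresh; it does not explain the unconditional re-request after each IE restart that Duncan also observed.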

The kdi01.sixfeetup.com response does have caching headers, but it isn't being cached (I created an HTML page which just references your two URLs, and whenever I open that page in IE it fetches the kdi01 URL but uses the karl one from cache). The response has a Vary header: according to http://www.fiddler2.com/fiddler/perf/aboutvary.asp IE6 should still allow caching for Vary: Accept-Encoding, but evidently in this case it doesn't. I configured Fiddler to delete the Vary: header from the response, and IE stopped re-requesting the URL every time.

Ah, I think I see what is happening. See http://www.ilikespam.com/blog/internet-explorer-meets-the-vary-header
which I think says that IE will cache gzip-compressed content that has the Vary: Accept-Encoding header. If you aren't gzip compressing it, though, the presence of the Vary: header will stop it caching.
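Duncan's reading of that post can be written down as a small predicate. This is a hypothetical model of the behaviour he describes, not IE6's actual logic; the name `ie6_would_cache` and the dict-of-headers input are assumptions. Note that the kdi01 response in the trace below has Vary: Accept-Encoding but no Content-Encoding header, i.e. the CSS was sent uncompressed.

```python
def ie6_would_cache(headers):
    """Hypothetical model of the IE6 rule described above (not IE's real code).

    headers: dict mapping response header names to values.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    vary = lowered.get("vary")
    if vary is None:
        return True  # no Vary header: ordinary caching rules apply
    fields = {f.strip().lower() for f in vary.split(",") if f.strip()}
    gzipped = lowered.get("content-encoding", "").strip().lower() == "gzip"
    # Vary is tolerated only when it is exactly Accept-Encoding on a gzipped body.
    return fields == {"accept-encoding"} and gzipped


# Relevant headers from the kdi01.sixfeetup.com response in the trace.
kdi01 = {
    "Content-Type": "text/css",
    "Cache-Control": "public, max-age=157680000",
    "Vary": "Accept-Encoding",
}
print(ie6_would_cache(kdi01))  # False: Vary present, body not gzipped
```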

Fiddler session #60
Result: 200
Protocol: HTTP
Host: kdi01.sixfeetup.com
URL: /static/themedstyles.css
Body: 32,793 bytes
Caching: public, max-age=157680000; Expires: Mon, 04 Aug 2014 08:11:58 GMT
Content-Type: text/css
Process: iexplore:5348

GET /static/themedstyles.css HTTP/1.1
Accept: */*
Accept-Language: en-gb
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB6; .NET CLR 2.0.50727)
Host: kdi01.sixfeetup.com
Proxy-Connection: Keep-Alive
Cookie: repoze.browserid=023d21ec9e59f296a0f96cba5d8cc96ed4bafe8b!300283c0374528658dc8fcc4a4bae5b6

HTTP/1.1 200 OK
Via: 1.1 OXGBPXY03
Connection: Keep-Alive
Proxy-Connection: Keep-Alive
Content-Length: 32793
Expires: Mon, 04 Aug 2014 08:11:58 GMT
Date: Wed, 05 Aug 2009 08:11:58 GMT
Content-Range: bytes 0-32792/32793
Content-Type: text/css
ETag: 1249068929.0-32793
Server: Apache/2.2.3 (CentOS)
Accept-Ranges: bytes
Cache-Control: public, max-age=157680000
Last-Modified: Fri, 31 Jul 2009 19:35:29 GMT
Vary: Accept-Encoding
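As a sanity check on these headers: max-age=157680000 seconds is 1825 days (five 365-day years), and adding it to the Date header reproduces the Expires header exactly (2012 is a leap year, so the result lands on 4 August rather than 5 August). The sketch below does that arithmetic with the standard library's `email.utils` date helpers; `expires_from_max_age` is a hypothetical helper name.

```python
from datetime import timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_from_max_age(date_header, max_age):
    """Return the HTTP date that is max_age seconds after date_header."""
    date = parsedate_to_datetime(date_header).astimezone(timezone.utc)
    return format_datetime(date + timedelta(seconds=max_age), usegmt=True)


print(157680000 / 86400)  # 1825.0 days
print(expires_from_max_age("Wed, 05 Aug 2009 08:11:58 GMT", 157680000))
# Mon, 04 Aug 2014 08:11:58 GMT
```

Since Expires and max-age agree, the expiry information itself looks consistent, which points back at the Vary header as the likely reason IE6 refetched the file.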

Fiddler session #62
Result: 200
Protocol: HTTPS
Host: karl.oxfam.org.uk
URL: /static/r/themedstyles.css
Body: 51,840 bytes
Caching: (none)
Content-Type: text/css
Process: iexplore:5348

GET /static/r/themedstyles.css HTTP/1.1
Accept: */*
Accept-Language: en-gb
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB6; .NET CLR 2.0.50727)
Host: karl.oxfam.org.uk
Connection: Keep-Alive
Cookie: __utma=32404064.1073916141190078000.1245942391.1245942391.1248433838.2; __utmz=32404064.1245942391.1.1.utmcsr=intranet.oxfam.org.uk|utmccn=(referral)|utmcmd=referral|utmcct=/programme/heard/overview/general_resources; I18N_LANGUAGE="en"; __ac="ZGJvb3RoOml0dm9kcnQ2aA%3D%3D"

HTTP/1.1 200 OK
Server: nginx/0.5.33
Date: Wed, 05 Aug 2009 08:12:06 GMT
Content-Type: text/css
Last-Modified: Thu, 14 May 2009 20:37:42 GMT
Connection: keep-alive
Content-Length: 51840

========== Second Email ==================

You certainly should be able to repeat the tests I did (if you can bring yourself to run IE): I couldn't see any evidence that the proxy was affecting the outcome. A large part of the problem is of course that we're still stuck on IE6: IE7 doesn't solve all the caching issues but it is a lot better.

I used the following HTML file to test on IE6 with Fiddler 2.2.4.0 beta:
<html>
<style type="text/css"><!-- @import url(https://karl.oxfam.org.uk/static/r/themedstyles.css); --></style>
<style type="text/css"><!-- @import url(http://kdi01.sixfeetup.com/static/themedstyles.css); --></style>
<body>
Hello world
</body>
</html>

On the first visit to the page after each IE restart I saw both CSS files loaded; subsequent visits re-requested only the second one (putting the cursor into the address bar and hitting return is enough to revisit the page without flushing the cache). Fiddler also has a 'Filters' tab which can be used to modify the request and/or response: I used it to remove the Vary header from the response.

Changed in karl3:
assignee: nobody → Shane Hathaway (shane-hathawaymix)
Shane Hathaway (shane-hathawaymix) wrote :

Apache's mod_deflate is adding the Vary header, but as Duncan explained, IE6 doesn't cache responses with a nontrivial Vary header. I applied the workaround described here:

http://development.lombardi.com/?p=946

It removes the Vary header from the response when the user agent is detected as MSIE 6.
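The workaround Shane applied lives in the Apache configuration, following the linked post; its exact directives are not reproduced here. For a deployment that handles headers in the Python (WSGI) layer instead, the same rule, dropping Vary when the User-Agent contains "MSIE 6", might look like the hypothetical middleware below. It only helps if the Vary header is produced by the wrapped application: in this deployment mod_deflate adds it inside Apache, after the WSGI app has responded.

```python
class StripVaryForIE6:
    """Hypothetical WSGI middleware: drop Vary headers for MSIE 6 user agents."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Same User-Agent sniffing idea as the Apache workaround; a proxy
        # that rewrites User-Agent would defeat it.
        is_ie6 = "MSIE 6" in environ.get("HTTP_USER_AGENT", "")

        def filtered_start_response(status, headers, exc_info=None):
            if is_ie6:
                headers = [(k, v) for k, v in headers if k.lower() != "vary"]
            return start_response(status, headers, exc_info)

        return self.app(environ, filtered_start_response)
```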

Changed in karl3:
status: New → Fix Committed
Paul Everitt (paul-agendaless) wrote :

I'm going to re-open this, as I suspect that IE6 is still problematic, per Des. I need to see if I can get an IE6 installation somewhere.

Changed in karl3:
milestone: m26 → m27
status: Fix Committed → In Progress
Paul Everitt (paul-agendaless) wrote :

Shane, can you take a look at this one this morning (Monday)? Is IE6 something you have access to?

Shane Hathaway (shane-hathawaymix) wrote :

I don't see a problem with IE6 anymore. My test is as follows.

1) I created a file called bugtest.html, which loads CSS from kdi01.sixfeetup.com.

2) I started "tcpdump -n -i wlan0 port 80", which shows all important HTTP traffic to/from my computer.

3) I opened bugtest.html in the ies4linux version of IE6.

4) Once the page loaded, I looked at the output of tcpdump. If any packets were transferred to/from kdi01.sixfeetup.com, the test failed. If no packets were transferred, the test passed.

When the Vary header was present, the test always failed. Once I configured Apache to not send the Vary header to IE6, the test always passed. I concluded that I had fixed the bug. I confirmed this again today.
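Shane's tcpdump test checks IE6's behaviour end to end. A lighter complement is to ask the server directly which Vary header it sends to an IE6-like request, for both http: and https: URLs. The sketch below uses `urllib.request` from the standard library; the function name and the shortened IE6 User-Agent string (modelled on the one in Duncan's trace) are assumptions, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.request

# Shortened IE6 User-Agent string, modelled on the one in Duncan's trace.
IE6_UA = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"


def vary_seen_by_ie6(url, opener=urllib.request.urlopen):
    """Return the Vary header the server sends to an IE6-like request, or None."""
    request = urllib.request.Request(
        url,
        headers={"User-Agent": IE6_UA, "Accept-Encoding": "gzip, deflate"},
    )
    with opener(request) as response:
        # Only the headers are inspected; the (possibly gzipped) body is ignored.
        return response.headers.get("Vary")
```

For example, `vary_seen_by_ie6("http://kdi01.sixfeetup.com/static/themedstyles.css")` would be expected to return None once the workaround is in place. A check run from a developer machine will not reveal a proxy that rewrites the User-Agent on the way to the server.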

Perhaps we're seeing a proxy issue. If the HTTP proxy replaces the User-Agent string, Apache might have no way to know that the client is running IE6.

Paul Everitt (paul-agendaless) wrote : Re: [Bug 409466] Re: Static caching bug with IE6, gzip, and Vary

Hi Duncan. Based on Shane's analysis, I think we'll close our
investigation into caching performance on IE6.

--Paul

On Aug 10, 2009, at 7:48 PM, Shane Hathaway wrote:

> I don't see a problem with IE6 anymore. My test is as follows.
> ...

Paul Everitt (paul-agendaless) wrote :

As it turns out, Shane demonstrated that we have no problem.

Changed in karl3:
status: In Progress → Invalid
Paul Everitt (paul-agendaless) wrote :

On Aug 12, 2009, at 5:34 AM, <email address hidden> wrote:

>
> I checked Karl this morning with IE6 and what I'm seeing is
> everything still has the Vary header. That means every page request
> is reloading more than 500k of javascript files. Fortunately many
> actions don't actually result in a full page reload: when you just
> reload the content panel everything is fine, but going to a page
> like /people that is a fresh page results in 33 requests for css,
> js, and images.

Hmm, that's very (no pun intended) surprising and geesh, I bet the
performance feels awful in IE6.

Is it possible that the proxy is changing the User-Agent when it sends the
request to KARL? We are using User-Agent sniffing in Apache to remove
the Vary header.

Stated differently, are you able to test with and without your proxy,
just so we can possibly eliminate it from the equation?

> The problem is that I used an https: url to access Karl: when I use
> http: then there is no Vary: header and the static content is
> cached, when I use https: there is a Vary: header.

Ah HA, I wonder if there is something messed up in our vhost config.

> BTW, I see that every response for static content includes a
> Set-Cookie header. Is that deliberate? I think it will prevent the proxy
> servers from ever caching the static content so everyone's browser
> has to retrieve their own copy.

Hmm, interesting point. We made a change, late in the game, to have
the app server (BFG) send back static resources instead of Apache. It
is adding the cookie automatically due to repoze.who.

We'll see if we can't fix that.
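One possible fix, sketched under assumptions: a WSGI middleware placed outside the repoze.who layer that removes Set-Cookie from responses whose path starts with a static prefix. The class name is hypothetical, and the `/static/` default prefix is an assumption taken from the URLs in the traces above.

```python
class DropCookiesOnStatic:
    """Hypothetical WSGI middleware: strip Set-Cookie from static responses.

    Wrap it around the repoze.who-wrapped app so it sees the cookie that
    layer adds.
    """

    def __init__(self, app, prefix="/static/"):
        self.app = app
        self.prefix = prefix

    def __call__(self, environ, start_response):
        is_static = environ.get("PATH_INFO", "").startswith(self.prefix)

        def filtered_start_response(status, headers, exc_info=None):
            if is_static:
                headers = [(k, v) for k, v in headers if k.lower() != "set-cookie"]
            return start_response(status, headers, exc_info)

        return self.app(environ, filtered_start_response)
```

Without Set-Cookie, the static responses become eligible for caching by shared proxies, which is what Duncan's remark is after.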

> That may be bad in a poorly connected office, but on the other hand
> without it the proxy servers could cache static content requested by
> Firefox (with the Vary header) and then serve it up to IE users (who
> shouldn't get the Vary): the fix of course would be to include
> "Vary: User-Agent" whenever the Vary header is set.
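Duncan's suggested fix can be sketched as a header-rewriting helper: whenever a response carries Vary, make sure User-Agent is listed too, so a shared proxy will not hand a response cached for Firefox to an IE6 user. The function name and the list-of-tuples header format (as used by WSGI) are assumptions.

```python
def add_user_agent_to_vary(headers):
    """Return a copy of headers in which any Vary header also lists User-Agent.

    headers: list of (name, value) tuples, as in WSGI. Hypothetical helper.
    """
    result = []
    for name, value in headers:
        if name.lower() == "vary":
            fields = [f.strip() for f in value.split(",") if f.strip()]
            if "user-agent" not in (f.lower() for f in fields):
                fields.append("User-Agent")
            value = ", ".join(fields)
        result.append((name, value))
    return result


print(add_user_agent_to_vary([("Content-Type", "text/css"), ("Vary", "Accept-Encoding")]))
# [('Content-Type', 'text/css'), ('Vary', 'Accept-Encoding, User-Agent')]
```

The cost is that a proxy keeps a separate copy for every distinct User-Agent string, which reduces how often it can reuse a cached response.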

Do you have any offices using a proxy besides the Oxford office?

--Paul
