Activity log for bug #290921

Date Who What changed Old value New value Message
2008-10-30 00:16:30 Roger Binns bug added bug
2008-10-30 12:45:16 Michael Vogt None: status New In Progress
2008-10-30 12:45:16 Michael Vogt None: statusexplanation Thanks for your bug report. I forwarded it to the sysadmin team as ticket #32129
2008-10-30 17:05:20 Chris Jones bug added subscriber The Canonical Sysadmins
2008-10-30 18:17:27 Roger Binns description (the old and new values are identical except for one correction: "requires me to manage to proxy/cache servers" became "requires me to manage two proxy/cache servers"; the new value is reproduced below)

I don't know of any better way to report this bug. The configuration of archive.ubuntu.com prevents proxy servers from caching packages. This may be because of poor configuration of archive.ubuntu.com, buggy behaviour of that software, or an opportunity for the apt client to ignore one part of web standards to improve things.

To reproduce, configure a Squid proxy server and then use it to upgrade one machine, e.g. to Intrepid. Now do the same on a second machine. If you monitor the Squid logs you will see all the files being redownloaded due to a TCP_REFRESH_MISS: the cache had the file, but on checking with the server it found the cached file to be stale.

The root cause is that a round-robin set of IP addresses is returned for us.archive.ubuntu.com. For example I see 91.189.88.31, 91.189.88.45 and 91.189.88.46. You can query each one of these for the same file and compare the headers returned, replacing the IP address as appropriate:

wget --no-proxy --header="Host: us.archive.ubuntu.com" -O /dev/null -S http://91.189.88.31/ubuntu/pool/main/libh/libhtml-tagset-perl/libhtml-tagset-perl_3.20-2_all.deb

The Last-Modified header is identical across all 3 servers, but the ETag is different. Because the ETag is different, the proxy server has to conclude that the content is stale. The bad effect is that upgrading N computers at a site that uses a normal proxy server requires N downloads, which for a dist upgrade can be close to 1GB. That sucks up more of your bandwidth and pointlessly increases server utilization. Note that .deb files with the same name do not change anyway.

Some suggested fixes:
* Stop sending back ETag from the servers and rely only on Last-Modified/Content-Length to detect cache invalidation
* Don't include the inode in the ETag calculation: http://httpd.apache.org/docs/2.2/mod/core.html#fileetag
* Calculate the ETag from the file's md5/sha
* Have a really long DNS cache timeout for round-robin returned values (e.g. two weeks) rather than the very short interval, so the same server will be hit from the same proxy

I haven't been able to work out a way for the apt client to prevent the Squid proxy from paying attention to ETag. Note that this same issue will affect any caching/proxy server that obeys web standards. One other workaround is to install an apt/deb-specific proxy (which then ignores the web standards), but that then requires me to manage two proxy/cache servers when just the one would work fine if the ubuntu.com ones were fixed.
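The per-mirror header check described above can be scripted. A minimal sketch, assuming the three IP addresses and the package path quoted in the report are still valid (they may not be):

  HOST="us.archive.ubuntu.com"
  DEB_PATH="/ubuntu/pool/main/libh/libhtml-tagset-perl/libhtml-tagset-perl_3.20-2_all.deb"
  for ip in 91.189.88.31 91.189.88.45 91.189.88.46; do
      echo "== $ip =="
      # wget -S prints the server response headers on stderr; keep only the
      # cache-validation headers so the three mirrors can be compared.
      wget --no-proxy --header="Host: $HOST" -O /dev/null -S "http://$ip$DEB_PATH" 2>&1 \
          | grep -iE 'last-modified|etag'
  done

If the mirrors behave as the report describes, the output shows matching Last-Modified lines but mismatching ETag lines, which is exactly the condition that forces Squid to refetch the file.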
2008-12-29 06:43:28 James Troup None: status In Progress Fix Released
2008-12-29 06:43:28 James Troup None: statusexplanation [old value] Thanks for your bug report. I forwarded it to the sysadmin team as ticket #32129 [new value] {us.,}.archive.ubuntu.com (and *.archive.ubuntu.com not provided by mirrors) no longer send ETag headers; thanks for the suggestion.
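With ETag gone, the behaviour described in the report (every client re-downloading because of TCP_REFRESH_MISS) should stop. A rough way to confirm this from the proxy side, sketched here only; the Squid access log path varies by install, and the package name is the example taken from the report:

  # After the fix, a repeat download of the same .deb through the proxy should
  # log TCP_HIT or TCP_REFRESH_HIT instead of TCP_REFRESH_MISS.
  grep 'libhtml-tagset-perl_3.20-2_all.deb' /var/log/squid/access.log | grep -o 'TCP_[A-Z_]*'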