apt-get update size is too big

Reported by Javier P.L. on 2012-05-19
This bug affects 42 people
Affects: Launchpad itself | Importance: Undecided | Assigned to: Unassigned
Affects: Ubuntu | Importance: Undecided | Assigned to: Unassigned

Bug Description

I ran a clean install to Ubuntu 12.04 and so far everything has been working well. I especially commend the Ubuntu team for this release.

The one thing I noticed is that apt-get update now downloads about 13 MB. A download of that size is normal the first time you run apt-get update after a clean install, with subsequent updates fetching roughly 23 kB to 1300 kB. Now, however, it fetches 13 MB or more every now and then.

Using the us.archive.ubuntu.com archive, I see that the universe Packages files are being recreated a couple of times an hour but contain the same content. The file modification date, the expiration date and, in particular, the ETag change each time, causing apt-get update to download the Packages file again even though it hasn't changed.

Launchpad shouldn't be recreating the Packages files when no changes have been made to the contained packages.

http://askubuntu.com/questions/135818/the-apt-get-update-cache-size-is-too-big

Curtis Hovey (sinzui) on 2012-05-19
affects: launchpad → apt (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in apt (Ubuntu):
status: New → Confirmed
summary: - apt-get update cache size is too big
+ apt-get update size is too big
zpletan (zpletan) wrote :

A look at us.archive.ubuntu.com/ubuntu/dists/$DIST/[main|universe]/binary-i386 reveals that timestamps on the package info for hardy, lucid, maverick, natty, oneiric, and precise are being updated. Based on this, I would say that the problem is *not* specific to Precise. This concurs with my experience; in addition to real and virtual machines I have with Precise, this happened on my virtual machine of Oneiric before I blew it away a few weeks ago.

John S. Gruber (jsjgruber) wrote :

Attached are terminal logs of HTTP headers showing changing modification times alongside unchanging md5sums for the files. I included both archive.ubuntu.com and us.archive.ubuntu.com. A packet trace showed that apt is using an "If-Modified-Since" header, which is being subverted by the changing timestamps. The problem applies to main and universe but not to, for example, the security repo.
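
To illustrate the mechanism described in the comment above, here is a minimal Python sketch of how a server chooses between a cheap 304 Not Modified and a full 200 response to an If-Modified-Since request. The function name `server_response_status` and the example timestamps are hypothetical; this is not apt's code or the mirror's.

```python
from email.utils import parsedate_to_datetime

def server_response_status(if_modified_since: str, last_modified: str) -> int:
    """Return 304 if the resource is not newer than the client's copy, else 200."""
    client_time = parsedate_to_datetime(if_modified_since)
    server_time = parsedate_to_datetime(last_modified)
    return 200 if server_time > client_time else 304

# Hypothetical timestamp of the copy the client already holds.
cached = "Sat, 19 May 2012 10:00:00 GMT"

# Mirror kept the original mtime: the client is told nothing changed.
print(server_response_status(cached, "Sat, 19 May 2012 10:00:00 GMT"))

# Mirror rewrote the mtime without changing the content: full download again.
print(server_response_status(cached, "Sat, 19 May 2012 10:30:00 GMT"))
```

The second call shows why identical content still gets re-downloaded: the decision looks only at timestamps, never at the bytes.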

John S. Gruber (jsjgruber) wrote :

Is this possibly related to the optimizations mentioned in https://lists.ubuntu.com/archives/ubuntu-devel/2011-December/034577.html ?

William Grant (wgrant) wrote :

It's not a Launchpad problem. From the master copy of the archive:

-rw-r--r-- 1 lp_publish lp_publish 1.5M Sep 20 2008 /srv/launchpad.net/ubuntu-archive/ubuntu/dists/hardy/main/binary-i386/Packages.gz

Since http://archive.ubuntu.com/ubuntu/dists/hardy/main/binary-i386/ shows the bad mtime, it's probably something in the internal mirroring pipeline.

But one could also argue that apt still shouldn't regrab the file, since Release still has the same hashes for it.
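
The point about Release hashes can be sketched as follows: if the hash and size listed in Release match the local copy of an index file, there is nothing new to fetch, whatever the timestamps say. This is only an illustration under stated assumptions, not apt's implementation; the helper names `parse_release_sha256` and `needs_download` and the sample data are hypothetical, and a real client would verify the Release signature first.

```python
import hashlib

def parse_release_sha256(release_text: str) -> dict:
    """Map each path in the Release file's SHA256 section to (hash, size)."""
    entries = {}
    in_section = False
    for line in release_text.splitlines():
        if line.startswith("SHA256:"):
            in_section = True
        elif in_section and line.startswith(" "):
            digest, size, path = line.split()
            entries[path] = (digest, int(size))
        elif in_section:
            break  # a non-indented line ends the section
    return entries

def needs_download(local_bytes: bytes, path: str, release_text: str) -> bool:
    """True unless the local copy matches the hash and size listed in Release."""
    expected = parse_release_sha256(release_text).get(path)
    if expected is None:
        return True
    digest, size = expected
    if len(local_bytes) != size:
        return True
    return hashlib.sha256(local_bytes).hexdigest() != digest

# Hypothetical index content and a Release stanza built to match it.
local = b"Package: example\nVersion: 1.0\n"
release = "Suite: precise\nSHA256:\n {} {} main/binary-i386/Packages\n".format(
    hashlib.sha256(local).hexdigest(), len(local))

print(needs_download(local, "main/binary-i386/Packages", release))         # same content
print(needs_download(local + b"x", "main/binary-i386/Packages", release))  # changed content
```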

Changed in launchpad:
status: New → Invalid
William Grant (wgrant) wrote :

https://rt.admin.canonical.com/Ticket/Display.html?id=53097 (Canonical-only link, sorry) filed to track the mirror script problem.

John S. Gruber (jsjgruber) wrote :

I've been able to circumvent this problem by touching the appropriate files in /var/lib/apt/lists/ right before updating the apt cache.

zpletan (zpletan) wrote :

@jsjgruber, if I touch the files, will they still be downloaded if info has been changed?

John S. Gruber (jsjgruber) wrote :

@zpletan, touching them stops them from being downloaded. You should only touch files that haven't changed since you downloaded them. By appropriate files I mean the files that were frozen when the release you are running was published (those are the ones giving people trouble until the bug is fixed). You shouldn't use the touch command on any others, or against Quantal, whose main and universe repos are still active.

In this case touching a file just says it was current at the time you run the touch command (it makes the last modification time the current time).

There's detail in: http://askubuntu.com/questions/135818/the-apt-get-update-cache-size-is-too-big
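
For readers who want to see what "touching" does, here is a small Python sketch that sets a file's modification time to the current time, the same effect as the touch command. The helper name `touch_existing` is hypothetical and the demonstration uses a temporary file; which files under /var/lib/apt/lists/ to touch is covered by the comment above and the linked answer, not by this sketch.

```python
import os
import tempfile
import time

def touch_existing(path: str) -> float:
    """Set the file's access and modification times to now; return the new mtime."""
    os.utime(path, None)  # None means "use the current time"
    return os.stat(path).st_mtime

# Demonstration on a throwaway file rather than a real apt list file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    demo_path = tmp.name

os.utime(demo_path, (0, 0))            # pretend the file is very old
old_mtime = os.stat(demo_path).st_mtime
new_mtime = touch_existing(demo_path)  # now it looks freshly downloaded
print(new_mtime > old_mtime)
os.remove(demo_path)
```

On a real system the call would need root privileges, and it should only be applied to index files that are known not to have changed.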

David T (ubuntuwiki-datmail) wrote :

Touching 4 files on 50+ servers every 60 minutes is not practical. I sure hope a patch appears soon. The problem started around 4/26/2012 (give or take a day); I saw a major spike in my bandwidth usage around then.

John S. Gruber (jsjgruber) wrote :

A fix would save a lot of bandwidth for users, mirrors, and Canonical. A fix would be great. Most users don't know about any circumventions, of course, and many don't even realize there is a problem.

Nevertheless I'm afraid I don't understand your comment. Your system should normally be downloading from the repositories on just one server and you should be touching just these four files on your system. Am I missing something? If the directions on askubuntu.com aren't clear I'd like to clarify them.

David T (ubuntuwiki-datmail) wrote :

I am feeling this bug so much because I am a specialized hosting provider and chose Ubuntu as my primary linux distro. I have 50+ individual servers in 2 sets of racks, all of them checking for security updates every 60 minutes.

I feel the 120 GB over the last 25 or so days :/

J Phani Mahesh (phanimahesh) wrote :

Suggestion:
----------------
How about using zsync for downloading the lists?
Practically feasible solution, with no additional server side overload.

@David: If you have 50 servers that poll for updates every 60 minutes, I suggest you set up a caching server. That saves a lot of bandwidth. I suggest squid-deb-proxy or something similar.

Could somebody please assign an adequate importance to this bug?

David T (ubuntuwiki-datmail) wrote :

I'm amazed that every Ubuntu mirror isn't up in arms about this bug. They must be feeling it. Going from 40k to 13000k is a 325-fold increase in their outbound bandwidth. They must have massive pipes and not care. :/

Jane Atkinson (irihapeti) wrote :

Not everyone has high-bandwidth plans. I'll probably have to move to a low-bandwidth plan shortly and downloading 12 MB or so daily is going to hurt.

William Grant (wgrant) wrote :

The internal mirror script has been fixed to not clobber the timestamps for old files, so this should no longer be a big problem.

Greg A (etulfetulf) on 2012-08-07
affects: apt (Ubuntu) → ubuntu
Changed in ubuntu:
status: Confirmed → Fix Released
Martin Lee (hellnest) wrote :

The issue still exists on 13.10.

Javier P.L. (chilicuil) wrote :

I've seen this issue again in Ubuntu 14.04, which is currently in development. On every run of sudo apt-get update, even when the runs come one right after the other, apt-get keeps downloading about 19.8 MB.

apt:
  Installed: 0.9.13.1~ubuntu1
  Candidate: 0.9.13.1~ubuntu1
  Version table:
 *** 0.9.13.1~ubuntu1 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main i386 Packages
        100 /var/lib/dpkg/status

deb http://us.archive.ubuntu.com/ubuntu/ trusty main restricted
deb-src http://us.archive.ubuntu.com/ubuntu/ trusty main restricted
deb http://us.archive.ubuntu.com/ubuntu/ trusty-updates main restricted
deb-src http://us.archive.ubuntu.com/ubuntu/ trusty-updates main restricted
deb http://us.archive.ubuntu.com/ubuntu/ trusty universe
deb-src http://us.archive.ubuntu.com/ubuntu/ trusty universe
deb http://us.archive.ubuntu.com/ubuntu/ trusty-updates universe
deb-src http://us.archive.ubuntu.com/ubuntu/ trusty-updates universe
deb http://us.archive.ubuntu.com/ubuntu/ trusty multiverse
deb-src http://us.archive.ubuntu.com/ubuntu/ trusty multiverse
deb http://us.archive.ubuntu.com/ubuntu/ trusty-updates multiverse
deb-src http://us.archive.ubuntu.com/ubuntu/ trusty-updates multiverse
deb http://us.archive.ubuntu.com/ubuntu/ trusty-backports main restricted universe multiverse
deb-src http://us.archive.ubuntu.com/ubuntu/ trusty-backports main restricted universe multiverse
deb http://security.ubuntu.com/ubuntu trusty-security main restricted
deb-src http://security.ubuntu.com/ubuntu trusty-security main restricted
deb http://security.ubuntu.com/ubuntu trusty-security universe
deb-src http://security.ubuntu.com/ubuntu trusty-security universe
deb http://security.ubuntu.com/ubuntu trusty-security multiverse
deb-src http://security.ubuntu.com/ubuntu trusty-security multiverse

Alan (alanjas) wrote :

I have the same problem in Ubuntu 14.04 (trusty).
I run apt-get update and every time it downloads 15.5 MB!
Looking at the output, I see that it downloads the same files every time.
They are the SAME files because they have exactly the same size. Maybe only the timestamp changes.

There are several; these are the first:

Des:1 http://security.ubuntu.com saucy-security Release.gpg [933 B]
Des:2 http://security.ubuntu.com saucy-security Release [49,6 kB]
Des:3 http://archive.ubuntu.com trusty Release.gpg [933 B]
Des:4 http://security.ubuntu.com saucy-security/main Sources [27,9 kB]
...
Des:8 http://security.ubuntu.com saucy-security/universe Sources [8.372 B]
Des:32 http://archive.ubuntu.com trusty/universe i386 Packages [5.878 kB]

I'm also having this bug on Ubuntu 14.04 LTS
Each time I run 'sudo apt-get update' it fetches 22.5MB.
