feature request: apt-get update --if-necessary

Bug #1429285 reported by Scott Moser on 2015-03-06
This bug affects 5 people
Affects Status Importance Assigned to Milestone
apt (Ubuntu)

Bug Description

In many cases (juju, lxc containers, ...) we find ourselves not knowing whether the apt cache has been updated recently. So you either risk skipping the update, or you run it, and it takes some time and generates load.

So, long story short, you always run 'apt-get update', which ends up being quite often.

Would it be possible to add (or is there now) something like
'--if-necessary' or '--if-necessary=5m'. I could imagine that that would
look at /var/lib/apt/lists and check timestamps on files for each url that
/etc/apt/sources.list[.d/*] would hit. If nothing was needed and
reasonably recent, then it would not do the update.
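
A minimal sketch of the '--if-necessary=5m' idea as a wrapper, not an existing apt option. The function name `needs_update` and the choice of stamp path are assumptions for illustration. List files may carry the server's modification time rather than the time of the last local update, so which file to check is left to the caller.

```shell
# needs_update STAMP MAX_AGE_SECONDS   (hypothetical helper, not part of apt)
# Succeeds (exit 0) when STAMP is missing or was last modified more than
# MAX_AGE_SECONDS ago; fails (exit 1) when it is recent enough to skip.
needs_update() {
    local stamp=$1 max_age=$2 now mtime
    # A missing stamp means no update has been recorded: update is needed.
    [ -e "$stamp" ] || return 0
    now=$(date +%s)
    mtime=$(stat -c %Y "$stamp")   # GNU stat: mtime in epoch seconds
    [ $(( now - mtime )) -gt "$max_age" ]
}

# Usage (the path is an assumption; pick one whose mtime tracks the last update):
# if needs_update /var/lib/apt/lists 300; then apt-get update; fi
```

The threshold is in seconds, so 300 corresponds to the '5m' in the request.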

There exist other solutions to this like:

It'd be nice if we had a sane way to say:
  update if you need to, otherwise don't waste time and resources

ProblemType: Bug
DistroRelease: Ubuntu 15.04
Package: apt
ProcVersionSignature: Ubuntu 3.19.0-7.7-generic 3.19.0
Uname: Linux 3.19.0-7-generic x86_64
ApportVersion: 2.16.2-0ubuntu1
Architecture: amd64
CurrentDesktop: Unity
Date: Fri Mar 6 17:06:22 2015
EcryptfsInUse: Yes
InstallationDate: Installed on 2015-01-02 (63 days ago)
InstallationMedia: Ubuntu 15.04 "Vivid Vervet" - Alpha amd64 (20150101)
SourcePackage: apt
UpgradeStatus: No upgrade log present (probably fresh install)

Scott Moser (smoser) wrote :

Just realized, that ideally 'apt-get update' would respect headers that were put in place by the source.
$ wget -S -q http://azure.archive.ubuntu.com/ubuntu/dists/vivid/Release -O /dev/null
  HTTP/1.1 200 OK
  Date: Tue, 31 Mar 2015 13:32:55 GMT
  Server: Apache/2.2.22 (Ubuntu)
  Last-Modified: Tue, 31 Mar 2015 12:28:00 GMT
  ETag: "34f32-51294baae5400"
  Accept-Ranges: bytes
  Content-Length: 216882
  Cache-Control: max-age=0, proxy-revalidate
  Expires: Tue, 31 Mar 2015 13:32:55 GMT
  Keep-Alive: timeout=5, max=100

I.e., by default apt-get should not bother pulling that again until the 'Expires' date. A subsequent 'apt-get update' would just skip it unless told '--force' or something. Such a policy would drastically reduce load (and traffic on mirrors or original mirrors).

David Kalnischkies (donkult) wrote :

Um, did you realize that "Expires" is the exact time of your request (compare "Date") in your example? (See also the HTTP1.1 spec, which will tell you that 'Expires' doesn't really mean what you think it does, so the value it has is actually 'okay'.)

APT uses If-Modified-Since in its requests, so a server that supports it (not all do, but most) can respond with just a "304 Not Modified". At least there isn't much traffic wasted even if you happen to request updates every few minutes. This is less effective for load itself, of course, but apt is trying to be nice here as well, e.g. by being a proper keep-alive HTTP1.1 client, pipelining, and not opening multiple connections to the same server. A hypothetical average website loaded by an average browser seems to be much worse from a load and traffic point of view…
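
To illustrate the conditional request described above, here is a rough sketch that builds an If-Modified-Since value from a local file's modification time. The helper name `http_date_of` and the `$url` variable are assumptions for illustration; this is not how apt itself is implemented.

```shell
# http_date_of FILE   (hypothetical helper)
# Prints FILE's modification time in the HTTP date format used by
# Last-Modified and If-Modified-Since, always in GMT.
http_date_of() {
    # LC_ALL=C keeps day and month names in English, as HTTP requires.
    LC_ALL=C date -u -r "$1" '+%a, %d %b %Y %H:%M:%S GMT'
}

# Usage (needs network access; $url is whatever Release file you fetched before).
# Prints only the status code, which is 304 if the server honours the header
# and the file is unchanged:
# curl -s -o /dev/null -w '%{http_code}\n' \
#      -H "If-Modified-Since: $(http_date_of Release)" "$url"
```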

Regarding the snippets:
The puppet one just runs update if the sources.list changed. That isn't your use case, as I haven't changed my sources.list for months…
The ansible one, well, it's a hack. A hack which in the worst case opts you out of security updates for 12 hours. That can be a long time, so I really don't want to define an arbitrary value for "not necessary" which is (a lot) larger than zero. And I am not very keen on suggesting, by providing an option for it, that there is a good value which I just don't want to figure out myself.

The problem is basically that you don't know at which point an update makes sense. Your last update can be 10 seconds ago, but even that recent, the data could already be outdated because the repository was updated in the meantime. What we would need is a 'soft' valid-until which specifies the time after which the next update is deployed. It's just that this is pretty hard to predict (it's easy to specify when the update will start on the master [except for times you want to do an emergency rollout], not so much the point at which it is finished, and don't even try to speculate about when this will reach your mirror…).

Scott Moser (smoser) wrote :

$ echo "now: $(TZ=GMT date)"; wget -S -q http://azure.archive.ubuntu.com/ubuntu/dists/vivid/Release -O /dev/null
now: Fri Apr 3 14:37:31 GMT 2015
  HTTP/1.1 200 OK
  Date: Fri, 03 Apr 2015 14:37:31 GMT
  Server: Apache/2.2.22 (Ubuntu)
  Last-Modified: Fri, 03 Apr 2015 14:28:00 GMT
  ETag: "34f32-512d2c15bbc00"
  Accept-Ranges: bytes
  Content-Length: 216882
  Cache-Control: max-age=1228, proxy-revalidate
  Expires: Fri, 03 Apr 2015 14:58:00 GMT
  Keep-Alive: timeout=5, max=100
  Connection: Keep-Alive

So the 'Expires' is most definitely in the future. This one is currently 21 minutes in the future.
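
The arithmetic can be checked with a small sketch that parses the 'Expires' header and subtracts the request time. The function name `seconds_until_expiry` is hypothetical, and GNU date is assumed for parsing HTTP dates.

```shell
# seconds_until_expiry EXPIRES_HEADER NOW_EPOCH   (hypothetical helper)
# Prints the number of seconds left until the Expires time
# (negative if it has already passed).
seconds_until_expiry() {
    local expires_epoch
    expires_epoch=$(date -u -d "$1" +%s) || return 2   # GNU date parses HTTP dates
    echo $(( expires_epoch - $2 ))
}

# Values taken from the response above.
now=$(date -u -d 'Fri, 03 Apr 2015 14:37:31 GMT' +%s)
seconds_until_expiry 'Fri, 03 Apr 2015 14:58:00 GMT' "$now"   # prints 1229
```

The result of 1229 seconds is within a second of the 'max-age=1228' in the same response, so the two headers agree.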

I didn't say that the solutions I pointed to were complete, but the fact that people are attempting to solve the problem (and doing so incorrectly) should indicate a need for a good solution.

It would seem possible for apt to go through the sources.list (+sources.list.d/*) and figure out what it was going to download. It could decide based on previously stored headers or a global default "5 minutes" or a per-mirror default if it should bother. Of course it would make sense to have the ability to force.
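
One possible shape for that decision, as a sketch only: prefer the lifetime from stored headers, then a per-mirror default, then the global default, with a force override. All names here (`freshness_lifetime`, `should_fetch`, `GLOBAL_DEFAULT_SECONDS`) are hypothetical and nothing like this exists in apt.

```shell
GLOBAL_DEFAULT_SECONDS=300   # the global "5 minutes" default

# freshness_lifetime STORED_MAX_AGE MIRROR_DEFAULT
# Prints the first non-empty value among: lifetime from stored headers,
# per-mirror default, global default.
freshness_lifetime() {
    echo "${1:-${2:-$GLOBAL_DEFAULT_SECONDS}}"
}

# should_fetch LAST_FETCH_EPOCH NOW_EPOCH LIFETIME_SECONDS [FORCE]
# Succeeds when the source should be downloaded again: either FORCE is
# "yes" or at least LIFETIME_SECONDS have passed since the last fetch.
should_fetch() {
    [ "${4:-no}" = yes ] && return 0
    [ $(( $2 - $1 )) -ge "$3" ]
}

# Usage: no stored header, no per-mirror default, last fetch 100 seconds ago.
# lifetime=$(freshness_lifetime "" "")
# if should_fetch "$last" "$(date +%s)" "$lifetime"; then echo fetch; else echo skip; fi
```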

Running 'apt-get update' on a system does take real time. On a cloud instance with an on-network mirror:
% TIMEFORMAT='real=%3lR user=%3lU sys=%3lS' bash -c 'for i in $(seq 1 10); do printf "%-3s " $i ; time apt-get update -q >/dev/null; done'
1 real=0m3.987s user=0m3.628s sys=0m0.244s
2 real=0m4.052s user=0m3.688s sys=0m0.220s
3 real=0m3.980s user=0m3.656s sys=0m0.204s
4 real=0m4.077s user=0m3.676s sys=0m0.272s
5 real=0m4.068s user=0m3.732s sys=0m0.228s
6 real=0m4.052s user=0m3.688s sys=0m0.236s
7 real=0m4.201s user=0m3.812s sys=0m0.248s
8 real=0m4.247s user=0m3.852s sys=0m0.252s
9 real=0m4.059s user=0m3.688s sys=0m0.256s
10 real=0m4.064s user=0m3.680s sys=0m0.252s

It just seems reasonable to me, that operations 2->10 could take .002 seconds.

Robie Basak (racb) wrote :

IMHO, in the use case for "apt install", "apt update" should be considered an implementation detail and the user shouldn't need to call it directly. "apt install" should just do the right thing. The same applies to "apt-cache search", etc. It seems quite tedious and unnecessary to have to run both commands when starting a container manually for some quick testing.

There should be a default of some automatic "update" if the cache is older than some set time, and a default setting of whether failure to update fails the entire operation. There could be CLI options to override these defaults (change the time, always update first, never update, try to update but proceed anyway if that fails, etc).

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in apt (Ubuntu):
status: New → Confirmed

Julian Andres Klode (juliank) wrote :

I think automatic updating on old caches is not really the best idea, because it will always happen when you expect it the least. Or maybe you do in fact want the old one.

On Thu, Jul 06, 2017 at 01:32:00PM -0000, Julian Andres Klode wrote:
> I think automatic updating on old caches is not really the best idea,
> because it will always happen when you expect it the least. Or maybe you
> do in fact want the old one.

From the perspective of users, I absolutely disagree. Users always want
the latest. Users expect a download, so hitting the archive index should
never be a problem.

A user not wanting the latest is actually being a developer.

> Or maybe you do in fact want the old one.

Developers might, sure. Developers would have to override using a CLI
option or set some global configuration to override. But I think the way
forward for the "apt" CLI is to be user-centric. This is the entire
point of it, no?

The defaults can either be sensible for users, or for developers. This
is an example where I think it cannot be both. So what should "apt" be?

Robie Basak (racb) wrote :

> IMHO, in the use case for "apt install", "apt update" should be considered an implementation detail and the user shouldn't need to call it directly. "apt install" should just do the right thing.

Here's basically the same opinion with some commentary from others: https://twitter.com/chr1sa/status/894048628284604416

Robie Basak (racb) wrote :

I have filed bug 1709603 to track my request to have "apt update" called automatically, as really it's independent of Scott's request for --if-necessary here.
