apt doesn't detect file corruption in /var/lib/apt/lists

Bug #1809174 reported by Stuart MacDonald on 2018-12-19
Affects: apt (Ubuntu)

Bug Description

The Problem

/var/lib/apt/lists contains the repository index caches or similar; I'm not sure what the correct apt-terminology is.

I've installed Chrome on my laptop, so I have:

smacdonald@L247:/var/lib/apt/lists$ dir *goog*
-rw-r--r-- 1 root root 943 Dec 19 14:02 dl.google.com_linux_chrome_deb_dists_stable_Release
-rw-r--r-- 1 root root 819 Dec 19 14:02 dl.google.com_linux_chrome_deb_dists_stable_Release.gpg
-rw-r--r-- 1 root root 4457 Dec 19 14:02 dl.google.com_linux_chrome_deb_dists_stable_main_binary-amd64_Packages

for example.

dl.google.com_linux_chrome_deb_dists_stable_Release contains checksums for the dl.google.com_linux_chrome_deb_dists_stable_main_binary-amd64_Packages file:

smacdonald@L247:/var/lib/apt/lists$ cat dl.google.com_linux_chrome_deb_dists_stable_Release
Origin: Google LLC
Label: Google
Suite: stable
Codename: stable
Version: 1.0
Date: Wed, 19 Dec 2018 18:51:54 UTC
Architectures: amd64
Components: main
Description: Google chrome-linux software repository
 9e0d0ad6a4f5ccf8e3971c32e9bb22d3 4457 main/binary-amd64/Packages
 a17f6de0ef487b82af58ccd91df52d04 1109 main/binary-amd64/Packages.gz
 156e5ea7a0c6bed5973a68a45e546dc9 151 main/binary-amd64/Release
 4c2cde4f71476d7881262d9a07e33cf4506232a7 4457 main/binary-amd64/Packages
 e002924c9ddfe41ee2033594ec768ed9e4545909 1109 main/binary-amd64/Packages.gz
 0f4348c2d4d7cc1f8e59b5934d87f1ca872f6e34 151 main/binary-amd64/Release
 fb0e586c2b5ec5afa17965d0bbc6bd46c2071336f75e2b0f0c7f3e7b090a7844 4457 main/binary-amd64/Packages
 2462cff732765679a56373a7ca9a5b8b029fdb445e707b1aba10d01fbdb853b3 1109 main/binary-amd64/Packages.gz
 c1e3c9318381862306adcdc4fd4fe2d85be8aa4c4f3dcbb40fce80413f588286 151 main/binary-amd64/Release

If the dl.google.com_linux_chrome_deb_dists_stable_main_binary-amd64_Packages file becomes corrupt in the specific manner of being 0 bytes in length, apt does not detect this, and the repository is effectively unreachable until one of two things occurs: a) the repository publishes an update, causing apt to re-fetch the repository information and accidentally fix the problem by overwriting the corrupt 0-byte file; or b) the user removes the corrupt 0-byte file and runs apt update to re-fetch the repository information.

The Context

Our IoT devices run Ubuntu 16.04, and their main storage is eMMC. Sometimes there are catastrophic power cuts, and, despite other precautions, files are occasionally corrupted in the manner of becoming 0 bytes in length. We're not sure exactly why or how.

Today a deployed device suffered the above scenario. We maintain a debian package repository for updating our devices in the field, and we suddenly couldn't install packages from it. A bit of investigation turned up the 0 byte *_Packages file for our repo, and we worked around the problem.

Part of the situation is our debian repository doesn't have updates very often, so 'sudo apt-get update' was giving a Hit: instead of a Get: result all the time, and everything from the "normal user command line" side of things looked okay. There were no logs in /var/log/syslog either. We just could not see our packages from our repo, despite 'apt-get update' looking good.

What I Expected to Happen

Given that the *_Release file contains checksums for the *_Packages file, I would expect apt to verify the checksum during any given 'apt update' operation and, if verification fails, to re-fetch the repository information even if the repository itself hasn't been updated.
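The check described above can be sketched as follows. This is a minimal illustration, not apt's implementation: it assumes checksum entries look like the indented "digest size path" lines in the Release excerpt earlier, infers the hash algorithm from the digest length, and assumes the list file is stored uncompressed. The function names are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# Assumption: the hash algorithm is inferred from the hex digest length.
ALGO_BY_HEX_LEN = {32: "md5", 40: "sha1", 64: "sha256"}
# Matches indented "<hexdigest> <size> <relative path>" lines.
ENTRY = re.compile(r"^\s+([0-9a-f]+)\s+(\d+)\s+(\S+)\s*$")


def parse_release_entries(release_text):
    """Map each relative path to its expected size and digests."""
    entries = {}
    for line in release_text.splitlines():
        match = ENTRY.match(line)
        if not match:
            continue
        digest, size, rel_path = match.groups()
        algo = ALGO_BY_HEX_LEN.get(len(digest))
        if algo is None:
            continue
        entry = entries.setdefault(rel_path, {"size": int(size), "digests": {}})
        entry["digests"][algo] = digest
    return entries


def list_file_is_valid(path, expected):
    """Return True if the file matches the expected size and strongest digest."""
    data = Path(path).read_bytes()
    if len(data) != expected["size"]:
        return False  # catches the 0-byte case without hashing
    for algo in ("sha256", "sha1", "md5"):
        if algo in expected["digests"]:
            return hashlib.new(algo, data).hexdigest() == expected["digests"][algo]
    return False  # no usable digest listed
```

With the Release text parsed, the entry for "main/binary-amd64/Packages" would be passed to list_file_is_valid together with the path of the corresponding *_Packages file. The size comparison alone is enough to reject a 0-byte file.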

Further Information

I checked apt's project in Debian at https://bugs.debian.org/cgi-bin/pkgreport.cgi?pkg=apt and there don't appear to be any bugs about this filed already, so I'm starting by filing one here.

The situation occurred on an Ubuntu 16.04 system, but is 100% reproducible with Google's chrome repository on my Ubuntu 18.04.1 laptop. I can provide a set of reproduction steps if needed, but it's fairly straightforward.

The fact that this corruption looks like "everything working okay" to the end user, except that apt doesn't know about packages it says it knows about, and that there is no error logging of any sort, is partly why I'm filing this.

Note case a) of the "one of two things occurs" scenario above: if the repository has updates, apt re-fetches the repository information and overwrites or removes the existing files. This accidentally fixes the problem without leaving any data indicating that the problem occurred in the first place. So it is probable that the problem is under-reported because it's not visible, especially for frequently updated repositories like the core Ubuntu repos.

System Details

smacdonald@L247:/var/lib/apt/lists$ sudo lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.1 LTS
Release: 18.04
Codename: bionic

smacdonald@L247:/var/lib/apt/lists$ sudo apt policy apt
  Installed: 1.6.6
  Candidate: 1.6.6
  Version table:
 *** 1.6.6 500
        500 http://ca.archive.ubuntu.com/ubuntu bionic-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1.6.3ubuntu0.1 500
        500 http://security.ubuntu.com/ubuntu bionic-security/main amd64 Packages
     1.6.1 500
        500 http://ca.archive.ubuntu.com/ubuntu bionic/main amd64 Packages

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: apt 1.6.6
ProcVersionSignature: Ubuntu 4.15.0-42.45-generic 4.15.18
Uname: Linux 4.15.0-42-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
CurrentDesktop: ubuntu:GNOME
Date: Wed Dec 19 16:21:16 2018
InstallationDate: Installed on 2018-05-11 (222 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
SourcePackage: apt
UpgradeStatus: No upgrade log present (probably fresh install)

Julian Andres Klode (juliank) wrote :

I'm just curious, but what filesystem are those files on? My understanding is that 0 byte files should not happen on ext4 at least.

Julian Andres Klode (juliank) wrote :

Verifying the checksums in normal operation is too expensive. I think it would be reasonable to make the clean command do that, and remove broken files.

Filesystem is ext4

ubuntu@T000000-tx2b:~$ mount
/dev/mmcblk0p1 on / type ext4 (rw,relatime,data=ordered)

The problem with deferring the situation to the clean command is that the system appears to be working properly with no problem, and the user will have to _guess_ that cleanup is required.

Running md5sum on the packages is not expensive:

ubuntu@T000000-tx2b:/var/lib/apt/lists$ time md5sum ports.ubuntu.com_ubuntu-ports_dists_xenial_universe_binary-arm64_Packages
005695bf761c4719325927b5a236a77e ports.ubuntu.com_ubuntu-ports_dists_xenial_universe_binary-arm64_Packages

real 0m0.166s
user 0m0.136s
sys 0m0.036s

Certainly it's much faster than the downloading of the indexes in the first place.
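The same measurement can be repeated from a script. The sketch below is only an illustration with a hypothetical function name; it hashes a file in chunks and reports the elapsed wall-clock time, which will vary by machine and storage.

```python
import hashlib
import time


def time_digest(path, algo="md5", chunk_size=1 << 20):
    """Hash a file in chunks; return (hexdigest, elapsed seconds)."""
    hasher = hashlib.new(algo)
    start = time.perf_counter()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large index files are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest(), time.perf_counter() - start
```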

Julian Andres Klode (juliank) wrote :

So, 0.166s is very expensive in the sense that we can't do it every time we open the cache. We're literally optimizing every 10ms we can get there - apt-cache show would be twice as fast.

There are two more points where we could do those checks, though:

(1) in update - we can check if the lists we have are correct before deciding whether to download new ones
(2) in install - we can abort with an error before installing packages

The first one should essentially be free, and is probably the best way to fix the issue, as you don't see any upgrades, run update, and see them.

I was picturing scenario (1); if there are updates or the checksum on the existing copy doesn't match, download the index.

apt has trained users to "apt update" before doing anything else, so this would be okay.

Julian Andres Klode (juliank) wrote :

I agree.

Changed in apt (Ubuntu):
status: New → Triaged

David Kalnischkies (donkult) wrote :

Note that the file we have in lists/ is not what we downloaded as we have downloaded a highly compressed version of the content (e.g. xz), but store it either uncompressed (for which we have a checksum) or lightly compressed (e.g. lz4 for which we have no checksum and can not as different versions of a compressor could produce different files). So such a check is not exactly free as we need to potentially uncompress the content we want to check – we can't even do a size check in the general case "for free". It might be worth it paying the price in "update", but there are a bunch of people who believe we shouldn't, reporting bugs to the effect that a no-change update should finish instantly.
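To illustrate why a compressed store makes the check more costly, here is a small sketch. It uses gzip from the Python standard library purely as a stand-in for lz4, which is an assumption made for the example; the function name is hypothetical. The digest has to be computed over the decompressed stream, because the digest of the compressed bytes on disk is a different value.

```python
import gzip
import hashlib


def sha256_of_uncompressed(path, chunk_size=1 << 20):
    """Decompress a gzip file on the fly and hash the uncompressed bytes."""
    hasher = hashlib.sha256()
    with gzip.open(path, "rb") as handle:
        # Every byte must be decompressed before it can be hashed.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```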

That said, files of size 0 could be made always invalid: We don't download such files nowadays (as an empty file compressed has a [small] size), so files in lists/ should have at least some content and if they don't something is absolutely fishy.
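A size-0 check of this kind could look roughly like the sketch below. It is an illustration, not apt code: the function name is hypothetical, and it assumes that a file named "lock" in the directory is legitimately empty and should be ignored.

```python
from pathlib import Path


def find_empty_list_files(lists_dir, ignore=("lock",)):
    """Return sorted names of zero-length regular files in lists_dir."""
    empty = []
    for entry in Path(lists_dir).iterdir():
        # Skip subdirectories and names assumed to be legitimately empty.
        if entry.name in ignore or not entry.is_file():
            continue
        if entry.stat().st_size == 0:
            empty.append(entry.name)
    return sorted(empty)
```

Calling find_empty_list_files("/var/lib/apt/lists") would list candidates for removal before the next update; it only needs file metadata, so nothing is read or decompressed.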

"reporting bugs to the effect that a no-change update should finish instantly."

By definition, if the on-disk copy is corrupt, there is a change and an update is available.

The algorithm for update should look like this:
- does the local copy exist? No -> update available
- is the local copy valid (checksums match)? No -> update available
- does the remote repo report a change? Yes -> update available
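The three steps above can be sketched as a single decision function. This is a hedged illustration only: the function name and parameters are hypothetical, the expected size and SHA-256 are assumed to come from the Release file, the local copy is assumed to be stored uncompressed, and the size comparison is an extra cheap pre-check added for the example.

```python
import hashlib
import os


def refetch_reason(local_path, expected_size, expected_sha256, remote_changed):
    """Return why the index should be fetched, or None if it is current."""
    if not os.path.exists(local_path):
        return "missing"
    # Cheap pre-check; also catches the 0-byte case without hashing.
    if os.path.getsize(local_path) != expected_size:
        return "size mismatch"
    with open(local_path, "rb") as handle:
        actual = hashlib.sha256(handle.read()).hexdigest()
    if actual != expected_sha256:
        return "checksum mismatch"
    if remote_changed:
        return "remote changed"
    return None
```

Any non-None result means "update available", so a corrupt local copy triggers a download even when the remote repository reports no change.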

"That said, files of size 0 could be made always invalid"

My case happened to be corruption of the form of an empty file. Checking the checksum will detect all forms of corruption.

tags: added: id-5c336e5b216dc852b7a80d86