apt-ftparchive does not correctly cache filesizes for packages > 4GB

Bug #1710911 reported by David McBride
This bug affects 1 person
Affects         Status   Importance   Assigned to   Milestone
apt (Debian)    New      Unknown
apt (Ubuntu)    New      Undecided    Unassigned

Bug Description

Release: 16.04
Version: 1.2.19

apt-ftparchive is a utility for, among other things, generating a Packages file from a set of .deb packages.

Because generating Packages files for a large directory tree of .deb packages is expensive, it can cache the properties of .deb packages it has already inspected in a Berkeley database file.

Historically, apt-ftparchive stored the size of a .deb package in this cache as a 32-bit unsigned integer in network byte order. This field was later enlarged to 64 bits, which caused other problems; see LP #1274466.

However, even after this integer field was enlarged, apt-ftparchive continued to use the htonl() and ntohl() libc functions to convert file sizes to and from network byte order when writing to and reading from its cache. These functions take and return 32-bit unsigned integers, so apt-ftparchive remained unable to correctly record the size of any package that does not fit in 32 bits (i.e. 4GB or larger).
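
A minimal sketch of the truncation described above. It is not the apt code; the function name roundtrip_via_htonl is hypothetical and only models writing a 64-bit size with htonl() and reading it back with ntohl():

```c
#include <arpa/inet.h>  /* htonl(), ntohl() */
#include <stdint.h>

/* Round-trips a 64-bit size through the 32-bit conversion functions.
   htonl() and ntohl() take a uint32_t, so in code that passes a uint64_t
   directly the truncation below happens implicitly; the casts here only
   make it visible. */
uint64_t roundtrip_via_htonl(uint64_t size)
{
    uint64_t stored = htonl((uint32_t)size);   /* upper 32 bits are lost here */
    return ntohl((uint32_t)stored);
}
```

Sizes below 4GB survive the round trip unchanged; anything larger comes back reduced modulo 2^32.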

Consequently, if apt-ftparchive is asked (with caching enabled) to generate a Packages file for a .deb package larger than 4GB, it will produce a Packages file with the correct Size: field the first time, but with incorrect Size: fields subsequently.

I have developed a small patch which replaces the use of the ntohl() family of functions with suitable replacements from <endian.h>. This produces correct output on new installations.
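
The attached patch itself is not reproduced in this report. As a rough sketch of the kind of replacement it describes, assuming glibc's <endian.h> and using hypothetical helper names (size_to_cache, size_from_cache):

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* asks glibc to expose htobe64()/be64toh() */
#endif
#include <endian.h>
#include <stdint.h>

/* Converts a host-order 64-bit size to big-endian for storage. */
uint64_t size_to_cache(uint64_t host_size)
{
    return htobe64(host_size);
}

/* Converts a stored big-endian 64-bit size back to host order. */
uint64_t size_from_cache(uint64_t stored)
{
    return be64toh(stored);
}
```

Unlike htonl() and ntohl(), these conversions operate on all 64 bits, so a size above 4GB survives the round trip.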

However, caution is necessary: the existing code incorrectly stores the 32 least significant bits of the 64-bit size, in big-endian byte order, in the first four bytes of the 64-bit field (its upper 32 bits when the field is read as big-endian). Applying this patch will cause new values to be stored correctly, but in a format that is binary-incompatible with existing caches.

For example, a package of size 7162161474 bytes will today have the following sequence of bytes stored in its cache entry:

\xaa\xe5\xe9\x42\x00\x00\x00\x00

(When re-read, this will produce a file-size value of 7162161474 mod 2^32, i.e. 2867194178.)

With this patch applied, apt-ftparchive will correctly store this entry:

\x00\x00\x00\x01\xaa\xe5\xe9\x42

However, this correct cache entry, when interpreted by the current broken code, will return a file-size of 1 byte. Worse, the existing broken entry will be interpreted by my fixed code as containing the value 12314505225791602688.
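
The four interpretations above can be checked with a small sketch. The reader functions are hypothetical models, not apt code, and read_broken_le_host assumes a little-endian host, where ntohl() ends up seeing only the first four bytes of the field in memory:

```c
#include <stdint.h>

/* The two byte sequences quoted in this report. */
static const uint8_t OLD_ENTRY[8] = {0xaa, 0xe5, 0xe9, 0x42, 0x00, 0x00, 0x00, 0x00};
static const uint8_t NEW_ENTRY[8] = {0x00, 0x00, 0x00, 0x01, 0xaa, 0xe5, 0xe9, 0x42};

/* Fixed reader: the whole field is one big-endian 64-bit integer. */
uint64_t read_fixed(const uint8_t b[8])
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | b[i];
    return v;
}

/* Broken reader as modelled for a little-endian host: only the first
   four bytes are used, interpreted as a big-endian 32-bit integer. */
uint64_t read_broken_le_host(const uint8_t b[8])
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++)
        v = (v << 8) | b[i];
    return v;
}
```

With these definitions, read_broken_le_host(OLD_ENTRY) gives 2867194178, read_fixed(NEW_ENTRY) gives 7162161474, read_broken_le_host(NEW_ENTRY) gives 1, and read_fixed(OLD_ENTRY) gives 12314505225791602688.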

It would be good to have this patch, or some derivative of it, applied to the main APT code-base. Before this can happen, however, some mechanism to detect and correct broken cache entries will be needed if we are to avoid a repeat of LP #1274466.

I would suggest this could be done by checking the trailing four bytes of the 64-bit filesize field: if they are all zero, then the cache entry is broken, and should be rewritten.

David McBride (david-mcbride) wrote :

Julian Andres Klode (juliank) wrote :

Just increment the cache version, thus forcing a regeneration. It's needed anyway so you don't destroy stuff on downgrades.

Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "Fix truncation of 64-bit file-sizes caused by htonl()" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
David McBride (david-mcbride) wrote :

juliank: From what I can see, cache files generated by apt-ftparchive do not contain versioned entries. They can also be located anywhere on a machine's filesystem, depending on how the sysadmin chose to apply the utility; there isn't one canonical file that can be automatically refreshed on package upgrade.

At present, the ':st' part of a cache entry has the following structure:

   enum FlagList {FlControl = (1<<0),FlMD5=(1<<1),FlContents=(1<<2),
                  FlSize=(1<<3), FlSHA1=(1<<4), FlSHA256=(1<<5),
                  FlSHA512=(1<<6), FlSource=(1<<7)

[...]

   // WARNING: this struct is read/written to the DB so do not change the
   // layout of the fields (see lp #1274466), only append to it
   struct StatStore
   {
      uint32_t Flags;
      uint32_t mtime;
      uint64_t FileSize;
      uint8_t MD5[16];
      uint8_t SHA1[20];
      uint8_t SHA256[32];
      uint8_t SHA512[64];
   }

(From ftparchive/cachedb.h)

One of my colleagues has pointed out that my original suggestion of checking the lower four bytes for null values will misbehave if the corrected code is asked to record information for a file that is an exact integer multiple of 4GB. You could stat the file on the filesystem in this case to resolve the ambiguity.
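
A sketch of that check, including the tie-break for sizes that are exact multiples of 4GB. All names are hypothetical, old-format entries are assumed to have been written on a little-endian host, and stat_size stands for the size the caller obtained from stat() on the package file:

```c
#include <stdint.h>

enum entry_kind { ENTRY_NEW_FORMAT, ENTRY_OLD_FORMAT };

/* Classifies the 8-byte FileSize field of a cache entry. */
enum entry_kind classify_size_field(const uint8_t field[8], uint64_t stat_size)
{
    /* Old-format entries always have four zero trailing bytes, so any
       nonzero trailing byte means the entry is in the new format. */
    if (field[4] | field[5] | field[6] | field[7])
        return ENTRY_NEW_FORMAT;

    /* Ambiguous case: either an old entry, or a new entry whose size is an
       exact multiple of 2^32. Decode as new format (big-endian 64-bit) and
       compare against the size reported by the filesystem. */
    uint64_t as_new = 0;
    for (int i = 0; i < 8; i++)
        as_new = (as_new << 8) | field[i];

    return as_new == stat_size ? ENTRY_NEW_FORMAT : ENTRY_OLD_FORMAT;
}
```

An entry classified as old-format would then be rewritten from the stat() result rather than trusted.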

Alternatively, you could modify the data-structure above to make it unambiguous what kind of record is being read:

* A flag bit could be used in Flags;
* An explicit version field could be added to the end of this struct;
* The use of the cache to store package file sizes could be discontinued in favour of always statting the file-size directly;
* The packed data-structure could be discarded completely, in favour of storing different pieces of metadata in independent DB keys.
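
As an illustration of the first option only, here is a sketch in which a flag bit selects how the FileSize field is decoded. FL_SIZE64 and decode_file_size are hypothetical: the bit value is chosen arbitrarily, the flags argument is assumed to be already in host byte order, and old-format entries are assumed to have been written on a little-endian host:

```c
#include <stdint.h>

/* Hypothetical flag bit, for illustration only; a real patch would need
   a bit that apt does not already use. */
#define FL_SIZE64 (UINT32_C(1) << 31)

/* Decodes the 8-byte FileSize field. Entries carrying FL_SIZE64 hold a
   big-endian 64-bit value; entries without it are read the old way, from
   the first four bytes only. */
uint64_t decode_file_size(uint32_t flags, const uint8_t field[8])
{
    int width = (flags & FL_SIZE64) ? 8 : 4;
    uint64_t v = 0;
    for (int i = 0; i < width; i++)
        v = (v << 8) | field[i];
    return v;
}
```

A size decoded from an unflagged entry is still only the low 32 bits of the original, so a reader might prefer to treat such entries as stale and re-stat the file.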

David McBride (david-mcbride) wrote :

I believe this issue also affects apt-ftparchive in Debian, so I have raised a bug there:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=872334

Julian Andres Klode (juliank) wrote :

Removed the patch tag here and there, as the patch is incomplete.

tags: removed: patch
Changed in apt (Debian):
status: Unknown → New