orig files for source packages are being counted for each version

Bug #432152 reported by William King on 2009-09-17
This bug affects 1 person
Affects: Launchpad itself
Importance: High
Assigned to: Michael Nelson

Bug Description

I have a 160MB orig file and it is counted for each version of my package. So I can't have more than 4-5 versions in my ppa, even though I only upload the orig file once.

Celso Providelo (cprov) wrote :

William, what's your PPA URL?

Changed in soyuz:
status: New → Incomplete
William King (quentusrex) wrote :

https://launchpad.net/~pbxbuntu-drivers/+archive/ppa

I've deleted all of the older packages to try to recover some space. I've also filed a request for more space. I'll increment the package version shortly to show you the size miscalculation.

https://answers.launchpad.net/soyuz/+question/83128

William King (quentusrex) wrote :

Now you can see the miscalculation. I can't have two versions of the application in my PPA at any one time...

William Grant (wgrant) wrote :

I had a poke around to work out what was going on, as the situation will change quickly as things are uploaded or removed.

There are four sources currently in the archive: one with a 160MiB orig.tar.gz, and three with the same 412MiB orig.tar.gz. That adds up to around the 1.4GiB that we are seeing as the PPA size, even though the latter three orig.tar.gz files are the same file on the archive disk.
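The arithmetic behind that figure can be checked with a few lines of Python. This is only a worked sum using the sizes quoted above; the variable names are made up for illustration.

```python
MIB_PER_GIB = 1024

# Sizes in MiB, as quoted in the comment above.
small_orig = 160
large_orig = 412
large_orig_copies = 3  # the same file, recorded once per upload

# What the PPA size appears to count: every copy of the large orig.
counted = small_orig + large_orig * large_orig_copies
# What is actually unique on the archive disk.
on_disk = small_orig + large_orig

print(f"counted: {counted} MiB = {counted / MIB_PER_GIB:.2f} GiB")
print(f"unique:  {on_disk} MiB = {on_disk / MIB_PER_GIB:.2f} GiB")
```

The counted total comes to 1396 MiB (about 1.36 GiB), against 572 MiB of unique files.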

Using the Librarian's convenient SHA1 search functionality, I discovered that there were indeed three LFAs for the large orig.tar.gz. It is obvious from the orig.tar.gz and .changes LFA IDs that they were created during the upload of each source.

Digging into lp.archiveuploader.dscfile, I can see that it will download any missing files, write them next to the rest of the source files, go along with the rest of the upload processing, and then blindly upload them all at the end. This is going to create duplicate LFAs and LFCs for any files that were downloaded, although librarian-gc will collapse the LFCs when it runs. Looking at Archive.sources_size, it appears that it sums the sizes of all distinct LFCs associated with published files. So the duplicate LFCs will be counted, and the world will end.
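A toy model of that size calculation, to make the overcount concrete. This is a sketch under assumptions, not Launchpad code: `Content` and `Alias` are hypothetical stand-ins for LFC and LFA records, and `sources_size` here only imitates the "sum over distinct LFCs" behaviour described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Content:
    """Hypothetical stand-in for an LFC (one stored blob)."""
    id: int
    sha1: str
    size: int  # MiB, to match the figures in this report


@dataclass(frozen=True)
class Alias:
    """Hypothetical stand-in for an LFA (a named reference to a blob)."""
    id: int
    content: Content


def sources_size(published):
    """Sum the sizes of distinct content records, keyed by record id."""
    distinct = {alias.content.id: alias.content.size for alias in published}
    return sum(distinct.values())


# Three uploads of the same orig: same SHA1, but a fresh content
# record was created each time.
duplicated = [Alias(i, Content(i, "same-sha1", 412)) for i in (1, 2, 3)]

# The same three aliases once they all point at a single content record.
shared = Content(1, "same-sha1", 412)
collapsed = [Alias(i, shared) for i in (1, 2, 3)]

print(sources_size(duplicated), sources_size(collapsed))
```

Because the sum is keyed by record id rather than by file content, identical files with separate records are each counted in full.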

I suspect the easiest fix (also an efficiency improvement!) is to just alter lp.archiveuploader.dscfile.DSCFile.storeInDatabase to check if the file already exists in the librarian, and use that instead. I don't think there's an easy way to alter the size query to reliably determine the unique files on disk.
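The "check first, then reuse" idea can be sketched with a small in-memory store. Everything here is hypothetical: `ToyLibrarian`, `add` and `get_or_add` are illustrative names, not the librarian's real API, and the actual change to `storeInDatabase` may look quite different.

```python
import hashlib
from itertools import count


class ToyLibrarian:
    """In-memory stand-in for a file store that indexes content by SHA1."""

    def __init__(self):
        self._ids = count(1)
        self.records = {}  # record id -> (sha1, size in bytes)
        self.by_sha1 = {}  # sha1 -> id of the first record with that content

    def add(self, data):
        """Blind upload: always creates a new record."""
        sha1 = hashlib.sha1(data).hexdigest()
        record_id = next(self._ids)
        self.records[record_id] = (sha1, len(data))
        self.by_sha1.setdefault(sha1, record_id)
        return record_id

    def get_or_add(self, data):
        """Reuse an existing record with the same SHA1, else upload."""
        sha1 = hashlib.sha1(data).hexdigest()
        existing = self.by_sha1.get(sha1)
        return existing if existing is not None else self.add(data)


orig = b"x" * 1000  # placeholder bytes standing in for an orig.tar.gz

blind = ToyLibrarian()
blind_ids = [blind.add(orig) for _ in range(3)]

reuse = ToyLibrarian()
reuse_ids = [reuse.get_or_add(orig) for _ in range(3)]

print(len(blind.records), len(reuse.records))
```

With blind uploads the store ends up with three records for one file; with the lookup it keeps one, so a size query over distinct records would count the file once.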

Julian Edwards (julian-edwards) wrote :

Thanks for the investigation William G, and thanks for the bug report William K. I am surprised nobody noticed this until now.

Changed in soyuz:
importance: Undecided → High
milestone: none → 3.1.10
status: Incomplete → Triaged
Changed in soyuz:
assignee: nobody → Michael Nelson (michael.nelson)
Changed in soyuz:
status: Triaged → In Progress
Changed in soyuz:
status: In Progress → Fix Committed
Changed in soyuz:
status: Fix Committed → Fix Released