multiple copies of orig.tar.gz's in the librarian
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Launchpad itself | Fix Released | High | Celso Providelo |
Ubuntu | Invalid | Medium | Unassigned |
Bug Description
The sync-source tool deliberately explodes when it finds more than one orig.tar.gz in the archive for a given source_version. This appears to be the case for, e.g. advi_1.
*****
<SourcePackageF
filename: advi_1.
alias: 1268214
distrorelease: hoary, component: universe, source: advi, status: 2
*****
<SourcePackageF
filename: advi_1.
alias: 1331574
distrorelease: breezy, component: universe, source: advi, status: 2
*****
<SourcePackageF
filename: advi_1.
alias: 1331574
distrorelease: dapper, component: universe, source: advi, status: 3
*****
<SourcePackageF
filename: advi_1.
alias: 1331574
distrorelease: dapper, component: universe, source: advi, status: 2
E: advi_1.
Changed in launchpad-upload-and-queue:
assignee: nobody → dsilvers
status: Unconfirmed → Confirmed
Not directly helpful, but this background information about how the Librarian handles duplicates may be of interest:
The librarian tries to ensure that identical files are only stored once (so the LibraryFileContent table will only have one row for that file), but by design it allows duplicate aliases to that content. Additionally, there can be duplicate content if the same new file is uploaded simultaneously in two separate but concurrent transactions, but the Librarian GC process finds and collapses duplicate content rows daily.
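As a rough illustration of the behaviour described above, here is a toy in-memory model of "one content row, many aliases". The class name `ToyLibrarian`, its attributes, and the choice of SHA-1 as the content key are assumptions made for this sketch; it is not the Librarian's actual API or schema.

```python
import hashlib
from itertools import count


class ToyLibrarian:
    """Toy model only: identical bytes share one content row,
    but every upload gets its own alias."""

    def __init__(self):
        self._ids = count(1)
        self.content_by_sha1 = {}   # sha1 hex digest -> content id
        self.alias_to_content = {}  # alias id -> content id
        self.alias_filename = {}    # alias id -> filename

    def add_file(self, filename, data):
        """Store data (deduplicated by hash) and return a new alias id."""
        sha1 = hashlib.sha1(data).hexdigest()
        content_id = self.content_by_sha1.get(sha1)
        if content_id is None:
            # First time these bytes are seen: create a content row.
            content_id = next(self._ids)
            self.content_by_sha1[sha1] = content_id
        # A new alias is always created, even for known content.
        alias_id = next(self._ids)
        self.alias_to_content[alias_id] = content_id
        self.alias_filename[alias_id] = filename
        return alias_id
```

Uploading the same bytes twice in this model yields two distinct aliases that point at a single content row, which is why counting aliases alone cannot tell a genuine conflict from a harmless duplicate.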
Basically, this means this constraint needs to be enforced somewhere other than the librarian. A workaround might be to try relying on the librarian's existing duplicate detection, i.e. check if the duplicate aliases are linked to the same content or not, but I don't think this will be bulletproof.
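A minimal sketch of that workaround, assuming the caller already has a mapping from alias id to content id: group the candidate orig.tar.gz aliases by content and only treat it as a conflict when more than one distinct content is involved. The function names and the content ids below are hypothetical; only the alias numbers are taken from the output in the bug description.

```python
from collections import defaultdict


def group_aliases_by_content(alias_ids, alias_to_content):
    """Return {content id: [alias ids pointing at it]}."""
    groups = defaultdict(list)
    for alias_id in alias_ids:
        groups[alias_to_content[alias_id]].append(alias_id)
    return dict(groups)


def is_real_conflict(alias_ids, alias_to_content):
    """True only if the aliases refer to more than one distinct content."""
    return len(group_aliases_by_content(alias_ids, alias_to_content)) > 1


# Hypothetical content ids (100, 200); alias ids are from the report.
same_content = {1268214: 100, 1331574: 100}
different_content = {1268214: 100, 1331574: 200}

print(is_real_conflict([1268214, 1331574], same_content))
print(is_real_conflict([1268214, 1331574], different_content))
```

As noted above, this is not bulletproof: two concurrent uploads of the same file can briefly produce distinct content rows until the GC collapses them, so such a check could report a conflict that is not real.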
Judging from a bit of grepping, the place where these files are added is nascentupload.py, in insert_source_into_db, so this would be the obvious place to start fixing.