importer: having only one import record for every publish means the commit date is no longer identical to the publishing date
Bug #1730734 reported by Nish Aravamudan
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| git-ubuntu | Triaged | Critical | Robie Basak | |
Bug Description
launchpad_
Related branches
- ~nacc/git-ubuntu:lp1730734-cache-importer-progress: On hold for merging into git-ubuntu:master
  - Server Team CI bot: Approve (continuous-integration)
  - git-ubuntu developers: Pending requested
  - Diff: 475 lines (+136/-8), 6 files modified:
    - gitubuntu/importer.py (+50/-3)
    - gitubuntu/source_information.py (+16/-2)
    - man/man1/git-ubuntu-import.1 (+18/-2)
    - scripts/import-source-packages.py (+18/-0)
    - scripts/scriptutils.py (+16/-1)
    - scripts/source-package-walker.py (+18/-0)
Changed in usd-importer:
- importance: Undecided → Critical
- milestone: none → lp-beta
- status: New → Confirmed
- tags: added: import-edge-case
- tags: added: hash-abi-break

Changed in usd-importer:
- status: Confirmed → In Progress
- assignee: nobody → Nish Aravamudan (nacc)
- tags: added: import
- tags: added: spec
To be clear, here is my analysis:
In 0f3c943054abc82734febac91896b80f401fcaa7, 47d2ec2bef4bc81efa688610113863ebcfddaa22 and 5aa33fa080789986d90cbe27b086fa63b298a164, we modified the import algorithm to simply reset the series & pocket branch pointers to the commit corresponding to the source package publication. We did the same for the corresponding -devel branches and for ubuntu/devel. This reflects the publication event. We also dropped the publishing parent.
In effect, we now have a commit graph of all the publications, tied together by their changelog entries, and the various branch references are just pointers into that graph. A given series & pocket branch may or may not fast-forward from one publication to the next.
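To make this model concrete, here is a minimal Python sketch of a commit graph in which branches are plain pointers. The `Repo` class, its method names, and the use of version strings as commit ids are hypothetical illustrations, not git-ubuntu's code. Publishing simply resets a branch; the return value only reports whether that reset happened to be a fast-forward (the old tip is an ancestor of the new one).

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical model: one commit graph, branches are just pointers into it."""

    # commit id -> tuple of parent commit ids (parents derived from changelogs)
    parents: dict = field(default_factory=dict)
    # branch name, e.g. "ubuntu/bionic-updates" -> commit id
    branches: dict = field(default_factory=dict)

    def add_commit(self, commit, parents=()):
        # A version is only ever committed once; re-adding it is a no-op.
        self.parents.setdefault(commit, tuple(parents))

    def is_ancestor(self, old, new):
        """True if `old` is reachable from `new` by following parents (or is `new`)."""
        stack, seen = [new], set()
        while stack:
            commit = stack.pop()
            if commit == old:
                return True
            if commit in seen:
                continue
            seen.add(commit)
            stack.extend(self.parents.get(commit, ()))
        return False

    def publish(self, branch, commit):
        """Reset `branch` to `commit`; return True if the move was a fast-forward."""
        old = self.branches.get(branch)
        self.branches[branch] = commit
        return old is None or self.is_ancestor(old, commit)
```

In this model, publishing a Debian sync 1.0-2 over an ubuntu/bionic branch that pointed at 1.0-1ubuntu1 is a perfectly valid pointer reset, yet not a fast-forward, because 1.0-1ubuntu1 is not an ancestor of 1.0-2.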
This subtly breaks our catch-up model in the importer. We store the publication timestamp as the publication commit's commit date. For a given source package, we iterate backwards through the latest Launchpad publications until we find a branch whose tip matches both a given published version *and* its publication timestamp, and then iterate forward from there. This works well when each branch's commits store the publication data specific to that branch (series & pocket).
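The catch-up search can be sketched as follows. This is an illustration under stated assumptions: publications arrive newest-first as (version, branch, date) records, and the importer can read each branch tip's version and commit date. The names `Publication`, `Tip` and `pending_publications` are hypothetical.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Publication(NamedTuple):
    version: str
    branch: str      # hypothetical branch naming, e.g. "ubuntu/bionic-proposed"
    date: datetime   # Launchpad publication timestamp


class Tip(NamedTuple):
    version: str           # version imported in the branch's tip commit
    commit_date: datetime  # commit date, which the importer sets to the publication date


def pending_publications(pubs_newest_first, tips):
    """Walk publications newest-first until one is already reflected by a branch tip.

    `tips` maps branch name -> Tip. Returns the publications that still need
    to be imported, oldest first, i.e. the forward iteration from the match.
    """
    pending = []
    for pub in pubs_newest_first:
        tip = tips.get(pub.branch)
        if tip is not None and tip.version == pub.version and tip.commit_date == pub.date:
            break  # this publication has already been imported; stop searching
        pending.append(pub)
    return list(reversed(pending))
```

The timestamp comparison is what distinguishes two publications of the same version, which is exactly the information this bug is about.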
With the aforementioned change, though, there is now only one commit for a given source package version, created when we first see that version (in either Debian or Ubuntu; Debian is imported first). That commit's metadata, specifically its commit date, reflects only that first publication record. Thus, even if a branch has been correctly updated to point to that commit object, we have no way of recording that this event corresponds to a specific publication.
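A small illustration of the failure, assuming a hypothetical importer that creates one commit per version and keeps the first publication's date as its commit date. When the same version is later published to a second branch (here, a hypothetical migration from bionic-proposed to bionic), the branch does point at the right commit, but the version-and-timestamp check no longer matches that second publication.

```python
from datetime import datetime, timezone


def import_deduplicated(publications):
    """One commit per version; its commit date is the FIRST publication's date.

    `publications` is an oldest-first list of (version, branch, date) tuples.
    Returns (commit_dates, branches): version -> commit date, branch -> version.
    """
    commit_dates, branches = {}, {}
    for version, branch, date in publications:
        commit_dates.setdefault(version, date)  # later publications reuse the commit
        branches[branch] = version              # branch pointer reset to that commit
    return commit_dates, branches


def matches(publication, commit_dates, branches):
    """The catch-up test: branch tip has this version AND this publication date."""
    version, branch, date = publication
    return branches.get(branch) == version and commit_dates[version] == date


pubs = [
    ("1.0-1", "ubuntu/bionic-proposed", datetime(2017, 11, 1, tzinfo=timezone.utc)),
    ("1.0-1", "ubuntu/bionic", datetime(2017, 11, 2, tzinfo=timezone.utc)),
]
dates, branches = import_deduplicated(pubs)
# ubuntu/bionic points at the 1.0-1 commit, but the commit date is Nov 1,
# so the Nov 2 publication can never be recognised as already imported.
```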
As far as I can tell, there is no way to do this within the Git metadata. We could add a field to the commit message (à la LP: #1728685) that stores the unique link to the publication record. That would change all hashes, of course, so this needs to be resolved before LP beta.
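Why such a field changes every hash: a Git commit id is the SHA-1 of the commit object, and the object includes the full message. The sketch below recomputes that id from its fields (tree, parents, author, committer, message) using Git's loose-object format for SHA-1 repositories. The `Publication:` line and the identities are placeholders, not git-ubuntu's actual commit format.

```python
import hashlib


def git_commit_id(tree, parents, author, committer, message):
    """Return the SHA-1 object id Git would assign to a commit with these fields.

    `author` and `committer` are full ident strings such as
    "Name <email> 1509580800 +0000"; `message` should end with a newline.
    """
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}", "", message]
    body = "\n".join(lines).encode()
    # Git hashes "<type> <byte length>\0" followed by the object body.
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()
```

Because a child commit's `parent` line contains its parent's id, changing one commit's message also changes the ids of all of its descendants, which is why this is a hash ABI break rather than a local change.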
This might just be an efficiency problem. We would re-import source packages that have already been imported, which should be fine: we would iterate far enough back to re-import something that is an ancestor of a branch's current position, find that it does not need importing again (the treeish will match), and then move the branch pointer to an old spot and back to where it is now. But I'm worried it could lead to breaks in the commit graph, as this is confusing :)
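The efficiency argument relies on re-imports being idempotent. A minimal sketch, assuming a hypothetical importer that looks up an existing commit by tree before creating a new one (the `Importer` class and its fake commit ids are illustrative only): replaying publications from too far back creates no new commits, but the branch pointer temporarily moves back to an older commit before returning to its current tip.

```python
import hashlib


class Importer:
    """Hypothetical importer that reuses an existing commit when the tree matches."""

    def __init__(self):
        self.commit_by_tree = {}  # tree id -> commit id
        self.branches = {}        # branch name -> commit id
        self.history = []         # every (branch, commit) pointer move, in order
        self.created = 0          # number of commits actually created

    def import_publication(self, branch, tree):
        commit = self.commit_by_tree.get(tree)
        if commit is None:
            # Fake commit id for illustration; a real importer would write a Git commit.
            commit = hashlib.sha1(f"commit for {tree}".encode()).hexdigest()
            self.commit_by_tree[tree] = commit
            self.created += 1
        self.branches[branch] = commit
        self.history.append((branch, commit))
        return commit
```

Replaying two already-imported publications leaves `created` unchanged and ends at the same tip; only `history` records the detour through the older commit.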