Output is not reproducible, inhibiting verification of changes
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
germinate (Ubuntu) | Fix Released | Undecided | Unassigned |
Bug Description
Hi,
there are multiple classes of issues that make germinate output hard to compare for "what changed".
1. ordering issues
Even the output of the running program changes between runs on the same input. I didn't see subprocess spawning or any other asynchronicity, so I assume that some lists are generated or gathered in arbitrary order and then processed as-is.
Example:
1120 Resolving supported dependencies ...
1121 * Chose dict to satisfy dictd
1122 * Chose probert-storage to satisfy curtin
1123 Rescued python-
1124 Rescued libmemcached-dbg from extra to supported
Rerun on same content:
1120 Resolving supported dependencies ...
1121 * Chose dict to satisfy dictd
1122 * Chose probert-storage to satisfy curtin
1123 Rescued default-
1124 Rescued libmpc-dev from extra to supported
1125 Rescued python-
1126 Rescued libgnome-menu-3-dev from extra to supported
Why is that? Could we just sort the gathered lists before we iterate over them?
That could in turn make many other things in the output reproducible.
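A minimal sketch of the suggestion, not germinate's actual code. It assumes the rescued packages end up in a Python set; iteration order over a set of strings can differ between runs because string hashing is randomized per process unless PYTHONHASHSEED is fixed. The function name `rescue_messages` and the message format are made up for illustration.

```python
def rescue_messages(rescued, source="extra", target="supported"):
    """Build log lines in a stable order, however `rescued` was gathered.

    `rescued` may be a set (arbitrary iteration order) or any iterable
    of package names; sorting first makes the output repeatable.
    """
    return [
        f"Rescued {pkg} from {source} to {target}"
        for pkg in sorted(rescued)
    ]


# Hypothetical input: a set, whose iteration order is not guaranteed.
pkgs = {"libmpc-dev", "libmemcached-dbg", "libgnome-menu-3-dev"}
for line in rescue_messages(pkgs):
    print(line)
```

Sorting costs O(n log n) per list, which should be negligible next to dependency resolution, and it keeps two runs on identical input diffable line by line.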
2. If a package is depended on by multiple packages or seeds, a random one is reported.
Example (run with the same seeds multiple times):
all:
-binutils-multiarch | binutils | binutils-
+binutils-multiarch | binutils | binutils-
It is correct that both packages and seeds depend on it, but I think it would be much better if we either:
- report an ordered full list of dependency sources (could become very long but complete)
or
- report the first element out of a sorted list (as short as today, but reproducible)
We could even get the "best of both worlds" if we sort the list of dependency anchors, concatenate the first X of them (an arbitrary limit we set), and append ", ..." if any are left to reflect that.
This way we would be reproducible, in many cases even complete, and in corner cases don't explode the list size.
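The "best of both worlds" variant could look roughly like this. It is only a sketch under assumptions: `summarize_anchors` and its `limit` parameter are hypothetical names, and the ", " separator is chosen here for illustration rather than taken from germinate's output format.

```python
def summarize_anchors(anchors, limit=3):
    """Return a reproducible, bounded summary of dependency anchors.

    Duplicates are dropped, the rest is sorted, at most `limit` names
    are shown, and ", ..." marks that some were left out.
    """
    ordered = sorted(set(anchors))
    shown = ", ".join(ordered[:limit])
    if len(ordered) > limit:
        shown += ", ..."
    return shown


# Hypothetical anchors for one package, gathered in arbitrary order.
print(summarize_anchors(["gcc", "binutils", "dpkg-dev", "build-essential"], limit=2))
```

With `limit=1` this degrades to "first element of the sorted list", and with a large limit it becomes the full ordered list, so one parameter covers all three options discussed above.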
There may be more sources of non-reproducibility that experts in germinate (which I am not) can think of, and I'd appreciate an overhaul that gets as close as reasonable to reproducible output.
Then, if someone modifies seeds (or checks any follow-on output based on them), everything unaffected would stay the same and the differences would show the actual impact of the change.
Maybe this was already discussed, but I didn't find anything. If it was and was considered infeasible, please point me to a log of the discussion if possible.
Related branches
- Colin Watson: Approve
- Diff: 23 lines (+3/-2), 1 file modified: germinate/germinator.py (+3/-2)
- Colin Watson: Approve
- Iain Lane: Pending requested
- Diff: 26 lines (+4/-3), 1 file modified: germinate/germinator.py (+4/-3)
I see some efforts to keep order already:
1843 pkglist = sorted(pkgset)
1900 srclist = sorted(srcset)
1924 snaplist = sorted(snapset)
524 self._names = topo_sort(self._inherit)
...
So I'm not introducing an idea that would be the polar opposite of how it is supposed to work :-)
I already have a commit that works for the log output; I'm looking into the generated files now ...