Comment 18 for bug 2075337

Dave Jones (waveform) wrote : Re: py3clean fails when using alternate character set

Having spent some time digging into this (at the request of the SRU team), I'll summarise my findings:

Firstly, the analysis is correct: py3clean is ultimately at fault. The affected packages are those that both use py3clean and have diverted files, because diversions are the only circumstance in which dpkg-query -L produces localised output.

Secondly, the proposed upstream fix is also fine, though personally I'd like the stdout decode to use errors='replace' as well (there's no good reason to fail here in the event of, for instance, dodgy UTF-8 in a translation). I'll attach debdiffs for noble and oracular to illustrate my intent.
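
To illustrate the errors='replace' point, here is a minimal Python sketch. It is not the actual py3clean code, and the function names are hypothetical. It reads a package's file list via dpkg-query -L and decodes it leniently, so a stray invalid UTF-8 byte (e.g. from a translation in a non-UTF-8 character set) becomes U+FFFD instead of raising UnicodeDecodeError.

```python
import subprocess


def decode_file_list(raw: bytes) -> list[str]:
    """Decode dpkg-query -L output, substituting U+FFFD for invalid UTF-8.

    With the default errors='strict', a single bad byte would raise
    UnicodeDecodeError and abort the caller (here: a prerm script).
    """
    return raw.decode("utf-8", errors="replace").splitlines()


def list_package_files(package: str) -> list[str]:
    """Hypothetical helper: return the file list dpkg records for *package*."""
    result = subprocess.run(
        ["dpkg-query", "-L", package],
        stdout=subprocess.PIPE,
        check=True,
    )
    return decode_file_list(result.stdout)


if __name__ == "__main__":
    # 0xE9 is "é" in ISO-8859-1 but is not valid UTF-8 on its own.
    sample = b"/usr/lib/python3/dist-packages/foo.py\n/usr/share/b\xe9r\n"
    print(decode_file_list(sample))
```

The only behavioural difference from a strict decode is that malformed bytes are replaced rather than fatal; well-formed paths decode identically.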

Thirdly, the proposed workaround for affected packages (a Breaks on python3-minimal with a version strictly less than the fixed version) also appears to be sufficient: in all the cases I've tested, this causes python3-minimal to be at least unpacked before the prerm script of the affected package is executed.

So, if the fix for python3-minimal can be uploaded, it simply remains to determine which packages are affected. This is where things get tricky.

Benjamin's efforts are much appreciated here, and will likely prove decisive, for the following reason: determining which packages use py3clean is relatively simple (pretty much anything that installs a Python module), but determining which packages have diversions turns out to be extremely difficult.
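
As a rough illustration of the "relatively simple" half, the sketch below lists installed packages whose prerm script mentions py3clean. It assumes that dh_python3-generated prerm snippets invoke py3clean by name and that dpkg stores maintainer scripts as /var/lib/dpkg/info/<package>.prerm (or <package>:<arch>.prerm for multi-arch packages); the function names are hypothetical. This only covers installed packages, not the whole archive.

```python
from pathlib import Path

# Assumption: dpkg keeps maintainer scripts here, one file per package.
DPKG_INFO_DIR = Path("/var/lib/dpkg/info")


def package_name(script_name: str) -> str:
    """Map 'foo.prerm' or 'foo:amd64.prerm' to the package name 'foo'."""
    return script_name.removesuffix(".prerm").split(":", 1)[0]


def packages_using_py3clean(info_dir: Path = DPKG_INFO_DIR) -> set[str]:
    """Return installed packages whose prerm script mentions py3clean.

    Assumption: dh_python3 inserts a prerm snippet that calls py3clean.
    """
    found = set()
    for script in info_dir.glob("*.prerm"):
        text = script.read_text(encoding="utf-8", errors="replace")
        if "py3clean" in text:
            found.add(package_name(script.name))
    return found


if __name__ == "__main__":
    for name in sorted(packages_using_py3clean()):
        print(name)
```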

It's not enough to detect whether a package *itself* uses dpkg-divert. Take the cloud-init case where this was first detected: the postinst calls dpkg-divert, but only to *remove* an old diversion. The diversion that actually affects this package comes from usr-merge (because cloud-init still has files under the unmerged paths for various reasons). In essence, any package (foo) can cause a diversion in another package (bar) without the affected package (bar) showing any sign of this in either its source or its binary artefacts. Further, whether the diversion affects bar at all depends on whether foo is installed, and foo may be optional.
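
On a single installed system, the foo/bar situation can at least be observed after the fact by cross-referencing `dpkg-divert --list` with the package's file list. The sketch below is a hypothetical helper, not part of py3clean. It assumes that under LC_ALL=C the listing uses the untranslated forms "diversion of X to Y by PKG" and "local diversion of X to Y". The holder field shows which package (if any) owns each diversion, which is exactly the information absent from bar's own source.

```python
import os
import re
import subprocess
import sys
from typing import NamedTuple, Optional

# Assumed LC_ALL=C formats of `dpkg-divert --list` lines:
#   "diversion of /bin/sh to /bin/sh.distrib by dash"
#   "local diversion of /etc/foo to /etc/foo.orig"
_DIVERSION_RE = re.compile(r"^(?:local )?diversion of (.+) to (.+?)(?: by (\S+))?$")


class Diversion(NamedTuple):
    path: str
    diverted_to: str
    holder: Optional[str]  # None for a local (administrator) diversion


def parse_diversions(text: str) -> list[Diversion]:
    """Parse `dpkg-divert --list` output, skipping lines that do not match."""
    diversions = []
    for line in text.splitlines():
        match = _DIVERSION_RE.match(line.strip())
        if match:
            diversions.append(Diversion(*match.groups()))
    return diversions


def diversions_affecting(files: list[str], diversions: list[Diversion]) -> list[Diversion]:
    """Return the diversions that apply to any path in *files*."""
    owned = set(files)
    return [d for d in diversions if d.path in owned]


def _run(cmd: list[str]) -> str:
    env = {**os.environ, "LC_ALL": "C"}  # request untranslated messages
    return subprocess.run(cmd, env=env, stdout=subprocess.PIPE, check=True,
                          encoding="utf-8", errors="replace").stdout


if __name__ == "__main__":
    package = sys.argv[1]
    files = _run(["dpkg-query", "-L", package]).splitlines()
    for d in diversions_affecting(files, parse_diversions(_run(["dpkg-divert", "--list"]))):
        print(f"{d.path} -> {d.diverted_to} (held by {d.holder or 'local admin'})")
```

Note that this only sees diversions present on the machine it runs on; if the diverting package (foo) isn't installed there, bar will look clean, which is the same limitation the empirical approach has.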

Hence, there are two approaches. The thorough, but likely impractical, approach would be to apply the "Breaks" fix to all packages using py3clean, i.e. all packages that install Python modules. Then there's Benjamin's empirical approach: attempt to install everything from that set and see what fails. As noted above, this cannot guarantee correctness, since we cannot be certain that every package that might divert files in a target package is installed. In practice, though, it's probably (hopefully!) "good enough" given the rarity of diversions, and it avoids updating (presumably) several thousand packages.