Create github python3 branch for contributors

Bug #1756458 reported by David Gu on 2018-03-16
This bug affects 1 person
Affects: calibre
Importance: Undecided
Assigned to: Unassigned

Bug Description

I am aware that a python3 version is not currently planned due to the amount of tedious work involved in porting the code. However, this is a good opportunity to leverage the open source community to contribute.

Please create a development branch containing the codebase after running the 2to3 automatic upgrader (2to3 --output-dir=python3-version/mycode -W -n python2-version/mycode) on the existing python 2 codebase.

The reason I'm requesting this of you, instead of performing the upgrade on a local branch myself, is that the automatic tool produces a large number of file changes, resulting in large commits which are difficult to review for malicious code if they are sent by someone as a pull request. Therefore, after the bulk of changes are made automatically, contributors will be able to send small commits to fix the resulting bugs.

Kovid Goyal (kovid) wrote :

There is just no way that calibre is ever going to be ported to python 3. There are over half a million lines of code in calibre including lots of python C extensions. And that's not even mentioning its third party dependencies. In my experience "leveraging the open source community" is never going to work, because no one is going to do the *huge* amount of tedious work involved.

Not to mention that using 2to3 is the wrong way to go about it, since it means that the 2to3ed branch will then get out of sync with master. The only sane way to do it is to make the code base work with both pythons.

Changed in calibre:
status: New → Won't Fix

Yes, as Kovid said, it does absolutely no good whatsoever to maintain a
separate branch with a one-time automated conversion that will become
useless anyways, as it does not get updated with the extensive changes in
the python2 branch which actually gets developed.

If someone was willing to do the work to move the single unified
codebase closer to being python3 compatible, that would be a different
story entirely.

David Gu (davidgu) wrote :

Regarding your point about changes in the python2 branch causing desync with master, updates to the main branch could be periodically merged into the python3 branch with python3 compatibility changes. This itself would not be a large amount of work, and could likely be managed by a single contributor (I'm willing).

The reason I proposed using 2to3 is that much of the tedium is a result of repetitive changes that can be automated in this way.

Assuming that test coverage is good, individual contributors can patch and debug subsets of the resulting converted code, verifying that functionality is restored with successful test runs. Therefore, no single contributor will "do the *huge* amount of tedious work involved." I have personally found that people are quite willing to contribute small, self-contained pull requests.

Eli Schwartz (eschwartz) wrote :

... the alternative would be that anyone who just wanted to contribute
non-2to3 related changes is more than welcome, and you're even welcome
to contribute 2to3-related changes as long as you write polyglot code
that continues to work painlessly.

Pretty sure this is how codebases are *usually* migrated.
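
As a rough illustration of what "polyglot code" means here, a minimal sketch of helpers that behave the same under python2 and python3. The names `text_type`, `binary_type`, `as_text` and `as_bytes` are hypothetical and not taken from calibre; the UTF-8 default is an assumption.

```python
import sys

PY3 = sys.version_info[0] >= 3

if PY3:
    text_type = str
    binary_type = bytes
else:
    text_type = unicode  # noqa: F821 (only defined on python2)
    binary_type = str


def as_text(value, encoding='utf-8'):
    """Return value as a text string on either interpreter."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return text_type(value)


def as_bytes(value, encoding='utf-8'):
    """Return value as a byte string on either interpreter."""
    if isinstance(value, text_type):
        return value.encode(encoding)
    if isinstance(value, binary_type):
        return value
    raise TypeError('expected text or bytes, got %r' % type(value))
```

Code written against helpers like these can land in the main branch in small commits, since it keeps working on python2 while also running on python3.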

As it is, if you're volunteering to maintain a python3 branch completely
independently of the main codebase, with no help from Kovid... you could
just do that anyways in a separate repo. It would be no *more* work to
do so.

Kovid Goyal (kovid) wrote :

OK, let's test that. Start by porting one of calibre's dependencies to python3: https://github.com/python-mechanize/mechanize
That one actually has good test coverage, unlike large parts of calibre, which were written decades ago before unit testing was widespread, or that aren't suitable for unit testing. Let's see how it goes. It has no native code and I have already done lots of modernizing/pruning on its code base, so it should be ideal for 2to3. I'll be happy to create a 2to3 based branch for it. Just post on https://github.com/python-mechanize/mechanize/issues/9

And you are greatly underestimating how much work it is to port changes from python2 to python3. python3 is a fundamentally worse language than python2 for calibre's use case, which involves lots of dealing with bytes for binary file formats that calibre has to handle (the python3 bytes type is a travesty) and lots of interfacing with system APIs that use UTF-16 encoded data (on windows).
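
To make the bytes complaint concrete, here is a small sketch of the kind of difference that 2to3 does not rewrite. The 8-byte header layout and the function names are made up for illustration and do not correspond to any real calibre format.

```python
import struct


def parse_header(data):
    """Parse a hypothetical header: 4-byte magic, 2-byte version, 2-byte count."""
    magic, version, count = struct.unpack('>4sHH', data[:8])
    return magic, version, count


def first_byte(data):
    # Indexing differs between interpreters: data[0] is a 1-character str
    # on python2 but an int on python3. Slicing returns a byte string on
    # both, so it is the portable spelling.
    return data[:1]


header = b'BOOK\x00\x01\x00\x02'
```

Code that indexes or iterates over byte strings has to be found and reviewed by hand, which is where much of the porting effort goes.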

The difficulty of porting is not in the trivially automatable bits of the code base that 2to3 can handle. It comes from these things:

1) python3's awful bytes type
2) The necessity to keep converting strings to UTF-16 in the native code extensions that interface with system APIs on windows, which is a huge performance sink and code bloat
3) The extremely high probability of introducing regressions for no real benefit.
4) The various performance regressions in python3 which there is no easy way to mitigate.
5) The large bits of legacy code in calibre that don't have unit tests / aren't written in modern idioms.
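
Point 2 concerns native code, but the shape of the conversion can be sketched at the Python level. This is an illustration only: `to_utf16_buffer` is a hypothetical helper, not part of calibre or of any Windows API binding.

```python
def to_utf16_buffer(text):
    """Encode text as NUL-terminated UTF-16-LE bytes.

    Each call builds a new byte string, so the cost is paid on every
    crossing into an API that expects UTF-16 data.
    """
    return text.encode('utf-16-le') + b'\x00\x00'


ascii_buf = to_utf16_buffer(u'ab')           # two code units plus terminator
astral_buf = to_utf16_buffer(u'\U0001F600')  # surrogate pair plus terminator
```

Characters outside the Basic Multilingual Plane take two UTF-16 code units, so buffer sizes cannot be derived from the string length alone.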

As I have stated before, I am perfectly willing to keep maintaining python2 indefinitely; I already maintain a port of it that builds with VS2015. That is far less work and has far less potential for bugs. That might change someday if one of calibre's python dependencies stops supporting python2 and it is complex enough that I don't feel comfortable maintaining it, but I doubt there are any such dependencies. Probably the two most complex are lxml and PyQt, both of which use code generators, which means they are trivial to keep working with py2.
