Tweaks for Author Sort don't process extended ASCII properly

Bug #1701138 reported by dwig gang on 2017-06-29
This bug affects 1 person
Affects: calibre
Importance: Undecided
Assigned to: Unassigned

Bug Description

This bug has been present for a year or two. Steps to reproduce:
1. Open Preferences > Tweaks > Author sort name algorithm.
2. Edit "Author_name_suffixes" to include "père" (the usual suffix for Alexandre Dumas "senior").
3. Apply and exit Preferences.
4. Restart calibre.
5. Select any book and open "Edit Metadata".
6. Note the error:

-------
calibre, version 3.1.1
ERROR: Unhandled exception: <b>UnicodeDecodeError</b>:'utf8' codec can't decode bytes in position 1-2: invalid continuation byte

calibre 3.1.1 embedded-python: True is64bit: False
Windows-10-10.0.15063-SP0 Windows ('32bit', 'WindowsPE')
32bit process running on 64bit windows
('Windows', '10', '10.0.15063')
Python 2.7.12+
Windows: ('10', '10.0.15063', 'SP0', u'Multiprocessor Free')
Interface language: None
Successfully initialized third party plugins: DeDRM (6, 1, 0) && Resize Cover (1, 0, 2) && Generate Cover (1, 5, 21) && DOC Input (1, 0, 1) && KindleUnpack - The Plugin (0, 81, 4) && Modify ePub (1, 3, 13) && Open With (1, 5, 10) && Diaps Editing Toolbag (0, 3, 4) && EpubSplit (2, 2, 0)
Traceback (most recent call last):
  File "site-packages\calibre\gui2\metadata\basic_widgets.py", line 494, in update_state_and_val
  File "site-packages\calibre\ebooks\metadata\__init__.py", line 101, in authors_to_sort_string
  File "site-packages\calibre\ebooks\metadata\__init__.py", line 72, in author_to_author_sort
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1-2: invalid continuation byte
----------

Reopening the tweak shows that calibre has saved the new entry as a string of characters containing two backslashes. Apparently it either mis-encodes the accented character when saving or misreads it when calibre is relaunched.
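The "invalid continuation byte" message is what UTF-8 decoding produces when it meets text saved in a single-byte Windows code page. The snippet below is a minimal illustration, not calibre code; it assumes the tweak text ended up encoded as cp1252 (a common Western European Windows default), where "è" is the single byte 0xE8. It uses Python 3, so the exact wording of the position in the message can differ from the Python 2.7 log above.

```python
# Illustration only (not calibre code): "père" stored with a legacy
# Windows code page (assumed cp1252), then read back as UTF-8.
raw = "père".encode("cp1252")  # b'p\xe8re': "è" becomes the single byte 0xE8

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # In UTF-8, 0xE8 starts a 3-byte sequence, but the following byte
    # 'r' (0x72) is not a continuation byte (0x80-0xBF), so decoding fails.
    reason = exc.reason
    start = exc.start
    print(reason)  # invalid continuation byte
```

A plain "e" avoids the problem because ASCII bytes are identical in cp1252 and UTF-8.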

I'm currently running 64-bit Windows 10 with current updates and 32-bit calibre 3.1.1, and decided to test this old issue, which turned out to be still present. About a year ago I had this same error for months (on an older Windows 10 build and a long series of calibre 2.x releases) and finally traced it to using the proper French spelling in the tweak. Using a plain "e" without the accent works around the issue. Manually editing the author name and author sort to include the accented e also works, though after the author name is changed the Tag Browser misfiles the entry (under the first name) until I find it and fix the author sort, which returns it to its proper location in the Tag Browser.

Actually, since your system appears to be using a non-UTF-8 encoding, to
be perfectly safe, write it as

u'p\xe8re'
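The escaped form works because the literal itself contains only ASCII characters, so its bytes do not depend on which encoding the tweaks file is saved in, yet it still evaluates to the accented string. A small sketch of that reasoning, using `ast.literal_eval` as a stand-in for however the value is actually evaluated (an assumption; this is not calibre's loading code):

```python
import ast

# The suffix written with an escape, exactly as suggested above.
source = "u'p\\xe8re'"

# The literal is pure ASCII, so its bytes are identical whether the file
# is saved as UTF-8 or as a legacy code page such as cp1252.
assert source.isascii()
assert source.encode("utf-8") == source.encode("cp1252")

# Evaluating the literal still yields the accented suffix.
value = ast.literal_eval(source)
print(value == "p\u00e8re")  # True
```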

Fixed in branch master. The fix will be in the next release. calibre is usually released every alternate Friday.

 status fixreleased

Changed in calibre:
status: New → Fix Released
dwig gang (dwig) wrote:

I just tested in v3.3 and it appears fixed, thanks.

I tested 3.3 and the bug seems to be fixed.

thanks,

-----
dwig

On 2017-07-06 10:27 AM, Kovid Goyal wrote:
> Fixed in branch master. The fix will be in the next release. calibre is
> usually released every alternate Friday.
>
> status fixreleased
>
> ** Changed in: calibre
> Status: New => Fix Released
>
