Error using "Subset all Fonts"

Bug #1349856 reported by Yury Donskoy
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
calibre    Fix Released  Undecided   Unassigned

Bug Description

When I try to subset all fonts in an epub, I get the following error:
calibre, version 1.46.0
ERROR: Unhandled exception: ValueError: Extra data: line 1 column 147160 - line 1 column 147161 (char 147159 - 147160)

calibre 1.46 isfrozen: True is64bit: True
Linux-3.15.4-200.fc20.x86_64-x86_64-with-fedora-20-Heisenbug Linux ('64bit', 'ELF')
('Linux', '3.15.4-200.fc20.x86_64', '#1 SMP Mon Jul 7 14:24:41 UTC 2014')
Python 2.7.5
Linux: ('Fedora', '20', 'Heisenbug')
Traceback (most recent call last):
  File "site-packages/calibre/ebooks/oeb/polish/stats.py", line 136, in _pass_json_value_setter
  File "json/__init__.py", line 338, in loads
  File "json/decoder.py", line 368, in decode
ValueError: Extra data: line 1 column 147160 - line 1 column 147161 (char 147159 - 147160)
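The "Extra data" message comes from Python's `json` module: `json.loads` parsed one complete JSON value and then found more characters after it. Python 2.7 (the version in the log above) reports the leftover span as a range, so `char 147159 - 147160` means exactly one character followed the end of the JSON value. A minimal sketch of the same failure under Python 3, whose message prints a single offset instead of a range; the trailing NUL character is just an arbitrary example of junk after otherwise valid JSON:

```python
import json

valid = '{"text": "4000"}'  # a complete JSON document, 16 characters long
padded = valid + "\0"       # one stray character after the closing brace

try:
    json.loads(padded)
    error_message = None
except ValueError as exc:   # json.JSONDecodeError is a subclass of ValueError
    error_message = str(exc)

print(error_message)  # Extra data: line 1 column 17 (char 16)
```

The offset points at the first character after the decoded value, which is why removing whatever follows the JSON makes the error disappear.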

This was not a problem until I upgraded from 1.45.0 to 1.46.0.

Calibre version: 1.46.0

OS: Linux - Fedora 20 64bit

Steps: Click subset all fonts.

Kovid Goyal (kovid) wrote : Re: calibre bug 1349856

I cannot reproduce this. Clicking Subset all fonts continues to work as
it always has, both with files that have fonts and with files that do not.


Changed in calibre:
status: New → Invalid
Yury Donskoy (yury-donskoy) wrote :

Okay, I figured it out, and I think it IS a bug, although an obscure one. Anything you can do through the user interface, the rest of the application should be able to handle without crashing. Here is what my investigation turned up.

The culprit, ultimately, is the following bit of text: 4000𝑐

That isn't just a lowercase "c" after 4000: calibre identifies it as MATHEMATICAL ITALIC SMALL C. I inserted it into the text using the "Insert special character" button (the fourth from the right), selecting "Mathematical Symbols" and then "Mathematical Alphanumeric Symbols", where the character is listed as U+1D450. The first time I tried overtyping that character, calibre crashed, but the second time it did not. In any case, if the character is removed or replaced with a regular c, subsetting fonts works again.
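What makes U+1D450 unusual is that it lies outside the Basic Multilingual Plane (BMP, U+0000 to U+FFFF). Environments that store text as UTF-16, such as Qt's `QString` and JavaScript strings, have to represent such a character as two 16-bit code units (a surrogate pair) instead of one. A small Python 3 check, using only the standard library, that confirms the character's name and shows the pair:

```python
import unicodedata

ch = "\U0001D450"  # the character typed after "4000" in the report

name = unicodedata.name(ch)                  # official Unicode character name
code_point = ord(ch)                         # 0x1D450
outside_bmp = code_point > 0xFFFF            # True: not in the BMP
utf16_units = ch.encode("utf-16-be").hex()   # two 16-bit units: a surrogate pair

print(name, hex(code_point), outside_bmp, utf16_units)
# MATHEMATICAL ITALIC SMALL C 0x1d450 True d835dc50
```

A plain "c" (U+0063) would encode to a single UTF-16 unit, which is why swapping it in avoids the problem.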

In a moment, I will attach an epub for you to look at. In it, the offending second paragraph is commented out, and subsetting works. If you uncomment it, subsetting fails again.

Yury Donskoy (yury-donskoy) wrote :
Kovid Goyal (kovid) wrote :

This is a bug in the version of the JavaScript engine shipped with Qt 4: it appends extra null bytes to the end of JSON that contains non-BMP Unicode characters (characters outside the Basic Multilingual Plane, such as U+1D450).

Use the Qt 5 version of calibre from here: http://www.mobileread.com/forums/showthread.php?t=242223
and the bug will not occur.

In the next calibre release, I will add some code that strips out the null bytes before decoding.
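The workaround described above can be sketched in Python 3 as follows. The helper name `loads_without_trailing_nulls` is hypothetical, and this is not the code that was actually committed to calibre; the number of appended NULs (two here) is also an assumption, since the report does not say how many the Qt 4 engine added.

```python
import json


def loads_without_trailing_nulls(raw):
    """Decode JSON text after dropping NUL padding at the end.

    Hypothetical helper illustrating the idea; not calibre's actual code.
    """
    # A raw NUL can never appear in valid JSON (inside strings it must be
    # escaped as \u0000), so stripping trailing NULs cannot remove real data.
    return json.loads(raw.rstrip("\0"))


# Simulated output from the buggy engine: valid JSON plus NUL padding.
padded = '{"text": "4000\U0001D450"}' + "\0\0"
result = loads_without_trailing_nulls(padded)
print(result == {"text": "4000\U0001D450"})  # True
```

Stripping only at the end keeps the fix narrow: any other malformed input still raises the usual `ValueError`.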

Kovid Goyal (kovid) wrote : Fixed in master

Fixed in branch master. The fix will be in the next release. calibre is usually released every Friday.


Changed in calibre:
status: Invalid → Fix Released