indexing fails silently with non-utf8 text attachments
I use zim rev336 on Ubuntu 10.04.1. I attached a simple .txt file to a page and saw that it appeared in zim's outline sidebar as if it were a subpage. This is surprising, but the attached text can be displayed within zim, which is nice. Afterwards, however, every search returned no results, and later every page I clicked was empty (just a white background, quite scary for a moment).
Starting "zim -V -D" shows that indexing throws an exception (see below) if the .txt attachment contains non-utf8 characters. Later zim's page history also gets poisoned somehow, so that clicking to open other pages throws similar exceptions. For normal users who don't start zim in a terminal, all of this happens silently.
I propose treating .txt files as pages only if they contain a valid zim header. As a workaround for now, I just rename .txt to .log; zim then treats the text as a normal attachment and no longer gets confused.
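The proposed header check could look roughly like the sketch below. This is not zim's actual code: the function name looks_like_zim_page is hypothetical, and the header prefix "Content-Type: text/x-zim-wiki" is my assumption about what a valid zim page starts with, so please check it against real page files.

```python
# Assumed first line of a zim page file; verify against your zim version.
ZIM_HEADER_PREFIX = b"Content-Type: text/x-zim-wiki"


def looks_like_zim_page(path):
    """Return True only if the file starts with the assumed zim header
    and its whole content decodes as UTF-8 (hypothetical helper)."""
    with open(path, "rb") as fh:
        data = fh.read()
    # Plain text attachments normally lack the header, so they are rejected here.
    if not data.startswith(ZIM_HEADER_PREFIX):
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Header present but body not UTF-8: still not safe to index as a page.
        return False
    return True
```

With a check like this, a latin1 .txt attachment would be handled as an ordinary attachment instead of being indexed as a page.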
To reproduce, add a non-utf8 .txt file to your notebook and start zim:
$ echo "Chuchichästli" | iconv -t latin1 >~/mynotebook/
$ zim -V -D
ERROR: Got an exception while indexing "<IndexPath: non-utf8>":
Traceback (most recent call last):
for type, href, _ in page.get_links():
tree = self.get_
self._parsetree = self._fetch_
lines = lines or self.source.
lines = self._readlines()
lines = file.readlines()
data = self.read()
newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 8-10: invalid data
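To find out which files in a notebook would trigger this error, a small standalone script along these lines may help. It is only a sketch and does not use any zim API; the function name find_non_utf8_text_files is made up for illustration, and the notebook directory is whatever folder you pass in.

```python
import os


def find_non_utf8_text_files(notebook_dir):
    """Return a sorted list of .txt files under notebook_dir that do not decode as UTF-8."""
    bad = []
    for root, _dirs, files in os.walk(notebook_dir):
        for name in files:
            if not name.endswith(".txt"):
                continue
            path = os.path.join(root, name)
            # Read raw bytes so the check itself cannot raise a decode error.
            with open(path, "rb") as fh:
                data = fh.read()
            try:
                data.decode("utf-8")
            except UnicodeDecodeError:
                bad.append(path)
    return sorted(bad)
```

Any file it reports can then be renamed (for example to .log) or converted to UTF-8 before zim indexes it again.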
Jaap Karssenberg (jaap.karssenberg) wrote: Re: [Bug 705479] Re: indexing fails silently with non-utf8 text attachments (#3)
tags: added: error-handling import
removed: error-handling import missing redesign