Zim

Zim 0.46 doesn't start with 'C' locale

Bug #561121 reported by Eugene Mikhantiev
This bug affects 1 person
Affects: Zim
Status: Won't Fix
Importance: High
Assigned to: Unassigned
Milestone: (none)

Bug Description

eugene@eugene-laptop:~$ LANG=C
eugene@eugene-laptop:~$ zim
Traceback (most recent call last):
  File "/usr/bin/zim", line 41, in <module>
    zim.main(argv)
  File "/usr/lib/pymodules/python2.6/zim/__init__.py", line 276, in main
    default = zim.notebook.get_default_notebook()
  File "/usr/lib/pymodules/python2.6/zim/notebook.py", line 234, in get_default_notebook
    return get_notebook(path)
  File "/usr/lib/pymodules/python2.6/zim/notebook.py", line 225, in get_notebook
    return Notebook(dir=path)
  File "/usr/lib/pymodules/python2.6/zim/notebook.py", line 396, in __init__
    self.config = ConfigDictFile(dir.file('notebook.zim'))
  File "/usr/lib/pymodules/python2.6/zim/config.py", line 451, in __init__
    self.read()
  File "/usr/lib/pymodules/python2.6/zim/config.py", line 460, in read
    self.parse(self.file.readlines())
  File "/usr/lib/pymodules/python2.6/zim/fs.py", line 679, in readlines
    lines = self._readlines()
  File "/usr/lib/pymodules/python2.6/zim/fs.py", line 696, in _readlines
    file = self.open('r')
  File "/usr/lib/pymodules/python2.6/zim/fs.py", line 607, in open
    fh = open(self.path, mode=mode)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 23-29: ordinal not in range(128)

eugene@eugene-laptop:~$

Johannes Reinhardt (johannes-reinhardt) wrote :

I cannot reproduce this problem. From the stack trace it seems to me that a note contains some characters that cause problems. Have you tried it with an empty notebook?

Eugene Mikhantiev (mehanik) wrote :

Yes, you're right: with an empty notebook everything works well. The problem occurs if the path to the notebook contains Cyrillic characters.

Jaap Karssenberg (jaap.karssenberg) wrote :

Related to bug #572805. Not sure if sys.getfilesystemencoding() would detect it correctly in this case.

Could you test the following? Run this in a terminal:

 $ python
 >>> import sys
 >>> sys.getfilesystemencoding()

And let us know the result.

Changed in zim:
status: New → Confirmed
importance: Undecided → High
Eugene Mikhantiev (mehanik) wrote :

eugene@eugene-laptop:~$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getfilesystemencoding()
'UTF-8'
>>> exit()

eugene@eugene-laptop:~$ LANG=C
eugene@eugene-laptop:~$ python
Python 2.6.5 (r265:79063, Apr 16 2010, 13:09:56)
[GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getfilesystemencoding()
'ANSI_X3.4-1968'
>>> exit()

Jaap Karssenberg (jaap.karssenberg) wrote : Re: [Bug 561121] Re: Zim 0.46 don't start with 'C' locale

Now here lies a problem. When you use LANG=C the system believes your
file system to be ASCII encoded, not UTF-8. However in fact your
filesystem contains UTF-8 encoded file names containing characters
that simply can not be encoded in ASCII notation.
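The mismatch described above can be reproduced in isolation. The following is a minimal sketch in Python 3 (the thread itself uses Python 2.6); the path is hypothetical, since the reporter's actual notebook path is not shown, and `can_encode` is a helper name introduced only for this illustration.

```python
# Hypothetical notebook path with Cyrillic characters; not taken from the report.
path = "/home/eugene/Заметки/notebook.zim"


def can_encode(text, encoding):
    """Return True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# 'ANSI_X3.4-1968' is the name reported under LANG=C; Python treats it as ASCII.
ascii_ok = can_encode(path, "ANSI_X3.4-1968")
utf8_ok = can_encode(path, "UTF-8")

print("encodable as ASCII:", ascii_ok)
print("encodable as UTF-8:", utf8_ok)
```

The same path encodes without trouble as UTF-8 but not as ASCII, which is the situation the traceback reflects.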

Solving this means we cannot trust the reported filesystem
encoding and need some heuristic for checking it. It also means we
need some way to encode non-ASCII chars in ASCII (URL encoding comes
to mind) AND make sure there is no conflict with pre-existing files in
different encodings :S
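As a rough illustration of the URL-encoding idea, here is a sketch in Python 3 using the standard `urllib.parse` module. This is not what Zim does; the function names `to_ascii_name` and `from_ascii_name` and the sample name are assumptions made for the example, and the sketch does nothing about the conflict with pre-existing files mentioned above.

```python
from urllib.parse import quote, unquote


def to_ascii_name(name):
    """Percent-encode the UTF-8 bytes of a single file name so only ASCII remains."""
    # safe="" also escapes "/", which is appropriate for one path component.
    return quote(name, safe="")


def from_ascii_name(encoded):
    """Reverse to_ascii_name(), decoding the percent-escapes as UTF-8."""
    return unquote(encoded)


name = "Заметки"  # hypothetical notebook folder name
encoded = to_ascii_name(name)

print(encoded)
print(from_ascii_name(encoded) == name)
```

A literal `%` in an existing name is itself escaped, so the mapping round-trips, but a file that was already stored under its raw UTF-8 name would now exist alongside its escaped twin. That is the conflict the comment warns about.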

Jaap Karssenberg (jaap.karssenberg) wrote :

After investigating various options I'm afraid this will be a "Won't fix".

I will follow up on bug #572805 to make sure the preferred filesystem encoding is used properly. However, in this case the files on the file system simply do not follow the given LANG environment. In short, LANG=C is wrong when you have UTF-8 encoded file names.

I will try to fall back to UTF-8 decoding when the environment is ASCII, but I cannot guarantee this will work.
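The fallback could look roughly like the sketch below, written in Python 3. It is not Zim's actual code: `effective_fs_encoding` is a hypothetical helper, and treating an unknown encoding name as UTF-8 is an extra assumption made here.

```python
import codecs
import sys


def effective_fs_encoding(reported=None):
    """Pick an encoding for file names, preferring UTF-8 over plain ASCII.

    `reported` defaults to whatever sys.getfilesystemencoding() returns.
    """
    if reported is None:
        reported = sys.getfilesystemencoding()
    try:
        # Normalise aliases such as 'ANSI_X3.4-1968' to the codec name 'ascii'.
        canonical = codecs.lookup(reported).name
    except LookupError:
        # Assumption for this sketch: an unknown name is treated like ASCII.
        return "utf-8"
    # UTF-8 is a superset of ASCII, so pure-ASCII names are unaffected.
    return "utf-8" if canonical == "ascii" else canonical


print(effective_fs_encoding("ANSI_X3.4-1968"))
print(effective_fs_encoding("UTF-8"))
```

Any other reported encoding is returned unchanged, so the guess applies only when the environment claims ASCII.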

Eugene Mikhantiev (mehanik) wrote :

Thank you very much for the detailed response. I now use LANG=en_US.UTF8 instead of LANG=C, and everything works fine. The problem really is related to the system locale; Zim works absolutely correctly.

Jaap Karssenberg (jaap.karssenberg) wrote :

Flagging this as "Won't fix".

Final investigation note: we could add some logic to detect illegal characters, e.g. in page names, and work around them (although we won't, to keep things simple). However, in this case the notebook path was stored in the config file. This file is written in UTF-8, so it does not preserve the original filesystem encoding. So if the locale settings change, we can only guess what it should have been.

I will make the encoding fall back to UTF-8, which happens to be the right thing in your case, but this is not a proper fix.

Changed in zim:
status: Confirmed → Won't Fix