'bzr status' crashes if .bzrignore contains Latin-2 chars

Bug #183504 reported by KISS, Zoltán
Affects: Bazaar
Status: Fix Released
Importance: Low
Assigned to: Jason Spashett

Bug Description

bzr: ERROR: exceptions.UnicodeDecodeError: 'utf8' codec can't decode bytes in position 125-127: invalid data

Traceback (most recent call last):
  File "bzrlib\commands.pyc", line 802, in run_bzr_catch_errors
  File "bzrlib\commands.pyc", line 758, in run_bzr
  File "bzrlib\commands.pyc", line 492, in run_argv_aliases
  File "bzrlib\commands.pyc", line 768, in ignore_pipe
  File "bzrlib\builtins.pyc", line 189, in run
  File "bzrlib\status.pyc", line 118, in show_tree_status
  File "bzrlib\workingtree.pyc", line 1668, in is_ignored
  File "bzrlib\workingtree.pyc", line 1647, in get_ignore_list
  File "bzrlib\ignores.pyc", line 104, in parse_ignore_file
  File "encodings\utf_8.pyc", line 16, in decode
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 125-127: invalid data

bzr 1.0.0 on python 2.5.1.final.0 (win32)
arguments: ['bzr', 'st']
encoding: 'cp1250', fsenc: 'mbcs', lang: None
plugins:
  launchpad C:\Program Files\Bazaar\lib\library.zip\bzrlib\plugins\launchpad [unknown]
  multiparent C:\Program Files\Bazaar\lib\library.zip\bzrlib\plugins\multiparent.pyc [unknown]
*** Bazaar has encountered an internal error.
    Please report a bug at https://bugs.launchpad.net/bzr/+filebug
    including this traceback, and a description of what you
    were doing when the error occurred.

Windows XP ENU Prof / Hungarian Environment
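
The decode failure can be reproduced outside bzr. The following is a minimal sketch in modern Python, assuming a made-up ignore pattern that contains the Hungarian letter "ő" and was saved in cp1250 (the console encoding shown in the report); the pattern and the helper name `decodes_as_utf8` are illustrative and not taken from the reporter's file.

```python
# Hypothetical ignore pattern with a Latin-2 letter (not from the real .bzrignore).
pattern = "jegyzők*.txt"

# Bytes as an editor using the cp1250 code page would save them.
raw = pattern.encode("cp1250")


def decodes_as_utf8(data):
    """Return True if the byte string is valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The cp1250 bytes are rejected by the UTF-8 decoder, while the same
# pattern saved as UTF-8 decodes without error.
print(decodes_as_utf8(raw))
print(decodes_as_utf8(pattern.encode("utf-8")))
```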

Tags: easy

Martin Albisetti (beuno) wrote :

Hello, thanks for the bug report.
Could you attach the .bzrignore file so we can look into it a bit more?

Thanks!

Alexander Belchenko (bialix) wrote :

.bzrignore is supposed to be in UTF-8 encoding. To ensure this, it is better to add new ignore patterns with the `bzr ignore PATTERN` command.

bzr itself should not fail with a traceback in this case, though. I think it's enough to print a warning and carry on.
Should be easy to fix.

Changed in bzr:
importance: Undecided → Low
status: New → Confirmed
Changed in bzr:
assignee: nobody → Jason Spashett (jspashett)
Jason Spashett (jspashett) wrote :

As suggested, the fix is:

* Print a warning. Include the line number?
* Carry on

I have modified "ignores.py" as follows, but I would like to use the warning call. Must I move my code into "builtins.py" and catch the exception there? In that case, should I make a custom exception and include the line number within it?

# Assumes ignores.py already has this import at module level
from bzrlib import globbing

def parse_ignore_file(f):
    """Read in all of the lines in the file and turn it into an ignore list"""
    ignored = set()
    line_number = 0  # Line counting to report character decode errors
    for line in f.read().split('\n'):
        line_number += 1
        # Decode the line here, and catch any decoding errors
        try:
            line = line.decode('utf8').rstrip('\r\n')
            if not line or line.startswith('#'):
                continue
            ignored.add(globbing.normalize_pattern(line))
        except UnicodeDecodeError:
            # Skip the undecodable line and keep parsing the rest of the file
            print('ignore file: Line %d, malformed utf8 character. ignoring line.'
                  ' please ensure file is utf8 encoded' % (line_number))
    return ignored

---------- output example. (msg TBD) ---------------
C:\bzr\bazaar\183504_latin_2_ignore_file>python bzr st
ignore file: Line 68, malformed utf8 character. ignoring line. please ensure file is utf8 encoded
modified:
  .bzrignore
  bzr*
  bzrlib/ignores.py
unknown:
  .pydevproject
  .settings/
  src/

Jason Spashett (jspashett) wrote :

I've read the developer docs, and it seems a custom exception may be appropriate. If that's the case, then the first and/or last line number of the decode error can be stored in the exception, which would be raised on exit from parse_ignore_file (ignores.py) and caught in get_ignore_list (workingtree.py). That module already imports "mutter, note", and I'd add "warning" to that.

Perhaps this should not be an exception-throwing event? Advice welcomed.
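
The custom-exception idea described above might look roughly like the following. This is only a sketch in modern Python, not the code that was committed: the names `IgnoreDecodeError` and `parse_ignore_bytes` are hypothetical, the standard `logging` module stands in for bzrlib's warning call, and pattern normalization is left out. Note that the exception has to carry the successfully parsed patterns as well, otherwise raising it on exit would discard them.

```python
import logging

# Stand-in for bzrlib's trace module (an assumption for this sketch).
logger = logging.getLogger("ignore_exception_sketch")


class IgnoreDecodeError(Exception):
    """Hypothetical exception: records bad line numbers and the good patterns."""

    def __init__(self, bad_lines, patterns):
        super().__init__("undecodable lines: %s" % (bad_lines,))
        self.bad_lines = bad_lines
        self.patterns = patterns


def parse_ignore_bytes(data):
    """Parse ignore-file bytes; raise on exit if any line was not UTF-8."""
    patterns = set()
    bad_lines = []
    for number, raw in enumerate(data.split(b"\n"), 1):
        try:
            line = raw.decode("utf-8").rstrip("\r\n")
        except UnicodeDecodeError:
            bad_lines.append(number)
            continue
        if line and not line.startswith("#"):
            patterns.add(line)
    if bad_lines:
        raise IgnoreDecodeError(bad_lines, patterns)
    return patterns


def get_ignore_list(data):
    """Caller side: turn the exception into a warning and keep going."""
    try:
        return parse_ignore_bytes(data)
    except IgnoreDecodeError as e:
        logger.warning("ignore file: lines %s are not valid UTF-8; skipped",
                       e.bad_lines)
        return e.patterns
```

The caller still gets every pattern that decoded cleanly, and the warning names the offending lines.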

Changed in bzr:
status: Confirmed → In Progress
Martin Pool (mbp) wrote : Re: [Bug 183504] Re: 'bzr status' crashes if .bzrignore contains Latin-2 chars

2009/9/26 Jason Spashett <email address hidden>:
> I've read the developer docs, and it seems a custom exception may be
> appropriate. If that's the case then the first and/or last line number
> of the decode error can be stored in the exception, and it would be
> raised on exit from parse_ignore_file (ignores.py) and caught in
> get_ignore_list (workingtree.py) which already imports "mutter, note"
> and I'd add "warning" to that.
>
> Perhaps this should not be an exception-throwing event? Advice welcomed.

I think probably not; you should give a warning through trace.warning.

--
Martin <http://launchpad.net/~mbp/>
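
The approach suggested here, warning at the point of failure instead of raising, could be sketched as follows. This is an illustration in modern Python under stated assumptions, not the committed fix: `log.warning` from the standard `logging` module stands in for bzrlib's `trace.warning`, the logger name and message text are made up, and pattern normalization is omitted.

```python
import io
import logging

# Stand-in for bzrlib.trace (an assumption for this sketch).
log = logging.getLogger("ignore_warning_sketch")


def parse_ignore_file(f):
    """Read a binary ignore file, warning about and skipping non-UTF-8 lines."""
    ignored = set()
    for line_number, raw in enumerate(f.read().split(b"\n"), 1):
        try:
            line = raw.decode("utf-8").rstrip("\r\n")
        except UnicodeDecodeError:
            # Warn and carry on; the remaining lines are still parsed.
            log.warning(".bzrignore: line %d is not valid UTF-8; ignoring it",
                        line_number)
            continue
        if not line or line.startswith("#"):
            continue
        ignored.add(line)
    return ignored


# Usage: the second line holds a stray non-UTF-8 byte and is skipped.
patterns = parse_ignore_file(io.BytesIO(b"*.o\r\nb\xe1r\n# comment\n"))
```

Compared with a custom exception, nothing has to be carried back to the caller, so the parser keeps its simple "return a set" contract.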

Changed in bzr:
status: In Progress → Fix Committed
John A Meinel (jameinel)
Changed in bzr:
milestone: none → 2.2b4
status: Fix Committed → Fix Released