UnicodeDecodeError when find encounters a non-ascii file

Bug #1211841 reported by Curtis Hovey
Affects: Gedit Developer Plugins
Status: Fix Released
Importance: High
Assigned to: Curtis Hovey
Milestone: 1.0.0

Bug Description

While searching for text in a directory with UTF-8 files, this error was seen.

Traceback (most recent call last):
  File "/usr/lib/python3.3/threading.py", line 637, in _bootstrap_inner
    self.run()
  File "/home/curtis/.local/share/gedit/plugins/gdp/find.py", line 143, in run
    self.pattern, substitution=self.substitution):
  File "/home/curtis/.local/share/gedit/plugins/gdp/find.py", line 55, in find_matches
    file_path, match_re, substitution=substitution)
  File "/home/curtis/.local/share/gedit/plugins/gdp/find.py", line 85, in extract_match
    for lineno, line in enumerate(file_):
  File "/usr/lib/python3.3/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc2 in position 83: invalid continuation byte

Each file is opened as text, but the encoding is not specified. It looks like the UTF-8 file is being decoded as ASCII. The simple fix is to open the file as UTF-8. There is an important nuance with unsupported characters, though: substitute needs to know the correct number of characters when replacing text in a line. The 'replace' option is the safest way to handle errors and ensure files are not corrupted.
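
A minimal sketch of the fix described above, assuming the file is read line by line as in the traceback. The function name extract_matches and its signature are hypothetical and are not the plugin's actual code. Since the traceback names the 'utf-8' codec, the sketch sets both the encoding and the error handler explicitly so that an undecodable byte no longer raises.

```python
import re


def extract_matches(file_path, pattern):
    """Yield (lineno, line) for each line that matches pattern.

    Hypothetical helper for illustration; not the gdp plugin's real API.
    """
    match_re = re.compile(pattern)
    # Name the encoding explicitly instead of relying on the locale default.
    # errors='replace' turns each undecodable byte into U+FFFD rather than
    # raising UnicodeDecodeError part way through the file.
    with open(file_path, encoding='utf-8', errors='replace') as file_:
        for lineno, line in enumerate(file_, start=1):
            if match_re.search(line):
                yield lineno, line.rstrip('\n')
```

With this approach a stray byte such as 0xc2 followed by a space is read as the single replacement character U+FFFD, and the rest of the line is still searched.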

Curtis Hovey (sinzui)
Changed in gdp:
milestone: none → 1.0.0
Curtis Hovey (sinzui)
Changed in gdp:
status: In Progress → Fix Committed
Curtis Hovey (sinzui)
Changed in gdp:
status: Fix Committed → Fix Released