PoProofRead. poproofread has MOVED to github

Cannot use a file that is not UTF-8 encoded

Bug #908182 reported by Byrial Jensen on 2011-12-23

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	PoProofRead. poproofread has MOVED to github	Won't Fix	High	TLE	PoProofRead. poproofread has MOVED to github 0.1.8

Bug Description

I tried to open a file which uses latin1 (ISO-8859-1) encoding. It printed this warning message to the console:

/usr/lib/pymodules/python2.7/poproofread/poproofread_gtk.py:261: GtkWarning: gtk_text_buffer_emit_insert: assertion `g_utf8_validate (text, len, NULL)' failed
textbuffer.insert(startiter, text)

I can go forward and back in the file with the PageUp and PageDown keys, but the diff and comment windows are all empty, except for the last one:

=============================================================================
Number of messages: 3
=============================================================================

which happens to be the only chunk with only ASCII chars.
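
A minimal sketch of why only the all-ASCII chunk shows up, assuming the cause is what the warning suggests: GTK checks inserted text with `g_utf8_validate`, and the Latin-1 byte for a non-ASCII character is not valid UTF-8, while pure ASCII is valid in both encodings. The helper name `is_valid_utf8` and the sample word are made up for illustration, and the sketch is written for Python 3 rather than the Python 2.7 shown in the warning.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Illustrative chunks: one with a Latin-1 encoded "æ", one with ASCII only.
latin1_chunk = "Oversættelse".encode("iso-8859-1")
ascii_chunk = b"Number of messages: 3"

print(is_valid_utf8(latin1_chunk))  # the Latin-1 chunk is rejected
print(is_valid_utf8(ascii_chunk))   # the ASCII chunk passes
```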

Ask Hjorth Larsen (askhl) wrote on 2011-12-23:

Maybe the best way to fix this is to always use Python unicode objects internally in poproofread, since it has to be handled correctly by gtk. The decode() method in gtparse can be used to do this easily (if it works right now; otherwise we should fix it first).
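
A rough sketch of the "unicode internally" idea, not the actual poproofread or gtparse code: decode bytes once when reading, work with text objects everywhere in between, and encode once when writing. The function names `decode_input` and `encode_output` are hypothetical, and this is Python 3, where `str` plays the role of the Python 2 `unicode` type.

```python
def decode_input(raw: bytes, encoding: str) -> str:
    """Decode bytes read from disk into text; done once, at parse time."""
    return raw.decode(encoding)


def encode_output(text: str, encoding: str) -> bytes:
    """Encode text back into bytes; done once, at export time."""
    return text.encode(encoding)


# Illustrative input: one line of a po-file stored as ISO-8859-1.
raw = 'msgstr "Oversættelse"\n'.encode("iso-8859-1")
text = decode_input(raw, "iso-8859-1")

# The same text can be handed to a UTF-8 consumer such as a text widget,
# or written back in the original encoding without loss.
as_utf8 = encode_output(text, "utf-8")
round_trip = encode_output(text, "iso-8859-1")
```

With this split, only the two boundary functions need to know which encoding the file uses.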

TLE (k-nielsen81) on 2011-12-23

Changed in poproofread:
status:	New → Confirmed
importance:	Undecided → High
assignee:	nobody → TLE (k-nielsen81)
milestone:	none → 0.1.8

TLE (k-nielsen81) wrote on 2011-12-23:

Hello Byrial

Thanks for reporting this. Character encoding was one of those things that I had deliberately not done yet because it is tricky and not much fun :| But it definitely needs to be done, so now is as good a time as any.

@Ask: I agree that the best way to handle this is to go all Python unicode internally. So we'll decode at parse time and possibly encode back at export time. As for how to determine the character encoding, I'll give that a little more thought. I would like poproofread to remain independent of pyg3t for essential functions, so my initial thought is to:
1: Look in the first chunk for the magic character encoding words that po-files use.
2: If that fails, use a character encoding guessing lib as a fallback. I think I have read that there is one.

In both cases, use the trick from the parser: read the file, then re-read it with the correct encoding.

But actually, since we have just determined that podiffs will always contain a header, and the program is designed to work on podiffs and po-files, option 1 really should cover it.
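
Option 1 combined with the read and re-read step could look roughly like the sketch below. It is only an illustration under stated assumptions: the "magic words" are taken to be the `charset=` value on the `Content-Type` line of the po header, the first match in the raw bytes is assumed to be the header's, and the names `CHARSET_RE`, `detect_po_encoding` and `read_po_text` are hypothetical. It falls back to UTF-8 instead of an encoding guessing library, and it is Python 3.

```python
import codecs
import re

# Matches the charset value; the class excludes the backslash of the
# literal "\n" that ends the header line in a po-file.
CHARSET_RE = re.compile(rb"charset=([A-Za-z0-9_.:-]+)")


def detect_po_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return the charset named in the header, or default if unusable."""
    match = CHARSET_RE.search(raw)
    if match is None:
        return default
    name = match.group(1).decode("ascii")
    try:
        codecs.lookup(name)  # rejects placeholders such as "CHARSET"
    except LookupError:
        return default
    return name


def read_po_text(raw: bytes) -> str:
    """First pass finds the encoding in the bytes, second pass decodes."""
    return raw.decode(detect_po_encoding(raw))


# Illustrative po-file content stored as ISO-8859-1.
sample = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=ISO-8859-1\\n"\n'
    "\n"
    'msgid "Translation"\n'
    'msgstr "Oversættelse"\n'
).encode("iso-8859-1")

print(detect_po_encoding(sample))
print(read_po_text(sample))
```

Searching the raw bytes works because the header line itself is plain ASCII in the encodings po-files normally use, so it can be found before the encoding is known.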

Regards Kenneth

TLE (k-nielsen81) wrote on 2012-02-15:

Byrial, can you provide a test case file for this (preferably a podiff)? It has been some time since I have encountered a file in an encoding different from UTF-8.

Byrial Jensen (byrial-t) wrote on 2012-02-15:

po diff file with ISO8859-1 encoding (1.5 KiB, text/plain)

I attach a diff file produced by podiff with ISO8859-1 encoding.

TLE (k-nielsen81) wrote on 2012-02-15:

Thanks.

TLE (k-nielsen81) wrote on 2012-04-02:

Note to self. Missing:
Handle char set warnings on save in poproofread_gtk and uncomment return statements in __detect_character_encoding

Changed in poproofread:
status:	Confirmed → In Progress

TLE (k-nielsen81) wrote on 2012-04-10:

Note to self. Missing:
Trim the invalid codecs from the codec list, add a comment about trying to save in the dialog, and test.
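
A small sketch of what these two notes might involve, assuming "invalid codecs" means names that Python cannot resolve and that a save warning is wanted when the text cannot be represented in the chosen encoding. The candidate list and the names `valid_codecs` and `can_encode` are hypothetical and not taken from poproofread; the code is Python 3.

```python
import codecs


def valid_codecs(names):
    """Keep only the codec names that Python can look up."""
    valid = []
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            continue
        valid.append(name)
    return valid


def can_encode(text, encoding):
    """Return True if text can be saved in encoding without errors."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Illustrative candidate list with one deliberately bogus entry.
candidates = ["utf-8", "iso-8859-1", "no-such-codec", "ascii"]
usable = valid_codecs(candidates)

print(usable)
print(can_encode("Oversættelse", "ascii"))
```

A save dialog could call `can_encode` before writing and warn the user when it returns False.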

TLE (k-nielsen81) on 2012-04-28

Changed in poproofread:
status:	In Progress → Fix Committed

TLE (k-nielsen81) wrote on 2012-04-28:

Fixed with revision 92

TLE (k-nielsen81) on 2017-10-25

Changed in poproofread:
status:	Fix Committed → Won't Fix
