Wrong line length for lines with Unicode characters

Bug #889648 reported by Adi Roiban
Affects: pocket-lint
Status: Fix Released
Importance: High
Assigned to: Adi Roiban

Bug Description

Here is an example of a Unicode line that is encoded to a byte string when the text is split into lines, so its length is counted in bytes rather than in characters:

>>> initial_line = u'mâț mițișor:x:2000:2010:Mâț Mițișor,,,,:/home/mâț mițișor:/bin/bash\n'
>>> line_encoded = "u'm\xc3\xa2\xc8\x9b mi\xc8\x9bi\xc8\x99or:x:2000:2010:M\xc3\xa2\xc8\x9b Mi\xc8\x9bi\xc8\x99or,,,,:/home/m\xc3\xa2\xc8\x9b mi\xc8\x9bi\xc8\x99or:/bin/bash\\n'"

>>> len(line_encoded)
84
>>> len(line_encoded.decode('utf-8'))
72

----

I am not sure if always converting long lines to Unicode will solve all problems.

Maybe we can change pocket-lint to be smart and convert a line only if the Python file header contains a Unicode encoding declaration.
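
One way this idea could look is sketched below. This is only an illustration under assumptions, not the fix that was committed: `declared_encoding` and `line_length` are hypothetical helpers, and the pattern follows the encoding declaration format described in PEP 263, which is looked for in the first two lines of the file.

```python
import re

# Pattern for a source-encoding declaration, as described in PEP 263.
CODING_RE = re.compile(br'^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def declared_encoding(source):
    """Return the encoding declared in the first two lines, or None."""
    for line in source.splitlines()[:2]:
        match = CODING_RE.match(line)
        if match:
            return match.group(1).decode('ascii')
    return None


def line_length(line, encoding):
    """Length in characters if an encoding is declared, else in bytes."""
    if encoding is None:
        return len(line)
    try:
        return len(line.decode(encoding))
    except (UnicodeDecodeError, LookupError):
        # Fall back to the byte count if the line cannot be decoded
        # or the declared encoding is unknown.
        return len(line)


# Hypothetical file content with a declaration and one non-ASCII line.
source = b'# -*- coding: utf-8 -*-\nname = "m\xc3\xa2\xc8\x9b"\n'
encoding = declared_encoding(source)
lengths = [line_length(line, encoding) for line in source.splitlines()]
print(encoding, lengths)
```

Without a declaration the sketch keeps counting bytes, so files that never declared an encoding would be checked the same way as before.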

I will attach a branch with a naive fix and the required tests.

Please let me know how you think this problem should be solved.

Cheers,
Adi


Curtis Hovey (sinzui)
Changed in pocket-lint:
milestone: none → future
status: New → Fix Committed
importance: Undecided → High
assignee: nobody → Adi Roiban (adiroiban)
Curtis Hovey (sinzui)
Changed in pocket-lint:
status: Fix Committed → Fix Released