Glib::ustring mapping fails with non ascii-7 characters

Bug #1006819 reported by Michael Brown
Affects: PyBindGen
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

This happens for three reasons (self-inflicted, unfortunately):

1) The parse tuple format string "s#" gives a length in bytes, but the Glib::ustring constructor takes a length as a count of characters.

2) In Python 2.x, "s#" also converts unicode to the local encoding, which results in high-valued characters that Glib::ustring cannot take.

3) When converting from Glib::ustring to a Python string, the size() method returns a length in characters, but Python expects a length in bytes.
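
The byte/character mismatch behind reasons 1 and 3 can be seen without any binding code. The following is only an illustrative sketch in plain Python (it does not call PyBindGen or glibmm); the sample string and variable names are made up for the example.

```python
# -*- coding: utf-8 -*-
# Illustration only: character count vs. UTF-8 byte count.
text = u"h\u00e9llo"            # five characters, one of them non-ASCII
encoded = text.encode("utf-8")  # the non-ASCII character takes two bytes

char_count = len(text)      # the unit Glib::ustring lengths are counted in
byte_count = len(encoded)   # the unit "s#" reports and Python expects back

# Treating the character count as a byte length drops the tail of the string.
truncated = encoded[:char_count]
```

For pure ASCII input the two counts coincide, which is why the problem only shows up with non-ASCII characters.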

The solution is:

1) Stop requesting a length from the parse tuple method, and pass only a pointer to a zero-terminated character string to the Glib::ustring constructor, letting it detect the length internally. Most bindings do not accept strings with embedded zeros anyway, and the parse method will raise a meaningful exception if such a string is passed, so I think this is a practical and acceptable solution.

2) Use the format string "et", which will either
    a) convert unicode to utf-8, or
    b) leave an 8-bit string as is, assuming it to be utf-8.

3) Use the bytes() method of Glib::ustring to obtain a length in bytes, not characters.
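
The three steps above can be modelled roughly in plain Python. This is a sketch of the intended behaviour only, not the attached patch or the generated wrapper code; `to_utf8_cstring` and `from_utf8_bytes` are hypothetical helper names introduced for illustration.

```python
def to_utf8_cstring(value):
    """Hypothetical helper modelling the input side (steps 1 and 2)."""
    if isinstance(value, bytes):
        data = value                  # 8-bit string: left as is, assumed utf-8
    else:
        data = value.encode("utf-8")  # unicode: converted to utf-8
    if b"\x00" in data:
        # A zero-terminated C string cannot carry an embedded zero.
        raise TypeError("embedded null byte")
    return data


def from_utf8_bytes(buffer, n_bytes):
    """Hypothetical helper modelling the output side (step 3)."""
    # The length is a byte count, not a character count.
    return buffer[:n_bytes].decode("utf-8")
```

Rejecting embedded zeros up front is what makes it safe to drop the explicit length on the input side, while the output side must still be told the length in bytes.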

I am attaching a patch that applies the above fixes. We are already using it with success.

Tags: glib ustring

Michael Brown (mbrown-7) wrote :
Gustavo Carneiro (gjc) wrote :

Pushed; thanks for the patch.

Changed in pybindgen:
status: New → Fix Committed
Gustavo Carneiro (gjc)
Changed in pybindgen:
status: Fix Committed → Fix Released