Glib::ustring mapping fails with non-7-bit-ASCII characters
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
PyBindGen | Fix Released | Undecided | Unassigned |
Bug Description
This happens for three reasons (self-inflicted, unfortunately):
1) The PyArg_ParseTuple format string "s#" yields a length in bytes, but the Glib::ustring constructor interprets its length argument as a count of characters.
2) In Python 2.x, "s#" also converts unicode objects to the local encoding, which produces high-valued bytes that are not valid UTF-8 and that Glib::ustring cannot accept.
3) When converting from a Glib::ustring to a Python string, the size() method returns a length in characters, but Python expects a length in bytes.
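Reasons 1 and 3 both come down to the gap between a UTF-8 string's byte length and its character count. The sketch below makes that gap concrete without depending on glibmm: the hypothetical helper `utf8_char_count` counts every byte that is not a UTF-8 continuation byte, which for valid UTF-8 is the same number Glib::ustring::size() reports, while `std::string::size()` gives the byte count that "s#" produces and that Python expects.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: count UTF-8 characters (code points) by counting every
// byte that is NOT a continuation byte (continuation bytes look like 10xxxxxx).
// For valid UTF-8 input this equals what Glib::ustring::size() reports.
std::size_t utf8_char_count(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For "héllo" encoded as UTF-8 (`"h\xC3\xA9llo"`), the byte length is 6 but the character count is 5. Passing 6 where a character count is expected asks the constructor for a sixth character that does not exist (reason 1); passing 5 where a byte length is expected truncates the result to `"h\xC3\xA9ll"`, silently dropping the final `o` (reason 3).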
The solution is:
1) Stop requesting a length from PyArg_ParseTuple, and instead pass a pointer to a NUL-terminated string to the Glib::ustring constructor, letting it compute the length internally. Most bindings do not accept strings with embedded NULs anyway, and PyArg_ParseTuple raises a meaningful exception if such a string is passed, so I think this is a practical and acceptable solution.
2) Use the format string "et" with "utf-8" as the encoding, which will:
a) convert a unicode object to UTF-8, or
b) leave an 8-bit string as is, assuming it is already UTF-8.
3) Use the 'bytes' method of Glib::ustring to obtain a length in bytes, not characters.
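In the real wrapper these steps would use the Python 2 C API and glibmm directly: `PyArg_ParseTuple(args, "et", "utf-8", &buf)` hands back a newly allocated NUL-terminated UTF-8 buffer (which must later be released with `PyMem_Free`), `Glib::ustring str(buf)` builds the string from that terminator, and the return path passes `str.bytes()` rather than `str.size()` as the length. The patch itself is not reproduced here, so the block below is only a self-contained model of that flow, assuming a `std::string` stands in for the buffer produced by "et". The names `as_c_string`, `count_utf8_chars`, `Converted` and `convert` are hypothetical and are not PyBindGen, Python or glibmm APIs.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical model: reject embedded NULs, mirroring the exception the parser
// raises for "et", so the buffer can safely be treated as a plain C string.
const char* as_c_string(const std::string& utf8) {
    if (std::memchr(utf8.data(), '\0', utf8.size()) != nullptr) {
        throw std::invalid_argument("embedded NUL in string argument");
    }
    return utf8.c_str();
}

// Hypothetical helper: count characters in a NUL-terminated UTF-8 string by
// skipping continuation bytes (10xxxxxx).
std::size_t count_utf8_chars(const char* s) {
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

struct Converted {
    std::size_t chars;  // what Glib::ustring::size() would report
    std::size_t bytes;  // what Glib::ustring::bytes() would report
};

// Fix 1: no length is passed in; it is derived from the NUL terminator.
// Fix 3: the byte count, not the character count, is what goes back to Python.
Converted convert(const std::string& utf8) {
    const char* p = as_c_string(utf8);
    return Converted{count_utf8_chars(p), std::strlen(p)};
}
```

With this model, `convert("h\xC3\xA9llo")` reports 5 characters and 6 bytes, and a string containing an embedded NUL is rejected instead of being silently cut short.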
I am attaching a patch that applies the above fixes; we are already using it successfully.
Changed in pybindgen:
status: Fix Committed → Fix Released
Pushed; thanks for the patch.