Multibyte UTF-8 characters in a TinyMCE text field, turned into question marks by embedded image processing
Bug #1639635 reported by Aaron Wells
This bug report is a duplicate of:
Bug #1582778: Multibyte UTF8 characters (e.g. Japanese text) jumbled when submitted with image.
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Mahara | Fix Released | Medium | Aaron Wells |
Bug Description
As reported on the forum: https:/
A user reported that Greek characters with accent marks (ό έ ά) get turned into question marks (? ? ?) if they're in a TinyMCE text box, and the user embeds images into that text box. It only happens when images are involved, which suggests it's a bug in the TinyMCE embedded image processor, possibly from us failing to use multibyte string functions.
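To illustrate what "failing to use multibyte string functions" can look like, here is a minimal Python sketch. It is not Mahara code (Mahara is PHP), and the helper names `truncate_bytes` and `truncate_chars` are hypothetical. It contrasts cutting a UTF-8 string by bytes with cutting it by characters.

```python
def truncate_bytes(text: str, limit: int) -> str:
    """Byte-oriented cut, similar in spirit to a non-multibyte string function."""
    raw = text.encode("utf-8")[:limit]
    # A cut that lands inside a two-byte sequence leaves an incomplete byte behind;
    # errors="replace" turns it into U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


def truncate_chars(text: str, limit: int) -> str:
    """Character-oriented cut, similar in spirit to a multibyte-aware function."""
    return text[:limit]


sample = "\u03cc\u03ad\u03ac"  # the three reported letters: ό έ ά (6 bytes in UTF-8)

print(truncate_bytes(sample, 3))  # second letter is cut in half
print(truncate_chars(sample, 3))  # all three letters survive
```

Note that a byte-level split like this leaves broken byte sequences behind rather than a clean "?", so by itself it would not prove or disprove the theory above.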
summary:
- Two-part UTF-8 characters in a TinyMCE text field, turned into question
+ Multibyte UTF-8 characters in a TinyMCE text field, turned into question
  marks by embedded image processing
The user shared with us a Mahara page showing the affected text. Looking at it in a hex viewer, I can see that the affected Greek letters are two-byte UTF-8 characters where they display correctly, and one-byte "?" characters where they display incorrectly.
That suggests the cause may not be something like these characters getting split by an unfortunate str_replace(), but rather that we've got some text processing function that is filtering them out.
ά : ce ac ( http://www.mclean.net.nz/ucf/?c=U+03AC )
έ : ce ad ( http://www.mclean.net.nz/ucf/?c=U+03AD )
ό : cf 8c ( http://www.mclean.net.nz/ucf/?c=U+03CC )
? : 3f ( http://www.mclean.net.nz/ucf/?c=U+003F )
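The byte values above can be double-checked with a short Python sketch. The second half shows one way a text-processing step could turn each two-byte letter into a single 0x3f byte: a lossy conversion to a charset that lacks Greek. That conversion is only an assumption for illustration; nothing in this report confirms which function is responsible. The names `utf8_hex` and `lossy_ascii` are hypothetical.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a character as space-separated hex."""
    return ch.encode("utf-8").hex(" ")


def lossy_ascii(text: str) -> str:
    """Assumed failure mode: convert to ASCII, substituting '?' for anything else."""
    return text.encode("ascii", errors="replace").decode("ascii")


# Code points from the table: U+03AC, U+03AD, U+03CC, U+003F
for ch in ["\u03ac", "\u03ad", "\u03cc", "?"]:
    print(f"{ch} : {utf8_hex(ch)} (U+{ord(ch):04X})")

# Each two-byte Greek letter becomes exactly one '?' byte; spaces pass through.
print(lossy_ascii("\u03cc \u03ad \u03ac"))
```

A conversion like this yields exactly one "?" per affected letter, which matches what the hex viewer shows.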