Multibyte UTF-8 characters in a TinyMCE text field, turned into question marks by embedded image processing

Bug #1639635 reported by Aaron Wells
Affects: Mahara
Status: Fix Released
Importance: Medium
Assigned to: Aaron Wells
Milestone: none

Bug Description

As reported on the forum: https://mahara.org/interaction/forum/topic.php?id=7723&post=31207

A user reported that Greek characters with accent marks (ό έ ά) get turned into question marks (? ? ?) when they appear in a TinyMCE text box into which the user has also embedded images. It only happens when images are involved, which suggests a bug in the TinyMCE embedded image processor, possibly because we are not using multibyte-safe string functions there.

Tags: i18n utf8
Aaron Wells (u-aaronw) wrote :

The user shared with us a Mahara page showing the affected text. Looking at it in a hex viewer, I can see that the affected Greek letters are two-byte UTF-8 characters where they display correctly, and single-byte "?" characters where they display incorrectly.

So that suggests it isn't just a case of these characters getting split by an unfortunate str_replace(), but that some text-processing function is actively replacing them.

ά : ce ac ( http://www.mclean.net.nz/ucf/?c=U+03AC )
έ : ce ad ( http://www.mclean.net.nz/ucf/?c=U+03AD )
ό : cf 8c ( http://www.mclean.net.nz/ucf/?c=U+03CC )

? : 3f ( http://www.mclean.net.nz/ucf/?c=U+003F )
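
To make the reasoning above concrete, here is a minimal Python sketch (not Mahara's actual PHP code). It checks the byte values listed above and contrasts two failure modes. The helpers `lossy_single_byte` and `naive_byte_split` are hypothetical stand-ins: the first assumes a conversion to a single-byte charset that cannot represent Greek, the second assumes a byte-oriented cut through the middle of a character.

```python
# Bytes listed in the comment above, keyed by character.
EXPECTED_UTF8_HEX = {"ά": "ce ac", "έ": "ce ad", "ό": "cf 8c", "?": "3f"}


def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of ch as space-separated hex pairs."""
    return ch.encode("utf-8").hex(" ")


def lossy_single_byte(text: str) -> bytes:
    """Hypothetical stand-in: convert to a single-byte charset that has no
    Greek letters. Each unmappable character becomes one '?' (0x3f)."""
    return text.encode("latin-1", errors="replace")


def naive_byte_split(text: str, at: int) -> str:
    """Hypothetical stand-in: keep only the first `at` bytes, then decode.
    A cut inside a character leaves an invalid sequence, not a '?'."""
    return text.encode("utf-8")[:at].decode("utf-8", errors="replace")


if __name__ == "__main__":
    for ch, expected in EXPECTED_UTF8_HEX.items():
        print(ch, utf8_hex(ch), utf8_hex(ch) == expected)
    print(lossy_single_byte("ό έ ά"))
    print(repr(naive_byte_split("ά", 1)))
```

Under these assumptions, a charset conversion yields exactly one 0x3f byte per Greek letter, which matches what the hex viewer shows, while a byte-level split leaves an invalid sequence (decoded here as U+FFFD) rather than a clean "?".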

Aaron Wells (u-aaronw) wrote :

Okay, it looks like this bug is a duplicate of Bug 1582778. That bug was fixed in 15.10.4, 16.04.1, and 16.10.0, and the problem can't be reproduced in any of those releases.

The bug reporter described their environment as:

OS: Ubuntu 14.04
Apache: 2.4.7
PGSQL: 9.3.14
PHP: 5.5.52
Mahara: 15.10.1

Mahara 15.10.1 predates the 15.10.4 fix, so that explains why they are still seeing this issue.
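
The version check behind that conclusion can be sketched in a few lines of Python. The fix versions come from the comment above; the function names and the handling of other release series are assumptions made for illustration, not something the tracker states.

```python
# First patch release in each series that contains the fix,
# as listed in the comment above: 15.10.4, 16.04.1, 16.10.0.
FIXED_IN = {(15, 10): 4, (16, 4): 1, (16, 10): 0}


def parse_version(version: str) -> tuple:
    """Split a 'major.minor.patch' string into three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def has_fix(version: str) -> bool:
    """Return True if the given Mahara version should contain the fix."""
    major, minor, patch = parse_version(version)
    series = (major, minor)
    if series in FIXED_IN:
        return patch >= FIXED_IN[series]
    # Assumption: series newer than 16.10 inherit the fix,
    # and series older than 15.10 never received it.
    return series > max(FIXED_IN)


if __name__ == "__main__":
    for v in ("15.10.1", "15.10.4", "16.04.1", "16.10.0"):
        print(v, has_fix(v))
```

For the reporter's 15.10.1 install, `has_fix` returns False, which is consistent with them still seeing the problem.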

Changed in mahara:
milestone: 16.10.1 → none
status: In Progress → Fix Released
Aaron Wells (u-aaronw)
summary: - Two-part UTF-8 characters in a TinyMCE text field, turned into question
+ Multibyte UTF-8 characters in a TinyMCE text field, turned into question
marks by embedded image processing