Mahara

Multibyte UTF-8 characters in a TinyMCE text field, turned into question marks by embedded image processing

Bug #1639635 reported by Aaron Wells on 2016-11-06

This bug report is a duplicate of Bug #1582778: Multibyte UTF8 characters (e.g. Japanese text) jumbled when submitted with image.

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Mahara	Fix Released	Medium	Aaron Wells	(none)

Bug Description

As reported on the forum: https://mahara.org/interaction/forum/topic.php?id=7723&post=31207

A user reported that Greek characters with accent marks (ό έ ά) get turned into question marks (? ? ?) if they're in a TinyMCE text box, and the user embeds images into that text box. It only happens when images are involved, which suggests it's a bug in the TinyMCE embedded image processor, possibly from us failing to use multibyte string functions.
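
To illustrate the multibyte hypothesis, here is a minimal Python sketch (not Mahara's PHP code) of how byte-oriented string handling differs from character-oriented handling. The helper `cut_bytes` is hypothetical and only stands in for any function that ignores character boundaries.

```python
# The three accented Greek letters from the report, separated by spaces.
# Written as escapes so the precomposed code points are unambiguous.
text = "\u03cc \u03ad \u03ac"

def cut_bytes(s: str, n: int) -> bytes:
    """Truncate s to n bytes of UTF-8, ignoring character boundaries."""
    return s.encode("utf-8")[:n]

# Character count and byte count differ for multibyte text.
print(len(text))                  # number of characters
print(len(text.encode("utf-8")))  # number of UTF-8 bytes

# Cutting after one byte leaves half of a two-byte character.
fragment = cut_bytes(text, 1)
print(fragment)

# Decoding the fragment cannot recover the letter; with errors="replace"
# Python substitutes U+FFFD for the invalid sequence.
print(fragment.decode("utf-8", errors="replace"))
```

A function that counts or slices bytes rather than characters can damage text like this, which is the kind of failure the multibyte string functions are meant to avoid.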

Aaron Wells (u-aaronw) wrote on 2016-11-06:

imagetest - ATS2020.html (19.5 KiB, text/html)

The user shared with us a Mahara page showing the affected text. Looking at it in a hex viewer, I can see that the affected Greek letters are two-byte UTF-8 characters in the parts where they display correctly, and one-byte "?" characters in the parts where they display incorrectly.

That suggests the cause may not be something like these characters getting split by an unfortunate str_replace(), which would leave stray partial bytes behind. Instead, we probably have some text processing function that is replacing each whole character with a "?".

ά : ce ac ( http://www.mclean.net.nz/ucf/?c=U+03AC )
έ : ce ad ( http://www.mclean.net.nz/ucf/?c=U+03AD )
ό : cf 8c ( http://www.mclean.net.nz/ucf/?c=U+03CC )

? : 3f ( http://www.mclean.net.nz/ucf/?c=U+003F )
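
The byte values above can be checked with a short Python sketch. It also shows one way a two-byte character can become a single 3f byte: a lossy conversion to a character set that has no Greek letters. The choice of Latin-1 here is purely an assumption for illustration; this report does not say which conversion Mahara actually performed, and `lossy_roundtrip` is a hypothetical helper.

```python
# Characters from the list above, written as escapes so the
# precomposed code points are unambiguous.
SAMPLES = ["\u03ac", "\u03ad", "\u03cc", "?"]

def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return ch.encode("utf-8").hex(" ")

def lossy_roundtrip(text: str, charset: str = "latin-1") -> str:
    """Encode text into charset, substituting '?' for unmappable characters."""
    return text.encode(charset, errors="replace").decode(charset)

# Print each character with its code point and UTF-8 bytes.
for ch in SAMPLES:
    print(f"{ch} : U+{ord(ch):04X} : {utf8_hex(ch)}")

# Assumed failure mode: every Greek letter is replaced by a one-byte '?'.
damaged = lossy_roundtrip("\u03cc \u03ad \u03ac")
print(damaged, ":", utf8_hex(damaged[0]))
```

Unlike a mid-character split, this kind of substitution leaves valid one-byte characters behind, which matches what the hex viewer showed.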

Aaron Wells (u-aaronw) wrote on 2016-11-06:

Okay, it looks like this bug is a duplicate of Bug 1582778. It was fixed in 15.10.4, 16.04.1, and 16.10.0, and can't be reproduced in any of those releases.

The bug reporter described their environment as:

OS: Ubuntu 14.04
Apache: 2.4.7
PGSQL: 9.3.14
PHP: 5.5.52
Mahara: 15.10.1

Mahara 15.10.1 predates the fix in 15.10.4, so that explains why they are still seeing this issue.

Changed in mahara:
milestone:	16.10.1 → none
status:	In Progress → Fix Released

Aaron Wells (u-aaronw) on 2016-11-06

summary:

- Two-part UTF-8 characters in a TinyMCE text field, turned into question
+ Multibyte UTF-8 characters in a TinyMCE text field, turned into question
marks by embedded image processing
