Selection works incorrectly with 4-byte Unicode characters

Bug #1422445 reported by grofaty on 2015-02-16
This bug affects 1 person
Affects: Pinta | Status: Confirmed | Importance: Medium | Assigned to: Unassigned

Bug Description

This bug is an extended test for reported bug 1361560. I intentionally reported a new bug in order to have in-depth Unicode (UTF-8) tests.

=======
THEORY
=======
First, a little bit of theory. In UTF-8, each character is encoded in one to four bytes.

English upper/lower case letters, punctuation (.,;!?), digits and similar characters are encoded with one byte. These are the first 128 code points (values 0 to 127) of the character table, and they are referred to as ASCII characters.

All other characters, such as those for non-English text (e.g. Cyrillic), are encoded with two, three or four bytes, and every byte of such a sequence has a value above 127.

Some scripts, like Cyrillic (e.g. Russian characters), are encoded in two bytes per character.

Some scripts (especially Asian ones like Chinese, Japanese, all sorts of Indian writing systems...) are encoded in three bytes per character.

Some characters (for example old scripts like Gothic, ancient Egyptian hieroglyphs and similar) are encoded in four bytes.
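The byte counts described above can be checked with a short script. This is only an illustrative sketch in Python (not part of Pinta); the helper name utf8_length and the sample dictionary are made up for this example, and the sample characters are taken from the tests in this report.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


# One sample character per script, taken from the tests in this report.
samples = {"English": "A", "Cyrillic": "Б", "Telugu": "ధ", "Gothic": "𐌸"}

# Map each script name to the UTF-8 length of its sample character.
byte_counts = {name: utf8_length(ch) for name, ch in samples.items()}

for name, count in byte_counts.items():
    print(f"{name}: {count} byte(s)")
```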

=======
TESTS
=======

For all tests below, do the following:
a) Click on the Text tool and then click on the canvas.
b) Paste the text shown after "Characters:".
c) Press and hold the Shift key, then press the Left arrow key to select text.
d) Release Shift and press the Home key to move the cursor to the far left.
e) Press and hold the Shift key, then press the Right arrow key to select text.

For actions c) and e), one press of the Left or Right key should select exactly one character.

Below are five tests. UTF-8 uses 1 to 4 bytes per character. I did one test for each byte length, plus one test with all of them combined.

Test 1: English (single byte character in UTF-8)
Characters: ABC
Result: OK

Test 2: Cyrillic (two byte characters in UTF-8)
Characters: АБВГ
Result: OK

Test 3: Telugu (three byte characters in UTF-8)
Characters: ధಷಹ
Result: OK

Test 4: Gothic (four byte characters in UTF-8)
Characters: 𐌸𐌼𐌰
Result: Not working correctly. Two presses of the Left key (step c) or the Right key (step e) are required to select a single character.

Test 5: All above characters in one text:
Characters: ABCАБВГధಷಹ𐌸𐌼𐌰
Result: Same as test 4. The 4-byte characters are not selected correctly; two Left/Right presses are needed to select a single character.

As the results show, 4-byte characters are not selected correctly. Because 4-byte characters are mostly used for old scripts, this bug does not need High priority; Low priority is most probably enough. But if you know where the problem may be, then this bug can be fixed.
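One possible explanation, which this report does not confirm: Pinta is a .NET application, and .NET strings are stored as UTF-16, where a character that needs four bytes in UTF-8 occupies two 16-bit code units (a surrogate pair). If the text tool moved the cursor by one code unit per key press, it would need two presses for each such character, which matches tests 4 and 5. The Python sketch below only illustrates that arithmetic; the function names are hypothetical and nothing here is Pinta code.

```python
def utf16_units(text: str) -> int:
    """Count the 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def presses_needed(text: str, step_by_code_unit: bool) -> int:
    """Arrow-key presses needed to select the whole text.

    step_by_code_unit=True models the suspected (unconfirmed) behaviour:
    one press moves over one UTF-16 code unit.
    step_by_code_unit=False models the expected behaviour:
    one press moves over one character (code point).
    """
    return utf16_units(text) if step_by_code_unit else len(text)


# Test strings copied from tests 2 and 4 of this report.
for label, text in [("Cyrillic", "АБВГ"), ("Gothic", "𐌸𐌼𐌰")]:
    expected = presses_needed(text, step_by_code_unit=False)
    suspected = presses_needed(text, step_by_code_unit=True)
    print(f"{label}: expected {expected} presses, suspected {suspected} presses")
```

In this model the Cyrillic string needs the same number of presses either way, while the Gothic string needs twice as many when stepping by code unit.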

grofaty (grofaty) wrote :

The tests above were performed with the latest zip file, version 1.6.0.58, on Ubuntu 14.04.

grofaty (grofaty) on 2015-02-16
description: updated
tags: added: text-tool
description: updated
Changed in pinta:
importance: Undecided → Medium
status: New → Confirmed