[136 new] Expansion: (ả, ẻ, ủ, ẽ, ớ, ờ) for Vietnamese in Latin Extended Additional

Bug #656690 reported by Lê Hoàng Phương
This bug affects 66 people
Affects: Ubuntu Font Family
Importance: Medium
Assigned to: Unassigned

Bug Description

The Vietnamese alphabet is nearly the same as the Latin alphabet, but Vietnamese has tone marks such as the grave accent (à, è, ù, ...), acute accent (á, é, ú, ...), falling accent (ả, ẻ, ủ, ...), rising accent (ã, ẽ, ũ, ...), and drop tone (ạ, ẹ, ụ, ...), as well as some special letters like ă, â, ơ, ư, ... These letters are not displayed correctly at present, especially the special letters with tone marks. If the font size is set smaller than 8 pt, these letters remain large while the other Latin-only letters are small. On the other hand, the larger the font size, the more incorrectly the special letters with tone marks are displayed.
(Sorry for my bad English.)

Paul Sladen (sladen)
tags: added: uff-latin uff-latin-extended-additional uff-vietnamese
Revision history for this message
Paul Sladen (sladen) wrote :

Hello Phương, your English is wonderful! The characters (ả, ẻ, ủ, ẽ, ớ, ờ, ...) that you have mentioned as rendering differently are being drawn from another font. These characters are in the "Latin Extended Additional" block, which the Ubuntu Font Family does not currently support; you can hopefully find all of the characters in the following PDF:

  http://unicode.org/charts/PDF/U1E00.pdf

Currently the "Latin Extended Additional" block is scheduled for doing later, and probably for expansion by the community. At the moment the focus is on trying to get Monospace, Arabic and Hebrew in progress. (Making a font takes a *very* long time, and it is not possible to do everything and all languages at once. Instead we need to do one language/script at a time).

Please could you ask other people to vote on this bug report (click "Affects me too...") if they are also Vietnamese users: this will help to judge which scripts and languages to focus on, and in which order. In the meantime I've milestoned this against the "latin-extended-additional" milestone, which you should be able to keep track of.

It would be very useful if you could give a full (and complete) list of *all* of the characters (with *all* their accents) that are needed to get Vietnamese fully covered, when the time comes.

Changed in ubuntu-font-family:
importance: Undecided → Wishlist
milestone: none → latin-e-a
status: New → Confirmed
summary: - Expansion: Ubuntu font should support more in Vietnamese
+ Expansion: (ả, ẻ, ủ, ẽ, ớ, ờ) for Vietnamese in Latin Extended
+ Additional
Revision history for this message
Lê Hoàng Phương (herophuong93) wrote : Re: Expansion: (ả, ẻ, ủ, ẽ, ớ, ờ) for Vietnamese in Latin Extended Additional

I have attached the Vietnamese Unicode character chart here.
@Paul Sladen: Thanks for your reply!

Changed in ubuntu-font-family:
assignee: nobody → thesun_tuan (hoang-tuan1600)
Revision history for this message
Paul Sladen (sladen) wrote :

Phương, thanks, that is an excellent PDF (even if copied from somewhere else, in which case we should probably just link to it).

Looking at the number of glyphs in each category, we should already have everything in the "Latin-1 Supplement" and "Latin Extended B" blocks. But the majority needed for Vietnamese (90 characters!) are in "Latin Extended Additional", which is a fairly large chunk of the whole of Extended Additional (256 glyphs).

I'm wondering whether (when we get to doing Vietnamese) we should just try and get the whole of that block filled out at once, instead of just little bits.

Changed in ubuntu-font-family:
assignee: thesun_tuan (hoang-tuan1600) → nobody
Revision history for this message
Lê Hoàng Phương (herophuong93) wrote :

I think the Latin Extended Additional block is large (maybe too large). If we focus on one language instead (maybe Hebrew, Arabic, or Vietnamese, of course), it will be more effective, because end-users will see the changes. Who needs the other characters in that charset, which they have never seen before?

Revision history for this message
Bruno Maag (bruno-daltonmaag) wrote :

When extending the fonts it's important to do this according to set and agreed standards, and Unicode is definitely the way forward. All future development for the Ubuntu fonts needs to be done according to Unicode blocks, not by drip-feeding glyphs, which is essentially patch-work and keeps the fonts in a perpetual beta cycle.

It is important to start thinking of the Ubuntu font suite as a product that adheres to specifications and fulfills expectations by users. Remember that these fonts will be used outside the Ubuntu community who use the fonts precisely because they support Unicode blocks, are hinted and have great design quality.

It's also important to remember that so far the Ubuntu community has only experienced the four basic fonts that are shipped today with 10.10. Altogether there will be 13 different font styles, all of which eventually have to have matching character sets. It's a huge piece of work that needs careful co-ordinating and quality control to make this one of the most successful and comprehensive font products in modern times.

Revision history for this message
Paul Sladen (sladen) wrote :

The subsetting code used for the Google Font Directory performs the Vietnamese subsetting by pulling in the range U+1EA0..U+1EF2 (82 codepoints, since Python's range() excludes the end point U+1EF2), plus U+20AB from the currency block:

  http://code.google.com/p/googlefontdirectory/source/browse/tools/subset/subset.py

  Lines:
  181 if 'vietnamese' in subset:
  182 result += range(0x1ea0, 0x1ef2) + [0x20ab]

Revision history for this message
Paul Sladen (sladen) wrote :

Phương: the discrepancy between the PDF and the Python subsetting code is these eight glyphs:

  Ỳ ỳ Ỵ ỵ Ỷ ỷ Ỹ ỹ

Are these eight definitely used in Vietnamese?

Revision history for this message
Lê Hoàng Phương (herophuong93) wrote :

Only the second, the fourth, the sixth and the last are used in Vietnamese. Thanks for responding to me.

Revision history for this message
Lê Hoàng Phương (herophuong93) wrote :

I am so sorry about the above reply. I forgot to pay attention to the serif font that the website displays. All eight of these are used in Vietnamese. The first, third, fifth and seventh are upper-case letters. The others are lower-case letters.
Sorry again if that confused you.
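
The eight-glyph discrepancy discussed above can be reproduced with a short sketch. It assumes that the quoted range(0x1ea0, 0x1ef2) is the whole Vietnamese subset apart from U+20AB, and that the Vietnamese letters in this block run contiguously from U+1EA0 to U+1EF9. The variable names are made up for illustration and are not part of subset.py.

```python
import unicodedata

# Half-open range exactly as written in the quoted subset.py lines:
# the end point 0x1ef2 itself is NOT included.
subset_range = list(range(0x1EA0, 0x1EF2))

# Assumption: the Vietnamese letters in Latin Extended Additional are
# the contiguous run U+1EA0..U+1EF9, end point included.
full_range = list(range(0x1EA0, 0x1EF9 + 1))

# Code points present in the assumed full range but absent from the subset.
missing = [cp for cp in full_range if cp not in subset_range]

print(len(subset_range), len(full_range), len(missing))
for cp in missing:
    print(f"U+{cp:04X} {chr(cp)} {unicodedata.name(chr(cp))}")
```

Because Python's range stops one short of its end point, the quoted range ends at U+1EF1. Under the assumptions above, ending the range at 0x1efa instead would cover all 90 letters.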

Revision history for this message
Matt Sturgeon (mattsturgeon) wrote :

Wouldn't most of those characters simply be a matter of merging already implemented letters and diacritics?

Revision history for this message
Paul Sladen (sladen) wrote :

Yes, there are 90 of them and they are all pre-composed diacritical compound arrangements of existing glyphs and accents. Vietnamese is quite an extensively used written language and something that I hope we can support as soon as possible once the initial focus around Latin A+B/Greek/Cyrillic/Arabic/Hebrew is finished.
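
The "pre-composed" point can be checked against the Unicode character database with a minimal sketch. It uses Python's standard unicodedata module and assumes the 90 letters are the run U+1EA0..U+1EF9; the helper name decompose is hypothetical. It only shows that each code point decomposes into a base letter plus combining marks, and says nothing about how the glyphs themselves should be drawn or positioned.

```python
import unicodedata

def decompose(ch):
    """Split a character into its base letter and the names of its combining marks."""
    nfd = unicodedata.normalize("NFD", ch)  # full canonical decomposition
    base, marks = nfd[0], nfd[1:]
    return base, [unicodedata.name(m) for m in marks]

# Assumption: the Vietnamese letters are the run U+1EA0..U+1EF9 inclusive.
vietnamese = [chr(cp) for cp in range(0x1EA0, 0x1EF9 + 1)]

# Collect every base letter and combining mark that the 90 letters use.
bases = set()
marks_used = set()
for ch in vietnamese:
    nfd = unicodedata.normalize("NFD", ch)
    bases.add(nfd[0])
    marks_used.update(nfd[1:])

print("base letters:", "".join(sorted(bases)))
print("combining marks:", sorted(f"U+{ord(m):04X}" for m in marks_used))
print(decompose(chr(0x1EDB)))  # U+1EDB is the letter o with horn and acute
```

Every letter in the assumed range reduces to a plain Latin vowel or y plus one or two combining marks, which is what makes building them from existing outlines plausible.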

Revision history for this message
Matt Sturgeon (mattsturgeon) wrote :

It would be nice if there was a bazaar branch for people to contribute these simple patches...

Revision history for this message
Paul Sladen (sladen) wrote :

Matt: it would indeed. The problem is the same as a "quick fix" to a PNG or JPEG image. At the moment Bzr has no understanding of structured binary files and is only able to do a wholesale replacement of one blob with the next.

At the moment a manual merge is necessary; but if you can provide a test-case we can look into developing the possibility of creating tools/filters in order to handle .ttf files at a finer granularity.

Revision history for this message
Frédéric Grosshans (fgrosshans) wrote :

Note that the diacritical stacking of Vietnamese is not the standard one, especially in cases involving the circumflex accent and a grave/acute accent, where the accents are kerned side by side. The y with dot below (ỵ) is also preferred with the dot on the right side. This behaviour might contradict the expectations of some African languages, which also need diacritic stacking (see e.g. http://www.openroad.net.au/languages/african/ife-2.html).

Some URLs about that problem:
* http://<email address hidden>/msg14279.html
* http://<email address hidden>/msg14280.html
* http://www.riverland.net.au/~clytie/viet/fonts.html
* There is an allusion to that in the Unicode standard (p257/19 below fig5-7, http://www.unicode.org/versions/Unicode6.0.0/ch05.pdf)

Paul Sladen (sladen)
tags: added: uff-later
Revision history for this message
Hồng Quân (ng-hong-quan) wrote :

Please support soon, because Vietnamese is among the top of "most used languages" in Wikipedia (Group 100 000+).

Revision history for this message
Nguyễn Hải Nam (jcisio) wrote :

I want to add to #14 that some fonts do not place the accents correctly. This paper http://unicode.org/charts/PDF/U1E00.pdf (page 2) does not show them correctly (it shows http://dev.jcisio.com/snap/20120602074248.png instead of the correct http://dev.jcisio.com/snap/20120602074323.png: side by side, but with the circumflex accent first).

summary: - Expansion: (ả, ẻ, ủ, ẽ, ớ, ờ) for Vietnamese in Latin Extended
+ [136 new] Expansion: (ả, ẻ, ủ, ẽ, ớ, ờ) for Vietnamese in Latin Extended
Additional
Changed in ubuntu-font-family:
status: Confirmed → Triaged
Changed in ubuntu-font-family:
importance: Wishlist → Medium
tags: added: uff-dm-new
Revision history for this message
Adolfo Jayme (fitojb) wrote :

Implementing this bug will, as a side effect, fix most of bug 1688764 (incomplete Guarani support).

Revision history for this message
Le Minh Hao (anka97) wrote :

I have wished for a long time that the Ubuntu Font Family supported Latin Extended Additional characters.
Please!
