Ubuntu
openoffice.org package

Unicode Phoenician block, 1090X, not displayed in correct direction

Bug #459991 reported by Phil Stone on 2009-10-24

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
openoffice.org (Ubuntu)	Expired	Undecided	Unassigned

Bug Description

Binary package hint: openoffice.org

Ubuntu 9.10 added a system font for Phoenician, so the glyphs themselves now display correctly.

OpenOffice can display the characters, but it should lay them out like Hebrew, with the characters advancing right-to-left. This is not happening, and that is a bug.

Note that the right-to-left paragraph icons do allow whole paragraphs of Phoenician to be set right-to-left, but, as with Hebrew, the characters of the individual words should advance right-to-left on their own.
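For anyone trying to reproduce this, here is a minimal sketch of how to check which direction a character is expected to run in. It assumes Python 3 and its standard `unicodedata` module, and the helper name `bidi_classes` is made up for illustration. It only reports the bidirectional class recorded in the Unicode character database; it says nothing about how OpenOffice actually lays the text out.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


hebrew_alef = "\u05d0"         # HEBREW LETTER ALEF
phoenician_alf = "\U00010900"  # first code point of the Phoenician block

# Both letters are expected to report the same right-to-left class.
print(bidi_classes(hebrew_alef + phoenician_alf))
```

If both letters report the same class, a layout engine that handles Hebrew correctly has the information it needs to treat Phoenician the same way.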

Phil

ProblemType: Bug
Architecture: i386
Date: Sat Oct 24 12:56:25 2009
DistroRelease: Ubuntu 9.10
Package: openoffice.org-core 1:3.1.1-5ubuntu1
ProcEnviron:
 LANGUAGE=
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-14.48-generic
SourcePackage: openoffice.org
Uname: Linux 2.6.31-14-generic i686
XsessionErrors: (<unknown>:3577): Gdk-CRITICAL **: gdk_window_get_origin: assertion `GDK_IS_WINDOW (window)' failed

Phil Stone (philstone) wrote on 2009-10-24:

Dependencies.txt (3.6 KiB, text/plain; charset="utf-8")

Chris Cheney (ccheney) wrote on 2009-10-28:

Did you actually put the document into that language setting? If not it is probably set to your default language of US English...

Changed in openoffice.org (Ubuntu):
status:	New → Incomplete

Chris Cheney (ccheney) on 2010-05-13

tags:	added: karmic

Chris Cheney (ccheney) wrote on 2010-05-14:

We're closing this bug since it has been some time with no response from the original reporter. However, if the issue still exists, please feel free to reopen it with the requested information. Also, if you could, please test against the latest development version of Ubuntu, since this confirms the bug is one we may be able to pass upstream for help.

Changed in openoffice.org (Ubuntu):
status:	Incomplete → Expired

Phil Stone (philstone) wrote on 2010-05-14:

Let me give a few closing comments for anyone that may find this bug/thread in the future and wonder what happened.

This was one of several bugs I found while investigating the Phoenician Unicode block. I was setting out to typeset a Bible using Phoenician, since this was the alphabet the Bible was originally written in. I also needed a complete tool chain for dealing with this alphabet. I eventually did get that done; the link is below. If anyone reading this in the future doesn't have a Launchpad account and needs the Phoenician resources I mention below, please use the contact links from the following website/page.

http://www.bibletimepress.com/bibles

This OpenOffice bug was by far the smallest of the bugs I found, though it was an early one, since OpenOffice made it easy to test parts of the needed tool chain.

It turns out most Java apps cannot handle it either, the key one being Eclipse, because apparently nobody respects the surrogate pairs that are used for the values in this code block. Surrogate pairs were added after the original Java language specification was written. Remember that Phoenician is 1090X, above the 16-bit range, so in UTF-16 each character is stored as a pair of 16-bit surrogates. Eclipse was full of bugs related to syntax highlighting and editing whenever any Unicode value needing a surrogate pair was entered. Once these values made it onto a line in a file edited by Eclipse, the line could no longer be safely edited.
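To make the surrogate-pair point concrete, here is a small sketch in Python 3 (not Java, and not code from Eclipse). The function names `utf16_code_units` and `to_surrogate_pair` are hypothetical, chosen for illustration. It shows that a single Phoenician letter is one code point but two 16-bit units in UTF-16, which is the representation Java strings use; code that assumes one unit per character will miscount or split the pair.

```python
def utf16_code_units(text):
    """Return the UTF-16 code units (as ints) that encode text."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into its (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000  # 20-bit offset into the supplementary planes
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


alf = "\U00010900"  # first code point of the Phoenician block
units = utf16_code_units(alf)

# One code point, but two UTF-16 code units.
print(len(alf), len(units), [hex(u) for u in units])
```

An editor that moves the cursor or applies highlighting one 16-bit unit at a time can land between the two halves, which matches the kind of line corruption described above.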

I opened a bug there too, and over the months learned a lot. The problem is so pervasive that the Eclipse guys seem to think it will never be solved. I would add that it will never be solved in Java apps because it is not in the control of the Java team to fix: they've set surrogate pair standards that nobody follows in practice. Early Java language educational resources get this wrong, and so do most Java programmers.

I also found that the use of these values in web browsers is not supported well enough for any practical use, especially with server-side fonts. The MS .eot file format does not handle them, at least not when using open source .ttf to .eot conversion tools. It may be that Windows cannot handle anything more than 16 bit Unicode, though I don't know for sure. The system for displaying the Unicode value in a box for missing characters does not work above 16 bits.

The default font used for Phoenician in the Unicode standard, and thus in Ubuntu, is based on the last known historical inscription, from about 318 AD. That is probably the worst choice that could have been made, since the script had already been in use for some 1800 years before that with a very different, much better, and much more common letter form.

Kate, the Kubuntu text editor, could not handle these Unicode values either.

LaTeX, the typesetting program, was also unable to handle this range well, though with some unusual, pre-alpha macro packages designed primarily for typesetting the Koran, it came close.

I also found that this particular code block is missing several very important values, including the most important inter-word separator, so the block itself is defective. Since they only assigned 5 bits, or 32 possible values, there isn't room to fix that and also include the missing vowels.

I also found that this alphabet was originally bi-directional, boustrophedon. There is essentially no support, anywhere, for that. But that also suggested that many problems would be solved by using it left-to-right in most cases. This turned out to solve a bunch of problems, and it also made the language easier to learn.

My fix was to rebuild the Phoenician code page at 0xEF00, within the 16 bit Unicode private use area, left-to-right, with better choices of letter placements, including the inter-word space character and Phoenician vowels.
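As a rough sketch of the idea only: the actual layout at 0xEF00 uses its own letter placements and adds characters, which are not listed here. The Python 3 code below therefore assumes a plain offset mapping from the standard block, and every name in it (`PUA_BASE`, `to_private_use`, `from_private_use`) is hypothetical. It shows why the move helps: every remapped character fits in a single 16-bit unit, so no surrogate pairs are involved.

```python
PHOENICIAN_START = 0x10900  # first code point of the standard block
PHOENICIAN_END = 0x1091F    # last code point of the standard block
PUA_BASE = 0xEF00           # base named above; the offset mapping is an assumption

# Assumed one-to-one offset mapping; the real private layout differs.
TO_PUA = {
    cp: PUA_BASE + (cp - PHOENICIAN_START)
    for cp in range(PHOENICIAN_START, PHOENICIAN_END + 1)
}
FROM_PUA = {pua: cp for cp, pua in TO_PUA.items()}


def to_private_use(text):
    """Move standard Phoenician code points into the private use area."""
    return text.translate(TO_PUA)


def from_private_use(text):
    """Map the private use code points back to the standard block."""
    return text.translate(FROM_PUA)


sample = "\U00010900\U00010901 abc"
remapped = to_private_use(sample)
print([hex(ord(ch)) for ch in remapped])
```

Text converted this way is only meaningful with a font that places Phoenician glyphs at the same private use code points, and a left-to-right layout needs the words to be stored in visual order as well.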

I have the X keyboard files needed for this block, plus various fonts and related tools, should someone need to type or display these characters. The keyboard layout is designed for English language touch typists and can be learned in an hour.

OpenOffice does work fine with this solution, as do Eclipse, LaTeX (XeTeX), Kate, and even KMail. The .ttf to .eot format converters also work, and so do all the browsers that someone might be using. (Though it is still not easy to style.)

This solution risks collisions with other private use area code blocks.  That has not been a problem in practice.

Thanks to everyone who looked at this hard problem.  It was the tip of an iceberg.

Phil

Chris Cheney (ccheney) wrote on 2010-05-14:

Phil,

Thanks for the information. I am not certain, but perhaps the Graphite library is working towards supporting these types of languages?

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&cat_id=RenderingGraphite

If so then it is supported in OpenOffice.org but I doubt there is font support for the language you are interested in yet.

Chris
