feature request/bug - upgrade to UTF8 needed

Bug #513702 reported by Teffania
This bug affects 1 person
Affects      Status         Importance   Assigned to     Milestone
Canon Lore   Fix Released   Low          Paul Harrison
Gratian      New            Low          Unassigned

Bug Description

*from old bugbase*
The name "Hróđny Aradóttir" cannot be reproduced by Gratian, as "đ" is not in the standard ANSI character set, only in UTF-8.

This is the only person listed in canonlore who is currently known to be affected by this limitation.

Workarounds tried: entering "Hró&#0111;ny Aradóttir?" produces the characters as seen here.

*end old report*

Update (Jan 2010):
Still needed. Still applies to only one person, or at least only one has reported it.

Paul Harrison (paul-francis-harrison) wrote :

The database tables store text in latin1, I think. MySQL may default to UTF-8 for communication. I don't know what the other parts of the system use.

At what point is the đ lost? Can you enter it in Gratian? When you navigate away from the person and then go back to them, is it still there? Is it there the next time you log into Gratian? What does it turn into in Gratian and CanonLore?

Teffania (teffania) wrote : Re: [Bug 513702] Re: feature request/bug - upgrade to UTF8 needed

On 16 February 2010 12:14, pfh <email address hidden> wrote:
> The database tables store text in latin1, I think. MySQL may default to
> UTF8 for communication. Don't know what other parts of the system use.
>
> At what point is the đ lost?: Can you enter it in Gratian? When you
> navigate away from the person then go back to them, is it still there?
> Is it there the next time you log into Gratian? What does it turn into
> in Gratian and CanonLore?

Forgot to check last night; will try to do so tonight.
The UTF-8 request is based on Bat saying this is what was needed to make this work.

And yes, I'm pretty sure the database is currently in something like UTF1 or
Latin 1.

Tiff

--
. ___
 {o,o} The blog you are not looking for
 |)__) is definitely not at
 -"-"- http://teffania.blogspot.com

Teffania (teffania) wrote :

When pasting "đ" into Gratian, it appears as "?". I see no point in further tests, as it seems to be lost at this stage (it can't be entered).

So I guess this bug might need moving to the Gratian bugs?

Changed in canonlore:
assignee: nobody → pfh (paul-francis-harrison)
Paul Harrison (paul-francis-harrison) wrote :

Looks like there are problems both in CanonLore and Gratian preventing this from working.

The problem in CanonLore is that the text in the database is stored in latin1; this would be relatively easy to fix. The PHP code in CanonLore gets results from MySQL in UTF-8, and for the most part treats them as opaque blobs of bytes, which is fine.

I don't know how hard it would be to fix Gratian, but it might well be a huge project.

latin1 should be pretty good for European names. If it's just one person, I would suggest not fixing this.
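As a rough illustration of what "latin1 should be pretty good" means in practice, the following minimal Python sketch (the function `fits_latin1` is hypothetical, not part of CanonLore) checks whether each name mentioned in this bug can be stored in a latin-1 column without loss.

```python
def fits_latin1(name: str) -> bool:
    """Return True if every character of `name` has a latin-1 (ISO-8859-1) byte."""
    try:
        name.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Names taken from this bug report.
    for name in ["Hróðny Aradóttir", "Hróđny Aradóttir", "Çinara Baraceco",
                 "Bjǫrn Svartsson", "Leylii bint Hızır"]:
        print(f"{name!r}: {'ok' if fits_latin1(name) else 'NOT representable'}")
```

Under this check, ð and Ç fit in latin-1, while đ, ǫ and ı do not.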

Eric TF Bat (bat-flurf) wrote :

The database is actually already stored in utf-8, not latin-1, so that's not a problem.

Delphi 6, which is the version Gratian is written in, doesn't have UTF-8 support at all, so that's the real trouble.
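Delphi 6 strings are byte strings in the Windows "ANSI" code page, so a character with no byte in that code page cannot be held in an ordinary string. A minimal sketch, assuming Gratian runs under Windows-1252 (the Western European ANSI code page; the actual code page is not stated in this thread), reproduces the "?" seen when pasting đ. The helper `to_ansi` is hypothetical, and Python's `errors="replace"` only approximates what Windows itself does.

```python
def to_ansi(text: str, codepage: str = "cp1252") -> bytes:
    """Convert text to a single-byte ANSI code page, as a non-Unicode program sees it.

    Characters with no byte in `codepage` become b"?" (Python's errors="replace").
    The cp1252 default is an assumption about Gratian's environment.
    """
    return text.encode(codepage, errors="replace")


if __name__ == "__main__":
    print(to_ansi("Hróđny Aradóttir"))   # the đ has no cp1252 byte
    print(to_ansi("Leylii bint Hızır"))  # neither does the dotless ı
```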

I probably need to adjust it to allow the entry of HTML entities (in this case &#x0111;, i.e. ampersand-hash-x-zero-one-one-one-semicolon), with perhaps a lookup table indicating which ones aren't editable in Gratian directly.
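A minimal sketch of the entity idea, assuming the same Windows-1252 code page: characters the code page can hold are kept as-is, and everything else is written as a hexadecimal numeric character reference such as &#x0111;. The helper `to_entities` is hypothetical; Python's standard `html.unescape` turns the references back into characters.

```python
import html


def to_entities(text: str, codepage: str = "cp1252") -> str:
    """Replace characters that `codepage` cannot encode with &#xHHHH; references.

    The cp1252 default is an assumption; the helper name is hypothetical.
    """
    out = []
    for ch in text:
        try:
            ch.encode(codepage)
            out.append(ch)
        except UnicodeEncodeError:
            # Hex form padded to four digits, matching &#x0111; above.
            out.append(f"&#x{ord(ch):04x};")
    return "".join(out)


if __name__ == "__main__":
    stored = to_entities("Hróđny Aradóttir")
    print(stored)                 # Hró&#x0111;ny Aradóttir
    print(html.unescape(stored))  # Hróđny Aradóttir
```

This sketch does not escape a literal "&" already present in a name, so a real implementation would need to decide how to tell such text apart from an entity.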

Teffania (teffania) wrote :

Perhaps I should ask on shambles whether anyone else is affected (not so
obviously, but ask them whether their special characters are entered in
canonlore).

I know it manages most special characters, and yet a lot of names are
sent to me without special characters, just their nearest plain text
equivalent.

So if I do that and don't manage to add anyone extra to the list of
people affected, then perhaps we consider it a very low-priority issue?

I'm not willing to send out an email to the list this week, because
every time I ask I get a flood of fix-me messages and my inbox is
still full. But let's schedule me to send out an email in a fortnight,
and then revisit this bug's status.

Teffania

Teffania (teffania) wrote :

I did ask on shambles and got half a dozen responses, but no others that I couldn't enter. And Snorri helpfully suggested that Hróđny actually should be using ð (which is supported) instead of đ (which isn't). I need to follow that up, but at least she'll have, at worst, a close approximation.

I now have a new instance of a character which I presume is supported by Gratian and not by CanonLore.
Person #2953 "Cinara Baraceco" should actually be written "Çinara Baraceco". Gratian is happy to represent this special character.
Uploading changes with this special character causes ALL person records to display the message:
"Unknown initial "�" (0xc7) in name "�inara Baraceco". Died"
(as a plain HTML page with only that text in the first line)

For now, this person's name has been reverted to the plain-text equivalent. Uploading the plain-text version of the name restored CanonLore's usual functionality.
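The byte value in that error message can be checked directly. In latin-1, Ç is the single byte 0xC7; in UTF-8 it is the two bytes 0xC3 0x87. A lone 0xC7 is not valid UTF-8, so a page served as UTF-8 shows the replacement character U+FFFD (�) instead. This minimal Python check only illustrates the byte values; which part of CanonLore handled the name as latin-1 bytes is not established here.

```python
# Byte-level view of "Ç" (U+00C7) in the two encodings discussed in this bug.
latin1_bytes = "Ç".encode("latin-1")  # one byte
utf8_bytes = "Ç".encode("utf-8")      # two bytes

print(latin1_bytes.hex())  # c7
print(utf8_bytes.hex())    # c387

# Decoding the latin-1 byte as if it were UTF-8 fails; errors="replace" yields U+FFFD,
# the same replacement character a browser shows for the invalid byte.
shown = latin1_bytes.decode("utf-8", errors="replace")
print(shown == "\ufffd")  # True
```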

Teffania (teffania) wrote :

(Obviously, this behaviour is rather more annoying than just not being able to enter a character, if anyone else reproduces it accidentally.)

Paul Harrison (paul-francis-harrison) wrote :

Ç is causing a slightly different problem from đ. Ç is a latin-1 character and can be stored in the database, etc.; it's just the code for grouping by initial that is causing problems. This is relatively easy to fix.
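A minimal sketch of Unicode-aware grouping by initial, assuming the goal is to file Çinara under C. The functions `initial_of` and `group_by_initial` are hypothetical and are not CanonLore's actual (PHP) code. The first character is decomposed with NFD so that Ç becomes C plus a combining cedilla, and names with no A to Z base letter fall into a "#" group instead of stopping the page with an error.

```python
from __future__ import annotations

import unicodedata
from collections import defaultdict


def initial_of(name: str) -> str:
    """Return the A-Z letter a name is filed under, or "#" if there is none.

    NFD splits "Ç" into "C" plus a combining cedilla, so the base letter is kept.
    """
    if not name:
        return "#"
    base = unicodedata.normalize("NFD", name[0])[0].upper()
    return base if "A" <= base <= "Z" else "#"


def group_by_initial(names: list[str]) -> dict[str, list[str]]:
    """Group names by initial_of(), preserving input order within each group."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[initial_of(name)].append(name)
    return dict(groups)


if __name__ == "__main__":
    print(group_by_initial(["Çinara Baraceco", "Cinara Baraceco", "Bjǫrn Svartsson"]))
```

Æ has no canonical decomposition, so names starting with it land in "#" under this sketch; filing them under A would need an explicit mapping.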

Teffania (teffania) wrote :

Ah, so it wouldn't be a problem if it wasn't at the start of the name?

If it's easy to fix, please let me bribe you to do so.

T

On 6 May 2010 12:49, Paul Harrison <email address hidden> wrote:
> Ç is causing a slightly different problem to đ. Ç is a latin-1
> character, can be stored in the database, etc, it's just the code for
> grouping by initial that is causing problems. This is relatively easy to
> fix.

Paul Harrison (paul-francis-harrison) wrote :

Committed a fix for Ç; it also fixes the sorting order of names. Now to work out how to upload it to sca.org.au...

Teffania (teffania) wrote :

So yesterday it was happily displaying Cinara.

Today I uploaded following the new Gratian release, and it's back to Ç crashing the whole thing.

Teffania (teffania) wrote :

Oh, and I found a character we need that Gratian doesn't have: the ogonek, specifically the one on o (ǫ).
http://en.wikipedia.org/wiki/Ogonek

I found two people, Bjorn and another, who actually have this in their registered names. They haven't complained; I just noticed it in the registrations.

A suggestion: this document probably describes a reasonable limit on registerable characters:
http://www.scadian.net/heraldry/daud.html

Teffania (teffania) wrote :

A better version of Daud Text: http://heraldry.sca.org/laurel/daud_notation.pdf

A new problem character: the Turkish dotless i. http://en.wikipedia.org/wiki/Turkish_dotted_and_dotless_I
As displayed in our current queen's byname: Leylii bint Hızır

Can't paste the ı into Gratian; it pastes as "?".
Pasting &#x0131; (Unicode U+0131) in place of the character works in CanonLore!

Bjǫrn Svartsson also works when using Bj&#491;rn Svartsson.
Perhaps I got something wrong last time I attempted this, or maybe some of the fixes have made things better.

We have a workable workaround. But the next canon is going to wonder what on earth &#491; means.
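Decoding the references explicitly also shows why decimal and hexadecimal forms are easy to confuse: &#0111; is decimal 111, plain "o" (so "Hró&#0111;ny" renders as "Hróony"), while &#x0111; is hexadecimal 0x111, which is đ. The hypothetical helper `explain_reference` below uses only Python's standard `html` and `unicodedata` modules to print the code point and official Unicode name of a reference, which might help a future canon work out what &#491; means.

```python
import html
import unicodedata


def explain_reference(ref: str) -> str:
    """Decode one numeric character reference and describe the character it stands for."""
    ch = html.unescape(ref)
    if len(ch) != 1:
        raise ValueError(f"not a single-character reference: {ref!r}")
    name = unicodedata.name(ch, "<unnamed>")
    return f"{ref} = U+{ord(ch):04X} {ch} {name}"


if __name__ == "__main__":
    # Decimal 111 versus hex 0x111, plus the two workarounds used in this bug.
    for ref in ["&#0111;", "&#x0111;", "&#491;", "&#x0131;"]:
        print(explain_reference(ref))
```

For example, &#491; comes out as "U+01EB ǫ LATIN SMALL LETTER O WITH OGONEK".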

Teffania (teffania) wrote :

Also, the ǫ in the CanonLore font is pretty disproportionate (not like in this font). Not elegant, but a small price to pay to have it display at all.

Teffania (teffania) wrote :

Importance of the upgrade is low now that we have a workaround.

Importance of getting Paul's great programming working again: high!

Changed in canonlore:
importance: Undecided → Low
Paul Harrison (paul-francis-harrison) wrote :

When you say it's back to Ç crashing the whole thing, what exactly happened?

Teffania (teffania) wrote :

It returned to the behaviour experienced before you applied your fix.
I presume this was related to Bat's comments regarding code branches
that need to be integrated, and means your code branch might have been
put aside until integration. I'm hoping Bat will answer whether this is
true or not.

Paul Harrison (paul-francis-harrison) wrote :

My changes are still there on sca.org.au (it's still sorting correctly). I don't see any recent commits to branches other than the main branch (lp:~gratian-team/canonlore/trunk). ...ah... Could you have been looking at your local copy of the website?

Teffania (teffania) wrote :

I could have been looking at localhost, but I think I'm normally
smarter than that.

Testing the phenomenon again...

All working today. Also good on localhost.

Either someone was fiddling behind our backs or something weird was
happening. So long as it stays good from now on, I'm happy.

Teffania (teffania) wrote :

So, bug status:

1) Problems with Çinara Baraceco in CanonLore: ALL FIXED!

2) Problems with displaying stranger special characters: clumsy workaround found; a proper fix is a long-term wish. The issue now probably exists only in Gratian, not CanonLore.

Can someone assign this to a Gratian bug? I can't work out how, or don't have enough authority.

Teffania (teffania) wrote :

OK, I found how to assign it to Gratian.

Paul, you can mark this as fixed in CanonLore if you think it'll display anything Gratian accepts.

Changed in gratian:
importance: Undecided → Low
Changed in canonlore:
status: New → Fix Released
Teffania (teffania) wrote :

A few special characters are not displaying correctly on my work copy of IE6 (yes, IE6, sigh).

Specifically, the ones mentioned above as not ANSI/Delphi/Gratian compliant:
Bjorn Svartsson http://www.sca.org.au/canon/person.php?id=2371
Displays a box instead of the ogonek (ǫ) character.

but Leylii is displaying correctly:
http://www.sca.org.au/canon/person.php?id=2921

and a few others that really are illogical:
Tristana RAEvenlock won't display:
http://www.sca.org.au/canon/person.php?id=2131
(displays a ? instead of the AE ligature)
but AEdward will:
http://www.sca.org.au/canon/person.php?id=773
as do the other half dozen people with names starting with AE
and AliAEka, with it in the middle of her name:
http://www.sca.org.au/canon/person.php?id=2992
or the AE test case:
http://www.sca.org.au/canon/person.php?id=2994

also not displaying:
Martin of Lyos
http://www.sca.org.au/canon/person.php?id=202
(displays a ? instead of an f; well, I presume it should be an f)

Teffania to confirm that this occurs on other browsers when she gets home, and that the names are entered into Gratian correctly. If this only affects IE6, that is not preferable, but IE6 is obsolete. If special characters have been lost in some upgrade, that is more problematic.

Paul Harrison (paul-francis-harrison) wrote :

Tristina also does not display on Firefox 3.6.10 ( http://www.sca.org.au/canon/person.php?id=2131 ). Try editing her entry, deleting the odd character and entering it again.

Might it be "Martin o'Lyos"? Microsoft Windows has a habit of using a non-standard apostrophe sometimes, might not have survived being translated through latin-1 or something.
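Paul's apostrophe theory can be checked at the byte level. This sketch assumes the name was typed as "Martin o’Lyos" with U+2019 RIGHT SINGLE QUOTATION MARK (the spelling is only Paul's guess). Windows-1252 stores that character as byte 0x92, but latin-1 has no such character (0x92 is a C1 control code there), so a lossy conversion to latin-1 produces "?".

```python
name = "Martin o\u2019Lyos"  # guessed spelling with a typographic apostrophe (U+2019)

# Windows-1252 (the usual Western European Windows code page) has a byte for it...
print(name.encode("cp1252"))  # b'Martin o\x92Lyos'

# ...but latin-1 does not, so a lossy conversion substitutes "?".
print(name.encode("latin-1", errors="replace"))  # b'Martin o?Lyos'

# Reading the cp1252 byte back as latin-1 gives an invisible control character, not ’.
print(repr(b"\x92".decode("latin-1")))  # '\x92'
```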

I propose not fixing character display issues specific to IE6.

Teffania (teffania) wrote :

On 11 October 2010 15:52, Paul Harrison <email address hidden> wrote:
> Might it be "Martin o'Lyos"? Microsoft Windows has a habit of using a
> non-standard apostrophe sometimes, might not have survived being
> translated through latin-1 or something.

Hmm, it might be. Nice lateral thinking.
Checking what gratian says on these records requires me to be home,
where the browser isn't IE6, so this is for the moment a note for me
to check.

> I propose not fixing character display issues specific to IE6.

That was what I was kinda saying without expressing myself well: if it
affected other browsers it might be an issue, but since IE6 is obsolete,
any problems that occur only in IE6 should be noted for posterity (and in
case other browsers are later found to have the issue) and not fixed,
given the availability of programmers.
