Cuneiform for Linux

Bug #388926
Comment #5


Ben Jackson (ben.jackson) wrote on 2009-07-06:

Attachment: tells spelart about a new letter transpose pair for Lithuanian (140 bytes, chemical/x-mopac-input)

Assuming there is a Lithuanian dictionary (or you create a 'user dictionary', which I am almost done adding support for), I believe the key to making this work is a suitable datafiles/rec9lit.dat entry that tells spelart.c that "ų" and "ę" are sometimes confused for each other. This is the same list that knows that 'rn' looks like 'm' and 'vv' looks like 'w'. At the moment rec9lit.dat is just a copy of the default English one.
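To make the idea of a confusion list concrete, here is a minimal sketch in Python (not the C code in spelart.c, and not the on-disk format of rec9lit.dat, which is not shown here). The pair table and the `variants` helper are hypothetical names used only for illustration; the sketch applies one substitution at a time, in either direction.

```python
# Hypothetical confusion pairs, mirroring the examples in this comment.
# This is NOT the rec9lit.dat format, only an illustration of the idea.
CONFUSION_PAIRS = [("rn", "m"), ("vv", "w"), ("ų", "ę")]


def variants(word, pairs=CONFUSION_PAIRS):
    """Return the word plus every spelling reachable by one substitution."""
    found = {word}
    for a, b in pairs:
        # A confusion can go either way: 'rn' read as 'm', or 'm' read as 'rn'.
        for src, dst in ((a, b), (b, a)):
            start = word.find(src)
            while start != -1:
                found.add(word[:start] + dst + word[start + len(src):])
                start = word.find(src, start + 1)
    return found
```

With this table, `variants("modem")` includes "modern", and `variants("vaikų")` yields both "vaikų" and "vaikę". Each candidate would then be checked against the dictionary.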

Here are some notes as I investigate.

0) I don't understand the source image. It seems to be a screenshot of a web browser showing the *bad* output. It's not the thing to OCR, is it?

1) rec6lit.dat defines the Lithuanian alphabet (essentially the char-to-BYTE mapping). The 6 marks alphabet files, and lit is the abbreviation for Lithuanian.

2) Based on the contents of rec6lit.dat and *no* knowledge of Lithuanian at all, my conclusion is that the charset of that file is cp1257. That's consistent with mentions of 1257 in the code. (This picture was useful: http://www.borgendale.com/codepage/cp1257.gif )

3) ...in fact, all of the internal BYTE string representations seem to be cp1257.

4) There's a bug in InitializeAlphabet where it uses a global instead of the passed-in arg, which was breaking my dictionary builder! It does not need to be fixed directly for this problem, though.
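As a quick way to see what "the char to BYTE mapping" means for cp1257, the following Python sketch uses the standard library's `cp1257` codec to show that each Lithuanian letter fits in a single byte. It does not read rec6lit.dat; the `LETTERS` string and the `to_cp1257_byte` helper are assumptions made for the illustration.

```python
# Lithuanian letters outside ASCII (lowercase); chosen here for illustration.
LETTERS = "ąčęėįšųūž"


def to_cp1257_byte(ch):
    """Encode one character with Python's cp1257 codec and return its byte value."""
    encoded = ch.encode("cp1257")
    if len(encoded) != 1:
        raise ValueError(f"{ch!r} is not a single byte in cp1257")
    return encoded[0]


if __name__ == "__main__":
    for ch in LETTERS + LETTERS.upper():
        print(f"{ch} -> 0x{to_cp1257_byte(ch):02X}")
```

The same letters cannot be encoded as Latin-1 (`"ų".encode("latin-1")` raises `UnicodeEncodeError`), which is one way to tell the two charsets apart when guessing what a data file uses.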

OK, I have successfully made a modified rec9lit.dat and attached it to the bug. It tells the spelling code about your pair of letters, which causes it to try both variations against the stock dictionary and any user dictionaries. I can see it trying both even for your jpg (which has the wrong letter, if I understand correctly). I don't know whether the dictionary that comes with cuneiform knows the words you are having trouble with. If not, you will need my user dictionary support as well. I'm still waiting for email about that to appear on the list :(
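Roughly, "try both variations against the stock dictionary and any user dictionaries" amounts to the lookup sketched below. This is an illustration in Python under my own assumptions, not the logic in spelart.c: the dictionaries are plain sets with made-up contents, and `accepted_spellings` is a hypothetical helper.

```python
def accepted_spellings(candidates, dictionaries):
    """Return the candidates found in at least one dictionary, in input order."""
    return [w for w in candidates if any(w in d for d in dictionaries)]


# Stand-ins for the bundled dictionary and a user dictionary (made-up contents).
stock = {"modern", "house"}
user = {"vaikų"}

if __name__ == "__main__":
    # Both variants of the confused letter are tried; only known words survive.
    print(accepted_spellings(["vaikę", "vaikų"], [stock, user]))
```

If neither variant is in any dictionary, the list comes back empty, which is the case where a user dictionary would be needed.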