Lemmas as annotations

Bug #426937 reported by Yin Liu
Affects: Glyphicus
Status: Triaged
Importance: Wishlist
Assigned to: Xiaonuo Gantan
Milestone: (none listed)

Bug Description

Make lemmas into annotations so that we can annotate them.

Changed in glyphicus:
importance: Undecided → Wishlist
status: New → Triaged
Jefficus (jeff-smithicus) wrote :

This one is going to involve some coordination and planning, so Nikolai and Jeff should get together to map out how we're going to do this without breaking the universe.

Changed in glyphicus:
assignee: nobody → Nikolai Roman (xiaonuogantan)
Xiaonuo Gantan (xiaonuogantan) wrote :

Agreed. This one is going to be tough, but fun :).

Jefficus (jeff-smithicus) wrote :

We're going to handle this in 5 distinct phases.

Phase 0: Yin will confirm whether the existing lemmas are truly unique. I have already sent her the list.

Phase 1: Create mirror data
We'll create a script that searches for all unique lemma strings and, for each one found, creates a new document of type "lemma". That script will also add a new field to the word table, called "lemma_doc_id", and each word that has a lemma string assigned will then be linked to the lemma-doc version of the lemma via that field. The title of the lemma doc will be identical to the existing lemma string. At this point, we will have a complete, parallel encoding of lemmas in the system, with the advantage that each unique lemma appears only once, in doc form. This parallel encoding will not yet be available through the interface.
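
A minimal sketch of what the Phase 1 mirroring script might look like, assuming an SQLite database with hypothetical `word` and `document` tables. Only the `lemma_doc_id` field name and the "lemma" document type come from the plan above; the table layouts, the other column names, and the function name are made up for illustration, and the real Perceval schema is likely different.

```python
import sqlite3


def mirror_lemmas(conn):
    """Create one 'lemma' document per unique lemma string and link words to it.

    Assumes hypothetical tables word(id, spelling, lemma) and
    document(id, type, title). Returns the number of lemma docs created.
    """
    cur = conn.cursor()
    # New linking field on the word table, as described in Phase 1.
    cur.execute("ALTER TABLE word ADD COLUMN lemma_doc_id INTEGER")
    # Collect every distinct, non-empty lemma string before modifying anything.
    lemmas = cur.execute(
        "SELECT DISTINCT lemma FROM word "
        "WHERE lemma IS NOT NULL AND lemma <> '' ORDER BY lemma"
    ).fetchall()
    for (lemma,) in lemmas:
        # The doc title is identical to the existing lemma string.
        cur.execute(
            "INSERT INTO document (type, title) VALUES ('lemma', ?)", (lemma,)
        )
        doc_id = cur.lastrowid
        # Link every word carrying this lemma string to the new doc.
        cur.execute(
            "UPDATE word SET lemma_doc_id = ? WHERE lemma = ?", (doc_id, lemma)
        )
    conn.commit()
    return len(lemmas)


# Toy data for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE document (id INTEGER PRIMARY KEY, type TEXT, title TEXT);
    CREATE TABLE word (id INTEGER PRIMARY KEY, spelling TEXT, lemma TEXT);
    INSERT INTO word (spelling, lemma) VALUES
        ('knyght', 'knight (n)'), ('knight', 'knight (n)'), ('the', NULL);
    """
)
created = mirror_lemmas(conn)
```

The existing lemma strings are left in place, so the old lemma editor keeps working while the mirrored data is tested.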

Phase 0 can take place at the same time as Phases 1 and 2, since we will not attempt to implement the changes on the live Perceval db until after we've thoroughly tested the mirroring script and the initial user interface.

Phase 2: Build replacement lemma GUI tools
The existing lemma editor widget will not be altered. Instead, a new widget will be created that has two modes: one for display when the word under the cursor has an assigned lemma, and one for when the lemma is unassigned.

The "assigned" mode will show the title of the lemma and will contain an "open" button. Clicking the open button will open the lemma doc in the interface, just like any other document. It is editable. It can have comments/annotations linked to it. It can even have words inside it assigned to lemmas, for whatever that's worth.

The "unassigned" mode will simply say: "No lemma currently assigned," but at this point, there will be no ability to create a new lemma.

Phase 3: Implement the Lemma Suggester/Creator
The unassigned lemma display widget will be extended to offer two behaviours.

The first behaviour is the "New Title:" field, a text entry field into which the user can type the title for a new lemma. After the user hits Enter, the text will first be compared to all existing lemma entries. If there is significant similarity between the new lemma and an existing lemma, the user will be prompted with a list of the similar entries and asked either to choose one of the existing lemmas to assign or to go ahead with creating a new lemma. Whichever route is taken, the identified lemma will be assigned to the active word in the active document.

The second behaviour will be a lemma suggester button. When clicked, the system will take the active word (under the cursor) in the active document and find all lemmas that have been assigned to a word with that spelling (case insensitive). That list of lemmas will then be searched to find all other words in the db that have been assigned to any one of those lemmas. Finally, a list of all lemmas that have been assigned to any of these related words will be presented to the user as possible lemmas for the active word. The presentation list will be in "likelihood" order, which means that it will present lemmas that are bound to the active word/spelling first, in order of frequency, followed by the list...
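
The suggester lookup described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the planned implementation: the assignments are modelled as a plain list of hypothetical `(spelling, lemma)` pairs rather than db rows, and because the description is cut off after the directly bound lemmas, ordering the remaining lemmas by frequency among the related words is purely a guess.

```python
from collections import Counter


def suggest_lemmas(active_word, assignments):
    """Rank candidate lemmas for active_word.

    assignments is a hypothetical list of (spelling, lemma) pairs,
    one per word in the db that has a lemma assigned.
    """
    key = active_word.casefold()
    # Step 1: lemmas assigned to a word with this spelling (case insensitive),
    # counted so they can be ordered by frequency.
    direct = Counter(
        lemma for spelling, lemma in assignments if spelling.casefold() == key
    )
    # Step 2: every spelling that has been assigned any of those lemmas.
    related_words = {
        spelling.casefold() for spelling, lemma in assignments if lemma in direct
    }
    # Step 3: other lemmas assigned to any of the related words.
    indirect = Counter(
        lemma
        for spelling, lemma in assignments
        if spelling.casefold() in related_words and lemma not in direct
    )
    # Directly bound lemmas first, by frequency (from the description).
    ranked = [lemma for lemma, _ in direct.most_common()]
    # ASSUMPTION: the rest ordered by frequency; the source text is truncated here.
    ranked += [lemma for lemma, _ in indirect.most_common()]
    return ranked


# Toy data for illustration only.
assignments = [
    ("knyght", "knight (n)"),
    ("Knyght", "knight (n)"),
    ("knyght", "knight (v)"),
    ("knight", "knight (n)"),
    ("knight", "night (n)"),
    ("nyght", "night (n)"),
]
suggestions = suggest_lemmas("KNYGHT", assignments)
```

With this toy data, "night (n)" is suggested only because the related spelling "knight" carries it; "nyght" is never reached because it shares no lemma with the active spelling.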


Xiaonuo Gantan (xiaonuogantan) wrote :

In Phase 2, is it better to hide the lemmaTextField when there is no lemma assigned to the current word? This behavior is consistent with the existing lemma panel.

Yin Liu (yin-liu) wrote : Re: [Bug 426937] Re: Lemmas as annotations

Quoting Nikolai Roman <email address hidden>:

> In Phase 2, is it better to hide the lemmaTextField when there is no
> lemma assigned to the current word? This behavior is consistent with the
> existing lemma panel.

Yes, I think that would be best.

Yin

--
Yin Liu
Department of English
University of Saskatchewan
9 Campus Drive
Saskatoon SK S7N 5A5
Canada
+1 (306) 966-1835
<email address hidden>

Xiaonuo Gantan (xiaonuogantan) wrote :

In Phase #3: "if there is significant similarity between the new lemma and an existing lemma, the user will be prompted with a list of the similar entries"
How do we decide the degree of similarity between two lemmas? There are plenty of string similarity algorithms at our disposal. However, we probably need to settle on a definition of similarity first, and possibly list several examples that illustrate it.

Yin Liu (yin-liu) wrote :

My gut reaction would be to say that a lemma is similar if it is
identical up to the first breaking space. Ordinarily a lemma is just a
word (like 'lemma') but in the Perceval database I have differentiated
between lemmas with part-of-speech abbreviations that form part of the
lemma title (like 'lemma (n)'). Defining similarity in this case as
'identical up to the first breaking space' means that if I have two
lemmas like 'lemma (n)' and 'lemma (adj)', they will be considered
similar, but 'lemma' and 'lemon' won't. It also means that if I suggest
'lemma' as a title, I will get both 'lemma (n)' and 'lemma (adj)' as
similar options. All these behaviours are potentially useful.

Other users might simply use single words as lemmas and enter POS tags
as annotations to lemmas, but that shouldn't be a problem either. It
would be up to the project protocols.
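
The "identical up to the first breaking space" rule described above can be sketched in a few lines. The function names are hypothetical, and the comparison is assumed to be case sensitive, since the rule only says "identical"; a project could choose otherwise.

```python
def lemma_head(title):
    """Return the part of a lemma title before the first space."""
    return title.strip().split(" ", 1)[0]


def similar_lemmas(new_title, existing_titles):
    """Return existing titles that match new_title up to the first space."""
    head = lemma_head(new_title)
    return [title for title in existing_titles if lemma_head(title) == head]


# The examples from the message above.
existing = ["lemma (n)", "lemma (adj)", "lemon"]
matches = similar_lemmas("lemma", existing)
```

Here `matches` contains both 'lemma (n)' and 'lemma (adj)' but not 'lemon', which is the behaviour described for a suggested title of 'lemma'.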

