Output corpus position in tokens for hits

Bug #1006409 reported by Amir Zeldes
This bug affects 1 person
Affects: ANNIS
Status: Triaged
Importance: Medium
Assigned to: Thomas Krause
Milestone: 3.0.0

Bug Description

Currently, each search result shows the document it comes from, but not its position within that document. The position is important for citations in publications (Corpus X, Position Y). It would be nice to show the span of tokens covered by each result (including the context that is being displayed) next to the document name for each hit:

tiger2> doc001: (tokens 102 - 115)
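
To make the requested label concrete, here is a minimal sketch of how such a span could be computed and formatted. It is not ANNIS code: the function names, the context parameters, and the assumption that match positions are already available as 1-based, document-relative token numbers are all hypothetical.

```python
def span_with_context(match_start, match_end, left, right, doc_length):
    """Widen a 1-based match span by the displayed context.

    The result is clamped to the document, so it never goes below
    token 1 or past the last token (doc_length). All names here are
    illustrative assumptions, not part of ANNIS.
    """
    first = max(1, match_start - left)
    last = min(doc_length, match_end + right)
    return first, last


def format_hit_label(corpus, document, first_token, last_token):
    """Build a citation label in the format proposed in this report."""
    if first_token < 1 or last_token < first_token:
        raise ValueError("token span must be 1-based and non-empty")
    return f"{corpus}> {document}: (tokens {first_token} - {last_token})"


if __name__ == "__main__":
    # A match on tokens 107-110 shown with 5 tokens of context on each side.
    first, last = span_with_context(107, 110, left=5, right=5, doc_length=2000)
    print(format_hit_label("tiger2", "doc001", first, last))
```

Clamping matters for hits near the start or end of a document, where the displayed context is shorter than the configured context size.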

The token numbering should be relative to the current document, starting with 1. For export formats like WEKA, it should be possible to output the token number(s) of each search element (#1, #2 etc.), as well as the document and corpus, as separate columns, much like metadata, e.g.:

"der" & pos="NN" & #1 . #2

"der","ART",1,"Hund","NN",2,"doc10","tiger2"
"der","ART",23,"Mann","NN",24,"doc10","tiger2"
"der","ART",15,"Vogel","NN",16,"doc12","tiger2"
...
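
The column layout above could be produced along these lines. This is only a sketch under stated assumptions: the `hits` structure (a list of dicts with `nodes`, `document`, and `corpus` keys) is invented for illustration and does not reflect how ANNIS represents results internally, and the output is plain CSV rather than a complete WEKA file.

```python
import csv
import io


def export_rows(hits):
    """Serialize hits as CSV with one token-number column per search element.

    Each hit is assumed (hypothetically) to look like:
      {"nodes": [(text, pos_tag, token_no), ...],
       "document": "doc10", "corpus": "tiger2"}
    where token_no is 1-based and relative to the document.
    """
    buf = io.StringIO()
    # QUOTE_NONNUMERIC quotes strings and leaves integers bare,
    # matching the example rows in this report.
    writer = csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for hit in hits:
        row = []
        for text, pos_tag, token_no in hit["nodes"]:
            row.extend([text, pos_tag, token_no])
        # Document and corpus go last, like metadata columns.
        row.extend([hit["document"], hit["corpus"]])
        writer.writerow(row)
    return buf.getvalue()


if __name__ == "__main__":
    sample = [
        {"nodes": [("der", "ART", 1), ("Hund", "NN", 2)],
         "document": "doc10", "corpus": "tiger2"},
        {"nodes": [("der", "ART", 23), ("Mann", "NN", 24)],
         "document": "doc10", "corpus": "tiger2"},
    ]
    print(export_rows(sample), end="")
```

Keeping one token-number column per search element (#1, #2 etc.) means the number of columns depends only on the query, so every row of a given export has the same width.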

Tags: feature
Thomas Krause (krause)
Changed in annis:
status: New → Triaged
milestone: none → 3.0.0