Output corpus position in tokens for hits
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
ANNIS | Triaged | Medium | Thomas Krause | | |
Bug Description
Currently, each search result shows the document it comes from, but not its position within that document. The position is important for citations in publications (Corpus X, Position Y). It would be nice to have the span of tokens covered by the result (including the context that is being displayed) next to the document name for each hit:
tiger2> doc001: (tokens 102 - 115)
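A minimal sketch of how such a label could be built, assuming hits are stored internally with 0-based, inclusive, document-relative token indices (this is an assumption, not ANNIS behavior). The function name `format_hit_position` and its parameters are hypothetical and not part of any existing ANNIS API.

```python
def format_hit_position(corpus, document, first_index, last_index):
    """Build a citation label for one hit.

    first_index and last_index are assumed to be 0-based, inclusive token
    indices relative to the start of the document, covering the match plus
    the displayed context. The label uses 1-based numbering as requested.
    """
    if first_index < 0 or last_index < first_index:
        raise ValueError("invalid token span")
    # Shift from 0-based internal indices to 1-based token numbers.
    return f"{corpus}> {document}: (tokens {first_index + 1} - {last_index + 1})"


label = format_hit_position("tiger2", "doc001", 101, 114)
print(label)
```

With the hypothetical indices 101 and 114, this reproduces the label format shown above.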
The token numbering should be relative to the current document, starting with 1. For export formats like WEKA, it should be possible to output the token number(s) of each search element (#1, #2, etc.), as well as the document and the corpus, as separate columns, much like metadata, e.g.:
"der" & pos="NN" & #1 . #2
"der","
"der","
"der","
...
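One possible shape for such an export, sketched under assumptions: the column layout (text and token number per search element, then document and corpus) is a guess at what the request describes, and the hit structure, the function name `export_hits`, and the sample values are all hypothetical.

```python
import csv
import io


def export_hits(hits):
    """Serialize hits to CSV text.

    Each hit is assumed to be a dict with:
      "elements": list of (text, token_number) pairs, one per search element,
                  where token_number is 1-based and document-relative
      "document": document name
      "corpus":   corpus name
    """
    if not hits:
        return ""
    element_count = len(hits[0]["elements"])
    header = []
    for i in range(1, element_count + 1):
        header += [f"#{i}_text", f"#{i}_token"]
    header += ["document", "corpus"]

    buffer = io.StringIO()
    # QUOTE_NONNUMERIC quotes strings and leaves token numbers unquoted.
    writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    writer.writerow(header)
    for hit in hits:
        row = []
        for text, token_number in hit["elements"]:
            row += [text, token_number]
        row += [hit["document"], hit["corpus"]]
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical sample hit for a two-element query such as the one above.
sample = [{"elements": [("der", 104), ("Mann", 105)],
           "document": "doc001", "corpus": "tiger2"}]
print(export_hits(sample))
```

Keeping the token number in its own numeric column lets downstream tools sort or filter by position without parsing a combined string.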
Changed in annis:
status: New → Triaged
milestone: none → 3.0.0