EntityTagger keeps entities from previous sentences.

Bug #499357 reported by Jim White
This bug affects 1 person
Affects: RelEx
Status: Fix Released
Importance: Medium
Assigned to: linas

Bug Description

I enabled GATE with the "-g --g-post" options, then parsed the following two sentences in sequence:

Sue received a message from Joe.
Joe gave Sue a message.

The second sentence throws a StringIndexOutOfBoundsException (SIOOB):

     [java] Error: Failed to process sentence: Joe gave Sue a message.
     [java] java.lang.StringIndexOutOfBoundsException: String index out of range: 28
     [java] at java.lang.String.substring(String.java:1765)
     [java] at relex.entity.EntityMaintainer.createConvertedSentence(EntityMaintainer.java:125)
     [java] at relex.entity.EntityMaintainer.convertSentence(EntityMaintainer.java:249)
     [java] at relex.MyRelationExtractor.parseSentence(MyRelationExtractor.java:309)
     [java] at relex.MyRelationExtractor.processSentence(MyRelationExtractor.java:237)
     [java] at relex.MyRelationExtractor.main(MyRelationExtractor.java:616)
     [java] RelEx processing: 14 milliseconds (avg=344 millisecs, cnt=6)
     [java] Exception in thread "main" java.lang.NullPointerException
     [java] at relex.MyRelationExtractor.main(MyRelationExtractor.java:625)

Jim White (james-paul-white) wrote:

The problem is that EntityTagger.orderedEntityInfos retains entries from previous sentences. If one of those stale entities has an index beyond the end of the current sentence, we get the SIOOB exception.
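The index in the trace fits this explanation. In the first sentence, "Joe" begins at character offset 28, while "Joe gave Sue a message." is only 23 characters long. Assuming the leftover entry is that "Joe" (an inference from the numbers, not something the report confirms), this minimal Java sketch reuses the stale offset against the second sentence and gets the same exception type. The class and method names are made up for illustration, and the exception message text varies between JDK versions.

```java
public class StaleOffsetDemo {
    static final String FIRST = "Sue received a message from Joe.";
    static final String SECOND = "Joe gave Sue a message.";

    // Offset of "Joe" as it would be recorded while tagging the first sentence.
    static int staleStart() {
        return FIRST.indexOf("Joe");
    }

    // Applies that stale offset to the second sentence, as a leftover entry would.
    static String applyStaleOffset() {
        return SECOND.substring(0, staleStart());
    }

    public static void main(String[] args) {
        System.out.println(staleStart() + " vs length " + SECOND.length()); // prints "28 vs length 23"
        try {
            applyStaleOffset();
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("StringIndexOutOfBoundsException: " + e.getMessage());
        }
    }
}
```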

I tried to work out exactly what was happening, and also tried using a new EntityTagger, but that didn't work.

So I just added a reset call to wipe those entries out. I doubt this is the correct fix, but it got me working again.
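A minimal sketch of the reset workaround, assuming a tagger that keeps an ordered list of entity offsets as a member. `Tagger`, `reset()`, `process()` and the hard-coded names are hypothetical stand-ins, not RelEx's actual EntityTagger API. Clearing the list at the start of each sentence means every offset that `convert()` uses refers to the current sentence.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ResetTaggerDemo {

    // Hypothetical entity record: [start, end) character offsets into one sentence.
    static final class Entity {
        final int start, end;
        Entity(int start, int end) { this.start = start; this.end = end; }
    }

    // Hypothetical tagger that keeps its entity list as a member.
    static final class Tagger {
        final List<Entity> ordered = new ArrayList<>();

        // Drop entities left over from the previous sentence.
        void reset() { ordered.clear(); }

        // Records the first occurrence of each name, sorted by start offset.
        void tag(String sentence, String... names) {
            List<Entity> found = new ArrayList<>();
            for (String name : names) {
                int i = sentence.indexOf(name);
                if (i >= 0) found.add(new Entity(i, i + name.length()));
            }
            found.sort(Comparator.comparingInt((Entity e) -> e.start));
            ordered.addAll(found);
        }
    }

    // Replaces each entity span with "ENTITY"; assumes sorted, non-overlapping spans.
    static String convert(String sentence, List<Entity> entities) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Entity e : entities) {
            out.append(sentence.substring(pos, e.start)); // throws on a stale offset
            out.append("ENTITY");
            pos = e.end;
        }
        return out.append(sentence.substring(pos)).toString();
    }

    // Per-sentence entry point: reset first, then tag and convert.
    static String process(Tagger tagger, String sentence) {
        tagger.reset();
        tagger.tag(sentence, "Sue", "Joe");
        return convert(sentence, tagger.ordered);
    }

    public static void main(String[] args) {
        Tagger tagger = new Tagger();
        System.out.println(process(tagger, "Sue received a message from Joe."));
        System.out.println(process(tagger, "Joe gave Sue a message."));
    }
}
```

With the reset in place, processing the two sentences from the report in sequence yields "ENTITY received a message from ENTITY." and then "ENTITY gave ENTITY a message." Without the `reset()` call, the list would still contain the entity at offset 28 from the first sentence, and `substring` would throw on the second.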

Jim White (james-paul-white) wrote:

There's more to this: GateEntityDetector has a member, basic, that is also an EntityTagger, and *it* can also get into trouble by retaining entities from the previous sentence. So I reset it too, though I'm still wondering what the right solution is.

I noticed that GateEntityDetector also has a member, doc, that holds information from the previous sentence, so I reset that as well for good measure.
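When one stateful component owns another, the reset has to cascade. The sketch below assumes the detector is itself a tagger (which "also an EntityTagger" suggests) and mirrors the member names `basic` and `doc` from the comment; the types are simplified, hypothetical stand-ins, not RelEx's or GATE's classes, and "Joe left." is an invented example sentence.

```java
import java.util.ArrayList;
import java.util.List;

public class CascadingResetDemo {

    // Hypothetical minimal tagger: holds the entities found in the current sentence.
    static class Tagger {
        final List<String> entities = new ArrayList<>();

        void reset() { entities.clear(); }
    }

    // Hypothetical detector that is itself a tagger and also owns one.
    static final class Detector extends Tagger {
        final Tagger basic = new Tagger(); // nested tagger with its own state
        String doc;                        // stand-in for per-sentence document state

        @Override
        void reset() {
            super.reset();  // this detector's own entity list
            basic.reset();  // the nested tagger's entity list
            doc = null;     // the previous sentence's document
        }

        // Finds the given names in the sentence via the nested tagger.
        void process(String sentence, String... names) {
            reset();
            doc = sentence;
            for (String name : names) {
                if (sentence.contains(name)) basic.entities.add(name);
            }
            entities.addAll(basic.entities);
        }
    }

    public static void main(String[] args) {
        Detector d = new Detector();
        d.process("Sue received a message from Joe.", "Sue", "Joe");
        System.out.println(d.entities); // prints [Sue, Joe]
        d.process("Joe left.", "Sue", "Joe");
        System.out.println(d.entities + " " + d.basic.entities + " " + d.doc); // prints [Joe] [Joe] Joe left.
    }
}
```

If the overridden `reset()` skipped `basic.reset()`, the nested tagger's list would still hold "Sue" and "Joe" from the first sentence when the second one is processed.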

Hmm, the patch is a little funky because GateEntityDetector.java has CRLF line endings.

Jim White (james-paul-white) wrote:

Oh, the two-sentence sequence that caused the exception for #2 was:

(not sure if these should be seen as the same thematic role, although they could be).
post-prep_on vs post-prep_to.

They're obviously not proper sentences (they were notes in the test corpus), but they're helpful here because they expose the problem.

linas (linasvepstas) wrote:

FYI, you only need -g or --g-post; it doesn't really make sense to use both.

The bug was introduced during a recent restructuring of the entity detector interfaces.

Changed in relex:
status: New → Confirmed
importance: Undecided → Medium
linas (linasvepstas) wrote:

I've applied your patches and pushed them to the main repo. There was one more crash bug that I also had to fix for your example.

Changed in relex:
status: Confirmed → Fix Committed
assignee: nobody → linas (linasvepstas)
linas (linasvepstas)
Changed in relex:
status: Fix Committed → Fix Released