wordlist2dawg segfaults on a small wordlist

Bug #435997 reported by Neskie Manuel
This bug affects 1 person
Affects             Status        Importance  Assigned to  Milestone
tesseract (Ubuntu)  Fix Released  Undecided   Unassigned

Bug Description

wordlist2dawg segfaults even on a small wordlist. The list used is below.

ell
Hello
hello
Kukstsemc
qelmúcw
Secwepemc
te
t̓e
The
the
weytk

The culprit is in training/wordlist2dawg.cpp. I think it has something to do with the memory allocation, which looks insane:

    EDGE_ARRAY dawg;
    inT32 max_num_edges = 100000000;
    inT32 reserved_edges = 1000000;

    dawg = (EDGE_ARRAY) Emalloc(sizeof (EDGE_RECORD) * max_num_edges);

The array is not grown dynamically to what's needed; the full size is allocated up front. wordlist2dawg built from svn works, so it looks like they fixed it.
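For illustration only, here is a minimal sketch of what "grown to what's needed" could look like. It is not the change made in svn, which this report does not show. `EdgeRecord` and `GrowableEdgeArray` are hypothetical names, and the 8-byte record type is an assumption, since `sizeof (EDGE_RECORD)` is not given here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for Tesseract's EDGE_RECORD; the real type and its
// size are not shown in this report.
using EdgeRecord = std::uint64_t;

// Edge storage that starts small and grows only as edges are added,
// instead of reserving the worst case up front.
class GrowableEdgeArray {
 public:
  explicit GrowableEdgeArray(std::size_t initial_capacity) {
    edges_.reserve(initial_capacity);
  }

  // Appends an edge and returns its index; std::vector reallocates as needed.
  std::size_t Add(EdgeRecord edge) {
    edges_.push_back(edge);
    return edges_.size() - 1;
  }

  std::size_t size() const { return edges_.size(); }
  std::size_t capacity() const { return edges_.capacity(); }

  // Bounds-checked read access.
  EdgeRecord at(std::size_t i) const { return edges_.at(i); }

 private:
  std::vector<EdgeRecord> edges_;
};
```

With something along these lines, an eleven-word list would only ever hold a handful of records. Whether the real DAWG builder can work this way depends on how it indexes the array, which is not visible in this report.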

I took two zeros off max_num_edges and reserved_edges, rebuilt, and put the binary in /usr/local/bin.
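To put rough numbers on that workaround, the sketch below computes the size of the request before and after removing two zeros. The record sizes of 4 and 8 bytes are assumptions, since the report does not say what `sizeof (EDGE_RECORD)` is, and `AllocationBytes` is a hypothetical helper.

```cpp
#include <cstdint>

// Bytes requested for `num_edges` records of `record_size` bytes each.
// 64-bit arithmetic is used so the product cannot wrap on a 32-bit build.
constexpr std::uint64_t AllocationBytes(std::uint64_t num_edges,
                                        std::uint64_t record_size) {
  return num_edges * record_size;
}

// Values from the report.
constexpr std::uint64_t kOriginalMaxEdges = 100000000;  // as shipped
constexpr std::uint64_t kReducedMaxEdges = 1000000;     // two zeros removed

// Assuming 4-byte records: 400,000,000 bytes before, 4,000,000 after.
static_assert(AllocationBytes(kOriginalMaxEdges, 4) == 400000000, "");
static_assert(AllocationBytes(kReducedMaxEdges, 4) == 4000000, "");

// Assuming 8-byte records: 800,000,000 bytes before, 8,000,000 after.
static_assert(AllocationBytes(kOriginalMaxEdges, 8) == 800000000, "");
static_assert(AllocationBytes(kReducedMaxEdges, 8) == 8000000, "");
```

Under either assumption the original request is several hundred megabytes for an eleven-word list, and the workaround cuts it by a factor of 100.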

I don't know if this is fixed in Karmic.

Tags: ocr
Jeff Breidenbach (jeff-jab) wrote:

Suspect this is fixed, since newer code is now shipping.

Changed in tesseract (Ubuntu):
status: New → Fix Released