Crash when init, open, final recognition, save, close, and done are called repeatedly in a loop

Bug #705500 reported by Andreas Romeyke
Affects: Cuneiform for Linux
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: none listed

Bug Description

Hello,

I am using cuneiform-1.0 as the basis for an XML-RPC-based recognition server, with the cuneiform-cli.cpp example as my starting point.

If cuneiform is repeatedly called in a loop, the first iteration will work fine, the second reports an error in PUMA_XFinalRecognition and the third crashes the whole program.

The error lies in cuneiform itself (checked carefully with valgrind and gdb).

Here is my code:

// simple_ocr method, first parameter is language, second the bitmap as stream
const std::string simple_ocr::_ocr (std::string language, std::vector<char> stream) {
 int dotmatrix=FALSE;
 int fax=FALSE;
 int onecolumn=TRUE; //FALSE;
 // create the return string
 std::string recognized_text ="";
 try {
  int bpp = 24;
  uinT8 * pixels=NULL;
  inT32 x = 0;
  inT32 y = 0;
  bmpstream2raw(&x, &y, stream, &pixels);
  int32_t dibsize=(40+x*y) * sizeof(uinT8);
  uinT8 * dib = new uinT8[dibsize];
  if (pixels == NULL) throw (1);
  if (dib == NULL) throw (1);
  *((int32_t *)dib) = 40;
  *((int32_t *) (dib+4)) = x; // width
  *((int32_t *) (dib+8)) = y; // height
  *((int16_t *) (dib+12)) = 1; // biplanes
  *((int16_t *) (dib+14)) = bpp; // bpp
  *((int32_t *) (dib+16)) = 0; // bi_rgb
  *((int32_t *) (dib+20)) = 0; //bi size image
  *((int32_t *) (dib+24)) = 0; // pixels per meter
  *((int32_t *) (dib+28)) = 0; // pixels per meter
  *((int32_t *) (dib+32)) = 0; // clr used
  *((int32_t *) (dib+36)) = 0; // clr important
  if (!PUMA_Init(0,0)) {
   std::cerr << "PUMA_Init failed." << std::endl;
   uint32_t rc = PUMA_GetReturnCode();
   std::cerr << "Error was: "<< PUMA_GetReturnString(rc) << std::endl;
   throw(SDKError);
  }
  int lang = get_language_index( language );
  PUMA_SetImportData(PUMA_Word32_Language, &lang);
  PUMA_SetImportData(PUMA_Bool32_DotMatrix, &dotmatrix);
  PUMA_SetImportData(PUMA_Bool32_Fax100, &fax);
  PUMA_SetImportData(PUMA_Bool32_OneColumn, &onecolumn);
  if (!PUMA_XOpen(dib, (tmpnam("cuneiform_bmp")).c_str())) {
   std::cerr << "PUMA_Xopen failed."<<std::endl;
   uint32_t rc = PUMA_GetReturnCode();
   std::cerr << "Error was: "<< PUMA_GetReturnString(rc) << std::endl;
   throw(SDKError);
  }
  if(!PUMA_XFinalRecognition()) {
   std::cerr << "PUMA_XFinalRecognition failed."<<std::endl;
   uint32_t rc = PUMA_GetReturnCode();
   std::cerr << "Error was: "<< rc << " "<< PUMA_GetReturnString(rc) << std::endl;
   throw(SDKError);
  }
#ifdef PUMA_GETSPECIALBUFFER_DEFINED
  char * buffer = NULL;
  int32_t bufferlen=0;
  PUMA_GetSpecialBuffer(buffer, &bufferlen);
  if (NULL==buffer || 0==bufferlen) {
   std::cerr << "PUMA_GetSpecialBuffer failed."<<std::endl;
   uint32_t rc = PUMA_GetReturnCode();
   std::cerr << "Error was: "<< PUMA_GetReturnString(rc) << std::endl;
   throw(SDKError);
  }
  // std::string::append has no single-char overload; append the whole range at once
  recognized_text.append( buffer, bufferlen );
#else
  std::string outfilename=tmpnam("cuneiform_txt");
  if(!PUMA_XSave(outfilename.c_str(), PUMA_TOTEXT, PUMA_CODE_UTF8)) {
   std::cerr << "PUMA_XSave failed.\n";
   uint32_t rc = PUMA_GetReturnCode();
   std::cerr << "Error was: "<< PUMA_GetReturnString(rc) << std::endl;
   throw(FileError);
  }
  FILE* f=fopen(outfilename.c_str(), "rb");
  if (!f) {
   std::cerr << "Could not open file '"<<outfilename<<"'"<<std::endl;
   throw(FileError);
  }
  struct stat filestatus;
  stat( outfilename.c_str(), &filestatus );
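  char * buffer = new char[ filestatus.st_size ];
  // fread does not null-terminate, so keep the number of bytes actually read
  size_t bytes_read = fread(buffer, 1, filestatus.st_size, f);
  fclose(f);
  unlink(outfilename.c_str());
  recognized_text.append( buffer, bytes_read );
  delete[] buffer;
  delete[] pixels;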
#endif
  if(!PUMA_XClose()) {
   std::cerr << "PUMA_XClose failed."<<std::endl;
   uint32_t rc = PUMA_GetReturnCode();
   std::cerr << "Error was: "<< PUMA_GetReturnString(rc) << std::endl;
   throw(SDKError);
  }
  if(!PUMA_Done()) {
   std::cerr << "PUMA_Done failed."<< std::endl;
   uint32_t rc = PUMA_GetReturnCode();
   std::cerr << "Error was: "<< PUMA_GetReturnString(rc) << std::endl;
   throw(SDKError);
  }
 } catch (enum ocr_error) {
  std::cout << "ocr error :(" << std::endl;
  //recognized_text="";
 }

 return recognized_text;
};

The method _ocr() is called in a loop. The stream data is still correct on every iteration (checked by dumping it to disk and comparing it against the bitmap definition).
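One detail in the code above that may be worth double-checking, independently of the crash: the DIB buffer is sized as 40 + x*y bytes and the decoded pixels are never copied into it, while the header declares 24 bits per pixel. Whether this has anything to do with the crash is not known. The following is only a sketch of how such a buffer could be assembled; make_dib24 is a hypothetical helper, and it assumes that pixels holds width*height*3 tightly packed bytes and that the host is little-endian (as the original pointer casts also assume).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: build a 24-bit BI_RGB DIB, i.e. a 40-byte
// BITMAPINFOHEADER followed by pixel rows padded to a multiple of 4 bytes.
// Assumption: `pixels` holds width*height*3 bytes, tightly packed, already in
// the row and channel order the consumer expects; width and height are > 0.
std::vector<std::uint8_t> make_dib24(std::int32_t width, std::int32_t height,
                                     const std::uint8_t* pixels) {
    const std::size_t header_size = 40;
    const std::size_t src_stride = static_cast<std::size_t>(width) * 3;
    const std::size_t dst_stride = (src_stride + 3) & ~static_cast<std::size_t>(3);
    const std::size_t rows = static_cast<std::size_t>(height);

    // Zero-initialised, so all header fields not set below stay 0 (BI_RGB etc.).
    std::vector<std::uint8_t> dib(header_size + dst_stride * rows, 0);

    auto put32 = [&dib](std::size_t off, std::int32_t v) {
        std::memcpy(&dib[off], &v, sizeof v);
    };
    auto put16 = [&dib](std::size_t off, std::int16_t v) {
        std::memcpy(&dib[off], &v, sizeof v);
    };
    put32(0, 40);       // biSize
    put32(4, width);    // biWidth
    put32(8, height);   // biHeight
    put16(12, 1);       // biPlanes
    put16(14, 24);      // biBitCount

    // Copy each source row behind the header, leaving the padding bytes zero.
    for (std::size_t row = 0; row < rows; ++row) {
        std::memcpy(&dib[header_size + row * dst_stride],
                    pixels + row * src_stride, src_stride);
    }
    return dib;
}
```

The returned vector owns the memory, so nothing leaks if an exception is thrown later; dib.data() could be passed wherever a raw pointer is expected.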

Any hints as to what is going wrong?

Thanks for your help,

Bye Andreas

Jussi Pakkanen (jpakkane) wrote :

Cuneiform is unfortunately buggy. You probably just found one.

It is likewise unfortunate that we don't have the resources to fix these.

As a workaround, you can run Cuneiform by writing the image to a file, invoking the Cuneiform binary, loading the result and repeating this for each page. This is what I do, and it's fast enough for me.

Andreas Romeyke (fa-romeyke) wrote :

Hello JussiP,

The suggested workaround is not a solution, because the problem exists independently of how images are loaded or written. As the example above shows, it is the repeated sequence of init, open, final recognition, save, close, and done calls that crashes. This suggests that internal data structures of PUMA are not freed or initialized correctly.

Because I cannot read Russian, it would be very difficult for me to analyze the sources for hints about possible bugs. If you could help me, I could try to track the problem down and help you with fixes.

Bye Andreas

Jussi Pakkanen (jpakkane) wrote :

What I meant is that you don't compile and link your program against libcuneiform, but rather call the Cuneiform binary with the system() function (which is declared in stdlib.h).
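A rough sketch of that workaround, for illustration only: each page is handed to a fresh cuneiform process, so no library state survives between pages. The helper names (shell_quote, build_cuneiform_command, ocr_via_binary) are made up for this example, and the command line assumes the -l, -f and -o options of the cuneiform command-line tool; check them against your installed version.

```cpp
#include <cstdlib>
#include <fstream>
#include <sstream>
#include <string>

// Quote a string for a POSIX shell by wrapping it in single quotes.
// An embedded single quote is written as '\'' (close, escaped quote, reopen).
std::string shell_quote(const std::string& s) {
    std::string out = "'";
    for (char c : s) {
        if (c == '\'') out += "'\\''";
        else out += c;
    }
    out += "'";
    return out;
}

// Assemble the command line. The option names are an assumption about the
// cuneiform CLI, not something confirmed in this thread.
std::string build_cuneiform_command(const std::string& language,
                                    const std::string& image_path,
                                    const std::string& text_path) {
    return "cuneiform -l " + shell_quote(language) +
           " -f text -o " + shell_quote(text_path) +
           " " + shell_quote(image_path);
}

// Run the binary on one image file and read the recognized text back.
// Returns false if the command fails or the output file cannot be read.
bool ocr_via_binary(const std::string& language, const std::string& image_path,
                    const std::string& text_path, std::string& text) {
    const std::string cmd =
        build_cuneiform_command(language, image_path, text_path);
    if (std::system(cmd.c_str()) != 0) return false;

    std::ifstream in(text_path.c_str(), std::ios::binary);
    if (!in) return false;
    std::ostringstream contents;
    contents << in.rdbuf();
    text = contents.str();
    return true;
}
```

In a server, the incoming bitmap stream would first be written to a temporary file, whose path is then passed as image_path.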

Andreas Romeyke (fa-romeyke) wrote :

Hello,

Using GDB, I found that the problem occurs in libctb32.so.1.0.0:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fd10d9f26f0 (LWP 10223)]
0x00007fd107d099a7 in CTB_open () from /usr/local/lib64/libctb32.so.1.0.0
Current language: auto; currently asm
(gdb) bt
#0 0x00007fd107d099a7 in CTB_open () from /usr/local/lib64/libctb32.so.1.0.0
Cannot access memory at address 0x2f2f2f2f2f2f2f37

Hope that helps you to fix the problem.

Bye Andreas

Yury V. Zaytsev (zyv) wrote :

Probably bt full would be of more help...

Andreas Romeyke (fa-romeyke) wrote :

Hello Yury,

OK, here is the full backtrace. simple_ocr is my own caller class, as shown in the bug description above.

(gdb) bt full
#0 0x00007f99a816bed5 in raise () from /lib/libc.so.6
No symbol table info available.
#1 0x00007f99a816d3f3 in abort () from /lib/libc.so.6
No symbol table info available.
#2 0x00007f99a81a8408 in ?? () from /lib/libc.so.6
No symbol table info available.
#3 0x00007f99a81ad9a8 in ?? () from /lib/libc.so.6
No symbol table info available.
#4 0x00007f99a81afab6 in free () from /lib/libc.so.6
No symbol table info available.
#5 0x00007f99a357838a in CTB_done () from /usr/local/lib64/libctb32.so.1.0.0
No locals.
#6 0x00007f99a46ec002 in LEODone () from /usr/local/lib64/libleo32.so.1.0.0
No locals.
#7 0x00007f99a5c1286d in RSTRDone () from /usr/local/lib64/librstr.so.1.0.0
No locals.
#8 0x00007f99a5c128ee in RSTR_Done () from /usr/local/lib64/librstr.so.1.0.0
No locals.
#9 0x00007f99a8e5b4eb in ModulesDone () from /usr/local/lib64/libcuneiform.so.1.0.0
No locals.
#10 0x00007f99a8e5b953 in ModulesInit () from /usr/local/lib64/libcuneiform.so.1.0.0
No locals.
#11 0x00007f99a8e6058b in PUMA_Init () from /usr/local/lib64/libcuneiform.so.1.0.0
No locals.
#12 0x0000000000421ad1 in simple_ocr::_ocr ()

Bye Andreas

Andreas Romeyke (fa-romeyke) wrote :

After "cont", it reports:

(gdb) bt
#0 0x00007f234880771b in mempcpy () from /lib/libc.so.6
#1 0x00007f23487fd13e in _IO_default_xsputn () from /lib/libc.so.6
#2 0x00007f23487d2656 in vfprintf () from /lib/libc.so.6
#3 0x00007f23487f1e29 in vsprintf () from /lib/libc.so.6
#4 0x00007f23487d8de8 in sprintf () from /lib/libc.so.6
#5 0x00007f2343bc9519 in CTB_open () from /usr/local/lib64/libctb32.so.1.0.0
#6 0x2f2f2f2f2f2f2f2f in ?? ()
#7 0x2f2f2f2f2f2f2f2f in ?? ()
#8 0x2f2f2f2f2f2f2f2f in ?? ()

A "bt full" repeats the last line, "0x2f2f2f2f2f2f2f2f in ?? ()", more than 400 times.
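A note on reading these addresses: 0x2f is the ASCII code for '/', so frames at 0x2f2f2f2f2f2f2f2f look like saved return addresses that were overwritten with a run of slash characters. That would be consistent with the sprintf() call inside CTB_open writing a path string past the end of a buffer, but this is only a guess from the trace, not something verified in the sources. The small helper below (bytes_as_ascii is a made-up name) shows the decoding.

```cpp
#include <cstdint>
#include <string>

// Render the 8 bytes of a 64-bit value as text, lowest byte first, which is
// the order they occupy in memory on a little-endian machine such as x86-64.
// Non-printable bytes are shown as '.'.
std::string bytes_as_ascii(std::uint64_t value) {
    std::string out;
    for (int i = 0; i < 8; ++i) {
        const unsigned char b = static_cast<unsigned char>(value >> (8 * i));
        out += (b >= 0x20 && b < 0x7f) ? static_cast<char>(b) : '.';
    }
    return out;
}
```

Applied to the values from the traces, 0x2f2f2f2f2f2f2f2f decodes to "////////" and 0x2f2f2f2f2f2f2f37 (the address gdb could not access) to "7///////".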

Andreas Romeyke (fa-romeyke) wrote :

In cuneiform-linux-1.0.0/cuneiform_src/Kern/smetric/src/sm_kern.cpp there is a hint about "gwHeightRC" and "ghStorage" which are set via PUMA_Init(). The comment says:

* A pointer to the repository
* Unique number of libraries in a single session

Could this be the reason for the crash? If so, how should PUMA_Init() be called a second time?
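If the library really does assume a single init/done cycle per process, one thing to try would be to initialize it once at server start-up and shut it down once at exit, keeping only the open, recognize, save, and close steps inside the per-request loop. The sketch below shows that pattern in generic form; SessionGuard is a hypothetical class, and whether Cuneiform actually behaves correctly when used this way has not been tested here.

```cpp
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical RAII guard: runs `init` once on construction and `done` once
// on destruction, so a library is initialized a single time per process.
class SessionGuard {
public:
    SessionGuard(const std::function<bool()>& init, std::function<void()> done)
        : done_(std::move(done)) {
        // If init fails, the destructor never runs, so done is not called.
        if (!init()) throw std::runtime_error("library init failed");
    }
    ~SessionGuard() { done_(); }

    // Copying would lead to `done` being called more than once.
    SessionGuard(const SessionGuard&) = delete;
    SessionGuard& operator=(const SessionGuard&) = delete;

private:
    std::function<void()> done_;
};
```

In the server, one guard could be constructed in main() with a callable that returns the result of PUMA_Init(0, 0) and another that calls PUMA_Done(), while _ocr() would then contain only the calls from PUMA_XOpen through PUMA_XClose.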

Bye Andreas
