Initial support of multi-page files

Bug #267006 reported by Kuzemko Aleksandr
Affects: Cuneiform for Linux
Status: Confirmed
Importance: Wishlist
Assigned to: Jussi Pakkanen
Milestone: (none)

Bug Description

Please try this patch.
It adds support for multi-page PDF, DjVu, and TIFF files.
Each page is saved to a separate file with a name like outfilename_page_NUMBER_OF_PAGE.format_extension.
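The per-page naming scheme from the patch description can be sketched as a small helper. The name page_output_name is hypothetical (it is not taken from the patch), and whether the patch numbers pages from 0 or 1 is not stated, so this sketch simply formats whatever page number it is given.

```cpp
#include <string>

// Hypothetical helper, not from the attached patch: builds the output name
// outfilename_page_NUMBER_OF_PAGE.format_extension for one recognized page.
// Example: ("out", 2, "rtf") gives "out_page_2.rtf".
std::string page_output_name(const std::string& outfilename, int page,
                             const std::string& format_extension) {
    return outfilename + "_page_" + std::to_string(page) + "." +
           format_extension;
}
```

A caller would invoke it once per page inside its page loop and write that page's recognition result to the returned name.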

Kuzemko Aleksandr (kuzemkoa-rambler) wrote :
Alex Samorukov (samm-os2) wrote :

As far as I know, there is multipage support inside the PUMA engine, and I think we should try to use it, because 100 separate RTF files are of little use.

Alex Samorukov (samm-os2) wrote :

To Aleksandr:
May I ask you to attach some multipage TIFF/DjVu files to this bug? I'm trying to test multipage mode with Cuneiform.

Kuzemko Aleksandr (kuzemkoa-rambler) wrote :

Yes

Kuzemko Aleksandr (kuzemkoa-rambler) wrote :
Jussi Pakkanen (jpakkane) wrote :

I agree with Alex: we should use PUMA lib's internal multipage mechanism. Thus I'm not applying this patch. I'll keep the bug open though, since we eventually want to do multipage recognition.

Changed in cuneiform-linux:
assignee: nobody → jpakkane
importance: Undecided → Wishlist
status: New → Confirmed
Kuzemko Aleksandr (kuzemkoa-rambler) wrote :

I don't understand! When I use this function:
static char* read_file(const char *fname) {
    Blob blob;
    size_t data_size;
    char *dib;
    try {
        Image image;
        // image.density("10");
        // image.ping(fname);
        // cout << "Image number of page "<< image.fileSize()/(image.columns()*image.rows()) << "\n";
        // image.density("300");
        int i=0;
        const char* temp="file_venugopal.pdf[1]"; // string literals are const in C++
        // sprintf(temp, "%s[%d]",fname, i );
        cout << "Now we form file name"<< temp<<".\n";
        image.read(temp);
        // image.magick( "BMP" );
        // image.depth(24);
        // image.monochrome(1);
        // Write to BLOB in BMP format
        image.write(&blob, "DIB");
        cout << "Write BLOB" << "\n";
        image.write("out.bmp");
    } catch(Exception &error_) {
        cerr << error_.what() << "\n";
        return NULL;
    }
    data_size = blob.length();
    dib = new char[data_size];
    memcpy(dib, blob.data(), data_size);
    return dib;
}
it recognizes the second page of file_venugopal.pdf.
But when I use
...
        int i=0;
        char* temp;
        sprintf(temp, "%s[%d]",fname, i );
        cout << "Now we form file name"<< temp<<".\n";
...
I get a segfault.

Where is my mistake?

Polevoy Dmitry (openocr-polevoy) wrote : Re: [Bug 267006] Re: Initial support of multi-page files

In the line
sprintf(temp, "%s[%d]",fname, i );
you write to a random memory address, because 'temp' is not a valid pointer in this case: it is never initialized, so it does not point to any buffer.

Try
char* temp[1024] = {0};
instead of
char* temp;

(In reply to Kuzemko Aleksandr's message of 2009/3/13.)

Kuzemko Aleksandr (kuzemkoa-rambler) wrote :

...
        char* temp[1024] = {0};
// char* temp="file_venugopal.pdf[1]";
        sprintf(temp, "%s[%d]",fname, i );
...
Result of make:
...
[100%] Building CXX object cuneiform_src/Kern/CMakeFiles/cuneiform.dir/cuneiform-cli.cpp.o
/home/mgraf/cuneiform-linux/cuneiform_src/Kern/cuneiform-cli.cpp: In function ‘char* read_file(const char*)’:
/home/mgraf/cuneiform-linux/cuneiform_src/Kern/cuneiform-cli.cpp:142: error: cannot convert ‘char**’ to ‘char*’ for argument ‘1’ to ‘int sprintf(char*, const char*, ...)’
...

Line 142 is: sprintf(temp, "%s[%d]",fname, i );
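Why the compiler complains: char* temp[1024] declares an array of 1024 pointers to char rather than a character buffer, and an array passed to a function decays to a pointer to its first element. The type aliases below are illustrative names only; std::decay performs the same array-to-pointer conversion that argument passing does, so the checks show which type sprintf actually receives in each case.

```cpp
#include <type_traits>

// "char* temp[1024]": an array of 1024 pointers to char. It decays to
// char**, which does not match sprintf's first parameter (char*).
using PointerArray = char* [1024];

// "char temp[1024]": 1024 characters, i.e. real storage for the formatted
// string. It decays to char*, which is what sprintf expects.
using CharArray = char[1024];

// std::decay applies the same array-to-pointer conversion as a function call.
constexpr bool kPointerArrayDecaysToCharPtrPtr =
    std::is_same<std::decay<PointerArray>::type, char**>::value;
constexpr bool kCharArrayDecaysToCharPtr =
    std::is_same<std::decay<CharArray>::type, char*>::value;

static_assert(kPointerArrayDecaysToCharPtrPtr, "char*[N] decays to char**");
static_assert(kCharArrayDecaysToCharPtr, "char[N] decays to char*");
```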

Kuzemko Aleksandr (kuzemkoa-rambler) wrote :

I changed char* temp[1024] = {0}; to char temp[1024] = {0}; and now it compiles.
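The working fix gives temp real storage. A minimal sketch of the same idea, assuming you want it as a reusable function: subimage_spec is a hypothetical helper name, and it uses snprintf instead of sprintf so that a very long file name is truncated rather than written past the end of the 1024-byte buffer. The number in brackets is a zero-based page index, which matches the observation earlier in this thread that "[1]" selects the second page.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds a subimage specification such as
// "file_venugopal.pdf[1]" (zero-based, so [0] is the first page).
std::string subimage_spec(const char* fname, int page_index) {
    char temp[1024] = {0};  // real storage, unlike an uninitialized char*
    // snprintf never writes more than sizeof temp bytes, including the
    // terminating '\0', so an over-long name is truncated instead of
    // overflowing the buffer.
    std::snprintf(temp, sizeof temp, "%s[%d]", fname, page_index);
    return std::string(temp);
}
```

The returned std::string can be passed directly to image.read(...), since Magick::Image::read accepts a const std::string&.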

Kuzemko Aleksandr (kuzemkoa-rambler) wrote :

Here is a function with better support for vector image formats. It reads the first page of PDF, SVG, and DjVu files at a resolution of 300 dpi.

I have a problem getting the number of pages in multipage images. Can anybody help me with this?
...
#define MIN_DPI_FOR_VECTOR_FORMAT 300
...

static char* read_file(const char *fname) {
    Blob blob;
    size_t data_size;
    char *dib;
    try {
        Image image;
        image.density("10");
        image.ping(fname);
        if (image.magick()=="PDF" or image.magick()=="SVG" or image.magick()=="DJVU")
        {
            image.density(Geometry(MIN_DPI_FOR_VECTOR_FORMAT,MIN_DPI_FOR_VECTOR_FORMAT));//change from default 72 dpi
        }
        int i=0;//read first page
        char temp[1024] = {0};
        while(1)
        {
            sprintf(temp, "%s[%d]",fname, i );
            i++;
            cout << "We read " << temp << " file" << endl;
            image.read(temp);
        }
        image.depth(24);
        // Write to BLOB in BMP format
        image.write(&blob, "DIB");
    } catch(Exception &error_) {
        cerr << error_.what() << "\n";
        return NULL;
    }
    data_size = blob.length();
    dib = new char[data_size];
    memcpy(dib, blob.data(), data_size);
    return d...
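On the page-count question: the while(1) loop above only stops when image.read throws, and the catch block then returns NULL. One way to turn that idea into a page counter is to probe "fname[0]", "fname[1]", ... until a read fails, with an upper bound so the loop cannot run forever. In the sketch below, count_pages and try_read_page are hypothetical names: try_read_page stands in for a call such as Magick::Image::read(spec) wrapped in try/catch that returns false on an exception, and the assumption that ImageMagick reports an out-of-range page by throwing is not confirmed in this thread. Alternatively, Magick++ provides readImages() (from Magick++.h), which reads every frame into a container such as std::list<Magick::Image>; the container's size is then the page count, at the cost of decoding every page up front.

```cpp
#include <functional>
#include <string>

// Hypothetical sketch: counts pages by trying subimages fname[0], fname[1],
// ... until try_read_page reports failure, stopping at max_pages at most.
int count_pages(const std::string& fname,
                const std::function<bool(const std::string&)>& try_read_page,
                int max_pages = 10000) {
    int pages = 0;
    while (pages < max_pages &&
           try_read_page(fname + "[" + std::to_string(pages) + "]")) {
        ++pages;  // page index `pages` was readable, so count it
    }
    return pages;
}
```

Because the probe is injected, the counting logic can be tested without any image files, and the real reader can be swapped in once its failure behaviour is confirmed.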

