Add --nopictures, --tables=n to CLI

Bug #395351 reported by Ben Jackson
This bug affects 1 person
Affects              Status  Importance  Assigned to  Milestone
Cuneiform for Linux  New     Undecided   Unassigned

Bug Description

I found that cuneiform would not OCR anything inside an outline (any kind of box); it would treat the area as either a picture or a table. With '--nopictures' it seems to just ignore those same areas. If you also add --tables, it will successfully OCR inside the table.

I've never used bzr before this so I've probably botched the patch somehow, but it is fairly trivial.

Ben Jackson (ben.jackson) wrote :

I see now in puma.h there are constants related to the values I set:

        # define PUMA_TABLE_NONE 0
        # define PUMA_TABLE_DEFAULT 1
        # define PUMA_TABLE_ONLY_LINE 2
        # define PUMA_TABLE_ONLY_TEXT 3
        # define PUMA_TABLE_LINE_TEXT 4

        # define PUMA_PICTURE_NONE 0
        # define PUMA_PICTURE_ALL 1

I tried all the table settings but didn't really get any different output. Setting it to anything other than 0 got it to OCR things inside a box outline (as opposed to turning the box into a picture).
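For illustration only, here is a rough sketch of how the two options could map onto those constants. This is not the attached patch: `struct layout_opts` and `parse_layout_opts` are made-up names, and the defaults (pictures on, tables off) are an assumption based on the behaviour described above.

```c
#include <stdlib.h>
#include <string.h>

/* Constants as quoted from puma.h above. */
#define PUMA_TABLE_NONE      0
#define PUMA_TABLE_DEFAULT   1
#define PUMA_TABLE_ONLY_LINE 2
#define PUMA_TABLE_ONLY_TEXT 3
#define PUMA_TABLE_LINE_TEXT 4

#define PUMA_PICTURE_NONE 0
#define PUMA_PICTURE_ALL  1

/* Hypothetical holder for the two settings; not a Cuneiform type. */
struct layout_opts {
    int pictures;
    int tables;
};

/*
 * Hypothetical helper: scan argv for --nopictures and --tables=n.
 * Returns 0 on success, -1 if the --tables value is not a number
 * in the range PUMA_TABLE_NONE..PUMA_TABLE_LINE_TEXT.
 * Unrelated arguments (e.g. the input file name) are skipped.
 */
int parse_layout_opts(int argc, char **argv, struct layout_opts *out)
{
    const char *prefix = "--tables=";
    size_t prefix_len = strlen(prefix);
    int i;

    out->pictures = PUMA_PICTURE_ALL; /* assumed default */
    out->tables = PUMA_TABLE_NONE;    /* assumed default */

    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--nopictures") == 0) {
            out->pictures = PUMA_PICTURE_NONE;
        } else if (strncmp(argv[i], prefix, prefix_len) == 0) {
            const char *val = argv[i] + prefix_len;
            char *end;
            long n = strtol(val, &end, 10);

            /* Reject empty values, trailing junk, and out-of-range numbers. */
            if (end == val || *end != '\0' ||
                n < PUMA_TABLE_NONE || n > PUMA_TABLE_LINE_TEXT)
                return -1;
            out->tables = (int)n;
        }
    }
    return 0;
}
```

With something like this, `--nopictures --tables=1` would yield PUMA_PICTURE_NONE and PUMA_TABLE_DEFAULT, which is the combination that got text inside a box outline recognized for me. How the values are then handed to the PUMA library is not shown here.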

Jussi Pakkanen (jpakkane) wrote :

This probably has to do with the table recognition code, which has not been open sourced yet. I'd like to get some comments from Cognitive people before committing this.

Ben Jackson (ben.jackson) wrote :

I saw the earlier discussion about this. I believe the missing code is for table *output* (e.g., as a spreadsheet). Enabling tables definitely allows the OCR code to look inside outlined areas, which it otherwise will not. In fact, if you OCR a page with a border, it will not recognize anything without these new options.

Also, assuming bzr send is reasonable, the two patches are separate, so you could apply --nopictures.
