0.12.0-1 Subtitles. ocrad used instead of tesseract

Bug #247826 reported by Robbie G
Affects: ogmrip (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: ogmrip

Hi,

Version 0.12.0-1~getdeb (from Ubuntu MOTU developers) - I hope it's OK to log this bug here even though this version is not in the repo.

I have a subtitles issue where ocrad outputs the digit 5 instead of a capital letter S. I raised this with the OGMRip developers, who state that tesseract is their preferred choice.

However, if I install tesseract it will not be used, because the Ubuntu 0.12.0-1 OGMRip package specifies ocrad at build time, which means that the preferred behaviour does not take place.

I can see from the packaging history that this dependency was introduced at ogmrip 0.11.1-0 as a bug fix (200600) by Florent Mertens. The notes for that bug state: "Build with --with-ocr=ocrad : ogmrip can't choose at runtime, so we have to make a choice at build time. - Add ocrad to build-dep"

But since 0.12, ogmrip detects at runtime which OCR programs are installed and selects either tesseract, gocr or ocrad, in that order. So, according to the developers, the build should specify --with-ocr=auto or --with-ocr=tesseract instead.

Thanks for all the great work.

Regards, Rob

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ogmrip - 0.12.2-0.0ubuntu1

---------------
ogmrip (0.12.2-0.0ubuntu1) intrepid; urgency=low

  * Merge from debian-multimedia (LP: #268063), remaining changes:
   - debian/libogmrip0.install & debian/install:
    + Don't install the .desktop file in the library package (LP: #204448)
   - debian/rules:
    + Drop configure flag --with-ocr=ocrad (introduced to fix LP #200600),
      as ogmrip now detect at runtime the ocr.
   - debian/control:
    + Set Depends for ocr to tesseract | ocrad | gocr . (LP: #247826)

ogmrip (0.12.2-0.0) unstable; urgency=low

  * New upstream release.

ogmrip (0.12.1-0.2) unstable; urgency=low

  * ogmrip need to depends on gpac for mp4 file.

ogmrip (0.12.1-0.1) unstable; urgency=low

  * Remove libgtk2.0-dev and add libglade2-dev, intltool and libgconf2-dev
    as dependencies for the libogmrip-dev package.
  * Don't package .la files and remove .a files from plugins directories.
  * Added libnotify-dev-gtk2.10 in Build-depends.
  * debian/rules Call quilt cleanly.

ogmrip (0.12.1-0.0) unstable; urgency=low

  * New upstream release.

ogmrip (0.12.0-0.0) unstable; urgency=low

  * New upstream release.

ogmrip (0.11.2-0.1) unstable; urgency=low

  * Removed gocr from Depends field.

ogmrip (0.11.2-0.0) unstable; urgency=low

  * New upstream release.

ogmrip (0.11.1-0.2) unstable; urgency=low

  * debian/watch fix address.

 -- Julien Lavergne <email address hidden> Fri, 03 Oct 2008 17:43:54 +0200

Changed in ogmrip:
status: New → Fix Released
unggnu (unggnu) wrote :

This is still an issue in Intrepid with OGMRip 0.12.2. Even if tesseract is installed, ocrad is used, which is unusable imho.

Changed in ogmrip:
status: Fix Released → Confirmed
Robbie G (robbie-cartwood-nee-grimwood) wrote :

I am on Intrepid now too, and tesseract is not being used. In my case, however, gocr is being selected.

This is odd because ocrad is installed, which, if I read things correctly, is different to the rule Launchpad Janitor details 21/10/08, namely:

"Set Depends for ocr to tesseract | ocrad | gocr . (LP: #247826)"

which I assume is an order of preference depending on what OGMrip finds installed.

Hope this helps, Rob

Julien Lavergne (gilir) wrote :

There is a typo in the package: the tesseract package is actually named tesseract-ocr.
A fixed package for Intrepid is available in my PPA: https://launchpad.net/~gilir/+archive
Could you please test this package with tesseract-ocr installed and ocrad not installed, and see if it works? Additionally, could you test with both installed and see if that works too?
Thanks.
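
To illustrate why the package name matters, here is a minimal sketch of how a Depends alternative group is satisfied: any one installed alternative is enough, and a name that matches no real package can never be the one that satisfies it. This is a simplified model, not apt's or dpkg's actual resolver. The function names are hypothetical, and the corrected line "tesseract-ocr | ocrad | gocr" is an assumption based on the comment above, not a quote from the fixed package.

```python
def parse_alternatives(field):
    """Split a Depends alternative group such as 'a | b | c' into package names."""
    return [part.strip() for part in field.split("|") if part.strip()]


def depends_satisfied(alternatives, installed):
    """Return True if at least one of the alternative packages is installed."""
    return any(name in installed for name in alternatives)


# Depends line as shipped in 0.12.2-0.0ubuntu1 (taken from the changelog above).
shipped = parse_alternatives("tesseract | ocrad | gocr")
# Assumed corrected line, using the real package name tesseract-ocr.
corrected = parse_alternatives("tesseract-ocr | ocrad | gocr")

# A system where only the tesseract packages are installed.
installed = {"tesseract-ocr", "tesseract-ocr-eng"}

print(depends_satisfied(shipped, installed))    # False
print(depends_satisfied(corrected, installed))  # True
```

Note that this only models what the package manager requires at install time. The order of the alternatives does not tell OGMRip which OCR to run.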

Robbie G (robbie-cartwood-nee-grimwood) wrote :

Hi Julien,

1) With tesseract, gocr & ocrad installed (my config), gocr is used.

joebloggs@doddlebonk:~$ dpkg --get-selections | grep tesseract
tesseract-ocr install
tesseract-ocr-deu install
tesseract-ocr-eng install
joebloggs@doddlebonk:~$ dpkg --get-selections | grep gocr
gocr install
joebloggs@doddlebonk:~$ dpkg --get-selections | grep ocrad
ocrad install

gives us:

gocr -v 1 -f ASCII -m 4 -m 64 -o /tmp/sub.WW9GKU0601.pgm.txt /tmp/sub.WW9GKU0601.pgm
 Optical Character Recognition --- gocr 0.45 20071126

I will run 2 more checks without gocr: 2) with ocrad and tesseract, and 3) with tesseract only, as per your request.

Will post again with each result

Cheers, Rob
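
For anyone repeating these checks, the package listing above can be summarised with a short script. This is a sketch only: it assumes the two-column "name state" layout shown in the transcript, and installed_packages is a hypothetical helper, not part of dpkg or OGMRip.

```python
def installed_packages(selections_text):
    """Return the package names marked 'install' in dpkg --get-selections output."""
    installed = set()
    for line in selections_text.splitlines():
        fields = line.split()
        # Each relevant line has a package name followed by its selection state.
        if len(fields) == 2 and fields[1] == "install":
            installed.add(fields[0])
    return installed


# Sample text copied from the transcript above.
sample = """tesseract-ocr install
tesseract-ocr-deu install
tesseract-ocr-eng install
gocr install
ocrad install
"""

ocr_packages = {"tesseract-ocr", "gocr", "ocrad"}
print(sorted(installed_packages(sample) & ocr_packages))
```

In practice the text would come from running dpkg --get-selections, for example through subprocess.run with capture_output=True and text=True.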

Robbie G (robbie-cartwood-nee-grimwood) wrote :

With ocrad and tesseract installed, ocrad is used.

joebloggs@doddlebonk:~$ dpkg --get-selections | grep tesseract
tesseract-ocr install
tesseract-ocr-deu install
tesseract-ocr-eng install
joebloggs@doddlebonk:~$ dpkg --get-selections | grep gocr
joebloggs@doddlebonk:~$ dpkg --get-selections | grep ocrad
ocrad install

gives us:

ocrad -v -f -F byte -l 0 -o /tmp/sub.JJRKKU0305.pgm.txt /tmp/sub.JJRKKU0305.pgm

Robbie G (robbie-cartwood-nee-grimwood) wrote :

Hi Julien,

The final test is with only tesseract installed, and OGMRip seems not to get round to doing any OCR at all.

I quote the lines of the log file for reference. To the untrained eye it seems to launch straight into the audio decoding:

mplayer -nolirc -nocache -noframedrop -mc 0 -vc null -vo null -ao pcm:waveheader:file=/tmp/fifo.EARFKU -af volnorm=1 -channels 2 -aid 128 -dvd-device /dev/scd0 dvd://1
lame --nohist -h --preset fast medium /tmp/fifo.EARFKU /tmp/audio.L7QFKU
MPlayer 1.0rc2-4.3.2 (C) 2000-2007 MPlayer Team
CPU: Genuine Intel(R) CPU T2050 @ 1.60GHz (Family: 6, Model: 14, Stepping: 8)
CPUflags: MMX: 1 MMX2: 1 3DNow: 0 3DNow2: 0 SSE: 1 SSE2: 1
Compiled with runtime CPU detection.
Terminal type `unknown' is not defined.

Playing dvd://1.
There are 2 titles on this DVD.
There are 1 chapters in this DVD title.
There are 1 angles in this DVD title.
audio stream: 0 format: ac3 (stereo) language: en aid: 128.
number of audio channels on disk: 1.
subtitle ( sid ): 0 language: en
number of subtitles on disk: 1
MPEG-PS file format detected.
VIDEO: MPEG2 720x576 (aspect 3) 25.000 fps 9800.0 kbps (1225.0 kbyte/s)
==========================================================================
Forced video codec: null
Opening video decoder: [null] Null video decoder
VDec: vo config request - 720 x 576 (preferred colorspace: BGR 24-bit)
VDec: using Planar YV12 as output csp (no 0)
Movie-Aspect is 1.78:1 - prescaling to correct movie aspect.
VO: [null] 720x576 => 1024x576 Planar YV12
Selected video codec: [null] vfm: null (NULL codec (no decoding!))
==========================================================================
==========================================================================
Forced audio codec: mad
Opening audio decoder: [liba52] AC3 decoding with liba52
AUDIO: 48000 Hz, 2 ch, s16le, 192.0 kbit/12.50% (ratio: 24000->192000)
Selected audio codec: [a52] afm: liba52 (AC3-liba52)
==========================================================================
[AO PCM] File: /tmp/fifo.EARFKU (WAVE)
PCM: Samplerate: 48000Hz Channels: Stereo Format s16le
[AO PCM] Info: Faster dumping is achieved with -vc null -vo null -ao pcm:fast
[AO PCM] Info: To write WAVE files use -ao pcm:waveheader (default).
AO: [pcm] 48000Hz 2ch s16le (2 bytes per sample)
Starting playback...
A: 0.7 V: 0.0 A-V: 0.678 ct: 0.000 1/ 1 ??% ??% ??,?% 1 0
A: 1.0 V: 0.4 A-V: 0.683 ct: 0.000 2/ 2 ??% ??% ??,?% 2 0
A: 1.4 V: 1.3 A-V: 0.120 ct: 0.000 26/ 26 0% 0% 24.2% 21 0
A: 1.4 V: 1.3 A-V: 0.080 ct: 0.000 27/ 27 0% 0% 23.3% 21 0
A: 1.4 V: 1.4 A-V: 0.040 ct: 0.000 28/ 28 0% 0% 22.4% 21 0
A: 1.7 V: 1.6 A-V: 0.086 ct: 0.000 35/ 35 0% 0% 18.6% 24 0

Hope this helps - I'll be here for further tests if you need me.

Rob

Julien Lavergne (gilir) wrote :

Thanks a lot for all these tests, I'll have a look at it.

Julien Lavergne (gilir) wrote :

I pushed a possible fix to my PPA: ogmrip - 0.12.2-0.0ubuntu2~ppa3 (Intrepid). It should now select the OCR automatically. Could you please test this package?
Thanks.

Robbie G (robbie-cartwood-nee-grimwood) wrote :

Hi, my first test with only tesseract (i.e. no gocr or ocrad) installed results in the following:

1) Tesseract cannot be used as there is no ability to rip srt subtitles.

2) 3 profiles "disappear" from the list of profiles - these are ones that use SRT text as the subtitle decoder. A profile configured with vobsub remains

3) If I add a new profile, the subtitle CODEC selector is blank (screenshot attached)

If I install either gocr or ocrad, the profiles "reappear" and I am able to select a subtitle CODEC in a new profile.

I hope this helps diagnose the problem!

Julien Lavergne (gilir) wrote :

Thanks for the tests.

>1) Tesseract cannot be used as there is no ability to rip srt subtitles.
Is this normal behaviour? If tesseract can't handle srt, then those profiles should indeed not be available, as they need an srt-capable OCR.
In that case, it would be a bug or a feature request for tesseract.
I'll make another test package to try another possibility.

Julien Lavergne (gilir) wrote :

OK, maybe I found the problem. Please try 0.12.2-0.0ubuntu2~ppa5 when it is available in the PPA.

Robbie G (robbie-cartwood-nee-grimwood) wrote :

Thanks Julien, great work - we got there in the end!

The following *correct* behaviour is now seen with OGMrip 0.12.2-0.0ubuntu2~ppa5:

With my last setup i.e. only tesseract installed: tesseract is used

and...

With gocr, ocrad & tesseract installed: tesseract is used
With only gocr & ocrad installed: gocr is used
With only ocrad installed: ocrad is used

i.e. the behaviour as described by the great man himself:

"Comment By: Olivier Rolland (billl)
Date: 2008-07-12 08:45
Since 0.12, ogmrip detects at runtime which OCR are installed and selects
either tesseract, gocr or ocrad, in that order."
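
The preference order quoted above can be sketched as a first-match search over the three programs. This is only an illustration of the described behaviour, not OGMRip's actual code. The pick_ocr function and its is_available parameter are hypothetical, and checking for an executable on PATH is an assumption about how availability might be detected.

```python
import shutil

# Preference order as stated by the upstream developer.
OCR_PREFERENCE = ("tesseract", "gocr", "ocrad")


def pick_ocr(is_available=lambda name: shutil.which(name) is not None):
    """Return the first available OCR program in preference order, or None."""
    for name in OCR_PREFERENCE:
        if is_available(name):
            return name
    return None


# The combinations tested in this comment, simulated with fixed sets
# instead of probing the real system.
for installed in (
    {"tesseract"},
    {"gocr", "ocrad", "tesseract"},
    {"gocr", "ocrad"},
    {"ocrad"},
):
    print(sorted(installed), "->", pick_ocr(lambda name: name in installed))
```

Calling pick_ocr() with no argument probes the current PATH instead of a simulated set.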

Julien Lavergne (gilir) wrote :

Thanks Robbie for the help :)

I included the debdiff to fix it in Jaunty.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ogmrip - 0.12.2-0.0ubuntu3

---------------
ogmrip (0.12.2-0.0ubuntu3) jaunty; urgency=low

  * debian/control:
   - Add libtiff4-dev as Build-Depends to add tesseract support.
  * debian/rules:
   - Add --with-ocr=auto flag to add runtime ocr detection (LP: #247826).

 -- Julien Lavergne <email address hidden> Thu, 27 Nov 2008 20:33:27 +0100

Changed in ogmrip:
status: Confirmed → Fix Released