Provide plugin api to provide text and images for binary files for diffs

Bug #273701 reported by Alexander Belchenko
Affects: Bazaar
Status: Confirmed
Importance: Low
Assigned to: Unassigned

Bug Description

There is a docdiff plugin from HAYASHI Kentaro:
http://gigo-ice.org/scm/bazaar/plugins/docdiff.html

This plugin uses the xdoc2txt utility (free, not open source, but it is only invoked via the command line) to convert binary documents, e.g. MS Word or Adobe PDF, to plain text and then diff them. xdoc2txt seems mature enough and simple to use. We could fall back to it when qdiff encounters supported documents.

This is an idea for thinking and experimenting. We need to check how xdoc2txt handles non-ASCII characters in documents.
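
The convert-then-diff idea can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: `converter_argv` stands for whatever command line turns a document into text on stdout (the actual xdoc2txt options are not shown here and would need checking), and the function names are hypothetical, not taken from the docdiff plugin or qbzr.

```python
import difflib
import subprocess


def to_text(converter_argv, path, encoding):
    """Run a command-line converter on `path` and decode what it prints.

    Assumption: the converter writes the extracted text to stdout.
    """
    raw = subprocess.run(
        converter_argv + [path], check=True, stdout=subprocess.PIPE
    ).stdout
    # Undecodable bytes are replaced rather than aborting the whole diff.
    return raw.decode(encoding, errors="replace")


def diff_documents(converter_argv, old_path, new_path, encoding="utf-8"):
    """Return a unified diff of the text extracted from two documents."""
    old_lines = to_text(converter_argv, old_path, encoding).splitlines(keepends=True)
    new_lines = to_text(converter_argv, new_path, encoding).splitlines(keepends=True)
    return "".join(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_path, tofile=new_path)
    )
```

A caller would pass the converter's command line as a list, plus the encoding that the converter is known to emit.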

Alexander Belchenko (bialix) wrote :
Changed in qbzr:
importance: Undecided → Wishlist
status: New → Incomplete
Kentaro Hayashi (kenhys) wrote :

You can choose the xdoc2txt.exe output encoding from three options:
1. Shift_JIS, 2. EUC-JP, 3. JIS.
So it only solves viewing diff content in Japanese or ASCII-only environments.

Alexander Belchenko (bialix) wrote : Re: [Bug 273701] Re: using xdoc2txt for diffing binary document format files

HAYASHI Kentaro wrote:
> You can choose the xdoc2txt.exe output encoding from three options:
> 1. Shift_JIS, 2. EUC-JP, 3. JIS.
> So it only solves viewing diff content in Japanese or ASCII-only environments.
>
Thanks.

I did some experiments with a Word document that contains Russian characters,
and with the default xdoc2txt options it outputs text in cp1251 (the default
Russian encoding on Windows). So it should work for non-Japanese text too.

But Excel documents do not work that way, i.e. I see the numbers, but not the text.
Strange.
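
Since the output encoding apparently depends on the platform and options, a caller would have to decode the converter's bytes defensively. The following is a rough sketch, not anything xdoc2txt or bzr provides: the function name and the default candidate list are assumptions, and trying codecs in order is only a heuristic (single-byte codecs such as cp1251 accept almost any input, so they belong last).

```python
def decode_converter_output(raw, encodings=("utf-8", "cp1251")):
    """Return (text, encoding) using the first codec that decodes `raw` cleanly."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, but mark undecodable bytes visibly.
    return raw.decode(encodings[-1], errors="replace"), encodings[-1]
```

For Japanese documents the candidate list could instead be something like `("shift_jis", "euc_jp", "iso2022_jp")`, matching the three output encodings mentioned above.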

Gary van der Merwe (garyvdm) wrote :

I think the way to do this would be to have a plugin API in bzr that allows a plugin to provide a text or image representation for supported binary files. The image part would only be used by qdiff, and maybe gdiff, but it should be supported in the core plugin API.

Then it would be easy to write a plugin that uses xdoc2txt.
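
To make the proposal concrete, here is one possible shape for such an API, sketched as a small registry keyed by representation kind and file extension. Everything in it (`register_converter`, `representation_for`, the "text"/"image" kinds) is hypothetical and does not exist in bzr; it only illustrates the idea.

```python
import os

# kind -> {lowercase extension -> callable taking bytes}
_converters = {"text": {}, "image": {}}


def register_converter(kind, extension, func):
    """Let a plugin register `func` for files with the given extension."""
    if kind not in _converters:
        raise ValueError("unknown representation kind: %r" % (kind,))
    _converters[kind][extension.lower()] = func


def representation_for(kind, path, data):
    """Return the registered representation of `data`, or None if no plugin handles it."""
    ext = os.path.splitext(path)[1].lower()
    func = _converters[kind].get(ext)
    return None if func is None else func(data)
```

A diff front end would ask for the "text" representation of both versions and fall back to the usual "binary files differ" message when it gets None; a graphical front end could additionally ask for "image".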

summary: - using xdoc2txt for diffing binary document format files
+ Provide plugin api to provide text and images for binary files for diffs
affects: qbzr → bzr
Changed in bzr:
importance: Wishlist → Low
status: Incomplete → Confirmed
Alexander Belchenko (bialix) wrote :

Gary, I think bzr itself should have a way to specify the diff tool/algorithm for particular types of files.

E.g. just today someone from the ru_bzr group asked about writing a plugin to diff binary files specific to a Russian accounting application called 1C:Enterprise. There are third-party tools that do so, but currently it is impossible(?) to teach bzr to use extra tools for producing diffs of *.mdf files etc.
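
A per-file-type tool selection could look roughly like the sketch below: a list of filename patterns mapped to external commands. The rule table, the tool names (`mdf-diff-tool`, `doc-diff-tool`) and both functions are made up for illustration; bzr has no such configuration as far as this report states.

```python
import fnmatch

# Hypothetical rules: (lowercase filename pattern, external diff command).
DIFF_TOOLS = [
    ("*.mdf", ["mdf-diff-tool"]),
    ("*.doc", ["doc-diff-tool"]),
]


def pick_diff_tool(filename, rules=DIFF_TOOLS):
    """Return the command for the first pattern matching `filename`, else None."""
    for pattern, argv in rules:
        if fnmatch.fnmatch(filename.lower(), pattern):
            return argv
    return None


def diff_command(filename, old_path, new_path, rules=DIFF_TOOLS):
    """Build the full command line for diffing two versions of `filename`."""
    argv = pick_diff_tool(filename, rules)
    if argv is None:
        return None  # caller falls back to the built-in diff
    return argv + [old_path, new_path]
```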

Jelmer Vernooij (jelmer)
tags: added: check-for-breezy