want a way to mark files as binary even if they look like text (eg pdf files)

Bug #218128 reported by Pekka Jääskeläinen
This bug affects 12 people
Affects   Status     Importance  Assigned to  Milestone
Bazaar    Confirmed  Medium      Unassigned
Breezy    Triaged    Medium      Unassigned

Bug Description

We have a PDF that is detected as a text file and therefore produces quite nasty diffs in commit emails, etc.
This is probably because it has a couple of lines of text at the beginning of the file.

However, it would be useful in general to be able to flag files as binary in Bazaar.

James Westby (james-w)
Changed in bzr:
importance: Undecided → Wishlist
status: New → Confirmed
Casufi (vladimirkotulskiy) wrote :

Is it possible to teach bzr to treat PDF files as binary?

D:\...1C7.7\Bases\Работа\Бухгалтерия_рабочая>bzr diff -r0..1 ClientBank |more
=== added directory 'ClientBank'
=== added file 'ClientBank/!Руководство пользователя.pdf'
--- ClientBank/!Руководство пользователя.pdf 1970-01-01 00:00:00 +0000
+++ ClientBank/!Руководство пользователя.pdf 2009-06-02 10:57:28 +0000
@@ -0,0 +1,3020 @@
%вгПУ-1.2
0000000016 00000 n xref
+0000000811 00000 n
+0000001466 00000 n
+0000001624 00000 n
+0000001802 00000 n
+0000002214 00000 n
+0000002896 00000 n
+0000012793 00000 n
+0000013006 00000 n
+0000013505 00000 n
+0000013720 00000 n
+0000013901 00000 n
+0000033543 00000 n
+0000034338 00000 n
+0000034863 00000 n
+0000047999 00000 n
+0000048205 00000 n
+0000064608 00000 n
+0000065184 00000 n
+0000065538 00000 n
+0000065746 00000 n
+0000000933 00000 n
+0000001444 00000 n
stream446 /Filter /FlateDecode /Length 150 0 R >> 9edcd0ef631130a4fc6e7>]
+H‰b``` ўҐ l ,N ИА
+ eaаpђf ahb pb ЁћФ |шЯC• Љ3 ˜_m TИ3сэ! dбWШ!Бї§еЅЭтЂfЕЉ $S xЉњ ;ЪX…k_q \й“ б )r PVlh ®№фўRС­т#§ЊВ
·-уЄWыд Ћx ‰@ЋinR© Џ–¦ n
ѕуAґ-‹Ѓ§б#Л v €Xџ
 ?Г Ж Ь к)з h8ЬoXМИl$!n!;-№Amюы& ¦ ¦ † ў њ ш j Љ!¶17°3LЉfфLм`Ї#®ЦIЃGЙ]$QЁ дBЕ‰• ЛT›9Љ\%\ @®76

D:\Develop\test>bzr --version
Bazaar (bzr) 1.15
  Python interpreter: C:\Develop\Python25\python.exe 2.5.4
  Python standard library: C:\Develop\Python25\lib
  bzrlib: C:\Develop\Python25\lib\site-packages\bzrlib
  Bazaar configuration: C:\Documents and Settings\Vladimir\Application Data\bazaar\2.0
  Bazaar log file: D:\Docs\.bzr.log

Copyright 2005, 2006, 2007, 2008, 2009 Canonical Ltd.
http://bazaar-vcs.org/

bzr comes with ABSOLUTELY NO WARRANTY. bzr is free software, and
you may use, modify and redistribute it under the terms of the GNU
General Public License version 2 or later.

Martin Pool (mbp)
summary: - some binary files detected as text: provide a way to flag files binary?
+ want a way to mark files as binary even if they look like text
Changed in bzr:
importance: Wishlist → Medium
Martitza (martitzam) wrote : Re: want a way to mark files as binary even if they look like text

I wish to add my voice to this.
I suggest a new configuration file, similar to .bzrignore but called something like .bzrhandlers, which would specify an alternate handler (possibly none!) for filenames matching user-defined patterns:

*.pdf /usr/bin/pdfdiff # a magical pdf diffing tool I hope someone will write. :)
*.odf # empty means do not try to diff
*.mov
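
For illustration only, the proposed format could be read along these lines. This is a minimal sketch: the `.bzrhandlers` file does not exist in Bazaar, and `parse_handlers`, `find_handler`, and the `"text-diff"` default are hypothetical names chosen here. It assumes one glob per line, an optional handler command after it, and `#` starting a comment.

```python
from fnmatch import fnmatchcase

# Example content taken from the proposal above (hypothetical file format).
EXAMPLE = """\
*.pdf /usr/bin/pdfdiff # a magical pdf diffing tool
*.odf # empty means do not try to diff
*.mov
"""


def parse_handlers(text):
    """Return a list of (glob, handler) pairs; handler is None for 'do not diff'."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop a trailing comment
        if not line:
            continue
        parts = line.split(None, 1)  # glob, then the rest of the line
        handler = parts[1].strip() if len(parts) > 1 else None
        rules.append((parts[0], handler))
    return rules


def find_handler(rules, filename, default="text-diff"):
    """Return the handler of the first matching glob, or `default` if none match."""
    for pattern, handler in rules:
        if fnmatchcase(filename, pattern):
            return handler
    return default


rules = parse_handlers(EXAMPLE)
print(find_handler(rules, "manual.pdf"))
print(find_handler(rules, "clip.mov"))
print(find_handler(rules, "README.txt"))
```

A `None` result means the file matched a pattern with an empty handler and should not be diffed. Globs containing spaces or `#` are not handled by this sketch.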

Per Johansson (per.j) wrote :

The best approach IMO would be to support mime-types for individual files, like svn does. That requires file properties though.

Martin Pool (mbp)
summary: - want a way to mark files as binary even if they look like text
+ want a way to mark files as binary even if they look like text (eg pdf
+ files)
Janne Snabb (snabb) wrote :

End-of-line conversion can already be defined in the rules file on a per-glob basis for file names.

Why not just add another option to the same rules file that specifies whether a file should be handled as binary?
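
A rough sketch of what such a lookup might look like follows. The `binary` key is hypothetical and is not an existing Bazaar option; `is_marked_binary` is a name made up for this example. The `[name <glob>]` section layout follows the existing rules file, but the sketch parses it with Python's `configparser` rather than Bazaar's own parser, and it assumes globs within a section header are space-separated.

```python
import configparser
from fnmatch import fnmatchcase

# Hypothetical rules content: "eol" is an existing preference,
# "binary" is the option being proposed in this report.
RULES = """\
[name *.txt]
eol = native

[name *.pdf]
binary = true
"""


def is_marked_binary(rules_text, path):
    """Return True if any matching [name <glob>] section sets binary = true."""
    parser = configparser.ConfigParser()
    parser.read_string(rules_text)
    for section in parser.sections():
        kind, _, globs = section.partition(" ")
        if kind != "name":
            continue  # only file-name based sections are considered here
        if any(fnmatchcase(path, glob) for glob in globs.split()):
            if parser.getboolean(section, "binary", fallback=False):
                return True
    return False


print(is_marked_binary(RULES, "docs/manual.pdf"))
print(is_marked_binary(RULES, "notes.txt"))
```

With a flag like this, diff and commit-email code could skip content detection for matching paths and report them as binary directly.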

Jelmer Vernooij (jelmer)
tags: added: diff
Samuel Bronson (naesten) wrote :

Huh, so the gibberish binary bytes that Adobe suggests including in a comment on the second line aren't enough ... interesting!

I wonder what sort of bytes one would need to put there before bzr *would* be convinced? A NUL, maybe?

Vincent Ladeuil (vila) wrote :

from bzrlib.textfile:

def text_file(input):
    """Produce a file iterator that is guaranteed to be text, without seeking.
    BinaryFile is raised if the file contains a NUL in the first 1024 bytes.
    """
    first_chunk = input.read(1024)
    if '\x00' in first_chunk:
        raise BinaryFile()
    return IterableFile(chain((first_chunk,), file_iterator(input)))
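
To see why a PDF can slip through this check, here is a standalone Python 3 sketch of the same NUL test. It does not use bzrlib; `is_binary_by_nul` is a name invented for this example, and the sample bytes are a made-up PDF-like header, not taken from the file in this report.

```python
import io


def is_binary_by_nul(stream, probe_size=1024):
    """Mimic the check above: binary only if a NUL occurs in the first chunk."""
    first_chunk = stream.read(probe_size)
    return b"\x00" in first_chunk


# A PDF-style header: version line, then a comment line of high-bit bytes
# (the kind of "binary marker" comment mentioned earlier), then text objects.
pdf_like = (
    b"%PDF-1.2\n"
    b"%\xe2\xe3\xcf\xd3\n"
    b"1 0 obj\n<< /Type /Catalog >>\nendobj\n"
)

print(is_binary_by_nul(io.BytesIO(pdf_like)))          # no NUL present
print(is_binary_by_nul(io.BytesIO(b"abc\x00def")))     # NUL present
```

High-bit bytes alone do not trip the check. Only a NUL inside the first 1024 bytes does, so a NUL appearing later in the file also goes unnoticed.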

Per Johansson (per.j) wrote :

The presence of any byte < 0x20 is probably a better indication. If one wants to be extra strict, validity could be checked against the current locale. But that is not a solution to the problem stated in the subject.

Per Johansson (per.j) wrote :

(Excepting NL and CR of course, and perhaps ESC as well.)
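
The heuristic suggested in the two comments above could be sketched as follows. This is not bzrlib code, and `has_control_bytes` is a hypothetical name. NL, CR, and ESC are allowed as suggested; also allowing TAB is an assumption added here, since ordinary text files often contain tabs.

```python
# Control bytes tolerated in text: NL, CR, ESC as suggested in the comments.
SUGGESTED_ALLOWED = frozenset({0x0A, 0x0D, 0x1B})
# Assumption made for this sketch only: TAB is tolerated too.
EXTRA_ALLOWED = frozenset({0x09})


def has_control_bytes(chunk, allowed=SUGGESTED_ALLOWED | EXTRA_ALLOWED):
    """Return True if chunk contains a byte below 0x20 that is not allowed."""
    return any(byte < 0x20 and byte not in allowed for byte in chunk)


print(has_control_bytes(b"plain text\n\tindented\r\n"))
print(has_control_bytes(b"%PDF-1.2\n\x01\x02stream data"))
print(has_control_bytes(b"%PDF-1.2\n%\xe2\xe3\xcf\xd3\n"))
```

Bytes at or above 0x80 are not flagged by this test, so a header made only of high-bit bytes still looks like text. The check helps only if control bytes appear within the inspected chunk.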

Slawomir Wojcieszek (swojcieszek) wrote :

Does anyone know how to solve this problem without a bug fix?

information type: Public → Public Security
information type: Public Security → Public
Jelmer Vernooij (jelmer)
tags: added: check-for-breezy
Jelmer Vernooij (jelmer)
tags: removed: check-for-breezy
tags: added: file-attributes
Changed in brz:
status: New → Triaged
importance: Undecided → Medium