pdf detected as text

Bug #36106 reported by lucia_engel
Affects                    Status        Importance  Assigned to       Milestone
GnomeVFS                   Fix Released  High
shared-mime-info (Ubuntu)  Fix Released  Medium      Sebastien Bacher

Bug Description

Confirmed: real JPEG and PDF files are detected as "plain text files". Not all files of those types are affected. The error is seen in both Nautilus and the terminal.

I just downloaded a .rar archive with jpg files in it. When I tried to open them, an error message

Cannot Open XX.jpg

The filename "XX.jpg" indicates that this file is of type "JPEG image". The content of the file indicate that the file is of type "plain text document". If you open this file, the file might present a security risk to your system.
Do not open the file unless you created the file yourself, or received the file from a trusted source. To open the file, rename the file to the correct extension for "plain text document", then open the file normally. Alternatively, use the Open With menu to choose a specific application for the file.

pops up. I'm sure these aren't plain text files because I can open them as pictures in WinXP just fine. I tried using Open With and changing the extension, but neither works. I saw this error for some PDF files I downloaded yesterday too. I thought they were simply corrupted and deleted them. Now I know this isn't an isolated case.

I tried opening the files from both Nautilus and the terminal, but neither recognizes them.

I searched the bug tracker and found three similar bugs. #35813, #6031 and #23272 show the same error message, though two of them are claimed to be either external (Scribus) or solved (in Dapper).

A picture of the bug can be seen here http://arvindn.livejournal.com/15761.html

description: updated
summary: + Confirmed real JPEG and PDF files detected as "plain text files". Not
+ all files of those type are affected. Error seen in both nautilus and
+ terminal.
Sebastien Bacher (seb128) wrote :

Thanks for your bug report. Which version of Ubuntu do you use? Could you attach a .jpg that triggers the issue for you?

Changed in nautilus:
assignee: nobody → desktop-bugs
status: Unconfirmed → Needs Info
Marcel Stimberg (marcelstimberg) wrote :

I can reproduce this problem for some PDF files (files are recognized as "text/plain" but can be opened with evince/xpdf/acroread). I'm not sure whether this plays a role or not but all PDF documents that are not recognized by nautilus (no problem using the file command) are PDF 1.4 documents. However, there are PDF 1.4 documents that are correctly recognized.
My gnome is up-to-date:
libgnome2-vfs 2.14.0-0ubuntu1
nautilus 2.14.0-0ubuntu3

Unfortunately I don't have a small file to attach, but the PDF manual from the pgf package is an example of an incorrectly recognized file. After installing pgf it can be found at

/usr/share/doc/texmf/latex/pgf/version-for-pdftex/en/pgfmanual.pdf.gz

"gunzipping" gives a pdf file recognized as "text/plain".

lucia_engel (lucia-engel) wrote : Screenshot when opening JPG

Opening from Nautilus.
This only happened with the JPGs in this particular RAR archive. No other JPG files are affected. This can be opened in WinXP Home SP2 with no problem.

lucia_engel (lucia-engel) wrote : Terminal output from 7zip

This happened when I tried to extract the RAR archive.

lucia_engel (lucia-engel) wrote : Terminal output from eog

This happened when I tried to use eog to open one of the JPGs.

lucia_engel (lucia-engel) wrote : Re: Non-text files reported as "plain text document"

My Ubuntu version
Breezy Badger 5.10
Other
kernel: linux 686 2.6.12.16.1
nautilus: 2.12.1-0ubuntu1.2
7zip: pzip 4.20-1
eog: 2.13.2-0ubuntu1~b
terminal: gnome-terminal 2.12.0-ubuntu2

File output
smrtalec@ubuntu:/media/hda6/NANA 01_13$ file NaNa_12_102.jpg
NaNa_12_102.jpg: empty
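
Editorial note: the `file` output "empty" suggests this particular JPEG is a zero-byte file, which would point to a failed extraction rather than a MIME misdetection. The following is a minimal sketch, not part of the original report, for checking that by hand. The function name `sniff` is hypothetical, and only the well-known leading signatures of JPEG (FF D8 FF) and PDF ("%PDF-") are tested.

```python
import os

# Well-known leading byte signatures; only these two formats are checked.
SIGNATURES = {
    "image/jpeg": b"\xff\xd8\xff",
    "application/pdf": b"%PDF-",
}


def sniff(path):
    """Return 'empty', a MIME type from SIGNATURES, or 'unknown'."""
    if os.path.getsize(path) == 0:
        # A zero-byte file has no content to detect at all.
        return "empty"
    with open(path, "rb") as fh:
        head = fh.read(8)
    for mime, magic in SIGNATURES.items():
        if head.startswith(magic):
            return mime
    return "unknown"
```

If `sniff` reports "empty" for a file, the archive extraction is the more likely culprit and the desktop's type detection is not at fault for that file.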

Sebastien Bacher (seb128) wrote :

Could you attach an example (i.e. the actual .jpg) to the bug so we can work with it?

The pdf is fixed locally, I'll upload that change after flight-6

Sebastien Bacher (seb128) wrote :

In fact the fix is not correct. I've forwarded the issue upstream: http://bugzilla.gnome.org/show_bug.cgi?id=336633

Carthik Sharma (carthik) wrote :

Please see Bug #38195 for a sample pdf document and messages from the other reporter.

Changed in nautilus:
status: Needs Info → Confirmed
lucia_engel (lucia-engel) wrote : .ram file same problem

I deleted the problematic .jpg file already, but I just saved this on my desktop, and when I opened it, it had that same message.

Sebastien Bacher (seb128) wrote : Re: Non-text files reported as "plain text document"

It's not easy to keep track of different issues in the same bug. I'm restricting this one to the PDF issue. Please open separate bugs for issues you have with formats other than .pdf.

Sebastien Bacher (seb128) wrote :

That's a shared-mime-info issue. I'll fix it with the next upload of that package.

Changed in gnome-vfs2:
assignee: desktop-bugs → seb128
In , Allison Karlitskaya (desrt) wrote :

the current shared-mime-info (0.17-0ubuntu7) contains 3 possible matches (all at
the same priority, 50) for files starting with "%"

TEX
matlab
PDF

this causes PDF files to have their type detected as octet-stream.

The TeX/matlab matches should have their priority lowered so that the more
specific "%PDF-" string will match with higher priority.

TeX and matlab having the same magic detection string seems vaguely useless,
though.....
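
Editorial note: the conflict described above can be illustrated with a minimal sketch. This is not the actual gnome-vfs or xdgmime matching code. The `detect` function, the rule tuples and the lowered priority value of 10 are assumptions chosen for illustration, and the thread does not say which value the real fix uses. A file starting with "%PDF-" also starts with "%", so all three rules match it, and with equal priorities there is no single winner.

```python
def detect(data, rules):
    """Return the matching MIME types that share the highest priority.

    rules is a list of (mime_type, priority, prefix) tuples.
    More than one returned type means the match is ambiguous.
    """
    hits = [(prio, mime) for mime, prio, prefix in rules
            if data.startswith(prefix)]
    if not hits:
        return []
    top = max(prio for prio, _ in hits)
    return sorted(mime for prio, mime in hits if prio == top)


# Three rules at the same priority, as described in the comment above.
BEFORE = [
    ("text/x-tex", 50, b"%"),
    ("text/x-matlab", 50, b"%"),
    ("application/pdf", 50, b"%PDF-"),
]

# Same rules with the generic "%" matches lowered (10 is an assumed value).
AFTER = [(mime, 10 if prefix == b"%" else prio, prefix)
         for mime, prio, prefix in BEFORE]

print(detect(b"%PDF-1.4\n", BEFORE))  # three candidates: ambiguous
print(detect(b"%PDF-1.4\n", AFTER))   # only application/pdf remains
```

Raising the PDF priority instead of lowering the other two would resolve the tie in the same way in this sketch.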

In , Allison Karlitskaya (desrt) wrote :

Note: postscript starts like "%!PS-Adobe-2.0" and is similarly broken by this bug.

Sebastien Bacher (seb128) wrote :

This upload fixes the issue:

 shared-mime-info (0.17-0ubuntu8) dapper; urgency=low
 .
   * debian/patches/190_pdf_conflict_fix.patch:
     - lower the priority of matlab and tex matching on "%" so they don't
       conflict with %PDF by example (Ubuntu: #36106)

Changed in shared-mime-info:
status: Confirmed → Fix Released
In , Gemi-bluewin (gemi-bluewin) wrote :

I can confirm this with shared-mime-info-0.17-1.fc5.1,
on Fedora Core 5. Some PDFs are correctly identified,
others as text. I increased the priority for PDF from 50 to 60,
now all PDFs are correctly identified by nautilus.

In , bernhard kleine (bbfk) wrote :

(In reply to comment #2)
> I can confirm this with shared-mime-info-0.17-1.fc5.1,
> on Fedora Core 5. Some PDFs are correctly identified,
> others as text. I increased the priority for PDF from 50 to 60,
> now all PDFs are correctly identified by nautilus.

Could you please tell me where to increase the priority? I have serious
trouble sending PDFs by email because they are not properly identified.

Bernhard Kleine
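
Editorial note: as a hedged pointer for the question above, the priority is an attribute on the `<magic>` element of the MIME type's definition in the shared-mime-info XML source. The sketch below lists the magic priorities declared for one type. The helper name `magic_priorities` and the `SAMPLE` document are made up for illustration, and the sample is a reduced stand-in for the real definitions, not a copy of them.

```python
import xml.etree.ElementTree as ET

# XML namespace used by shared-mime-info definition files.
NS = {"mime": "http://www.freedesktop.org/standards/shared-mime-info"}


def magic_priorities(xml_text, mime_type):
    """Return the priority of each <magic> block declared for mime_type."""
    root = ET.fromstring(xml_text)
    result = []
    for node in root.findall("mime:mime-type", NS):
        if node.get("type") == mime_type:
            for magic in node.findall("mime:magic", NS):
                # A missing priority attribute defaults to 50.
                result.append(int(magic.get("priority", "50")))
    return result


# Reduced, illustrative stand-in for a real definition file.
SAMPLE = """<?xml version="1.0"?>
<mime-info xmlns="http://www.freedesktop.org/standards/shared-mime-info">
  <mime-type type="application/pdf">
    <magic priority="50">
      <match value="%PDF-" type="string" offset="0"/>
    </magic>
  </mime-type>
</mime-info>"""

print(magic_priorities(SAMPLE, "application/pdf"))
```

On many systems the definitions are installed as /usr/share/mime/packages/freedesktop.org.xml, though the location may differ by distribution. After changing a priority there, the cache normally has to be regenerated with `update-mime-database /usr/share/mime` before applications see the change.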

In , Freedesktop-tevp (freedesktop-tevp) wrote :

Confirm on Debian unstable (shared-mime-info 0.17-1 package). Changing the glob
priority of PDF from 50 to 60 fixes the problem. Any progress on getting this
actually noticed by someone with commit privs?

In , Alex L (alexl-users) wrote :

Here's the downstream report on Fedora Core 5:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=193582

Interestingly enough 0.17-1.fc5.1 was supposed to fix a postscript vs. matlab
problem:

* Wed Mar 22 2006 Matthias Clasen <email address hidden> - 0.17-1.fc5.1
- Backport upstream change to fix postscript vs. matlab confusion

In , Joachim Frieben (jfrieben) wrote :

The issue is also solved by upgrading "shared-mime-info" to current "CVS",
as I had pointed out in:

  https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=190858#c6

The "RPM" package built accordingly on 05/25/06 behaves correctly.

In , Alex L (alexl-users) wrote :

(In reply to comment #6)
> The issue is also solved by upgrading "shared-mime-info" to current "CVS",
> as I had pointed out in:

Since it looks like it's been fixed in freedesktop's CVS, by this change:

http://webcvs.freedesktop.org/mime/shared-mime-info/freedesktop.org.xml.in?r1=1.138&r2=1.139

it should be marked closed.

Changed in gnome-vfs:
status: Confirmed → Fix Released
Changed in gnome-vfs:
importance: Unknown → High
Changed in gnome-vfs:
importance: High → Unknown
Changed in gnome-vfs:
importance: Unknown → High