file(1) no longer recognises XML content

Bug #285031 reported by Gavin Panella
This bug affects 2 people
Affects        Status        Importance  Assigned to  Milestone
file (Debian)  Fix Released  Unknown
file (Ubuntu)  Fix Released  Medium      Unassigned

Bug Description

Binary package hint: file

1) The release of Ubuntu you are using, via 'lsb_release -rd' or
   System -> About Ubuntu.

Intrepid

2) The version of the package you are using, via 'apt-cache policy
   packagename' or by checking in Synaptic.

4.24-4

3) What you expected to happen

That `file --mime-type some.xml` (where some.xml is any file
containing valid XML) should return either text/xml, application/xml,
or something more specific (like application/atom+xml for example).

4) What happened instead

In most cases only text/plain is returned. In other cases a wrong
specific type is reported: /usr/share/mime/text/x-haskell.xml comes
back as text/x-pascal, and
/usr/share/xml/docbook/schema/dtd/catalog.xml as text/html.

I ran:

  locate *.xml | xargs file --brief --mime-type |
    egrep '(application|text)/xml'

and got no output at all. The `locate` command found about 28000
files.
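
A tally of every reported type can be more informative than filtering
for the two expected ones. The following is a minimal sketch, assuming
GNU find and xargs (for -print0, -0 and -r). The function name
xml_mime_tally is hypothetical, and it walks a directory with find
rather than querying the locate database.

```shell
#!/bin/sh
# xml_mime_tally is a hypothetical helper, not part of file(1).
# It counts the MIME types that file(1) reports for *.xml files under a directory.
xml_mime_tally() {
    dir=${1:-.}
    # The quoted pattern stops the shell from expanding *.xml itself;
    # -print0 / -0 keep file names containing spaces or newlines intact.
    find "$dir" -type f -name '*.xml' -print0 |
        xargs -0 -r file --brief --mime-type |
        sort | uniq -c | sort -rn
}
```

Calling `xml_mime_tally /usr/share` prints one line per reported type,
most frequent first, so a missing application/xml or text/xml entry is
visible at a glance.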

Brian Murray (brian-murray) wrote :

I've confirmed this using file version 4.24-4 on Intrepid.

Changed in file:
importance: Undecided → Medium
status: New → Triaged
Brian Murray (brian-murray) wrote :

Additionally, using file version 4.21-3ubuntu1 on Hardy, I saw this:

bdmurray@bizarro:~$ file -i /usr/share/mediatomb/web/cm.xml
/usr/share/mediatomb/web/cm.xml: text/xml

While on Intrepid:

13:04:03 - flash:[~] file -i /usr/share/mediatomb/web/cm.xml
/usr/share/mediatomb/web/cm.xml: text/plain charset=us-ascii

So this seems to be a regression.

Gavin Panella (allenap) wrote :

This is fixed in Debian, package version 4.26-1.

Colin Watson (cjwatson) wrote :

Confirmed fixed. The diff from 4.24-4 to 4.26-1 includes this chunk:

@@ -25,14 +31,23 @@
 # Extensible markup language (XML), a subset of SGML
 # from Marc Prud'hommeaux (<email address hidden>)
 0 search/1/cb \<?xml XML document text
+!:mime application/xml
 0 string \<?xml\ version\ " XML
+!:mime application/xml
 0 string \<?xml\ version=" XML
+!:mime application/xml
+>15 search/1 >\0 %.3s document text
+>>23 search/1 \<xsl:stylesheet (XSL stylesheet)
+>>24 search/1 \<xsl:stylesheet (XSL stylesheet)
 0 string \<?xml\ version=' XML
+!:mime application/xml
 >15 search/1 >\0 %.3s document text
 >>23 search/1 \<xsl:stylesheet (XSL stylesheet)
 >>24 search/1 \<xsl:stylesheet (XSL stylesheet)
 0 search/1/b \<?xml XML document text
+!:mime application/xml
 0 search/1/b \<?XML broken XML document text
+!:mime application/xml

 # SGML, mostly from rph@sq
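
One way to try out a rule like those in the diff is to load it from a
private magic file with file's -m option. This is only a sketch,
assuming a file(1) that understands the !:mime annotation. The helper
name check_xml_rule and the temporary files are hypothetical, and only
one rule from the diff is reproduced.

```shell
#!/bin/sh
# check_xml_rule is a hypothetical helper: it writes one rule from the diff
# to a private magic file and asks file(1) to use only that file.
check_xml_rule() {
    work=$(mktemp -d) || return 1
    # The quoted heredoc delimiter keeps the backslashes in the rule literal.
    cat > "$work/xml.magic" <<'EOF'
0 string \<?xml\ version=" XML document text
!:mime application/xml
EOF
    # A tiny sample document that starts with an XML declaration.
    printf '<?xml version="1.0"?>\n<root/>\n' > "$work/sample.xml"
    # -m makes file(1) use the given magic file instead of the default database.
    file -m "$work/xml.magic" --brief --mime-type "$work/sample.xml"
    status=$?
    rm -rf "$work"
    return "$status"
}
```

Running `check_xml_rule` prints the MIME type that the single rule
yields for the sample document, which makes it easy to compare the
behaviour with and without the `!:mime` line.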

Changed in file (Ubuntu):
status: Triaged → Fix Released
Changed in file (Debian):
status: Unknown → Fix Released