Some file formats allow extracting when file is not an archive

Bug #803229 reported by Chris Wharton on 2011-06-28
This bug affects 1 person
Affects: Mahara
Importance: Low
Assigned to: Unassigned

Bug Description

Mahara Version affected:
v1.3, 1.4a

When uploading an archive containing .docx and some other file types (e.g. Adobe Captivate), Mahara recognises the file type incorrectly. Mahara allows these files to be unzipped, which breaks them into their individual components (e.g. XML metadata).

When downloading the uploaded file, the browser recognises it as a zip file, but the OS correctly recognises it as a .docx.

Chris Wharton (y-chrisw) wrote :
François Marier (fmarier) wrote :

Interesting, I wonder if that's a bug in the PHP mimetype detection code...

The funny thing is that it's not entirely wrong, these files are zip files after all :)
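
To illustrate why a zip-only check is "not entirely wrong" here, the following is a minimal Python sketch (not Mahara's PHP code). It builds a tiny stand-in for a .docx in memory, since the helper `make_docx_like` and its two parts are assumptions for illustration, and shows that a generic zip test accepts it.

```python
import io
import zipfile


def make_docx_like() -> bytes:
    """Build a tiny stand-in for a .docx: a zip holding OOXML-style parts.

    Hypothetical helper; a real .docx contains many more parts.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("[Content_Types].xml", "<Types/>")
        zf.writestr("word/document.xml", "<document/>")
    return buf.getvalue()


data = make_docx_like()

# A check that only asks "is this a zip?" cannot tell the document
# apart from an ordinary archive.
looks_like_zip = zipfile.is_zipfile(io.BytesIO(data))

# The member names are what hint that this is a document container.
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    members = zf.namelist()

print(looks_like_zip, members)
```

Any detection that stops at the zip signature would therefore offer an "extract" option for such files.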

Changed in mahara:
status: New → Triaged
importance: Undecided → Low

There is a forum post at http://mahara.org/interaction/forum/topic.php?id=3791#post16704 that alludes to the same problem, though she writes about Adobe Captivate files (which I guess are also XML plus content files) and MS Office.

François Marier (fmarier) wrote :

Also mentioned on this forum thread: http://mahara.org/interaction/forum/topic.php?id=4155

Ruslan Kabalin (rkabalin) wrote :

I can't reproduce it. Docx is treated as a normal document file. This might be related to the PHP version or OS. I am on Debian Squeeze, Apache 2.2.16-6+squeeze4, PHP 5.3.3-7+squeeze3

Melissa Draper (melissa) on 2011-11-24
Changed in mahara:
assignee: nobody → Melissa Draper (melissa)
Melissa Draper (melissa) wrote :

I can't reproduce this, however I suspect this patch may fix the issue.

Can someone who can reproduce the issue please test it?

Melissa, this only happens when you upload a zip file which includes docx etc. and then extract the files; the docx is then seen as an archive. It does not happen when you upload a docx on its own.

Changed in mahara:
status: Triaged → In Progress
Melissa Draper (melissa) on 2012-07-02
Changed in mahara:
assignee: Melissa Draper (melissa) → nobody
status: In Progress → Confirmed
awillson (awillson) wrote :

If this bug is still open...

This problem seems to be related to multiple issues:

Mahara (as of 1.6.2) Issue:
1. The docx, pptx, xlsx mime types are not included in the 'artefact_file_mime_types' table in the database.

Web Server Issue:
2. The web server (Apache or IIS) may need to be configured to send appropriate mime types for docx, pptx, xlsx.

Web Browser Issue:
3. IE8 is known to have issues with docx, pptx, xlsx mime types.

awillson (awillson) wrote :

Sorry for the debian distribution addition-deletion. I am using debian, but I'm not using the debian packaged Mahara software. So I don't know if it affects the default debian package.

no longer affects: debian
awillson (awillson) wrote :

I think I found the file that needs updating...

htdocs/artefact/file/filetypes.xml

and add the following for MS OOXML compatibility...

    <filetype>
        <description>docx</description>
        <mimetypes>
            <mimetype>application/vnd.openxmlformats-officedocument.wordprocessingml.document</mimetype>
        </mimetypes>
    </filetype>
    <filetype>
        <description>pptx</description>
        <mimetypes>
            <mimetype>application/vnd.openxmlformats-officedocument.presentationml.presentation</mimetype>
        </mimetypes>
    </filetype>
    <filetype>
        <description>xlsx</description>
        <mimetypes>
            <mimetype>application/vnd.openxmlformats-officedocument.spreadsheetml.sheet</mimetype>
        </mimetypes>
    </filetype>
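
As a rough illustration of how entries like these could be read into a lookup table, here is a small Python sketch. The wrapping `<filetypes>` root element and the function name `load_filetypes` are assumptions made for the example; Mahara's own loader is PHP and is not shown in this report.

```python
import xml.etree.ElementTree as ET

# The three entries proposed above, wrapped in an assumed root element
# so the fragment parses as a single XML document.
FRAGMENT = """
<filetypes>
    <filetype>
        <description>docx</description>
        <mimetypes>
            <mimetype>application/vnd.openxmlformats-officedocument.wordprocessingml.document</mimetype>
        </mimetypes>
    </filetype>
    <filetype>
        <description>pptx</description>
        <mimetypes>
            <mimetype>application/vnd.openxmlformats-officedocument.presentationml.presentation</mimetype>
        </mimetypes>
    </filetype>
    <filetype>
        <description>xlsx</description>
        <mimetypes>
            <mimetype>application/vnd.openxmlformats-officedocument.spreadsheetml.sheet</mimetype>
        </mimetypes>
    </filetype>
</filetypes>
"""


def load_filetypes(xml_text: str) -> dict:
    """Map each <description> to the list of its <mimetype> values."""
    root = ET.fromstring(xml_text.strip())
    mapping = {}
    for filetype in root.iter("filetype"):
        description = filetype.findtext("description")
        mapping[description] = [m.text for m in filetype.iter("mimetype")]
    return mapping


filetypes = load_filetypes(FRAGMENT)
for description, mimes in filetypes.items():
    print(description, mimes)
```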

I can't replicate this on Mahara 1.6 on my local machine.

related: bug #1193158

Robert Lyon (robertl-9) wrote :

This problem might also be related to:

https://bugs.launchpad.net/mahara/+bug/1249166
https://bugs.launchpad.net/mahara/+bug/1249858

In those cases the mimetype is incorrectly set in the browser's registry and Mahara then does not correctly sniff out the true mimetype.
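
A hedged sketch of what content-based sniffing could look like, in Python rather than Mahara's PHP: instead of trusting the type the browser claims, inspect the zip's member names. The function `sniff_mime`, the helper `build_zip`, and the directory-prefix heuristic are all assumptions for illustration, not Mahara's actual detection logic; the MIME types are the OOXML ones quoted earlier in this report.

```python
import io
import zipfile

# Heuristic: OOXML packages keep their main parts under these directories.
OOXML_BY_DIR = {
    "word/": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    "ppt/": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    "xl/": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def sniff_mime(data: bytes, claimed: str) -> str:
    """Guess a MIME type from content, falling back to the claimed type."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return claimed  # not a zip at all, nothing more to inspect here
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
    if "[Content_Types].xml" in names:
        for prefix, mime in OOXML_BY_DIR.items():
            if any(name.startswith(prefix) for name in names):
                return mime
    return "application/zip"  # a zip, but not a recognised document


def build_zip(names) -> bytes:
    """Hypothetical test helper: an in-memory zip with dummy members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, "x")
    return buf.getvalue()


docx_like = build_zip(["[Content_Types].xml", "word/document.xml"])
plain_zip = build_zip(["notes.txt"])

# The browser claims "application/zip" for both uploads.
print(sniff_mime(docx_like, "application/zip"))
print(sniff_mime(plain_zip, "application/zip"))
```

With a check along these lines, only the second upload would be offered for extraction. Formats such as Keynote .key files would need their own markers, which this sketch does not cover.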

Rebecca Blundell (rjb-dev) wrote :

I can't replicate the problem. I tried uploading docx files as part of a .zip and a .tar archive which also contained other file formats, and in both cases Mahara treated the docx correctly as a single file. (Using master branch of Mahara, Firefox Quantum 58.0, PHP 7.1.13)

It's still a problem with .key files, Mac Keynote files, for example.

Setting this to "Won't fix" because it's a minor issue, and someone trying to extract the file in their own files area would see that doing so alters the files.

Changed in mahara:
status: Confirmed → Won't Fix