MS Office files being seen as zip archives

Bug #1302251 reported by Aaron Wells on 2014-04-03
Affects    Importance    Assigned to
Mahara     High          Robert Lyon
1.6        High          Robert Lyon
1.7        High          Robert Lyon
1.8        High          Robert Lyon
1.9        High          Robert Lyon

Bug Description

Now that file_mime_type() works correctly with the PHP fileinfo library, a new problem has surfaced: fileinfo recognizes Microsoft Office 2007+ "docx" files as zip archives! (A .docx file is a ZIP container internally, which is why content-based detection reports it that way.)

So when users upload a .docx file into Mahara, they see a ZIP icon and are offered the option to decompress the archive, which they should not be.

Aaron Wells (u-aaronw) wrote :

The current workaround is to disable fileinfo. You can do this by setting $cfg->pathtomagicdb to boolean false. However, if the mime_content_type() PHP function is available, Mahara will fall back to that, and it also identifies docx files as ZIP files. So what you have to do instead is set an invalid pathtomagicdb value. This will cause Mahara to attempt to use fileinfo, fail, and fall back to identifying the filetype based on its file extension.

So to do that, you would put something like this in your config.php:

 $cfg->pathtomagicdb = '/dev/null';

Aaron Wells (u-aaronw) wrote :

What should we do to fix this? Well, probably the best thing is to just copy the Moodle approach. They only use the file extension to identify files, and they have a pretty large list of known file types. Additionally, we could make this user-extensible, allowing sites to identify other types of obscure or unusual files that their students are uploading.

My only worry is whether this might have any security ramifications. But I think we're pretty safe, because we only serve content back out as a limited number of mimetypes. Additionally, we provide the option to pass file uploads through ClamAV, which should pick up any malicious file uploads.

If trusting the file extension is in general a security issue, then what we could do is just have a list of extension-based exceptions. For instance, if the mimetype detected is zip, then we check the file extension and see that a zip that ends in .docx should be treated as a Word document rather than a zip file.
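
To make the exception idea concrete, here is a minimal sketch in PHP 7+ syntax. The function name refine_zip_mimetype() and the three-entry map are hypothetical and only illustrate the approach; they are not Mahara's actual code.

```php
<?php
// Hypothetical helper for illustration only; not part of Mahara.
// If content sniffing reported a ZIP archive, consult the file extension
// and substitute a more specific type for known ZIP-based formats.
function refine_zip_mimetype(string $detected, string $filename): string {
    // Office Open XML files are ZIP containers, so sniffing may report them as ZIP.
    static $zipbased = [
        'docx' => 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'xlsx' => 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
        'pptx' => 'application/vnd.openxmlformats-officedocument.presentationml.presentation',
    ];

    // Only the ZIP result is second-guessed; every other detected type is trusted.
    if ($detected !== 'application/zip') {
        return $detected;
    }

    $ext = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return $zipbased[$ext] ?? $detected;
}
```

With this shape, the extension is trusted only to narrow down a result that content detection already classified as ZIP, so a file whose content is detected as something else keeps its detected type regardless of its name.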

Mahara Bot (dev-mahara) wrote :

Patch for "master" branch: https://reviews.mahara.org/3212

Reviewed: https://reviews.mahara.org/3211
Committed: http://gitorious.org/mahara/mahara/commit/122968ddc09e1bcc98b3d68798a29ae629f8356c
Submitter: Son Nguyen (<email address hidden>)
Branch: master

commit 122968ddc09e1bcc98b3d68798a29ae629f8356c
Author: Robert Lyon <email address hidden>
Date: Tue Apr 8 13:38:07 2014 +1200

Allowing the .zip files to be detected by file extension (Bug #1302251)

A number of filetypes are being detected as a zip file, due to them
having compression.

To sort this out we will let the file extension be checked first then
fall back to checking magic.mgc.

Change-Id: Iab6ddbc17af4018cf381c4949172804acff49483
Signed-off-by: Robert Lyon <email address hidden>
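
The ordering described in this commit message (file extension first, magic database second) could look roughly like the sketch below. It is written in PHP 7+ syntax, and guess_mimetype(), its parameters, and the short extension map are assumptions made for illustration; the real change lives in the commit linked above.

```php
<?php
// Hypothetical sketch of "extension first, then fileinfo"; not the committed code.
function guess_mimetype(string $path, string $originalname): string {
    // Illustrative subset only; a real list would be much longer.
    static $byextension = [
        'docx' => 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
        'pdf'  => 'application/pdf',
        'zip'  => 'application/zip',
    ];

    // Step 1: a known extension wins outright.
    $ext = strtolower(pathinfo($originalname, PATHINFO_EXTENSION));
    if (isset($byextension[$ext])) {
        return $byextension[$ext];
    }

    // Step 2: unknown extension, so fall back to content sniffing
    // through the fileinfo extension when it is installed.
    if (class_exists('finfo') && is_file($path)) {
        $finfo = new finfo(FILEINFO_MIME_TYPE);
        $type = $finfo->file($path);
        if (is_string($type) && $type !== '') {
            return $type;
        }
    }

    // Step 3: nothing matched; report a generic binary type.
    return 'application/octet-stream';
}
```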

Mahara Bot (dev-mahara) wrote :

Reviewed: https://reviews.mahara.org/3212
Committed: http://gitorious.org/mahara/mahara/commit/5db61d74a2fb4e0410f5262299d3fe5b0d8553f1
Submitter: Robert Lyon (<email address hidden>)
Branch: master

commit 5db61d74a2fb4e0410f5262299d3fe5b0d8553f1
Author: Robert Lyon <email address hidden>
Date: Wed Apr 9 07:26:55 2014 +1200

Adding in more mimetypes for fuller list (Bug #1302251)

Got the extra mimetypes from the list that moodle uses

Change-Id: I0794df03a7742226e0520aff5bdb86519a3dd67b
Signed-off-by: Robert Lyon <email address hidden>

Robert Lyon (robertl-9) on 2014-04-14
Changed in mahara:
assignee: nobody → Robert Lyon (robertl-9)

Reviewed: https://reviews.mahara.org/3247
Committed: http://gitorious.org/mahara/mahara/commit/4668f108120748f3a89ab9a091dce1fd71ada897
Submitter: Robert Lyon (<email address hidden>)
Branch: 1.6_STABLE

commit 4668f108120748f3a89ab9a091dce1fd71ada897
Author: Robert Lyon <email address hidden>
Date: Tue Apr 8 13:38:07 2014 +1200

Allowing the .zip files to be detected by file extension (Bug #1302251)

A number of filetypes are being detected as a zip file, due to them
having compression.

To sort this out we will let the file extension be checked first then
fall back to checking magic.mgc.

Change-Id: Iab6ddbc17af4018cf381c4949172804acff49483
Signed-off-by: Robert Lyon <email address hidden>

Mahara Bot (dev-mahara) wrote :

Patch for "1.8_STABLE" branch: https://reviews.mahara.org/3249

Mahara Bot (dev-mahara) wrote :

Patch for "1.8_STABLE" branch: https://reviews.mahara.org/3250

Mahara Bot (dev-mahara) wrote :

Patch for "1.9_STABLE" branch: https://reviews.mahara.org/3252

Robert Lyon (robertl-9) on 2014-04-14
Changed in mahara:
milestone: 1.9.0 → 1.10.0
no longer affects: mahara/trunk

Reviewed: https://reviews.mahara.org/3248
Committed: http://gitorious.org/mahara/mahara/commit/3d90c60f764661133a8773d565b10cb0b7155d3f
Submitter: Robert Lyon (<email address hidden>)
Branch: 1.7_STABLE

commit 3d90c60f764661133a8773d565b10cb0b7155d3f
Author: Robert Lyon <email address hidden>
Date: Tue Apr 8 13:38:07 2014 +1200

Allowing the .zip files to be detected by file extension (Bug #1302251)

A number of filetypes are being detected as a zip file, due to them
having compression.

To sort this out we will let the file extension be checked first then
fall back to checking magic.mgc.

Change-Id: Iab6ddbc17af4018cf381c4949172804acff49483
Signed-off-by: Robert Lyon <email address hidden>

Mahara Bot (dev-mahara) wrote :

Reviewed: https://reviews.mahara.org/3249
Committed: http://gitorious.org/mahara/mahara/commit/6343716953f4fcc8c7d2994b61d4258a83fe4a51
Submitter: Robert Lyon (<email address hidden>)
Branch: 1.8_STABLE

commit 6343716953f4fcc8c7d2994b61d4258a83fe4a51
Author: Robert Lyon <email address hidden>
Date: Tue Apr 8 13:38:07 2014 +1200

Allowing the .zip files to be detected by file extension (Bug #1302251)

A number of filetypes are being detected as a zip file, due to them
having compression.

To sort this out we will let the file extension be checked first then
fall back to checking magic.mgc.

Change-Id: Iab6ddbc17af4018cf381c4949172804acff49483
Signed-off-by: Robert Lyon <email address hidden>

Mahara Bot (dev-mahara) wrote :

Reviewed: https://reviews.mahara.org/3250
Committed: http://gitorious.org/mahara/mahara/commit/096928e4314b659edee2e40a8166f865df137da3
Submitter: Robert Lyon (<email address hidden>)
Branch: 1.8_STABLE

commit 096928e4314b659edee2e40a8166f865df137da3
Author: Robert Lyon <email address hidden>
Date: Wed Apr 9 07:26:55 2014 +1200

Adding in more mimetypes for fuller list (Bug #1302251)

Got the extra mimetypes from the list that moodle uses

Change-Id: I0794df03a7742226e0520aff5bdb86519a3dd67b
Signed-off-by: Robert Lyon <email address hidden>

Mahara Bot (dev-mahara) wrote :

Reviewed: https://reviews.mahara.org/3251
Committed: http://gitorious.org/mahara/mahara/commit/ee2365125e11ea34f69c40643556899b6c9ff22c
Submitter: Robert Lyon (<email address hidden>)
Branch: 1.9_STABLE

commit ee2365125e11ea34f69c40643556899b6c9ff22c
Author: Robert Lyon <email address hidden>
Date: Tue Apr 8 13:38:07 2014 +1200

Allowing the .zip files to be detected by file extension (Bug #1302251)

A number of filetypes are being detected as a zip file, due to them
having compression.

To sort this out we will let the file extension be checked first then
fall back to checking magic.mgc.

Change-Id: Iab6ddbc17af4018cf381c4949172804acff49483
Signed-off-by: Robert Lyon <email address hidden>

Mahara Bot (dev-mahara) wrote :

Reviewed: https://reviews.mahara.org/3252
Committed: http://gitorious.org/mahara/mahara/commit/dd5112b11f2d6910f3e61ca357f0abed22c4c6fb
Submitter: Robert Lyon (<email address hidden>)
Branch: 1.9_STABLE

commit dd5112b11f2d6910f3e61ca357f0abed22c4c6fb
Author: Robert Lyon <email address hidden>
Date: Wed Apr 9 07:26:55 2014 +1200

Adding in more mimetypes for fuller list (Bug #1302251)

Got the extra mimetypes from the list that moodle uses

Change-Id: I0794df03a7742226e0520aff5bdb86519a3dd67b
Signed-off-by: Robert Lyon <email address hidden>

Robert Lyon (robertl-9) on 2014-04-14
Changed in mahara:
status: Confirmed → Fix Committed

Reviewed: https://reviews.mahara.org/3253
Committed: http://gitorious.org/mahara/mahara/commit/b6a4a06e16db5aff6f0eca6a304f8e44e6f1417a
Submitter: Robert Lyon (<email address hidden>)
Branch: 1.7_STABLE

commit b6a4a06e16db5aff6f0eca6a304f8e44e6f1417a
Author: Robert Lyon <email address hidden>
Date: Wed Apr 9 07:26:55 2014 +1200

Adding in more mimetypes for fuller list (Bug #1302251)

Got the extra mimetypes from the list that moodle uses

Change-Id: I0794df03a7742226e0520aff5bdb86519a3dd67b
Signed-off-by: Robert Lyon <email address hidden>

Son Nguyen (ngson2000) wrote :

Here is the list of file extensions for MS Office documents

Word: doc, dot, docx, docm, dotx, dotm
Excel: xls, xlt, xlm, xlsx, xlsm, xltx, xltm, xlsb, xla, xlam, xll, xlw
PowerPoint: ppt, pot, pps, pptx, pptm, potx, potm, ppam, ppsx, ppsm, sldx, sldm

We need to correct the file mime type of these files in the existing database.
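
As a rough sketch of what such a correction pass might compute, the PHP 7+ function below takes already-loaded rows and returns the ids whose stored type should change. The row keys ('id', 'title', 'filetype'), the function name, and the map are all assumptions for illustration; they do not describe Mahara's real schema or upgrade code. Only rows currently stored as ZIP are touched.

```php
<?php
// Hypothetical sketch; the row layout and names are assumed, not Mahara's schema.
// Returns [id => corrected mimetype] for rows stored as ZIP whose
// filename extension maps to a more specific type.
function corrected_filetypes(array $rows, array $map): array {
    $fixes = [];
    foreach ($rows as $row) {
        // Leave anything that was not recorded as a ZIP archive alone.
        if ($row['filetype'] !== 'application/zip') {
            continue;
        }
        $ext = strtolower(pathinfo($row['title'], PATHINFO_EXTENSION));
        if (isset($map[$ext])) {
            $fixes[$row['id']] = $map[$ext];
        }
    }
    return $fixes;
}

// Illustrative input; a real run would read these rows from the database.
$map = [
    'docx' => 'application/vnd.openxmlformats-officedocument.wordprocessingml.document',
    'xlsx' => 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
];
$rows = [
    ['id' => 1, 'title' => 'report.docx', 'filetype' => 'application/zip'],
    ['id' => 2, 'title' => 'backup.zip',  'filetype' => 'application/zip'],
    ['id' => 3, 'title' => 'notes.docx',  'filetype' => 'application/msword'],
];
$fixes = corrected_filetypes($rows, $map);
```

Here only row 1 ends up in $fixes: row 2 is a genuine ZIP archive, and row 3 was never stored as ZIP, so neither is changed.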

Reviewed: https://reviews.mahara.org/3348
Committed: http://gitorious.org/mahara/mahara/commit/78a1ee3adf7552c32addfcb2df020857ad4f14a5
Submitter: Robert Lyon (<email address hidden>)
Branch: master

commit 78a1ee3adf7552c32addfcb2df020857ad4f14a5
Author: Aaron Wells <email address hidden>
Date: Mon May 12 14:32:50 2014 +1200

Fix file artefacts that were incorrectly identified as ZIP files

Bug 1302251

Change-Id: I77fcbd9c534afa26296badca0ccb33f197eca648

Mahara Bot (dev-mahara) wrote :

Patch for "1.8_STABLE" branch: https://reviews.mahara.org/3351

Mahara Bot (dev-mahara) wrote :

Patch for "1.7_STABLE" branch: https://reviews.mahara.org/3352

Mahara Bot (dev-mahara) wrote :

Patch for "1.6_STABLE" branch: https://reviews.mahara.org/3353

Reviewed: https://reviews.mahara.org/3353
Committed: http://gitorious.org/mahara/mahara/commit/80ca7a6445b557d0c56c1fcb4ac89de8bbde9d23
Submitter: Robert Lyon (<email address hidden>)
Branch: 1.6_STABLE

commit 80ca7a6445b557d0c56c1fcb4ac89de8bbde9d23
Author: Aaron Wells <email address hidden>
Date: Mon May 12 14:32:50 2014 +1200

Fix file artefacts that were incorrectly identified as ZIP files

Bug 1302251

Change-Id: I77fcbd9c534afa26296badca0ccb33f197eca648

Mahara Bot (dev-mahara) wrote :

Reviewed: https://reviews.mahara.org/3352
Committed: http://gitorious.org/mahara/mahara/commit/064c6fe60d0c9a2d96706dd394d47efe97edb337
Submitter: Robert Lyon (<email address hidden>)
Branch: 1.7_STABLE

commit 064c6fe60d0c9a2d96706dd394d47efe97edb337
Author: Aaron Wells <email address hidden>
Date: Mon May 12 14:32:50 2014 +1200

Fix file artefacts that were incorrectly identified as ZIP files

Bug 1302251

Change-Id: I77fcbd9c534afa26296badca0ccb33f197eca648

Mahara Bot (dev-mahara) wrote :

Reviewed: https://reviews.mahara.org/3351
Committed: http://gitorious.org/mahara/mahara/commit/8cc1c2ac3605444172891d5f3e6b4beb45dfc770
Submitter: Robert Lyon (<email address hidden>)
Branch: 1.8_STABLE

commit 8cc1c2ac3605444172891d5f3e6b4beb45dfc770
Author: Aaron Wells <email address hidden>
Date: Mon May 12 14:32:50 2014 +1200

Fix file artefacts that were incorrectly identified as ZIP files

Bug 1302251

Change-Id: I77fcbd9c534afa26296badca0ccb33f197eca648

Mahara Bot (dev-mahara) wrote :

Reviewed: https://reviews.mahara.org/3350
Committed: http://gitorious.org/mahara/mahara/commit/8874de3930b784ede5efda3c200372c65e287595
Submitter: Robert Lyon (<email address hidden>)
Branch: 1.9_STABLE

commit 8874de3930b784ede5efda3c200372c65e287595
Author: Aaron Wells <email address hidden>
Date: Mon May 12 14:32:50 2014 +1200

Fix file artefacts that were incorrectly identified as ZIP files

Bug 1302251

Change-Id: I77fcbd9c534afa26296badca0ccb33f197eca648

Aaron Wells (u-aaronw) on 2014-10-21
Changed in mahara:
status: Fix Committed → Fix Released