MHTML Format - Web Archive Files - Standard not supported in Firefox

Bug #240133 reported by peter on 2008-06-15
This bug affects 3 people
Affects           Status     Importance  Assigned to
Mozilla Firefox   Won't Fix  Wishlist
firefox (Ubuntu)             Wishlist    Unassigned

Bug Description

Binary package hint: firefox-3.0

Opening http://www.militarybadges.org.uk/badget11.mht displays:
This document is a Web archive file. If you are seeing this message, this means your browser or editor doesn't support Web archive files. For more information on the Web archive format, go to http://officeupdate.microsoft.com/office/webarchive.htm

I am using Ubuntu 8.04.

ProblemType: Bug
Architecture: i386
Date: Sun Jun 15 10:28:40 2008
DistroRelease: Ubuntu 8.04
NonfreeKernelModules: nvidia
Package: firefox-3.0 3.0~rc1+nobinonly-0ubuntu0.8.04.1
PackageArchitecture: i386
ProcEnviron:
 PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
SourcePackage: firefox-3.0
Uname: Linux 2.6.24-18-generic i686

WORKAROUND <Thanks Thomas>:
http://www.unmht.org/unmht/en_index.html

Sidr (sidr) wrote :

Copied the following comment from bug 17309 (cc-ing contributor):
>------ Additional Comments From <email address hidden> 11/13/99 07:03 ------
>Authors could add the proprietary "important" keyword to the list of keywords
>in the rel attribute of the link element, e.g., rel="important stylesheet" (or
>rel="stylesheet important") to do what you want without resorting to RFC2557.

Yes, they could, but they would have no guarantee that Mozilla or any other
browser would either interpret that the way they want or implement the
behaviour they want in response, nor that a future version would not do
something slightly or markedly different.

Providing an rfc2557 MHTML mechanism would take care of the extreme case,
leaving room for a reasonable policy for "important stylesheet" that would
not necessarily mean "absolutely required" from this point forward.

Having said that, I absolutely would not advocate MHTML in the browser as the
*only* mechanism provided to authors to indicate how important or necessary
a stylesheet is, lest this feature get thought of by anyone as the only way
to go. I'd go so far as to say don't add the feature if nothing else is
provided as a fix for bug 17309.

Bulk move of all Necko (to-be-deleted component) bugs to new Networking component.

I don't know how easy it will be to implement. Basically, when we open the channel
we would ask for /foo.html. If that contains three HTML documents rather than one,
we'll have to invent a way to deal with it.
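The "three HTML documents" case is really the question of which part is the root. For multipart/related, RFC 2387 answers it: the optional `start` parameter of the Content-Type names the root part by its Content-ID, and without it the first body part is the root. Below is a minimal sketch using Python's standard `email` package. It assumes the whole archive is already in memory. `find_root` and the sample archive are hypothetical illustrations, not Mozilla code.

```python
from email import message_from_bytes
from email.message import Message


def _norm_cid(value: str) -> str:
    # Content-IDs may be written with or without angle brackets; compare bare ids.
    return value.strip().strip("<>")


def find_root(archive: bytes) -> Message:
    """Return the root part of a multipart/related archive (RFC 2387 rules)."""
    msg = message_from_bytes(archive)
    if msg.get_content_type() != "multipart/related":
        raise ValueError("not a multipart/related document")
    parts = msg.get_payload()
    start = msg.get_param("start")
    if isinstance(start, str) and start:
        # "start" identifies the root part by its Content-ID.
        for part in parts:
            if _norm_cid(part.get("Content-ID", "")) == _norm_cid(start):
                return part
    # No usable "start" parameter: the first body part is the root.
    return parts[0]


# Hypothetical archive in which the HTML root is deliberately NOT the first part.
SAMPLE = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/related; boundary="b1"; '
    b'start="<root@example>"; type="text/html"\r\n'
    b'\r\n'
    b'--b1\r\n'
    b'Content-Type: text/css\r\n'
    b'Content-Location: http://example.org/style.css\r\n'
    b'\r\n'
    b'p { color: red }\r\n'
    b'--b1\r\n'
    b'Content-Type: text/html\r\n'
    b'Content-ID: <root@example>\r\n'
    b'Content-Location: http://example.org/\r\n'
    b'\r\n'
    b'<p>hello</p>\r\n'
    b'--b1--\r\n'
)

if __name__ == "__main__":
    print(find_root(SAMPLE)["Content-Location"])  # http://example.org/
```

If the `start` parameter is removed from the sample, the same function returns the stylesheet part instead, because that part comes first.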

Per warren's decision -> nobody

Putting on [nsbeta2-] radar.

Marking helpwanted since that's what I think was meant by "-> nobody".

Open Networking bugs, qa=tever -> qa to me.

This sounds like it would be a MAJOR step toward being able to save (and send)
entire HTML pages as ONE file to disk - great.

Suggest keyword: mozilla0.9.2

I created a tracking (meta) bug 82118 to track these kinds of bugs and to unify
the efforts.

Maybe a few duplicates will also become apparent this way - then we can assign
the keyword MostFreq.

Removing dependency on bug 82118 as it should be the other way round (bug 82118
depends on this bug).

I'll try to implement this, but I do not know yet if I really have the skills
to do it. Be prepared that I may have to give this back to <email address hidden>.

My plan is roughly this:

- Implement a mhtml: protocol handler similar to the jar: handler.
- Implement a stream converter similar to the multipart/x-mixed-replace converter.
- Implement a method to control pending loads.

The stream converter would return the root resource within a mhtml channel and
put the other parts into a cache. On every page load we'd have to check if the
referring URI has a mhtml scheme and if so, translate the URI to be loaded into
a mhtml: URI.
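The first two steps of this plan are to split the archive, hand back the root resource, and keep the other parts for later. They can be sketched with Python's standard `email` package under stated simplifications:

- The root is taken to be the first part; `start` handling is omitted.
- Content-Location values are assumed to be absolute.
- The "cache" is a plain dict keyed both by Content-Location and by `cid:` URL, the two ways RFC 2557 lets a document refer to its parts.

`split_archive` is a hypothetical name.

```python
from email import message_from_bytes
from email.message import Message
from typing import Dict, Tuple

Resource = Tuple[str, bytes]  # (content type, decoded body)


def split_archive(archive: bytes) -> Tuple[Message, Dict[str, Resource]]:
    """Return the root part and a cache of the remaining parts."""
    msg = message_from_bytes(archive)
    parts = msg.get_payload()
    root, others = parts[0], parts[1:]  # assumption: first part is the root
    cache: Dict[str, Resource] = {}
    for part in others:
        # get_payload(decode=True) undoes base64 / quoted-printable encoding.
        entry = (part.get_content_type(), part.get_payload(decode=True))
        location = part.get("Content-Location")
        if location:
            cache[location.strip()] = entry  # assumes an absolute URI
        cid = part.get("Content-ID")
        if cid:
            cache["cid:" + cid.strip().strip("<>")] = entry
    return root, cache


# Hypothetical two-part archive: an HTML page plus a base64-encoded PNG header.
SAMPLE = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/related; boundary="b2"; type="text/html"\r\n'
    b'\r\n'
    b'--b2\r\n'
    b'Content-Type: text/html\r\n'
    b'Content-Location: http://example.org/\r\n'
    b'\r\n'
    b'<img src="logo.png">\r\n'
    b'--b2\r\n'
    b'Content-Type: image/png\r\n'
    b'Content-Transfer-Encoding: base64\r\n'
    b'Content-ID: <logo@example>\r\n'
    b'Content-Location: http://example.org/logo.png\r\n'
    b'\r\n'
    b'iVBORw0KGgo=\r\n'
    b'--b2--\r\n'
)
```

With this sample, a page reference to either `http://example.org/logo.png` or `cid:logo@example` is served from the same cache entry.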

The mhtml channel would simply fetch from cache if the requested resource is
available. If the containing multipart resource is still loading, it would
wait until it becomes available. If the requested resource wasn't included in
the multipart resource, it would try to get it using the original URI.

If the requested resource isn't in the cache and the containing multipart
resource is not currently loading, we'd have to load it using basically the same
mechanism the stream converter is using.
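The lookup rules in the last two paragraphs map onto a small thread-safe cache:

1. Serve the resource from the cache.
2. While the archive is still arriving, wait.
3. Otherwise fall back to the original URI.

This is a hedged sketch in Python. `ArchiveCache`, `add_part`, `finish` and the `fetch_original` callback are hypothetical names, and the network fetch is left to the caller. The final case, re-loading an archive that is neither cached nor loading, is not modelled.

```python
import threading
from typing import Callable, Dict, Optional, Tuple

Resource = Tuple[str, bytes]  # (content type, body)


class ArchiveCache:
    """Parts of one multipart/related archive, filled in as it is parsed."""

    def __init__(self) -> None:
        self._parts: Dict[str, Resource] = {}
        self._lock = threading.Lock()
        self._complete = threading.Event()

    def add_part(self, uri: str, content_type: str, body: bytes) -> None:
        # Called by the (hypothetical) stream converter for each part it sees.
        with self._lock:
            self._parts[uri] = (content_type, body)

    def finish(self) -> None:
        # Called once the closing boundary has been parsed.
        self._complete.set()

    def _lookup(self, uri: str) -> Optional[Resource]:
        with self._lock:
            return self._parts.get(uri)

    def fetch(self, uri: str, fetch_original: Callable[[str], Resource],
              timeout: Optional[float] = None) -> Resource:
        if not self._complete.is_set() and self._lookup(uri) is None:
            # Archive still loading: wait until it is complete (or timeout).
            self._complete.wait(timeout)
        hit = self._lookup(uri)  # always re-check after any wait
        if hit is not None:
            return hit
        # Not part of the archive (or gave up waiting): use the original URI.
        return fetch_original(uri)
```

Blocking on an `Event` rather than polling means a request that arrives early simply waits for the closing boundary. The optional timeout lets the caller give up and go to the network instead.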

Assign to myself, not nobody.

Why do you need a new protocol handler? If I go to
   http://www.example.org/mydocument.mhtml
...I would want it to display right without changing the URI.

BTW, if you _do_ use your own protocol, then it should be called 'moz-mhtml' or
whatever, so as not to pollute the protocol namespace.

Ian, somehow we must remember that we have an MHTML document if we don't want to
rewrite its links. URIs in MHTML documents can be the same as existing URIs
outside the MHTML document. If we rewrite the links (e.g. convert them to <cid:>
URIs) it is very likely that we break at least some JS.

It may be possible to keep the original URI for the root resource, but it would
require more changes to docshell. I do not intend to implement this in the first
step. Please file a bug on it once MHTML works.

If we display a resource other than the root resource (e.g. open a frame in a
new window), it does not make sense to keep the original URI and it does not
make sense to show the given URI, because the displayed document may be
different from a document with the same URI retrieved directly over the net.
Another approach would be to generate a Content-ID ourselves if the MHTML
document doesn't specify it and use <cid:> instead of <mhtml:>. But that would
be much more difficult to implement.

Name of the protocol: We do already pollute the protocol namespace (<jar:>,
<view-source:>, <about:>, <internal:>, <chrome:>, <resource:>, <javascript:>).
But if you think we shouldn't continue this it would be no problem to use
<moz-mhtml:>.

A clarification: If http://www.example.org/mydocument.mhtml has Content-Type:
multipart/related you could of course type that URI into the URL bar or use it
in a link. But it would then change to
mhtml:http://www.example.org/mydocument.mhtml!/ or
mhtml:http://www.example.org/mydocument.mhtml!/http://another.example.com/
(if the root resource has Content-Location: http://another.example.com/ ).
This is similar to an HTTP redirection.
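The URI shape described here nests the archive URL inside an `mhtml:` URI, with `!/` separating it from the part's Content-Location, in the same way as the existing `jar:` URIs. Below is a hedged Python sketch of building and splitting such URIs. The function names are hypothetical. Splitting at the first `!/` assumes the archive URL itself contains no `!/`.

```python
from typing import Tuple

SCHEME = "mhtml:"


def make_mhtml_uri(archive_url: str, content_location: str = "") -> str:
    """mhtml:<archive URL>!/<Content-Location of the part, empty for the root>"""
    return f"{SCHEME}{archive_url}!/{content_location}"


def split_mhtml_uri(uri: str) -> Tuple[str, str]:
    """Inverse of make_mhtml_uri; splits at the first '!/' after the scheme."""
    if not uri.startswith(SCHEME):
        raise ValueError("not an mhtml: URI")
    archive_url, sep, content_location = uri[len(SCHEME):].partition("!/")
    if not sep:
        raise ValueError("missing '!/' separator")
    return archive_url, content_location
```

Called with `http://www.example.org/mydocument.mhtml`, either alone or with `http://another.example.com/` as the Content-Location, this produces exactly the two URIs quoted above.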

Sidr (sidr) wrote :

Clarence, first, thanks for giving this a try. From a quick look at my inbox,
at least some HTML mail uses multipart/related, instead of multipart/mixed,
so MailNews may already have some of the code you need.

As a start, try this LXR query:
  http://lxr.mozilla.org/seamonkey/search?string=multipart%2Frelated
and especially look at
http://lxr.mozilla.org/seamonkey/source/mailnews/mime/src/mimemrel.cpp#29
and following, where <email address hidden> has some implementation notes about
how to handle multipart/related data.

I know the MHTML code for mail. But I think it needs nearly a complete rewrite
to work outside of mail and to support all HTML features (e.g. frames).
The first implementation note in mimemrel.cpp describes basically the way I'm
going to implement this.

What's going on with this bug?

*** Bug 108329 has been marked as a duplicate of this bug. ***

Now I see that this bug is targeted for Mozilla 1.1.
I really hope that it won't be postponed again.
Thanks.

I'm finding content on the net encoded in this format with .mht file
extensions... and I'd like to use the format myself to encapsulate saved web
pages... hmmmmmmmmm, for that matter, file->"save as" {c,sh}ould save the file
and its images in one blob. This seems like a half-decent-enough format
(except that it MIME-encodes images so they bloat)... okay, then .mht.gz... (yeah,
there's .war too [from Konqueror])
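The bloat comes from base64, the Content-Transfer-Encoding normally used for binary parts. Every 3 bytes become 4 ASCII characters, and MIME also limits encoded lines to 76 characters. Below is a quick check with Python's standard `base64` module, using a hypothetical 3000-byte image. Note that `encodebytes` ends lines with LF, whereas MIME on the wire uses CRLF, which adds one more byte per line.

```python
import base64


def base64_sizes(n_bytes: int):
    """Return (unwrapped, line-wrapped) base64 sizes for n_bytes of binary data."""
    data = bytes(n_bytes)  # the content doesn't matter, only the length
    return len(base64.b64encode(data)), len(base64.encodebytes(data))


print(base64_sizes(3000))  # (4000, 4053): roughly a third larger
```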

Target milestone 1.1alpha is out of date...
Is there any progress going on?

*** Bug 176054 has been marked as a duplicate of this bug. ***

*** Bug 177713 has been marked as a duplicate of this bug. ***

Is this networking or file handling?

Pardon me if I'm wrong, but it seems to me that this falls under both networking
and file handling.

My main want is to be able to open a single file (whether it be downloaded from
an ftp site or opened from my desktop) with all graphics, stylesheets, and html
included.
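What is being asked for here is essentially a multipart/related message. Its first part is the page, and its other parts carry the images and stylesheets, each labelled with the URL it replaces (Content-Location, as in RFC 2557). Below is a minimal sketch with Python's standard `email` package. `build_mhtml` is a hypothetical helper. It omits things a real "Save as" would need, such as resolving relative URLs in the page.

```python
from email import encoders, message_from_bytes
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from typing import List, Tuple


def build_mhtml(page_url: str, html: str,
                resources: List[Tuple[str, str, bytes]]) -> bytes:
    """resources: (url, MIME type, raw bytes) for every image, stylesheet, ..."""
    archive = MIMEMultipart("related", type="text/html")
    root = MIMEText(html, "html", "utf-8")  # the page itself comes first
    root["Content-Location"] = page_url
    archive.attach(root)
    for url, mime_type, data in resources:
        maintype, _, subtype = mime_type.partition("/")
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)  # binary-safe transfer encoding
        part["Content-Location"] = url  # the URL this part stands in for
        archive.attach(part)
    return archive.as_bytes()


if __name__ == "__main__":
    mht = build_mhtml("http://example.org/", '<img src="logo.png">',
                      [("http://example.org/logo.png", "image/png",
                        b"\x89PNG\r\n\x1a\n")])
    parsed = message_from_bytes(mht)
    print(parsed.get_content_type(), len(parsed.get_payload()))  # multipart/related 2
```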

3-14 (3-14) wrote :

Is there a testcase available somewhere?

pi

Sorry, I haven't (time to) read RFC2557, but I'd like to make a wish:
when MHTML is opened (in browser), it would be nice if the From, Date, etc
fields aren't displayed.

Excuse me if this wish isn't appropriate or doesn't seem to be within the topic of
this bug; my bug was marked as a duplicate of this one.

Created an attachment (id=113008)
This Page as MS MHTML

This page saved with MS IExplorer (6)

Created an attachment (id=113016)
More Complex Testcase with Frames

After upgrading from MS Internet-Exploder to Mozilla (security reasons) I
really miss the MS-Feature of saving complete Webpages into one single File.
With a huge collection of documents on your hard drive it matters very much how
the files are organized and structured. I hope this issue will get a higher
priority. I wonder why Netscape/Mozilla has not made any progress in that
direction for years.
Frank

One of the main reasons I want such a thing is so that I can use Mozilla and its
composer for writing general reports.

Basically I propose that there are few examples of report styles, formats, and
uses that wouldn't be entirely handled by HTML/XML/CSS/etc. There just aren't
any particularly good front ends for writing these reports.

Anyway, I want to be able to view, print, and edit my reports on computers
without bothering with a word processor at all. I SHOULDN'T need anything other
than Mozilla, but this bug makes transmission of such documents more cumbersome.

In the end this is one of those feature requests where the potential uses are
nearly limitless in number.

I tried out Mozilla a couple of times. But the most important reason for me to
stay with IntExp is that Mozilla can't save web pages into a single file.
I wonder why there has been no progress on this topic for years.
It should be easy to implement this feature in comparison to other planned
items on the to-do list.

Thomas

As a developer I can tell you it's not that easy! However, I think Mozilla
developers could use some help, so here is a useful link:
http://www.codeproject.com/shell/IESaveAs.asp

The article contains useful information about IE and its so-famous (^^) "save as
MHTML" feature. Developers should also read the user comments.

>somehow we must remember that we have an MHTML document if we don't want to
>rewrite its links. URIs in MHTML documents can be the same as existing URIs
>outside the MHTML document. If we rewrite the links (e.g. convert them to <cid:>
>URIs) it is very likely that we break at least some JS.

How does IE do it?

My employer and I are interested in the implementation of this feature.
If there is no one working on it, or if there is someone working on this having
trouble, I'd like to give it a try.
So, feel free to contact me with pointers about how to go about it.

I changed from IE to Mozilla just for security reasons. I'm still missing this
nice MHTML feature. There are many HTML documents with important inline
graphics, like pages with embedded math formulas as GIF or graphs.
For me it's not important that all features of a web page are preserved.
JavaScript can be broken; that's not important to me, as it's used mostly for
advertisements. Also I don't care much about CSS, as the content is more
important to me than a correct layout. This topic has now been discussed for more than 4
years. So maybe a simple approach would be sufficient at the beginning. The
mail component already uses similar functionality. JavaScript/CSS, external
link and layout optimizations can be made later.

(In reply to comment #37)
> I changed from IE to Mozilla just for security reasons. I'm still missing this
> nice MHTML feature. There are many HTML documents with important inline
> graphics, like pages with embedded math formulas as GIF or graphs.
> For me it's not important that all features of a web page are preserved.

Actually Moz… is at least capable of viewing *.mht files created by IE. I tried
the following:

1. In IE, opened a web page with graphics.
2. Saved it as a *.mht file.
3. Opened a new message in Thunderbird.
4. Attached the *.mht file.
5. Saved the message as a draft.
6. Viewed the saved draft message in the preview pane.
7. I was able to see the complete web page with graphics and CSS.

If Thunderbird is capable of viewing a *.mht file, Mozilla.org has code to show it
in the browser.

But I don't know whether there is code to save it!

Alternatively, Mozilla is capable of viewing contents inside a zip file (including
pages with graphics and CSS). So why not make an XPCOM component to update zip files?
Extension developers could then use it to build a single-file archiving facility.

See topic http://forums.mozillazine.org/viewtopic.php?p=442473

If I had to rank the most important improvements to the Mozilla browser,
this one would be No. 1.
I like Mozilla, but not the fact that it makes my document folders
absolutely chaotic with all the subfolders of page contents - really stupid.
Maybe we should convince some active developers to look at this issue and forget
about other things like making Mozilla look even more beautiful. Unfortunately,
Internet Explorer is no alternative for me because of its security problems.
Otherwise I would have switched back, because IE is able to properly save
web pages.

At present Mozilla allows viewing STUFF from a zip file.
STUFF may be a web page saved in "Web page, complete" format.

See the following:

jar:http://www.geocities.com/bijumaillist/mozilla/mozilla.zip!/mozilla.htm

(if you are unable to see it, try
http://geocities.com/bijumaillist/go.html#jar:http://www.geocities.com/bijumaillist/mozilla/mozilla.zip!/mozilla.htm
or the snipped url http://snipurl.com/56pg )

Now all we need is an option to zip the contents of "Web page, complete" format
while saving.

(PS: at present the Mozilla zip services don't allow adding or updating files in a zip
file.)

One advantage of the zip format over MHTML is that we can access the content using
any zip tool, and it is a non-XML format.
A disadvantage is that the zip file format doesn't store the MIME type of contained files.
To resolve this we could store an additional content list file (say content.lst)
which lists the file names inside the archive and their MIME types.
content.lst should also contain an entry to indicate the root HTML file.
content.lst should NOT be in XML format, because
XML is difficult to process using shell scripts.
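Below is a minimal Python sketch of the content.lst proposal, using only the standard `zipfile` module. The manifest layout is a hypothetical choice made for illustration:

- one tab-separated line per file, giving the file name and its MIME type;
- a third `root` field on the line for the root HTML file.

This keeps the manifest trivially readable from a shell script, which is the point of avoiding XML.

```python
import io
import zipfile
from typing import Dict, Tuple

MANIFEST = "content.lst"


def write_archive(files: Dict[str, Tuple[str, bytes]], root: str) -> bytes:
    """files maps archive name -> (MIME type, data); root names the root HTML file."""
    if root not in files:
        raise ValueError("root file missing from archive")
    lines = []
    for name, (mime_type, _) in sorted(files.items()):
        fields = [name, mime_type] + (["root"] if name == root else [])
        lines.append("\t".join(fields))
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(MANIFEST, "\n".join(lines) + "\n")
        for name, (_, data) in files.items():
            zf.writestr(name, data)
    return buf.getvalue()


def read_manifest(archive: bytes) -> Tuple[str, Dict[str, str]]:
    """Return (root file name, {file name: MIME type}) from content.lst."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        text = zf.read(MANIFEST).decode("utf-8")
    root, types = "", {}
    for line in text.splitlines():
        fields = line.split("\t")
        types[fields[0]] = fields[1]
        if fields[2:] == ["root"]:
            root = fields[0]
    return root, types
```

From a shell, something like `unzip -p page.zip content.lst | awk -F'\t' '$3 == "root" {print $1}'` would print the root file name under this layout.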

(33 comments hidden; 113 comments in total.)

Thank you for reporting, but this is not a real bug. I'm converting this to a question.

Changed in firefox-3.0:
status: New → Invalid
Thomas Kluyver (takluyver) wrote :

Strictly, I think this is a bug, or at least a possible wishlist item. I've linked it to Firefox's bug tracker. But since that has been around since 1999, don't expect any immediate action. I don't think many people now use MHT--even the officeupdate link it points you to is defunct.

Workaround: http://www.unmht.org/unmht/en_index.html

Changed in firefox:
status: Unknown → Confirmed

Bug Watch Updater wrote:
> ** Changed in: firefox
> Status: Unknown => Confirmed
>
>
For future reference, if you see it's a site issue, as this warning implies,
please report a bug using Help > Report a Broken Website, because there
isn't anything we can do to work around websites.

--
Sincerely Yours,
    John Vivirito

https://launchpad.net/~gnomefreak
https://wiki.ubuntu.com/JohnVivirito
Linux User# 414246

John Vivirito (gnomefreak) wrote :

Vojtěch Trefný wrote:
> Thank you for reporting, but this is not real bug. I'm converting this
> to question.
>
> ** Changed in: firefox-3.0 (Ubuntu)
> Status: New => Invalid
>
> ** bug changed to question:
> https://answers.edge.launchpad.net/ubuntu/+source/firefox-3.0/+question/36336
>
>
Yes, this is a very real bug, but not an Ubuntu bug; it's an upstream bug due
to a broken website. The more sites we report upstream using
Help > Report a Broken Website, the bigger the chance of these sites getting
fixed. There is nothing in this bug that makes me think it is a
question. I will close the question.


I just tried to reproduce this with the link given, and I see it as a text document on the site. The attachment shows what I see when I open that page. I don't see any pictures or anything except text.

This has also been reported to Launchpad bug tracker for Ubuntu.
https://bugs.launchpad.net/ubuntu/+source/firefox-3.0/+bug/240133

John Vivirito (gnomefreak) wrote :

Marked as confirmed and moved back to bug report.

Changed in firefox-3.0:
status: Invalid → Confirmed

*** Bug 445761 has been marked as a duplicate of this bug. ***

*** Bug 448960 has been marked as a duplicate of this bug. ***

Actually, being able to save a page in MHTML format is not the only benefit; having the web server send the page in that format would be equally useful. This feature would drastically cut the time taken to load different resources through multiple web requests, along with the overhead of setting up and tearing down connections. Considering it is such a useful and powerful feature, it is surprising that even Google's Chrome does not support it.

Hi, is it correct that this bug is in the 'Networking' component?

When will this bug finally be fixed?

Those who still periodically need to view .mht files (like the ones my HR department insists on sending me when they find a resume for me on the web) may be interested in this add-on:
http://www.unmht.org/unmht/en_index.html

I'm not sure when it came on the scene, but it's now indispensable... It has worked pretty well on every occasion I've had to try it so far.

*** Bug 471270 has been marked as a duplicate of this bug. ***

@John Vivirito
This hasn't been fixed as of FF 3.5b5pre

description: updated
summary: - "If you are seeing this message, this means your browser or editor
- doesn't support Web archive files"
+ MHTML Format - Web Archive Files - not supported in Firefox
Changed in firefox-3.0 (Ubuntu):
importance: Undecided → Low
status: Confirmed → Triaged

*** Bug 509285 has been marked as a duplicate of this bug. ***

Firefox 3.0 is only receiving Security Updates and major bug fixes at this point.

Changed in firefox-3.0 (Ubuntu):
importance: Low → Wishlist
status: Triaged → Won't Fix
Micah Gersten (micahg) wrote :

Moving tracking to Firefox 3.5

Changed in firefox-3.5 (Ubuntu):
importance: Undecided → Wishlist
status: New → Triaged

Just tried the add-on. I have to give it a thumbs up!
... There's one pitfall to avoid, though: it conflicts with "IE Tab", and you have to disable a special URL. This is documented on the add-on's web page, near the very end (not easy to spot if you have no idea what to look for).

*** Bug 538108 has been marked as a duplicate of this bug. ***

I need that feature.

I think that after 11 years this bug should be WONTFIX, given that the add-on mentioned in comment 78 works great and this is clearly not on the developers' priority list. I voted for it, but I'm well aware that this is not a feature wanted or needed by the vast majority of users. Parity with IE is not always a good enough reason to spend time developing a feature, especially when there are add-ons capable of doing the job.

(in reply to comment 84)
I doubt very much that a bug with as many as 165 votes, continuous requests for a period of 11 years so far and with recent duplicates still coming in is a likely candidate for wontfix. Maybe fix would be better.

Addons can't replace vital core functionality. Many users will never bother installing addons, but they still need and expect the functionality.

Michael, where's your data to show this isn't needed by many users?

Finally, please consider that the lack of "parity with IE" in this case means that users who already use Firefox might be tempted to switch back to IE both for viewing and saving all-in-one RFC 2557 MHTML files. I'll take it a step further and state that both MHTML and .maff formats should be natively supported by the browser. Only after many years of using FF did I discover the .maff add-on, and I'm not very shy of add-ons. The current default of saving HTML pages as a "file + loads of files in a subfolder" set is very impractical and wastes resources. Not to mention all the problems you can get when copying the over-long file paths that saved HTML files produce. Mozilla should really do better.

(In repetition of comment #76)
> Is it correct that this bug is in the 'Networking' component?

MHTML is not an approved standard. It is a Microsoft idea that other browser developers have followed. Whether Firefox follows the trend or not is a choice. If they don't, it is not a bug. We already have Zip to archive web pages and related objects. Using an add-on to do the archive from within Firefox is a convenience not a bug fix.

(In reply to comment #87)
> it is not a bug.
> is a convenience not a bug fix.

That is why this "bug" is an "enhancement" (with 164 votes).

165 Votes!

I don't really care who invented it or whether it is a bug or an enhancement. All I know is that it's something I would find very, very useful, and I would really like to see it integrated into FF.

The reason I don't 'need' it is because I just use Internet Explorer every time I want to save a single page as MHTML, but I'd rather not have to do that! In an ideal world all browsers would support this as a standard (drops dead laughing).

I use this feature frequently in Opera, because my PCs all have different OSes.

I've recently read on Planet Mozilla that Fennec will implement a "Save as PDF" feature in the next release, in order to make saving websites easy on mobile platforms. Since Firefox and Fennec share some amount of code, it might be possible to re-prioritize this issue so that MHTML becomes a better alternative to "Save as PDF", as MHTML is a more open format than PDF.

http://madhava.com/egotism/archive/005045.html - Since it is impossible to comment on this post, it would be nice if someone could contact him and notify him about this RFE.

Huh. Looks like Fennec won't be much of a threat for now, then. Perhaps if it saved as PDF and then emailed it, we may have a killer feature. His screenshot also exposes the lingering "unknown size" bug.

So Fennec will be able to save files as a type it can't even read? Sounds counter-intuitive. MHT is not perfect, but wider adoption will force the standard to make some improvements of its own.

Changed in firefox:
importance: Unknown → Wishlist

The UnMHT extension does this work, can we integrate it into Firefox?

Provide a patch, write tests, ask for review, address review comments, let it land, done.

Before someone wastes their time I actually tried to implement this back in 2008 or 2009 and ran into unexpected problems. This task isn't as trivial as it at first seems. Let's just say that you can't just take UnMHT or Thunderbird and make it work in Firefox.

*** Bug 603476 has been marked as a duplicate of this bug. ***

DirectuX (cacquarante) on 2013-01-26
summary: - MHTML Format - Web Archive Files - not supported in Firefox
+ MHTML Format - Web Archive Files - Standard not supported in Firefox

I've found that Chromium does have MHTML support[1], although it is still marked as experimental and requires manual toggling on its configuration page. Supporting MHTML would allow us to easily make portable desktop HTML5 applications, and I think it would be more useful to our users, and more in keeping with our mission to support the web, than, for example, building our own built-in PDF viewer.

[1] https://codereview.chromium.org/7064044/ - They also have bunch of resolved and unresolved issues on https://code.google.com/p/chromium/issues/list?can=1&q=MHTML

I think this is a very useful feature. When you save a webpage to read it offline or later, it is more convenient to have a single file.

(In reply to Lance Baker from comment #87)
> MHTML is not an approved standard. It is a Microsoft idea that other browser
> developers have followed.
Among the authors of RFC 2557, only one works for Microsoft. Moreover, it is not as if MHTML were a closed file format: the specification is public and part of the IETF's work.

(In reply to Lance Baker from comment #87)
> MHTML is not an approved standard. It is a Microsoft idea that other browser
> developers have followed. Whether Firefox follows the trend or not is a
> choice. If they don't, it is not a bug. We already have Zip to archive web
> pages and related objects. Using an add-on to do the archive from within
> Firefox is a convenience not a bug fix.

It's a specification approved as "proposed standard" by the IETF. Just like many other things the internet runs on.

*** Bug 1028603 has been marked as a duplicate of this bug. ***

Adolfo Jayme (fitojb) on 2015-10-02
Changed in firefox (Ubuntu):
status: New → Triaged
importance: Undecided → Wishlist
no longer affects: firefox-3.0 (Ubuntu)
no longer affects: firefox-3.5 (Ubuntu)
Changed in firefox:
status: Confirmed → Won't Fix
Displaying first 40 and last 40 of 113 comments.