save as complete page doesn't save images used in css

Bug #747197 reported by Teo on 2011-04-01
This bug affects 2 people
Affects            Status      Importance   Assigned to   Milestone
Mozilla Firefox    Confirmed   Low
firefox (Ubuntu)   Confirmed   Medium       Unassigned

Bug Description

Binary package hint: firefox

When you save a web page with "File/Save As" and choose to save it as "web page, complete", an HTML file is saved and the image files, JavaScript files and CSS files are saved in a folder. However, background image files referenced in CSS style sheets are not retrieved and saved. There's no reason why they shouldn't be.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: firefox 3.6.16+build1+nobinonly-0ubuntu0.10.04.1
ProcVersionSignature: Ubuntu 2.6.32-30.59-generic 2.6.32.29+drm33.13
Uname: Linux 2.6.32-30-generic i686
NonfreeKernelModules: nvidia
Architecture: i386
Date: Fri Apr 1 13:44:08 2011
FirefoxPackages:
 firefox 3.6.16+build1+nobinonly-0ubuntu0.10.04.1
 firefox-gnome-support 3.6.16+build1+nobinonly-0ubuntu0.10.04.1
 firefox-branding 3.6.16+build1+nobinonly-0ubuntu0.10.04.1
 abroswer N/A
 abrowser-branding N/A
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release i386 (20100429)
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.utf8
 SHELL=/bin/bash
SourcePackage: firefox

Save complete Webpage works great, but not 100% correctly. Try to download this page:
http://phpbb.sourceforge.net/phpBB2/viewforum.php?f=1

It seems to forget the table background images which are called in the CSS header.

To ben. This has been mentioned in the fora on mozillazine.org too.

This applies to non-css backgrounds too.

*** Bug 115532 has been marked as a duplicate of this bug. ***

See bug 115532 for more on background attributes in plain HTML.

The webbrowserpersist object doesn't save anything from the CSS whether inline
or not.

I don't know if it is possible to walk through, fixup and generate externally
linked and inline CSS (with the minimum of effort), but it's something I will
take for the time being.

Adding "(background images not saved)" to summary.

*** Bug 116660 has been marked as a duplicate of this bug. ***

Seen this on Linux 2001-12-20 (Slackware 8); changing OS to All.

This is regarding the <td background=''> attribute.

I saved a page from www.stomped.com today, using the 12-27 build. The source
has a <td background="images/trans.gif">, and Mozilla renders that correctly.
However, 'save page as' doesn't create an images subdirectory (for example
'thedefaultsavedir'/www.stomped.com_files/images), and it doesn't parse this
tag attribute to retrieve the image 'trans.gif' and save it to the
'/www.stomped.com_files/images' subdirectory. The saved page source still
contains the original tag as above, and when trying to access the image after
opening the saved file (open file > context menu > view background image), the
alert box shows it trying to access just 'thedefaultsavedir'/images/trans.gif.

There is also no doc type declared on this page.

just spoke w/ sarah about this. minusing.

*** Bug 120859 has been marked as a duplicate of this bug. ***

*** Bug 126307 has been marked as a duplicate of this bug. ***

*** Bug 128843 has been marked as a duplicate of this bug. ***

I suspect there may actually be two issues at work here; I've made a few
testcases to illustrate my point.

- Pages with <td background="foo"> aren't saved completely.
http://pantheon.yale.edu/~al262/mozilla/115107/tables/index.html
- Pages with foo{background-image:url(bar);} in the CSS aren't saved completely,
even if foo != td. http://pantheon.yale.edu/~al262/mozilla/115107/css/index.html
- However, pages with simple <body background="foo"> tags work fine.
http://pantheon.yale.edu/~al262/mozilla/115107/plain/index.html

Or maybe I'm missing something. Regardless, going to zip up the whole lot and
attach it.

Created attachment 72701
Test cases possibly showing independence of CSS bug and table bug

Testcases above, packaged up (brown paper packages tied up in string / these
are a few of my favorite things...)

Andrew, the <td background="foo"> thing should be a separate bug. See
http://lxr.mozilla.org/seamonkey/source/embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp#1807
-- we basically never check for background attrs on <td> nodes. That should be
simple to fix. If we don't have a bug on that already, file one on me.

Boris, at the moment I don't read C++, so I'm going to take your word for it.
There was a bug (Bug 115532) that seems to address your issue but it was marked
as a duplicate of this one (see this bug 115107 comment 3).

Neither TABLE nor TD elements have a background attribute (in HTML 4.0) which is
why it isn't fixed up. I could add fixup for such quirks if people feel the
practice is prevalent enough.

As for inline/external styles with url declarations such as this:

  BODY {
    color: black;
    background: url(http://foo.com/texture.gif);
  }

I don't believe there is much that can be done until we have a way to serialize
CSS from its in-memory representation (DOM).

Adam, we _do_ have such a way. The CSSOM should allow you to walk the
document.styleSheets array, walk the .cssRules array for each sheet, check
whether a rule has a background image set in it, change the URL if so, save the
.cssText for each rule, recurse down @import rules, etc. I can't actually
think of any bugs we have blocking that sequence of actions.... If you _do_ run
into bugs in that code that block this, please let me know.
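For illustration, the sequence described above could be sketched roughly as follows. This is not the code from any patch on this bug; `fixupSheet`, `fixupDocument` and the `rewriteUrl` callback are hypothetical names, the only CSSOM members assumed are `cssRules`, `type`, `styleSheet`, `style.backgroundImage` and `cssText`, and imported sheets are flattened into one output list to keep the sketch short.

```javascript
// Sketch only: `rewriteUrl` is a hypothetical callback that maps an
// original URL to the path of the locally saved copy.
const IMPORT_RULE = 3; // value of CSSRule.IMPORT_RULE

// Matches url(foo), url("foo") and url('foo'); capture 2 is the URL.
const URL_TOKEN = /url\(\s*(['"]?)(.*?)\1\s*\)/g;

function fixupSheet(sheet, rewriteUrl, out = []) {
  for (const rule of Array.from(sheet.cssRules)) {
    if (rule.type === IMPORT_RULE && rule.styleSheet) {
      // Recurse down @import rules; the imported rules are flattened
      // into the same output list for simplicity.
      fixupSheet(rule.styleSheet, rewriteUrl, out);
      continue;
    }
    if (rule.style && rule.style.backgroundImage) {
      // Change the URL in place, then let the rule serialize itself.
      rule.style.backgroundImage = rule.style.backgroundImage.replace(
        URL_TOKEN,
        (match, quote, url) => `url("${rewriteUrl(url)}")`
      );
    }
    out.push(rule.cssText);
  }
  return out;
}

// Walks every sheet of a document.styleSheets-like list.
function fixupDocument(styleSheets, rewriteUrl) {
  const out = [];
  for (const sheet of Array.from(styleSheets)) {
    fixupSheet(sheet, rewriteUrl, out);
  }
  return out;
}
```

In a page this would be called as `fixupDocument(document.styleSheets, rewriteUrl)`. Note that it mutates the live rules, and that current browsers throw when `cssRules` is read on a cross-origin sheet, so a real implementation would need to handle both.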

Boris, do you know where the CSS serialization code lives? It's not being done
the way other DOMs are serialized via the content encoder.

Heh. There is no pre-built serializer, yet. Each CSSRule object has a
GetCssText() method that returns an nsAString holding a serialization of the
rule (per DOM2 Style). The CSSOM-walking has to get done by hand,
unfortunately....

*** Bug 133725 has been marked as a duplicate of this bug. ***

*** Bug 157708 has been marked as a duplicate of this bug. ***

I have seen this bug as well. In my case, the CSS for a page contains:

body {
        background-color: #ffffff;
        margin: 1em 1em 1em 40px;
        font-family: sans-serif;
        color: black;
        background-image: url(/images/logo-left.gif);
        background-position: top left;
        background-attachment: fixed;
        background-repeat: no-repeat;
}

The background image (/images/logo-left.gif) is not saved, when it clearly
ought to be.

Rich.

Not only does the @import css not get saved to disk, when you later try to open
the saved html while the original server is unavailable, it takes forever to
load the local file while Mozilla waits for the css that never comes.

*** Bug 187590 has been marked as a duplicate of this bug. ***


Devs don't save pages at all as I see.
They don't pay attention to such important bugs as this bug and Bug 653522 - saving pages when offline (or using adblock) inundates the user with error dialogs:

https://bugzilla.mozilla.org/show_bug.cgi?id=653522

Why are major bugs in core functionality like this not fixed when instead we get garbage like microsummaries, geolocation, and DNS prefetching?

Wrt. comment#146: It's likely the problem that you and I don't pay the devs to do so, and the people who do, see more value in a read-only web and user tracking. It's also usually much easier to make something new than fix old bugs.

The problem is really that there's no good approach here with the current "Save As, Complete" model. If we fix this, we'll break the CSS working in other browsers.

A better solution would be replacing Save As, Complete (which attempts to rewrite the pages to fix up URLs) with a cross-browser package format for saved documents (which could save resources as they were).

(In reply to David Baron [:dbaron] from comment #148)
> If we fix this, we'll break the CSS working in other browsers.
Only for hacks and browser-specific properties. This is not a big deal. The same problem exists anyway for things like IE's conditional comments, or any kind of user-agent selectivity on the server, or indeed any other type of hack or browser-specific feature.

But obviously it's SO much better with the current situation that it works in no browsers at all!

Are you seriously more worried about browser-specific CSS features, which are probably expected to degrade gracefully anyway, than, oh I don't know, background-image!?

> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents (which could save resources as they were).
And you'd STILL need to rewrite the URLs. Come on, give it half a thimble of thought.

(In reply to Anonymous from comment #149)
> (In reply to David Baron from comment #148)
> > A better solution would be replacing Save As, Complete (which attempts to
> > rewrite the pages to fix up URLs) with a cross-browser package format for
> > saved documents (which could save resources as they were).
> And you'd STILL need to rewrite the URLs.
Let's suppose for the sake of argument that we used IE's .mht format. It doesn't rewrite the URLs. Instead, each entry in the file is associated with its original URL. (What I don't know is whether it would be possible to achieve this scheme in Gecko.)
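To make the scheme concrete: in an MHTML-style archive (RFC 2557), every part carries a Content-Location header with its original URL, so the bodies can be stored unmodified. The following is a minimal sketch assuming Node.js (`Buffer`); `buildMhtml` and the default boundary string are made up for illustration, and this is not a claim about how IE or any add-on actually writes its files.

```javascript
// Builds a minimal multipart/related archive. Each part keeps its
// original URL in Content-Location, so no URL inside the HTML or CSS
// has to be rewritten. `parts` is [{ url, type, body }].
function buildMhtml(parts, boundary = '----=_SketchBoundary') {
  const lines = [
    'MIME-Version: 1.0',
    `Content-Type: multipart/related; type="text/html"; boundary="${boundary}"`,
    '',
  ];
  for (const { url, type, body } of parts) {
    const b64 = Buffer.from(body).toString('base64');
    // MIME limits encoded lines to 76 characters.
    const wrapped = (b64.match(/.{1,76}/g) || ['']).join('\r\n');
    lines.push(
      `--${boundary}`,
      `Content-Type: ${type}`,
      'Content-Transfer-Encoding: base64',
      `Content-Location: ${url}`,
      '',
      wrapped,
      ''
    );
  }
  lines.push(`--${boundary}--`, '');
  return lines.join('\r\n');
}
```

A reader of such a file resolves `url(http://foo.com/texture.gif)` by looking for the part whose Content-Location matches, instead of relying on a rewritten path.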

(In reply to <email address hidden> from comment #150)
>Instead, each entry in the file is associated with its original URL.
Oh I see. Fair enough. But unless anyone thinks that the feature to save a web page as individually manipulable files should be completely removed, the bug is still not avoidable, and the URLs still need rewriting.

Just for the record, I do have code in BlueGriffon achieving a full rewrite
of all stylesheets attached to a given document, tweaking all URIs.
I needed it to be able to turn an arbitrary document into a reusable
well-packaged template. That code is based on my parser/serializer JSCSSP.
In other terms, a full rewrite is doable but requires considerable complexity
and footprint...

(In reply to David Baron [:dbaron] from comment #148)
> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents

Modern pages have very big JS files: 500 KB or 1 MB per page. I sometimes delete JS files from the folders of saved pages. There are a lot of other reasons to open the folders of saved pages. I will never use a browser that offers only an archive format for Save As.

Please don't add reasons not to use Gecko browsers. (I now use SeaMonkey 1.1.19 as my main browser only because of the regression in the Save As function, which was developed by kind developers :) See Bug 653522.)

(In reply to David Baron [:dbaron] from comment #148)
> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents (which could save resources as they were).

I use "Save As, Complete" to save complete pages so that I can examine their source code. So I need a working local copy of the page, not just a package.

I agree with Emin: save the source as seen in view-source and rewrite the URLs of resources to point to a local folder next to the saved page. Multiple requests are definitely allowed. There is no reason to optimize performance for this task, as it is user-invoked and the expectation is that the current document is saved as-is, for various reasons, for later use. Not as Firefox reads the document, nor as quickly as possible, nor to consume as little space as possible; there is save document (not complete) for that.

Actually it's not enough. A complete save must save the page as it is visible, with all elements inserted by Ajax (like comments), and if some elements are missing then they should remain missing (like advertisements with an active adblock). Even all additional user CSS applied to the elements on the page must be preserved as-is.
Otherwise it is not complete.
I mean it has to crawl the DOM and generate the page from it instead of saving the page's source with rewritten links.
I do agree there is no reason for fast performance, but when you open such a page it must resemble the page you saved as closely as possible. It cannot be considered complete if it downloads anything from the web on open. Also, I don't think preserving interactivity is expected behavior in this case. Users usually save a page to be able to read it later or to send it to someone to read. So it must be preserved as they saw it when they decided to save it.

For investigation purposes there are separate applications, like the extension "DownThemAll!". You can configure exactly how it will save things and how deep it will go. It does even more than Emin requests. It not only makes a local copy of the page; it can make a local copy of the whole site or a decent part of it.

Not exactly, but kind of. In your example that code generates a screenshot of the site, while a complete copy of the page has to remain as text and linked objects to let you scale it, select, copy and everything else. The only thing which shouldn't work is interactivity. I mean scripts must not work.
BTW, Scrapbook does such a thing already. I'd just like to see ability to make complete stand-alone static copy of the page in the Fx itself.

Well, the example also does that. As does this: http://code.google.com/p/phantomjs/wiki/QuickStart#DOM_Manipulation but it might not be so clear.
This is the interesting bit:
var page = require('webpage').create(),
    url = 'http://lite.yelp.com/search?find_desc=pizza&find_loc=94040&find_submit=Search';

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        var html = page.evaluate(function() {
            return document.documentElement.outerHTML; // <- serialized DOM, including script-generated nodes
        });
        console.log(html);
    }
    phantom.exit();
});
There is a similar project for Gecko called offscreen but I can't see how it's possible to get the generated HTML with this though.
Project Page: http://offscreengecko.sourceforge.net/
Source code: http://hg.mozilla.org/incubator/offscreen/

(In reply to David Baron [:dbaron] from comment #148)
> The problem is really that there's no good approach here with the current
> "Save As, Complete" model. If we fix this, we'll break the CSS working in
> other browsers.
>
> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents (which could save resources as they were).

Consider how some browsers are closed source and written to the interests of their owners and how they implement either CSS or JS compared to the standards ... duh

On the other hand, I would strongly vote against such a method, one reason being the examination of various resource files related to a page.

I use an extension called Web Developer Toolbar. This extension has a neat feature called "view generated source". This contains all content generated/modified by JS as displayed. Based on this one could easily reference each resource file (image/css/etc) for saving.

Also, saving a webpage locally does not, in principle, imply that the entire webpage functionality will be preserved, for the simple reason that some features require online interactivity with the source server, and this is impossible to replicate in a local save and offline viewing scenario.

*** Bug 852007 has been marked as a duplicate of this bug. ***

It would be nice to have that feature back. I also like saving pages for offline inspection later ("coding on the lake" and that sort of thing :).

This bug report/ticket has existed for eleven years.
Is there really no one interested in fixing this bug?

Some remarks regarding creation of 'faithful offline snapshot of displayed page' from browser:
- Aforementioned "Save complete" addon [0] seems no longer maintained.
- There is "Mozilla Archive Format" addon [1][2] which promotes 'Faithful save system' or 'Exact snapshot' feature, seems to work quite well. Maybe worth checking.
- (Parity) Opera browser is able to successfully save page with imported resources as well.

[0] https://addons.mozilla.org/en-US/firefox/addon/save-complete-4723/
[1] http://maf.mozdev.org/
[2] https://addons.mozilla.org/en-US/firefox/addon/mozilla-archive-format/

I apologize for the "me too" voice here.

dbaron, Are you saying that this bug is simply not fixable?

> If we fix this, we'll break the CSS working in other browsers.

But right now the CSS doesn't work in any browser. What would be the harm in just making it work in Firefox?

Well, it could break things that aren't broken today. Though now that prefixes are becoming less common that would be less of an issue, so I'm less worried about it than I was a few years ago.

Alternatively, we could do tokenization-level fixup, which actually might not be that bad, except it's a little tricky for @import, which takes strings in addition to url()s.
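A rough idea of what token-level fixup could look like, operating on the raw stylesheet text rather than on parsed rules. This is a sketch with regular expressions standing in for a real CSS tokenizer: it does not skip comments or url-like text inside unrelated strings, and `rewriteCssUrls` and `mapUrl` are hypothetical names.

```javascript
// Returns the first argument that is not undefined (unmatched regex
// groups are undefined, while an empty URL is the empty string).
function firstDefined(...values) {
  return values.find((v) => v !== undefined);
}

// Rewrites url(...) tokens and bare-string @import targets in CSS text.
// `mapUrl` maps an original URL to its replacement.
function rewriteCssUrls(cssText, mapUrl) {
  // url("x"), url('x') or url(x)
  const urlToken = /url\(\s*(?:"([^"]*)"|'([^']*)'|([^)\s]*))\s*\)/gi;
  // @import "x" or @import 'x' (the string form, without url())
  const importString = /(@import\s+)(?:"([^"]*)"|'([^']*)')/gi;
  return cssText
    .replace(importString, (m, head, dq, sq) =>
      `${head}"${mapUrl(firstDefined(dq, sq))}"`)
    .replace(urlToken, (m, dq, sq, bare) =>
      `url("${mapUrl(firstDefined(dq, sq, bare))}")`);
}
```

Because everything outside the matched tokens is copied through untouched, properties the browser does not understand survive the rewrite, which is the main attraction over reserializing from the CSSOM.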

I feel inclined to develop a Firefox Add-On for this. To me, this seems overly simple.
1. Just take the download tree (e.g. as seen in Firebug etc.) and redownload it all.
2. Save all files to a folder as is (which is, if I remember correctly, the current behavior).
3. Parse all css/js files and replace URIs with local paths.
4. Have heaps of fun with script loaders (AMD et al.) making no. 3 difficult.

There you have it. The page, saved in its entirety.
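Steps 2 and 3 depend on a stable mapping from each resource URL to a local file, including relative references and name clashes between files from different directories. A minimal sketch of that mapping, assuming the standard `URL` class; `buildLocalMap`, `withSuffix` and the folder name are invented for illustration and do not describe what Firefox does today.

```javascript
// Hypothetical helper: add a numeric suffix before the file extension.
function withSuffix(name, n) {
  const dot = name.lastIndexOf('.');
  return dot > 0
    ? `${name.slice(0, dot)}_${n}${name.slice(dot)}`
    : `${name}_${n}`;
}

// Maps every referenced URL (relative or absolute) to a unique path
// inside `folder`. Keys are absolute URLs, values are local paths.
function buildLocalMap(refs, baseUrl, folder) {
  const map = new Map();
  const used = new Set();
  for (const ref of refs) {
    const abs = new URL(ref, baseUrl); // resolves relative references
    if (map.has(abs.href)) continue;   // same resource referenced twice
    const base = abs.pathname.split('/').pop() || 'index';
    let name = base;
    for (let n = 1; used.has(name); n++) name = withSuffix(base, n);
    used.add(name);
    map.set(abs.href, `${folder}/${name}`);
  }
  return map;
}
```

References found inside a stylesheet would be resolved against that stylesheet's URL, not the page's, so `baseUrl` changes per file.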

Actually, the MAF add-on (https://addons.mozilla.org/en-us/firefox/addon/mozilla-archive-format/) does appear to do exactly that.

If you have it installed, Firefox Save complete page works much better.

@mkaply Yep, "Faithful to the original" is all I need.
Thanks

Removing "qawanted" since the need for an automated test is marked by "in-testsuite?".

Hello, I'd like to be assigned to this bug. Please assign me.

Roughly what steps are you planning to take to fix this?

[Blocking Requested - why for this release]:

I read the whole thread, and since I'm new to the project, I will try to implement the solution proposed by dotnetCarpenter. What do you think, David?

Do you mean comment 167? That sounds like a proposal to rewrite the entire "Save As, Complete" feature, of which this bug (fixing up URIs in the CSS) would be just a small part. But it doesn't tell me anything about what you plan to do for the part covered by this bug: fixing up URIs in the CSS.

Created attachment 8599503
Bugzilla main page test case- original

Created attachment 8599504
Bugzilla main page test case- after save

This 13-year-old bug in basic functionality is surprising - why is this still broken?
I've attached a very local test demonstration from the Bugzilla main page, to help illustrate its status today.

UnMHT is under the GPL licence ( https://addons.mozilla.org/fr/firefox/addon/unmht/license/7.3.0.5 ). Is it not possible to reuse part of its code? For me, UnMHT is a great tool for saving a complete page: HTML / CSS / JS / media.

Hi all, here are my two cents... I hope it helps!

To sum up:
- I tried to use the "save as, complete web page" with https://slate.adobe.com/a/NzR3A/ and as a result background images and other CSS effects were missing.
- I tried to save this web page with https://addons.mozilla.org/en-us/firefox/addon/mozilla-archive-format/ in either MAF or MHT formats, and it failed (only the first background image and text were displayed).
- I tried to save this web page with https://addons.mozilla.org/fr/firefox/addon/unmht/ in MHT format and it worked perfectly!

In conclusion: a million thanks to UnMHT developers! :-)

*** Bug 1232103 has been marked as a duplicate of this bug. ***

dino99 (9d9) wrote :

Closing this outdated report, as EOL was reached a long time ago.

Changed in firefox (Ubuntu):
status: New → Invalid
teo1978 (teo8976) wrote :

Is dino99 a bot or a retarded person?

I have just checked, and the issue described in this "outdated report" is still present in the latest version of Firefox on the current version of Ubuntu.

Before closing issues just because they are old, please check if they still exist!

P.S. I have seen a few other issues that I reported or starred closed by the same user in the last few days. I am not going to waste my time checking how many of them were wrongly closed like this one.

Someone should mass-undo all the issue-closing changes made by dino99 lately and probably revoke his permissions to manage bugs.

Changed in firefox (Ubuntu):
status: Invalid → Confirmed
dino99 (9d9) on 2016-04-16
tags: removed: lucid
dino99 (9d9) wrote :

To the comments made above:

- verify how many users have warns and validated your report
- note that no one have wasted time to fix or commented on that bug
- if you still get these issues with a fresh install not disturted by old borked config/settings, then open a new report about bug or missing feature(s)/wrong design.
- dont forget to report upstream if necessary.

Changed in firefox (Ubuntu):
status: Confirmed → Invalid
teo1978 (teo8976) wrote :

> verify how many users have warns and validated your report

Not sure what you mean

> note that no one have wasted time to fix or commented on that bug

So what?? If nobody has fixed it yet, it means it shouldn't be fixed??

> if you still get these issues with a fresh install not disturted by old borked config/settings,

I f***ing told you I DO observe these issues with a current version. It does not need to be a fresh install. It's upgraded through the official channels. If there were any difference between that and a fresh install, that would be a bug in itself. But I'm pretty sure it's not the case.

> then open a new report

Why should I open a new report every time a new version is released if the bug is not fixed?? The original report contains all the relevant information. You shouldn't have closed it without verifying that the issue was fixed (which it is not), and I have even verified again that it is not. So why waste time closing reports and opening new ones?

tags: added: wily
Changed in firefox (Ubuntu):
status: Invalid → New
dino99 (9d9) wrote :
Changed in firefox (Ubuntu):
status: New → Incomplete
teo1978 (teo8976) wrote :

I don't get what the link has to do with the issue.

Did you even read the report?? The issue is not the inability to download some "protected" content, it's just that when saving a complete webpage, urls used in CSS such as background images are not retrieved.

BTW, Last comment before doing what?

teo1978 (teo8976) wrote :

Oh, I now notice you just changed the status to "incomplete".
What is missing?

dino99 (9d9) wrote :

Can you compare with the httrack way on a page having issue ?

teo1978 (teo8976) wrote :

What's httrack?

Note that the issue is extremely easy to test, you should be able to compare whatever you want to compare by yourself (I don't mean to be rude, I'm just saying)

If you are still experiencing this bug in any current Ubuntu release (https://wiki.ubuntu.com/Releases) then:

1. Enter that release first name into the tag list.
2. Set this bug status back to "confirmed".

> Thank you.

teo1978 (teo8976) wrote :

> 1. Enter that release first name into the tag list.

It's already there since comment #5

> 2. Set this bug status back to "confirmed".

Let's hope this time dino99 won't change it back without providing a reason.

Changed in firefox (Ubuntu):
status: Incomplete → Confirmed
Brian Murray (brian-murray) wrote :

I've followed the steps indicated in the bug description and confirm that this still happens on xenial with firefox version 45.0.1+build1-0ubuntu1.

tags: added: xenial
Changed in firefox (Ubuntu):
importance: Undecided → Medium

*** Bug 1328204 has been marked as a duplicate of this bug. ***

*** Bug 1326669 has been marked as a duplicate of this bug. ***

Why, 16 years later, does Firefox still have a Save as "Webpage complete" option which does *not* save a complete webpage?

This is a fairly fundamental flaw as obviously people expect a "Webpage complete" option to save a complete webpage and it could cause serious problems when they discover later that it actually does not.

*** Bug 1599543 has been marked as a duplicate of this bug. ***

Changed in firefox:
importance: Unknown → Low
status: Unknown → Confirmed