Locally saved webpages not displaying correctly

Bug #350407 reported by James Hurley on 2009-03-28
This bug affects 1 person
Affects            Status     Importance  Assigned to  Milestone
Mozilla Firefox    Confirmed  Medium
firefox (Ubuntu)              Low         Unassigned

Bug Description

WORKAROUND:
Firefox Scrapbook Addon: https://addons.mozilla.org/en-US/firefox/addon/427

----------------------------------------------------------------------------------

Firefox Information: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.8) Gecko/2009032711 Ubuntu/8.04 (hardy) Firefox/3.0.8

Web pages that have been saved locally are not displaying correctly. To see this problem run the following steps:

1). Go to the jwebunit website (http://jwebunit.sourceforge.net/). I suggest saving a page from this website as a test, since it provides a very good example of the problem.

2). Right click on the main web page and select "Save As..."

3). Save the web page somewhere on the local drive. On Ubuntu there is an option in the save dialog specifying how to save the web page. Make sure that "Web page, complete" is selected.

4). Navigate to the location where the web page was stored and attempt to load the saved page back into Firefox.

At this point, the web page loaded from the hard disk should look very different from the real web page: styles and elements of the original page will be missing. I also tested the same process on Windows XP SP2 (using Firefox 3.0.8) and saw exactly the same behavior. This seems to rule out local browser settings as the cause, since I see the same thing under Windows and Linux on two separate machines with completely different preferences. Furthermore, I ran steps 1-4 using Opera 9.64, and it produces a saved web page that displays identically to the page viewed on the internet.

Additionally, I just tried loading the page saved locally with Opera into Firefox, and it displays correctly. So the problem appears to be with the way Firefox saves the page, not with the fact that the page is stored locally.

ProblemType: Bug
Architecture: i386
Date: Sat Mar 28 11:36:15 2009
DistroRelease: Ubuntu 8.04
Package: firefox-3.0 3.0.8+nobinonly-0ubuntu0.8.04.2
PackageArchitecture: i386
ProcEnviron:
 PATH=/home/username/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: firefox-3.0
Uname: Linux 2.6.24-23-generic i686

Update: it looks like this bug report may have gone to the wrong place. I was expecting it to be submitted against Firefox instead of Ubuntu. Apparently, selecting Help > Report a Problem inside the Firefox web browser sends bug reports to Ubuntu.

-------------

Ok, please disregard this bug report; I've got this submitted to the correct place now.

To Adam: I don't think this is a duplicate, but bug 115107 needs to be fixed
first.

Another page with the same problem:
http://webnouveau.net/

*** Bug 162108 has been marked as a duplicate of this bug. ***

Hasn't the future arrived by now? How hard would this be to fix? Can we please
have some attention to this soon?

*** Bug 202737 has been marked as a duplicate of this bug. ***

*** Bug 223406 has been marked as a duplicate of this bug. ***

*** Bug 224586 has been marked as a duplicate of this bug. ***

*** Bug 225009 has been marked as a duplicate of this bug. ***

*** Bug 235791 has been marked as a duplicate of this bug. ***

*** Bug 236069 has been marked as a duplicate of this bug. ***

Also, Save Page As does not flatten the url() items in the CSS file(s).

In the HTML, an image element is correctly flattened, e.g.
    <img src="http://www.somewhere.com/logo.png">
becomes
    <img src="saveasfolder/logo.png">

In CSS, images should also be flattened, but they are not. E.g.
    body { background-image: url(http://www.somewhere.com/background.png); }
should become
    body { background-image: url(saveasfolder/background.png); }

The same applies to the other syntactic variants of the same thing (i.e. with
or without quote marks, and with or without the url() lexical token).

Rick :-)
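The url() flattening Rick describes can be sketched in Python. This is a minimal illustration, not nsWebBrowserPersist code: `flatten_css_urls` and the `url_map` dictionary (remote URL to local path, assumed to have been built while the files were downloaded) are hypothetical names. It uses regular expressions, so it only covers the common forms (unquoted, single-quoted or double-quoted url(...), plus the bare-string form of @import) and ignores CSS comments and escaped characters that a real CSS tokenizer would handle.

```python
import re

# Hypothetical sketch: matches url(...) with optional ' or " quotes, e.g.
#   url(http://x/a.png), url('http://x/a.png'), url("http://x/a.png")
_URL_RE = re.compile(r"""url\(\s*(['"]?)([^'")\s]+)\1\s*\)""")
# Matches the string form of @import, e.g. @import "http://x/a.css";
_IMPORT_STR_RE = re.compile(r"""@import\s+(['"])([^'"]+)\1""")


def flatten_css_urls(css_text, url_map):
    """Rewrite remote references in css_text to local paths.

    url_map maps an absolute remote URL to the local path it was saved
    under (an assumption: a real saver would build it while downloading).
    References not present in url_map are left untouched.
    """
    def replace_url(match):
        quote, target = match.group(1), match.group(2)
        return f"url({quote}{url_map.get(target, target)}{quote})"

    def replace_import(match):
        quote, target = match.group(1), match.group(2)
        return f"@import {quote}{url_map.get(target, target)}{quote}"

    # url(...) first; this also covers the "@import url(...)" form.
    css_text = _URL_RE.sub(replace_url, css_text)
    return _IMPORT_STR_RE.sub(replace_import, css_text)


if __name__ == "__main__":
    css = "body { background-image: url(http://www.somewhere.com/background.png); }"
    mapping = {"http://www.somewhere.com/background.png": "saveasfolder/background.png"}
    print(flatten_css_urls(css, mapping))
```

Only URLs present in `url_map` are rewritten, so references that were never downloaded keep pointing at the web instead of at a local file that does not exist.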

Rick: that's bug 115107, which, oddly enough, is listed in the dependencies. In
the future, please at least check the dependencies before commenting about
something you think might be related. Ideally you'd do a normal bug search....

*** Bug 237106 has been marked as a duplicate of this bug. ***

This page is an example of the problem:

http://weblogs.mozillazine.org/hyatt/

In that case, it is exacerbated by the fact that a JavaScript script selects the
css file to use (after user action). These alternate css files are not saved.

similar to not saving CSS images:
CSS not fixed up by webbrowserpersist (background images not saved)
http://bugzilla.mozilla.org/show_bug.cgi?id=115107

There is a work-in-progress patch there, but it hasn't been rolled into the
nightly builds as of last month (May 2004).

*** Bug 252392 has been marked as a duplicate of this bug. ***

please see also http://bugzilla.mozilla.org/show_bug.cgi?id=115107#c67

voting for both bugs

Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040803
MultiZilla/1.6.4.0b Mnenhy/0.6.0.104

*** Bug 263600 has been marked as a duplicate of this bug. ***

Sorry for spamming, but would this really be hard to fix?

It's really annoying, when you want to save a page, to have to:
 - Open the file
 - Find where the css file is located
 - Open the css file and save it
 - Modify the page to use the local version of the css file

'Save page' does this perfectly when the CSS file is not @imported, so is it
really so hard to add support for the @import rule? It seems so, since this bug
was opened more than 2.5 years ago...

*** Bug 267662 has been marked as a duplicate of this bug. ***

*** Bug 271626 has been marked as a duplicate of this bug. ***

*** Bug 273091 has been marked as a duplicate of this bug. ***

*** Bug 278895 has been marked as a duplicate of this bug. ***

Sorry for re-reporting this bug, but it didn't come up in response to my query. Anyway, it is nigh on
three years old, and is simply a matter of resolving the @import rules. Can you fix it?

you can fix it.

*** Bug 281478 has been marked as a duplicate of this bug. ***

Please don't set blocking flags, other than to request them (?).

*** Bug 287525 has been marked as a duplicate of this bug. ***

*** Bug 294724 has been marked as a duplicate of this bug. ***

*** Bug 297180 has been marked as a duplicate of this bug. ***

*** Bug 309632 has been marked as a duplicate of this bug. ***

*** Bug 309737 has been marked as a duplicate of this bug. ***

*** Bug 314665 has been marked as a duplicate of this bug. ***

I am putting up a $25 bounty for this bug. This is very basic functionality from an end user's standpoint -- something that people should expect to just work. The average Joe isn't going to be able to manually download and modify the CSS files.

Anybody else want to contribute to this?

PS: If anybody does fix it, email me your PayPal information.

*** Bug 321349 has been marked as a duplicate of this bug. ***

*** Bug 326131 has been marked as a duplicate of this bug. ***

Here's the problem that I've found, and correct me if I'm wrong because I'm not intimately familiar with the code:

At the moment, nsWebBrowserPersist.cpp uses a Tree Walker (line 1581) to go through each and every "tag set" (node) and find references to external objects. Since CSS is not represented in the DOM tree, the Tree Walker doesn't find the images and other resources referenced from style sheets. Therefore, the style sheets also need to be parsed, or at least the targets of @import and url() need to be collected as well.
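The missing step described above, following @import chains so that every nested style sheet gets saved, can be sketched as a graph walk in Python. Assumptions: `collect_stylesheets`, `find_imports` and the `fetch` callback are hypothetical; in the browser the sheet text would come from the already-loaded style sheet objects or the cache, and a regular expression stands in for the real CSS parser (so comments and escapes are not handled).

```python
import re
from urllib.parse import urljoin

# Hypothetical sketch: matches @import "a.css", @import 'a.css',
# @import url(a.css) and @import url("a.css"). Media queries are ignored.
_IMPORT_RE = re.compile(
    r"""@import\s+(?:url\(\s*(['"]?)([^'")\s]+)\1\s*\)|(['"])([^'"]+)\3)"""
)


def find_imports(css_text):
    """Return the raw @import targets in css_text, in source order."""
    # Group 2 holds the url(...) form, group 4 the bare-string form.
    return [m.group(2) or m.group(4) for m in _IMPORT_RE.finditer(css_text)]


def collect_stylesheets(root_url, fetch):
    """Walk the @import graph depth-first, starting at root_url.

    fetch(url) -> str is a caller-supplied function returning the text of
    a style sheet. Relative targets are resolved against the URL of the
    sheet that imports them. Returns absolute URLs in visiting order.
    """
    order, seen = [], set()
    stack = [root_url]
    while stack:
        url = stack.pop()
        if url in seen:
            continue  # already collected (also breaks import cycles)
        seen.add(url)
        order.append(url)
        # Push in reverse so imports are visited in source order.
        for target in reversed(find_imports(fetch(url))):
            stack.append(urljoin(url, target))
    return order
```

Tracking visited URLs matters because style sheets can import each other in a cycle; without the `seen` set the walk would never terminate. The resulting list is what a saver would need to download, before rewriting the references as in the url() example.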

*** Bug 337114 has been marked as a duplicate of this bug. ***

*** Bug 343627 has been marked as a duplicate of this bug. ***

For those of you still interested, I've created an extension to serve as a temporary fix for the problem: <https://addons.mozilla.org/firefox/2925/>
Until I or someone else has time to rewrite nsWebBrowserPersist, however, this is the best solution.

*** Bug 355366 has been marked as a duplicate of this bug. ***

In , Ski (ski) wrote :

I'll throw $25 in to that bounty. It's the kind of thing that we need to make "just work".

(In reply to comment #36)
> I am putting up a $25 bounty for this bug. This is very basic functionality
> from an end user's standpoint -- something that people should expect to just
> work. The average Joe isn't going to be able to manually download and modify
> the CSS files.
>
> Anybody else want to contribute to this?
>
> PS: If anybody does fix it, email me your PayPal information.
>

*** Bug 370152 has been marked as a duplicate of this bug. ***

I tried to use the "Save Complete" extension mentioned in comment 42, but it garbled the web page on disk in FF 2.0.0.3. I have found that ScrapBook (https://addons.mozilla.org/en-US/firefox/addon/427) works well, and "Mozilla Archive Format" (https://addons.mozilla.org/en-US/firefox/addon/212) looks interesting for Firefox 1.5.x users.

The issue (and I don't know where I saw this comment in the first place) is that, for this problem to be corrected, either the CSS parser needs to be rewritten to allow its usage by nsWebBrowserPersist for properly parsing and replacing urls in stylesheets, or a simplified CSS parser must be written for nsWebBrowserPersist, which would result in duplicated code. The best solution seems to be a full rewrite of the CSS parser to allow proper search-and-replace for urls in stylesheets, although this will take a lot of work. If someone is willing to mentor me, I have done enough research that I am pretty sure I know what needs to be done.

As a side note, the issue in comment 46 with the "Save Complete" extension has been fixed, and although it has problems with a specific type of @import rule, it is a lot better than the save functionality provided by nsWebBrowserPersist. The extension can be found at <https://addons.mozilla.org/en-US/firefox/addon/4723>.

Stephen: see bug 115107 comment 88 (that suggestion also applies to this bug - the suggested fixup interface could collect the URLs it fixes up and pass them to StoreURI, thus causing the URIs referenced from @import to be saved). I actually half-implemented that suggestion and it seemed to work fine before it fell off my plate :(

I can help you with the easier things, as time permits, if you take that approach.

The issue that I have with working through the DOM is that, currently, everything Firefox saves is "corrected" before being written out. If the page had HTML like <a href=http://www.google.com>google.com</a>, the address is now enclosed in quotes. Although this is perfectly acceptable for those simply trying to save a page for later, it is not acceptable for web developers, who would prefer that nothing in their code was changed before being saved. This is why I prefer the, albeit less effective, method of using regular expressions. The URLs can easily be collected through the DOM interfaces (see the code in the extension mentioned in comment 47 for an example). However, replacing only those URLs that need to be replaced is difficult, as a regular expression can never beat a full parser for sheer flexibility.

Web developers have lots of ways to get the exact source of the page; that's not an issue in my opinion. Regular expressions are a no-go, since you can't parse HTML properly using them, plus we already have a working parser - why would we write and maintain another, regexp-based one? For a single questionable web developer use-case?

*** Bug 388565 has been marked as a duplicate of this bug. ***

*** Bug 398839 has been marked as a duplicate of this bug. ***

*** Bug 428046 has been marked as a duplicate of this bug. ***

*** Bug 431605 has been marked as a duplicate of this bug. ***

Firefox 3 has been released, but this bug has not been fixed. Look at the date the bug was opened: why?

The original *simple* example may be "questionable", but there is plenty of recursive usage out there where one .css file imports others via @import... I'm not suggesting that creating a dedicated parser is a good idea, just that writing off CSS @import as questionable is ridiculous.

I'm not getting your point. You expect your saved page to look like the original. Currently, Firefox does not save all the files. There is nothing to discuss here.

my comment was a specific reply to Comment #50. take a chill pill.

#57: then why does it save the images and re-write the paths to them? What's wrong with expecting to be able to have a saved page look the same as the original one? Or am I misunderstanding your comment?

For most people the aim is to make a local copy of the page as it is displayed; some would also strongly appreciate a copy of the file structure.

Both groups are unhappy because part of the CSS is lost.

Nope.
People who want the files from the server will need the CSS file referenced by the @import directive.
People who need a copy (a sort of screenshot) will want the same styles applied, including the imported ones.

There is no conflict. Yes, the @import URL should be rewritten to a local path for this to work properly.

(P.S. to Mike Frysinger: I'm not so fluent; what's a chill pill? Bad medicine, probably...)

James Hurley (hurleyjames) wrote :
description: updated
description: updated
description: updated

*** Bug 498472 has been marked as a duplicate of this bug. ***

*** Bug 524301 has been marked as a duplicate of this bug. ***

Monkey (monkey-libre) wrote :

I've assigned this bug to the firefox-3.0 package.

Thank You for making Ubuntu better.

affects: ubuntu → firefox-3.0 (Ubuntu)
John Vivirito (gnomefreak) wrote :

We are in the process of updating all Firefox versions to unversioned packages for all Ubuntu versions.

affects: firefox-3.0 (Ubuntu) → firefox (Ubuntu)
Draycen DeCator (ddecator) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. We are sorry that we do not always have the capacity to look at all reported bugs in a timely manner. I have been able to confirm that this bug still occurs with the daily build of Firefox 3.7. A report has also been found upstream, so I will be linking this report with that one.

Changed in firefox (Ubuntu):
status: New → Incomplete
Micah Gersten (micahg) wrote :

Thank you for your bug report. This bug has been reported to the developers of the software. You can track it and make comments at: https://bugzilla.mozilla.org/show_bug.cgi?id=126309
I'm going to mark it as Triaged and wait for upstream to work on this. Thanks for taking the time to make Ubuntu better! Please report any other issues you may find.

Changed in firefox (Ubuntu):
importance: Undecided → Low
status: Incomplete → Triaged
description: updated
Changed in firefox:
status: Unknown → Confirmed
Changed in firefox:
importance: Unknown → Medium

Until this bug is fixed, couldn't Firefox display a warning saying that some CSS files are missing when doing the "Save Page As" / "Web Page, complete"? (I don't know the Firefox internals, but since the page is already displayed, I suppose that Firefox could have this information quite easily.)

This would keep users from wondering why a saved page looks wrong when opened, since Firefox doesn't display any error when a CSS file is not found.
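As a rough illustration of the warning suggested above, the check can be done after saving by reading the saved files back. This is a standalone Python sketch, not Firefox code: `missing_stylesheets` is a hypothetical helper that reports `<link rel="stylesheet">` targets and nested @import targets that either still point at a remote server or do not exist on disk. It uses a regular expression for @import, so it ignores CSS comments, and it treats any URL with a scheme (including file://) as not saved locally.

```python
import os
import re
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse

# Hypothetical sketch: @import "a.css" / @import url(a.css) targets.
_IMPORT_RE = re.compile(
    r"""@import\s+(?:url\(\s*(['"]?)([^'")\s]+)\1\s*\)|(['"])([^'"]+)\3)"""
)


class _StylesheetLinks(HTMLParser):
    """Collects the href values of <link rel="stylesheet"> elements."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        rel = (a.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel and a.get("href"):
            self.hrefs.append(a["href"])


def _local_path(base_dir, ref):
    """Return a filesystem path for a relative ref, or None if it is remote."""
    parsed = urlparse(ref)
    if parsed.scheme or parsed.netloc:
        return None  # still points at the web: not saved locally
    return os.path.normpath(os.path.join(base_dir, unquote(parsed.path)))


def missing_stylesheets(html_path):
    """List stylesheet references in a saved page that will not load offline."""
    with open(html_path, encoding="utf-8", errors="replace") as f:
        parser = _StylesheetLinks()
        parser.feed(f.read())
    pending = [(os.path.dirname(html_path), h) for h in parser.hrefs]
    problems, checked = [], set()
    while pending:
        base_dir, ref = pending.pop()
        path = _local_path(base_dir, ref)
        if path is None or not os.path.isfile(path):
            problems.append(ref)
            continue
        if path in checked:
            continue  # avoid re-reading sheets (and @import cycles)
        checked.add(path)
        with open(path, encoding="utf-8", errors="replace") as css:
            for m in _IMPORT_RE.finditer(css.read()):
                # Nested imports resolve relative to the importing sheet.
                pending.append((os.path.dirname(path), m.group(2) or m.group(4)))
    return problems
```

A non-empty result is exactly the situation this comment describes: the page was saved, but some of its CSS will not be found when it is opened from disk.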

Please ficks this bug. I am dying. I can't wait any longer. I want to see this bug ficked before I pass away. I want to enjoy my life. Please

@ Lil B: Others are not responsible for your life. Suggestion: why don't you use the MAFF add-on? It not only saves an identical, faithful copy of the web page, it also saves disk space thanks to zip compression!

@chrizoo : You have no humor.
@lilb : I've only been waiting since 2008, get in line :)
@mozilla : This is a bug, please fix

A reminder: there's a $50 bounty for fixing this bug from https://bugzilla.mozilla.org/show_bug.cgi?id=126309#c36 and https://bugzilla.mozilla.org/show_bug.cgi?id=126309#c44.
Please hurry up while they are still alive. Are they?
P.S. The future is now.

In , Sjw (sjw) wrote :

Wow, no patch in 11 years for this annoying bug :o

*** Bug 1106261 has been marked as a duplicate of this bug. ***

Happy New Year, bug !

Remember bug when we used to play? Let's get together again some time and work it out again, like we used to again, let's again. Merry new year to yours and of course your others. Promise, Lil B

The bounty should be adjusted for inflation, in fact I'll throw all my gold nuggets into a new river. I am going to invest in this bug instead. Promise, Lil B

If you are serious about the bounty, please consider putting it up at https://www.bountysource.com/issues/3508687-save-page-does-not-save-import-ed-css to streamline the process.
