save as complete page doesn't save images used in css

Bug #747197 reported by Teo
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Mozilla Firefox
Confirmed
Unknown
firefox (Ubuntu)
Confirmed
Medium
Unassigned

Bug Description

Binary package hint: firefox

When you save a web page with "File/Save As" and chose to save it as "web page, complete", an html file is saved and the image files, javascript files and css files are saved in a folder. However, background image files referenced in css style sheets are not retrieved and saved. There's no reason why they shouldn't.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: firefox 3.6.16+build1+nobinonly-0ubuntu0.10.04.1
ProcVersionSignature: Ubuntu 2.6.32-30.59-generic 2.6.32.29+drm33.13
Uname: Linux 2.6.32-30-generic i686
NonfreeKernelModules: nvidia
Architecture: i386
Date: Fri Apr 1 13:44:08 2011
FirefoxPackages:
 firefox 3.6.16+build1+nobinonly-0ubuntu0.10.04.1
 firefox-gnome-support 3.6.16+build1+nobinonly-0ubuntu0.10.04.1
 firefox-branding 3.6.16+build1+nobinonly-0ubuntu0.10.04.1
 abroswer N/A
 abrowser-branding N/A
InstallationMedia: Ubuntu 10.04 LTS "Lucid Lynx" - Release i386 (20100429)
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.utf8
 SHELL=/bin/bash
SourcePackage: firefox

Revision history for this message
In , Dd1079+bugzilla (dd1079+bugzilla) wrote :

Save complete Webpage works great but not 100% correct. Try to download this page:
http://phpbb.sourceforge.net/phpBB2/viewforum.php?f=1

It seems to forget the table background images which are called in the CSS header.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

To ben. This has been mentioned in the fora on mozillazine.org too.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

This applies to non-css backgrounds too.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

*** Bug 115532 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Mozbugz-z (mozbugz-z) wrote :

see bug 115532 for more in the straight html for background tags.

Revision history for this message
In , Adamlock (adamlock) wrote :

The webbrowserpersist object doesn't save anything from the CSS whether inline
or not.

I don't know if it is possible to walk through, fixup and generate externally
linked and inline CSS (with the minimum of effort), but it's something I will
take for the time being.

Revision history for this message
In , Bugzilla3 (bugzilla3) wrote :

Adding "(background images not saved)" to summary.

Revision history for this message
In , Dave532 (dave532) wrote :

*** Bug 116660 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Dave532 (dave532) wrote :

Seen this on Linux 2001-12-20 (Slackware 8) changing OS to All

Revision history for this message
In , Mozbugz-z (mozbugz-z) wrote :

This is regarding the <td background=''> attribute.

I have a page I saved, from www.stomped.com as of today. using 12-27 build.. the
source has a <td background="images/trans.gif"> and Mozilla renders that
correctly. Now doing 'save page as', doesn't create a subdirectory of images
for example 'thedefaultsavedir'/www.stomped.com_files/images, and it doesn't
parse this tag attribute to retrieve the image: 'trans.gif' and save it to the:
'/www.stomped.com_files/images' subdirectory. 'Save page as' page source,
still outputs the default source tag as above, and when trying to access the
image upon viewing using open file > context menu > view background image, the
alert box shows it trying to access just 'thedefaultsavedir'/images/trans.gif.

There is also no doc type declared on this page.

Revision history for this message
In , Jud-me (jud-me) wrote :

just spoke w/ sarah about this. minusing.

Revision history for this message
In , L. David Baron (dbaron) wrote :

*** Bug 120859 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Spam-minneboken (spam-minneboken) wrote :

*** Bug 126307 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

*** Bug 128843 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Onethreeseven (onethreeseven) wrote :

I suspect there may actually be two issues at work here; I've made a few
testcases to illustrate my point.

- Pages with <td background="foo"> aren't saved completely.
http://pantheon.yale.edu/~al262/mozilla/115107/tables/index.html
- Pages with foo{background-image:url(bar);} in the CSS aren't saved completely,
even if foo != td. http://pantheon.yale.edu/~al262/mozilla/115107/css/index.html
- However, pages with simple <body background="foo"> tags work fine.
http://pantheon.yale.edu/~al262/mozilla/115107/plain/index.html

Or maybe I'm missing something. Regardless, going to zip up the whole lot and
attach it.

Revision history for this message
In , Onethreeseven (onethreeseven) wrote :

Created attachment 72701
Test cases possibly showing independence of CSS bug and table bug

Testcases above, packaged up (brown paper packages tied up in string / these
are a few of my favorite things...)

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

Andrew, the <td background="foo"> thing should be a separate bug. See
http://lxr.mozilla.org/seamonkey/source/embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp#1807
-- we basically never check for background attrs on <td> nodes. That should be
simple to fix. If we don't have a bug on that already, file one on me.

Revision history for this message
In , Onethreeseven (onethreeseven) wrote :

Boris, at the moment I don't read C++, so I'm going to take your word for it.
There was a bug (Bug 115532) that seems to address your issue but it was marked
as a duplicate of this one (see this bug 115107 comment 3).

Revision history for this message
In , Adamlock (adamlock) wrote :

Neither TABLE nor TD elements have a background attribute (in HTML 4.0) which is
why it isn't fixed up. I could add fixup for such quirks if people feel the
practice is prevalent enough.

As for inline/external styles with url declarations such as this:

  BODY {
    color: black;
    background: url(http://foo.com/texture.gif);
  }

I don't believe there is much that can be done until we have a way to serialize
CSS from it's in-memory representation (DOM).

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

Adam, we _do_ have such a way. The CSSOM should allow you to walk the
document.styleSheets array, walk the .cssRules array for each sheet, check
whether a rule has a background image set in it, change the URL if so, save the
.cssText for each rule, recurse down @import rules, etc. I can't actually
think of any bugs we have blocking that sequence of actions.... If you _do_ run
into bugs in that code that block this, please let me know.

Revision history for this message
In , Adamlock (adamlock) wrote :

Boris, do you know where the CSS serialization code lives? It's not being done
the way other DOMs are serialized via the content encoder.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

Heh. There is no pre-build serializer, yet. Each CSSRule object has a
GetCssText() method that returns an nsAString holding a serialization of the
rule (per DOM2 Style). The CSSOM-walking has to get done by hand,
unfortunately....

Revision history for this message
In , Onethreeseven (onethreeseven) wrote :

*** Bug 133725 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Alfonso (amla70) wrote :

*** Bug 157708 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Richard W.M. Jones (rich-annexia) wrote :

I have seen this bug as well. In my case, the CSS for a page contains:

body {
        background-color: #ffffff;
        margin: 1em 1em 1em 40px;
        font-family: sans-serif;
        color: black;
        background-image: url(/images/logo-left.gif);
        background-position: top left;
        background-attachment: fixed;
        background-repeat: no-repeat;
}

The background image (/images/logo-left.gif) is not saved, when it clearly
ought to be.

Rich.

Revision history for this message
In , Felix Miata (mrmazda) wrote :

Not only does the @import css not get saved to disk, when you later try to open
the saved html while the original server is unavailable, it takes forever to
load the local file while Mozilla waits for the css that never comes.

Revision history for this message
In , Alfonso (amla70) wrote :

*** Bug 187590 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Adamlock (adamlock) wrote :

Created attachment 121701
Work in progress

Patch is work in progress but it attempts to fix up style properties that
typically contain an url(), e.g. background-image. I have most of the rule
searching and url extraction / fixup down but I have to clean up the node
cloning function.

I also have a horrible feeling that just asking a node for it's inline syle
drags a bunch of -moz styles into existence even if they weren't there to start
with. This means you get a mess of extra styles in the output document. I'll
have to examine this issue a bit more.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

So this is based on a totally drive-by skim of the patch:

1) NS_ARRAY_LENGTH is a nice macro. ;)
2) "content" and "cursor" can have URI values
3) The parser you wrote will fail to parse something like:

     content: "url(foo)" url(foo);

    correctly. You're right that GetPropertyCSSValue would be nice here...
4) "foo: bar" in CSS is a declaration, not a rule (the "// Test if the inline
    style contains rules that" comment)
5) The code for setting the URL value will not work for "content" and "cursor"
    because they can include things other than the URL.
6) I'm not sure how putting @import in the same list as property names will
    work -- the two don't even live in the same places...

Revision history for this message
In , Adamlock (adamlock) wrote :

Created attachment 122047
Work in progress 2

Patch addresses most of the issues but is still work in progress. Uses
NS_ARRAY_LENGTH, adds support for various "cursor-*" props but not "content",
skips quoted text in values, fixes the broken comment, removes @import.

Patch actually fixes up content now, but is busted in a couple of ways:

1. The call to cssDeclOut->SetProperty(propName, newValue, propPriority) during
fixup results in a mutant URL caused by the CSS resolving the supplied relative
url into an absolute URL using the base address of the original location. So
"url(bg.gif)" becomes "url(http://foo.com/original_path/local_path/bg.gif)".
I'll have to figure out how it is getting its base address and fix it.

2. Just touching the .style property on an element results in all kinds of
nasty extra styles to leap into existence. For example:

  <body style="background: yellow url(bg.gif)">

Becomes:

  <body style="background: yellow url(126309_files/bg.gif) repeat scroll 0%;
    -moz-background-clip: initial; -moz-background-inline-policy: initial;
    -moz-background-origin: initial;">

Finally, the patch will have to be able to extract and fixup multiple urls that
the likes of "content" could contain. This is likely to complicate string
parsing matters a lot.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

> I'll have to figure out how it is getting its base address and fix it.

For inline style, this is coming from GetBaseURL on the document object. For
style in stylesheets, this is the stylesheet's URI, which the sheet stores.
It's set at creation time via nsICSSStyleSheet::Init().

"background" is a shorthand property. So it will set all sorts of stuff. As
long as you only work with "background-image" you don't really have to worry
about it...

The cursor property is "cursor". Those things you have in the list are just
possible values. The problem is, you can have something like:

  cursor: url(foo) crosshair;

Are there useful changes that could be made to the style system APIs that would
make this easier? Should some of those changes become part of DOM2 CSS?

Revision history for this message
In , Ian-hixie (ian-hixie) wrote :

Good lord. You really don't want to be doing actual parsing of CSS here. We
already have one CSS parser, having two just means that we'll have twice as many
bugs (probably more than twice, since this code won't be tested much).

Does it deal with CSS escapes? CSS comments? etc.

Surely there's a better way to do this.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

It sorta deals with CSS escapes (probably to the extent that it's needed in this
code). Comments are a non-issue, since the only CSS we have to parse are
property values gotten via getPropertyValue(). We're basically parsing
canonicalized values here...

But yes, it would be nice if we could ask the CSS decl for that info, since it
already has it all broken down and has to trouble itself to produce CSS just so
we can reparse it....

Revision history for this message
In , Ian-hixie (ian-hixie) wrote :

I thought we already could? How does "view background image" work?

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

That gets the computed style, not the declared style. Computed style supports
getPropertyCSSValue() to a certain extent (not live, but better than nothing).

Note that we don't implement computed "content" either... ;)

Revision history for this message
In , Ian-hixie (ian-hixie) wrote :

Well in that case I would recommend writing a quick internal hack of an API. In
fact probably the simplest API would be on the stylesheet or rule interfaces,
which just serialises the entire stylesheet or rule using a new base URI.

The reason I say that, as opposed to "implement DOM2 CSS", is that DOM2 CSS is
likely to be junked in favour of a more sensible API.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

Yes, I realize that. That's why we haven't implemented it. ;)

Serializing with a new base URI is nontrivial, since we only store absolute
URIs. We would need to duplicate some fun code that webbrowserpersist already
has. Further, for inline style there is no "sheet" -- there is just a
CSSDeclaration, and that implements no useful interfaces
(nsIDOMCSSStyleDeclaration/nsIDOMCSS2Properties are the closest it gets).

I suppose we could add another interface that you could QI these beasties to...
as long as we're careful, it should not be too expensive to do that...

Adam, I assume you're trying to fix this for 1.4?

Revision history for this message
In , Adamlock (adamlock) wrote :

I'll fix it for 1.4 if I can get it working reasonably at least for the CSS 1
stuff. I don't care so much about CSS 2 as background images are the most
visible breakage this patch tries to fix.

As I mentioned, there are a few issues to be sorted out before I would call it
ready for review. I'd like to know why these extra styles suddenly materialised
for example. I can probably live with forcing an absolute uri into the fixed up
style to make it work for now.

Revision history for this message
In , L. David Baron (dbaron) wrote :

Comment on attachment 122047
Work in progress 2

>+ "cursor-crosshair",
>+ "cursor-default",
>+ "cursor-pointer",
>+ "cursor-move",
>+ "cursor-e-resize",
>+ "cursor-ne-resize",
>+ "cursor-nw-resize",
>+ "cursor-n-resize",
>+ "cursor-se-resize",
>+ "cursor-sw-resize",
>+ "cursor-s-resize",
>+ "cursor-w-resize",
>+ "cursor-text",
>+ "cursor-wait",
>+ "cursor-help",

I've never heard of any of these.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

Adam, those properties are just set by the "background" shorthand... Or are you
saying that something in your patch is making them appear? (I see nothing there
that should cause that to happen....)

Revision history for this message
In , Adamlock (adamlock) wrote :

I was confused about the cursor thing. I was taking the CSS2 spec to mean there
were cursor, cursor-wait, cursor-pointer etc. properties just as there is a
background and a more specific background-image. I can cut those too for the
time being.

I was trying to avoid having to deal with lines like this:

cursor: url(foo.gif), url(foo2.gif), url(foo3.gif), text;

As for those other properties, I still don't understand why they suddenly appear
like they do. They are not in the original source so why should they just
materialise when I ask for the style object? If the styles are harmless then
okay, but I suspect if I checked in a fix which has this behaviour, I'd soon see
another bug open complaining about them.

Revision history for this message
In , Ian-hixie (ian-hixie) wrote :

Don't forget -moz-binding: url(...); either, while you're at it.

Regarding the property explosion. Assuming the original stylesheet reads:

   background: green;

...then it is exactly equivalent to:

   barkground-color: green; background-image: none; background-position: 0% 0%;
   background-attachment: scroll; background-repeat: repeat;
   -moz-background-clip: initial; -moz-background-origin: initial;
   /* and a few more that i've forgotten */

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

Adam, those properties appear when you serialize the style attr. Your patch has
nothing to do with that....

Revision history for this message
In , Adamlock (adamlock) wrote :

I know they have nothing to do with my patch but expect another bug nonetheless.
If someone publishes a page with an inline style and sees these new things
appear, they're not going to understand why.

I'm not sure I do either. I can appreciate that saying "background:
url(foo.gif)" might internally imply some other styles but I am not sure that
they should be serialized when they were not explicitly defined in the first place.

I can raise another bug on that issue and revise my patch to concentrate on
background images for the time being.

Revision history for this message
In , Brade (brade) wrote :

Boris Zbarsky in comment 36 says:
>Serializing with a new base URI is nontrivial, since we only store absolute
>URIs. We would need to duplicate some fun code that webbrowserpersist already
>has.

Which "fun code" in webbrowserpersist would you need to duplicate? If you want
to get a relative url from an absolute url (relative to another location), you
can do so with nsIURL's getRelativeSpec:
  AUTF8String getRelativeSpec(in nsIURI aURIToCompare)
Does that help?

Revision history for this message
In , Alfonso (amla70) wrote :

*** Bug 206401 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Coffeebreaks (coffeebreaks) wrote :

Encountering the bug when trying to save the URL for bug 216573.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

*** Bug 221532 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Tom-chiverton (tom-chiverton) wrote :

Is there a fix-for-version for this yet ?
With increasing use of CSS (the latest Dreamweaver makes much better use of it)
it is going to bite more and more people !

I think it should be a more major issue, as potentialy navigation elements etc.
on webpages may vanish for no good reason.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

See the target milestone. No one is working on this right now or will in the
immediate future. There is a work-in-progress patch here, and now that the
style system has been changed to also store non-absolute URIs things should be a
bit easier (issue 1 in comment 29 should not be present anymore). So someone
who cares about this bug needs to pick it up and make it work.

I'd say that a first cut could skip cursor and content and just do @import,
backgrounds, and -moz-binding...

Revision history for this message
In , Daniel-glazman (daniel-glazman) wrote :

> No one is working on this right now or will in the immediate future.

Hey Boris... Where did you get that... This is one of the bugs I want to
see fixed for Nvu. Please don't reassign yet, I have plenty of stuff on
my plate right now.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

glazou, I got it from earlier comments from Adam (the assignee), from the target
milestone, and from general knowledge of what I know people are working on.
Since the details what you're doing has been shrouded in secrecy and since I
can't read minds (well, not ones that far away, at least), I hardly had a way of
knowing you were thinking of working on this... ;)

Revision history for this message
In , Bugzilla-accessibleinter (bugzilla-accessibleinter) wrote :

*** Bug 224801 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Asa (asa) wrote :

I'm nominating this for 1.6 because I think saving web pages is a fairly common
task and as CSS becomes more prominent, we're seeing more and more pages that
don't save completely. dbaron thinks this might be more 1.7alpha material but it
does cause one of our smoketests, B.27, (and I suspect an increasingly
representative smoketest) to partially fail.

Revision history for this message
In , Ian-hixie (ian-hixie) wrote :

The way to fix this is to implement saving web pages the way that MacIE does it
-- saving the files unchanged, annotated with their original URI, so that links
between dependent resources still work, even if they are done via obscure ways
like built from JavaScript (which we could never solve otherwise).

Revision history for this message
In , Asa (asa) wrote :

Jump ball. Who wants to try to tackle this? Glazman, can you take this?

Revision history for this message
In , Bugzilla-spray (bugzilla-spray) wrote :

*** Bug 226925 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Asa (asa) wrote :

doesn't look like this is happening for 1.6

Revision history for this message
In , Mithgol (mithgol) wrote :

http://www.alistapart.com/articles/customunderlines/

Yet another good testcase for this bug, and for bug 126309.

The whole page is broken in Firebird 0.7, Firefox 0.8. (But I am not familiar
with Mozilla 1.6, so you'll have to test it there for yourselves. Sorry.)

Revision history for this message
In , Tom-chiverton (tom-chiverton) wrote :

http://www.alistapart.com/articles/customunderlines/ WFM
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040207 Firefox/0.8

Revision history for this message
In , Mithgol (mithgol) wrote :

Tom Chiverton, does http://www.alistapart.com/articles/customunderlines/ work
for you even after being saved (File, Save page as, Web page, complete) and then
opened from local drive?

Revision history for this message
In , Tom-chiverton (tom-chiverton) wrote :

Sergey- No, sorry, doesn't work after save.
My bad - I thought this bug was just about on-line pages, not saved ones too.

FWIW, it looks like the saved page hasn't had a style sheet applied, which I
guess fits with what seems to be wrong with Mozilla 'fixing' URLs when it saves
pages.

Is there any chance of this bug being assigned a non-future milestone, or at
least having severity increased - this bug is going to bite a lot more people as
other comments have indicated, as saving is fairly common.

Revision history for this message
In , Bugzilla-accessibleinter (bugzilla-accessibleinter) wrote :

*** Bug 245799 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Tom-b52 (tom-b52) wrote :

(In reply to comment #62)
> *** Bug 245799 has been marked as a duplicate of this bug. ***
(sorry, searched but couldn't find this report
from my report:)

Reproducible: Always
Steps to Reproduce:
1. Go the example URL http://dromoscopio.net/
2. Menu:File/Save As.../Save As Type: Web Page, complete
3. look at saved file in a file explorer

Actual Results:
none of the images are saved, eg
#uno div{
 background-image: url(uno.gif);
}

Expected Results:
saved all images, even thoses declaired in CSS

default config

workaround:
1. hover mouse over desired CSS background image
2. right-mouseclick, View Background Image
3. right-mouseclick, Save Image As...

This is not not critical, but may render off-line viewing of saved webpages
impossible without editing the HTML/CSS, if the page is written with the font &
background colour the same. (same issue when browsing some sites while the
browser is set w/images off: eg if the font color is white & the background
image is dark, but the background-color isn't set)

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

*** Bug 251815 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Bugzilla-spray (bugzilla-spray) wrote :

*** Bug 254306 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Bugzilla-accessibleinter (bugzilla-accessibleinter) wrote :

*** Bug 255838 has been marked as a duplicate of this bug. ***

Revision history for this message
In , A-geek (a-geek) wrote :

This problem will appear much more often in the future - see eg.
http://plone.org and any comments on how to be XHTML+CSS2 compliant and achieve
accessibility (AAA, S508 etc).

It breaks saving pages from plone.org right now.

Revision history for this message
In , A-geek (a-geek) wrote :

A propaganda website that makes massive use of this techniques, and which is
advocating it with good arguments, is

  http://www.csszengarden.com/

Revision history for this message
In , Bugmail-mozilla (bugmail-mozilla) wrote :

The problem will appear more and more often. For example now that mozilla.org
has a new style with a pretty background image : http://www.mozilla.org :(

Revision history for this message
In , Bugzilla-accessibleinter (bugzilla-accessibleinter) wrote :

*** Bug 268810 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Bugzilla-accessibleinter (bugzilla-accessibleinter) wrote :

*** Bug 273218 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Mozilla-06 (mozilla-06) wrote :

*** Bug 274163 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Djpohly+bmo (djpohly+bmo) wrote :

*** Bug 294976 has been marked as a duplicate of this bug. ***

Revision history for this message
In , 3-14 (3-14) wrote :

Long silence here. I am not 100% sure. Does this bug here include that files
called by include in style files are not included into save page, complete? If
so, the summary is totally missleading. Maybe someone who better understands the
bug can find a better wording.

pi

Revision history for this message
In , Peter-vanderwoude (peter-vanderwoude) wrote :

*** Bug 298819 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Zach-zachlipton (zach-zachlipton) wrote :

*** Bug 299394 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Dtownsend (dtownsend) wrote :

*** Bug 305438 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Zookqvalem (zookqvalem) wrote :

*** Bug 305630 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Uriber (uriber) wrote :

*** Bug 308489 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Tom-chiverton (tom-chiverton) wrote :

(In reply to comment #74)
> Long silence here. I am not 100% sure. Does this bug here include that files
> called by include in style files are not included into save page,

Yeah, that's spot on Boris.

Revision history for this message
In , Mike Connor (mconnor) wrote :

Not in the scope for Fx2, which is not going to include significant changes to Gecko.

Revision history for this message
In , Mymailforreg (mymailforreg) wrote :

Windows 2000, Firefox 1.5
For simplicity, I give an example. Save page http://www.mozilla.org/ to your local disk. Then open the saved page. You see that inscription "mozilla.org" and an image of the dinosaur at the top of the page disappeared.
Sometimes it significantly disfigures pages, especially forums.

Revision history for this message
In , Ria-klaassen (ria-klaassen) wrote :

*** Bug 322817 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Jruderman (jruderman) wrote :

*** Bug 328588 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Ivo-mmm (ivo-mmm) wrote :

Oh my, this bug has been reported 4 years ago, and it wasn't fixed yet? How come is it so hard?

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

*** Bug 332899 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

This bug needs someone to step up if it's going to get fixed. All the code involved is currently unowned... I can promise quick patch reviews on this one.

Revision history for this message
In , L. David Baron (dbaron) wrote :

So, for what it's worth, the way I think this ought to be fixed is roughly as follows:

To avoid having two separate bits of CSS serialization code, we should teach the style system to serialize with URI fixup.

For a start, this requires the nsWebBrowserPersist code to implement an interface a lot like nsIDocumentEncoderNodeFixup -- except for URI fixup. In other words, the interface would take a URI as input (probably nsIURI) and return a URI as output (probably string), potentially enqueueing the saving of the URI in question as part of the implementation.

Then we could add a Serialize method to nsIStyleSheet that took one of these as an argument (but where the object being null would mean no URI fixup was needed). Furthermore, we would then add Serialize methods to everything that has a GetCssText method by renaming GetCssText and adding the argument to this new interface -- and then adding a GetCssText that called Serilaize(null) -- where the null object for this new interface would mean to do no URI fixup. This object would further have to be passed down through nsCSSDeclaration::ToString and the methods it calls -- where, again, null would mean that no fixup was needed.

I think this is highly preferable to reimplementing CSS parsing and serialization.

Boris, does this seem reasonable?

Revision history for this message
In , Nickolay Ponomarev (asqueella) wrote :

(I plan to work on this, although not in the next few days.)

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

Yeah, the plan in comment 88 sounds pretty good.

In fact, it makes me wish that our content serialization worked more like that... Then fixup would work for, say, SVG too...

Revision history for this message
In , Jerfa (jerfa) wrote :

*** Bug 364036 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Marcel-dejean (marcel-dejean) wrote :

*** Bug 369282 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Dtownsend (dtownsend) wrote :

*** Bug 371419 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Goa103 (goa103) wrote :

As a temp solution you might be interested to try the Save Complete [1] extension. It supports saving images referenced in CSS files. We also discuss the problem in the « "File -> Save Page As" does not save images fro » [2] MozillaZine Forums topic.

Notes :
* [1] <https://addons.mozilla.org/en-US/firefox/addon/2925>
* [2] <http://forums.mozillazine.org/viewtopic.php?t=538458>

Revision history for this message
In , Sylvain Pasche (sylvain-pasche) wrote :

*** Bug 376597 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Kevin Brosnan (kbrosnan) wrote :

*** Bug 393054 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Ben Karel (eschew) wrote :

Created attachment 279129
preliminary, incomplete implementation

So I've been working on dbaron's suggestion for a week or two now. I'm leaving for the rest of the weekend in half an hour to run my first half marathon (eep!) so I figured I'd post what I have so far for preliminary review.

I've tried to avoid gratuitous refactoring of nsWebBrowserPersist, opting for small changes/hacks around an architecture not really aligned with the sequence of operations CSS serialization requires.

Right now, the biggest thing left to implement feature-wise is to make CSS and other content-serialized, not-downloaded files get the right file extension. Currently, wbp fixes up persisted files with nsIChannel-derived mime types; since we're not downloading the CSS files we persist from a remote server, that's not going to work. I haven't spent more than a few minutes looking at how much I have to work with, so I'm not sure of the best way to go about doing that. For some things it's easy (nsICSSStyleSheet should be saved as .css) but for things like background-image: url(blah), I'm not sure whether CSS knows what the mime type of blah is. Can't just take the extension from the URI, since that fails for dynamically-generated pages. For images I guess we do content-sniffing anyways, so we could save it as whatever we wanted, but still.

Anyways, I'll be back Monday and can hopefully post a more polished/complete patch later in the week. This patch deserves at least two r-'s in its current state. At least there's enough code there to make comments on.

Revision history for this message
In , Ben Karel (eschew) wrote :

Oh, one other thing: we technically don't need to persist @import-ed stylesheets linked to from style elements as separate files, their child rules can be recursively serialized along with the main page stylesheet. Right now the code does both, I think, because it's in flux. Since that's the first thing in the diff I just wanted to comment on it.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

I doubt I'll be able to get to this until after I get back in mid-to-late Sept.

> their child rules can be recursively serialized along with the main page
> stylesheet

Not if they contain @charset, @namespace, etc rules.

Revision history for this message
In , Jruderman (jruderman) wrote :

*** Bug 396042 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Cbook (cbook) wrote :

*** Bug 396276 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :
Download full text (3.9 KiB)

Comment on attachment 279129
preliminary, incomplete implementation

>Index: layout/style/nsCSSStyleSheet.cpp
>+ aContent.Append(NS_LITERAL_STRING("/* Rules from linked stylesheet, original URL: */\n"));

As I said, this is no good if the linked stylesheets have @namespace rules (or @charset or @import, but skipping those won't necessarily break things, while skipping @namespace most definitely will). Please make sure to write tests for this case.

What you probably want to do instead is to serialize the rules, and have @import serialization start the serialization of the imported sheet. I believe that's what dbaron suggested too.

>+ for (i = 0; i < styleRules; ++i) {
>+ rv = GetStyleRuleAt(i, *getter_AddRefs(rule));

Declare |rule| here, not up at the beginning somewhere?

>Index: layout/style/nsHTMLCSSStyleSheet.cpp
>+ NS_IMETHOD Serialize(nsAString& aContent, nsIDocumentEncoderFixup *aFixup);

Why not just put this method on nsICSSStyleSheet, since those are all you serialize?

>Index: layout/style/nsICSSRule.h

And probably put this on nsICSSStyleRule.

I don't see any rule implementations of Serialize() in this patch. Some of the rules will require fixup too, of course (@document rules come to mind).

Of course such rules in user sheets won't work right once saving has happened, but getting that to work is food for another bug, I think.

>Index: embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp
>+#include "nsIFormControl.h"
>+#include "nsIDOM3Node.h"

This looks like part of another patch, right?

>+ // Leave mCurrentDataPath alone so that CSS fixup knows where to save
>+ //mCurrentDataPath = oldDataPath;

This is pretty questionable. I'd have to dig to make sure, but I would be it's wrong.

>@@ -2504,17 +2522,37 @@ nsWebBrowserPersist::EnumPersistURIs(nsH
>+ rv = cos->Init(outputStream, nsnull, 0, 0);
>+ NS_ENSURE_SUCCESS(rv, rv);

This is wrong if the sheet has an @charset rule. Unless you plan to strip those, in which case it's wrong if the main document is not UTF-8. Again, please add tests.

>+ PRBool wroteFullString = PR_FALSE;
>+ rv = cos->WriteString(data->mContents, &wroteFullString);

Why is wroteFullString being ignored? Perhaps this should be called something else?

>+ rv = cos->Flush();
>+ rv = outputStream->Flush();
>+ rv = cos->Close();

Why bother assigning to rv if you plan to ignore it?

>+ nodeAsLink->GetType(type);
>+ if (type.EqualsLiteral("text/css")) {

This is wrong. |type| could be an empty string for a CSS style sheet. Further, if type is "text/css" there ould be no DOMStyleSheet attached to the node. You probably want to just GetSheet() and then null-check it. Again, add tests.

>+ nsAutoString content;
>+ ss->Serialize(content, mFixup);
...
>+ data->mContents.Assign(content);

Why not get the |data| first and just pass data->mContents to Serialize()?

I haven't reviewed the URI-munging details.

CSS fixup should probably also be applied to "style" attributes. Might be a separate bug.

To answer your quesion...

Read more...

Revision history for this message
In , Ben Karel (eschew) wrote :
Download full text (6.1 KiB)

Created attachment 282460
mostly complete patch (actually not nearly complete)

Updated patch attached. It correctly serializes CSS images. I actually had most of this work done before your review; I should have uploaded what I had earlier, since this resulted in some amount of wasted effort for you, bz. Mea culpa.

I think I removed all the changes related to bug 293834 in this patch.

(In reply to comment #102)
> (From update of attachment 279129 [details])
> >Index: layout/style/nsCSSStyleSheet.cpp
> >+ aContent.Append(NS_LITERAL_STRING("/* Rules from linked stylesheet, original URL: */\n"));
>
> As I said, this is no good if the linked stylesheets have @namespace rules (or
> @charset or @import, but skipping those won't necessarily break things, while
> skipping @namespace most definitely will). Please make sure to write tests for
> this case.
>
> What you probably want to do instead is to serialize the rules, and have
> @import serialization start the serialization of the imported sheet. I believe
> that's what dbaron suggested too.
>

I believe the code should now do this, but haven't written tests yet.

> >+ for (i = 0; i < styleRules; ++i) {
> >+ rv = GetStyleRuleAt(i, *getter_AddRefs(rule));
>
> Declare |rule| here, not up at the beginning somewhere?
done

>
> >Index: layout/style/nsHTMLCSSStyleSheet.cpp
> >+ NS_IMETHOD Serialize(nsAString& aContent, nsIDocumentEncoderFixup *aFixup);
>
> Why not just put this method on nsICSSStyleSheet, since those are all you
> serialize?
>
> >Index: layout/style/nsICSSRule.h
>
> And probably put this on nsICSSStyleRule.
>
Done, though I had Serialize on nsICSSRule and not nsICSSStyleRule because the non-style rules in nsCSSRules.h require Serialize too. But it's not much different either way, I guess.

> I don't see any rule implementations of Serialize() in this patch.
Added.

> Some of the rules will require fixup too, of course
> (@document rules come to mind).
>
> Of course such rules in user sheets won't work right once saving has happened,
> but getting that to work is food for another bug, I think.
Haven't tested/accounted for this yet.

> >Index: embedding/components/webbrowserpersist/src/nsWebBrowserPersist.cpp
> >+#include "nsIFormControl.h"
> >+#include "nsIDOM3Node.h"
>
> This looks like part of another patch, right?
>

Yes, they're from bug 293834, which this bug depends on to correctly persist serialized style elements.

> >+ // Leave mCurrentDataPath alone so that CSS fixup knows where to save
> >+ //mCurrentDataPath = oldDataPath;
>
> This is pretty questionable. I'd have to dig to make sure, but I would be it's
> wrong.
Yeah, that was leftover code from experimentation that I neglected to delete. Removed.

>
> >@@ -2504,17 +2522,37 @@ nsWebBrowserPersist::EnumPersistURIs(nsH
> >+ rv = cos->Init(outputStream, nsnull, 0, 0);
> >+ NS_ENSURE_SUCCESS(rv, rv);
>
> This is wrong if the sheet has an @charset rule. Unless you plan to strip
> those, in which case it's wrong if the main document is not UTF-8. Again,
> please add tests.

What would be the correct way of persisting @charset rules? Extract the @charset and write the stream ...

Read more...

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

> Done, though I had Serialize on nsICSSRule

Yeah, that's what I meant.

> What would be the correct way of persisting @charset rules? Extract the
> @charset and write the stream with that encoding?

Yes. For perfect fidelity, we'd have sheets know what charset they were parsed as, and serialize as that charset, modifying or inserting @charset rules as needed. That's what we do for documents...

> Er, because WriteString requires a boolean out param

How about calling the stack boolean "ignored" then? Or warning if it's false? Or something?

> In essence, it boils down to webbrowserpersist expecting to do two passes, and
> to download files before determining the filename to replace the URI with,

Yes, that's the right way to go about it.

I doubt I'll be able to review this any time soon. I definitely won't spend time reviewing until we have tests that this passes, but even then I really doubt that I'll be able to do it within a reasonable timeframe (measured in months).

Revision history for this message
In , Ben Karel (eschew) wrote :

>> In essence, it boils down to webbrowserpersist expecting to do two passes, and
>> to download files before determining the filename to replace the URI with,
>
> Yes, that's the right way to go about it.

Do you mean to say that you think that when CSS is being fixed up, FixupURI should block and not return until the original URI has been downloaded? Wouldn't forcing all the downloads to be made in serial like that have an adverse impact on performance?

Also, by "tests," do you mean automated tests, or just manually-verifiable testcases? I have no idea how to automate testing of webbrowserpersist...

Revision history for this message
In , Nickolay Ponomarev (asqueella) wrote :

Re: testing. I had a patch in bug 366072 to make use of reftest for WBP testing, but it bit-rotted since then and I didn't get to update it.

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

> Do you mean to say that you think that when CSS is being fixed up, FixupURI
> should block and not return until the original URI has been downloaded?

How do we handle <img> right now? We should handle CSS the same way, basically.

> Also, by "tests," do you mean automated tests

Yes, ideally automated tests. But even manually-verifiable ones would be a start....

Revision history for this message
In , Bzbarsky (bzbarsky) wrote :

Comment on attachment 282460
mostly complete patch (actually not nearly complete)

Per comment 104, someone else should review this...

Revision history for this message
In , Ginnchen+exoracle (ginnchen+exoracle) wrote :

*** Bug 358709 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Bugzilla-tuxmachine (bugzilla-tuxmachine) wrote :

*** Bug 409200 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Webdesign-semnanweb (webdesign-semnanweb) wrote :

just test the wikipedia.org, the master css file that include another css file.

@import url('./anothercss.css');
@import url('./anothercss-two.css');

their's also must be save with the images and another css included.

Revision history for this message
In , L. David Baron (dbaron) wrote :

Comment on attachment 282460
mostly complete patch (actually not nearly complete)

On the principle that it's better off in somebody's review queue than nobody's, adding this to my review queue, although I also probably won't get to it for a bit.

Revision history for this message
In , Gavin Sharp (gavin-sharp) wrote :

*** Bug 421711 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Stephen-donner (stephen-donner) wrote :

Just FYI; added this bug # to https://litmus.mozilla.org/show_test.cgi?id=3952 for easier reference.

Revision history for this message
In , Viktoriia-bogdanova (viktoriia-bogdanova) wrote :

Defect is still reproducible on version 3.0b5pre. Firefox doesn't save background image in css style when using save page as, web page complete.

Revision history for this message
In , Viktoriia-bogdanova (viktoriia-bogdanova) wrote :

Defect is reproducible on version 3.0b5pre (XP, Vista, Mac OS X 10.4). Firefox doesn't save background image in css style when using save page as, web page complete.

Revision history for this message
In , Polidobj (polidobj) wrote :

*** Bug 434480 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Ivan Ivanov (komarik) wrote :

Bug is still there on the latest nightly (Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.9.1a2pre) Gecko/2008072803 Minefield/3.1a2pre).

BTW, bug 293834 was recently fixed. Will it help to fix that bug? All the blockers are gone now.

Revision history for this message
In , L. David Baron (dbaron) wrote :

Adding qawanted keyword to reflect the request for automated tests for this.

Revision history for this message
In , Philringnalda (philringnalda) wrote :

*** Bug 451109 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Polidobj (polidobj) wrote :

*** Bug 467906 has been marked as a duplicate of this bug. ***

Revision history for this message
In , L. David Baron (dbaron) wrote :

So I've been thinking about this a little bit, and I'm wondering how tolerable the loss of accuracy in the CSS is. In particular, we lose properties we don't support, conditional comment hacks, etc.

An alternative approach might involve re-tokenizing the CSS but not parsing it, and then fixing up all the url() functions. That's not trivial either, though, given that we have some context-sensitive tokenization, especially for URLs.

I guess this approach probably is ok, though. But in the long run I think "Save Page As, Complete" is broken-by-design, and if we want to save complete Web pages we ought to support saving them into some package format.

Revision history for this message
In , htrex (hantarex) wrote :

A cross-browser package format standard would be more than welcome, but in the meantime what about integrating scrapbook's (https://addons.mozilla.org/en-US/firefox/addon/427) excellent abilities to save complete web pages into firefox ?

Revision history for this message
In , Twalker (twalker) wrote :

*** Bug 474424 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Hskupin (hskupin) wrote :

Will we have any progress on this bug in the near future?

Revision history for this message
In , Timr-mozilla (timr-mozilla) wrote :

Created attachment 391979
Firefox start page before the Save Page As....

Revision history for this message
In , Timr-mozilla (timr-mozilla) wrote :

My understanding is that this affects saving our current home page

http://www.google.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official

I attached the before and after shots for the "Save Page..." as web page
Complete.

This is on Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13)
Gecko/2009073022 Firefox/3.0.13

STR:
1) Install firefox
2) See start page
3) Select File -> Save Page As...
4) Make sure "save as type:" is "Web Page, complete"
5) save it to some convenient place like your desktop
6) open the file on your desktop, called firefox by default.

Revision history for this message
In , Timr-mozilla (timr-mozilla) wrote :

Created attachment 391980
Saved firefox page after "Save Page As..." - missing elements

Revision history for this message
In , Ben Karel (eschew) wrote :

I won't have time to work on this in the near future, so reverting to default owner.

Revision history for this message
In , Polidobj (polidobj) wrote :

*** Bug 508591 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Kairo-kairo (kairo-kairo) wrote :

*** Bug 349114 has been marked as a duplicate of this bug. ***

Revision history for this message
In , L. David Baron (dbaron) wrote :

Created attachment 395446
mostly complete patch, mostly merged to trunk

This is mostly merged to trunk (I haven't tried compiling yet, though). The aSerializedCloneKids change appears to have landed already. I merged the URI serialization code in nsCSSDeclaration. But I still need to add a replacement for the code that was patching TryBackgroundShorthand in the old patch.

Revision history for this message
In , L. David Baron (dbaron) wrote :

So now that I'm attempting to compile this, I'm having trouble figuring out how *any* of the patches in this bug ever compiled (well, I can see how the first patch could compile... but it wouldn't link). They both have code in nsCSSStyleSheet.cpp that calls a Serialize method on either nsICSSRule or nsICSSStyleRule. In the first patch it was declared pure virtual on nsICSSRule but never implemented; in the second patch it's neither declared nor implemented.

And it makes far more sense to me for that method to be on nsICSSRule. The current serialization code simply omits things like @media rules, @namespace rules, etc.

So I'd thought this patch was really ready for review, but it seems like it has some pretty major gaps in it.

Or were there changes you had in your tree that you just didn't include in your diff?

Revision history for this message
In , L. David Baron (dbaron) wrote :

Also, the patch renames nsEncoderNodeFixup to nsEncoderFixup in its implementation, but doesn't touch its declaration... which also clearly isn't going to compile, unless, again, many entire files are missing from the patch.

Revision history for this message
In , L. David Baron (dbaron) wrote :

Created attachment 395474
more merged to trunk

I redid a bunch of the guts of the CSS changes to be much more thorough, and thus hit all CSS properties with URLs in them. (Though it still doesn't hit downloadable fonts, but could once the issue below is fixed.)

I also filled in most of the missing pieces of code. However, the CSS rule serialize implementations are just "FIXME: WRITE ME".

Revision history for this message
In , Jo-hermans (jo-hermans) wrote :

*** Bug 519084 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Philringnalda (philringnalda) wrote :

*** Bug 526823 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Bugzilla-tf (bugzilla-tf) wrote :

*** Bug 530801 has been marked as a duplicate of this bug. ***

Revision history for this message
In , L. David Baron (dbaron) wrote :

Comment on attachment 282460
mostly complete patch (actually not nearly complete)

Marking review- because this wasn't actually anywhere near complete. (And see also other comments above.)

Revision history for this message
In , Kevin Brosnan (kbrosnan) wrote :

*** Bug 592642 has been marked as a duplicate of this bug. ***

Revision history for this message
Teo (teo1978) wrote :
Revision history for this message
In , Cork-8 (cork-8) wrote :

*** Bug 659230 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Simona-marcu (simona-marcu) wrote :

Issue is still reproducible on the latest nightly:
Mozilla/5.0 (Windows NT 5.1; rv:7.0a1) Gecko/20110704 Firefox/7.0a1

Revision history for this message
In , Pppx (pppx) wrote :

Confirming on latest Seamonkey build
Build identifier: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:10.0a1) Gecko/20111017 Firefox/10.0a1 SeaMonkey/2.7a1

Revision history for this message
In , Lain_13 (lain-halfbit) wrote :

Additional 1.5 months and we will be able to celebrate 10th birthday of this bug. ^___^

Revision history for this message
In , Vlad44-u (vlad44-u) wrote :

Devs don't save pages at all as I see.
They don't pay attention to such important bugs as this bug and Bug 653522 - saving pages when offline (or using adblock) inundates the user with error dialogs:

https://bugzilla.mozilla.org/show_bug.cgi?id=653522

Revision history for this message
In , C142592 (c142592) wrote :

Why are major bugs in core functionality like this not fixed when instead we get garbage like microsummaries, geolocation, and DNS prefetching?

Revision history for this message
In , A-geek (a-geek) wrote :

Wrt. comment#146: It's likely the problem that you and I don't pay the devs to do so, and the people who do, see more value in a read-only web and user tracking. It's also usually much easier to make something new than fix old bugs.

Revision history for this message
In , L. David Baron (dbaron) wrote :

The problem is really that there's no good approach here with the current "Save As, Complete" model. If we fix this, we'll break the CSS working in other browsers.

A better solution would be replacing Save As, Complete (which attempts to rewrite the pages to fix up URLs) with a cross-browser package format for saved documents (which could save resources as they were).

Revision history for this message
In , C142592 (c142592) wrote :

(In reply to David Baron [:dbaron] from comment #148)
> If we fix this, we'll break the CSS working in other browsers.
Only for hacks and browser-specific properties. This is not a big deal. The same problem exists anyway for things like IE's conditional comments, or any kind of user-agent selectivity on the server, or indeed any other type of hack or browser-specific feature.

But obviously it's SO much better with the current situation that it works in no browsers at all!

Are you seriously more worried about browser-specific CSS features, which are probably expected to degrade gracefully anyway, than, oh I don't know, background-image!?

> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents (which could save resources as they were).
And you'd STILL need to rewrite the URLs. Come on, give it half a thimble of thought.

Revision history for this message
In , Neil-httl (neil-httl) wrote :

(In reply to Anonymous from comment #149)
> (In reply to David Baron from comment #148)
> > A better solution would be replacing Save As, Complete (which attempts to
> > rewrite the pages to fix up URLs) with a cross-browser package format for
> > saved documents (which could save resources as they were).
> And you'd STILL need to rewrite the URLs.
Let's suppose for the sake of argument that we used IE's .mht format. It doesn't rewrite the URLs. Instead, each entry in the file is associated with its original URL. (What I don't know is whether it would be possible to achieve this scheme in Gecko.)

Revision history for this message
In , C142592 (c142592) wrote :

(In reply to <email address hidden> from comment #150)
>Instead, each entry in the file is associated with its original URL.
Oh I see. Fair enough. But unless anyone thinks that the feature to save a web page as individually manipulable files should be completely removed, the bug is still not avoidable, and the URLs still need rewriting.

Revision history for this message
In , Daniel-glazman (daniel-glazman) wrote :

Just for the record, I do have code in BlueGriffon achieving a full rewrite
of all stylesheets attached to a given document, tweaking all URIs.
I needed it to be able to turn an arbitrary document into a reusable
well-packaged template. That code is based on my parser/serializer JSCSSP.
In other terms, a full rewrite is doable but requires considerable complexity
and footprint...

Revision history for this message
In , Vlad44-u (vlad44-u) wrote :

(In reply to David Baron [:dbaron] from comment #148)
> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents

Modern pages have very big js files - 500kb or 1mb per page. I delete js files from folders of saved pages sometimes. And a lot of reasons to open folders of pages. And I will never use the browser with only archive format for save as.

Please don't add reasons not to use Gecko browsers. (Now I use SeaMonkey 1.1.19 as main browser only because of regression in save as function, which was developed by kind developers)) Bug 653522)

Revision history for this message
In , Mymailforreg (mymailforreg) wrote :

(In reply to David Baron [:dbaron] from comment #148)
> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents (which could save resources as they were).

I use "Save As, Complete" to save pages complete to examine a source code. So I need a working local copy of the page, not just a package.

Revision history for this message
In , Jon (dotnetcarpenter) wrote :

I agree with Emin, save the source as seen in view-source and rewrite urls to resources to a local folder next to the saved page. multiple request are definitely allowed. no reason to get any performance on this task as it is user invoked and the expectation is that the current document is saved as-is for various reasons, for later use. Not as firefox reads the document, nor as quickly as possible or to consume least space as possible - there is save document (not complete) for that.

Revision history for this message
In , Lain_13 (lain-halfbit) wrote :

Actually it's not enough. Complete save must save page as it is visible with all elements inserted by ajax (like comments) and if some elements are missing then they should remain missing (like advertisements with active adblock). Even all additional user CSS applied to the elements on the page must be preserved as-is.
Otherwise it is not complete.
I mean it have to crawl DOM and generate page from it instead of saving page's source with rewritten links.
I do agree there is no reason for fast performance but when you open such page it must resemble page which you have saved as much as possible. It could not be considered as complete if it will download anything from the web on open. Also I don't think preserving interactivity is expected behavior in this case. Users usually saves page to be able to read it later or send someone to read. So, it must be preserved as he seen it when decided to save it.

For investigation purposes there is separate applications like extension "DownThemAll!". You can configure how exactly it will save things and how deep will it go. It does even more then Emin requests. It not only makes local copy of the page - it could make local copy of the whole site or decent part of it.

Revision history for this message
In , Jon (dotnetcarpenter) wrote :

@Lain_13 Do you mean something like PhantomJS for Gecko?
http://stackoverflow.com/questions/5490438/phantomjs-and-getting-modified-dom

Revision history for this message
In , Lain_13 (lain-halfbit) wrote :

Not exactly but kind of. In your example that code generates a screenshot of the site while complete copy of the page have to remain as text and linked objects to let you scale it, select, copy and everything else. The only thing which shouldn't work is interactivity. I mean scripts must not work.
BTW, Scrapbook does such a thing already. I'd just like to see ability to make complete stand-alone static copy of the page in the Fx itself.

Revision history for this message
In , Jon (dotnetcarpenter) wrote :

Well, the example also does that. As does this: http://code.google.com/p/phantomjs/wiki/QuickStart#DOM_Manipulation but it might not be so clear.
This is the interesting bit:
var page = require('webpage').create(),
    url = 'http://lite.yelp.com/search?find_desc=pizza&find_loc=94040&find_submit=Search';

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Unable to access network');
    } else {
        var html = page.evaluate(function() {
            return document.innerHTML; // <- get all generated and styled HTML
        });
        console.log(html);
    }
    phantom.exit();
});
There is a similar project for Gecko called offscreen but I can't see how it's possible to get the generated HTML with this though.
Project Page: http://offscreengecko.sourceforge.net/
Source code: http://hg.mozilla.org/incubator/offscreen/

Revision history for this message
In , Andrixnet (andrixnet) wrote :

(In reply to David Baron [:dbaron] from comment #148)
> The problem is really that there's no good approach here with the current
> "Save As, Complete" model. If we fix this, we'll break the CSS working in
> other browsers.
>
> A better solution would be replacing Save As, Complete (which attempts to
> rewrite the pages to fix up URLs) with a cross-browser package format for
> saved documents (which could save resources as they were).

Consider how some browsers are closed source and written to the interests of their owners and how they implement either CSS or JS compared to the standards ... duh

On the other hand, I would strongly vote against such a method, one reason being the examination of various resource files related to a page.

I use an extension called Web Developer Toolbar. This extension has a neat feature called "view generated source". This contains all content generate/modified by JS as displayed. Based on this one could easily reference each resource file (image/css/etc) for saving.

Also, saving locally a webpage, in principle, does not imply that the entire webpage functionality will be preserved for the simple reason that some features require online interractivity with the source server, and this is impossible to replicate in a local save and offline viewing scenario.

Revision history for this message
In , Bugzilla-tf (bugzilla-tf) wrote :

*** Bug 852007 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Dražen Lučanin (kermit666) wrote :

It would be nice to have that feature back. I also like saving pages for offline inspection later ("coding on the lake" and that sort of thing :).

Revision history for this message
In , TheKiller (rdragos18) wrote :

This bug report/ticket exists for eleventh years.
Is there really no one interested in fixing this bug.......?

Revision history for this message
In , Michal Čaplygin (myfonj) wrote :

Some remarks regarding creation of 'faithful offline snapshot of displayed page' from browser:
- Aforementioned "Save complete" addon [0] seems no longer maintained.
- There is "Mozilla Archive Format" addon [1][2] which promotes 'Faithful save system' or 'Exact snapshot' feature, seems to work quite well. Maybe worth checking.
- (Parity) Opera browser is able to successfully save page with imported resources as well.

[0] https://addons.mozilla.org/en-US/firefox/addon/save-complete-4723/
[1] http://maf.mozdev.org/
[2] https://addons.mozilla.org/en-US/firefox/addon/mozilla-archive-format/

Revision history for this message
In , Mozilla-kaply (mozilla-kaply) wrote :

I apologize for the "me too" voice here.

dbaron, Are you saying that this bug is simply not fixable?

> If we fix this, we'll break the CSS working in other browsers.

But right now the CSS doesn't work in any browser. What would be the harm in just making it work in Firefox?

Revision history for this message
In , L. David Baron (dbaron) wrote :

Well, it could break things that aren't broken today. Though now that prefixes are becoming less common that would be less of an issue, so I'm less worried about it than I was a few years ago.

Alternatively, we could do tokenization-level fixup, which actually might not be that bad, except it's a litte tricky for @import which takes strings in addition to url()s.

Revision history for this message
In , Jon (dotnetcarpenter) wrote :

I feel inclined to develop a Firefox Add-On for this. To me, this seems overly simple.
1. Just take the download tree (e.g. as seen in Firebug ect.) and redownload it all.
2. Save all files to a folder as is (which is, if I remember correct, the current behavior).
3. Parse all css/js files and replace URI's to local path.
4. Have heaps of fun with script loaders (AMD et al.) making NO. 3 difficult.

There you have it. The page, saved in it's entirety.

Revision history for this message
In , Mozilla-kaply (mozilla-kaply) wrote :

Actually, the MAF add-on (https://addons.mozilla.org/en-us/firefox/addon/mozilla-archive-format/) does appear to do exactly that.

If you have it installed, Firefox Save complete page works much better.

Revision history for this message
In , Jon (dotnetcarpenter) wrote :

@mkaply Yep, "Faithful to the original" is all I need.
Thanks

Revision history for this message
In , Ioana-damy (ioana-damy) wrote :

Removing "qawanted" since the need for an automated test is marked by "in-testsuite?".

Revision history for this message
In , Wknapek (wknapek) wrote :

Hello I'd like to be assigned to this bug please assign me.

Revision history for this message
In , L. David Baron (dbaron) wrote :

Roughly what steps are you planning to take to fix this?

Revision history for this message
In , Wknapek (wknapek) wrote :

[Blocking Requested - why for this release]:

Revision history for this message
In , Wknapek (wknapek) wrote :

I read the whole thread, and since I'm new to the project i try implement solution proposed by the dotnetCarpenter. What do you think David?

Revision history for this message
In , L. David Baron (dbaron) wrote :

Do you mean comment 167? That sounds like a proposal to rewrite the entire "Save As, Complete" feature, of which this bug (fixing up URIs in the CSS) would be just a small part. But it doesn't tell me anything about what you plan to do for the part covered by this bug: fixing up URIs in the CSS.

Revision history for this message
In , Firstpeterfourten (firstpeterfourten) wrote :

Created attachment 8599503
Bugzilla main page test case- original

Revision history for this message
In , Firstpeterfourten (firstpeterfourten) wrote :

Created attachment 8599504
Bugzilla main page test case- after save

This 13-year-old bug in basic functionality is surprising - why is this still broken?
I've attached a very local test demonstration from the Bugzilla main page, to help illustrate its status today.

Revision history for this message
In , Zefling-r (zefling-r) wrote :

UnMHT is on GPL licence ( https://addons.mozilla.org/fr/firefox/addon/unmht/license/7.3.0.5 ), is it not possible to use a part of code ? UnMHT is for me a great tool for save a complete page : HTML / CSS / JS / media

Revision history for this message
In , Davidbourguignon-net (davidbourguignon-net) wrote :

Hi all, here are my two cents... I hope it helps!

To sum up:
- I tried to use the "save as, complete web page" with https://slate.adobe.com/a/NzR3A/ and as a result background images and other CSS effects were missing.
- I tried to save this web page with https://addons.mozilla.org/en-us/firefox/addon/mozilla-archive-format/ in either MAF or MHT formats, and it failed (only the first background image and text were displayed).
- I tried to save this web page with https://addons.mozilla.org/fr/firefox/addon/unmht/ in MHT format and it worked perfectly!

In conclusion: a million thanks to UnMHT developers! :-)

Revision history for this message
In , jidanni (dan-jacobson) wrote :

*** Bug 1232103 has been marked as a duplicate of this bug. ***

Revision history for this message
dino99 (9d9) wrote :

Closing that outdated report as EOL has been reached long time ago

Changed in firefox (Ubuntu):
status: New → Invalid
Revision history for this message
teo1978 (teo8976) wrote :

Is dino99 a bot or a retarded person?

I have just checked, and the issue described in this "outdated report" is still present in the latest version of Firefox on the current version of Ubuntu.

Before closing issues just because they are old, please check if they still exist!

P.S. I have seen a few other issues that I reported or starred closed by the same user in the last few days. I am not going to waste my time checking how many of them were wrongly closed like this one.

Someone should mass-undo all the issue closing changes made by dino99 lately and probably revoke him permissions to manage bugs.

Changed in firefox (Ubuntu):
status: Invalid → Confirmed
dino99 (9d9)
tags: removed: lucid
Revision history for this message
dino99 (9d9) wrote :

To the comments made above:

- verify how many users have warns and validated your report
- note that no one have wasted time to fix or commented on that bug
- if you still get these issues with a fresh install not disturted by old borked config/settings, then open a new report about bug or missing feature(s)/wrong design.
- dont forget to report upstream if necessary.

Changed in firefox (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
teo1978 (teo8976) wrote :

> verify how many users have warns and validated your report

Not sure what you mean

> note that no one have wasted time to fix or commented on that bug

So what?? If nobody has fixed it yet, it means it shouldn't be fixed??

> if you still get these issues with a fresh install not disturted by old borked config/settings,

I f***ing told you I DO observe these issues with a current version. It does not need to be a fresh install. It's upgraded through the official channels. If there were any difference between that and a fresh install, that would be a bug in itself. But I'm pretty sure it's not the case.

> then open a new report

Why should I reopen a new report every time a new version is released if the bug is not fixed?? The original reports contains all the relevant information. You shouldn't have closed it without verifying that the issue was fixed (which it is not), and I have even verified again that it is not. So why waste time closing reports and opening new ones?

tags: added: wily
Changed in firefox (Ubuntu):
status: Invalid → New
Revision history for this message
dino99 (9d9) wrote :
Changed in firefox (Ubuntu):
status: New → Incomplete
Revision history for this message
teo1978 (teo8976) wrote :

I don't get what the link has to do with the issue.

Did you even read the report?? The issue is not the inability to download some "protected" content, it's just that when saving a complete webpage, urls used in CSS such as background images are not retrieved.

BTW, Last comment before doing what?

Revision history for this message
teo1978 (teo8976) wrote :

Oh, I now notice you just changed the status to "incomplete".
What is missing?

Revision history for this message
dino99 (9d9) wrote :

Can you compare with the httrack way on a page having issue ?

Revision history for this message
teo1978 (teo8976) wrote :

What's httrack?

Note that the issue is extremely easy to test, you should be able to compare whatever you want to compare by yourself (I don't mean to be rude, I'm just saying)

Revision history for this message
Alberto Salvia Novella (es20490446e) wrote :

If you are still experiencing this bug in any currently Ubuntu release (https://wiki.ubuntu.com/Releases) then:

1. Enter that release first name into the tag list.
2. Set this bug status back to "confirmed".

> Thank you.

Revision history for this message
teo1978 (teo8976) wrote :

> 1. Enter that release first name into the tag list.

It's already there since comment #5

> 2. Set this bug status back to "confirmed".

Let's hope this time dino99 won't change it back without providing a reason.

Changed in firefox (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Brian Murray (brian-murray) wrote :

I've followed the steps indicated in the bug description and confirm that this still happens on xenial with firefox version 45.0.1+build1-0ubuntu1.

tags: added: xenial
Changed in firefox (Ubuntu):
importance: Undecided → Medium
Revision history for this message
In , Aryx-bugmail (aryx-bugmail) wrote :

*** Bug 1328204 has been marked as a duplicate of this bug. ***

Revision history for this message
In , Paolo-mozmail (paolo-mozmail) wrote :

*** Bug 1326669 has been marked as a duplicate of this bug. ***

Revision history for this message
In , F-bugzilla (f-bugzilla) wrote :

Why, 16 years later, does Firefox still have a Save as "Webpage complete" option which does *not* save a complete webpage?

This is a fairly fundamental flaw as obviously people expect a "Webpage complete" option to save a complete webpage and it could cause serious problems when they discover later that it actually does not.

Revision history for this message
In , Gijskruitbosch+bugs (gijskruitbosch+bugs) wrote :

*** Bug 1599543 has been marked as a duplicate of this bug. ***

Changed in firefox:
importance: Unknown → Low
status: Unknown → Confirmed
Revision history for this message
In , Alice0775-t (alice0775-t) wrote :

*** Bug 1760258 has been marked as a duplicate of this bug. ***

Changed in firefox:
importance: Low → Unknown
Revision history for this message
In , Release-mgmt-account-bot (release-mgmt-account-bot) wrote :

The severity field for this bug is relatively low, S4. However, the bug has 54 duplicates, 122 votes and 148 CCs.
:Gijs, could you consider increasing the bug severity?

For more information, please visit [auto_nag documentation](https://wiki.mozilla.org/Release_Management/autonag#severity_underestimated.py).

Revision history for this message
In , Autonag-nomail-bot (autonag-nomail-bot) wrote :

The last needinfo from me was triggered in error by recent activity on the bug. I'm clearing the needinfo since this is a very old bug and I don't know if it's still relevant.

Revision history for this message
In , Gijskruitbosch+bugs (gijskruitbosch+bugs) wrote :

*** Bug 1832220 has been marked as a duplicate of this bug. ***

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.