Bug #481605 “Scour ought to be able to combine adjacent polygons...” : Bugs : Scour

codedread (codedread) wrote on 2009-11-13:

#1

Thank you for the bug report. After thinking this over, I don't think this is appropriate for scour to do. I recommend one of two things:

* educate the graphics designer
* use a graphical editor (such as Inkscape) to combine/join the polygons/shapes into one giant path

Changed in scour:
status:	New → Won't Fix

codedread (codedread) wrote on 2009-11-13:

#2

I might consider this if a decent algorithm/spec could be written to tease out the details.

How do we decide that a set of shapes should be combined? Do we test every shape against every other shape and, where two intersect and have the same fill style, fill opacity and fill color and no stroke, try to trace the combined contour? This is a really complicated algorithm, actually, and computationally expensive.

If you have an example file with these 'hundreds of simple shapes' that should be combined, I can take a look and see if it's remotely in the realm of one day accomplishing with scour. I still think the best bet is to educate the graphics designer. Scour is not the right place to correct these types of practices :)

Changed in scour:
status:	Won't Fix → Incomplete

Feneric (feneric-gmail) wrote on 2009-11-13:

#3

Attachment: SVG image with adjacent / overlapping regions that ought to be combined (190.2 KiB, image/svg+xml)

I'd think that probably the best approach would be to make one pass through and gather up / categorize regions by backgrounds if and only if they have no stroke or a stroke that matches their backgrounds.  I've personally never seen a case in the wild where this sort of problem involved transparency or background patterns, so at first pass it may make sense to simply not include these at all (the simpler case of plain opaque backgrounds would give a better feel for how well it'd work).  Once these regions have been categorized, any category with more than one could have an overlap search algorithm run on it (and I confess here that I'm currently naive regarding this type of algorithm; in cases where I've needed it it's always been already available, but I'm guessing that no matter what its form it'll quickly get slow as the number of objects increases, so running it category per category will probably be a lot more efficient than running it over the whole).  For most of the SVG images I've encountered so far, there typically won't be that many regions in any given category, and the images that are exceptions are exactly the sorts that'd benefit from this.
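
The categorization pass described above might be sketched roughly as follows. This is only an illustration, not anything Scour does: `group_by_fill` is a hypothetical helper, it reads only the `fill`, `stroke`, `opacity` and `fill-opacity` presentation attributes as plain values (no `style` attribute, no CSS, no inheritance from parent groups), and it skips anything translucent, gradient-filled or pattern-filled.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SVG_NS = "{http://www.w3.org/2000/svg}"
SHAPES = {"rect", "polygon", "path", "circle", "ellipse"}


def group_by_fill(svg_text):
    """Bucket opaque shapes by fill when they have no stroke or stroke == fill."""
    root = ET.fromstring(svg_text)
    groups = defaultdict(list)
    for elem in root.iter():
        if elem.tag.replace(SVG_NS, "") not in SHAPES:
            continue
        fill = elem.get("fill", "black")     # SVG's initial fill is black
        stroke = elem.get("stroke", "none")  # SVG's initial stroke is none
        if fill == "none" or fill.startswith("url("):
            continue  # unfilled shapes, gradients and patterns are left alone
        if stroke not in ("none", fill):
            continue  # a visible, differently colored stroke rules the shape out
        # Assumption: opacity values are plain numbers, not percentages.
        if float(elem.get("opacity", "1")) != 1:
            continue
        if float(elem.get("fill-opacity", "1")) != 1:
            continue
        groups[fill].append(elem)
    # Only categories with more than one member need an overlap search.
    return {fill: elems for fill, elems in groups.items() if len(elems) > 1}


sample = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<rect fill="#f00" width="4" height="4"/>'
    '<rect fill="#f00" stroke="#f00" x="2" width="4" height="4"/>'
    '<rect fill="#0f0" width="4" height="4"/>'
    '<rect fill="#f00" stroke="#000" width="4" height="4"/>'
    '</svg>'
)
candidates = group_by_fill(sample)
print({fill: len(elems) for fill, elems in candidates.items()})
```

Each resulting category could then be handed to whatever overlap search is chosen, which keeps that search from running over the whole document at once.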

I agree with you that it'd be best to correct the graphic designer.  :)  Unfortunately, I'm not viewing this so much from the perspective of my own use as I am from the perspective of widespread use of SVG for the Web (a la the svg2gfx.xslt filter I've been developing as part of the Dojo Toolkit), and I suspect that if just within a one-mile radius of where I sit now there's at least one such graphic designer, there are probably lots more of them scattered throughout the world, so having an automated way of cleaning up their messes is probably nearly as important as cleaning up the typical messes left by Illustrator.

I probably exaggerated when I said "hundreds".  Probably "dozens" is more realistic.  It just feels like hundreds when one is testing against it and trying to debug odd behaviors.

Anyhow, I've attached a recent sample SVG file demonstrating exactly what I'm talking about.  It's a real-world sample that I've acquired and have been using for testing purposes -- it's not something designed to be punishing to any sort of SVG processing app.  It was used to promote a particular political view prior to the recent local elections in Massachusetts, and is fairly likely to be (sadly) somewhat typical of what we can expect from at least some designers.  It gave me particular fits as even though I was able to process it (apparently) correctly with svg2gfx.xslt, the result had mysterious (still not understood) troubles on MSIE that I'm still trying to track down.  Since I had to actually put this into use prior to the aforementioned election, I ended up subbing in a bitmap in lieu of the SVG vectors for the single case of MSIE...  not really what I'd intended.  Anyhow, note the horror of the pig's mouth in particular.  What ought to be a simple Bezier curve is instead composed of lots of little polygons.  Converting those polygons into just one polygon is a big win.  Other examples of places where adjacent / overlapping regions ought to be combined abound within the pig.  I have no idea why the graphic was drawn this way; I've really only met the artist briefly and that was before I started looking at the details of this piece.

If I can help in any way, please let me know. I'm currently working with a couple of other open source projects (mostly Repoze and Dojo at the moment) and so I'm a little restricted for time, but I'm quite familiar with Python and do see Scour as a worthwhile project that's complementary to what I'm already working on.

Rob Russell (rob-latenightpc) wrote on 2009-11-13:

#4

So it sounds like this is about looking at the visible portions of the svg image. As a first iteration, any elements that are totally obscured by other (opaque) elements can be totally removed from the image.
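
For the simplest possible case, that first iteration could look something like the sketch below. It assumes every element is an opaque, untransformed, unclipped, axis-aligned rectangle given in document (paint) order; `Rect`, `covers` and `drop_hidden` are made-up names for illustration only.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float


def covers(top, bottom):
    """True when `top` completely contains `bottom`."""
    return (top.x <= bottom.x and top.y <= bottom.y
            and top.x + top.w >= bottom.x + bottom.w
            and top.y + top.h >= bottom.y + bottom.h)


def drop_hidden(rects):
    """Keep only rects that no single later-painted (opaque) rect fully covers."""
    visible = []
    for i, rect in enumerate(rects):
        if not any(covers(later, rect) for later in rects[i + 1:]):
            visible.append(rect)
    return visible


painted = [Rect(1, 1, 2, 2), Rect(0, 0, 10, 10), Rect(5, 5, 20, 20)]
print(drop_hidden(painted))
```

Note that this only catches an element hidden by one other element; a shape hidden by the union of several shapes would need the full intersection machinery discussed in this bug.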

Next, for merging elements as this bug requests, you'd have to look for partial overlaps, find intersections and replace the intersecting elements. One problem here is that replacing a few overlapping rects (for example) with an equivalent path could end up actually making the svg source larger. The rendering time after replacement could be faster or slower, not sure.

Since this can alter the document pretty severely it should be off by default.

codedread (codedread) wrote on 2009-11-13:

#5

My main concern is the contour tracing and the arbitrary polygon intersection. I really have no idea how to start something like this frankly and it's much more math than I'm willing to bite off at the moment.

Patches welcome! :)

Feneric (feneric-gmail) wrote on 2009-11-14:

#6

Right, only visible portions need to be considered. Are objects that cannot be seen already getting removed?

Can replacing a few overlapping rects with an equivalent path really end up making the source larger? Could such a thing stay true as the number of overlapping rects increases? My gut feel is that we could probably figure out whether or not to do a replacement based upon the number of rects involved, but I haven't tried to analyze it yet. For the first pass I'm in agreement that it should be something that can be selectively turned on if desired.
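
One way to settle the size question per replacement, rather than guessing from the number of rects, would be to serialize both forms and compare them. A rough sketch, where `worth_replacing` is a hypothetical helper and the markup strings are hand-written examples (the path traces the outline of the union of the two rects):

```python
def worth_replacing(originals, merged):
    """True when the merged markup is strictly smaller in UTF-8 bytes."""
    before = sum(len(s.encode("utf-8")) for s in originals)
    after = len(merged.encode("utf-8"))
    return after < before


two_rects = [
    '<rect width="10" height="10" fill="red"/>',
    '<rect x="5" y="5" width="10" height="10" fill="red"/>',
]
# Outline of the union: (0,0) (10,0) (10,5) (15,5) (15,15) (5,15) (5,10) (0,10)
union_path = '<path d="M0 0h10v5h5v10H5v-5H0z" fill="red"/>'

print(worth_replacing(two_rects, union_path))
```

For this particular pair the path comes out smaller, but the outcome could differ for other inputs (long decimal coordinates, for instance), which is why the sketch measures instead of assuming.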

I suspect the math ought not to be that hard for polygons; ultimately it then just breaks down to intersections between lines. I'll probably take a look into it myself, but I've got to finish up my current projects first unfortunately.
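
As a sketch of the line-against-line building block, assuming straight polygon edges given as pairs of `(x, y)` tuples: `segment_intersection` below is an illustrative helper, and it deliberately returns `None` for parallel or collinear edges, which a real implementation would have to treat separately (adjacent polygons share collinear edges).

```python
def segment_intersection(p1, p2, p3, p4):
    """Return the point where segment p1-p2 meets segment p3-p4, or None."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    d1x, d1y = x2 - x1, y2 - y1
    d2x, d2y = x4 - x3, y4 - y3
    denom = d1x * d2y - d1y * d2x  # cross product of the two directions
    if denom == 0:
        return None  # parallel or collinear: no single crossing point
    # Solve p1 + t*d1 == p3 + u*d2 for t and u.
    t = ((x3 - x1) * d2y - (y3 - y1) * d2x) / denom
    u = ((x3 - x1) * d1y - (y3 - y1) * d1x) / denom
    if 0 <= t <= 1 and 0 <= u <= 1:
        return (x1 + t * d1x, y1 + t * d1y)
    return None


print(segment_intersection((0, 0), (2, 2), (0, 2), (2, 0)))  # (1.0, 1.0)
```

Running this over every pair of edges in a category is quadratic in the number of edges, so it is probably only reasonable for the small categories mentioned earlier in this bug.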

Louis Simard (louis-simard-deactivatedaccount) wrote on 2010-06-12:

#7

I have a bit more input for this bug.

If the opacity of two overlapping *translucent* figures is the same, the intersection of the two figures will be drawn with more opacity. Scour can't really do anything with that.
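
To put a number on that, assuming plain source-over alpha compositing of same-colored figures (`stacked_opacity` is just an illustrative helper):

```python
def stacked_opacity(*alphas):
    """Effective opacity where layers with the given alphas overlap."""
    remaining = 1.0  # fraction of the backdrop still showing through
    for alpha in alphas:
        remaining *= 1.0 - alpha
    return 1.0 - remaining


print(stacked_opacity(0.5))       # 0.5
print(stacked_opacity(0.5, 0.5))  # 0.75
```

Two figures at opacity 0.5 give 0.75 where they overlap, so a single merged figure at 0.5 would render that region differently.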

The set of circumstances where it would be perfectly safe to combine multiple figures into one is very narrow, and as such I doubt the savings in file size will be that much for most, if not all, files. I think the conditions would look like these:

1. Figures should not have any intervening figures defined in-between in the same container (such as <rect red> <rect green> <rect red>). If there are intervening figures, then the intervening figure must not intersect with the to-be-combined figures.
2. Figures must have either no transformation, or the same simple transformation (translate, skewX, skewY, scale), or two simple transformations that can be expressed in terms of each other (for instance, translate(25) and translate(50), which can both become translate(25) after adding 25 to the second figure's coordinates). This is to ensure that the coordinate system is the same for both.
3. Figures must intersect, or be adjacent.
4. Figures must have no stroke and either no fill or the same solid-color fill. Combining figures with gradients and patterns is troublesome with regard to the bounding box of the gradient or pattern.
5. Figures must have an opacity of 1, or be adjacent (not intersecting) and have any opacity.
6. Figures must not have different CSS classes.
7. Figures must not have IDs referenced in the CSS stylesheet defined with <style>.
8. Intersection calculations must take into account even-odd and non-zero fill/clip rules. Parts of a polygon could end up being "outside" itself if there's an intersection within its own coordinates, so the second figure might end up inside an "outside" region of the first polygon. This needs to be handled correctly.
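
The translate case in condition 2 could be handled along these lines. This is a sketch for pure translations only, with a hypothetical `rebase_translate` helper; scale and skew would need the full matrix arithmetic.

```python
def rebase_translate(points, own, target):
    """Re-express points drawn under translate(own) as if under translate(target)."""
    dx = own[0] - target[0]
    dy = own[1] - target[1]
    return [(x + dx, y + dy) for x, y in points]


# Second figure was under translate(50); move it into the translate(25) system.
second = [(0, 0), (10, 5)]
print(rebase_translate(second, own=(50, 0), target=(25, 0)))  # [(25, 0), (35, 5)]
```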

It looks like the test file in comment 3 would benefit a fair bit, even with that impressive set of conditions. However, neither codedread nor I know where to start for this feature request.
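
To illustrate why condition 8 matters, here is a small sketch of a point-in-polygon test under both fill rules, using a ray cast toward +x. The function names are made up for this example, and it ignores degenerate cases such as the ray passing exactly through a vertex. The self-intersecting five-pointed star is the classic case where the two rules disagree.

```python
import math


def crossings(point, polygon):
    """Return (winding_number, crossing_count) for a ray cast toward +x."""
    px, py = point
    winding = 0
    count = 0
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 <= py) == (y2 <= py):
            continue  # edge does not straddle the ray's horizontal line
        # x coordinate where the edge meets the line y == py
        x_at = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
        if x_at > px:
            count += 1
            winding += 1 if y2 > y1 else -1
    return winding, count


def inside(point, polygon, rule="nonzero"):
    winding, count = crossings(point, polygon)
    if rule == "nonzero":
        return winding != 0
    return count % 2 == 1  # "evenodd"


# Five-pointed star drawn by visiting every second vertex of a pentagon.
star = [(math.cos(math.pi / 2 + k * 4 * math.pi / 5),
         math.sin(math.pi / 2 + k * 4 * math.pi / 5)) for k in range(5)]

print(inside((0, 0), star, "nonzero"))  # True
print(inside((0, 0), star, "evenodd"))  # False
```

The center of the star is filled under `nonzero` but is a hole under `evenodd`, so a second figure sitting there would merge in one case and not in the other.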

Changed in scour:
importance:	Undecided → Wishlist

Scour ought to be able to combine adjacent polygons when appropriate.