Scour ought to be able to combine adjacent polygons when appropriate.

Bug #481605 reported by Feneric
This bug affects 1 person
Affects   Status       Importance   Assigned to   Milestone
Scour     Incomplete   Wishlist     Unassigned

Bug Description

I've personally run into one graphic designer who seems to consistently build more complex curves and shapes by overlapping simpler shapes rather than using something like a pen tool or polyline tool to do it all in one (much more efficient) piece. I've seen some cases with literally hundreds of simple shapes being used to construct what really ought to be just one slightly more complicated shape. If Scour could reduce such multi-component shapes into a single shape, it'd be a huge win for this type of inefficient SVG.

Revision history for this message
codedread (codedread) wrote :

Thank you for the bug report. After thinking this over, I don't think this is appropriate for scour to do. I recommend one of two things:

  * educate the graphics designer
  * use a graphical editor (such as Inkscape) to combine/join the polygons/shapes into one giant path

Changed in scour:
status: New → Won't Fix
codedread (codedread) wrote :

I might consider this if a decent algorithm/spec could be written to tease out the details.

How do we decide that a set of shapes should be combined? Do we intersect every shape with every other shape and, if two of them intersect and have the same fill-style, fill-opacity and fill color and no stroke, try to trace the contour? This is actually a really complicated algorithm, and a computationally expensive one.

If you have an example file with these 'hundreds of simple shapes' that should be combined, I can take a look and see whether it's remotely in the realm of what could one day be accomplished with scour. I still think the best bet is to educate the graphics designer. Scour is not the right place to correct these types of practices :)

Changed in scour:
status: Won't Fix → Incomplete
Feneric (feneric-gmail) wrote :

I'd think the best approach would be to make one pass through and gather up / categorize regions by background if and only if they have no stroke or a stroke that matches their background. I've personally never seen a case in the wild where this sort of problem involved transparency or background patterns, so for a first pass it may make sense to simply exclude these (the simpler case of plain opaque backgrounds would give a better feel for how well it'd work). Once the regions have been categorized, any category with more than one region could have an overlap search algorithm run on it. I confess that I'm currently naive regarding this type of algorithm; in cases where I've needed it, it's always been already available. I'm guessing that no matter what its form it'll quickly get slow as the number of objects increases, so running it category by category will probably be a lot more efficient than running it over the whole image. For most of the SVG images I've encountered so far, there typically won't be that many regions in any given category, and the images that are exceptions are exactly the sorts that'd benefit from this.
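The categorization pass described above could be sketched roughly as follows. This is only an illustration, not anything Scour does: the function name `merge_candidates` and the shape list are made up here, and the sketch assumes fill and stroke are given as plain presentation attributes (no `style` attribute, no CSS, no inherited values) and that colors can be compared as lowercase strings.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SVG_NS = "{http://www.w3.org/2000/svg}"
SHAPES = {"rect", "circle", "ellipse", "polygon", "path"}  # assumed shape set


def merge_candidates(svg_text):
    """Group opaque, solid-filled shapes by fill color (hypothetical helper).

    A shape qualifies only if it has no stroke or a stroke equal to its fill.
    Only categories holding more than one shape are returned.
    """
    root = ET.fromstring(svg_text)
    groups = defaultdict(list)
    for el in root.iter():
        if el.tag.replace(SVG_NS, "") not in SHAPES:
            continue
        fill = el.get("fill", "black").strip().lower()
        stroke = el.get("stroke", "none").strip().lower()
        # Skip unfilled shapes, gradients/patterns, and anything translucent.
        if fill == "none" or fill.startswith("url("):
            continue
        if el.get("opacity", "1") != "1" or el.get("fill-opacity", "1") != "1":
            continue
        if stroke not in ("none", fill):
            continue
        groups[fill].append(el)
    return {fill: els for fill, els in groups.items() if len(els) > 1}
```

Any overlap search would then run once per returned category rather than over every pair of shapes in the document.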

I agree with you that it'd be best to correct the graphic designer. :) Unfortunately, I'm not viewing this so much from the perspective of my own use as I am from the perspective of widespread use of SVG for the Web (a la the svg2gfx.xslt filter I've been developing as part of the Dojo Toolkit), and I suspect that if just within a one-mile radius of where I sit now there's at least one such graphic designer, there are probably lots more of them scattered throughout the world, so having an automated way of cleaning up their messes is probably nearly as important as cleaning up the typical messes left by Illustrator.

I probably exaggerated when I said "hundreds". Probably "dozens" is more realistic. It just feels like hundreds when one is testing against it and trying to debug odd behaviors.

Anyhow, I've attached a recent sample SVG file demonstrating exactly what I'm talking about. It's a real-world sample that I've acquired and have been using for testing purposes -- it's not something designed to be punishing to any sort of SVG processing app. It was used to promote a particular political view prior to the recent local elections in Massachusetts, and is fairly likely to be (sadly) somewhat typical of what we can expect from at least some designers. It gave me particular fits as even though I was able to process it (apparently) correctly with svg2gfx.xslt, the result had mysterious (still not understood) troubles on MSIE that I'm still trying to track down. Since I had to actually put this into use prior to the aforementioned election, I ended up subbing in a bitmap in lieu of the SVG vectors for the single case of MSIE... not really what I'd intended. Anyhow, note the horror of the pig's mouth in particular. What ought to be a simple Bezier curve is instead composed of lots of little polygons. Converting those polygons into just one polygon is a big win. Other examples of places where adjacent / overlapping regions ought to be combined abound within the pig. I have no idea ...


Rob Russell (rob-latenightpc) wrote :

So it sounds like this is about looking at the visible portions of the svg image. As a first iteration, any elements that are totally obscured by other (opaque) elements can be totally removed from the image.
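For the simplest version of that first iteration, here is a minimal sketch that only handles axis-aligned rectangles. `Rect`, `covers` and `drop_hidden` are hypothetical names, and the check only catches a shape hidden by one single later shape, not by the union of several.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    w: float
    h: float
    opaque: bool = True


def covers(top, bottom):
    """True if the opaque rect `top` completely covers `bottom`."""
    return (top.opaque
            and top.x <= bottom.x
            and top.y <= bottom.y
            and top.x + top.w >= bottom.x + bottom.w
            and top.y + top.h >= bottom.y + bottom.h)


def drop_hidden(rects):
    """Remove rects fully hidden by a later one.

    `rects` is in document order, so later entries are painted on top.
    """
    return [r for i, r in enumerate(rects)
            if not any(covers(t, r) for t in rects[i + 1:])]
```

Arbitrary shapes, transforms and clipping would need real geometry; this only shows the paint-order idea.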

Next, for merging elements as this bug requests, you'd have to look for partial overlaps, find intersections and replace the intersecting elements. One problem here is that replacing a few overlapping rects (for example) with an equivalent path could end up actually making the svg source larger. The rendering time after replacement could be faster or slower, not sure.

Since this can alter the document pretty severely it should be off by default.

codedread (codedread) wrote :

My main concern is the contour tracing and the arbitrary polygon intersection. I really have no idea how to start something like this frankly and it's much more math than I'm willing to bite off at the moment.

Patches welcome! :)

Feneric (feneric-gmail) wrote :

Right, only visible portions need to be considered. Are objects that cannot be seen already getting removed?

Can replacing a few overlapping rects with an equivalent path really end up making the source larger? Could such a thing stay true as the number of overlapping rects increases? My gut feel is that we could probably figure out whether or not to do a replacement based upon the number of rects involved, but I haven't tried to analyze it yet. For the first pass I'm in agreement that it should be something that can be selectively turned on if desired.
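One way to get a feel for the size question is simply to serialize both forms and count bytes. The sketch below does that for a made-up "staircase" of `n` adjacent rects whose union outline is known in advance. All names are hypothetical, and the markup uses absolute coordinates with no attempt at the shortest possible encoding.

```python
def rect_markup(x, y, w, h, fill="#000"):
    """Serialize one rect with plain attributes."""
    return f'<rect x="{x}" y="{y}" width="{w}" height="{h}" fill="{fill}"/>'


def path_markup(points, fill="#000"):
    """Serialize a closed polygon outline as a single path."""
    d = "M" + "L".join(f"{x},{y}" for x, y in points) + "z"
    return f'<path d="{d}" fill="{fill}"/>'


def staircase_sizes(n, step=10):
    """Return (bytes as n rects, bytes as one path) for an n-step staircase.

    Rect i spans x in [i*step, (i+1)*step] and y in [0, (i+1)*step].
    """
    rects = "".join(rect_markup(i * step, 0, step, (i + 1) * step)
                    for i in range(n))
    # Union outline: along the bottom, up the right side, back down the steps.
    outline = [(0, 0), (n * step, 0), (n * step, n * step)]
    for i in range(n - 1, 0, -1):
        outline += [(i * step, (i + 1) * step), (i * step, i * step)]
    outline.append((0, step))
    return len(rects), len(path_markup(outline))
```

Calling `staircase_sizes(n)` for a few values of `n` shows how the two encodings scale for this one layout. Other layouts (for instance rects that overlap in ways that create many extra vertices) may behave differently, so a per-group size comparison before replacing is probably the safer rule.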

I suspect the math ought not to be that hard for polygons; ultimately it then just breaks down to intersections between lines. I'll probably take a look into it myself, but I've got to finish up my current projects first unfortunately.
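As a rough illustration of the "intersections between lines" building block, here is a standard orientation-based segment intersection test and a brute-force polygon check on top of it. The function names are made up for this sketch. It only detects crossing or touching edges, so one polygon lying entirely inside another is not reported, and it does no contour tracing.

```python
def orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1, -1, or 0 if collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)


def on_segment(a, b, p):
    """For p collinear with a-b: is p within the segment's bounding box?"""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if closed segments p1-p2 and q1-q2 share at least one point."""
    o1, o2 = orient(p1, p2, q1), orient(p1, p2, q2)
    o3, o4 = orient(q1, q2, p1), orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other.
    return ((o1 == 0 and on_segment(p1, p2, q1))
            or (o2 == 0 and on_segment(p1, p2, q2))
            or (o3 == 0 and on_segment(q1, q2, p1))
            or (o4 == 0 and on_segment(q1, q2, p2)))


def edges(polygon):
    """Consecutive vertex pairs, including the closing edge."""
    return list(zip(polygon, polygon[1:] + polygon[:1]))


def polygons_touch(poly_a, poly_b):
    """True if any edge of poly_a meets any edge of poly_b (O(n*m))."""
    return any(segments_intersect(a1, a2, b1, b2)
               for a1, a2 in edges(poly_a)
               for b1, b2 in edges(poly_b))
```

The pairwise edge loop is quadratic, which is part of why running it per fill category rather than over the whole document matters.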

Louis Simard (louis-simard-deactivatedaccount) wrote :

I have a bit more input for this bug.

If the opacity of two overlapping *translucent* figures is the same, the intersection of the two figures will be drawn with more opacity. Scour can't really do anything with that.
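To make the opacity point concrete: under ordinary source-over compositing, stacking layers of opacity a1, a2, ... gives a combined coverage of 1 - (1 - a1)(1 - a2)..., so two overlapping 50% figures render at 75% where they intersect, while a merged single figure would render at 50% everywhere. A small sketch (the function name is just for illustration):

```python
def stacked_opacity(*alphas):
    """Combined opacity of same-colored layers under source-over compositing."""
    remaining = 1.0  # fraction of the backdrop still showing through
    for a in alphas:
        remaining *= 1.0 - a
    return 1.0 - remaining
```

With fully opaque figures the result stays at 1 no matter how many overlap, which is why an opacity of 1 is the safe case for merging.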

The set of circumstances where it would be perfectly safe to combine multiple figures into one is very narrow, and as such I doubt the savings in file size will be that much for most, if not all, files. I think the conditions would look like these:

1. Figures should not have any intervening figures defined between them in the same container (such as <rect red> <rect green> <rect red>). If there are intervening figures, then none of them may intersect the to-be-combined figures.
2. Figures must have either no transformation, or the same simple transformation (translate, skewX, skewY, scale), or two simple transformations that can be expressed in terms of each other (for instance, translate(25) and translate(50), which can both become translate(25) after adding 25 to the second figure's coordinates). This is to ensure that the coordinate system is the same for both.
3. Figures must intersect, or be adjacent.
4. Figures must have no stroke and either no fill or the same solid-color fill. Combining figures with gradients and patterns is troublesome with regard to the bounding box of the gradient or pattern.
5. Figures must have an opacity of 1, or be adjacent (not intersecting) and have any opacity.
6. Figures must not have different CSS classes.
7. Figures must not have IDs referenced in the CSS stylesheet defined with <style>.
8. Intersection calculations must take into account even-odd and non-zero fill/clip rules. Parts of a polygon could end up being "outside" itself if there's an intersection within its own coordinates, so the second figure might end up inside an "outside" region of the first polygon. This needs to be handled correctly.
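Condition 8 is easy to underestimate, so here is a small sketch of how the two fill rules can disagree about the same point. It uses a textbook winding-number count for non-zero and a ray-crossing count for even-odd. The function names are hypothetical, and points lying exactly on an edge or level with a vertex are not treated specially.

```python
import math


def winding_number(point, polygon):
    """Signed number of times the polygon winds around the point."""
    px, py = point
    wn = 0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        cross = (x2 - x1) * (py - y1) - (px - x1) * (y2 - y1)
        if y1 <= py < y2 and cross > 0:      # upward edge, point to its left
            wn += 1
        elif y2 <= py < y1 and cross < 0:    # downward edge, point to its right
            wn -= 1
    return wn


def crossings(point, polygon):
    """Number of edges crossed by a ray from the point toward +x."""
    px, py = point
    count = 0
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:] + polygon[:1]):
        if (y1 <= py < y2) or (y2 <= py < y1):
            x_at = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_at > px:
                count += 1
    return count


def is_inside(point, polygon, rule="nonzero"):
    """Apply the SVG-style 'nonzero' or 'evenodd' rule to one point."""
    if rule == "nonzero":
        return winding_number(point, polygon) != 0
    if rule == "evenodd":
        return crossings(point, polygon) % 2 == 1
    raise ValueError(f"unknown fill rule: {rule}")


# A five-pointed star drawn as one self-intersecting polygon.
STAR = [(math.cos(math.radians(90 + 144 * k)),
         math.sin(math.radians(90 + 144 * k))) for k in range(5)]
```

For `STAR`, the center `(0, 0)` is inside under `nonzero` but outside under `evenodd`, while a point in one of the tips such as `(0, 0.8)` is inside under both. A second figure sitting in the star's center is therefore either overlapping filled area or sitting in a hole, depending on the rule.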

It looks like the test file in comment 3 would benefit a fair bit, even with that impressive set of conditions. However, neither codedread nor I know where to start for this feature request.

Changed in scour:
importance: Undecided → Wishlist