Delete text attributes from groups with only non-text elements

Bug #714727 reported by Jan Thor on 2011-02-07
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Scour
Medium
Unassigned

Bug Description

Inkscape has the tendency to add a lot of style attributes to elements for which that doesn’t make sense, like font styles for path elements, and scour has the ability to remove them. Unfortunately, if you convert a text into paths elements within Inkscape, those font style attributes end in the group element that bundles the resulting path elements, and scour doesn’t delete such attributes from a group element. It would be nice if scour could recursively scan group elements whether they contain at least one text element, and otherwise remove useless font styles from the group element.

A possible patch could look like this: within the function repairStyle, replace the line

    if node.nodeName in ['rect', 'circle', 'ellipse', 'line', 'polyline', 'polygon', 'path']:

with

    if not contains_text_nodes(node):

and add a new function

def contains_text_nodes(node):
    u"""Returns True if the element of a descendant can implement text attributes."""
    # elements which have to do with text: answer yes
    if node.nodeName in ['text', 'tspan', 'flowRoot', 'flowRegion', 'flowPara']:
        return True
    # container elements: ask their children
    elif node.nodeName in ['svg', 'g', 'pattern', 'marker', 'switch', 'foreignObject']:
        for child in node.childNodes:
            if contains_text_nodes(child):
                return True
        return False
    # everything else: answer no
    else:
        return False

It is not terribly elegant, since repairStyle scans the tree starting from the root, and this new recursion adds another scan of some of the descendants to every node it passes; perhaps there are more economic solutions which traverse the whole tree just once, and avoid visiting some elements several times.

It also reverses the philosophy of using a list of known non-text elements by replacing it with a list of known text elements. A more conservative and robust approach would be this:

def contains_text_nodes(node):
    u"""Returns True if the element of a descendant can perhaps implement text attributes."""
    # elements which are known to be no text objects: answer no
    if node.nodeType != 1:
        return False
    elif node.nodeName in ['rect', 'circle', 'ellipse', 'line', 'polyline', 'polygon', 'path']:
        return False
    # we consider only the most important container, since our default answer is yes
    # and we are willing to accept some false positives, to avoid lengthy recursions
    elif node.nodeName in ['g']:
        for child in node.childNodes:
            if contains_text_nodes(child):
                return True
        return False
    # our new default: answer yes for everything else
    else:
        return True

Related branches

Since future versions of SVG may add more text elements, I like your function that blacklists shape elements better than the one that whitelists text elements.

Changed in scour:
importance: Undecided → Medium
status: New → Confirmed
Changed in scour:
assignee: nobody → Louis Simard (louis-simard)

Code to remove text-based attributes, based on your code, is now in the trunk as revision 199.

To avoid most of the recursion and performance penalty, I've taken the liberty to add caching to the function by creating a custom attribute called 'mayContainTextNodes' on the nodes.

Thanks for your bug report and code submission.

Changed in scour:
status: Confirmed → Fix Committed

This bug is fixed in release 0.26 of Scour.

Changed in scour:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers