Comment 4 for bug 1889653

scoder (scoder) wrote :

It runs through for me when I disable the comment discarding (comments=False).

It seems to eat up a lot of memory while discarding the run of consecutive comments (line ~400), because it has to concatenate a lot of tail text for them.

        for el in _kill:
            el.drop_tree()

A run of consecutive elements to discard is the worst-case scenario for the simplistic algorithm, which discards one element after the other and rebuilds the preceding text/tail each time. This could be improved by 'inlining' the ".drop_tree()" method into the cleaner and letting it detect sequences of adjacent elements (which also share the same parent), so that we could collect and generate the new text/tail only once after removing all of them. Basically: collect the tail texts and the elements, then use parent.remove() to discard those elements, then set the new text/tail.
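
To make the idea concrete, here is a rough sketch of the batching, not the actual cleaner code. It uses the standard library's xml.etree.ElementTree as a stand-in for lxml, since both expose .text, .tail and parent.remove(). The function name drop_children_batched and its (parent, kill) signature are made up for illustration, and it only handles direct children of one parent; the real cleaner would have to group _kill by parent first.

```python
import xml.etree.ElementTree as ET


def drop_children_batched(parent, kill):
    """Remove the children in `kill` from `parent`, keeping their tail text.

    Hypothetical helper: tails of each run of adjacent removed elements are
    collected in a list and joined once, instead of being appended to the
    previous text/tail one element at a time.
    """
    kill_ids = {id(el) for el in kill}
    prev = None    # last surviving sibling seen so far (None: use parent.text)
    pending = []   # tail texts of the current run of removed elements

    def flush():
        # Attach the collected tails to whatever precedes the removed run.
        if not pending:
            return
        text = "".join(pending)
        if prev is None:
            parent.text = (parent.text or "") + text
        else:
            prev.tail = (prev.tail or "") + text
        pending.clear()

    for child in list(parent):      # iterate over a copy, we mutate parent
        if id(child) in kill_ids:
            if child.tail:
                pending.append(child.tail)
            parent.remove(child)    # the tail goes away with the element
        else:
            flush()
            prev = child
    flush()


root = ET.fromstring("<p>a<k/>b<k/>c<b>x</b>d<k/>e</p>")
drop_children_batched(root, root.findall("k"))
print(ET.tostring(root, encoding="unicode"))
```

With this, the text in front of a run is touched once per run rather than once per removed element, which is where the repeated concatenation came from.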

It's not entirely as trivial as that, because there is already a bit of ordering going on, but I think this is a reasonable direction to test out.

Would you like to give it a try?