Comment 6 for bug 1205637

Marc Na (ub40) wrote :

The expression seems to be https://github.com/kovidgoyal/calibre/blob/v3.2.1/src/calibre/ebooks/conversion/utils.py#L440

The problem is that <br/> matches <b[^>]*>, which was meant to capture only <b> tags: after the "b", [^>]* happily consumes "r/", so the tag name is never checked for a proper end.
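To see the overlap in isolation, here is a minimal sketch that runs only the opening-tag part of the expression against a few tags. The names opening_tag, samples and matched are just for illustration and do not come from calibre.

```python
import re

# Opening-tag part of the expression, taken on its own.
opening_tag = re.compile(r"<(font|[ibu]|em|strong)[^>]*>")

# <b> is the intended target; <br/> and <span/> should ideally not match.
samples = ["<b>", "<br/>", "<span/>"]
matched = {s: bool(opening_tag.fullmatch(s)) for s in samples}
print(matched)
```

<br/> is accepted because "b" is a prefix of "br", while <span/> is rejected because none of the listed names is a prefix of "span".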

test case:

import re

html = "<p>We use <i>italics with a self-closing br<br/></i> element.</p>"
# Expression from utils.py; each match is replaced with a single space.
html = re.sub(
    r"\s*<(font|[ibu]|em|strong)[^>]*>\s*(<(font|[ibu]|em|strong)[^>]*>\s*</(font|[ibu]|em|strong)>\s*){0,2}\s*</(font|[ibu]|em|strong)>",
    " ", html)
print(html)

Result:

<p>We use <i>italics with a self-closing br  element.</p>

(Match = "<br/></i>")

test case #2:

import re

html = "<p>We use <i>italics with a self-closing span<span/></i> element.</p>"
# Same expression as in the first test case.
html = re.sub(
    r"\s*<(font|[ibu]|em|strong)[^>]*>\s*(<(font|[ibu]|em|strong)[^>]*>\s*</(font|[ibu]|em|strong)>\s*){0,2}\s*</(font|[ibu]|em|strong)>",
    " ", html)
print(html)

Result:

<p>We use <i>italics with a self-closing span<span/></i> element.</p>

(No match, because none of the listed tag names is a prefix of "span")
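One possible way to tighten the expression, offered only as a sketch and not as the fix calibre actually applied: put a word boundary (\b) after the tag name so that "<b" can no longer run on into "<br/>". The names TAG, OPEN, CLOSE, empty_inline and strip_empty_inline are hypothetical, as is the second sample string.

```python
import re

# Hypothetical tightening: \b after the tag name stops "<b" matching "<br/>".
TAG = r"(?:font|[ibu]|em|strong)"
OPEN = r"<" + TAG + r"\b[^>]*>"
CLOSE = r"</" + TAG + r">"
empty_inline = re.compile(
    r"\s*" + OPEN + r"\s*(?:" + OPEN + r"\s*" + CLOSE + r"\s*){0,2}\s*" + CLOSE
)


def strip_empty_inline(html):
    """Replace empty inline formatting elements with a single space."""
    return empty_inline.sub(" ", html)


br_case = "<p>We use <i>italics with a self-closing br<br/></i> element.</p>"
empty_case = "<p>An empty <b> </b>pair is still removed.</p>"
print(strip_empty_inline(br_case))     # left unchanged
print(strip_empty_inline(empty_case))  # empty <b> pair replaced by a space
```

With the boundary in place the <br/> input passes through untouched, while a genuinely empty <b> </b> pair is still collapsed. Tag names followed by a hyphen (for example a custom element such as <b-x>) would still slip through, so this is a narrowing rather than a complete solution.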