calibre

Bug #1205637
Comment #6

Marc Na (ub40) wrote on 2017-07-06:

The expression seems to be https://github.com/kovidgoyal/calibre/blob/v3.2.1/src/calibre/ebooks/conversion/utils.py#L440

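The problem is that a br tag matches <b[^>]*> (which was meant for capturing b tags): the [ibu] alternative consumes the "b" and [^>]* then consumes the rest of the tag name.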
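A minimal sketch of that overlap, using only the opening-tag fragment of the expression. The name open_tag and the list of sample tags are illustrative assumptions, not part of calibre:

```python
import re

# Opening-tag fragment copied from the expression discussed here.
# open_tag is a hypothetical name used only for this illustration.
open_tag = re.compile(r"<(font|[ibu]|em|strong)[^>]*>")

# Sample tags chosen for illustration.
for tag in ["<b>", "<br/>", "<br>", "<span/>"]:
    m = open_tag.fullmatch(tag)
    # group(1) is the tag name the pattern believes it matched
    print(tag, "->", m.group(1) if m else None)
```

Both br forms are reported as "b", because nothing in the fragment requires the tag name to end after the single letter; the span tag is not matched at all.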

test case:

import re
html = "We use italics with a self-closing br element."
html = re.sub(
r"\s*<(font|[ibu]|em|strong)[^>]*>\s*(<(font|[ibu]|em|strong)[^>]*>\s*</(font|[ibu]|em|strong)>\s*){0,2}\s*</(font|[ibu]|em|strong)>", " ", html)
print(html)

Result:

We use italics with a self-closing br element.

(Match = " ")

test case #2:

import re
html = "We use italics with a self-closing span element."
html = re.sub(
r"\s*<(font|[ibu]|em|strong)[^>]*>\s*(<(font|[ibu]|em|strong)[^>]*>\s*</(font|[ibu]|em|strong)>\s*){0,2}\s*</(font|[ibu]|em|strong)>", " ", html)
print(html)

Result:

We use italics with a self-closing span element.

(No match)
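One possible way to avoid the false match, sketched under assumptions: put a word boundary (\b) after the tag name so that "b" can no longer be a prefix of "br". This is only an illustration and not necessarily the fix adopted in calibre. The names TAG, original, guarded and sample are hypothetical, and the sample string is a stand-in, not the original test data.

```python
import re

# Expression as quoted in this comment.
original = re.compile(
    r"\s*<(font|[ibu]|em|strong)[^>]*>\s*"
    r"(<(font|[ibu]|em|strong)[^>]*>\s*</(font|[ibu]|em|strong)>\s*){0,2}"
    r"\s*</(font|[ibu]|em|strong)>"
)

# Hypothetical variant: \b forces the tag name to end right after the
# alternative, so <br/> is not accepted as <b...>.
TAG = r"(?:font|[ibu]|em|strong)"
guarded = re.compile(
    r"\s*<" + TAG + r"\b[^>]*>\s*"
    r"(<" + TAG + r"\b[^>]*>\s*</" + TAG + r">\s*){0,2}"
    r"\s*</" + TAG + r">"
)

# Assumed input: a br tag directly before a closing i tag.
sample = "We use <i>italics with a self-closing br element.<br/> </i>"

print(repr(original.sub(" ", sample)))  # closing </i> is swallowed
print(repr(guarded.sub(" ", sample)))   # string is left unchanged

# A genuinely empty b element is still removed by the guarded variant.
print(repr(guarded.sub(" ", "a <b> </b>b")))
```

With the original expression the br tag and the closing i tag are removed together, leaving the i element unclosed. The guarded variant leaves the sample untouched while still collapsing an empty b element.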