The problem is that <br/> matches <b[^>]*>, which was meant to match only <b>: the [ibu] alternative accepts the "b", and [^>]* then consumes the remaining "r/".
test case:
import re
html = "<p>We use <i>italics with a self-closing br<br/></i> element.</p>"
html = re.sub( r"\s*<(font|[ibu]|em|strong)[^>]*>\s*(<(font|[ibu]|em|strong)[^>]*>\s*</(font|[ibu]|em|strong)>\s*){0,2}\s*</(font|[ibu]|em|strong)>", " ", html)
print(html)
Result:
<p>We use <i>italics with a self-closing br element.</p>
(Match = "<br/></i>")
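To make the match easier to see, here is a small sketch that runs the same pattern with re.search instead of re.sub and prints what was captured. The variable names (pattern, m) are mine, not from calibre.

```python
import re

# Pattern copied verbatim from the test case above.
pattern = re.compile(
    r"\s*<(font|[ibu]|em|strong)[^>]*>\s*(<(font|[ibu]|em|strong)[^>]*>\s*</(font|[ibu]|em|strong)>\s*){0,2}\s*</(font|[ibu]|em|strong)>"
)

html = "<p>We use <i>italics with a self-closing br<br/></i> element.</p>"
m = pattern.search(html)

print(m.group(0))  # whole match: <br/></i>
print(m.group(1))  # tag name taken as the "opening" tag: b
print(m.group(5))  # tag name of the closing tag: i
```

Note that the pattern never checks that the closing tag name equals the opening one, so "b" opening and "i" closing is accepted.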
test case #2:
import re
html = "<p>We use <i>italics with a self-closing span<span/></i> element.</p>"
html = re.sub( r"\s*<(font|[ibu]|em|strong)[^>]*>\s*(<(font|[ibu]|em|strong)[^>]*>\s*</(font|[ibu]|em|strong)>\s*){0,2}\s*</(font|[ibu]|em|strong)>", " ", html)
print(html)
Result:
<p>We use <i>italics with a self-closing span<span/></i> element.</p>
(No match)
The expression seems to be https://github.com/kovidgoyal/calibre/blob/v3.2.1/src/calibre/ebooks/conversion/utils.py#L440
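One possible way to avoid the false match, offered only as a sketch and not necessarily the fix calibre would want: require a word boundary (\b) right after the tag name in the opening tags, so that "b" can no longer be the prefix of "br". The names FIXED_PATTERN, br_case and empty_case are hypothetical.

```python
import re

# Same expression as in the test cases, with \b added after each
# opening tag name. This is an assumed fix, not calibre's actual code.
FIXED_PATTERN = re.compile(
    r"\s*<(font|[ibu]|em|strong)\b[^>]*>\s*(<(font|[ibu]|em|strong)\b[^>]*>\s*</(font|[ibu]|em|strong)>\s*){0,2}\s*</(font|[ibu]|em|strong)>"
)

# The <br/> case from the report: should now be left untouched.
br_case = "<p>We use <i>italics with a self-closing br<br/></i> element.</p>"
print(FIXED_PATTERN.sub(" ", br_case) == br_case)  # True

# A genuinely empty <b> element (with an attribute) is still removed.
empty_case = '<p>Empty <b class="x"></b>tag.</p>'
print(FIXED_PATTERN.sub(" ", empty_case))  # <p>Empty tag.</p>
```

A word boundary still lets names such as "b-foo" through, because "-" is not a word character; a stricter lookahead like (?=[\s/>]) would rule those out as well.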