lxml.html.diff gives peculiar diffs
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Triaged | Undecided | Unassigned |
Bug Description
The following two cases strike me as peculiar:
>>> from lxml.html import diff
>>> diff.htmldiff('<p>a b c</p>', '<p>a b d e f</p> <div>f</div> <div>g</div>')
u'<p>a b <ins>d e f</ins>
>>> diff.htmldiff('<p>a b c</p>', '<p>c d e</p> <div>f</div> <div>g</div>')
u'<p><ins>c d e</ins></p><ins> </ins><
The lesser problem is that in the second case a new <p> is created, which is inconsistent. The major problem is that in both cases the <div> elements are interspersed into the <p>, whereas they should be treated as separate elements. The following, for example, would be reasonable markup to produce:
u'<p>a b <ins>d e f</ins> <del>c</del> </p><div>
As it stands, the markup produced by htmldiff(...) is unusable for me.
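For anyone who wants to check a diff result for the nesting problem described above, here is a minimal sketch that uses only the standard library. The class and function names (NestedBlockFinder, divs_inside_p) are hypothetical helpers, not part of lxml, and the two sample strings are hand-written illustrations, not actual htmldiff(...) output. It reports every <div> start tag that appears while a <p> is still open.

```python
from html.parser import HTMLParser


class NestedBlockFinder(HTMLParser):
    """Record every <div> that opens while a <p> is still open."""

    def __init__(self):
        super().__init__()
        self.open_tags = []   # stack of currently open element names
        self.violations = []  # (line, column) of each offending <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div" and "p" in self.open_tags:
            self.violations.append(self.getpos())
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, if there is one.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def divs_inside_p(markup):
    """Return positions of <div> start tags found inside an open <p>."""
    finder = NestedBlockFinder()
    finder.feed(markup)
    finder.close()
    return finder.violations


# Hand-written illustrations only; NOT real htmldiff output.
separate = "<p>a b <ins>d e f</ins> <del>c</del></p><div><ins>f</ins></div>"
nested = "<p>a b <ins>d e f</ins><div><ins>f</ins></div></p>"

print(divs_inside_p(separate))  # no <div> inside <p> is reported
print(divs_inside_p(nested))    # one offending <div> is reported
```

The check is purely lexical: it follows the tags as written and does not apply the HTML rule that a <div> start tag implicitly closes an open <p>, so it flags exactly the interleaving that a browser would silently restructure.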
Changed in lxml:
assignee: nobody → ianb
Changed in lxml:
status: New → Fix Committed
status: Fix Committed → In Progress
Changed in lxml:
assignee: Ian Bicking (ianb) → nobody
status: In Progress → Triaged
Any update on this issue?
A certain chap I know has his knickers in a twist about it.