htmldiff creates erroneous diff

Bug #1190768 reported by Nicolas Dietrich
This bug affects 1 person
Affects: lxml
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I reported this previously in the GitHub issue tracker, but that tracker has since been deactivated, so I'm filing it here again.

I'm opening a new issue rather than adding to bug 315511 because this is a much simpler example: it involves no nesting and uses only one tag type, so it might be easier to fix.

`htmldiff` currently produces the following:

    >>> htmldiff(u'<div>H</div><div>1</div><div>2</div>', u'<div>H</div><div>0</div><div>1</div><div>2</div>')
    u'<div><ins>H</ins></div><div><ins>0</ins></div> <div><del>H</del></div> <div>1</div><div>2</div>'

However, the correct result would be the following, since there is no need to touch the `<div>H</div>`:

    >>> htmldiff(u'<div>H</div><div>1</div><div>2</div>', u'<div>H</div><div>0</div><div>1</div><div>2</div>')
    u'<div>H</div><div><ins>0</ins></div><div>1</div><div>2</div>'

A similar example from the original GitHub issue:

    >>> htmldiff(u'<h1>H</h1><ul><li>1</li><li>2</li></ul>', u'<h1>H</h1><ul><li>0</li><li>1</li><li>2</li></ul>')
    u'<ul><h1><ins>H</ins></h1><li><ins>0</ins></li> <h1><del>H</del></h1> <li>1</li><li>2</li></ul>'

instead of

    u'<h1>H</h1><ul><li><ins>0</ins></li><li>1</li><li>2</li></ul>'

This happens in lxml 3.0.1 as well as 3.2.1.
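
For comparison, here is a minimal sketch of a token-level diff built on the standard library's `difflib.SequenceMatcher`. It is not lxml's implementation: `tokenize`, `mark`, and `naive_htmldiff` are hypothetical helpers made up for illustration, and the regex tokenizer assumes simple, well-formed markup with no `>` inside attribute values. On both examples above it leaves the leading element untouched and marks only the inserted `0`, which is the output I would expect from `htmldiff`:

```python
import difflib
import re

# Hypothetical tokenizer: splits markup into tags and text runs.
# Assumes simple, well-formed HTML (no '>' inside attribute values).
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def tokenize(html):
    return TOKEN_RE.findall(html)


def mark(tokens, wrapper):
    """Wrap text tokens in <ins>/<del>; inserted tags pass through,
    deleted tags are dropped so the output keeps the new structure."""
    out = []
    for tok in tokens:
        if tok.startswith("<"):
            if wrapper == "ins":
                out.append(tok)
        else:
            out.append(f"<{wrapper}>{tok}</{wrapper}>")
    return out


def naive_htmldiff(old, new):
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(a[i1:i2])
        else:
            # 'delete', 'insert' and 'replace' all reduce to
            # "mark what left, then mark what arrived".
            out.extend(mark(a[i1:i2], "del"))
            out.extend(mark(b[j1:j2], "ins"))
    return "".join(out)


print(naive_htmldiff(
    "<div>H</div><div>1</div><div>2</div>",
    "<div>H</div><div>0</div><div>1</div><div>2</div>",
))
```

This only shows that a plain longest-match diff over tags and text finds the single insertion; it makes no attempt to keep the markup balanced in the general case, which the real `htmldiff` has to handle.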

Thanks for looking into this again at some point!
