htmldiff creates erroneous diff

Bug #1190768 reported by Nicolas Dietrich on 2013-06-13
This bug affects 1 person
Affects: lxml
Importance: Undecided
Assigned to: Unassigned

Bug Description

I reported this previously in the GitHub issue tracker, but that tracker has been deactivated, so I'm filing it here again.

I'm filing a new bug instead of adding information to bug 315511 because this is a much simpler example: it involves no nesting and uses only one tag type, so it might be easier to fix.

htmldiff currently produces the following:

    >>> htmldiff(u'<div>H</div><div>1</div><div>2</div>', u'<div>H</div><div>0</div><div>1</div><div>2</div>')
    u'<div><ins>H</ins></div><div><ins>0</ins></div> <div><del>H</del></div> <div>1</div><div>2</div>'

However, the correct result would be the following, since there is no need to touch the `<div>H</div>`:

    >>> htmldiff(u'<div>H</div><div>1</div><div>2</div>', u'<div>H</div><div>0</div><div>1</div><div>2</div>')
    u'<div>H</div><div><ins>0</ins></div><div>1</div><div>2</div>'
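
To illustrate why a single insertion is the expected outcome, here is a minimal sketch that diffs the two inputs as flat token lists using only the standard library's `difflib`. It is not a description of how lxml implements `htmldiff`; the `tokenize` helper is hypothetical and deliberately naive (it just splits on tags).

```python
import re
from difflib import SequenceMatcher


def tokenize(html):
    """Hypothetical helper: split an HTML string into tag and text tokens."""
    return [tok for tok in re.split(r'(<[^>]+>)', html) if tok]


old = tokenize('<div>H</div><div>1</div><div>2</div>')
new = tokenize('<div>H</div><div>0</div><div>1</div><div>2</div>')

# autojunk is disabled so no token is ever treated as "popular" junk
# (an assumption for clarity; it has no effect on inputs this short).
matcher = SequenceMatcher(a=old, b=new, autojunk=False)

# Collect every opcode that is not a plain "equal" run.
changes = [
    (tag, old[i1:i2], new[j1:j2])
    for tag, i1, i2, j1, j2 in matcher.get_opcodes()
    if tag != 'equal'
]

for tag, removed, added in changes:
    print(tag, removed, added)
```

In this sketch the only non-equal opcode is an `insert`, and the `H` token stays inside an unchanged run. The inserted run does not have to line up with element boundaries, so a real HTML diff still has to re-balance tags afterwards, but nothing at the token level calls for deleting or re-inserting `H`.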

A similar example from the original GitHub issue is the following:

    >>> htmldiff(u'<h1>H</h1><ul><li>1</li><li>2</li></ul>', u'<h1>H</h1><ul><li>0</li><li>1</li><li>2</li></ul>')
    u'<ul><h1><ins>H</ins></h1><li><ins>0</ins></li> <h1><del>H</del></h1> <li>1</li><li>2</li></ul>'

instead of

    u'<h1>H</h1><ul><li><ins>0</ins></li><li>1</li><li>2</li></ul>'

This happens in lxml 3.0.1 as well as 3.2.1.

Thanks for looking into that stuff again at some time!
