Tag replacement
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | New | Undecided | Unassigned |
Bug Description
Cleaning a document with `Cleaner` replaces the `html` tag with `div`.
Example code:
In [20]: import lxml
In [21]: import lxml.html
In [22]: from lxml.html.clean import Cleaner
In [23]: text = u"""<html>
...: <body>
...: <h1>Hello, Parsel!</h1>
...: <ul>
...: <li><a href="http://
...: <li><a href="http://
...: </ul>
...: </body>
...: </html>"""
In [24]: html_root = lxml.html.
In [25]: html = Cleaner(
In [26]: lxml.html.
Out[26]: '<div>\n <body>\n <h1>Hello, Parsel!</h1>\n <ul>\n <li><a href="http://
li>\n <li><a href="http://
Hi,
I have noticed the same behavior. The `html` tag is replaced by a `div`, which breaks a lot of things. If you parse a document from a string and then serialize it back to a string, a `div` is added for no apparent reason, and that breaks XPath expressions, for example.
If `page_structure` is set to `False`, the `html` tag is not replaced or removed, but any divs inside the `head` tag are still moved into the `body` tag.
Is this a bug or a feature?
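The difference between the two `page_structure` settings can be reproduced with a short script. This is only a minimal sketch: it assumes an lxml version where `lxml.html.clean` is importable (newer releases ship the cleaner as the separate `lxml_html_clean` package), and the sample markup and the helper name `root_tag_after_clean` are made up for illustration.

```python
import lxml.html
from lxml.html.clean import Cleaner

# Hypothetical sample document, shortened from the one in the report.
SAMPLE = "<html><body><h1>Hello, Parsel!</h1></body></html>"


def root_tag_after_clean(markup, page_structure):
    """Parse markup, run Cleaner on it, and return the root element's tag."""
    root = lxml.html.fromstring(markup)
    cleaner = Cleaner(page_structure=page_structure)
    # clean_html() works on a copy when given an element and returns an element.
    cleaned = cleaner.clean_html(root)
    return cleaned.tag


if __name__ == "__main__":
    # Default setting (page_structure=True): the root comes back as a div.
    print(root_tag_after_clean(SAMPLE, page_structure=True))
    # With page_structure=False the html root is kept.
    print(root_tag_after_clean(SAMPLE, page_structure=False))
```

If the root tag matters for later XPath queries, passing `page_structure=False` appears to be the simplest workaround, at the cost of also keeping `head` and `title`.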