2019-06-17 14:25:05 |
Alexander Lebedev |
bug |
|
|
added bug |
2019-10-20 10:46:34 |
Alexander Lebedev |
description |
Replaces html with div.
In [20]: import lxml
In [21]: import lxml.html
In [22]: from lxml.html.clean import Cleaner
In [23]: text = u"""<html>
...: <body>
...: <h1>Hello, Parsel!</h1>
...: <ul>
...: <li><a href="http://example.com">Link 1</a></li>
...: <li><a href="http://scrapy.org">Link 2</a></li>
...: </ul>
...: </body>
...: </html>"""
In [24]: html_root = lxml.html.document_fromstring(text)
In [25]: html = Cleaner().clean_html(html_root)
In [26]: lxml.html.tostring(html)
Out[26]: '<div>\n <body>\n <h1>Hello, Parsel!</h1>\n <ul>\n <li><a href="http://example.com">Link 1</a></li>\n <li><a href="http://scrapy.org">Link 2</a></li>\n </ul>\n </body>\n </div>' |
Replaces `html` tag with div.
In [20]: import lxml
In [21]: import lxml.html
In [22]: from lxml.html.clean import Cleaner
In [23]: text = u"""<html>
...: <body>
...: <h1>Hello, Parsel!</h1>
...: <ul>
...: <li><a href="http://example.com">Link 1</a></li>
...: <li><a href="http://scrapy.org">Link 2</a></li>
...: </ul>
...: </body>
...: </html>"""
In [24]: html_root = lxml.html.document_fromstring(text)
In [25]: html = Cleaner().clean_html(html_root)
In [26]: lxml.html.tostring(html)
Out[26]: '<div>\n <body>\n <h1>Hello, Parsel!</h1>\n <ul>\n <li><a href="http://example.com">Link 1</a></li>\n <li><a href="http://scrapy.org">Link 2</a></li>\n </ul>\n </body>\n </div>' |
|
2019-10-20 10:46:45 |
Alexander Lebedev |
description |
Replaces `html` tag with div.
In [20]: import lxml
In [21]: import lxml.html
In [22]: from lxml.html.clean import Cleaner
In [23]: text = u"""<html>
...: <body>
...: <h1>Hello, Parsel!</h1>
...: <ul>
...: <li><a href="http://example.com">Link 1</a></li>
...: <li><a href="http://scrapy.org">Link 2</a></li>
...: </ul>
...: </body>
...: </html>"""
In [24]: html_root = lxml.html.document_fromstring(text)
In [25]: html = Cleaner().clean_html(html_root)
In [26]: lxml.html.tostring(html)
Out[26]: '<div>\n <body>\n <h1>Hello, Parsel!</h1>\n <ul>\n <li><a href="http://example.com">Link 1</a></li>\n <li><a href="http://scrapy.org">Link 2</a></li>\n </ul>\n </body>\n </div>' |
Replaces `html` tag with `div`.
Example code:
In [20]: import lxml
In [21]: import lxml.html
In [22]: from lxml.html.clean import Cleaner
In [23]: text = u"""<html>
...: <body>
...: <h1>Hello, Parsel!</h1>
...: <ul>
...: <li><a href="http://example.com">Link 1</a></li>
...: <li><a href="http://scrapy.org">Link 2</a></li>
...: </ul>
...: </body>
...: </html>"""
In [24]: html_root = lxml.html.document_fromstring(text)
In [25]: html = Cleaner().clean_html(html_root)
In [26]: lxml.html.tostring(html)
Out[26]: '<div>\n <body>\n <h1>Hello, Parsel!</h1>\n <ul>\n <li><a href="http://example.com">Link 1</a></li>\n <li><a href="http://scrapy.org">Link 2</a></li>\n </ul>\n </body>\n </div>' |
|
2020-06-24 09:05:46 |
Removed by request |
bug |
|
|
added subscriber Jules Lasne |