'<a href="asdf">test</a>' # Still the comment is stripped
Am I missing something here?
> Note that passing a parsed tree into the cleaner does not suffer from this issue.
I don't find that to be true either, but perhaps I'm misunderstanding. I have this code:
def clean_a_tree(tree):
    assert isinstance(tree, lxml.html.HtmlElement), (
        "`tree` must be of type HtmlElement, but is of type %s. Cleaner() can "
        "work with strs and unicode, but it does bad things to encodings if "
        "given the chance."
        % type(tree)
    )
    cleaner = Cleaner(
        javascript=False,
        safe_attrs_only=False,
        forms=False,
        comments=False,
        processing_instructions=False,
        scripts=True,
        style=True,
        links=True,
        embedded=True,
        frames=True,
    )
    return cleaner.clean_html(tree)
So it asserts that it's getting a tree, but this still suffers from the issue.
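To pin down at which step the comment goes missing, one option is to count comments in the markup before and after each stage (after parsing, after cleaning). The following is only a minimal sketch using the standard library's `html.parser`, independent of lxml; the names `CommentCollector` and `find_comments` are made up for illustration, and the `after` string is simply the output reported above.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Hypothetical helper: records the body of every HTML comment seen."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Called by HTMLParser once per <!-- ... --> block.
        self.comments.append(data)


def find_comments(markup):
    """Return a list with the text of each comment found in an HTML string."""
    collector = CommentCollector()
    collector.feed(markup)
    collector.close()
    return collector.comments


before = '<!-- comment--><a href="asdf">test</a>'
after = '<a href="asdf">test</a>'  # the cleaned output reported in this thread

print(len(find_comments(before)), "comment(s) in the input")
print(len(find_comments(after)), "comment(s) in the output")
```

Running the same check on the serialized tree before it is handed to the cleaner would show whether the comment is already gone after parsing or only after cleaning.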
Thanks for the help. I'm pretty lost and I admit I'm frustrated with this issue. I'm guessing it's just a documentation issue, but I haven't been able to sort it out yet.
Hm, I'd expect that setting the default parser would fix this then?
I just tried this, but didn't get the fix I was hoping for:
etree.set_default_parser(etree.HTMLParser())
html = '<!-- comment--><a href="asdf">test</a>'
Cleaner(
    javascript=False,
    safe_attrs_only=False,
    scripts=False,
    comments=False,
    style=False,
    inline_style=False,
    links=False,
    meta=False,
    page_structure=False,
    processing_instructions=False,
    embedded=False,
    frames=False,
    forms=False,
    annoying_tags=False,
    remove_unknown_tags=False,
).clean_html(html)
'<a href="asdf">test</a>' # Still the comment is stripped
Thank you again,
Mike