hxselect crashes on unclosed tag
Bug #1878637 reported by Fabio
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
html-xml-utils (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
Hello
I can't parse any HTML page anymore because hxselect crashes when it encounters a tag that was not closed.
For example
<img src=".......>
<a>....</a>
crashes because img was not closed correctly, even after running hxnormalize.
Example
curl http://
% Total % Received % Xferd Average Speed Time Time Time Current
100 37821 100 37821 0 0 22182 0 0:00:01 0:00:01 --:--:-- 22182
End tag </a> doesn't match start tag <img> <------
Thank you
Same problem, different tags:
curl -s "https://www.altrogiornale.org/aristarco-di-samo-e-la-luna/" | tac | tac | hxclean | hxnormalize | hxselect "div.cmsmasters_post_content:nth-child(1)"
End tag </div> doesn't match start tag <input>
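To narrow this down without depending on a live page, something like the following might help. It is only a sketch: the sample markup, the file name `picture.png`, and the selector `a` are made up for illustration and mirror the `<img src="...>` pattern from the description. It has not been confirmed to trigger the same error. The script writes a small fragment to a temporary file and, if html-xml-utils is installed, runs it through `hxnormalize -x` (XML output) before `hxselect`.

```shell
#!/bin/sh
# Hypothetical reproduction helper. The markup and selector below are
# assumptions for illustration, not taken from the pages in this report.

sample=$(mktemp)

# An <img> whose src attribute is missing its closing quote,
# followed by an ordinary link.
cat > "$sample" <<'EOF'
<html><body>
<a href="#"><img src="picture.png></a>
<a href="#">second link</a>
</body></html>
EOF

if command -v hxnormalize >/dev/null 2>&1 && command -v hxselect >/dev/null 2>&1; then
    # -x makes hxnormalize emit XML, which is what hxselect reads.
    hxnormalize -x < "$sample" | hxselect 'a'
    # $? here is the exit status of the last command in the pipeline (hxselect).
    status=$?
    echo "hxselect exit status: $status"
else
    echo "html-xml-utils not installed; sample written to $sample"
fi
```

If the fragment reproduces the "End tag ... doesn't match start tag ..." message, attaching it to the bug would give the maintainers a test case that does not change when the remote site does.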