Charset not recognized from <meta name="content-type" content="charset=utf-8" /> tag
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Invalid | Undecided | Unassigned |
Bug Description
## -- environment
---------- python27 ----------
Python : sys.version_
lxml.etree : (4, 2, 2, 0)
libxml used : (2, 9, 7)
libxml compiled : (2, 9, 7)
libxslt used : (1, 1, 32)
libxslt compiled : (1, 1, 32)
## -----
test script
-------------------
# -*- coding=utf8 -*-
from lxml import html
style='.item_title'
html_recognize_
<!doctype html>
<html lang="en">
<head>
<meta name="content-type" content=
<title></title>
</head>
<body>
<span class="
</span>
</body></html>
'''
html_recognize_
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8"> <!-- recognized correctly -->
<title></title>
</head>
<body>
<span class="
</span>
</body></html>
'''
#print html_frag
doc = html.fromstring
#doc = html.fromstring
for span in doc.cssselect(
text=span.
#print(repr(text))
print(text)
The parsing is done by libxml2, not by lxml itself. Please report the problem to the libxml2 project.
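Until the parser recognizes the tag, one possible workaround is to stop relying on charset sniffing and tell the parser the encoding explicitly. The sketch below is not taken from the report: the sample markup, the span text, and the helper names `parse_with_encoding` and `span_texts` are made up for illustration, because the original test script is truncated. It assumes the input is known to be UTF-8 and uses XPath instead of `cssselect` so that no extra package is needed.

```python
# -*- coding: utf-8 -*-
from lxml import html

# Hypothetical sample document (assumption): the reporter's markup is
# truncated, so the meta content value and span text are invented here.
RAW = (
    u'<!doctype html>'
    u'<html lang="en"><head>'
    u'<meta name="content-type" content="text/html; charset=utf-8">'
    u'<title></title></head>'
    u'<body><span class="item_title">caf\u00e9 \u4e2d\u6587</span></body></html>'
).encode("utf-8")


def parse_with_encoding(raw_bytes, encoding="utf-8"):
    """Parse HTML bytes with an explicitly configured input encoding."""
    # HTMLParser(encoding=...) overrides whatever the document declares,
    # so the outcome no longer depends on how the meta tag is written.
    parser = html.HTMLParser(encoding=encoding)
    return html.fromstring(raw_bytes, parser=parser)


def span_texts(doc, class_name="item_title"):
    """Return the text of every span whose class attribute equals class_name."""
    spans = doc.xpath("//span[@class=$cls]", cls=class_name)
    return [span.text_content() for span in spans]


doc = parse_with_encoding(RAW)
texts = span_texts(doc)
print(texts)
```

Decoding the bytes yourself, for example `html.fromstring(RAW.decode("utf-8"))`, should have the same effect for a document like this one. Either way, the encoding has to be known up front, for instance from the HTTP `Content-Type` header.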