I have the same issue in my Windows/Cygwin environment when using the following construct to parse an XHTML document:
parser = etree.XMLParser(load_dtd=True, dtd_validation=True, remove_blank_text=True, attribute_defaults=True)
html = etree.parse(inputhtmlfile, parser)
returns:
Python : (2, 6, 5, 'final', 0)
lxml.etree : (2, 2, 6, 0)
libxml used : (2, 7, 7)
libxml compiled : (2, 7, 7)
libxslt used : (1, 1, 26)
libxslt compiled : (1, 1, 26)
Traceback (most recent call last):
  File "generatehtml.py", line 61, in <module>
    html = etree.parse(inputhtmlfile,parser)
  File "lxml.etree.pyx", line 2706, in lxml.etree.parse (src/lxml/lxml.etree.c:49958)
  File "parser.pxi", line 1500, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:71797)
  File "parser.pxi", line 1529, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:72080)
  File "parser.pxi", line 1429, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:71175)
  File "parser.pxi", line 975, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:68173)
  File "parser.pxi", line 539, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:64257)
  File "parser.pxi", line 625, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:65178)
  File "parser.pxi", line 565, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:64521)
lxml.etree.XMLSyntaxError: Attempt to load network entity http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd
Whereas on my Debian environment, with an older installation, there is no problem doing the same:
Python : (2, 5, 2, 'final', 0)
lxml.etree : (2, 1, 1, 0)
libxml used : (2, 6, 32)
libxml compiled : (2, 6, 32)
libxslt used : (1, 1, 24)
libxslt compiled : (1, 1, 24)
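One possible workaround, sketched here under assumptions rather than confirmed as the fix for this report, is to register a custom lxml resolver that serves the DTD from a local copy, so the parser never has to fetch the network entity. The class name LocalDTDResolver, the helper make_validating_parser, and the URL-to-path mapping are hypothetical names made up for illustration; etree.Resolver, resolve_filename and parser.resolvers.add are part of lxml.

```python
from lxml import etree


class LocalDTDResolver(etree.Resolver):
    """Hypothetical resolver: serve selected DTD URLs from local files."""

    def __init__(self, url_to_path):
        super(LocalDTDResolver, self).__init__()
        # Mapping of system URL -> local file path (assumed to exist on disk).
        self.url_to_path = url_to_path

    def resolve(self, system_url, public_id, context):
        path = self.url_to_path.get(system_url)
        if path is None:
            # Returning None lets lxml fall back to its default resolution.
            return None
        return self.resolve_filename(path, context)


def make_validating_parser(url_to_path):
    """Build a parser with the same options as in the report, plus the resolver."""
    parser = etree.XMLParser(load_dtd=True, dtd_validation=True,
                             remove_blank_text=True, attribute_defaults=True)
    parser.resolvers.add(LocalDTDResolver(url_to_path))
    return parser
```

With a local copy of the DTD (the path below is a placeholder, not a real location), the call from the report would become something like:

parser = make_validating_parser({"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd": "/path/to/xhtml1-strict.dtd"})
html = etree.parse(inputhtmlfile, parser)

Alternatively, XMLParser accepts a no_network keyword (True by default); passing no_network=False should allow the DTD to be loaded over the network, if that is acceptable in your environment.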