Failure to parse XML file with reversed UTF-8 BOM
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
lxml | Invalid | Undecided | Unassigned | | |
Bug Description
Hi Team,
Below is my input file:
<?xml version="1.0" encoding="UTF-8"?>
<book>
<title>This is test file</title>
</book>
When I try to parse this file, I get the error 'lxml.etree.
The text at the start of the file is a BOM. It is not shown when I open the file in Notepad or another editor.
To work around the issue, I have to remove the BOM from the file; only then can I parse it with lxml and do further processing.
The problem is that removing this text and saving the file changes the file's modification date.
I want to know how to parse this file without removing the BOM.
Please handle this situation and fix the issue as soon as possible.
Python : sys.version_
lxml.etree : (4, 2, 5, 0)
libxml used : (2, 9, 8)
libxml compiled : (2, 9, 8)
libxslt used : (1, 1, 32)
libxslt compiled : (1, 1, 32)
Regards,
Anil Prasad
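For the question of parsing without touching the file on disk, one possible approach is to read the bytes, drop whatever precedes the first '<' in memory, and parse the result. The following is only a minimal sketch under assumptions: the helper name `read_xml_without_leading_junk` is hypothetical, the content is assumed to be UTF-8 (or another ASCII-compatible encoding), and the standard library's `xml.etree.ElementTree` is used so the example runs without lxml installed.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def read_xml_without_leading_junk(path):
    """Return the file's bytes starting at the first '<'.

    Hypothetical helper: the file is only read, never written, so its
    modification date stays as it was. Assumes an ASCII-compatible
    encoding such as UTF-8; it is not suitable for UTF-16 or UTF-32.
    """
    data = Path(path).read_bytes()
    start = data.find(b"<")
    if start == -1:
        raise ValueError("no '<' found; this does not look like XML")
    return data[start:]


def parse_ignoring_leading_junk(path):
    """Parse the cleaned bytes and return the root element."""
    return ET.fromstring(read_xml_without_leading_junk(path))
```

The cleaned bytes can also be passed to `lxml.etree.fromstring`, which accepts a byte string. Note that a well-formed UTF-8 BOM is normally accepted by XML parsers, so this kind of workaround should only matter when the leading bytes are something else.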
Could it be that this is an issue with your input file? Try opening it with a hex editor to see if the first bytes are really the UTF-8 BOM: 0xEF, 0xBB, 0xBF. If so, please attach the file instead of copying it into the text.
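If no hex editor is at hand, the same check can be done with a few lines of Python. This is a rough sketch rather than a definitive tool: the function names `identify_bom` and `leading_bytes_report` are made up for illustration, and the BOM constants come from the standard `codecs` module.

```python
import codecs

# UTF-32 entries come first because the UTF-32 LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16 LE BOM (FF FE).
KNOWN_BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF32_LE, "UTF-32 LE"),
    (codecs.BOM_UTF32_BE, "UTF-32 BE"),
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),
]


def identify_bom(data):
    """Return the name of the BOM that `data` starts with, or None."""
    for bom, name in KNOWN_BOMS:
        if data.startswith(bom):
            return name
    return None


def leading_bytes_report(path, count=8):
    """Return (hex dump of the first `count` bytes, detected BOM name or None)."""
    with open(path, "rb") as f:
        head = f.read(count)
    hex_dump = " ".join(f"{b:02X}" for b in head)
    return hex_dump, identify_bom(head)
```

Calling `leading_bytes_report("input.xml")` (the file name is a placeholder) gives the raw leading bytes for the report. A file that really starts with the UTF-8 BOM shows `EF BB BF` first, and anything else in front of `<?xml` points to a problem in the file itself.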