XML is not recognized as XML

Bug #1535113 reported by Cornelius Kölbel
Affects          Status    Importance   Assigned to   Milestone
Beautiful Soup   Invalid   Undecided    Unassigned

Bug Description

I am running Beautiful Soup 4.3.2 and lxml 3.5.0.

I have a valid XML document in xml_data:

<?xml version="1.0" encoding="UTF-8"?>
<KeyContainer Version="1.0" xmlns ="urn:ietf:params:xml:ns:keyprov:pskc">
  <KeyPackage>
    <DeviceInfo>
      <Manufacturer>Feitian Technology Co.,Ltd</Manufacturer>
      <SerialNo>1000133508267</SerialNo>
    </DeviceInfo>
    <Key Id="1000133508267" Algorithm="urn:ietf:params:xml:ns:keyprov:pskc:hotp">
      <AlgorithmParameters>
        <ResponseFormat Length="6" Encoding="DECIMAL"/>
      </AlgorithmParameters>
      <Data>
        <Secret>
          <PlainValue>PuMnCivln/14Ii3DNhR4/1zGN5A=</PlainValue>
        </Secret>
        <Counter>
          <PlainValue>0</PlainValue>
        </Counter>
      </Data>
....

I am running

   xml = BeautifulSoup(xml_data, ["lxml"])

xml.builder is of type "LXMLTreeBuilder". So this looks fine. But:

xml.is_xml is False and

xml.contents[0] is

<html>
<body>
<KeyContainer Version="1.0" xmlns ="urn:ietf:params:xml:ns:keyprov:pskc">
  <KeyPackage>
    <DeviceInfo>
  ....

I would expect xml.is_xml to be True and the contents not to be wrapped in an HTML body.

Maybe I am missing something?

I attached a short script.

tommy (tommy-shem1) wrote :

I looked at this and tested the code. If you change this line, it works:

xml = BeautifulSoup(XML_PSKC, "xml")

Note that the change is from ["lxml"] to "xml".

Hope this helps.

Leonard Richardson (leonardr) wrote :

tommy's change is correct. Since Beautiful Soup is primarily an HTML parser, telling it to parse a document using "lxml" will use LXML's HTML parser. To get it to use LXML's XML parser you can say "xml" or "lxml-xml". This is documented here:

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
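
For illustration, here is a minimal sketch of the difference between the two parser names. The XML_DATA document is an abbreviated, hypothetical stand-in for the PSKC file in the report, and the variable names html_soup and xml_soup are made up for this example. It assumes the beautifulsoup4 and lxml packages are installed.

```python
from bs4 import BeautifulSoup  # requires the beautifulsoup4 and lxml packages

# Abbreviated, hypothetical stand-in for the PSKC document in the report.
XML_DATA = b"""<?xml version="1.0" encoding="UTF-8"?>
<KeyContainer Version="1.0" xmlns="urn:ietf:params:xml:ns:keyprov:pskc">
  <KeyPackage>
    <DeviceInfo>
      <SerialNo>1000133508267</SerialNo>
    </DeviceInfo>
  </KeyPackage>
</KeyContainer>
"""

# "lxml" selects lxml's HTML parser, so the markup is treated as HTML
# and wrapped in <html><body>.
html_soup = BeautifulSoup(XML_DATA, "lxml")

# "xml" selects lxml's XML parser, so the document keeps its own root element.
xml_soup = BeautifulSoup(XML_DATA, "xml")

print(html_soup.is_xml)                            # False
print(xml_soup.is_xml)                             # True
print(html_soup.find("html") is not None)          # True: HTML wrapper was added
print(xml_soup.find("KeyContainer") is not None)   # True: original root element
print(xml_soup.find("SerialNo").get_text())        # 1000133508267
```

With the "xml" parser, tag names keep their original case, so lookups such as find("KeyContainer") match the names used in the document.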

Changed in beautifulsoup:
status: New → Invalid