HTMLPullParser doesn't return events if feed terminates within an attribute value

Bug #1990055 reported by abdul monim
This bug affects 1 person
Affects: lxml
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (not set)

Bug Description

Consider the following script:

from lxml import etree

def print_elements(parser):
    print(list(element for event, element in parser.read_events()))

def this_works():
    # Whole document passed in a single feed() call.
    parser = etree.HTMLPullParser()
    parser.feed('<span class="helooooooooooooo">Welcome</span>')
    print_elements(parser)

def this_also_works():
    # Split inside the start tag, but after the closing quote of the attribute value.
    parser = etree.HTMLPullParser()
    parser.feed('<span class="helooooooooooooo"')
    parser.feed(">Welcome</span>")
    print_elements(parser)

def this_fails():
    # Split in the middle of the quoted attribute value.
    parser = etree.HTMLPullParser()
    parser.feed('<span class="heloooooo')
    parser.feed('ooooooo">Welcome</span>')
    print_elements(parser)

this_works()
this_also_works()
this_fails()

Output:

[<Element span at 0x103cf15c0>]
[<Element span at 0x103cf1c00>]
[]

The last call, this_fails(), returns no events or elements because the first feed() call ended inside a quoted attribute value.

Python : sys.version_info(major=3, minor=10, micro=5, releaselevel='final', serial=0)
lxml.etree : (4, 9, 1, 0)
libxml used : (2, 9, 14)
libxml compiled : (2, 9, 14)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)
