lxml.html.iterlinks no longer works with bytestrings in lxml 5.1.0

Bug #2048920 reported by Chris Warrick
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: Medium
Assigned to: scoder
Milestone: 5.1.1

Bug Description

Python : sys.version_info(major=3, minor=11, micro=7, releaselevel='final', serial=0)
lxml.etree : (5, 1, 0, 0)
libxml used : (2, 12, 3)
libxml compiled : (2, 12, 3)
libxslt used : (1, 1, 39)
libxslt compiled : (1, 1, 39)

Minimum example:

import lxml.html
for link in lxml.html.iterlinks(b'<a href="https://example.com">hello</a>'):
    print(link)

Result with lxml 5.0.1:
(<Element a at 0x7fa4b17fb570>, 'href', 'https://example.com', 0)

Result with lxml 5.1.0:
Traceback (most recent call last):
  File "/tmp/test.py", line 2, in <module>
    for link in lxml.html.iterlinks(b'<a href="https://example.com">hello</a>'):
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/lv/lib64/python3.11/site-packages/lxml/html/__init__.py", line 647, in __call__
    meth = getattr(doc, self.name)
           ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'bytes' object has no attribute 'iterlinks'

I suspect this is caused by the replacement of `basestring` with `str` in `_MethodFunc`.
Old: https://github.com/lxml/lxml/blob/3fdcfaa283bdd2b09aff85ceb79f5d4b3e12e392/src/lxml/html/__init__.py#L658
New: https://github.com/lxml/lxml/blob/6133c0e6feeb4714576b8ee3c259dbcad6728e5d/src/lxml/html/__init__.py#L635

Some other files (e.g. src/lxml/html/clean.py) appear to have had the same `basestring` check replaced with `str` and may exhibit the same regression.
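To illustrate the suspected cause, here is a minimal standalone sketch of how narrowing a type check from "str or bytes" to `str` alone would produce this exact error. It does not use lxml: `FakeDoc`, `fake_fromstring` and `MethodFunc` are hypothetical stand-ins that only model the dispatch pattern, not the actual `_MethodFunc` implementation.

```python
class FakeDoc:
    """Hypothetical stand-in for a parsed HTML element (not lxml's class)."""

    def __init__(self, source):
        self.source = source

    def iterlinks(self):
        # Yield one tuple in the same (element, attribute, link, pos) shape
        # as the output shown in the report.
        yield (self, "href", "https://example.com", 0)


def fake_fromstring(markup):
    """Hypothetical stand-in for a parser that accepts str and bytes."""
    return FakeDoc(markup)


class MethodFunc:
    """Simplified model of a wrapper that parses raw markup on demand."""

    def __init__(self, name, string_types):
        self.name = name
        self.string_types = string_types

    def __call__(self, doc, *args, **kw):
        # Raw markup is parsed first; anything else is assumed to be a
        # document object that already has the requested method.
        if isinstance(doc, self.string_types):
            doc = fake_fromstring(doc)
        meth = getattr(doc, self.name)
        return meth(*args, **kw)


# Models the check after the change: only str counts as raw markup.
iterlinks_str_only = MethodFunc("iterlinks", str)
# Models the earlier behaviour: both str and bytes are parsed.
iterlinks_str_or_bytes = MethodFunc("iterlinks", (str, bytes))

html = b'<a href="https://example.com">hello</a>'

links = list(iterlinks_str_or_bytes(html))

error = None
try:
    list(iterlinks_str_only(html))
except AttributeError as exc:
    # The bytes object skipped parsing, so getattr() ran on bytes itself.
    error = str(exc)

print(links[0][1:])
print(error)
```

With the `str`-only check, the bytes object is never parsed and `getattr` is applied to it directly, which matches the `AttributeError` in the traceback. On an affected release, parsing the input explicitly with `lxml.html.fromstring()` and calling `.iterlinks()` on the returned element should sidestep the check, though I have not verified this against 5.1.0.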

scoder (scoder) wrote :

Thanks. This was accidentally lost when modernising the code base after removing Py2 support.
Fixed in https://github.com/lxml/lxml/commit/6619dfd4c446b3a813ab380b22ddd583d32b9a29

Changed in lxml:
assignee: nobody → scoder (scoder)
importance: Undecided → Medium
status: New → Fix Committed
milestone: none → 5.1.1
scoder (scoder)
Changed in lxml:
status: Fix Committed → Fix Released