invalid predicate when using text() in query

Bug #1939925 reported by Rufus
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---------|--------|------------|-------------|-----------|
| lxml | Triaged | Wishlist | Unassigned | |

Bug Description

Python : sys.version_info(major=3, minor=7, micro=2, releaselevel='final', serial=0)
lxml.etree : (4, 6, 3, 0)
libxml used : (2, 9, 10)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)

I have this html tree
```
>>> from lxml import etree
>>> tree = etree.fromstring("""
<html><body>
  <div>
    <table>
      <tr>
        <th>First</th>
        <td>Val1</td>
      </tr>
    </table>
    <table>
      <tr>
        <th>Second</th>
        <td>Val2</td>
      </tr>
    </table>
  </div>
</body></html>
""")
```

and I keep getting an error when I run the following query against it (I'm specifically trying to select the table elements that have a particular header value):
```
>>> tree.find('./body/div/table[tr/th/text() = "First"]')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "src/lxml/etree.pyx", line 1532, in lxml.etree._Element.find
  File "src/lxml/_elementpath.py", line 323, in lxml._elementpath.find
  File "src/lxml/_elementpath.py", line 312, in lxml._elementpath.iterfind
  File "src/lxml/_elementpath.py", line 295, in lxml._elementpath._build_path_iterator
  File "src/lxml/_elementpath.py", line 237, in lxml._elementpath.prepare_predicate
SyntaxError: invalid predicate
```

meanwhile this works as expected
```
>>> finder = etree.XPath('./body/div/table[tr/th/text() = "First"]')
>>> finder(tree)
[<Element table at 0x7fde6812f988>]
```

It looks like `tree.find` fails whenever the predicate contains a path (such as `tr/th`) or the `text()` node test. I would have expected `tree.find` (and `iterfind`) to support the same syntax as `etree.XPath`, so that matches can be iterated lazily, one at a time. Is this a bug, or are more advanced queries just not supported?

scoder (scoder) wrote :

> Is this a bug or are more advanced queries just not supported?

Missing features. find() and friends (a.k.a. ElementPath) only support a subset of XPath (and use an independent implementation), which could be extended. PRs welcome.

Note that ElementPath is mostly what CPython's xml.etree.ElementPath module provides in the standard library.
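
Until the predicate syntax is extended, one possible workaround is to keep the path passed to `iterfind()` within the supported subset and do the header comparison in Python. The following is only a sketch of that idea, not something proposed in this report: the helper name `iter_tables_with_header` and the compacted `DOC` string are made up for illustration, and the standard-library fallback import is there only so the snippet runs without lxml installed. Note that `th.text` is the text before the first child element, which is not quite the same as comparing every `text()` node in XPath.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: the stdlib module offers the same find()/iterfind() subset.
    import xml.etree.ElementTree as etree

# Compact version of the tree from the report.
DOC = """<html><body>
  <div>
    <table><tr><th>First</th><td>Val1</td></tr></table>
    <table><tr><th>Second</th><td>Val2</td></tr></table>
  </div>
</body></html>"""


def iter_tables_with_header(root, header):
    """Lazily yield each ./body/div/table that has a <th> with the given text.

    Hypothetical helper: the path stays inside the ElementPath subset and
    the text comparison is done in Python rather than in a predicate.
    """
    for table in root.iterfind("./body/div/table"):
        if any(th.text == header for th in table.iterfind("tr/th")):
            yield table


tree = etree.fromstring(DOC)

# Take only the first match; the generator stops walking once it is found.
first = next(iter_tables_with_header(tree, "First"), None)
print(first.findtext("tr/td"))
```

Because the helper is a generator, it keeps the one-at-a-time iteration asked for above. When lxml is available and a full result list is acceptable, `etree.XPath` (as shown in the report) remains the more direct option.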

Changed in lxml:
importance: Undecided → Wishlist
status: New → Triaged