doctestcompare could use a "match any amount of tags or text" facility

Bug #1733204 reported by Colin Watson on 2017-11-19
Affects: lxml
Status: Confirmed
Importance: Wishlist
Assigned to: Unassigned

Bug Description

I'm in the middle of porting Launchpad's BeautifulSoup-based tests to BeautifulSoup 4. BeautifulSoup 4 makes a number of essentially cosmetic changes to how it renders parsed tags (different attribute order, "<br/>" rather than "<br />", etc.), but they throw off our doctests. As a result I currently have an unmanageable thousands-of-lines patch, and I'm trying to find ways to make it more sensible so that I can push it for review. One appealing possibility is to have a smarter comparison method for doctest examples.
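For illustration only, a "smarter comparison method" could be as small as a doctest.OutputChecker subclass that normalises one cosmetic difference before comparing. The class name CosmeticOutputChecker and the regular expression are hypothetical, not part of lxml or Launchpad, and this sketch handles only the "<br />" versus "<br/>" spelling, not attribute order.

```python
import doctest
import re

# Hypothetical normalisation: drop whitespace before a self-closing "/>",
# so "<br />" and "<br/>" compare as the same text.
_SELF_CLOSING = re.compile(r'\s*/>')


class CosmeticOutputChecker(doctest.OutputChecker):
    """Sketch of a checker that ignores self-closing tag spacing."""

    def check_output(self, want, got, optionflags):
        want = _SELF_CLOSING.sub('/>', want)
        got = _SELF_CLOSING.sub('/>', got)
        # Delegate everything else (ELLIPSIS etc.) to plain doctest.
        return doctest.OutputChecker.check_output(
            self, want, got, optionflags)


checker = CosmeticOutputChecker()
same = checker.check_output('a<br />\nb\n', 'a<br/>\nb\n', 0)
different = checker.check_output('a<br />\nb\n', 'a<hr/>\nb\n', 0)
print(same)       # True: only the spacing before "/>" differs
print(different)  # False: the tag itself differs
```

A purely textual normalisation like this quickly runs out of steam (attribute order would need real parsing), which is why a parsing checker such as lxml.doctestcompare looks attractive.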

Unfortunately lxml.doctestcompare doesn't quite work as I think we'd like it to. Here's an example of one of our tests:

    >>> print text.findAll('p')[-1]
    <p><span class="foldable">--...
    &lt;email address hidden&gt;<br />
    Witty signatures rock!
    </span></p>

The text being matched here is something like:

  <p>
    <span class="foldable">
    --
      <br></br>_______
      <wbr></wbr>_______
      <wbr></wbr>_______
      <wbr></wbr>_
      <br></br>&lt;email address hidden&gt;
      <br></br>Witty signatures rock!
    </span>
  </p>

In this test we don't care about the fine detail of the <wbr/> tags that are inserted to make sure that very long words are breakable somewhere, but just that it's wrapped by a span tag with class="foldable" and that it has the trailing content given inside that. This works fine with plain doctest, but it doesn't seem possible to express quite this kind of thing with lxml.doctestcompare; we would have to be more specific than we need to be and thus make our tests more fragile.
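To make the plain-doctest behaviour concrete, here is a minimal sketch using only the standard library's doctest.OutputChecker with the ELLIPSIS flag. WANT is the expected output from the test above; GOT_OLD and GOT_NEW are made-up renderings for illustration (assumptions, not real Launchpad output), with GOT_NEW differing only in how "<br/>" is spelled.

```python
import doctest

# Expected output as written in the doctest; "..." is the ELLIPSIS wildcard.
WANT = (
    '<p><span class="foldable">--...\n'
    '&lt;email address hidden&gt;<br />\n'
    'Witty signatures rock!\n'
    '</span></p>\n'
)

# Assumed old-style rendering (illustrative only).
GOT_OLD = (
    '<p><span class="foldable">--<br />\n'
    '_______<wbr></wbr>_______<wbr></wbr>_<br />\n'
    '&lt;email address hidden&gt;<br />\n'
    'Witty signatures rock!\n'
    '</span></p>\n'
)

# Same content with the "<br/>" spelling that BeautifulSoup 4 produces.
GOT_NEW = GOT_OLD.replace('<br />', '<br/>')

checker = doctest.OutputChecker()
old_matches = checker.check_output(WANT, GOT_OLD, doctest.ELLIPSIS)
new_matches = checker.check_output(WANT, GOT_NEW, doctest.ELLIPSIS)
print(old_matches)  # True: "..." swallows the <wbr> noise
print(new_matches)  # False: "<br/>" no longer matches "<br />" textually
```

The "..." skips over any run of tags and text, which is the facility I'd like doctestcompare to offer, while the purely textual comparison is what breaks on cosmetic rendering changes.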

As a reduced example, I'd like something like the attached doctestcompare-example.txt to work (not necessarily this exact syntax; I don't mind having to make some minor adjustments if necessary).

I know we're using a somewhat old version of lxml at the moment, but I checked the master branch on GitHub and it appears to have basically the same behaviour here.

Python : sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
lxml.etree : (3, 5, 0, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 2)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)

Colin Watson (cjwatson) wrote :
scoder (scoder) on 2018-03-21
Changed in lxml:
importance: Undecided → Wishlist
status: New → Confirmed