doctestcompare could use a "match any amount of tags or text" facility

Bug #1733204 reported by Colin Watson
This bug affects 1 person
Affects    Status      Importance    Assigned to    Milestone
lxml       Confirmed   Wishlist      Unassigned

Bug Description

I'm in the middle of porting Launchpad's BeautifulSoup-based tests to BeautifulSoup 4. BeautifulSoup 4 changes how parsed tags are rendered in a number of essentially cosmetic ways (different attribute order, "<br/>" rather than "<br />", etc.), but those changes throw off our doctests. As a result, I currently have an unmanageable patch thousands of lines long, and I'm looking for ways to make it more sensible so that I can push it for review. One appealing possibility is a smarter comparison method for doctest examples.
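One way such a smarter comparison could look is a doctest.OutputChecker subclass that canonicalises both the expected and the actual markup before falling back to doctest's normal rules. The sketch below is hypothetical: the names CanonicalHTMLChecker and canonicalise are made up for illustration, it is not part of lxml, and it targets Python 3 rather than the Python 2.7 environment in this report. It sorts attributes by name and spells every self-closing tag as "<br/>", then lets option flags such as ELLIPSIS apply as usual. Comments, declarations and processing instructions are simply dropped, which a real implementation would need to handle.

```python
import doctest
import html
from html.parser import HTMLParser


class _Canonicaliser(HTMLParser):
    """Re-serialise markup with sorted attributes and one self-closing-tag spelling.

    Hypothetical helper for illustration only; comments, declarations and
    processing instructions are dropped rather than reproduced.
    """

    def __init__(self):
        # Keep entity references such as &lt; as-is instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def _tag(self, tag, attrs, close):
        # Sort by attribute name so that attribute order no longer matters.
        rendered = "".join(
            f" {name}" if value is None
            else f' {name}="{html.escape(value, quote=True)}"'
            for name, value in sorted(attrs, key=lambda attr: attr[0])
        )
        return f"<{tag}{rendered}{close}"

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._tag(tag, attrs, ">"))

    def handle_startendtag(self, tag, attrs):
        # "<br />" and "<br/>" both arrive here and are emitted as "<br/>".
        self.parts.append(self._tag(tag, attrs, "/>"))

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        # Text, including any doctest "..." markers, is kept verbatim.
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def canonicalise(markup):
    """Hypothetical helper: return markup in the canonical form above."""
    parser = _Canonicaliser()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


class CanonicalHTMLChecker(doctest.OutputChecker):
    """Hypothetical checker: canonicalise both sides, then use doctest's rules."""

    def check_output(self, want, got, optionflags):
        return super().check_output(
            canonicalise(want), canonicalise(got), optionflags
        )
```

In a doctest suite, such a checker could be passed as the `checker` keyword argument of `doctest.DocFileSuite`, which forwards it to each test case.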

Unfortunately lxml.doctestcompare doesn't quite work as I think we'd like it to. Here's an example of one of our tests:

    >>> print text.findAll('p')[-1]
    <p><span class="foldable">--...
    &lt;email address hidden&gt;<br />
    Witty signatures rock!
    </span></p>

The text being matched here is something like:

  <p>
    <span class="foldable">
    --
      <br></br>_______
      <wbr></wbr>_______
      <wbr></wbr>_______
      <wbr></wbr>_
      <br></br>&lt;email address hidden&gt;
      <br></br>Witty signatures rock!
    </span>
  </p>

In this test we don't care about the fine detail of the <wbr/> tags that are inserted to make sure that very long words can be broken somewhere; we only care that the content is wrapped in a span tag with class="foldable" and that it ends with the trailing content given inside it. This works fine with plain doctest, but there doesn't seem to be a way to express quite this kind of thing with lxml.doctestcompare; we would have to be more specific than we need to be, which would make our tests more fragile.
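Plain doctest presumably copes with this because the tests run with doctest's ELLIPSIS option, under which "..." in the expected output matches any run of characters, markup included. The minimal sketch below (Python 3) calls doctest.OutputChecker directly; the `got` string is an assumed rendering of the paragraph above, not output captured from Launchpad. It also shows why the BeautifulSoup 4 port hurts: a purely cosmetic "<br/>" versus "<br />" difference is enough to break the match.

```python
import doctest

# Expected output as written in the test: "..." stands in for the run of
# <br />/<wbr /> tags and underscores that we don't want to spell out.
want = (
    '<p><span class="foldable">--...\n'
    '&lt;email address hidden&gt;<br />\n'
    'Witty signatures rock!\n'
    '</span></p>\n'
)

# Assumed rendering of the same paragraph (illustrative, not captured output).
got = (
    '<p><span class="foldable">--<br />_______<wbr />_______<wbr />'
    '_______<wbr />_<br />\n'
    '&lt;email address hidden&gt;<br />\n'
    'Witty signatures rock!\n'
    '</span></p>\n'
)

checker = doctest.OutputChecker()
print(checker.check_output(want, got, doctest.ELLIPSIS))  # True

# A purely cosmetic change in serialisation ("<br/>" instead of "<br />")
# is enough to make the same expected output fail.
print(checker.check_output(want, got.replace("<br />", "<br/>"), doctest.ELLIPSIS))  # False
```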

As a reduced example, I'd like something like the attached doctestcompare-example.txt to work (not necessarily this exact syntax; I don't mind having to make some minor adjustments if necessary).

I know we're using a somewhat old version of lxml at the moment, but I checked the master branch on GitHub and it appears to have basically the same behaviour here.

Python : sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
lxml.etree : (3, 5, 0, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 2)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)

Colin Watson (cjwatson) wrote :
scoder (scoder)
Changed in lxml:
importance: Undecided → Wishlist
status: New → Confirmed