lxml

doctestcompare could use a "match any amount of tags or text" facility

Bug #1733204 reported by Colin Watson on 2017-11-19


Affects   Status      Importance   Assigned to   Milestone
lxml      Confirmed   Wishlist     Unassigned

Bug Description

I'm in the middle of porting Launchpad's BeautifulSoup-based tests to BeautifulSoup 4. The new version makes a number of changes to how it renders parsed tags. They are essentially cosmetic (different attribute order, " " rather than " ", etc.), but they throw off our doctests. As a result I currently have an unmanageable patch running to thousands of lines, and I'm trying to find ways to make it more sensible so that I can push it for review. One appealing possibility is a smarter comparison method for doctest examples.

Unfortunately lxml.doctestcompare doesn't quite work as I think we'd like it to. Here's an example of one of our tests:

>>> print text.findAll('p')[-1]
 --...
 <email address hidden> 
 Witty signatures rock!

The text being matched here is something like:

--
 _______
 _______
 _______
 _
 <email address hidden>
 Witty signatures rock!

In this test we don't care about the fine detail of the tags that are inserted to make very long words breakable somewhere. We only care that the text is wrapped in a span tag with class="foldable" and that the span ends with the trailing content given. This works fine with plain doctest, but it doesn't seem possible to express quite this kind of thing with lxml.doctestcompare; we would have to be more specific than we need to be, which would make our tests more fragile.
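
To make the plain doctest behaviour concrete, here is a minimal sketch using only the standard library's doctest.OutputChecker with the ELLIPSIS flag. The markup strings are invented for illustration and are not Launchpad's actual output; the id="sig" attribute in particular is an assumption added to show the attribute-order problem.

```python
import doctest

# Hypothetical expected output: "..." stands in for the word-breaking tags.
WANT = (
    '<span class="foldable">--...\n'
    '&lt;email address hidden&gt;<br />\n'
    'Witty signatures rock!\n'
    '</span>\n'
)
# Hypothetical actual output, with tags inserted to break a long word.
GOT = (
    '<span class="foldable">--<br />\n'
    '_______<wbr />_______<wbr />_______<wbr />_<br />\n'
    '&lt;email address hidden&gt;<br />\n'
    'Witty signatures rock!\n'
    '</span>\n'
)

# Same content, but with a second (made-up) attribute that the renderer
# emits in a different order from the one written in the test.
WANT_TWO_ATTRS = WANT.replace('class="foldable"', 'class="foldable" id="sig"')
GOT_REORDERED = GOT.replace('class="foldable"', 'id="sig" class="foldable"')

checker = doctest.OutputChecker()

# Plain string matching: "..." swallows any run of characters, tags included.
plain_match = checker.check_output(WANT, GOT, doctest.ELLIPSIS)

# A purely cosmetic change in attribute order defeats the string comparison.
reordered_match = checker.check_output(
    WANT_TWO_ATTRS, GOT_REORDERED, doctest.ELLIPSIS
)

print("ellipsis across tags:", plain_match)
print("attribute order changed:", reordered_match)
```

The first check passes because plain doctest compares strings and lets "..." span tags and text alike. The second fails on a change that is irrelevant to the test, which is the fragility that a markup-aware comparison would avoid.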

As a reduced example, I'd like something like the attached doctestcompare-example.txt to work (not necessarily this exact syntax; I don't mind having to make some minor adjustments if necessary).
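
For discussion, a rough sketch of the kind of loose comparison I have in mind, written against the standard library rather than lxml. The helper name matches_loosely, the markup, and the id="sig" attribute are all hypothetical. This is not how lxml.doctestcompare works and not a proposed patch; it only checks the outer tag and its attributes, then applies doctest's ELLIPSIS matching to the flattened text.

```python
import doctest
import xml.etree.ElementTree as ET


def matches_loosely(want_markup, got_markup):
    """Hypothetical helper: compare the root tag and attributes exactly
    (attribute order is ignored), then compare the flattened,
    whitespace-normalised text, treating "..." as a wildcard."""
    want = ET.fromstring(want_markup)
    got = ET.fromstring(got_markup)
    if want.tag != got.tag or want.attrib != got.attrib:
        return False
    # itertext() yields all text and tail strings in document order,
    # so any nested tags simply disappear from the comparison.
    want_text = " ".join("".join(want.itertext()).split())
    got_text = " ".join("".join(got.itertext()).split())
    checker = doctest.OutputChecker()
    return checker.check_output(want_text, got_text, doctest.ELLIPSIS)


# Invented markup for illustration only.
WANT = (
    '<span class="foldable" id="sig">--...\n'
    '&lt;email address hidden&gt;\n'
    'Witty signatures rock!\n'
    '</span>'
)
GOT = (
    '<span id="sig" class="foldable">--<br />\n'
    '_______<wbr />_______<wbr />_______<wbr />_<br />\n'
    '&lt;email address hidden&gt;<br />\n'
    'Witty signatures rock!\n'
    '</span>'
)

print(matches_loosely(WANT, GOT))
print(matches_loosely(WANT, GOT.replace("foldable", "plain")))
```

This trades precision for robustness: it would not notice a changed nested tag, and it requires well-formed XML, so it is only a starting point for what a built-in facility might offer.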

I know we're using a somewhat old version of lxml at the moment, but I checked the master branch on GitHub and it appears to have basically the same behaviour here.

Python : sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
lxml.etree : (3, 5, 0, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 2)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)