doctestcompare could use a "match any amount of tags or text" facility
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
lxml | Confirmed | Wishlist | Unassigned |
Bug Description
I'm in the middle of porting Launchpad's BeautifulSoup-based tests to BeautifulSoup 4. BeautifulSoup 4 makes a number of essentially cosmetic changes to how it renders parsed tags (different attribute order, "<br/>" rather than "<br />", etc.), but they throw off our doctests. As a result, I currently have an unmanageable patch thousands of lines long, and I'm trying to find ways to make it more sensible so that I can push it for review. One appealing possibility is a smarter comparison method for doctest examples.
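For reference, lxml.doctestcompare should already treat the purely cosmetic differences mentioned above as equal, because it parses both the expected and the actual output and compares the resulting trees. A minimal sketch, assuming a recent lxml on Python 3; the HTML fragments and the `cosmetic_example` test name are made up for illustration:

```python
import doctest

from lxml.doctestcompare import LHTMLOutputChecker, PARSE_HTML

checker = LHTMLOutputChecker()

# The same hypothetical fragment rendered two ways: attributes in a different
# order, and "<br />" versus "<br/>".
want = '<p><a href="/x" class="link">x</a><br /></p>\n'
got = '<p><a class="link" href="/x">x</a><br/></p>\n'

print(want == got)                                  # False: plain string comparison
print(checker.check_output(want, got, PARSE_HTML))  # True: both sides parsed as HTML

# The checker can also be handed to a doctest runner, so that every example
# in a test is compared as parsed HTML.
source = '''
>>> print('<p><a class="link" href="/x">x</a><br/></p>')
<p><a href="/x" class="link">x</a><br /></p>
'''
test = doctest.DocTestParser().get_doctest(source, {}, "cosmetic_example", None, 0)
runner = doctest.DocTestRunner(checker=checker, verbose=False, optionflags=PARSE_HTML)
results = runner.run(test)
print(results.failed)  # 0
```

The attribute-order and `<br/>` differences are the easy part; the harder case is the one below.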
Unfortunately lxml.doctestcompare doesn't quite work as I think we'd like it to. Here's an example of one of our tests:
>>> print text.findAll(
<p><span class="
<email address hidden><br />
Witty signatures rock!
</span></p>
The text being matched here is something like:
<p>
<span class="foldable">
--
<
<
<
<wbr></wbr>_
<
<
</span>
</p>
In this test we don't care about the fine detail of the <wbr/> tags that are inserted to make sure that very long words are breakable somewhere, only that the content is wrapped in a span tag with class="foldable" and that it ends with the trailing content given inside that span. This works fine with plain doctest, but it doesn't seem possible to express quite this kind of thing with lxml.doctestcom
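The limitation described above can be reproduced directly with lxml's checker. A minimal sketch, assuming a recent lxml on Python 3; the fragment is hypothetical and uses empty `<i></i>` elements as stand-ins for the `<wbr>` tags, so that the result does not depend on how the installed libxml2 treats that tag:

```python
from lxml.doctestcompare import LHTMLOutputChecker, PARSE_HTML

checker = LHTMLOutputChecker()

# Hypothetical rendered output: <i></i> stands in for each inserted <wbr>,
# and the content we actually care about comes last.
got = ('<span class="foldable">averylong<i></i>unbroken<i></i>word'
       '<br />Witty signatures rock!</span>')

# 1. "..." as the entire content of an element matches anything inside it,
#    child elements included, but then nothing about the content is checked.
loose = '<span class="foldable">...</span>'

# 2. "..." followed by the content we do care about: the ellipsis only stands
#    for text, so the expected <br> is compared against the first <i>.
wanted = '<span class="foldable">...<br />Witty signatures rock!</span>'

print(checker.check_output(loose, got, PARSE_HTML))   # True
print(checker.check_output(wanted, got, PARSE_HTML))  # False
```

In other words, the ellipsis appears able to swallow child elements only when it is an element's whole content; there seems to be no way to say "any amount of tags or text here, then exactly this".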
As a reduced example, I'd like something like the attached doctestcompare-
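A hypothetical sketch of the kind of facility being asked for, using only the standard library; this is not lxml's API. A text node consisting solely of `...` is treated as a wildcard that may absorb any run of sibling text and elements (including none), attributes are compared without regard to order, and whitespace is normalised. It assumes the output is well-formed XML; anything that fails to parse falls back to doctest's ordinary comparison. `WILDCARD`, `elements_match` and `WildcardMarkupChecker` are illustrative names:

```python
import doctest
import re
import xml.etree.ElementTree as ET

WILDCARD = "..."  # hypothetical marker: "any amount of tags or text"


def _norm(text):
    # Collapse whitespace runs and trim, so layout differences are ignored.
    return re.sub(r"\s+", " ", text or "").strip()


def _content(elem):
    """Flatten an element's mixed content into text strings and child elements."""
    items = [_norm(elem.text)] if _norm(elem.text) else []
    for child in elem:
        items.append(child)
        if _norm(child.tail):
            items.append(_norm(child.tail))
    return items


def _match_items(want, got):
    """Match two content lists; a WILDCARD item in `want` absorbs any run of `got`."""
    if not want:
        return not got
    head, rest = want[0], want[1:]
    if isinstance(head, str) and head == WILDCARD:
        # Let the wildcard swallow 0, 1, 2, ... items (text or elements).
        return any(_match_items(rest, got[i:]) for i in range(len(got) + 1))
    if not got:
        return False
    first = got[0]
    if isinstance(head, str):
        same = isinstance(first, str) and head == first
    else:
        same = not isinstance(first, str) and elements_match(head, first)
    return same and _match_items(rest, got[1:])


def elements_match(want, got):
    """Compare tag, attributes (order-insensitive) and content with wildcards."""
    return (want.tag == got.tag
            and dict(want.attrib) == dict(got.attrib)
            and _match_items(_content(want), _content(got)))


class WildcardMarkupChecker(doctest.OutputChecker):
    """Hypothetical checker: compare as XML trees, else fall back to doctest."""

    def check_output(self, want, got, optionflags):
        try:
            want_tree, got_tree = ET.fromstring(want), ET.fromstring(got)
        except ET.ParseError:
            return super().check_output(want, got, optionflags)
        return elements_match(want_tree, got_tree)


checker = WildcardMarkupChecker()
want = '<span class="foldable">...<br />Witty signatures rock!</span>\n'
got = ('<span class="foldable">--\n averylong<wbr></wbr>unbroken<wbr></wbr>word'
       '<br/>Witty signatures rock!</span>\n')
print(checker.check_output(want, got, 0))                          # True
print(checker.check_output(want.replace("rock", "roll"), got, 0))  # False
```

The wildcard matching backtracks, so many wildcards in one element could get slow, though that should not matter for doctest-sized fragments. Real BeautifulSoup output is HTML rather than XML, so a practical version would more likely build on an HTML parser such as lxml's.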
I know we're using a somewhat old version of lxml at the moment, but I checked the master branch on GitHub and it appears to have basically the same behaviour here.
Python : sys.version_
lxml.etree : (3, 5, 0, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 2)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)
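These fields look like the version report that the lxml FAQ asks for in bug reports. A short script along the same lines, assuming an lxml build that includes libxslt (the usual case), prints the corresponding values for the current environment:

```python
import sys

from lxml import etree

# Field names follow the report above; the values come from the running
# interpreter and the lxml build it imports.
versions = [
    ("Python", sys.version_info),
    ("lxml.etree", etree.LXML_VERSION),
    ("libxml used", etree.LIBXML_VERSION),
    ("libxml compiled", etree.LIBXML_COMPILED_VERSION),
    ("libxslt used", etree.LIBXSLT_VERSION),
    ("libxslt compiled", etree.LIBXSLT_COMPILED_VERSION),
]
for name, value in versions:
    print("%-20s: %s" % (name, value))
```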
Changed in lxml:
importance: Undecided → Wishlist
status: New → Confirmed