Questionable handling of implied end tags

Bug #1866555 reported by Bob Kline
This bug affects 1 person
Affects   Status   Importance   Assigned to   Milestone
lxml      New      Undecided    Unassigned

Bug Description

According to https://html.spec.whatwg.org/multipage/grouping-content.html#the-dt-element

"A dt element's end tag can be omitted if the dt element is immediately followed by another dt element or a dd element."

Based on that language, I would have expected this

from lxml import html
html.tostring(html.fromstring("<dl><dt>one<dt>two</dl>"))

to have resulted in

b'<dl><dt>one</dt><dt>two</dt></dl>'

but instead I get this:

b'<dl><dt>one<dt>two</dt></dt></dl>'

... so the second dt ends up nested inside the first: its path becomes dl/dt/dt instead of the expected dl/dt.
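To make the difference in structure explicit, here is a small sketch that lists the path of every dt element in the two serializations quoted above. It uses only the standard library's xml.etree.ElementTree on the already-serialized strings (both happen to be well-formed XML), so it does not exercise lxml or libxml2 itself; the helper name dt_paths is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def dt_paths(markup):
    """Return the slash-separated path of every dt element, in document order.

    Hypothetical helper for illustration; it parses well-formed markup only.
    """
    paths = []

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}" if prefix else element.tag
        if element.tag == "dt":
            paths.append(path)
        for child in element:
            walk(child, path)

    walk(ET.fromstring(markup), "")
    return paths


# The two serializations from the report above.
expected = "<dl><dt>one</dt><dt>two</dt></dl>"
actual = "<dl><dt>one<dt>two</dt></dt></dl>"

print(dt_paths(expected))  # ['dl/dt', 'dl/dt']
print(dt_paths(actual))    # ['dl/dt', 'dl/dt/dt']
```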

Note that, by contrast, the handling of li elements appears to match what the spec says:

"An li element's end tag can be omitted if the li element is immediately followed by another li element or if there is no more content in the parent element."

html.tostring(html.fromstring("<ol><li>one<li>two</ol>"))
b'<ol><li>one</li><li>two</li></ol>'
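For reference, the behavior I would expect from the two quoted spec sentences can be sketched as a tiny tree builder on top of the standard library's html.parser. This is only an illustration under simplifying assumptions: it is not how lxml or libxml2 work internally, it covers just the dt, dd and li cases discussed here, and it ignores the rest of the HTML tree-construction algorithm. The names IMPLIED_END, ImpliedEndBuilder and build are invented for this sketch.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Start tag -> open elements whose end tag it implies (simplified assumption,
# limited to the cases discussed in this report).
IMPLIED_END = {
    "dt": {"dt", "dd"},
    "dd": {"dt", "dd"},
    "li": {"li"},
}


class ImpliedEndBuilder(HTMLParser):
    """Builds an ElementTree, closing dt/dd/li when a sibling start tag arrives."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []

    def handle_starttag(self, tag, attrs):
        # If the innermost open element is one this tag implicitly ends, close it.
        if self.stack and self.stack[-1].tag in IMPLIED_END.get(tag, ()):
            self.stack.pop()
        attrib = {name: value or "" for name, value in attrs}
        if self.stack:
            element = ET.SubElement(self.stack[-1], tag, attrib)
        else:
            element = ET.Element(tag, attrib)
            if self.root is None:
                self.root = element
        self.stack.append(element)

    def handle_endtag(self, tag):
        # Close the nearest matching open element and everything nested in it.
        for index in range(len(self.stack) - 1, -1, -1):
            if self.stack[index].tag == tag:
                del self.stack[index:]
                break

    def handle_data(self, data):
        if not self.stack:
            return
        current = self.stack[-1]
        if len(current):
            # Text after a child element belongs to that child's tail.
            current[-1].tail = (current[-1].tail or "") + data
        else:
            current.text = (current.text or "") + data


def build(markup):
    """Parse markup with the sketch builder and serialize the resulting tree."""
    parser = ImpliedEndBuilder()
    parser.feed(markup)
    parser.close()
    return ET.tostring(parser.root, encoding="unicode")


print(build("<dl><dt>one<dt>two</dl>"))  # <dl><dt>one</dt><dt>two</dt></dl>
print(build("<ol><li>one<li>two</ol>"))  # <ol><li>one</li><li>two</li></ol>
```

With this rule table, dt is treated the same way as li: a new dt closes the one that is still open, so both items end up as siblings.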

I tried pre-flighting this on the mailing list, but I've been getting strange mail delivery failures for my most recent messages:

*** MAIL DELIVERY FAILURE REPORT ***

The original message was received at Sun, 08 Mar 2020 09:12:09 -0700 (PDT)
from host by mail-wr1-x432.google.com with SMTP id n7so8011429wrt.11 for .
          <email address hidden>.

Subject: Questionable handling of implied end tags
From: <email address hidden>

Mail delivery to the following recipient has finally failed:

<email address hidden>
   Last reason: 550 5.1.0
   Explanation: host mxa.eu.mailgun.org [18.195.181.121] said: Recipient rejected:
                <email address hidden>

   Transcript of session:
   ... while talking to mxa.eu.mailgun.org [18.195.181.121]:
   >>> RCPT TO:<email address hidden>
   <<< 550 5.1.0 Recipient rejected: <email address hidden>

Seems odd that the recipient address doesn't match my own. At any rate, I'm going straight to the bug tracker. Here's the requested environment report:

Python : sys.version_info(major=3, minor=7, micro=6, releaselevel='final', serial=0)
lxml.etree : (4, 4, 2, 0)
libxml used : (2, 9, 10)
libxml compiled : (2, 9, 10)
libxslt used : (1, 1, 34)
libxslt compiled : (1, 1, 34)

That's on macOS. It is also reproducible on Linux and Windows with Python 3.8.0 and lxml 4.5.0.
