Lower casing the attribute names

Bug #1849229 reported by Vishwas
This bug affects 1 person
Affects: lxml
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: none listed

Bug Description

Version Information

Python : sys.version_info(major=3, minor=7, micro=4, releaselevel='final', serial=0)
lxml.etree : (4, 4, 1, 0)
libxml used : (2, 9, 9)
libxml compiled : (2, 9, 9)
libxslt used : (1, 1, 33)
libxslt compiled : (1, 1, 33)

For HTML strings like

```html

<span myCamelCasedAttr="value">TestSpan</span>

```

parsing them with lxml converts the attribute name from camel case to lower case:

```html

<span mycamelcasedattr="value">TestSpan</span>

```

This is especially a problem when searching for attributes by string comparison. For example, with BeautifulSoup it looks something like this:

```python

from bs4 import BeautifulSoup
import re

# Single quotes outside so the double-quoted attribute value does not end the string.
htmlstr = '<span myCamelCasedAttr="value">TestSpan</span>'

soup = BeautifulSoup(htmlstr, "html.parser")
# The result is the same when the lxml parser is used instead:
# soup = BeautifulSoup(htmlstr, "lxml")
res = soup.find_all(True, {'myCamelCasedAttr': re.compile(r".*")})

# Returns
# res = []

```
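
The empty result can be reproduced with the standard library alone, since Python's `html.parser` also reports attribute names in lower case. The sketch below is only an illustration: `AttrNameCollector` and `parsed_attr_names` are hypothetical helpers, not part of lxml or BeautifulSoup. It shows the lookup failing with the camel-cased name and succeeding once the search key is lower-cased first.

```python
from html.parser import HTMLParser


class AttrNameCollector(HTMLParser):
    """Hypothetical helper: records attribute names as the parser reports them."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lower-cased.
        self.names.extend(name for name, _value in attrs)


def parsed_attr_names(markup):
    """Return the attribute names seen while parsing markup as HTML."""
    collector = AttrNameCollector()
    collector.feed(markup)
    collector.close()
    return collector.names


htmlstr = '<span myCamelCasedAttr="value">TestSpan</span>'
names = parsed_attr_names(htmlstr)

wanted = "myCamelCasedAttr"
exact_match = wanted in names            # camel-cased key: no match
lowered_match = wanted.lower() in names  # lower-cased key: match
print(names, exact_match, lowered_match)
```

Under the same assumption, lower-casing the attribute name before passing it to a search call is a possible workaround until a case-preserving option exists.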

Parsing HTML strings should preserve the casing of the attribute names that are parsed.
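
For comparison, XML is case-sensitive, so an XML parser keeps the attribute name as written. The sketch below uses the standard library's `xml.etree.ElementTree` rather than lxml, and it assumes the markup is well-formed XML. Typical real-world HTML (unclosed tags, entities such as `&nbsp;`) would be rejected, so this illustrates the difference and is not a general fix.

```python
import xml.etree.ElementTree as ET

markup = '<span myCamelCasedAttr="value">TestSpan</span>'

# Parsed as XML, the attribute name is stored exactly as written.
element = ET.fromstring(markup)
attr_names = list(element.attrib)

has_camel = "myCamelCasedAttr" in element.attrib
has_lower = "mycamelcasedattr" in element.attrib
print(attr_names, has_camel, has_lower)
```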

Leonard Richardson (leonardr) wrote :

This ticket was originally filed against Beautiful Soup: https://bugs.launchpad.net/beautifulsoup/+bug/1849211

The HTML spec defines tag names and attribute names as case-insensitive (http://w3c.github.io/html-reference/documents.html#case-insensitivity), so lxml's behavior is correct, but it might make sense to offer case preservation as an option.
