html.SelectElement stripping whitespace from <option> values

Bug #1665241 reported by Ashish Kulkarni
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: Low
Assigned to: Unassigned
Milestone: 3.8.0

Bug Description

Hello,

The fix for #399249 (https://github.com/lxml/lxml/commit/0b14af50fdc878199f8a3be3053eef42d3e9851f) has a bug: it strips whitespace from the value attribute of an <option>, not just from its text. If a value attribute is provided, it should be used as-is (I have confirmed this behaviour in multiple browsers).
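To make the intended rule concrete, here is a minimal sketch of how I would expect an option's value to be resolved. It is not lxml's actual code: option_value is a hypothetical helper, and the sketch uses the standard library's xml.etree.ElementTree instead of lxml so that it runs on its own. The only assumption is the rule described above: an explicit value attribute is returned untouched, and only the fallback to the option text is stripped.

```python
import xml.etree.ElementTree as ET


def option_value(option):
    """Hypothetical helper: resolve the value of an <option> element."""
    value = option.get("value")
    if value is not None:
        # An explicit value attribute is used as-is, whitespace included.
        return value
    # No value attribute: fall back to the option text, stripped.
    return (option.text or "").strip()


select = ET.fromstring(
    '<select name="option_with_blanks">'
    '<option value="01 " selected="selected">First</option>'
    '<option value="02 ">Second</option>'
    '<option> Third </option>'
    '</select>'
)

values = [option_value(o) for o in select.iter("option")]
print(values)
```

The third option has no value attribute, so it is the only one whose surrounding whitespace is removed.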

A sample test case:

==========================================================
from lxml import etree, html

doc = etree.fromstring("""
<html><body><form><select name="option_with_blanks">
  <option value="01 " selected="selected">First</option>
  <option value="02 ">Second</option>
</select></form></body></html>""", html.HTMLParser())

print('"%s"' % doc.xpath('//select')[0].value)
==========================================================

The exact version doesn't really matter, as the bug is present in all versions since 2.2.3. For reference, this is my environment:

Python : sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
lxml.etree : (3, 5, 0, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 2)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)

Thanks,
Ashish

Ashish Kulkarni (ashkulz) wrote :

I've added a PR on GitHub which should fix this issue:

https://github.com/lxml/lxml/pull/228

scoder (scoder)
Changed in lxml:
milestone: none → 3.8.0
importance: Undecided → Low
status: New → Fix Committed
scoder (scoder)
Changed in lxml:
status: Fix Committed → Fix Released