html.SelectElement stripping whitespace from <option> values

Bug #1665241 reported by Ashish Kulkarni on 2017-02-16
This bug affects 1 person
Affects: lxml
Status: Fix Released
Importance: Low
Assigned to: Unassigned
Milestone: 3.8.0

Bug Description

Hello,

The fix for #399249 (https://github.com/lxml/lxml/commit/0b14af50fdc878199f8a3be3053eef42d3e9851f) has a bug: it strips whitespace from the option's value attribute and not just from its text. If a value attribute is provided, it should be used as-is (this matches the behaviour I confirmed in multiple browsers).

A sample test case:

==========================================================
from lxml import etree, html

doc = etree.fromstring("""
<html><body><form><select name="option_with_blanks">
  <option value="01 " selected="selected">First</option>
  <option value="02 ">Second</option>
</select></form></body></html>""", html.HTMLParser())

# Expected output: "01 " (trailing space preserved); the bug strips it.
print('"%s"' % doc.xpath('//select')[0].value)
==========================================================
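For reference, here is a minimal sketch of the behaviour I would expect. It is not lxml's actual implementation: `option_value` is a hypothetical helper, and the sketch uses the standard library's `xml.etree.ElementTree` only so that it runs without lxml installed (lxml elements expose the same `.get()` and `.text` interface). The assumption is that an explicit value attribute is returned untouched, while the option text is stripped only when no value attribute exists.

```python
import xml.etree.ElementTree as ET


def option_value(option):
    """Return the effective value of an <option> element (hypothetical helper)."""
    value = option.get("value")
    if value is not None:
        # An explicit value attribute is used verbatim, whitespace included.
        return value
    # No value attribute: fall back to the option's text, stripped.
    return (option.text or "").strip()


select = ET.fromstring(
    '<select name="option_with_blanks">'
    '<option value="01 " selected="selected">First</option>'
    '<option value="02 ">Second</option>'
    '<option>  Third  </option>'
    '</select>'
)

values = [option_value(o) for o in select.iter("option")]
print(values)
```

The third option is an addition for illustration only; it shows the one case where stripping still applies.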

The exact version doesn't really matter, as the bug is present in all versions since 2.2.3:

Python : sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)
lxml.etree : (3, 5, 0, 0)
libxml used : (2, 9, 3)
libxml compiled : (2, 9, 2)
libxslt used : (1, 1, 28)
libxslt compiled : (1, 1, 28)

Thanks,
Ashish

Ashish Kulkarni (ashkulz) wrote:

I've added a PR on GitHub which should fix this issue:

https://github.com/lxml/lxml/pull/228

scoder (scoder) on 2017-02-17
Changed in lxml:
milestone: none → 3.8.0
importance: Undecided → Low
status: New → Fix Committed

scoder (scoder) on 2017-06-03
Changed in lxml:
status: Fix Committed → Fix Released