nbsp Causes Trouble with Tidy
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Zorba | New | Medium | Sorin Marian Nasoi |
Bug Description
-------
Submitted:
-------
Name: Michael Westbay
Email: ***************
Reason: other reasons
web site: http://
feedbackID: 24
-------
Message:
-------
A major problem with parsing HTML is the ever-present nbsp entity. As you know, it's not part of the XML specification, so it needs to be converted to &#160; before an XML parser can deal with it.
Playing with the "tidy function with options" live demo, I tried the following:
import module namespace html="http://
import schema namespace html-options="http://
html:parse(
With the "quote-nbsp" option set to "no", it works if the paragraph is set to 'Foo!&#160; Spaces' but not if it is set to 'Foo!&#nbsp; Spaces'. It fails either way when "quote-nbsp" is set to "yes". This is also the case when "output-xml" is set to either "yes" or "no".
I was hoping that the tidy module would output the nbsp entity as &#160; instead of &nbsp; for XML output, but that does not seem to be the case.
Is this something that can be fixed on the Zorba side within the HTML module? Or does this issue need to be handled on the Tidy side?
-------
Query:
-------
import module namespace html="http://
import schema namespace html-options="http://
html:parse(
In your first example, &#nbsp; should be replaced by either one of the following:
- &#160; or
- &#xA0;
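The difference between these forms can be checked with any standards-conforming XML parser. Below is a minimal sketch in Python using the standard library's xml.etree.ElementTree; it is not part of Zorba or Tidy, and the helper names `parses_as_xml` and `nbsp_to_numeric` as well as the sample strings are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is accepted by a plain XML parser."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def nbsp_to_numeric(markup: str) -> str:
    """Hypothetical pre-processing step: swap the HTML-only named
    entity for its numeric character reference."""
    return markup.replace("&nbsp;", "&#160;")


named = "<p>Foo!&nbsp; Spaces</p>"        # HTML named entity, undefined in XML
decimal = "<p>Foo!&#160; Spaces</p>"      # decimal character reference
hexadecimal = "<p>Foo!&#xA0; Spaces</p>"  # hexadecimal character reference
malformed = "<p>Foo!&#nbsp; Spaces</p>"   # neither a named nor a numeric reference

for sample in (named, decimal, hexadecimal, malformed, nbsp_to_numeric(named)):
    print(sample, "->", parses_as_xml(sample))
```

Both numeric forms decode to the same character, U+00A0, so either one can be used in the query.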
Wrt. the second issue, the one related to the "quote-nbsp" option in tidy: setting
<tidyParam name="quote-nbsp" value="yes" />
raises an error, and this seems like a bug in the html module.
I have created SF bug #3405598:
https://sourceforge.net/tracker/?func=detail&aid=3405598&group_id=226244&atid=1067586