Comment 34 for bug 191199

In , Khampton (khampton) wrote :

Created attachment 354687
V1

The proposed patch addresses two issues, both related to generating well-formed XML:

1) It replaces instances of '&' with '&amp;' while skipping over valid character entities (so that the entity for '™', for example, would *not* have its leading '&' escaped a second time).

2) It knocks out characters from disallowed character ranges (control characters, etc.) per the XML 1.0 spec (section 2.2, the Char production). Note that the regex uses the more verbose form of expressing the hex characters because the abbreviated form, when it appears in character classes, evidently makes certain perls' lexers cry bitter tears of failure.
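
For readers without the attachment handy, the two steps above can be sketched roughly as follows. This is not the patch itself (which is Perl and is not reproduced here); it is a minimal Python illustration of the same idea. The function names and both regexes are hypothetical. In particular, the sketch assumes that anything shaped like '&name;', '&#123;' or '&#x1F;' counts as an existing entity and is left alone, which may be looser or stricter than what the patch actually accepts.

```python
import re

# Assumption: an '&' is "bare" unless it already begins something shaped like
# a named entity (&name;), a decimal reference (&#123;) or a hex one (&#x1F;).
_BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)

# Complement of the XML 1.0 Char production (section 2.2):
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def escape_bare_ampersands(text):
    """Replace each bare '&' with '&amp;', leaving existing entities alone."""
    return _BARE_AMP.sub("&amp;", text)


def strip_invalid_xml_chars(text):
    """Drop characters that XML 1.0 does not allow in a document."""
    return _INVALID_XML_CHARS.sub("", text)


def sanitize(text):
    """Apply both steps: strip disallowed characters, then escape bare '&'."""
    return escape_bare_ampersands(strip_invalid_xml_chars(text))


if __name__ == "__main__":
    # The bell character (\x07) is removed, the bare '&' is escaped, and the
    # existing '&trade;' entity is left untouched.
    print(sanitize("AT&T &trade;\x07"))  # prints: AT&amp;T &trade;
```

Note that XML predefines only five named entities, so a document without a DTD could still be rejected for a name like '&trade;' even though this sketch leaves it untouched.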