xmllint does not recognize emdash (&mdash;)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
libxml2 (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
I'm using Ubuntu 22.04.2 LTS, x86_64, fully patched. I'm using DocBook to build a PDF. One of the steps in my build script validates and formats the XML using xmllint from libxml2-utils 2.9.13+:
echo "Validating book..."
if ! xmllint --xinclude --noout --postvalid book.xml
then
    echo "Validation failed. Exiting."
    exit 1
fi
echo "Complete."
echo "Formatting source code..."
for file in *.xml
do
    if xmllint --format "${file}" --output "${file}.format"
    then
        mv "${file}.format" "${file}"
    fi
done
echo "Complete."
When I added an emdash (&mdash;) the book failed to format:
Validating book...
Complete.
Formatting source code...
ch02.xml:58: parser error : Entity 'mdash' not defined
injections are remediated using several methods. And two output devices &mdash;
ch02.xml:58: parser error : Entity 'mdash' not defined
methods. And two output devices &mdash; the printer and plaintext email &mdash;
Complete.
The text is:
<para>... And two output devices &mdash; the printer and plaintext email &mdash; do not require...</para>
It seems like emdash should be recognized.
-----
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy
-----
$ xmllint --version
xmllint: using libxml version 20913
compiled with: Threads Tree Output Push Reader Patterns Writer SAXv1 FTP HTTP DTDValid HTML Legacy C14N Catalog XPath XPointer XInclude Iconv ICU ISO8859X Unicode Regexps Automata Schemas Schematron Modules Debug Zlib Lzma
$ command -v xmllint
/usr/bin/xmllint
$ dpkg -S /usr/bin/xmllint
libxml2-utils: /usr/bin/xmllint
$ apt-cache show libxml2-utils
Package: libxml2-utils
Architecture: amd64
Version: 2.9.13+
Multi-Arch: foreign
Priority: optional
Section: text
Source: libxml2
Origin: Ubuntu
Maintainer: Ubuntu Developers <email address hidden>
Original-
Bugs: https:/
Installed-Size: 202
Depends: libc6 (>= 2.34), libxml2 (>= 2.9.0)
Filename: pool/main/
Size: 40192
MD5sum: 3ca7de07562010f
SHA1: 128a9cfaff49e85
SHA256: c279c07caf90954
SHA512: 51600d7206c9a55
Homepage: http://
...
I doubt this is a bug: nowhere do you pass the parser a DTD, and named entities such as mdash are defined in the DTD. XML itself predefines only lt, gt, amp, apos and quot.
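To illustrate the point about the DTD, here is a minimal sketch, assuming a standalone file with no external DTD. It declares the entity in an internal DTD subset so the parser can resolve it. The file name sample.xml and its content are hypothetical, and the xmllint call only runs if xmllint is installed.

```shell
# Hypothetical sample file: the internal subset declares 'mdash' itself,
# mapping it to U+2014 through a numeric character reference.
cat > sample.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE para [
  <!ENTITY mdash "&#8212;">
]>
<para>two output devices &mdash; the printer and plaintext email &mdash; do not</para>
EOF

# With the declaration in place, a plain well-formedness check has
# everything it needs to resolve &mdash;.
if command -v xmllint >/dev/null 2>&1
then
    xmllint --noout sample.xml && echo "sample.xml parsed"
fi
```

Whether an internal subset suits a real DocBook project depends on how the book is validated, so treat this as an illustration of where entities come from rather than a recommended layout.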
It’s best practice nowadays to not use entities but just write the UTF-8 characters directly.
An em dash surrounded by hair spaces is: “ — ” (for your copy/paste convenience)
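One way to follow that advice in existing files is to rewrite each named entity as the numeric character reference &#8212; (U+2014), which needs no DTD. A literal UTF-8 character would work equally well. The sketch below is an assumption about how that could be scripted, not part of the original build script. The helper name replace_mdash and the demo file name are made up for illustration.

```shell
# Hypothetical helper: print FILE with every &mdash; rewritten as &#8212;.
# The replacement uses '\&' because a bare '&' in a sed replacement
# stands for the whole matched text.
replace_mdash() {
    sed 's/&mdash;/\&#8212;/g' "$1"
}

# Example input (file name is illustrative only).
printf '%s\n' '<para>devices &mdash; the printer</para>' > mdash-demo.xml
replace_mdash mdash-demo.xml
```

The helper writes to standard output, so the result can be inspected before it replaces the original, in the same write-then-mv style the formatting loop in the report uses.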