a2x: pdf output fails with german umlauts in labels

Bug #1073247 reported by Guenther Montag
This bug affects 2 people
Affects: asciidoc (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned

Bug Description

Description: Ubuntu 12.10
Release: 12.10

asciidoc:
  Installed: 8.6.7-1
  Candidate: 8.6.7-1
  Version table:
 *** 8.6.7-1 0
        500 http://archive.ubuntu.com/ubuntu/ quantal/main amd64 Packages
        100 /var/lib/dpkg/status

Using German umlauts in headers or labels makes a2x -f pdf test.asciidoc fail. The file test.asciidoc is encoded in UTF-8.

...
pdflatex -interaction=batchmode test.tex
pdflatex failed
test.aux:25: Missing \endcsname inserted.
test.aux:25: leading text: ...ne erste �berschrift\relax }{section.1}{}}
test.aux:30: Missing \endcsname inserted.
test.aux:30: leading text: ...1}{1}{\refname \relax }{subsection.2.1}{}}
Unexpected error occured
Error: pdflatex compilation failed

a2x: ERROR: "dblatex" -t pdf -p "/etc/asciidoc/dblatex/asciidoc-dblatex.xsl" -s "/etc/asciidoc/dblatex/asciidoc-dblatex.sty" -V "/home/guenther/fh/3.Sem/wiw464_PLM/Winkelmann/test.xml" returned non-zero exit status 1
...

Trying the same with tex output (a2x -f tex test.asciidoc) shows that umlauts in labels are not converted to latin1.
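
For illustration only, here is a small Python sketch of the encoding mismatch. It is not taken from asciidoc or dblatex; it assumes the title from the log above and only shows that the same umlaut is two bytes in UTF-8 and one byte in Latin-1, so bytes written in one encoding do not read back correctly in the other.

```python
# Illustration only: how "Ü" is represented in UTF-8 versus Latin-1.
title = "Eine erste Überschrift"

utf8_bytes = title.encode("utf-8")
latin1_bytes = title.encode("latin-1")

# UTF-8 needs two bytes for "Ü", Latin-1 needs one.
print(len(utf8_bytes), len(latin1_bytes))

# Reading UTF-8 bytes as Latin-1 yields two wrong characters instead of "Ü".
misread = utf8_bytes.decode("latin-1")
print(repr(misread))

# Reading Latin-1 bytes as strict UTF-8 fails outright.
try:
    latin1_bytes.decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError as exc:
    strict_utf8_ok = False
    print("not valid UTF-8:", exc.reason)
```

A label that stays in UTF-8 while the rest of the LaTeX file is treated as latin1 would hit exactly this kind of mismatch.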

The bug seems to be known in Debian:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=622359#10

Guenther Montag (g-montag) wrote :
affects: kile (Ubuntu) → asciidoc (Ubuntu)
Oliver Burtchen (oliver-burtchen) wrote :

At least on Ubuntu 12.04 it seems there are two problems involved:

1. asciidoc does not convert German umlauts in xref labels for the XML output.
A workaround is to manually give plain ASCII xref labels if a header contains German umlauts and you need the cross-reference. Otherwise just ignore them.

2. texlive is not installed with UTF-8 support by default, so pdflatex is unable to parse the generated output from asciidoc.
On Ubuntu you have to install the texlive-lang-cyrillic package, which includes UTF-8 support.

sudo apt-get install texlive-lang-cyrillic

Now you have to tell asciidoc and dblatex to use it:

a2x -a encoding=utf-8 -a lang=de --dblatex-opts='-P latex.encoding=utf8 -P latex.unicode.use=1' -v test.asciidoc

This works well with umlauts, but probably fails with the German sharp s (ß). For titles containing ß you therefore have to set xref labels manually, even if you don't reference them.

Guenther Montag (g-montag) wrote : Re: [Bug 1073247] Re: a2x: pdf output fails with german umlauts in labels

Hello Oliver,

that looks quite good.

On 23.11.2012 09:04, Oliver Burtchen wrote:
> At least on Ubuntu 12.04 it seems there are two problems involved:
>
> 1. asciidoc does not convert German umlauts in xref labels for the
> XML output. A workaround is to manually give plain ASCII xref labels
> if a header contains German umlauts and you need the cross-reference.
> Otherwise just ignore them.

How am I supposed to set these by hand within the automated a2x process
without them being overwritten again and again?

> 2. texlive is not installed with UTF-8 support by default, so pdflatex
> is unable to parse the generated output from asciidoc. On Ubuntu you
> have to install the texlive-lang-cyrillic package, which includes UTF-8
> support.
>
> sudo apt-get install texlive-lang-cyrillic
>
> Now you have to tell asciidoc and dblatex to use it
>
> a2x -a encoding=utf-8 -a lang=de --dblatex-opts='-P
> latex.encoding=utf8 -P latex.unicode.use=1' -v test.asciidoc

Can this be set as the default in an a2x config file?

> This works well with umlauts, but probably fails with the German
> sharp s (ß). So you have to set manual xrefs here, even if you don't
> use them.

How should the problem be solved so that automatic generation becomes
possible? With manual rework I would presumably have to change the labels
after every change to the asciidoc document. That is not practical.

Regards,
   Günther


Oliver Burtchen (oliver-burtchen) wrote :

Dear Günther,

> How am I supposed to set these by hand within the automated a2x process
> without them being overwritten again and again?

You can manually give xref-labels in your asciidoc-document like this:

[[ueberschrift1]]
Eine erste Überschrift
------------------------
Das soll die Aufgabenstellung sein.

Now the xref-label for the header is not auto-generated, but is 'ueberschrift1'.
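
If typing such labels by hand is too tedious, they could be derived from the title with a small helper. The following Python sketch is only an assumption of how that might look; ascii_id is a made-up name, it is not part of asciidoc, and it does not reproduce asciidoc's own id generation. It spells out German umlauts and ß, strips remaining accents, and replaces everything else with underscores.

```python
import re
import unicodedata

# Hypothetical helper, not part of asciidoc: derive an ASCII-only label from a title.
GERMAN_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})

def ascii_id(title):
    # Spell out German special characters first, then lowercase.
    text = title.translate(GERMAN_MAP).lower()
    # Decompose remaining accented letters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of other characters into a single underscore.
    text = re.sub(r"[^a-z0-9]+", "_", text).strip("_")
    # XML ids must not be empty or start with a digit, so prefix those cases.
    if not text or text[0].isdigit():
        text = "_" + text
    return text

print("[[{}]]".format(ascii_id("Eine erste Überschrift")))
```

The printed line can be pasted above the title in the same way as [[ueberschrift1]] in the example above.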

> Can this be set as the default in an a2x config file?

You can create a file /etc/asciidoc/a2x.conf (system-wide) or $HOME/.asciidoc/a2x.conf with the following content:

ASCIIDOC_OPTS = '-a encoding=utf-8 -a lang=de'
DBLATEX_OPTS = '-P latex.encoding=utf8 -P latex.unicode.use=1'

Now a simple 'a2x test.asciidoc' works for me.

> How should the problem be solved so that automatic generation becomes
> possible? With manual rework I would presumably have to change the labels
> after every change to the asciidoc document. That is not practical.

Was this a misunderstanding of what I meant by "manually setting a label" (see above)?

Best Regards,
Oliver

Joseph (aerostitch)
Changed in asciidoc (Ubuntu):
assignee: nobody → Joseph HERLANT (herlantj)
status: New → Opinion
assignee: Joseph HERLANT (herlantj) → nobody
Joseph (aerostitch) wrote :

Hi,

Another solution is to use FOP:
a2x --fop test.asciidoc

The reason for this is explained here:
https://groups.google.com/d/msg/asciidoc/MnO-Cs1Xhgo/nIz-Gu4p3UUJ

Best,
Joseph

Stéphane Gourichon (stephane-gourichon-lpad) wrote :

This bug hit me, too, with French documents.

A simple document with one accented character in any section title is enough to choke the program.

How to reproduce:

Download the attached test.asciidoc. Run:

a2x test.asciidoc

Observed:

a2x: ERROR: "dblatex" -t pdf -p "/etc/asciidoc/dblatex/asciidoc-dblatex.xsl" -s "/etc/asciidoc/dblatex/asciidoc-dblatex.sty" "/tmp/test.xml" returned non-zero exit status 1

Changed in asciidoc (Ubuntu):
status: Opinion → Confirmed
Stéphane Gourichon (stephane-gourichon-lpad) wrote :

# Minimal test case

Since the simplest test case with the smallest difference between pass and fail is often a good thing to have, here is one:

Copy-paste the lines below to test-mini-pass.asciidoc, run "a2x test-mini-pass.asciidoc", see that it generated a PDF.

= A

A

== B

A

Now replace the "B" with an accented letter, for example "é" like below.
Or copy-paste the lines below to test-mini-fail.asciidoc, run "a2x test-mini-fail.asciidoc", see that it fails.

= A

A

== é

A

Command typed: a2x test-mini-fail.asciidoc

Observed error message:

a2x: ERROR: "dblatex" -t pdf -p "/etc/asciidoc/dblatex/asciidoc-dblatex.xsl" -s "/etc/asciidoc/dblatex/asciidoc-dblatex.sty" "/tmp/test-mini-fail.xml" returned non-zero exit status 1
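
To locate affected titles before running a2x, something like the following Python sketch could help. It is not part of asciidoc; the name non_ascii_titles is made up here, and it only recognises one-line titles of the "== Title" form used in the test case above, not two-line underlined titles.

```python
import re

# Matches one-line AsciiDoc titles such as "== Title"; two-line (underlined)
# titles are deliberately not handled in this sketch.
TITLE_RE = re.compile(r"^={1,5}[ \t]+(.+?)(?:[ \t]+=+)?[ \t]*$")

def non_ascii_titles(text):
    """Return (line_number, title) pairs for titles containing non-ASCII characters."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = TITLE_RE.match(line)
        if match and not match.group(1).isascii():
            hits.append((number, match.group(1)))
    return hits

# The two minimal documents from this comment.
PASS_DOC = "= A\n\nA\n\n== B\n\nA\n"
FAIL_DOC = "= A\n\nA\n\n== é\n\nA\n"

print(non_ascii_titles(PASS_DOC))
print(non_ascii_titles(FAIL_DOC))
```

Only the second document reports a title, which matches the single-character difference between the passing and the failing test case.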

# Real bug? Definitely.

Running the program on a simple document and having it fail definitely makes this a real bug.
Strictly speaking, it does not prove that the bug is in asciidoc. The bug may be in dblatex.

# Bug severity: high

This bug is important because it quickly hits any user who writes in a language with characters outside 7-bit ASCII and uses a2x to generate a PDF (without any options).

It would be nice if a maintainer looked at the options mentioned above. If the bug is really in another package (say, dblatex), then the best thing to do is to reassign the bug to that package so that its maintainers are aware of it and have a chance to fix it.

Joseph (aerostitch) wrote :

Hi,

OK, I wasn't clear: IMHO this is not a bug but a configuration issue. You're just using a2x without the correct parameters.
If you want a2x/asciidoc to support UTF-8 characters in titles, you have to do one of the following:
 - use the ascii-ids attribute (see the section on ids in the official documentation and what Lex answered in the Google group link given above)
 - disable the section ids by unsetting sectids (see the section on ids in the official documentation)
 - manually give the section an ASCII id (see the section on ids in the official documentation)
 - use the latex.encoding and latex.unicode.use attributes as mentioned above
 - use FOP instead of dblatex

If you want to use one of those solutions permanently, use a configuration file (see official documentation for more information about custom configuration files).
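
As a rough sketch of how these options map to command lines, the following Python snippet only assembles argument lists and does not run a2x. The function name build_a2x_command and the workaround keys are invented for this example; the flags themselves are the ones quoted in this thread, plus the assumption that -a name! unsets an attribute.

```python
import shlex

def build_a2x_command(source, workaround="dblatex-utf8", lang=None):
    """Return an argv list for a2x; the workaround keys are made up for this sketch."""
    cmd = ["a2x", "-f", "pdf"]
    if lang:
        cmd += ["-a", "lang={}".format(lang)]
    if workaround == "dblatex-utf8":
        # Options quoted earlier in this thread.
        cmd += ["-a", "encoding=utf-8",
                "--dblatex-opts=-P latex.encoding=utf8 -P latex.unicode.use=1"]
    elif workaround == "ascii-ids":
        cmd += ["-a", "ascii-ids"]
    elif workaround == "no-sectids":
        # Assumption: "name!" on the command line unsets the attribute.
        cmd += ["-a", "sectids!"]
    elif workaround == "fop":
        cmd += ["--fop"]
    else:
        raise ValueError("unknown workaround: {}".format(workaround))
    cmd.append(source)
    return cmd

# Print a shell-quoted version of one variant for copy and paste.
print(" ".join(shlex.quote(arg) for arg in build_a2x_command("test.asciidoc", "fop")))
```

Whether each variant actually produces a PDF depends on the installed asciidoc, dblatex and FOP versions, so treat the list as a starting point for testing.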

Best,
Joseph

Joseph (aerostitch) wrote :

This has been fixed in version 8.6.10+git20190307.51d7c14-1, which now runs on Python 3 and therefore supports Unicode characters natively.

Changed in asciidoc (Ubuntu):
status: Confirmed → Fix Released