Comment 2 for bug 872376

su_v (suv-lp) wrote :

Based on the lengthy discussion on #inkscape (IRC): could you attach the original Python script that created those file names with mixed or undefined encodings in the first place? Ideally the script, its input (file, variables, command-line parameters), and the generated SVG file:

> 15:28 < CarlFK> su-v: no AI... process is this:
> 1) create pyconde.svg with id="title" and id="authors" on 2 text elements.
> 2) use python to read svg, set title/authors, save as <title>.svg,
> 3. shell out inkscape -- <title>.svg --export-png <title>.png
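
For reference while waiting for the real script, here is a minimal sketch of what step 2 might look like. It is not the reporter's script: the template string, the function name `fill_template`, and the sample title are all hypothetical, and it assumes Python 3 with only the standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for 'pyconde.svg': two text elements with the ids
# mentioned in step 1. The real template is not known.
TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="no"?>'
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text id="title">TITLE</text>'
    '<text id="authors">AUTHORS</text>'
    '</svg>'
)

def fill_template(svg_text, title, authors):
    """Set the text of the id="title" and id="authors" elements.

    Returns the document serialized as UTF-8 encoded bytes.
    """
    root = ET.fromstring(svg_text.encode("utf-8"))
    for elem in root.iter():
        if elem.get("id") == "title":
            elem.text = title
        elif elem.get("id") == "authors":
            elem.text = authors
    return ET.tostring(root, encoding="utf-8")

title = "Stra\u00dfe"          # sample title containing a non-ASCII character
svg_bytes = fill_template(TEMPLATE, title, "A. Author")
file_name = title + ".svg"     # a text string; encoded only when it reaches the file system
```

The point of interest is the last line: the file name stays a text string in the script, so which bytes end up on disk depends on how (and with which encoding) it is later handed to the file system and to the inkscape command line.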

More details about step 1 would also be helpful:
- how is 'pyconde.svg' created?
- is it UTF-8 (default Inkscape SVG file)?
  <?xml version="1.0" encoding="UTF-8" standalone="no"?>
- is the content (string) of the <title> tag (which the script seems to reuse as file name) properly UTF-8 encoded as well?
- or is 'pyconde.svg' generated by a third-party tool (possibly encoded in ISO-8859-1)?
  <?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>
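
A quick way to answer the questions above is to compare the encoding declared in the XML declaration with what the bytes actually contain. The following is a rough sketch, assuming Python 3; the helper names `declared_encoding` and `is_valid_utf8` and the two sample documents are made up for illustration, and byte order marks are not handled.

```python
import re

def declared_encoding(data: bytes) -> str:
    """Return the encoding named in the XML declaration.

    Falls back to 'UTF-8' (the XML default) when none is declared.
    """
    match = re.match(rb'<\?xml[^>]*\bencoding=["\']([A-Za-z0-9._-]+)["\']', data)
    return match.group(1).decode("ascii") if match else "UTF-8"

def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# Two sample documents with the same content but different encodings.
body = '<svg><text id="title">Stra\u00dfe</text></svg>'
utf8_doc = ('<?xml version="1.0" encoding="UTF-8" standalone="no"?>' + body).encode("utf-8")
latin1_doc = ('<?xml version="1.0" encoding="ISO-8859-1" standalone="no"?>' + body).encode("latin-1")

for name, doc in (("utf8_doc", utf8_doc), ("latin1_doc", latin1_doc)):
    print(name, declared_encoding(doc), is_valid_utf8(doc))
```

To check a real file, read it in binary mode (`open(path, "rb").read()`) and pass the bytes to both helpers. A file that declares UTF-8 but fails the second check has mixed or wrong encodings.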

It seems to me that, if anything, this issue needs to be addressed in the Python script, so that it produces file names consistent with the user's locale setting. You originally reported the locale as "en_US", a non-UTF-8 locale: <http://paste.ubuntu.com/706185/>. However, the default encoding of file names on Ubuntu is UTF-8 (whichever version "Fairly vinalla ubuntu install" refers to).

AFAICT your bashism "cp x.svg $'\xdf.svg'" produces a Latin-1 (ISO-8859-1) encoded file name, whereas to generate a UTF-8 encoded one you'd have to use "cp x.svg $'\xC3\x9F'.svg" in bash.
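
The difference between the two byte sequences can be verified in Python. This is a small illustration, assuming Python 3; the variable names are arbitrary.

```python
sharp_s = "\u00df"  # the character 'ß' (LATIN SMALL LETTER SHARP S)

# The same file name encoded two ways.
latin1_name = (sharp_s + ".svg").encode("latin-1")
utf8_name = (sharp_s + ".svg").encode("utf-8")

print(latin1_name)  # b'\xdf.svg'      (one byte for 'ß')
print(utf8_name)    # b'\xc3\x9f.svg'  (two bytes for 'ß')

# The Latin-1 bytes are not valid UTF-8, so a program expecting UTF-8
# file names cannot decode them.
try:
    latin1_name.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False
print(decodes_as_utf8)  # False
```

The single byte 0xDF is what `$'\xdf.svg'` writes, and it is exactly the byte that fails to decode as UTF-8.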