cvs2cl outputs bad xml from mixed charset log messages

Bug #302467 reported by Dario
Affects: cvs2cl (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: cvs2cl

Environment information:
Description: Ubuntu 8.04.1
Release: 8.04
Package: cvs2cl 2.59-2

What I expected: well-formed XML output regardless of the input data (e.g. mixed charsets).
What happened instead: malformed XML; mixed charsets in the input data break any XML validation.

Scenario: I serve my CVS changelog as an HTML page by generating it as XML and then applying an XSLT transformation:
( cvs -d /var/my_repo rlog ) | cvs2cl --rcs /var/my_repo --xml --xml-encoding=utf-8 --stdin --stdout | xsltproc my_stylesheet.xslt -

When a CVS repository is accessed from several different operating systems, it accumulates log messages in mixed text encodings, e.g. UTF-8 and ISO-8859-1.
cvs2cl writes those messages into a <msg> element "as is" and declares a single encoding for the whole document, the one given by the --xml-encoding option.
This breaks any xsltproc transformation, because the document contains byte sequences that are not valid UTF-8.
Giving --xml-encoding=iso-8859-1 lets the document parse, but the UTF-8 log messages then come out corrupted.
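
A possible workaround, until cvs2cl handles this itself, is to normalize the rlog output to a single encoding before cvs2cl sees it. The following is only a minimal sketch, not part of cvs2cl: the names to_utf8 and filter_stream are hypothetical, and it assumes that every line which is not valid UTF-8 was written in ISO-8859-1, which may not hold for every repository.

```python
import io


def to_utf8(raw: bytes, fallback: str = "iso-8859-1") -> bytes:
    """Return one line re-encoded as UTF-8.

    Assumption: a line that does not decode as UTF-8 was written in
    the fallback charset (ISO-8859-1 by default).
    """
    try:
        raw.decode("utf-8")  # already valid UTF-8: pass through untouched
        return raw
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        return raw.decode(fallback).encode("utf-8")


def filter_stream(src, dst) -> None:
    """Copy a binary stream line by line, normalizing each line to UTF-8."""
    for line in src:
        dst.write(to_utf8(line))


# Small demonstration: the same word stored in two different encodings.
sample = "caf\u00e9\n".encode("utf-8") + "caf\u00e9\n".encode("iso-8859-1")
out = io.BytesIO()
filter_stream(io.BytesIO(sample), out)
print(out.getvalue().decode("utf-8"))
```

To use this as a filter in the pipeline above, the demonstration at the bottom would be replaced by a call such as filter_stream(sys.stdin.buffer, sys.stdout.buffer) (with import sys), and the script placed between "cvs rlog" and "cvs2cl ... --xml-encoding=utf-8". The per-line guess is a heuristic: ISO-8859-1 text that happens to form valid UTF-8 byte sequences would be passed through unchanged, and characters that are illegal in XML 1.0 (such as most control characters) are not handled here.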

Dario (shizuto)
description: updated