Declared and actual XML encoding should match, and the encoding should be XML valid

Bug #394943 reported by Display Name
This bug affects 2 people
Affects: bzr-xmloutput
Status: Triaged
Importance: High
Assigned to: Guillermo Gonzalez

Bug Description

Windows XP, console encoding cp850, Bazaar 1.16.1. The command bzr xmlstatus > file is saving the file as latin-1 with the following content:

<?xml version="1.0" encoding="UTF-8"?><status workingtree_root="C:/Programa‡Æo/"></status>

‡Æ should be çã (it is, when I do not send the output to a file, or when I run chcp 1252 before the command). I tried to look into the source code of xmloutput and bzr but I could not figure out what's wrong.
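
The garbling in the report can be reproduced outside bzr. The following is a minimal Python 3 sketch (bzr 1.16 itself ran on Python 2), assuming the redirected output was written as cp850 bytes and then viewed as Windows-1252, which is what Windows tools usually mean by "latin-1"; it is an illustration of the symptom, not the plugin's code:

```python
# The path is the one from the report.
path = "C:/Programação/"

# Bytes as a cp850 console stream would produce them.
raw = path.encode("cp850")

# Reading the same bytes as Windows-1252 yields the garbled text.
garbled = raw.decode("cp1252")
print(garbled)

# The bytes are not valid UTF-8 either, despite the XML declaration.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False
print(valid_utf8)
```

Under these assumptions the file is neither readable as the declared UTF-8 nor correct when viewed in the ANSI code page.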

Vincent Ladeuil (vila)
affects: bzr → bzr-xmloutput
Robert Collins (lifeless) wrote : Re: [Bug 394943] Re: Wrong chars in bzr xmloutput > file

I suspect this is a problem with console character encoding in bzr

Alexander Belchenko (bialix) wrote :

As the declaration <?xml version="1.0" encoding="UTF-8"?> claims, all strings should actually be UTF-8.

The xmloutput plugin should use one exact encoding for its output and explicitly encode all unicode strings to UTF-8.
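
A short Python 3 sketch of that idea follows. The function name render_status is hypothetical and is not part of the plugin; the sketch only shows that a document encoded as UTF-8 parses under its declaration, while the same text in cp850 does not:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def render_status(root_path):
    """Hypothetical helper: build the document and encode it as UTF-8."""
    text = ('<?xml version="1.0" encoding="UTF-8"?>'
            '<status workingtree_root=%s></status>' % quoteattr(root_path))
    # Encode explicitly so the bytes match the declared encoding.
    return text.encode("utf-8")


good = render_status("C:/Programação/")
parsed = ET.fromstring(good)

# The same text in cp850, still declaring UTF-8, is rejected by the parser.
bad = good.decode("utf-8").encode("cp850")
try:
    ET.fromstring(bad)
    mismatch_detected = False
except ET.ParseError:
    mismatch_detected = True
```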

Alexander Belchenko (bialix) wrote :

I suspect this problem occurs because Linux developers are used to having UTF-8 as the encoding everywhere. That assumption is obviously wrong.

Alexander Belchenko (bialix) wrote :

The fix seems right to me, but it should be applied to all xml-* commands.

I'd suggest looking at the _setup_outf method in the bzrlib.commands.Command class and/or overloading it.
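
The general shape of such an override might look like the Python 3 sketch below. It does not import or reproduce bzrlib; setup_utf8_outf is a hypothetical stand-in that wraps a byte stream in a writer with a fixed UTF-8 encoding, so the console code page never influences the output:

```python
import codecs
import io


def setup_utf8_outf(byte_stream):
    """Hypothetical stand-in: return a text writer that always emits UTF-8."""
    return codecs.getwriter("utf-8")(byte_stream, errors="strict")


# BytesIO stands in for the redirected stdout of an xml-* command.
buf = io.BytesIO()
outf = setup_utf8_outf(buf)
outf.write('<?xml version="1.0" encoding="UTF-8"?>')
outf.write('<status workingtree_root="C:/Programação/"></status>')
data = buf.getvalue()
```

Doing this once in a shared place, rather than per command, is what would cover every xml-* command at the same time.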

summary: - Wrong chars in bzr xmloutput > file
+ Declared and actual XML encoding do not match
Guillermo Gonzalez (verterok) wrote : Re: Declared and actual XML encoding do not match

Renato,
Thanks for linking the branches

Changed in bzr-xmloutput:
assignee: nobody → Guillermo Gonzalez (verterok)
importance: Undecided → High
status: New → Confirmed
Changed in bzr-xmloutput:
milestone: none → 0.8.6
Changed in bzr-xmloutput:
status: Confirmed → Fix Committed
Changed in bzr-xmloutput:
status: Fix Committed → Fix Released
Changed in bzr-xmloutput:
status: Fix Released → Confirmed
Guillermo Gonzalez (verterok) wrote :

Hi Renato,

I thought this was fixed. Sorry for marking it released :/

Is this happening in all commands or just in xmlstatus?

Changed in bzr-xmloutput:
milestone: 0.8.6 → 0.8.7
summary: - Declared and actual XML encoding do not match
+ Declared and actual XML encoding should match, and the encoding should
+ be XML valid
Guillermo Gonzalez (verterok) wrote :

Ok, I see the patch, thanks!

I'm not sure that's going to work for the terminal use case, but I need to check how bzr handles the text internally.
If it's unicode we are OK, but if it's bytes we need to decode them using the right encoding before encoding the result as UTF-8.
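
The two cases can be sketched in Python 3 as below. The helper to_utf8 and its source_encoding parameter are hypothetical; how bzr actually represents the text internally is exactly what remained to be checked:

```python
def to_utf8(value, source_encoding):
    """Hypothetical helper: return UTF-8 bytes for text or encoded bytes."""
    if isinstance(value, bytes):
        # Bytes must first be decoded with the encoding they were written in.
        value = value.decode(source_encoding)
    # Unicode text can be encoded to UTF-8 directly.
    return value.encode("utf-8")


from_text = to_utf8("çã", "cp850")
from_bytes = to_utf8("çã".encode("cp850"), "cp850")
```

Both calls produce the same UTF-8 bytes, which is the property the output needs regardless of the console code page.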

Guillermo Gonzalez (verterok) wrote :

That sounds reasonable.

Changed in bzr-xmloutput:
milestone: 0.8.7 → later
Changed in bzr-xmloutput:
milestone: 0.8.8 → none
milestone: none → 0.8.8
status: Confirmed → Triaged
description: updated
Changed in bzr-xmloutput:
milestone: 0.8.8 → none
Changed in bzr-xmloutput:
milestone: none → 0.9