bzr ci does not accept unicode -m message if a file by the same name exists

Bug #563646 reported by PresuntoRJ
This bug affects 1 person
Affects: Bazaar
Status: Fix Released
Importance: Medium
Assigned to: Parth Malwankar
Milestone: 2.2b3

Bug Description

I had a similar problem a long time ago, back in Bazaar 0.something.

Bazaar is once again not accepting certain characters (specifically, letters with accent marks) in pt_BR (my case; it might be happening in other languages as well).

I cannot write any letter with an accent in the commit message, or it crashes.

In the following example, it's the "ã" (which is an "a" + "~"), but I have recently seen it happen with á, é, ó, ú and õ as well.

$ bzr ci Travessia\ do\ eixão -m "Travessia do eixão"
bzr: ERROR: exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in position 66: ordinal not in range(128)

*** Bazaar has encountered an internal error. This probably indicates a
    bug in Bazaar. You can help us fix it by filing a bug report at
        https://bugs.launchpad.net/bzr/+filebug
    attaching the crash file
        /home/leitao/.cache/crash/bzr-20100415085029-10273.crash
    and including a description of the problem.

    The crash file is plain text and you can inspect or edit it to remove
    private information.

$ uname -a
Linux eee-u 2.6.32-21-generic #31-Ubuntu SMP Tue Apr 13 20:34:00 UTC 2010 i686 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu lucid (development branch)
Release: 10.04
Codename: lucid

$ dpkg --list | fgrep bzr
ii aptdaemon 0.11+bzr345-0ubuntu3 transaction based package management service
ii bzr 2.1.1-1 easy to use distributed version control syst
ii bzr-cvsps-import 0.0.1~bzr63-2 CVS to Bazaar importer
ii bzr-dbus 0.1~bzr39-1 D-Bus announcements plugin for Bazaar
ii bzr-explorer 1.0.1-0ubuntu1 GUI application for using Bazaar
ii bzr-git 0.4.3-2~ubuntu2~karmic Bazaar plugin providing Git integration
ii bzr-gtk 0.98.0-1ubuntu1 provides graphical interfaces to Bazaar (bzr
ii bzr-search 1.7.0~bzr77-1 search plugin for Bazaar
ii bzr-stats 0.0.1~bzr37-1 statistics plugin for Bazaar
ii bzr-svn 1.0.2-2 Bazaar plugin providing Subversion integrati
ii bzr-upload 0.1.1+bzr60-1 Bazaar plugin for uploading to web servers
ii bzr-xmloutput 0.8.6-1ubuntu1 XML Communication plugin for Bazaar
ii bzrtools 2.1.0-2~bazaar1~karmic Collection of tools for bzr
ii etckeeper 0.41ubuntu3 store /etc in git, mercurial, bzr or darcs
ii nautilus-bzr 0.98.0-1ubuntu1 Bazaar (bzr) integration for nautilus
ii python-aptdaemon 0.11+bzr345-0ubuntu3 Python module for the server and client of a
ii python-aptdaemon-gtk 0.11+bzr345-0ubuntu3 Python GTK+ widgets to run an aptdaemon clie
ii qbzr 0.18.5-0ubuntu1 Graphical interface for Bazaar using the Qt

$ env | fgrep LANG
LANG=pt_BR.utf8
GDM_LANG=pt_BR.utf8
LANGUAGE=pt_BR:en_US:en

Tags: unicode

PresuntoRJ (fabio-tleitao) wrote :

feel free to ask for more log files, or conf files, or tests...

PresuntoRJ (fabio-tleitao) wrote :

As a workaround I have been avoiding these symbols. In the example above, I can keep the file name as is (Travessia\ do\ eixão), but I must change the commit message to something like -m "Travessia do eixao", which might be acceptable in practice but is ultimately wrong.

It had been working fine up until a couple of bzr versions ago. I am not sure which version was the last one that worked (probably 2.0.4, but I won't swear to it; I might be mistaken, and I am not sure how to verify this).

Parth Malwankar (parthm) wrote :

I can confirm this.

[abc]% bzr ci € -m "€"
bzr: ERROR: exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 50: ordinal not in range(128)

*** Bazaar has encountered an internal error. This probably indicates a
    bug in Bazaar. You can help us fix it by filing a bug report at
        https://bugs.launchpad.net/bzr/+filebug
    attaching the crash file
        /home/parthm/.cache/crash/bzr-20100415103022-3546.crash
    and including a description of the problem.

    The crash file is plain text and you can inspect or edit it to remove
    private information.
[abc]% bzr -Derror ci € -m "€"
bzr: ERROR: exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 50: ordinal not in range(128)

Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 853, in exception_to_return_code
    return the_callable(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 1055, in run_bzr
    ret = run(*run_argv)
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 661, in run_argv_aliases
    return self.run_direct(**all_cmd_args)
  File "/usr/lib/python2.6/dist-packages/bzrlib/commands.py", line 665, in run_direct
    return self._operation.run_simple(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/cleanup.py", line 122, in run_simple
    self.cleanups, self.func, *args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/cleanup.py", line 156, in _do_with_cleanups
    result = func(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/bzrlib/builtins.py", line 3099, in run
    ui.ui_factory.show_warning(warning_msg)
  File "/usr/lib/python2.6/dist-packages/bzrlib/ui/text.py", line 236, in show_warning
    self.stderr.write("bzr: warning: %s\n" % msg)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 50: ordinal not in range(128)
[abc]%
[abc]% bzr -Derror ci € -m "foo"
Committing to: /home/parthm/tmp/abc/
added €
Committed revision 1.
[abc]%
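
The failing step in the traceback above can be reproduced in isolation. The following is a minimal sketch in Python 3, not bzrlib code: the explicit `encode("ascii")` call is an assumption standing in for the implicit conversion that Python 2 performs when a unicode string is written to a byte-oriented stderr. The warning text is the one bzr builds when the commit message names an existing file.

```python
# Text of the warning that bzr tried to print, with the message "€" filled in.
warning = 'bzr: warning: The commit message is a file name: "\u20ac".\n'

try:
    # Strict ASCII encoding; stands in for Python 2's implicit conversion.
    warning.encode("ascii")
    failed = False
except UnicodeEncodeError as exc:
    failed = True
    bad_char = exc.object[exc.start:exc.end]  # the character that could not be encoded
    position = exc.start                      # index of that character in the text

print(failed, repr(bad_char), position)
```

The offending character sits at index 50, which matches the "position 50" in the error message; the offset comes from the warning text, not from the command line.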

tags: added: unicode
Changed in bzr:
status: New → Confirmed
importance: Undecided → High
summary: - exceptions.UnicodeEncodeError: 'ascii' codec can't encode character
- u'\xe3' in position 66: ordinal not in range
+ bzr ci does not accept unicode with -m message
Robert Collins (lifeless) wrote : Re: [Bug 563646] Re: exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in position 66: ordinal not in range

The error is coming from changed UI code, I think.

File "/usr/lib/python2.6/dist-packages/bzrlib/ui/text.py", line 236, in show_warning
   self.stderr.write("bzr: warning: %s\n" % msg)

This is called into from:
        if message is not None:
            try:
                file_exists = osutils.lexists(message)
            except UnicodeError:
                # The commit message contains unicode characters that can't be
                # represented in the filesystem encoding, so that can't be a
                # file.
                file_exists = False
            if file_exists:
                warning_msg = (
                    'The commit message is a file name: "%(f)s".\n'
                    '(use --file "%(f)s" to take commit message from that file)'
                    % { 'f': message })
                ui.ui_factory.show_warning(warning_msg)

Note that we attempt to use the message as a filename; file_exists appears to be
True for you, so you also have a file on disk called €. I don't know whether the
original filer is in the same situation.

The root cause appears to be that stderr isn't a magically-encoding stream,
so the text_ui show_warning method is triggering an implicit conversion.

Martin, whom I've copied, has been very active in this area recently; I'd
like his thoughts on what to do. My immediate reaction is that
text_ui.show_warning should encode the text if needed.
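
One way to read "encoding if needed" is to encode the warning explicitly before it reaches the byte stream. The sketch below is only an illustration in Python 3 and is not the fix that was released: `write_warning`, its `encoding` parameter, and the in-memory stream are hypothetical names chosen for the example.

```python
import io


def write_warning(stream, msg, encoding="ascii"):
    """Write a warning to a byte stream, encoding the text explicitly.

    Hypothetical helper, not bzrlib code. Characters that the target
    encoding cannot represent are replaced with '?' instead of raising
    UnicodeEncodeError.
    """
    text = "bzr: warning: %s\n" % msg
    stream.write(text.encode(encoding, "replace"))


# An in-memory byte stream stands in for a stderr that only accepts bytes.
ascii_stderr = io.BytesIO()
write_warning(ascii_stderr, 'The commit message is a file name: "\u20ac".')

# With a UTF-8 terminal the character survives unchanged.
utf8_stderr = io.BytesIO()
write_warning(utf8_stderr, "\u20ac", encoding="utf-8")

print(ascii_stderr.getvalue())
print(utf8_stderr.getvalue())
```

Using the "replace" error handler trades fidelity for robustness: the warning may show a "?" on a terminal that cannot display the character, but the commit no longer aborts.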

PresuntoRJ (fabio-tleitao) wrote : Re: [Bug 563646] Re: exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xe3' in position 66: ordinal not in range

Creepy,

I would have imagined it to be a problem parsing the command
options, since the file does exist and the commit is accepted when I
change the commit message to something else.

Let me know what I can do to help any further. For now, I'll keep
working around it by changing the message.

--
Fábio Leitão
..-. .- -... .. --- .-.. . .. - .- --- ...-.-

Parth Malwankar (parthm) wrote : Re: bzr ci does not accept unicode with -m message

Hi PresuntoRJ,

Robert is right. A unicode message seems to be a problem only if a file with exactly the same name exists.
Otherwise, unicode in -m seems to work, even when the working tree contains files with unicode names.
This is still a bug that needs to be fixed; it is just limited to this specific case.

[abc]% ls
€/ x y
[abc]% bzr touch z
[abc]% bzr ci -m "€x"
Committing to: /home/parthm/tmp/abc/
added x
added y
added z
Committed revision 3.
[abc]%
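
The transcript above fits the control flow Robert quoted: the warning, and therefore the crash, is reached only when the message names an existing path. The following is a rough Python 3 sketch of that check, not the bzrlib source: `file_name_warning` is a hypothetical function, `os.path.lexists` stands in for `osutils.lexists`, and an ASCII file name is used so the example does not depend on the filesystem encoding.

```python
import os
import tempfile


def file_name_warning(message):
    """Return the warning text if `message` names an existing path, else None."""
    try:
        file_exists = os.path.lexists(message)
    except (UnicodeError, ValueError):
        # The message cannot be represented as a file name, so it is not a file.
        file_exists = False
    if not file_exists:
        return None
    return ('The commit message is a file name: "%(f)s".\n'
            '(use --file "%(f)s" to take commit message from that file)'
            % {"f": message})


with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "notes")
    open(existing, "w").close()

    # Message equal to an existing file name: the warning path is taken.
    warned = file_name_warning(existing)
    # Message that matches no file: no warning is built, so nothing is printed.
    silent = file_name_warning(os.path.join(tmp, "\u20acx"))

print(warned is not None, silent is None)
```

In the reported bug only the first branch ever reaches `show_warning`, which explains why `bzr ci -m "€x"` succeeds while `bzr ci € -m "€"` fails.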

summary: - bzr ci does not accept unicode with -m message
+ bzr ci does not accept unicode -m message if a file by the same name
+ exists
Parth Malwankar (parthm)
Changed in bzr:
status: Confirmed → In Progress
importance: High → Medium
assignee: nobody → Parth Malwankar (parthm)
Parth Malwankar (parthm)
Changed in bzr:
status: In Progress → Fix Released
milestone: none → 2.2b3