[5.0] [trunk] non-ascii characters not handled properly in XML views

Bug #613721 reported by forstera
This bug affects 13 people
Affects: Odoo Server (MOVED TO GITHUB), status tracked in Trunk

Series  Status        Importance  Assigned to          Milestone
5.0     Fix Released  Medium      Olivier Dony (Odoo)  -
Trunk   Fix Released  Medium      Olivier Dony (Odoo)  -

Bug Description

Hello,
I just installed the latest server version, 5.0.12, but when restarting the server I get the following error:

  File "/usr/lib/openerp-server/osv/orm.py", line 1104, in __view_look_dom
    node.set('string', trans)
  File "lxml.etree.pyx", line 646, in lxml.etree._Element.set (src/lxml/lxml.etree.c:9638)
  File "apihelpers.pxi", line 416, in lxml.etree._setAttributeValue (src/lxml/lxml.etree.c:31554)
  File "apihelpers.pxi", line 1136, in lxml.etree._utf8 (src/lxml/lxml.etree.c:36998)
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes

This is due to an XML file I made which contains the following element:

<separator string="Valeurs dans la monnaie de la société" colspan="4"/>

So I replaced my accented letters with character references such as &#233;, but the problem remains.

This problem did not occur in previous versions.

Thanks
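
For readers unfamiliar with the error message in the traceback: under Python 2, lxml accepts unicode objects, but it accepts a byte string only when it is pure ASCII. The sketch below approximates that rule in Python 3, with bytes standing in for the Python 2 byte string. The helper check_xml_compatible is hypothetical and written for illustration; it is not lxml's actual code.

```python
def check_xml_compatible(value):
    """Approximate the rule behind the ValueError above (illustration only)."""
    if isinstance(value, bytes):
        # A byte string carries no encoding information, so only ASCII is safe.
        try:
            text = value.decode("ascii")
        except UnicodeDecodeError:
            raise ValueError("byte string contains non-ASCII bytes") from None
    elif isinstance(value, str):
        text = value
    else:
        raise TypeError("expected text or bytes")
    if "\x00" in text:
        raise ValueError("NULL bytes are not allowed")
    return text


label = "Valeurs dans la monnaie de la société"
check_xml_compatible(label)  # text: accepted
try:
    check_xml_compatible(label.encode("utf-8"))  # encoded bytes: rejected
except ValueError as exc:
    print("rejected:", exc)
```

Under these assumptions, the accented label is fine as long as it reaches node.set() as text; it only fails once it has been encoded to bytes somewhere along the way.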

Changed in openobject-server:
status: New → Confirmed
Ravindra Mekhiya(OpenERP) (rme-openerp) wrote :

Hello Arnaud,

I found a solution and attached a patch for it.
Would you please apply this patch and let us know the result?

I hope it helps.

Thank you for pointing this out.

Changed in openobject-server:
assignee: nobody → RME(OpenERP) (rme-openerp)
forstera (arnaud-forster-deactivatedaccount) wrote :

OK, I will check.

forstera (arnaud-forster-deactivatedaccount) wrote :

Hello,

The patch is fine and the display is correct now. The error doesn't occur anymore.

Thanks

Christophe CHAUVET (christophe-chauvet) wrote :

Hi

Sorry, it's not a bug: you must write the terms in your XML without accents, and use the translation system for this.

Regards,

forstera (arnaud-forster-deactivatedaccount) wrote :

Hello,

I'm not an XML expert, but how can I translate a term included in the XML file like this? Are all the terms stored in the database?

<separator string="Valeurs dans la monnaie de la société" colspan="4"/>

Thanks

Ravindra Mekhiya(OpenERP) (rme-openerp) wrote :

Hi,

I agree with Christophe, but what if someone adds an accented string from the client (via the admin menus)?

It would be better to guard against this error so that it cannot occur.

Let us know if this patch breaks anything.

Meanwhile, I am investigating further.

Thanks.

Borja López Soilán (NeoPolus) (borjals) wrote :

forstera: Christophe is saying that you should write your strings in English and then provide a translation to your language.

That said, I'm with RME: we need to support accented characters anyway. Just think about end users editing the view from the client, or about developers wanting to create a quick prototype...

Christophe CHAUVET (christophe-chauvet) wrote :

I think this behaviour mustn't be corrected in 5.0 (because it may introduce a regression).

But why not for 6.0?

Regards,

Nhomar - Vauxoo (nhomar) wrote : Re: [Bug 613721] Re: accented letters are not accepted anymore in 5.0.12

The patch works fine.

The regression is in the server itself, not in the patch: at the moment
the server no longer allows accents. I think it is a bug, and the bug
should be corrected in 5.0 and ported to 6.0.

Regards

Borja López Soilán (NeoPolus) (borjals) wrote : Re: accented letters are not accepted anymore in 5.0.12

Christophe: The thing is, that this used to work on 5.0, so some people may have used such 'accented strings' on the views already (I would say it might be common bad-practice in Spain...), OpenERP would suddenly break for those if we don't fix the regression.

Christophe CHAUVET (christophe-chauvet) wrote :

Sorry, but views with accents have never worked in 5.0, so it's not a regression.

For me, it's a new feature.

Regards,

Ravindra Mekhiya(OpenERP) (rme-openerp) wrote :

Hi Guys,

I feel like agreeing to both the sides, but not without proper reason.

We can fix it on 6.0 undoubtedly.

As far as 5.0 is concerned, would you please let me know where the patch would introduce a regression? We are bound to fix that too, right?

If so, won't it introduce regressions on 6.0 too? We would like to fix those as well.

The aim is to get rid of the ugly tracebacks, so that users are not left wondering why they got an error or whether they hit a bug.

However, suggestions are warmly invited.

Thanks for your interest.

Borja López Soilán (NeoPolus) (borjals) wrote :

Christophe, I just checked on 5.0.10 and it works, so it is a regression on 5.0.11 or 5.0.12: I edited the res.partner.form view and changed the title of the "General" page to "Génêràl", then I went to the partners and opened one partner in form view: it displayed the expected "Génêràl" on both GTK and Koo clients.

xrg (xrg) wrote : Re: [Bug 613721] Re: accented letters are not accepted anymore in 5.0.12

On Friday 06 August 2010, you wrote:
> Hi Guys,
>
> I feel like agreeing to both the sides, but not without proper reason.

IMHO, xml files *must* support the full unicode charset when they load. That
said, we must play by the rules and specify the encoding at the head of the
xml file. Else, Latin-1 is assumed and strings cannot contain non-latin chars.

It is not about English as a base of translation. It is about being able to
properly support charsets. For example, you may want a string to contain the
copyright, Euro, degree or other symbols.

As for v5.0, my suggestion is to NOT change the behavior between minor
versions. If it is not a bug, if it doesn't stop somebody from working, don't
change it!
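
To illustrate the point about declaring the encoding, here is a minimal sketch using Python's standard xml.etree.ElementTree instead of the lxml parser used by the server (an assumption made to keep the example free of dependencies). It parses the separator from the original report three ways: Latin-1 bytes with a declaration, UTF-8 bytes with a declaration, and pure ASCII with numeric character references. With this parser, input that lacks a declaration is read as UTF-8, so undeclared Latin-1 bytes are rejected.

```python
import xml.etree.ElementTree as ET

EXPECTED = "Valeurs dans la monnaie de la société"

# Latin-1 bytes preceded by an explicit encoding declaration.
latin1_doc = (
    '<?xml version="1.0" encoding="iso-8859-1"?>'
    '<separator string="Valeurs dans la monnaie de la société" colspan="4"/>'
).encode("iso-8859-1")

# UTF-8 bytes with a matching declaration.
utf8_doc = (
    '<?xml version="1.0" encoding="utf-8"?>'
    '<separator string="Valeurs dans la monnaie de la société" colspan="4"/>'
).encode("utf-8")

# Pure ASCII source using numeric character references.
ref_doc = (
    b'<separator string="Valeurs dans la monnaie de la soci&#233;t&#233;"'
    b' colspan="4"/>'
)

labels = [ET.fromstring(doc).get("string") for doc in (latin1_doc, utf8_doc, ref_doc)]

# Latin-1 bytes without a declaration are not valid UTF-8 and fail to parse.
undeclared = '<separator string="société"/>'.encode("iso-8859-1")
try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False
```

All three documents yield the same text label once parsed. That is presumably why replacing é with &#233; in the original report changed nothing: the reference is resolved by the parser before the string reaches the view code.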

Ravindra Mekhiya(OpenERP) (rme-openerp) wrote : Re: accented letters are not accepted anymore in 5.0.12

Hi Guys,

I have investigated the problem. It is more of a technical issue.

This cannot be called a regression; it was a hidden bug.

Let me explain:

--> Look at the fix http://bazaar.launchpad.net/~openerp/openobject-server/5.0/revision/2086 and the bug linked to it.

Earlier, _get_source did not return the original source if the language was False or the translation was not found.
After the fix in 2086, the return value is an encoded string, and thus the underlying issue has surfaced.

Technically speaking, look at orm.py from line 1097 onwards, i.e. starting from: if ('lang' in context) and not result:

Before the fix, the value of trans returned by _get_source was None/False/''.
After the fix, it is the original source instead of None/False/''.

Now, if the text has some accented characters, you would get an error.

Thus, the patch attached solves the problem.
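
The behaviour change described above can be sketched as follows. This is a simplified model, not the actual OpenERP code: lookup_before, lookup_after, apply_label and the TRANSLATIONS table are hypothetical names, Python 3 bytes stands in for the Python 2 encoded string, and the ASCII decode is a stand-in for the check that raises the ValueError in the traceback.

```python
# Hypothetical translation table: (lang, source) -> translated text.
TRANSLATIONS = {("fr_FR", "General"): "Général"}


def lookup_before(lang, source):
    """Before revision 2086: a missing translation yields a falsy value."""
    if not lang:
        return False
    return TRANSLATIONS.get((lang, source), False)


def lookup_after(lang, source):
    """After revision 2086: fall back to the source, whatever its type."""
    if not lang:
        return source
    return TRANSLATIONS.get((lang, source), source)


def apply_label(attributes, trans):
    """Mimic the caller: only set the attribute when a value came back."""
    if trans:
        if isinstance(trans, bytes):
            # Stand-in for the error raised on non-ASCII byte strings.
            trans.decode("ascii")
        attributes["string"] = trans


encoded_source = "Valeurs dans la monnaie de la société".encode("utf-8")

before = {}
apply_label(before, lookup_before("fr_FR", encoded_source))  # nothing is set

after = {}
try:
    apply_label(after, lookup_after("fr_FR", encoded_source))
    failed = False
except UnicodeDecodeError:
    failed = True  # the latent type problem is now exposed
```

In this model the encoded string was always there; the old falsy return value simply meant it never reached the attribute setter.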

Please share your views here.

Shah Japan (jsh.axelor) wrote :

IMHO: instead of using "return source" on /server/bin/addons/base/ir/ir_translation.py
it should be "return tools.ustr(source)"
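
A rough idea of what such a normalising helper does, sketched in Python 3. The function to_text below is hypothetical and is not the implementation of tools.ustr; trying UTF-8 first and falling back to Latin-1 is an assumption made for illustration.

```python
def to_text(value, encodings=("utf-8", "iso-8859-1")):
    """Return value as text, decoding byte strings if necessary."""
    if isinstance(value, str):
        return value
    if isinstance(value, bytes):
        # Try each candidate encoding in order and keep the first that fits.
        for encoding in encodings:
            try:
                return value.decode(encoding)
            except UnicodeDecodeError:
                continue
        raise ValueError("could not decode %r" % (value,))
    # Anything else (numbers, None, ...) is converted with str().
    return str(value)


source = "Valeurs dans la monnaie de la société".encode("utf-8")
normalised = to_text(source)  # safe to hand to an XML attribute setter
```

Returning the source through a helper of this kind means the caller always gets text back, whichever type was passed in.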

Ravindra Mekhiya(OpenERP) (rme-openerp) wrote :

Hello Experts,

We are waiting for the confirmations from you.

Kindly have a look at comment https://bugs.launchpad.net/openobject-server/+bug/613721/comments/15 and let us know your feedback.

Thanks for your time.

Shah Japan (jsh.axelor) wrote :

It's a bug and must be resolved, even though it cannot be called a regression.

Borja López Soilán (NeoPolus) (borjals) wrote :

By the way, the proposed patch seems to work for us :)

Stephane Wirtel (OpenERP) (stephane-openerp) wrote :

I'm sorry, but it's not a bug: the XML files should be in English, not in French or Spanish.

If your separator must be in French, you can use the translation process and translate the English sentence to French.

Borja López Soilán (NeoPolus) (borjals) wrote :

Stephane, but:

  - As xrg mentioned, this not only prevents using French or Spanish strings, but you won't be able to use symbols like € or ¥ either.

  - Views working on previous versions might raise errors on 5.0.12. (This is why I call it a regression: OpenERP could give a warning or show an 'accents in XML views are deprecated' message in the logs, so that users adapt their views gradually, but it is crashing instead).

  - OpenObject becomes less R.A.D.: you can't prototype in your customer language anymore! Now you need to translate those views even if you just wanted to do a small preview. - Wasn't R.A.D. one of the reasons to allow the users to edit the views XML from OpenERP?...

Raphaël Valyi - http://www.akretion.com (rvalyi) wrote :

Guys,

we ran into this problem in v6 too: https://bugs.launchpad.net/openobject-server/+bug/617484

Now, I would say: Python stuff definitely needs to be English. But for XML in views, as Borja says, it looks like it's much less RAD now, much less user friendly: all that slideware http://www.slideshare.net/tiny07/openobject-intro goes to the trash, as now the noob users are supposed to master all the tricks of the Openobject translation system (even advanced integrators here don't know them, as reported in this thread, so how cool and smart is it to require beginners to know them?)

But more importantly, what is the purpose of that whole code that blows up when encountering non UTF-8 in XML ???

Look, the patch I propose is even simpler: https://bugs.launchpad.net/openobject-server/+bug/617484/+attachment/1488450/+files/unicode.patch

It seems to me that OpenERP is currently trying to convert the string into unicode twice, which adds overhead and causes this new bug. What is the purpose of that? Is my patch wrong?

I mean, if that whole encoding here is useless and pure overhead, then are we sure we want to make that bug occur just to make Openobject more cumbersome ??? Or am I getting it wrong, and is there a bug in my patch https://bugs.launchpad.net/openobject-server/+bug/617484/+attachment/1488450/+files/unicode.patch ?

Please explain...

Jay Vora (Serpent Consulting Services) (jayvora) wrote :

Hi Guys,

I agree with RME(OpenERP), it seems like a fix has shown a hidden bug to us.

Agreeing with the patch of Raphael, we may opt to send (non-encoded) string to _get_source.

Recent changes that took place on those lines are : http://bazaar.launchpad.net/~openerp/openobject-server/5.0/revision/1866.3.1

Moreover, have a look at comment https://bugs.launchpad.net/openobject-server/+bug/613721/comments/15 and let us know your feedback.

Thanks.

Raphaël Valyi - http://www.akretion.com (rvalyi) wrote :

Again, we have had no problem with my patch so far. The previous patch at comment #15 was encoding translations to utf-8, then transforming them back to string and then encoding them again before sending them to _get_source (!!!) What was the purpose of all that overhead? In https://bugs.launchpad.net/openobject-server/+bug/617484/+attachment/1488450/+files/unicode.patch I simply proposed to send the UTF-8 encoding from translation to _get_source, which just works unless proven otherwise.

Blqt (benoit-luquet) wrote :

Hi,
Another regression, apparently: the title of a view in a dashboard cannot contain accented characters.
We had to edit all our users' dashboards.

To my knowledge, the translation mechanism does not apply to dashboard views titles.

Benoit

Olivier Dony (Odoo) (odo-openerp) wrote :

Hmm, eventually we will have to fix the framework to use unicode everywhere, but that's not feasible right now.

The rule is as follows: as a best practice developers should write all strings in code and XML in plain English (and use the translation system for other languages). But of course this does not prevent them from using non-ASCII characters in some circumstances (symbols, quick prototypes/tests, etc.). So the framework/server should of course allow non-ASCII characters in XML.

A bug was introduced by the fix for bug 608029 because it returns one of the parameters passed to _get_source() without checking its type, causing _get_source() to return inconsistent types. This is the case both in 5.0.12+ and in trunk.

Raphael's patch is correct (because _get_source() should always return unicode and node.set() expects unicode), but not sufficient because we need to also make sure that _get_source() does return unicode in all circumstances.

Here is what I will do:

1. In both 5.0 and trunk: apply Raphael's patch + also jsh's suggestion: ensure _get_source() returns unicode even when it returns its 'source' parameter

2. In trunk: eventually we should improve things even more: _get_source() should document that it expects only unicode arguments and always returns unicode arguments. We should add an assert verifying that the parameters are indeed unicode, and fix all callers in the framework so that they do indeed pass unicode (this method is private and should not be used directly by addons).
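
The contract proposed in point 2 might look like the sketch below. It is a simplified stand-in, not the real _get_source(): the signature, the in-memory TRANSLATIONS table and the use of Python 3 str for "unicode" are all assumptions made for illustration.

```python
TRANSLATIONS = {("fr_FR", "General"): "Général"}  # hypothetical data


def get_source(lang, source):
    """Return the translation of source, always as text.

    Callers must pass text. Byte strings are rejected up front so that
    type errors surface at the call site rather than later on.
    """
    assert isinstance(source, str), "source must be unicode text"
    assert lang is None or isinstance(lang, str), "lang must be unicode text"
    if not lang:
        # No language requested: the source itself is the answer.
        return source
    result = TRANSLATIONS.get((lang, source), source)
    assert isinstance(result, str), "result must be unicode text"
    return result


translated = get_source("fr_FR", "General")
untranslated = get_source("fr_FR", "Valeurs dans la monnaie de la société")
```

With the asserts in place, a caller that still passes an encoded string fails immediately with a clear message, which makes the remaining callers easier to find and fix.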

summary: - accented letters are not accepted anymore in 5.0.12
+ [5.0] [trunk] non-ascii characters not handled properly in XML views
Olivier Dony (Odoo) (odo-openerp) wrote :

Fixes landed for 5.0 in revisions:
- 2120 <email address hidden>
- 2119 <email address hidden>

Thanks everyone for the analysis and proposed patches!

Olivier Dony (Odoo) (odo-openerp) wrote :

Fixed in trunk as well with:
 - 2714 <email address hidden>
 - 2713 <email address hidden>

Borja López Soilán (NeoPolus) (borjals) wrote :

Thanks to you Olivier :)

Cristian Salamea (ovnicraft) wrote :

Hello i am working in 6.1 too and is not fixed.

Regards,

Alexandre Fayolle - camptocamp (alexandre-fayolle-c2c) wrote : Re: [Bug 613721] Re: [5.0] [trunk] non-ascii characters not handled properly in XML views

On jeu. 09 août 2012 03:23:50 CEST, Cristian Salamea (Gnuthink) wrote:
> Hello i am working in 6.1 too and is not fixed.
>
> Regards,
>

Do you have a proper XML encoding declaration at the beginning of your
file?


Cristian Salamea (ovnicraft) wrote :

@Alex I tested it in 6.1 and it works well in XML, but the error is raised when an object's _description attribute contains any special character. Please test it with áéí (I use them commonly).

Regards,
