On 07/19/2013 05:32 PM, Cedders wrote:
>
> Thanks for the reply. By the way, it was you who suggested this
> approach, and I still think you were right back then!

I know, but that was almost 6 years ago, and there are issues with that
approach.

> Firstly, according to http://wiki.python.org/moin/DefaultEncoding,
> sys.getdefaultencoding() is pretty much deprecated and will be removed
> in Python 3.0 (as you say "Python's default encoding is ascii regardless
> of locale").

True, but this is Mailman 2.1 and Python 2.x, and Mailman 2.1 will never
be made compatible with Python 3.

> Secondly, I don't think the input to sync_members should
> be interpreted as a 7-bit message header with possibly RFC 2047
> encoding.

I didn't say it should be. I said that the return from
email.Utils.formataddr() should be 7-bit ascii, but that would make for
an ugly report, particularly if things were RFC 2047 base-64 encoded.

> Finally, yes, modifying site.py as you describe does fix both problems
> (with or without the patch), but in practice are most sysadmins likely
> to do that? If they fail to modify it, should sync_members crash? And
> what if for some reason the system locale changes to, eg iso-8859-1?

If you enable the locale encoding in site.py, it gets the encoding from
locale.getdefaultlocale(), so it should be locale aware. If you go the
sitecustomize.py route, you can use something like this (adapted from
site.py):

    import sys
    import locale

    # Use the encoding reported by the system locale, if there is one.
    loc = locale.getdefaultlocale()
    if loc[1]:
        sys.setdefaultencoding(loc[1])

> On
> a site with a UTF-8 encoding, as I understand it, all this
> functionality does is convert from utf-8 to utf-8. There is a per-list
> encoding, as might be useful on a non-unicode system hosting lists in
> both ISO-8859-5 and ISO-8859-1, but as far as I can see, the list
> encoding is not taken into account in the command-line scripts.

That's true, but the encoding for the list's language might not be
compatible with the encoding for the console that's running sync_members
or list_members.

> I did wonder if assigning
> enc = locale.getdefaultlocale()[1] or locale.getpreferredencoding() or "UTF8"
> within the script would help (outputting to correct encoding for
> console), but it doesn't; as you say it's the implied decode on the
> output of formataddr and join that is not seen as a Unicode string.
> Logically perhaps it should first be decoded from the input encoding
> and re-encoded as enc, the expected encoding in the system locale; but
> that's equivalent to doing nothing.

It's really a can of worms. Dropping the encode() is probably fine most
of the time, but we really don't know what the encoding is for the input
to sync_members. It could be different from and incompatible with the
default for the locale.

> If the defaultencoding approach were to be implemented in Python in
> future in a way that doesn't cause this problem (beyond being applied in
> concatenation and join), then encoding the strings from (for example) an
> ISO-8859-5 to give legible output on a UTF-8 console would be the way to
> go. But it doesn't look to me like that is the way the wind is blowing.

But how do I know that the input to sync_members and hence the output
from email.Utils.formataddr() is iso-8859-5 (or whatever it is) encoded?

I understand that there's an issue and that modifying site.py or adding
something to sitecustomize.py is not a solution that is viable for all.
I'm just reluctant to open this can of worms.

-- 
Mark Sapiro
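
For illustration only, here is a minimal sketch of the decode and
re-encode step discussed in this thread. It is not Mailman code: the
function name recode_for_console and its parameters are hypothetical,
and it assumes the input encoding is known, which is exactly what is
uncertain for sync_members input. It is written to avoid version-specific
syntax, so it should behave the same under Python 2.x and Python 3.

```python
import locale


def recode_for_console(raw, input_encoding, console_encoding=None):
    """Decode raw bytes from input_encoding and re-encode for the console.

    Hypothetical helper, not part of Mailman. input_encoding must be
    supplied by the caller; it cannot be detected reliably from the bytes.
    """
    if console_encoding is None:
        # Fall back to the locale's preferred encoding, then to UTF-8.
        console_encoding = locale.getpreferredencoding() or "utf-8"
    text = raw.decode(input_encoding)
    # Replace characters the console encoding cannot represent instead of
    # raising UnicodeEncodeError.
    return text.encode(console_encoding, "replace")


# A made-up member name stored as ISO-8859-5 bytes.
name = u"\u0418\u0432\u0430\u043d"
raw = name.encode("iso-8859-5")

# Correct guess about the input encoding: the UTF-8 output is legible.
good = recode_for_console(raw, "iso-8859-5", "utf-8")

# Wrong guess: ISO-8859-1 accepts any byte, so nothing fails, but the
# resulting text is not the original name.
bad = recode_for_console(raw, "iso-8859-1", "utf-8")
```

The second call shows why guessing is risky: a wrong input encoding can
produce garbled output without raising any error, so the script has no
way to notice the mistake.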