ascii is a bad default filesystem encoding
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
Bazaar | Fix Released | High | Martin Packman | |
Bug Description
bzr's architectural approach is to decode filenames to Unicode when they come in from the filesystem. On Unix, to do this, we need to know what encoding is used, since the OS API only works with byte strings.
It seems that often (always?) if no locale is set, we default to trying to decode them as ASCII, and fail if the names are not ASCII.
Modern Unix machines strongly encourage using UTF-8 and that would be a more reasonable default. We could also provide a way to configure it.
This is distinct from bug 63324, which says that bzr can't represent names that really are invalid in the encoding, even if the encoding is set properly.
This might be complicated to implement if Python assumes the fsencoding is set only once at startup, but it's probably still possible.
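To make the proposal concrete, here is a minimal sketch of the fallback described above, written for Python 3. It is not the code from the linked branch: `pick_fs_encoding` and `decode_filename` are hypothetical names, and it assumes the encoding reported by the locale is a fair stand-in for what bzr would otherwise use.

```python
import codecs
import locale


def pick_fs_encoding(reported=None, fallback="utf-8"):
    """Choose a codec for filenames (hypothetical helper, not bzr API).

    If the locale reports ASCII (or an alias such as US-ASCII), or an
    encoding Python does not know, return the fallback instead.
    """
    if reported is None:
        # Assumption: the locale's preferred encoding is what we would
        # otherwise have used for filenames.
        reported = locale.getpreferredencoding(False)
    try:
        name = codecs.lookup(reported).name  # normalised codec name
    except LookupError:
        return fallback
    return fallback if name == "ascii" else name


def decode_filename(raw, encoding=None):
    """Decode a byte-string filename from the OS into a str."""
    return raw.decode(encoding or pick_fs_encoding())
```

With this sketch, a UTF-8 name such as `b"caf\xc3\xa9"` decodes even when the locale claims ASCII, while an explicitly configured encoding such as Latin-1 is left alone.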
Related branches
- Vincent Ladeuil: Approve
- Jelmer Vernooij (community): Approve (code)
- Diff: 204 lines (+71/-16), 7 files modified
bzr (+4/-0)
bzrlib/__init__.py (+42/-0)
bzrlib/osutils.py (+2/-5)
bzrlib/tests/blackbox/test_exceptions.py (+8/-0)
bzrlib/tests/test_osutils.py (+3/-11)
doc/en/release-notes/bzr-2.5.txt (+5/-0)
doc/en/whats-new/whats-new-in-2.5.txt (+7/-0)
Changed in bzr:
assignee: nobody → Martin Packman (gz)
status: Confirmed → In Progress
Changed in bzr:
milestone: none → 2.5b5
status: In Progress → Fix Released
As noted on unicode.org (sorry, I don't have time to look it up), it's also very unlikely that text which isn't actually UTF-8 can be interpreted as valid UTF-8. That makes it a somewhat safe default.
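A small illustration of that point, as a sketch rather than a proof: a non-ASCII name encoded in Latin-1 is usually rejected by a strict UTF-8 decoder, so a wrong guess tends to fail loudly instead of silently producing a garbled name. The helper name `looks_like_utf8` is made up for this example.

```python
def looks_like_utf8(raw):
    """Return True if the byte string is well-formed UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")  # strict decoding raises on malformed input
    except UnicodeDecodeError:
        return False
    return True


latin1_name = b"caf\xe9"     # "café" encoded as Latin-1
utf8_name = b"caf\xc3\xa9"   # "café" encoded as UTF-8
ascii_name = b"README.txt"   # pure ASCII is always valid UTF-8

results = [looks_like_utf8(n) for n in (latin1_name, utf8_name, ascii_name)]
```

Here the lone `0xE9` byte in the Latin-1 name starts a multi-byte UTF-8 sequence that is never completed, so decoding fails, while the UTF-8 and ASCII names pass.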