UnicodeDecodeError when rebalancing a ring
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
| OpenStack Object Storage (swift) |
Medium
|
Christian Schwede |
Bug Description
I had servers under Ubuntu 15.10 wih swift 2.5 where replication seems to work properly.
Since I've upgraded my servers to Ubuntu 16.04 with swift 2.7.0, I see a lot a UnicodeDecodeError each time I rebalance my object ring with a replica 3 policy.
Here is an extract of the centralized logs :
May 9 11:45:53 STACO3 object-replicator: <f+++++++++ 527/7913026d10c
May 9 11:45:53 STACO3 object-replicator: STDERR: Traceback (most recent call last):
May 9 11:45:53 STACO3 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:53 STACO3 object-replicator: STDERR: msg = self.format(record) + '\000'
May 9 11:45:53 STACO3 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:53 STACO3 object-replicator: STDERR: return fmt.format(record)
May 9 11:45:53 STACO3 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:53 STACO3 object-replicator: STDERR: record.message = record.getMessage()
May 9 11:45:53 STACO3 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:53 STACO3 object-replicator: STDERR: msg = msg % self.args
May 9 11:45:53 STACO3 object-replicator: STDERR: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
May 9 11:45:53 STACO3 object-replicator: STDERR: Logged from file replicator.py, line 207
May 9 11:45:53 STACO3 object-replicator: Suppression partition: /srv/node/
May 9 11:45:53 STACO1 object-replicator: Suppression partition: /srv/node/
May 9 11:45:53 STACO3 object-replicator: <f+++++++++ a32/78f6f82ce45
May 9 11:45:53 STACO3 object-replicator: STDERR: Traceback (most recent call last):
May 9 11:45:53 STACO3 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:53 STACO3 object-replicator: STDERR: msg = self.format(record) + '\000'
May 9 11:45:53 STACO3 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:53 STACO3 object-replicator: STDERR: return fmt.format(record)
May 9 11:45:53 STACO3 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:53 STACO3 object-replicator: STDERR: record.message = record.getMessage()
May 9 11:45:53 STACO3 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:53 STACO3 object-replicator: STDERR: msg = msg % self.args
May 9 11:45:53 STACO3 object-replicator: STDERR: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
May 9 11:45:53 STACO3 object-replicator: STDERR: Logged from file replicator.py, line 207
May 9 11:45:54 STACO2 object-replicator: Suppression partition: /srv/node/
May 9 11:45:54 STACO3 object-replicator: <f+++++++++ 272/11b44d21141
May 9 11:45:54 STACO3 object-replicator: STDERR: Traceback (most recent call last):
May 9 11:45:54 STACO3 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:54 STACO3 object-replicator: STDERR: msg = self.format(record) + '\000'
May 9 11:45:54 STACO3 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:54 STACO3 object-replicator: STDERR: return fmt.format(record)
May 9 11:45:54 STACO3 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:54 STACO3 object-replicator: STDERR: record.message = record.getMessage()
May 9 11:45:54 STACO3 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:54 STACO3 object-replicator: STDERR: msg = msg % self.args
May 9 11:45:54 STACO3 object-replicator: STDERR: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
May 9 11:45:54 STACO3 object-replicator: STDERR: Logged from file replicator.py, line 207
May 9 11:45:54 STACO2 object-replicator: <f+++++++++ 8c4/7706a79605d
May 9 11:45:54 STACO2 object-replicator: STDERR: Traceback (most recent call last):
May 9 11:45:54 STACO2 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:54 STACO2 object-replicator: STDERR: msg = self.format(record) + '\000'
May 9 11:45:54 STACO2 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:54 STACO2 object-replicator: STDERR: return fmt.format(record)
May 9 11:45:54 STACO2 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:54 STACO2 object-replicator: STDERR: record.message = record.getMessage()
May 9 11:45:54 STACO2 object-replicator: STDERR: File "/usr/lib/
May 9 11:45:54 STACO2 object-replicator: STDERR: msg = msg % self.args
May 9 11:45:54 STACO2 object-replicator: STDERR: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
May 9 11:45:54 STACO2 object-replicator: STDERR: Logged from file replicator.py, line 207
May 9 11:45:54 STACO1 object-replicator: Suppression partition: /srv/node/
Here is the definition of this object ring :
# swift-ring-builder object.builder
object.builder, build version 49
262144 partitions, 3.000000 replicas, 1 regions, 4 zones, 12 devices, 7.97 balance, 0.00 dispersion
The minimum number of hours before a partition can be reassigned is 1 (0:37:13 remaining)
The overload factor is 0.00% (0.000000)
Ring file object.ring.gz is up-to-date
Devices: id region zone ip address port replication ip replication port name weight partitions balance flags meta
0 1 1 10.10.1.51 6000 10.10.2.51 6000 s01z1rpd01 4000.00 70036 2.19
1 1 1 10.10.1.51 6000 10.10.2.51 6000 s01z1rpd02 4000.00 70035 2.19
2 1 1 10.10.1.51 6000 10.10.2.51 6000 s01z1rpd03 4000.00 70034 2.19
3 1 2 10.10.1.52 6000 10.10.2.52 6000 s02z2rpd01 4000.00 70036 2.19
4 1 2 10.10.1.52 6000 10.10.2.52 6000 s02z2rpd02 4000.00 70035 2.19
5 1 3 10.10.1.53 6000 10.10.2.53 6000 s03z3rpd01 4000.00 70036 2.19
6 1 3 10.10.1.53 6000 10.10.2.53 6000 s03z3rpd02 4000.00 70036 2.19
7 1 2 10.10.1.52 6000 10.10.2.52 6000 s02z2rpd03 4000.00 70036 2.19
8 1 3 10.10.1.53 6000 10.10.2.53 6000 s03z3rpd03 4000.00 70036 2.19
9 1 4 10.10.1.54 6000 10.10.2.54 6000 s04z4rpd01 3300.00 52037 -7.97
10 1 4 10.10.1.54 6000 10.10.2.54 6000 s04z4rpd02 3300.00 52037 -7.97
11 1 4 10.10.1.54 6000 10.10.2.54 6000 s04z4rpd03 3300.00 52038 -7.96
clayg (clay-gerrard) wrote : | #1 |
Alistair Coles (alistair-coles) wrote : | #2 |
Suppression partition suggests locale is French:
https:/
which means we'll log this:
https:/
"Succès de Rsync pour %(src)s dans %(dst)s (%(time).03f)"
which could be problematic if one of the format args is unicode:
>>> "Succès de Rsync pour %s" % u'foo'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
Admin6 (jmbonnefond) wrote : | #3 |
This is a locale related problem. The current locales on the servers are french :
root@STACO4:~# locale
LANG=fr_FR.UTF-8
LANGUAGE=
LC_CTYPE=
LC_NUMERIC=
LC_TIME=
LC_COLLATE=
LC_MONETARY=
LC_MESSAGES=
LC_PAPER=
LC_NAME=
LC_ADDRESS=
LC_TELEPHONE=
LC_MEASUREMENT=
LC_IDENTIFICATI
LC_ALL=
I've 4 account/
Here is a new extract of the logs :
May 12 14:14:32 STACO3 object-replicator: Removing partition: /srv/node/
May 12 14:14:33 STACO2 object-replicator: <f+++++++++ c14/b69a972b6a5
May 12 14:14:33 STACO2 object-replicator: Successful rsync of /srv/node/
May 12 14:14:33 STACO3 object-replicator: <f+++++++++ 220/1276697801d
May 12 14:14:33 STACO3 object-replicator: Successful rsync of /srv/node/
May 12 14:14:33 STACO1 object-replicator: <f+++++++++ 00e/19121c65af0
May 12 14:14:33 STACO1 object-replicator: <f+++++++++ 371/19120dab9be
May 12 14:14:33 STACO1 object-replicator: Successful rsync of /srv/node/
May 12 14:14:33 STACO4 object-replicator: <f+++++++++ 0b8/34ca332fe3f
May 12 14:14:33 STACO4 object-replicator: <f+++++++++ 13d/34ca1d594ef
May 12 14:14:33 STACO4 object-replicator: <f+++++++++ 871/34ca369cc1b
May 12 14:14:33 STACO4 object-replicator: STDERR: Traceback (most recent call last):
May 12 14:14:33 STACO4 object-replicator: STDERR: File "/usr/lib/
May 12 14:14:33 STACO4 object-replicator: STDERR: msg = self.format(record) + '\000'
May 12 14:14:33 STACO4 object-replicator: STDERR: File "/usr/lib/
May 12 14:14:33 STACO4 object-replicator: STDERR: return fmt.format(record)
May 12 14:14:33 STACO4 object-replicator: STDERR: File "/usr/lib/
May 12 14:14:33 STACO4 object-replicator: STDERR: record.message = record.getMessage()
May 12 14:14:33 STACO4 object-replicator: STDERR: File "/usr/lib/
May 12 14:14:33 STACO4 object-replicator: STDERR: msg = msg % self.args
May 12 14:14:33 STACO4 object-replicator: STDERR: UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
May 12 14:14:33 STACO4 object-replicator: STDERR: Logged from file replicator.py, line 207
May 12 14:14:33 STACO1...
Christian Schwede (cschwede) wrote : | #4 |
I can confirm the error, and agree with Alistair - the problem occurs on non-english languages when the arg is a unicode. Most likely it happens in other parts of Swift as well.
I'm refactoring the locale tests right now (nearly done) to have something testable, afterwards we need to fix the issue itself.
Changed in swift: | |
assignee: | nobody → Christian Schwede (cschwede) |
status: | New → Confirmed |
Fix proposed to branch: feature/hummingbird
Review: https:/
Changed in swift: | |
status: | Confirmed → In Progress |
Fix proposed to branch: master
Review: https:/
Change abandoned by Christian Schwede (<email address hidden>) on branch: feature/hummingbird
Review: https:/
Reason: Wrong branch.
Brian Cline (briancline) wrote : | #8 |
We are currently this with a 2.7.0 release on a node with LANG=en_US, which hasn't changed since the deployment of the last version we ran. However we see it in a much more problematic case: if a single object-server has services offline or is down, and Swift tries to log the corresponding error for it, any user-facing PUT request that attempts to touch that node fails with a 500, rather than gracefully trying a handoff node.
Un-mangled stack trace is at https:/
swift-proxy: ERROR Unhandled exception in request:
Traceback (most recent call last):
File "/usr/local/
return handler(req)
File "/usr/local/
return func(*a, **kw)
File "/usr/local/
req, data_source, nodes, partition, outgoing_headers)
File "/usr/local/
outgoing_
File "/usr/local/
conns = [conn for conn in pile if conn]
File "/usr/local/
return self.waiters.
File "/usr/local/
return self._exit_
File "/usr/local/
return hubs.get_
File "/usr/local/
return self.greenlet.
File "/usr/local/
result = function(*args, **kwargs)
File "/usr/local/
_('Expect: 100-continue on %s') % path)
File "/usr/local/
**kwargs)
File "/usr/local/
call('%s: %s' % (msg, emsg), *args, **kwargs)
File "/usr/lib/
self.
File "/usr/lib/
self.
File "/usr/lib/
self.
File "/usr/lib/
self.
File "/usr/lib/
hdlr.
File "/usr/lib/
self.
File "/usr/lib/
msg = self.format(record) + '\000'
File "/usr/lib/
return fmt.format(record)
File "/usr/local/...
Brian Cline (briancline) wrote : | #9 |
Sorry, I should clarify -- it doesn't fail *ANY* PUT request, but seemingly only ones with URLs that get urllib.unquote'd to actual Unicode characters.
So we actually noticed this bug while running Swift functests against a freshly upgraded proxy. Specifically, test.functional
Updated the gist showing the func test failure and proxy log line in addition to just the stacktrace. (https:/
clayg (clay-gerrard) wrote : | #10 |
seems to be blowing up here:
I'd say try to set the LANG on all nodes to LANG=en_US.UTF-8
But of course I'd also probably say you should rewrite gettext_ in swift/__init__.py as the identity function...
Brian Cline (briancline) wrote : | #11 |
Sadly, no change in behavior with any of these:
- applying patch 323950
- changing gettext_ to just return whatever it was passed in
- setting system locale to en_US.UTF-8
- explicitly setting LANG/LC_
After digging deeper, the exception I get isn't occurring because the translation string templates have unicode in them (the ones I'm using don't) -- it's occurring because the str it is trying to expand into the already-translated template is just that -- a str instead of a unicode type. This explains why 323950 doesn't apply to my issue.
Since my exception occurs at a different point I've opened bug #1597210 for it.
clayg (clay-gerrard) wrote : | #12 |
and some of the args come from the ring devices - so they're unicode.
Changed in swift: | |
importance: | Undecided → High |
clayg (clay-gerrard) wrote : | #13 |
I think this one might be related too -> https:/
clayg (clay-gerrard) wrote : | #14 |
It might be worth noting, that in the *other* critical bug related to this issue - the translation string was already unicode [1] ... so then the fix was to take the bytestring arguments and turn them into unicode.
In the suggested change [2] we're going out of way to ensure that we *always* get unicode out of gettext.
So going forward *anytime* we try to encode non-ascii bytes into into a translated string we're going to blow up regardless of the locale (in lp bug #1597210 - the locale didn't matter either - just the presence of non ascii bytes in the arguments to the unicode translation message).
So rather than fixing some "root cause" - making all translated gettext messages unicode will in someways make unicode errors *more* likely!
Bytes to bytes would always "just work" - so I think it's funny that the best path forward is probably to introduce *more* unicode so that we can't cheat - but it's very much dependent on our tests helping us out to ensure that when we try to introduce non-ascii utf-8 bytestring sources it run through the real machinery! :D
1. because of the additional unicode ring arguments interpolated into the utf-8 bytes coming out of the translation function forced the entire string to become unicode https:/
Andrey Groshrev (greenx) wrote : | #15 |
To some extent, this bug intersects with the recently described me:
https:/
Maybe my decision in your case, too, can help:
>> 2. Try get codec from gettext and change finction gettext_ in swift/__init__.py, as
def gettext_(msg):
return _t.gettext(
clayg (clay-gerrard) wrote : | #16 |
I think this is the same bug as lp bug #1648082
We should drop the translation on this message
Changed in swift: | |
importance: | High → Medium |
Change abandoned by Christian Schwede (<email address hidden>) on branch: master
Review: https:/
looks like some sort of log message coming from rsync? you might try changing your locale options on the server and restarting services? LC_ALL=C maybe?
Even if that fixes it, it's probably still a bug - but knowing what the default locale on the system was might make it easier to reproduce the failure.