Comment 17 for bug 1006725

Christopher Yeoh (cyeoh-0) wrote :

I think the tempest test is bogus.

For starters, the test name:

invalid_name = data_utils.rand_name(u'\xc3\x28')

is a unicode string, not a utf-8 encoded byte string.

"\xc3\x28" is indeed an invalid utf-8 string, but the tempest infrastructure prevents us from sending invalid utf-8 strings. At the lower levels a json.dumps is done on the post body. JSON represents strings as a sequence of unicode characters (http://tools.ietf.org/html/rfc4627.html). Therefore json.dumps attempts to convert byte strings from utf-8 (which is the default encoding for dumps) to unicode characters, and fails if it sees an invalid utf-8 string.

You can see this if you remove the u prefix from the tempest test, which then makes it correctly pass an invalid utf-8 string.
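
To illustrate the difference between the two literals, here is a minimal sketch. It assumes Python 3, where the byte-string case has to be written with a b prefix (under Python 2, which the test ran on, dropping the u prefix has that effect). The names text_name, byte_name and is_valid_utf8 are made up for the example and are not part of tempest.

```python
import json

# A text (unicode) string holding the code points U+00C3 and U+0028.
text_name = '\xc3\x28'

# The raw byte sequence 0xC3 0x28, which is not valid UTF-8:
# 0xC3 opens a two-byte sequence, but 0x28 is not a continuation byte.
byte_name = b'\xc3\x28'

# The text string serializes without complaint; json.dumps escapes
# non-ASCII characters by default.
encoded_body = json.dumps({'name': text_name})

# The text string can also be encoded to perfectly valid UTF-8.
utf8_bytes = text_name.encode('utf-8')


def is_valid_utf8(raw):
    """Return True if the given bytes decode cleanly as UTF-8."""
    try:
        raw.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True
```

So the unicode literal never puts the invalid byte pair on the wire; only the byte-string form does.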

The reason I believe the fundamental issue of sending an invalid utf-8 string is not actually a valid concern is that we accept JSON on the nova API side. Again, JSON strings are just a sequence of unicode characters, and we can always convert that to valid UTF-8. If the JSON string itself is not a sequence of unicode characters, then the json decoding will fail in the wsgi json deserialization and the user will be returned a 400. It is possible to directly inject an invalid utf-8 string in a unit test, but that won't happen in real life with json requests. Now it is probably possible to send an invalid utf-8 string via XML, but we're not supporting that for long anyway.
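
A rough sketch of that behaviour, again assuming Python 3. deserialize_json_body is a hypothetical helper written for this comment, not the actual nova wsgi deserializer; it only shows that a body containing invalid utf-8 fails at decode time and can be answered with a 400.

```python
import json


def deserialize_json_body(raw_body):
    """Hypothetical helper: return (status, data) for a raw request body."""
    try:
        # Decode explicitly so invalid UTF-8 is caught before JSON parsing.
        text = raw_body.decode('utf-8')
        return 200, json.loads(text)
    except ValueError:
        # UnicodeDecodeError and json.JSONDecodeError both subclass ValueError.
        return 400, None


# Raw invalid bytes inside the body are rejected...
bad_status, bad_data = deserialize_json_body(b'{"name": "\xc3\x28"}')

# ...while JSON escapes for the same code points decode to a unicode
# string that re-encodes to valid UTF-8.
ok_status, ok_data = deserialize_json_body(b'{"name": "\\u00c3\\u0028"}')
```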

I'll propose a couple of patches:
- remove the test_create_image_specify_multibyte_character_image_name tempest test
- add a check in Nova so that if an invalid multibyte utf-8 string is passed, it is rejected.

I did discover that glance returns a 500 if you send it a unicode string which requires a 4-byte utf-8 encoding, because mysql's utf8 only supports up to 3 bytes per character (you need to use utf8mb4 instead). But I think that should be addressed separately, and the nova-side fix will depend on what the glance team decides to do.
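
For reference, a small sketch of how such strings could be detected, assuming Python 3. needs_four_byte_utf8 is a hypothetical name and this is not a proposed fix for glance or nova; it only shows which characters are affected.

```python
def needs_four_byte_utf8(text):
    """Return True if any character in text encodes to 4 bytes in UTF-8."""
    # Code points above U+FFFF lie outside the Basic Multilingual Plane
    # and take four bytes in UTF-8.
    return any(ord(ch) > 0xFFFF for ch in text)
```

A name like u'\u20ac' (3 bytes) is fine, while u'\U0001F600' (4 bytes) would be flagged.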