BrowserRequest and HTTPRequest contain a mixture of str and unicode strings
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
zope.publisher | Invalid | Medium | Unassigned | |
Bug Description
The environment in the request contains a mixture of str and unicode strings because some HTTP headers are explicitly not converted to unicode, and sometimes form values can't be decoded with utf-8, so they are left as they are (as a str object).
This causes subtle problems here and there: since Zope3 is said to use unicode internally, no one bothers to check whether a string is a unicode or a str string. That is IMHO the right thing to do, since otherwise we would have to add tons of checks everywhere.
One example of this causing a problem is converting the request to a string. The conversion simply joins the environment strings together, and if any of the str strings contains non-ASCII characters it breaks, since that string can't be converted to unicode.
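The failure can be reproduced outside Zope. The sketch below is written for Python 3, where `bytes` stands in for the Python 2 `str` type, and it makes explicit the ASCII decode that Python 2 performed implicitly when `str` and `unicode` were mixed. The `environ` dictionary and the `naive_request_str` helper are hypothetical illustrations, not zope.publisher code.

```python
def naive_request_str(environ):
    """Join environment values into one text string.

    Byte values are decoded as ASCII, mimicking the implicit coercion
    Python 2 applied when str and unicode strings were mixed.
    """
    parts = []
    for key in sorted(environ):
        value = environ[key]
        if isinstance(value, bytes):
            value = value.decode("ascii")  # fails on non-ASCII bytes
        parts.append("%s:\t%s" % (key, value))
    return "\n".join(parts)


# Hypothetical environment: one value was left undecoded.
environ = {
    "PATH_INFO": "/index.html",
    "HTTP_X_NAME": b"J\xf6rgen",  # iso-8859-1 bytes, not valid ASCII
}

error = None
try:
    naive_request_str(environ)
except UnicodeDecodeError as exc:
    error = exc  # the single undecoded value breaks the whole join
```

An environment containing only text values, or only ASCII bytes, joins without error, which is why the problem tends to show up only for particular requests.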
My suggestion is that we always keep the request environment variables as unicode strings. If some header or form value can't be decoded with the default strategy, we should use iso-8859-1, which is the standard encoding if no encoding is given.
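A minimal sketch of that fallback strategy, written for Python 3 (`bytes` in place of the Python 2 `str`). The function name `decode_value` and its parameters are assumptions made for illustration and do not correspond to an existing zope.publisher API. The fallback always succeeds because iso-8859-1 maps every byte value to a character.

```python
def decode_value(raw, encodings=("utf-8",), fallback="iso-8859-1"):
    """Return `raw` as text, trying each encoding before the fallback."""
    if isinstance(raw, str):
        return raw  # already text, nothing to do
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # try the next candidate encoding
    # iso-8859-1 can decode any byte sequence, so this cannot fail.
    return raw.decode(fallback)


utf8_value = decode_value(b"J\xc3\xb6rgen")   # valid UTF-8
latin1_value = decode_value(b"J\xf6rgen")     # invalid UTF-8, falls back
```

Note that the fallback guarantees a unicode result, not a correct one: bytes in some other encoding would still decode, just to the wrong characters.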
Changed in zope3:
status: New → Confirmed
affects: zope3 → zope.publisher
I am curious; in which RFC is it written that iso-8859-1 is a standard encoding for HTTP headers?
The solution you propose is an improvement on the current situation.
I'd like to propose something a little different: how about converting HTTP headers to unicode (perhaps using an ASCII codec and replacing unknown characters with '?', perhaps using iso-8859-1), but also keeping the original, undecoded HTTP headers accessible through an API on the request?
That way, we can add handlers to the request processing that apply custom conversions to particular HTTP headers.
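One way this proposal could look, as a rough Python 3 sketch (`bytes` in place of the Python 2 `str`). The `HeaderView` class, its `get` and `get_raw` methods, and the `converters` mapping are all hypothetical names invented here; nothing like them is claimed to exist in zope.publisher.

```python
class HeaderView:
    """Hypothetical wrapper exposing headers both decoded and raw."""

    def __init__(self, raw_headers, converters=None):
        self._raw = dict(raw_headers)                # name -> bytes
        self._converters = dict(converters or {})    # name -> callable

    def get_raw(self, name):
        """Return the header exactly as received, undecoded."""
        return self._raw[name]

    def get(self, name):
        """Return the header as text.

        A per-header converter is used when one is registered;
        otherwise the value is decoded as ASCII and every
        undecodable byte becomes '?'.
        """
        raw = self._raw[name]
        converter = self._converters.get(name)
        if converter is not None:
            return converter(raw)
        # "replace" yields U+FFFD for bad bytes; map it to '?'.
        return raw.decode("ascii", "replace").replace("\ufffd", "?")


headers = HeaderView(
    {"X-Name": b"J\xf6rgen", "X-Title": b"Caf\xc3\xa9"},
    converters={"X-Title": lambda raw: raw.decode("utf-8")},
)

default_text = headers.get("X-Name")     # lossy default conversion
custom_text = headers.get("X-Title")     # handled by the custom converter
original = headers.get_raw("X-Name")     # raw bytes remain available
```

The default conversion is lossy by design, so keeping `get_raw` available is what lets a handler registered later recover the real value.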