utils: safeunicode error

Bug #435370 reported by SeC
This bug affects 1 person
Affects  Status     Importance  Assigned to  Milestone
web.py   Confirmed  Undecided   Zed A. Shaw

Bug Description

>>> from web.utils import safeunicode
>>> safeunicode('\xff')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.6/site-packages/web.py-0.32-py2.6.egg/web/utils.py", line 235, in safeunicode
    return obj.decode(encoding)
  File "/usr/local/lib/python2.6/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xff in position 0: unexpected code byte

This happens, for example, when a form is submitted with binary data and the POST handler raises an exception. No full error stack trace is displayed on the web page; instead, Python uses 100% CPU. How about this simple patch:
--- utils.py.org 2009-09-23 18:18:01.000000000 +0200
+++ utils.py 2009-09-23 18:15:58.000000000 +0200
@@ -232,7 +232,7 @@
     if isinstance(obj, unicode):
         return obj
     elif isinstance(obj, str):
-        return obj.decode(encoding)
+        return obj.decode(encoding, 'ignore')
     else:
         if hasattr(obj, '__unicode__'):
             return unicode(obj)

Then:
>>> from web.utils import safeunicode
>>> safeunicode('\xff')
u''
>>> safeunicode('hello\xffworld')
u'helloworld'
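
For comparison, here is how the standard codec error handlers treat the same input. This is a minimal sketch in Python 3 syntax (the session above is Python 2), where `bytes.decode` plays the role of `str.decode`. The helper name `decode_with` is made up for illustration and is not part of web.py.

```python
def decode_with(data, errors):
    """Decode UTF-8 bytes with the given error handler; return None on failure."""
    try:
        return data.decode("utf-8", errors)
    except UnicodeDecodeError:
        return None


raw = b"hello\xffworld"

strict = decode_with(raw, "strict")      # decoding fails, so the helper returns None
ignored = decode_with(raw, "ignore")     # the invalid byte is silently dropped
replaced = decode_with(raw, "replace")   # the invalid byte becomes U+FFFD
```

With 'ignore' the caller cannot tell that anything was removed, while 'replace' at least leaves a visible marker where the invalid byte was.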

Anand Chitipothu (anandology) wrote :

That is dangerous. It results in loss of data.

web.input doesn't convert input data to unicode if the optional argument _unicode=False is passed. This should be used when there is binary data.
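
The data-loss concern can be illustrated with a short sketch, written in Python 3 syntax and independent of web.py. The payload is the first eight bytes of a PNG file, chosen here only as an example of binary upload data; the variable names are illustrative.

```python
# The first eight bytes of a PNG file; 0x89 is not valid UTF-8 at this position.
payload = b"\x89PNG\r\n\x1a\n"

# Decode the way the proposed patch would, then try to get the bytes back.
text = payload.decode("utf-8", "ignore")
round_trip = text.encode("utf-8")

# Number of bytes that silently disappeared.
lost = len(payload) - len(round_trip)

# Two different inputs also become indistinguishable after decoding.
same = b"a\xffb".decode("utf-8", "ignore") == b"ab".decode("utf-8", "ignore")
```

Once the invalid bytes are dropped, the original upload cannot be reconstructed from the decoded value.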

Changed in webpy:
status: New → Won't Fix
Zed A. Shaw (zedshaw) wrote :

Hey, this bug should not be Won't Fix...not at all. It turns out that it's trivial to craft a malformed unicode request to web.py that causes a UTF-8 error and sends it into an infinite loop. This is a serious DoS bug, and on top of that, web.input(_unicode=False) doesn't protect against it. Pass that in with a file upload and it still aborts and goes into an infinite loop.

Two things need to happen:

1) Make _unicode=False unnecessary. How?
2) Have a try; if you can't decode the unicode, then it's binary, so leave it alone in an except.
3) If _unicode=False is given, then assume it's binary and short-circuit so you don't need the above.
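
The steps above could be sketched roughly as follows. This is not web.py's code: it is a Python 3 illustration, and the names `decode_field`, `decode_fields`, and `as_unicode` are hypothetical stand-ins for whatever web.input would do internally.

```python
def decode_field(value, as_unicode=True, encoding="utf-8"):
    """Return text when value decodes cleanly, otherwise the untouched bytes."""
    # Step 3: the caller asked for raw data, so skip decoding entirely.
    if not as_unicode or not isinstance(value, bytes):
        return value
    # Step 2: try to decode; on failure treat the value as binary.
    try:
        return value.decode(encoding)
    except UnicodeDecodeError:
        return value


def decode_fields(fields, as_unicode=True):
    """Apply decode_field to every value of a form-like dict."""
    return {name: decode_field(value, as_unicode) for name, value in fields.items()}


# A text field plus a binary upload (the first bytes of a JPEG file).
form = {"title": b"caf\xc3\xa9", "upload": b"\xff\xd8\xff\xe0"}

decoded = decode_fields(form)                # text decoded, upload left as bytes
raw = decode_fields(form, as_unicode=False)  # nothing is decoded
```

One caveat of this approach: a binary payload that happens to be valid UTF-8 would still be decoded, so an explicit flag remains useful when the caller knows the data is binary.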

Now, as to why you decided this was Won't Fix when the original poster said clearly that it pegs their CPU at 100% is beyond me, but let's now frame this as a serious DoS attack against your framework, and you work from that premise instead.

Changed in webpy:
assignee: nobody → Zed A. Shaw (zedshaw)
status: Won't Fix → Confirmed
Zed A. Shaw (zedshaw) wrote :

Here's a rewrite of that function so that it at least continues to operate. Feel free to do something similar that at least won't peg web.py at 100% CPU.

def safeunicode(obj, encoding='utf-8'):
    r"""
    Converts any given object to unicode string.

        >>> safeunicode('hello')
        u'hello'
        >>> safeunicode(2)
        u'2'
        >>> safeunicode('\xe1\x88\xb4')
        u'\u1234'
    """
    if isinstance(obj, unicode):
        return obj
    elif isinstance(obj, str):
        try:
            return obj.decode(encoding)
        except UnicodeDecodeError:
            # failed to unicode decode, so they get it raw
            return obj
    else:
        if hasattr(obj, '__unicode__'):
            return unicode(obj)
        else:
            try:
                return str(obj).decode(encoding)
            except UnicodeDecodeError:
                # failed to unicode decode, so they get it raw
                return str(obj)

Anand Chitipothu (anandology) wrote : Re: [Bug 435370] Re: utils: safeunicode error

> Two things need to happen:
>
> 1) Make _unicode=False unnecessary.  How?
> 2) Have a try; if you can't decode the unicode, then it's binary, so leave it alone in an except.
> 3) If _unicode=False is given, then assume it's binary and short-circuit so you don't need the above.

I have a fix to avoid unicode conversion of uploaded files. I need to
polish it some more before checking it in.

> Now, as to why you decided this was Won't Fix when the original poster
> said clearly that it pegs their CPU at 100% is beyond me, but now let's
> frame this as a serious DOS attack against your framework and you work
> from that premise instead.

The 100% CPU issue is caused by an error in web.debugerror. It is already
fixed (sorry, I forgot to mention that).

http://github.com/webpy/webpy/commit/dea8e77054675245158507171abafa533cf5d1bc

And it is not a DoS attack issue, because it is not in the core
framework but in debugerror, which is used only during development (and
is fixed anyway).

Anand
