Non-ascii in the URL causes an OOPS

Bug #200572 reported by Diogo Matsubara
12
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Launchpad itself
Fix Released
Critical
Tim Penhey

Bug Description

1. Open http://launchpad.dev/ubuntu/hoary/%ED%B4%B5
2. OOPS-796F3143 ProgrammingError ERROR: invalid byte sequence for encoding "UTF8": 0xedb4b5 HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".

2010-08-16 OOPS looks a bit different: OOPS-1689S359

Tags: oops qa-ok

Related branches

Revision history for this message
Diogo Matsubara (matsubara) wrote :

<flacoste> we should tansform that into UnexpectedFormData?
<flacoste> this can happen in a lot of place actually
<Rinchen> +1 for UFDs
<Rinchen> the more we can capture, the better we can diagnose
<kiko> well
<kiko> sure, UFD is fine.
<matsubara> I think it should be a 404
<flacoste> for example, i'm sure http://launchpad.dev/%ED45 would suffer the same fate
<BjornT> i think it should be a 404 as well. it's not a form.
<kiko> if you visit http://launchpad.net/ubuntu/hoary/bar
<kiko> you get a 404
<kiko> so I correct myself
<kiko> 404 -- this is just an artifact of improper utf8 handling in the query?
<matsubara> flacoste: the traceback looks different.
<kiko> matsubara, if you visit https://edge.launchpad.net/ubuntu/hoary/%ee you get no OOPS page, but a Please try again. :)
<matsubara> flacoste: and the one described in the bug report goes directly into the db
<flacoste> normal, it's not the same code path
<flacoste> but the exception should be the same, no?
<matsubara> I mean, the one described in the report triggers a DB ProgrammingError kind of error, while the other triggers the usual UnicodeDecodeError
<flacoste> right,
<stub> We don't have any guards in our FooSet.getByName methods for invalid UTF8, so PG will barf.
<flacoste> i remember adding a ascii encoding to Pillar for tests purpose
<kiko> flacoste, matsubara: can you guys meet up and make a decision later?
<flacoste> actually, there are many places where invalid UTF8 can be introduced
<kiko> this is an issue that really annoys me as I /hate/ nonsense in the OOPS summaries
<stub> A decorator for our byname methods might be cool
<flacoste> stub is volunteering :-)
<stub> I'm bikeshedding
...
<kiko> [ACTION] matsubara, flacoste and stub to discuss UTF-8 safety of byName and other random attacks to launchpad.net
<stub> its safe - PG barfs because it checks properly

Changed in launchpad:
assignee: nobody → flacoste
Revision history for this message
Steve Alexander (stevea) wrote :

We don't want unicode names in URLs anyway. So, let's filter this in the publication code, and offer a 404 (with a subclass of the NotFound exception perhaps) if any path steps are not plain ascii. (When decoded from URL encoding, that is.)

I'm not in favour of decorating byName() calls for this purpose. We should filter URLs coming in.

Revision history for this message
Diogo Matsubara (matsubara) wrote :

As an workaround for this exception clogging the Oops Summaries, I'm going to special case and move it another section in the summary.

Changed in launchpad-foundations:
assignee: flacoste → nobody
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Robert Collins (lifeless) wrote :

Does this need to be private? Seems like no obvious attack vector...

Gary Poster (gary)
Changed in launchpad-foundations:
importance: Medium → Low
Gary Poster (gary)
description: updated
Revision history for this message
Gary Poster (gary) wrote :

If there is an attack vector, this should be a high priority and private. If there's not, it should be low and public. I'm guessing there's not, but I'll ask Stuart to verify.

When we decide to solve this, I'd like to review Steve's assertion that we should only accept plain ascii. I believe I can guess his rationale, and if we already enforce this elsewhere (i.e., in the creation of all elements that are used un the URL) then I'm fine with it.

Revision history for this message
Curtis Hovey (sinzui) wrote :

We are getting dozens of encoding oopses in the web oops reports. I do not see this as any different. I do not think this is a security issue. If there is a chance there is one, let's subscribe William Grant to get his opinion.

Revision history for this message
Stuart Bishop (stub) wrote :

Its an avenue for sending badly encoded data to PostgreSQL. If there are PostgreSQL bugs attacks can be sent through. In the past, there have been PostgreSQL attacks based on encoding (since fixed, and not applicable to our setup anyway).

Ideally, these should all be caught by Apache or Launchpad where we can generate nice error pages instead of OOPS reports, with the side effect that we are not exposed to PostgreSQL security flaws. I don't see why we want the publisher to happily accept path segments we know are crap either. The only character in our URLs that might need quoting is ~.

Revision history for this message
Robert Collins (lifeless) wrote :

@Gary Steve's assertion is correct, we only generate ascii urls, nonascii urls coming in (*excluding the query params*) are definitely made up and can just 404.

For query params, the params have to be ascii (urls are a subset of ascii) but we may want to % decode and then utf8 decode to support searching for folk with internationalised names.

If I were hacking on this I would fix by checking for ascii in the non-query part of the path before doing any traversal.

Revision history for this message
Robert Collins (lifeless) wrote :

High as per zero-oops-policy.

Changed in launchpad-foundations:
importance: Low → High
Changed in launchpad:
importance: High → Critical
Curtis Hovey (sinzui)
summary: - ProgrammingError triggered traversing on distro architecture
+ Non-ascii in the URL causes an OOPS
visibility: private → public
Revision history for this message
Curtis Hovey (sinzui) wrote :

I see two places we could put the sanity check, either as the first action in Navigation.publishTraverse or in Navigation._publishTraverse() in the setFirstLayer() block. I do not think NotFound error page will work without the layer being set first. The other issue is that the NotFound page errors trying to say that the url is invalid.

webapp.url.urlparse ensures the url is forces the encoding to ascii so calling
    urlparse(unquote('https://launchpad.net/CRACK'))
where CRACK is url encoded or raw non-ascii will raise UnicodeDecodeError. We can catch this error and raise a NotFoundError. The NotFound page will need to avoid working with the URL.

The are a few problems with this suggestion. It will cause 404s when param values contain non-ascii ('https://launchpad.net/+search?text=%E9'). We could parse what is before the query string, then parse the params and values individually as ascii and unicode. This might give us a comprehensive solution to bad encoding in URLs.

Tim Penhey (thumper)
Changed in launchpad:
status: Triaged → In Progress
assignee: nobody → Tim Penhey (thumper)
tags: removed: lp-foundations
Revision history for this message
Launchpad QA Bot (lpqabot) wrote :
Changed in launchpad:
milestone: none → 11.04
tags: added: qa-needstesting
Changed in launchpad:
status: In Progress → Fix Committed
William Grant (wgrant)
tags: added: qa-ok
removed: qa-needstesting
William Grant (wgrant)
Changed in launchpad:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.