Bug #457466 (psiphon-162) “Generic site compatibility - investig...” : Bugs : psiphon

Adam P (adam+) on 2009-10-29

Changed in psiphon:
status:	New → Confirmed

Chris (poser) wrote on 2009-10-30:

#1

There are general security concerns around losing JavaScript context, right? A malicious site could have persistent access to any subsequent site visited (during a given browser session?) because the browser treats them all as the same site (origin)?

tags: added: poser

Chris (poser) on 2009-10-30

tags: removed: poser

Rod (rod-psiphon) on 2009-11-24

visibility: private → public

Adam P (adam+) wrote on 2009-12-14:

#2

__Report: Caja__

_Intro_

Caja enables sites that want to let users add their own rich content (or run site-specific plug-in "apps") using JavaScript+HTML+CSS to do so safely (i.e., preventing XSS, etc.). The site owner can specify restrictions on the allowable JavaScript, which users/developers must then adhere to. Yahoo's documentation on how it uses Caja, and on the JavaScript restrictions it imposes, can be found here: http://developer.yahoo.com/yap/guide/caja-support.html

Caja works by "transforming ordinary HTML and JavaScript into a restricted form of JavaScript". See the Yahoo link for details and examples of what the rewritten JavaScript looks like.

_JS Function Blacklist_

Our initial intention in looking at Caja was to find a list of functions that Caja strips, which we could then strip/neutralize in the sites we load. For example, here's a list of JavaScript functions and some handling guidelines:
http://google-caja.googlecode.com/svn/trunk/doc/domado/basicTaming.csv
http://google-caja.googlecode.com/svn/trunk/doc/domado/tamingIntroAppendix.html

However, the list isn't really usable. Some very commonly used functions (like getElementById) are "ShutOff", and many properties are "readOnly", which we (probably) can't implement. Caja completely rewrites all JavaScript into an object-capability model, so its approach to the language is very different from ours.

_Direct Use of Caja_

Using Caja directly in Psiphon could be investigated, although trying it will take a lot of work and there's a good chance it won't be fruitful. The model Caja was originally designed for (sites like Yahoo and Facebook that can dictate language restrictions to developers, and only for the user-content subsections of those sites) is very different from the environment Psiphon needs to operate in.

For example, with Caja, all JavaScript must be inline. This is very different from how web pages normally work. But perhaps we can separately read and inline all .js files. We might also have to rewrite some JavaScript before handing it off to Caja.
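As a rough illustration of the "read and inline all .js files" idea, here is a minimal sketch. It assumes a hypothetical fetchScript(src) helper that returns a script's text (or null), and it uses a regular expression purely for illustration; real pages would need a proper HTML parser.

```javascript
// Hypothetical sketch: turn <script src="..."></script> tags into inline
// <script> blocks. fetchScript(src) is an assumed helper returning the script
// text, or null if it could not be fetched. Regex-based for illustration only.
function inlineExternalScripts(html, fetchScript) {
  const externalScript =
    /<script\b([^>]*?)\ssrc\s*=\s*["']([^"']+)["']([^>]*)>\s*<\/script>/gi;
  return html.replace(externalScript, (tag, before, src, after) => {
    const body = fetchScript(src);
    if (body == null) {
      return tag; // leave the original tag if the script is unavailable
    }
    // Keep any other attributes (e.g. type), dropping only src.
    const attrs = (before.trim() + " " + after.trim()).trim();
    return (attrs ? "<script " + attrs + ">" : "<script>") + body + "</script>";
  });
}

// Usage with an in-memory stand-in for fetching:
const scripts = { "/js/app.js": "var ready = true;" };
const page = '<p>Hi</p><script type="text/javascript" src="/js/app.js"></script>';
console.log(inlineExternalScripts(page, (src) => scripts[src] ?? null));
```

Tags whose script cannot be fetched are left untouched, so a failed fetch degrades to the original page rather than dropping code.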

One mitigating factor is that we don't necessarily need all JavaScript to work perfectly all the time. We just need it to work well enough to make sites functional.

Changed in psiphon:
status:	Confirmed → Fix Released

e.fryntov (e-fryntov) wrote on 2009-12-14:

#3

Function wrapping approach report:

JavaScript "whitelisting" in Psiphon

Goal

Perform URL rewriting in JavaScript methods that make HTTP requests of their own (window.open, XMLHttpRequest.open, etc.)

Method

Override native JavaScript methods with our own versions that rewrite the URL before making the HTTP request.

Theory

It is impossible to recognize wrapped or obfuscated JavaScript methods and arguments without parsing the code with a JavaScript engine. Since writing a good JavaScript engine and parsing all the event-driven methods found in the HTML document is outside the scope of this project, we will try to make the client browser's JS engine do the job.

Consider the following snippet:

Code:

<script>
/* This method will be overridden */
function foo(yourname)
{
    alert('Hello ' + yourname);
}

/* New method that overrides foo() */
foo = function(yourname)
{
    alert('(override) Hello ' + yourname);
};
</script>
 
The next time we call foo('John') in our code, we'll see an "(override) Hello John" popup.
However, we don't want to replace the method completely, only to add a URL-rewriting step, so we need to keep a reference to the original method.

Code:

<script>
/* This method will be overridden */
function foo(url)
{
    alert('The URL is ' + url);
}

/* Keep a reference to the original foo as base_foo */
var base_foo = foo;

/* New method that overrides foo(): rewrite the URL, then call the original */
foo = function(url)
{
    var new_url = psiphon_rewrite(url);
    /* Pass the rewritten URL and preserve `this` for methods */
    return base_foo.call(this, new_url);
};
</script>
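The same pattern can be generalized into a small helper. This is a hypothetical sketch: wrapUrlArgument and psiphonRewrite (a stand-in for psiphon_rewrite) are illustrative names, and the "/proxy/" rewrite rule is an assumption, not Psiphon's actual URL scheme. It keeps the original method, rewrites one string argument, and forwards `this` so it also works for methods such as open().

```javascript
// Hypothetical generalization of the base_foo pattern: keep a reference to
// the original method and install a wrapper that rewrites one URL argument.
function wrapUrlArgument(target, methodName, argIndex, rewrite) {
  const original = target[methodName]; // plays the role of base_foo
  if (typeof original !== "function") {
    throw new TypeError(methodName + " is not a function");
  }
  target[methodName] = function (...args) {
    if (typeof args[argIndex] === "string") {
      args[argIndex] = rewrite(args[argIndex]);
    }
    // Forward `this` and all arguments so the original behaves as before.
    return original.apply(this, args);
  };
  return original;
}

// Stand-in for psiphon_rewrite(); this URL scheme is an assumption.
function psiphonRewrite(url) {
  return "/proxy/" + encodeURIComponent(url);
}

// Usage on a plain object standing in for a browser API:
const fakeApi = {
  requested: [],
  load(url) {
    this.requested.push(url);
    return this.requested.length;
  },
};
wrapUrlArgument(fakeApi, "load", 0, psiphonRewrite);
fakeApi.load("http://example.com/a");
```

Returning the original function lets the caller restore it later if a site turns out to break with the wrapper installed.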

Practice

We've tried this approach with the following websites:

www.youtube.com (video upload functionality)
It uses a lot of AJAX techniques, so it was a good candidate. XMLHttpRequest (XHR) was wrapped using ideas from the XMLHttpRequest.js project. We were able to upload files, but the page would show 'Unknown upload errors'.

ozodlik.org (general JS functionality)
We couldn't get the search button to work.
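For the youtube.com case, one way to wrap XHR is to patch XMLHttpRequest.prototype.open so that every request URL is rewritten before the native open() runs. The sketch below is hypothetical: the class is passed in so it can be exercised outside a browser (in a page it would be window.XMLHttpRequest), FakeXhr is a stand-in, and the rewrite rule is assumed.

```javascript
// Hypothetical sketch: wrap XhrClass.prototype.open so that every request URL
// is rewritten before the native open() runs. rewrite() is an assumed helper.
function wrapXhrOpen(XhrClass, rewrite) {
  const proto = XhrClass.prototype;
  const nativeOpen = proto.open;
  proto.open = function (method, url, ...rest) {
    // Forward only the arguments the caller supplied, so optional parameters
    // such as async keep their default behavior.
    return nativeOpen.call(this, method, rewrite(String(url)), ...rest);
  };
  return nativeOpen;
}

// Minimal stand-in for the browser class, recording what open() received.
class FakeXhr {
  open(method, url) {
    this.opened = { method: method, url: url, argCount: arguments.length };
  }
}

wrapXhrOpen(FakeXhr, (url) => "/proxy/" + encodeURIComponent(url));
const xhr = new FakeXhr();
xhr.open("POST", "http://www.youtube.com/upload");
```

Patching the prototype rather than replacing the constructor means every instance picks up the wrapper, assuming our script runs before the page's own scripts capture a reference to the original open().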

Problem analysis

In the case of youtube.com, some XHRs return responses that are used to dynamically change innerHTML or other properties of DOM elements. Sometimes these properties are URLs (divs dynamically changed to imgs).

Further debugging showed that JS event handlers such as "onload" were also attached to DOM elements dynamically via XHR. The combination of the two produced the error described above: the dynamically inserted image couldn't load, so its "onload" event never fired.

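This suggests that URLs also need rewriting inside HTML that arrives via XHR. A minimal sketch, assuming a regex pass over the fragment is acceptable; rewriteHtmlUrls is a hypothetical name, and hooking the result into innerHTML assignments is a separate, browser-dependent step not shown here. It would not by itself fix handlers attached later.

```javascript
// Hypothetical sketch: rewrite src/href attribute values inside an HTML
// fragment (for example an XHR response) before it is assigned to innerHTML.
// Regex-based for illustration only: it ignores unquoted attributes, srcset,
// CSS url(...) values, and URLs assembled by script.
function rewriteHtmlUrls(html, rewrite) {
  return html.replace(
    /(\s)(src|href)(\s*=\s*)(["'])(.*?)\4/gi,
    (match, space, attr, equals, quote, url) =>
      space + attr + equals + quote + rewrite(url) + quote
  );
}

const fragment = '<div><img src="http://img.example/p.png" onload="done()"></div>';
const rewritten = rewriteHtmlUrls(fragment, (url) => "/proxy/" + encodeURIComponent(url));
```

Requiring whitespace before the attribute name keeps attributes such as data-src from being rewritten by accident.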
ozodlik.org showed a different kind of problem: the 'document.location' object. In theory location.href is just a property of the location object, but assigning to it causes the document to load the new URL.

Code:

<input id="keywords0_" name="keywords0_" onkeypress="if (event.keyCode == 13) {location.href ='/search/?k=' + encodeURI(this.value);return false;}"...

As we can see, there is no method call here, so it is not clear how to intercept the URL in such a construct, if it is possible at all.
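On an ordinary object, a property assignment can be intercepted by replacing the property with a getter/setter pair. The hypothetical sketch below tries that and reports whether it worked. The HTML specification marks the members of Location as unforgeable (non-configurable), so in browsers that follow it this would be expected to return false for the real window.location; older browsers may have behaved differently. fakeLocation is a stand-in.

```javascript
// Hypothetical sketch: `location.href = url` is a property assignment, not a
// method call, so function wrapping cannot see it. This helper tries to
// install an accessor property instead and reports whether that was possible.
function tryInterceptHref(locationLike, rewrite) {
  const desc = Object.getOwnPropertyDescriptor(locationLike, "href");
  if (desc && !desc.configurable) {
    return false; // a non-configurable property cannot be redefined
  }
  let current = locationLike.href;
  try {
    Object.defineProperty(locationLike, "href", {
      configurable: true,
      enumerable: true,
      get() {
        return current;
      },
      set(url) {
        // A real Location would navigate here; this stand-in only stores it.
        current = rewrite(String(url));
      },
    });
  } catch (err) {
    return false; // e.g. the object is frozen or not extensible
  }
  return true;
}

// A plain object standing in for window.location:
const fakeLocation = { href: "http://www.ozodlik.org/" };
tryInterceptHref(fakeLocation, (url) => "/proxy/" + encodeURIComponent(url));
fakeLocation.href = "/search/?k=" + encodeURI("test");
```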

Another potential problem is recognizing and preventing re-wrapping of our overridden methods by the document itself, since it may use something like XMLHttpRequest.js to wrap the native XMLHttpRequest, for example.

Code:

<script>
/* This method will be overridden */
function foo(url)
{
    alert('The URL is ' + url);
}

/* Keep a reference to the original foo as base_foo */
var base_foo = foo;

/* Our override: rewrite the URL, then call the original */
foo = function(url)
{
    var new_url = psiphon_rewrite(url);
    return base_foo.call(this, new_url);
};

/* Yet another override, added later by the page itself: our URL rewriting is now bypassed */
foo = function(url)
{
    dosomething_else();
};
</script>
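One possible countermeasure is to install the wrapper as a non-writable, non-configurable property so later page code cannot silently replace it. This is a hypothetical sketch (installLockedWrapper and the "/proxy/" prefix are illustrative), not something the report tested.

```javascript
// Hypothetical sketch: install a wrapper as a non-writable, non-configurable
// property. Later assignments throw a TypeError in strict-mode code and are
// silently ignored otherwise.
function installLockedWrapper(target, name, wrapper) {
  Object.defineProperty(target, name, {
    value: wrapper,
    writable: false,
    configurable: false,
    enumerable: true,
  });
}

// Stand-in API whose foo() the page might try to override again.
const api = {
  foo(url) {
    return "base:" + url;
  },
};
const baseFoo = api.foo;
installLockedWrapper(api, "foo", function (url) {
  return baseFoo.call(this, "/proxy/" + url); // hypothetical rewrite
});

// The page's own re-override, as in the example above.
try {
  api.foo = function () {
    return "page override";
  };
} catch (err) {
  // Reached only in strict-mode code.
}
```

The trade-off is that a page which legitimately wraps the method (for example via XMLHttpRequest.js) would have its own wrapper rejected, which may itself break the site.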

Conclusion: this approach is a possible, but far from complete, solution to JavaScript whitelisting.

Adam P (adam+) on 2009-12-14

Changed in psiphon:
status:	Fix Released → Confirmed

Adam P (adam+) on 2009-12-14

Changed in psiphon:
importance:	Unknown → Critical
milestone:	none → release2.4

Rod (rod-psiphon) wrote on 2009-12-14:

#4

Regarding the security concerns: yes, if arbitrary JavaScript from a proxied site is included in the rewritten web page, then it can do things such as grab the Psiphon cookie and send it anywhere (in other words, the same-origin policy protection breaks down when sites are rewritten and returned under the Psiphon proxy domain). Another example risk is exposing the user's time zone via JavaScript calls.

Here’s an idea. What if we support assigning dedicated in-proxies to the whitelisted-JavaScript sites? When a user enters e.g. suspectsite.com in the bluebar of their regular in-proxy, they are redirected to a sub-in-proxy (a separate in-proxy, associated with the regular one and dedicated to the suspectsite.com domain) with a new session cookie. As long as the new session cookie can’t be mapped back to the regular one, it doesn’t matter if suspectsite.com “steals” the sub-in-proxy Psiphon cookie, as it only grants access to the suspectsite.com domain cookies. So, basically, we reinstate the same-origin policy defence, at the cost of using additional IP addresses per whitelisted site, per in-proxy.
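A hypothetical sketch of the bookkeeping this idea implies: each whitelisted domain gets a dedicated address from a pool of spare addresses, and each entry issues a fresh random session cookie. SubProxyRegistry and the documentation-range addresses are illustrative assumptions; the server-side mapping from sub-in-proxy sessions to domain cookies is omitted.

```javascript
// Hypothetical sketch: a dedicated address per whitelisted domain, plus a
// fresh random session cookie that is not derived from the regular cookie.
const crypto = require("crypto");

class SubProxyRegistry {
  constructor(addressPool) {
    this.freeAddresses = [...addressPool]; // spare addresses for this in-proxy
    this.addressByDomain = new Map(); // domain -> dedicated address
  }

  addressFor(domain) {
    const key = domain.toLowerCase();
    if (!this.addressByDomain.has(key)) {
      const address = this.freeAddresses.shift();
      if (address === undefined) {
        throw new Error("no spare address for " + key);
      }
      this.addressByDomain.set(key, address);
    }
    return this.addressByDomain.get(key);
  }

  // Where to redirect the user, plus an independent session cookie value.
  enter(domain) {
    return {
      address: this.addressFor(domain),
      sessionCookie: crypto.randomBytes(16).toString("hex"),
    };
  }
}

// Example pool using documentation-range addresses (192.0.2.0/24).
const registry = new SubProxyRegistry(["192.0.2.10", "192.0.2.11"]);
const visit = registry.enter("suspectsite.com");
```

The cookie is random rather than computed from the regular one (e.g. by hashing it), so that a stolen sub-in-proxy cookie cannot be mapped back to the user's main session.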

Mark Miller (erights) wrote on 2009-12-16:

#5

Hi,

I'm one of the architects of Caja. We at the Caja project are very pleased you are thinking of Caja for Psiphon. Except for the Yahoo! pages, the other documents you are looking at are quite old. Caja is presently much less restrictive than it would appear from those pages. If I understand your needs correctly, then yes, I think you need Caja to preserve your security while passing through *some* existing JavaScript.

To get a better sense of how well existing web content and websites survive being passed through Caja, please try the Caja sandbox at http://caja.appspot.com/ . URLs are indeed rewritten according to a policy that you (as a Caja container provider) would specify. Cajoled JavaScript would indeed be denied access to the user's Psiphon cookies, unless you explicitly choose to provide it with such access.

Please let us know what we can do to help. Best would be to subscribe to http://groups.google.com/group/google-caja-discuss and post questions there. Good luck with Psiphon!

Jasvir Nagra (from-launchpad-nagras) wrote on 2009-12-16:

#6

Adam suggested above that:

> For example, with Caja, all JavaScript must be inline. This is very different from how web pages normally work. But perhaps we

Caja certainly limits web pages to a subset of HTML and JavaScript, but it's a fairly large subset. In particular, there's no need for all JavaScript to be inlined: one of the cajoler pipeline stages fetches outlined (external) scripts and CSS before rewriting them safely.

Rod (rod-psiphon) wrote on 2009-12-16:

#7

Hi Mark and Jasvir,

Thanks for the information. Yes, it would be beneficial for us to get some Javascript working (safely) and it's not necessary for everything to get through.

We tried a selection of our target sites in the Caja sandbox, and I think all of them ended up with a compile error (for example, radiofarda.com). Is there an option in the compiler to simply remove the offending lines when there's an error and carry on? Since we don't control the sites we want to "cajole", and there are many of them, we're interested in some automated way of getting them into a compiling state. I'm not sure what we'll do about run-time errors.

We'll review your docs and mailing list again.

e.fryntov (e-fryntov) on 2010-05-06

tags:	added: category3
summary:	- Site compatibility - investigate white-listing Javascript for certain sites
	+ Generic site compatibility - investigate white-listing Javascript for certain sites

Affects		Status	Importance	Assigned to	Milestone
	psiphon	Confirmed	Critical	Unassigned	psiphon 2.4
	2.4	Fix Released	Undecided	Unassigned
