Generic site compatibility - investigate white-listing Javascript for certain sites

Bug #457466 reported by root
This bug affects 1 person
Affects Status Importance Assigned to Milestone
psiphon
Confirmed
Critical
Unassigned
2.4
Fix Released
Undecided
Unassigned

Bug Description

from email:

            As we discussed the other day, we will schedule a task to investigate the possibility of not stripping javascript from certain white-listed sites.

            Since our whitepaper/design doc explicitly says we're not an anonymity service, our primary concern with white-listed but broken JavaScript (e.g., URLs that aren't rewritten) is functionality, not IP leakage. So a 95% success rate, for example, is probably acceptable, whereas with anonymity any real risk of exposure is bad.

            What happens if we just don't strip JavaScript from Facebook? How bad is it? What if we rewrote "static" URLs that we find in the code, as we do with HTML?

            We discussed URL encoding. If we take that out, we might have better luck with static rewriting since relative links may work. With URL encoding, relative links will certainly not work.

            We discussed injecting "some code" at the top of the Javascript that hooks into calls that request URLs so we could rewrite dynamic links. What sorts of tools might help us here? Consider some Javascript "sandbox" tools. For example, Google Caja runs Javascript in a sandbox and proxies URL requests, or so it claims: http://code.google.com/p/google-caja/
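A rough sketch of the two static ideas above, assuming a hypothetical proxy URL layout of https://proxy.example/ followed by the target URL (not Psiphon's actual format). rewriteStaticUrls (a hypothetical helper) prefixes absolute URLs that appear as plain string literals in script source, and resolveRelative (also hypothetical) shows how a browser resolves a relative link against a proxied page URL with and without URL encoding.

```javascript
// Hypothetical proxy prefix; the real Psiphon URL layout may differ.
const PROXY_PREFIX = "https://proxy.example/";

// 1) Static rewriting: prefix absolute URLs that appear as plain string
//    literals in script source. URLs assembled at run time are not touched.
function rewriteStaticUrls(jsSource) {
  return jsSource.replace(/(["'])(https?:\/\/[^"'\\\s]+)\1/g,
    (match, quote, url) => quote + PROXY_PREFIX + url + quote);
}

// 2) Relative links: resolve a relative URL the way the browser would,
//    against the URL of the proxied page it appears in.
function resolveRelative(relative, proxiedPageUrl) {
  return new URL(relative, proxiedPageUrl).href;
}

// Same target page, once with the target URL percent-encoded, once not.
const encodedPage = PROXY_PREFIX + encodeURIComponent("http://example.com/dir/page1.html");
const plainPage = PROXY_PREFIX + "http://example.com/dir/page1.html";

console.log(rewriteStaticUrls('var a = "http://example.com/a.js";'));
// var a = "https://proxy.example/http://example.com/a.js";
console.log(resolveRelative("page2.html", encodedPage));
// https://proxy.example/page2.html
console.log(resolveRelative("page2.html", plainPage));
// https://proxy.example/http://example.com/dir/page2.html
```

Percent-encoding the whole target URL collapses it into a single path segment, so the relative link loses the original host and directory; with the unencoded layout it resolves to the expected sibling page. URLs built at run time (e.g. "http://" + host) are not literals and slip past the static rewrite, which is the gap the run-time hooking idea is meant to cover.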

Tags: category3
Adam P (adam+)
Changed in psiphon:
status: New → Confirmed
Chris (poser) wrote :

There are generalized security concerns around losing JavaScript's same-origin context, right? Couldn't a malicious site gain persistent access to any subsequent site visited (during a given browser session?) because the browser thinks they're all the same site?

tags: added: poser
Chris (poser)
tags: removed: poser
Rod (rod-psiphon)
visibility: private → public
Adam P (adam+) wrote :

__Report: Caja__

_Intro_

Caja lets sites that want to allow users to add their own rich content (or run site-specific plug-in "apps") using JavaScript+HTML+CSS do so safely (i.e., preventing XSS, etc.). The site owner can specify restrictions on the allowable JavaScript, which users/developers must then adhere to. Yahoo's documentation for how it uses Caja, and what JavaScript restrictions it imposes, can be found here: http://developer.yahoo.com/yap/guide/caja-support.html

Caja works by "transforming ordinary HTML and JavaScript into a restricted form of JavaScript". See the Yahoo link for details and examples of what the rewritten JavaScript looks like.

_JS Function Blacklist_

Our initial intention in looking at Caja was to try to find a list of functions that Caja strips that we could then strip/neutralize in the sites we load. For example, here's a list of JavaScript functions and some handling guidelines:
http://google-caja.googlecode.com/svn/trunk/doc/domado/basicTaming.csv
http://google-caja.googlecode.com/svn/trunk/doc/domado/tamingIntroAppendix.html

However, the list isn't really usable. Some very commonly-used functions (like getElementById) are "ShutOff" and many properties are "readOnly", which we (probably) can't implement. Caja completely rewrites all JavaScript into an object-capability model, so the way they approach the language is much different than the way we would.

_Direct Use of Caja_

Using Caja directly in Psiphon is something that can be investigated, although it will take a lot of work to try, and there's a good chance that it won't be fruitful. The model that Caja was originally designed for (sites like Yahoo and Facebook that can dictate language restrictions to developers, and only for the user-content subsections of those sites) is very different from the environment Psiphon needs to operate in.

For example, with Caja, all JavaScript must be inline. This is very different from how web pages normally work. But perhaps we can separately read and inline all .js files. We might also have to rewrite some JavaScript before handing it off to Caja.

One mitigating factor is that we don't necessarily need all JavaScript to work perfectly all the time. We just need it to work well enough to make sites functional.

Changed in psiphon:
status: Confirmed → Fix Released
e.fryntov (e-fryntov) wrote :

Function wrapping approach report:

JavaScript "whitelisting" in Psiphon

Goal

Perform URL rewriting in JavaScript methods that make HTTP requests of their own (window.open, XMLHttpRequest.open, etc.).

Method

Overriding native JavaScript methods with our own versions that rewrite the URL before making the HTTP request.

Theory

It is practically impossible to recognize wrapped or obfuscated JavaScript methods and arguments without parsing the code with a JavaScript engine. Since writing a good JavaScript engine and parsing all the event-driven methods found in the HTML document is out of scope for this project, we will try to make the client browser's JS engine do the job.

Consider the following snippet.

Code:

<script>

/*This method will be overridden*/
function foo(yourname)
{
    alert('Hello ' + yourname);
}

/*New method that overrides foo()*/
foo = function(yourname)
{
    alert('(override) Hello ' + yourname);
}
</script>

The next time we call foo('John') in our code, we'll see an "(override) Hello John" popup.
Now, we need to keep a reference to the original method somehow, because we don't want to replace the method completely, just add a URL
rewriting routine.

Code:

<script>

/*This method will be overridden*/
function foo(url)
{
    alert('The URL is ' + url);
}

/*Keep a reference to the original foo as base_foo*/
var base_foo = foo;

/*New method that overrides foo(): rewrite the URL, then delegate to the original*/
foo = function(url)
{
    /*psiphon_rewrite() stands for the proxy's URL-rewriting routine*/
    var new_url = psiphon_rewrite(url);
    return base_foo(new_url);
}
</script>
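The same idea applied to native methods can be sketched as a generic wrapper. This is an illustrative sketch, not the code used in the experiments below: wrapUrlArg is a hypothetical helper, and the body given to psiphon_rewrite (which just prefixes a hypothetical proxy URL) is assumed. Unlike foo, a native method such as XMLHttpRequest.prototype.open must be called with the original this, so the wrapper uses apply. A stand-in object replaces the browser's XMLHttpRequest so the sketch runs outside a browser.

```javascript
// Hypothetical proxy prefix; Psiphon's real rewriting rules are not shown here.
const PROXY_PREFIX = "https://proxy.example/";

// Placeholder for the proxy's URL rewriting routine (name taken from the
// example above, body assumed): absolute http(s) URLs get the proxy prefix,
// anything else (e.g. relative URLs) passes through unchanged.
function psiphon_rewrite(url) {
  if (typeof url === "string" && /^https?:\/\//i.test(url)) {
    return PROXY_PREFIX + url;
  }
  return url;
}

// Replace obj[name] with a wrapper that rewrites the argument at urlIndex,
// then calls the original method with the same `this` and arguments.
// Returns false if there is no such method to wrap.
function wrapUrlArg(obj, name, urlIndex, rewrite) {
  const original = obj[name];
  if (typeof original !== "function") return false;
  obj[name] = function (...args) {
    if (args.length > urlIndex) {
      args[urlIndex] = rewrite(args[urlIndex]);
    }
    return original.apply(this, args);
  };
  return true;
}

// Stand-in for XMLHttpRequest.prototype so the sketch runs outside a browser.
const FakeXhrPrototype = {
  open(method, url) {
    this.requestedUrl = url;
  },
};
wrapUrlArg(FakeXhrPrototype, "open", 1, psiphon_rewrite);

const xhr = Object.create(FakeXhrPrototype);
xhr.open("GET", "http://example.com/upload");
console.log(xhr.requestedUrl); // https://proxy.example/http://example.com/upload
```

In a page, the injected script would run something like wrapUrlArg(window, "open", 0, psiphon_rewrite) and wrapUrlArg(XMLHttpRequest.prototype, "open", 1, psiphon_rewrite) before any site script; the URL is the second argument of XMLHttpRequest.open(method, url).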

Practice

We've tried this approach with the following websites:

www.youtube.com (video upload functionality)
It uses a lot of AJAX techniques, so it was a good candidate. XMLHttpRequest (XHR) was wrapped using
ideas from the XMLHttpRequest.js project. We were able to upload files, but the page would show 'Unknown upload errors'.

ozodlik.org (general JS functionality)
We couldn't get the search button to work.

Problem analysis

In the case of youtube.com, some XHRs return responses that are used to dynamically change innerHTML or other properties of
DOM elements. Sometimes these properties are URLs (e.g., divs dynamically changed to imgs).

Further debugging showed JS event handlers, such as "onload", being attached dynamically to DOM elements created from XHR responses.
The combination of the two produced the error described above: the dynamically created image couldn't load, because the URL it received
in the XHR response wasn't rewritten, so its "onload" event never fired.

Code:

<script>
/* ... */
xmlHttp.onreadystatechange = function()
{
    if (xmlHttp.readyState == 4)
    {
        var imgholder = document.getElementById("imgholder");
        var img = document.createElement('img');
        img.onload = function (e) {
            dosomething_onload();
        }
        img.onerror = function (e) {
            report_error(e);
        }
        /* The URL comes straight from the XHR response, so it is not rewritten */
        img.src = xmlHttp.responseText;
        imgholder.appendChild(img);
    }
}
/* ... */
</script>
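One possible extension, not something tried in this report, is to intercept the assignment to img.src as well, by redefining the src accessor on the prototype. The sketch below assumes, as in current browsers, that src is an accessor property on HTMLImageElement.prototype; wrapUrlSetter is a hypothetical helper, FakeImage stands in for the image element so the code runs outside a browser, and psiphon_rewrite is again a placeholder with an assumed body.

```javascript
// Hypothetical placeholder for the proxy's URL rewriting routine.
const PROXY_PREFIX = "https://proxy.example/";
function psiphon_rewrite(url) {
  const s = String(url);
  return /^https?:\/\//i.test(s) ? PROXY_PREFIX + s : s;
}

// Redefine an accessor property (such as img.src) on a prototype so that
// every assignment passes through `rewrite` before the original setter runs.
// Returns false if the property is not a setter on that prototype.
function wrapUrlSetter(proto, prop, rewrite) {
  const desc = Object.getOwnPropertyDescriptor(proto, prop);
  if (!desc || typeof desc.set !== "function") return false;
  Object.defineProperty(proto, prop, {
    configurable: true,
    enumerable: desc.enumerable,
    get: desc.get,
    set(value) {
      desc.set.call(this, rewrite(value));
    },
  });
  return true;
}

// Stand-in for HTMLImageElement; in a page the call would target
// HTMLImageElement.prototype instead.
class FakeImage {
  get src() {
    return this._src;
  }
  set src(value) {
    this._src = value;
  }
}
wrapUrlSetter(FakeImage.prototype, "src", psiphon_rewrite);

const img = new FakeImage();
img.src = "http://example.com/thumb.jpg"; // e.g. a URL taken from xmlHttp.responseText
console.log(img.src); // https://proxy.example/http://example.com/thumb.jpg
```

URLs that arrive inside innerHTML strings are set by the HTML parser, not through this setter, so they would still need HTML-level rewriting.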

ozodlik.org showed a different kind of problem: the 'document.location' object. In theory, location.href is just a property of the location object, but assigning to it
causes the document to reload with the new URL.

Code:

<input id="keywords0_" name="keywords0_" onkeyp...

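Wrapping does not help with location: in current browsers the properties of the location object are non-configurable, so page script cannot replace the href setter the way open() was replaced. A heuristic alternative, in the spirit of the "static" rewriting idea from the bug description, is to rewrite simple assignments in the script source before serving it. This is only a sketch: rewriteLocationAssignments is hypothetical, psiphon_rewrite is assumed to be a helper the proxy injects into the page, the sample input is invented, and aliased or obfuscated access (for example var l = location; l.href = u;) is missed.

```javascript
// Heuristic source transform: wrap the right-hand side of simple
// assignments to location / location.href in a call to a runtime helper.
// `psiphon_rewrite` is a hypothetical helper the proxy would inject into
// the page; this only edits source text and does not define it.
const LOCATION_ASSIGNMENT =
  /\b((?:document\.|window\.)?location(?:\.href)?)\s*=(?!=)\s*([^;\n]+)/g;

function rewriteLocationAssignments(jsSource) {
  return jsSource.replace(LOCATION_ASSIGNMENT, "$1 = psiphon_rewrite($2)");
}

console.log(rewriteLocationAssignments('document.location.href = "/search?q=" + kw;'));
// document.location.href = psiphon_rewrite("/search?q=" + kw);
console.log(rewriteLocationAssignments("if (location.href == x) go();"));
// if (location.href == x) go();   (comparisons are left alone)
```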

Adam P (adam+)
Changed in psiphon:
status: Fix Released → Confirmed
Adam P (adam+)
Changed in psiphon:
importance: Unknown → Critical
milestone: none → release2.4
Rod (rod-psiphon) wrote :

Regarding the security concerns: yes, if arbitrary JavaScript from a proxied site is included in the rewritten web page, then it can do such things as grab the Psiphon cookie and send it anywhere (in other words, the same-origin policy breaks down when sites are rewritten and returned in the Psiphon proxy domain). Another example risk is exposing the user's time zone via JavaScript calls.
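The origin collapse can be shown directly. The same-origin policy compares scheme, host and port, so once every site is served under one proxy host, pages from unrelated sites share an origin (and, with the same host, the same cookies). The proxy URL layout and the helper names below are hypothetical.

```javascript
// Hypothetical proxy URL layout: proxy host followed by the target URL.
const PROXY_PREFIX = "https://proxy.example/";
const proxied = (targetUrl) => PROXY_PREFIX + targetUrl;

// Origins are compared as (scheme, host, port); URL.origin serializes that tuple.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

const bank = "https://bank.example/account";
const suspect = "http://suspectsite.com/page";

console.log(sameOrigin(bank, suspect)); // false
console.log(sameOrigin(proxied(bank), proxied(suspect))); // true
```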

Here’s an idea. What if we support assigning dedicated in-proxies to the whitelisted-JavaScript sites? When a user enters, e.g., suspectsite.com in the bluebar in their regular in-proxy, they are redirected to a sub-in-proxy (a separate in-proxy, associated with the regular one and dedicated to the suspectsite.com domain) with a new session cookie. As long as the new session cookie can’t be mapped back to the regular one, it doesn’t matter if suspectsite.com “steals” the sub-in-proxy Psiphon cookie, as it only grants access to the suspectsite.com domain cookies. So, basically, we reinstate the same-origin policy defence, at the cost of expending additional IP addresses per whitelisted site, per in-proxy.
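A minimal sketch of the routing this proposal implies, assuming hypothetical host names, hypothetical route/authorize helpers, and an in-memory session table. Each whitelisted domain maps to its own sub-in-proxy; the token stands for the new session cookie and is freshly random rather than derived from the main session; and a token is only honoured by the sub-in-proxy that issued it, for that proxy's dedicated domain.

```javascript
const crypto = require("crypto");

// Hypothetical hosts: the regular in-proxy and one dedicated sub-in-proxy
// per whitelisted-JavaScript domain (an extra IP address each).
const MAIN_PROXY = "https://inproxy.example";
const SUB_PROXIES = new Map([["suspectsite.com", "https://sub1.inproxy.example"]]);

// Per-proxy session tables: proxy host -> Map(token -> session).
const sessions = new Map();

function hostMatches(host, domain) {
  return host === domain || host.endsWith("." + domain);
}

// Route a bluebar request. Whitelisted targets are sent to their
// sub-in-proxy with a fresh random token that is not derived from the
// user's main session, so it cannot be mapped back to it.
function route(targetUrl) {
  const host = new URL(targetUrl).hostname;
  for (const [domain, proxyHost] of SUB_PROXIES) {
    if (hostMatches(host, domain)) {
      const token = crypto.randomBytes(16).toString("hex");
      if (!sessions.has(proxyHost)) sessions.set(proxyHost, new Map());
      sessions.get(proxyHost).set(token, { allowedDomain: domain });
      return { proxy: proxyHost, token };
    }
  }
  return { proxy: MAIN_PROXY, token: null };
}

// A stolen sub-in-proxy token is only honoured by that sub-in-proxy and
// only for its dedicated domain.
function authorize(proxyHost, token, targetUrl) {
  const table = sessions.get(proxyHost);
  const session = table && table.get(token);
  if (!session) return false;
  return hostMatches(new URL(targetUrl).hostname, session.allowedDomain);
}

const r = route("http://www.suspectsite.com/video");
console.log(r.proxy); // https://sub1.inproxy.example
console.log(authorize(r.proxy, r.token, "http://suspectsite.com/")); // true
console.log(authorize(r.proxy, r.token, "https://bank.example/")); // false
console.log(authorize(MAIN_PROXY, r.token, "http://suspectsite.com/")); // false
```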

Mark Miller (erights) wrote :

Hi,

I'm one of the architects of Caja. We at the Caja project are very pleased you are thinking of Caja for Psiphon. Except for the Yahoo! pages, the other documents you are looking at are quite old. Caja is presently much less restrictive than it would appear from those pages. If I understand your needs correctly, then yes, I think you need Caja to preserve your security while passing through *some* existing JavaScript.

To get a better sense of how well existing web content and websites survive being passed through Caja, please try the Caja sandbox at http://caja.appspot.com/ . URLs are indeed rewritten according to a policy that you (as a Caja container provider) would specify. Cajoled JavaScript would indeed be denied access to the user's Psiphon cookies, unless you explicitly choose to provide it with such access.

Please let us know what we can do to help. Best would be to subscribe to http://groups.google.com/group/google-caja-discuss and post questions there. Good luck with Psiphon!

Jasvir Nagra (from-launchpad-nagras) wrote :

Adam suggested above that:

> For example, with Caja, all JavaScript must be inline. This is very different from how web pages normally work. But perhaps we

Caja certainly limits web pages to a subset of HTML and JavaScript, but it's a fairly large subset. In particular, there's no need for all JavaScript to be inlined. One of the cajoler pipeline stages fetches outlined (external) scripts (and CSS) before rewriting them safely.

Rod (rod-psiphon) wrote :

Hi Mark and Jasvir,

Thanks for the information. Yes, it would be beneficial for us to get some Javascript working (safely) and it's not necessary for everything to get through.

We tried a selection of our target sites in the Caja sandbox and I think all of them ended up with a compile error, for example radiofarda.com. Is there an option in the compiler to simply remove the offending lines when there's an error and carry on? Since we don't control the sites we want to "cajole", and there are many sites, we're interested in some sort of automated way of getting them into a compiling state. I'm not sure what we'll do about runtime errors.
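Short of such a compiler option, one coarse fallback is to compile each script separately and drop only the ones that fail, rather than rejecting the whole page. In the sketch below, compileEach is a hypothetical helper and compile is a pluggable stand-in: it is not Caja's API, and here it is only a JavaScript syntax check via the Function constructor, which parses the source without running it. A real pipeline would call the cajoler in its place.

```javascript
// Stand-in "compiler": a pure syntax check. The Function constructor parses
// the source and throws SyntaxError if it does not parse; the resulting
// function is never called, so the script is not executed.
function syntaxCheck(source) {
  new Function(source);
  return source;
}

// Compile each script independently, keeping the ones that succeed and
// recording the index and error message of the ones that fail.
function compileEach(scripts, compile) {
  const kept = [];
  const dropped = [];
  scripts.forEach((source, index) => {
    try {
      kept.push(compile(source));
    } catch (err) {
      dropped.push({ index, error: String(err && err.message) });
    }
  });
  return { kept, dropped };
}

const result = compileEach(
  ["var a = 1;", "function (", "document.title = 'ok';"],
  syntaxCheck
);
console.log(result.kept.length); // 2
console.log(result.dropped[0].index); // 1
```

Dropping a whole script can of course break later scripts that depend on it, so this trades some run-time errors for a page that at least compiles.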

We'll review your docs and mailing list again.

e.fryntov (e-fryntov)
tags: added: category3
summary: - Site compatibility - investigate white-listing Javascript for certain
- sites
+ Generic site compatibility - investigate white-listing Javascript for
+ certain sites