Comment 2 for bug 457466

Adam P (adam+) wrote: Re: Site compatibility - investigate white-listing Javascript for certain sites

__Report: Caja__

_Intro_

Caja lets sites safely allow users to add their own rich content (or run site-specific plug-in "apps") written in JavaScript, HTML, and CSS, while preventing XSS and similar attacks. The site owner specifies restrictions on the allowable JavaScript, and users/developers must adhere to them. Yahoo's documentation of how it uses Caja, including the JavaScript restrictions it imposes, is here: http://developer.yahoo.com/yap/guide/caja-support.html

Caja works by "transforming ordinary HTML and JavaScript into a restricted form of JavaScript". See the Yahoo link for details and an example of what the rewritten JavaScript looks like.

_JS Function Blacklist_

Our initial intention in looking at Caja was to find the list of functions that Caja strips, so that we could strip or neutralize the same functions in the sites we load. For example, these documents list JavaScript functions along with guidelines for how each is handled:
http://google-caja.googlecode.com/svn/trunk/doc/domado/basicTaming.csv
http://google-caja.googlecode.com/svn/trunk/doc/domado/tamingIntroAppendix.html

However, the list isn't really usable for our purposes. Some very commonly used functions (like getElementById) are "ShutOff", and many properties are "readOnly", which we probably can't implement. Caja rewrites all JavaScript into an object-capability model, so its approach to the language is very different from the one we would take.
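For illustration only, here is roughly what the strip/neutralize idea would have looked like in the proxy: a minimal Python sketch, assuming the proxy can inject a script at the top of each page's `<head>`. The helper names (`build_neutralizer_shim`, `inject_shim`) and any blacklist entries passed to them are hypothetical, not taken from Caja's list. The sketch only covers switching a function off by overwriting it with a no-op. It does nothing for properties that would need to be made read-only, which is the part we probably can't implement.

```python
import re

# A dotted JavaScript member path such as "window.open". Anything else is
# rejected so that a blacklist entry cannot smuggle arbitrary script text.
_JS_PATH = re.compile(r"[A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)*")


def build_neutralizer_shim(blacklist):
    """Return a <script> element that overwrites each listed function with a no-op."""
    lines = []
    for path in blacklist:
        if not _JS_PATH.fullmatch(path):
            raise ValueError("not a dotted JavaScript path: %r" % (path,))
        # try/catch so that a missing object (e.g. "foo.bar" when foo is
        # undefined) or a throwing setter doesn't stop the remaining lines.
        lines.append("try { %s = function () {}; } catch (e) {}" % path)
    return "<script>\n" + "\n".join(lines) + "\n</script>"


def inject_shim(html, blacklist):
    """Insert the shim right after the opening <head> tag (hypothetical helper)."""
    shim = build_neutralizer_shim(blacklist)
    match = re.search(r"<head\b[^>]*>", html, flags=re.IGNORECASE)
    if match is None:
        # No <head> tag found: prepend as a crude fallback.
        return shim + html
    return html[:match.end()] + shim + html[match.end():]
```

For example, `inject_shim(page_html, ["window.open", "document.write"])` returns the page with a shim that replaces both functions with no-ops. Because the shim is placed first in `<head>`, it runs before scripts that appear later in the document.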

_Direct Use of Caja_

Using Caja directly in Psiphon could be investigated, although trying it will take a lot of work and there's a good chance it won't be fruitful. The model Caja was originally designed for (sites like Yahoo and Facebook that can dictate language restrictions to developers, and only for the user-content subsections of those sites) is very different from the environment Psiphon needs to operate in.

For example, Caja requires all JavaScript to be inline, whereas web pages normally load much of their script from external .js files. Perhaps we could fetch each external .js file separately and inline it. We might also have to rewrite some JavaScript before handing it off to Caja.
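The inlining step could look something like this minimal Python sketch, assuming the proxy has the page HTML and some way to fetch other URLs. `inline_external_scripts` and its `fetch` callback are hypothetical names, and the regex-based matching is a simplification (a real implementation should use an HTML parser). Note also that for classic scripts, `async` and `defer` have no effect once the script is inline, so execution order can change.

```python
import re
from typing import Callable, Optional
from urllib.parse import urljoin

# Matches <script ... src="..." ...></script> with an empty body. A regex is a
# simplification; a production proxy would use a real HTML parser.
_EXTERNAL_SCRIPT = re.compile(
    r"""<script\b([^>]*?)(?<![\w-])src\s*=\s*["']([^"']+)["']([^>]*)>\s*</script\s*>""",
    re.IGNORECASE,
)


def inline_external_scripts(html: str, page_url: str,
                            fetch: Callable[[str], Optional[str]]) -> str:
    """Replace each external <script src> with an inline <script> holding its source.

    `fetch` is a hypothetical callback that returns a script's text, or None
    if it cannot be retrieved (in which case the original tag is left alone).
    """
    def replace(match: "re.Match[str]") -> str:
        # Resolve relative src values against the page's own URL.
        source = fetch(urljoin(page_url, match.group(2)))
        if source is None:
            return match.group(0)
        # A literal "</script" inside the code would close the inline element early.
        source = re.sub(r"</(script)", r"<\\/\1", source, flags=re.IGNORECASE)
        # Keep the tag's other attributes (e.g. type), dropping only src.
        attrs = (match.group(1) + match.group(3)).strip()
        open_tag = "<script" + (" " + attrs if attrs else "") + ">"
        return open_tag + source + "</script>"

    return _EXTERNAL_SCRIPT.sub(replace, html)
```

The resulting inline-only HTML is what would then be handed to Caja (or to any JavaScript rewriting we do first).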

One mitigating factor is that we don't necessarily need all JavaScript to work perfectly all the time. We just need it to work well enough to make sites functional.