caching for file and image assets

Bug #100620 reported by Martijn Faassen on 2003-07-04
Affects: Silva
  Importance: High
  Assigned to: Sylvain Viollon
  Nominated for 1.6, 2.0 and 2.1 by Gerry C.
Affects: 2.3
  Importance: High
  Assigned to: Sylvain Viollon

Bug Description

It should be possible to at least set HTTP cache headers for File and
Image assets in Silva, perhaps from the UI. It would also be useful
to explore how to integrate Zope's RAM caching with Silva.

Martijn Faassen (faassen) wrote :

Godefroid, could you look into this one?

Godefroid Chapelle (gotcha) wrote :

Which user is targeted by setting the cache headers?

IOW, do we just want a string field to input the value of the Cache-Control
header, or do we want an interface that builds it from multiple controls?

Martijn Faassen (faassen) wrote :

Hm, this is just a preliminary investigation issue. I'm not entirely sure how
this is supposed to work. We would like to enable the use of various caching
headers, and I imagine some setting on the file object that can be done by
the author (or perhaps only editor and up?) would be useful.

Another option would instead be to use a policy per publication, to be set
by the (chief) editor.

Yet another option would be for such a policy to be set only by the manager. I
wouldn't mind if only managers could set cache policies.

This issue just needs to be explored. Someone was asking
about it on a mailing list, but I cannot quickly find
which list it was on now. :)

Martijn Faassen (faassen) wrote :

Moving along to 0.9.4.

Kit Blake (kitblake) wrote :

When checking the cacheability of several Silva sites, it seems Silva images don't validate properly for checking expiration. Images in Zope do validate properly, but those in Silva get a "validation returned same object" warning. Using LiveHTTPHeaders, this does seem to be the case. Instead of a 304, the whole image is returned. This should be corrected.

Use:
http://www.ircache.net/cgi-bin/cacheability.py
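
To make the expected validation behaviour concrete, here is a minimal sketch of a conditional GET check in plain Python. It is not Silva or Zope code: the function name `conditional_response` and its arguments are hypothetical, and it only illustrates how an asset could answer 304 instead of resending the whole image when the client's If-Modified-Since date is not older than the asset.

```python
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def conditional_response(last_modified_ts, if_modified_since=None):
    """Return (status, headers) for an asset last modified at a Unix time.

    Hypothetical helper for illustration; not part of Silva or Zope.
    """
    headers = {"Last-Modified": formatdate(last_modified_ts, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparsable header: fall through to a full response
        if since is not None:
            if since.tzinfo is None:
                # Treat a date without zone information as UTC.
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so compare whole seconds.
            if int(last_modified_ts) <= int(since.timestamp()):
                return 304, headers
    return 200, headers


if __name__ == "__main__":
    status, headers = conditional_response(1000000000)
    # A client that replays the Last-Modified value should get a 304 back.
    print(status, conditional_response(1000000000, headers["Last-Modified"])[0])
```

A cacheability checker of the kind linked above would be expected to see the 304 on the second request.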

Changed in silva:
importance: Low → High

Flynt (flyntle) wrote :

I am pleasantly surprised that this thread has been taken up again. The thread is from 2003, and we have actually been complaining about the problems with caching in Silva again and again, including later in 2005 ( https://issues.ethz.ch/silva/0366 ). I tried to connect Martijn and Bengt at that time, but that ended in ideological fighting and brought no practical remedy. There IS a big problem with caching in Silva, meaning that it would be nice to get at least suitable headers and suitable behavior towards a reverse caching proxy.

What is astonishing is that the link is given as a new revelation: it was made known to Infrae years ago during the arguments, e.g. in
https://issues.ethz.ch/silva/0366 from 2005
(see the link there to http://www.web-caching.com/mnot_tutorial/how.html, from where you get to http://www.mnot.net/cacheability/, where you also find the link to http://www.ircache.net/cgi-bin/cacheability.py). We already used that cacheability checker, among others, in 2005 to tell Infrae our findings. We have actually installed one here at ETH, too.

As there was not really a reaction to our complaints then, we thought: well, apparently not helpful enough to do something about. However, it now seems to me that things were never really checked.

So let's hope that the subject is now taken seriously at last. Looking forward to the results,

Greetings, Flint

Changed in silva:
assignee: nobody → aaltepet

Andy Altepeter (aaltepet) wrote :

The solution for: https://bugs.edge.launchpad.net/silva/+bug/100765
I'm thinking just setting up Silva Files and Images to be tied to a 'service_asset_cache_manager' would be mostly sufficient. Policies could be managed "per publication" by placing a new "service_asset_cache_managers" in that publication.

Or, this could be managed by metadata...a metadata list field with a TALES default that lists all cache managers available up the tree. The default selected would be the asset cache mgr.
Pros:
i) can still set policies per container
ii) has fine-grained "per-asset" capabilities, given the policies are defined
Cons:
x) Whenever the metadata set is rendered this TALES will walk up the tree looking for cache managers
xx) This walkup includes Silva xml exports (what are the ramifications of this?), and Silva xml exports include public rendering of content.
I think this Con is true, but it would need to be verified.

Eric Casteleijn (thisfred) wrote :

re: the cons:

We could look, in the trunk, at changing the way services are looked up. I think recent versions of Five would let us do this in the Zope 3 way, looking them up by interface. (not 100% sure) This would be an improvement over acquisition based look up, but it would require some changes to the core and most extensions, so if we want to do this, most likely we would need to do it in the context of a project that benefits from it.

xx I don't really understand. You mean the tales expression is evaluated during export, and whether that is a problem? I should be able to test this relatively easily, since I probably have the most intimate knowledge of the xml export/import system, so if you have an idea of how to implement this part of the behavior for testing on a branch without spending too much time, I suggest you do it and let me know, and I can look at possible implications.

Andy Altepeter (aaltepet) wrote :

OK, I've implemented caching support for Silva (using Zope Cache Managers)
There is an accelerated http cache manager added on silva install (or refresh),
called service_asset_cache_manager. This caching policy can be set per
container (and inherited by objects within that container), or set by
Silva objects implementing caching support (currently files and images). The
policies are set in the properties tab.

Since not all Silva objects implement caching support, I've created a new
metadata set "silva-caching", with one field "cache_manager". This set is
now mapped to Folder, Root, Publication, File and Image. I created a cacheable
mixin class that SilvaObject now extends from, which extends from Cacheable and
contains two helper methods:
validate_cache_manager:
 This needs to be called before the cache manager is resolved when
 rendering the content, so that the setting from the metadata can be
 applied. Changes to the cache_manager property ARE NOT applied until
 this method is called!
get_cache_managers:
 This method returns the values for the cache_manager property.

I would have preferred not extending SilvaObject with another mixin class,
especially when not all SilvaObjects take advantage of this. Here are some
reasons why this was necessary:
1) In order to specify container policies, containers need to have the
   silva-caching metadata set. This set _needs_ a helper method in order
   to populate the fields. Adding this method to a mixin and using it
   to extend the base SilvaObject provides this functionality TO ALL containers
   without having to alter the definitions for each container.
2) Putting this support in SilvaObject provides a path for any Silva object
   to easily support caching...
Had this not been used to extend SilvaObject, each class that is either a
container which could contain cacheable objects, or an object supporting caching
would need to extend this mixin class.

So, how to use:
1) You can specify a cache manager for a container, and this policy
   is then acquired / inherited by all contained objects
2) For an object supporting caching (i.e. one having the silva-caching
   metadata set): if no manager is set for the object, it inherits the
   manager setting from the container. It can specify not to use
   caching, or it can specify its own cache manager.
3) Since the cache manager is inherited, you can "override" a policy
   for a cache manager for a container by adding a cache manager with
   the same id in that container.
   E.g. The default "file and image cm" is added in the Silva root with
   id "service_asset_cache_manager". You could set this cm in the Silva root,
   so it is acquired by all objects implementing caching. In a sub-folder,
   if you want a different cache manager, or different settings, you can
   add a cache manager with the same id, and objects within that container
   will use this nearer cache manager.
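
The "nearest manager with the same id wins" rule in point 3 can be illustrated with a small sketch. This models the container tree with plain Python objects rather than Zope acquisition; the `Folder` class and `find_cache_manager` function are hypothetical stand-ins, and only the id `service_asset_cache_manager` comes from the description above.

```python
SERVICE_ID = "service_asset_cache_manager"


class Folder:
    """Hypothetical stand-in for a Silva container (not the real class)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.objects = {}  # id -> object stored directly in this folder


def find_cache_manager(folder, manager_id=SERVICE_ID):
    """Walk up the tree and return the nearest object with manager_id."""
    node = folder
    while node is not None:
        if manager_id in node.objects:
            return node.objects[manager_id]
        node = node.parent
    return None  # nothing found anywhere up the tree


if __name__ == "__main__":
    root = Folder("root")
    root.objects[SERVICE_ID] = "root policy"
    news = Folder("news", parent=root)
    news.objects[SERVICE_ID] = "news policy"  # overrides the root policy
    docs = Folder("docs", parent=root)        # no override, inherits
    print(find_cache_manager(news), "/", find_cache_manager(docs))
```

In real Zope the walk-up is done by acquisition, so this loop only mirrors the lookup order, not the mechanism.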

Eric: I tested the silva export, and it does not lookup the cache managers
      during the export, so this is no longer a 'con'

Also: there was mention in this issue about allowing only Editor+ to change
the cache polic...


Changed in silva:
status: Confirmed → In Progress

Eric Casteleijn (thisfred) wrote :

Jasper: I've done some initial testing with this branch, and couldn't find anything wrong with it :) Can you maybe test this with the set-up you created for WUW, and see whether it would solve their problems?

Changed in silva:
assignee: aaltepet → jasper-infrae

Andy Altepeter (aaltepet) wrote :

Also added some functional tests for this, test_asset_caching

Andy Altepeter (aaltepet) wrote :

Kit says it's possible to make metadata sets appear in the settings sub-tab. I wonder if this caching metadata set should appear there? This isn't content metadata...

Kit Blake (kitblake) wrote :

Sounds like the caching settings should be in the settings screen. Add <category>layout</category> to the set xml.
Jasper, this is assigned to you for a test drive.

Andy Altepeter (aaltepet) wrote :

I changed the set category to layout, this now appears in the settings tab of images, files, and containers.

It occurs to me that:
1) we do want caching for files and images if the user is authenticated and (publicly) viewing the site
BUT
2) we don't want caching for images (in particular) if you're in the SMI. E.g. if you are editing a Silva Document and have the web format of an image in the document, the web format will be cached in the browser by the cache manager settings.

This case may need to be resolved prior to the merge into trunk.

Wow, nice work Andy. This really works great.
When testing the branch I did find it a bit confusing that the Silva root object is set to
'acquire cache settings', since there is nothing to acquire from. Maybe this could be changed into 'no caching' to make it a bit more explicit.
Also, when you create an accelerator with the same id in a sub folder, the title is shown, but if there is no title you end up with an empty list item. Maybe it could show the id, if no title was found.
But these are just minor annoyances. I think this is a very nice addition to Silva.

Andy Altepeter (aaltepet) wrote :

Good suggestion Jasper. When silva is installed (or refreshed), the silva root's cache policy is set to "No Caching". Also, Cacheable.get_cache_managers will use the cache manager's ID if the Title==None.

What do you think about this comment:
https://bugs.launchpad.net/silva/+bug/100620/comments/15
?

I think that a new cache manager object would need to be created to support this, like an AcceleratedSilvaHTTPCacheManager.

Changed in silva:
assignee: jasper-infrae → aaltepet

Andy Altepeter (aaltepet) wrote :

I have thought of an even better way to handle this, and fully from the SMI! First, I implemented a SilvaHTTPCacheManager, to change the behavior of the AcceleratedHTTPCacheManager so that authenticated requests to the public view are cached, except when the HTTP_REFERER is the SMI.
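
A rough sketch of the referer check described above, in plain Python. It is not the actual SilvaHTTPCacheManager code: the function name is hypothetical, and it assumes that SMI URLs can be recognised by an `edit` path segment, which is an assumption made for this example.

```python
from urllib.parse import urlparse


def should_send_cache_headers(environ, smi_segment="edit"):
    """Return True when cache headers may be sent for this request.

    environ is a CGI/WSGI-style dict. smi_segment is an ASSUMED marker for
    SMI URLs; adjust it to however the SMI is actually addressed.
    """
    referer = environ.get("HTTP_REFERER", "")
    # Compare whole path segments so "/editorial" is not mistaken for the SMI.
    segments = urlparse(referer).path.split("/")
    return smi_segment not in segments


if __name__ == "__main__":
    public = {"HTTP_REFERER": "http://example.org/site/doc"}
    smi = {"HTTP_REFERER": "http://example.org/site/doc/edit/tab_edit"}
    print(should_send_cache_headers(public), should_send_cache_headers(smi))
```

Note that the Referer header is optional and can be suppressed by the client, so a check like this is best treated as a convenience for editors rather than a guarantee.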

I think I'm going to try adding a blueprint for this, as it may be more ambitious than 2.1. My idea is as follows:

The silva-caching metadata set shouldn't make associations with Cache Managers, where the cache policy is set through the ZMI. Instead, the silva-caching set should contain properties for the cache policy and there should be one http_cache_manager for _all_ Silva objects.

It should be possible to have a cache policy for each base Silva interface:
ISilvaObject (e.g. a 'global' policy)
IContainer
IAsset
IContent
IVersionedContent

The cache policy for each interface is the number of seconds to cache (-1 disables caching).
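
As an illustration of the per-interface policy, here is a minimal sketch that resolves a Cache-Control value from "seconds to cache" settings. Plain classes stand in for the Silva interfaces, the inheritance between them is assumed for the example, and `cache_control_for` is a hypothetical name. The most specific class that has a policy wins, and -1 disables caching.

```python
class SilvaObject:            # stands in for ISilvaObject (the 'global' policy)
    pass


class Container(SilvaObject):  # stands in for IContainer
    pass


class Asset(SilvaObject):      # stands in for IAsset
    pass


class Content(SilvaObject):    # stands in for IContent
    pass


class VersionedContent(Content):  # stands in for IVersionedContent (assumed)
    pass


def cache_control_for(obj, policies):
    """Return a Cache-Control value, or None when caching is disabled.

    policies maps a class to the number of seconds to cache; -1 disables.
    """
    # The method resolution order lists the most specific class first.
    for klass in type(obj).__mro__:
        if klass in policies:
            seconds = policies[klass]
            if seconds < 0:
                return None
            return "max-age=%d" % seconds
    return None  # no policy configured at all


if __name__ == "__main__":
    policies = {SilvaObject: 60, Asset: 3600, VersionedContent: -1}
    print(cache_control_for(Asset(), policies))
    print(cache_control_for(VersionedContent(), policies))
```

What header, if any, to send when caching is disabled is left open here; the sketch simply returns None.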

Additionally, there will be a "Notify URLs (via PURGE)" property, to support purging intermediate (e.g. Squid) caching proxies.

The custom Silva HTTP Cache Manager I wrote will be adjusted to look at the metadata settings for the appropriate interface.

There is still the problem that some types of content (especially versioned content, e.g. SilvaDocuments) shouldn't be cached at all. An example is a SilvaDocument with a dynamic code source. There is already an 'is_cacheable' method defined on ISilvaObject, but this method's semantics seem to indicate 'in-memory' caching. HTTP caching is different, in that the cache isn't stored in memory on the server (to reduce rendering time), but the result is stored in a user-agent's cache or a caching proxy.

I propose a new method is_http_cacheable be added to ISilvaObject. This method could by default fall back to returning the value of is_cacheable. But this isn't always what we want: SilvaDocuments with non-cacheable external sources perhaps should be cacheable in the browser. I think more thought will need to go into this part.
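
The proposed fallback could look roughly like the sketch below. The class names are hypothetical and this is not the Silva implementation; it only shows the default delegating to is_cacheable and a subclass overriding the HTTP answer independently.

```python
class SilvaObjectSketch:
    """Hypothetical base class; only the two method names come from the proposal."""

    def is_cacheable(self):
        # In-memory (server-side) cacheability.
        return True

    def is_http_cacheable(self):
        # Default: fall back to the in-memory answer.
        return self.is_cacheable()


class DocumentWithDynamicSource(SilvaObjectSketch):
    def is_cacheable(self):
        return False  # rendering depends on a dynamic code source


class BrowserCacheableDocument(DocumentWithDynamicSource):
    def is_http_cacheable(self):
        # Overrides the fallback: not cacheable in memory, yet allowed in
        # the browser or a caching proxy.
        return True


if __name__ == "__main__":
    for cls in (SilvaObjectSketch, DocumentWithDynamicSource, BrowserCacheableDocument):
        obj = cls()
        print(cls.__name__, obj.is_cacheable(), obj.is_http_cacheable())
```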

So, I'm going to add the above as a blueprint, and after I merge the Silva-File-Image-Caching-100620 back into the Trunk, I will mark this as 'fix released' (for Silva 2.1)

Andy Altepeter (aaltepet) wrote :

On second thought, I think any merges back into trunk should wait until the blueprint is implemented (in this branch). Since the silva-content metadata will be changed (removing the cache-manager), I don't know how well the Silva Metadata system could handle it. So it's probably best to wait until a bigger component of the blueprint is implemented, at least to the extent that the notify URLs (via PURGE) and the ISilvaObject/IAsset settings work (so this particular 'bug', file and image assets, will be addressed).

Sylvain Viollon (thefunny) wrote :

Images and Files (assets) now have correct HTTP cache headers, and properly implement If-Modified-Since. That is tested.

Changed in silva:
status: Fix Committed → Fix Released