Web monitor should use modified headers

Bug #202243 reported by beerfan
Affects          Status   Importance  Assigned to  Milestone
specto (Ubuntu)  Invalid  Undecided   Unassigned

Bug Description

Binary package hint: specto

I'm using Specto 0.2.2 in Ubuntu Gutsy.

According to the Specto docs on Google, the web monitor uses file size (presumably the "Content-Length" HTTP header, though perhaps it checks the string length internally) to decide whether a page has changed. Ideally, the "Last-Modified" HTTP header should be used instead. A page that contains, for example, a table of data whose values change but whose size does not may keep the same content length, yet the server should still return it with an updated "Last-Modified" header.
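
To illustrate the point, here is a minimal sketch (not Specto code) comparing a size-only check with a "Last-Modified" comparison. The function names, page bodies, and timestamps are all made up for the example.

```python
from email.utils import parsedate_to_datetime


def changed_by_size(old_body, new_body):
    """Size-only check: reports a change only when the byte length differs."""
    return len(old_body) != len(new_body)


def changed_by_last_modified(old_value, new_value):
    """Compare two Last-Modified header values as dates, not as strings."""
    return parsedate_to_datetime(new_value) > parsedate_to_datetime(old_value)


# Hypothetical page bodies: one table value changes, the length does not.
old_page = b"<table><tr><td>price</td><td>100</td></tr></table>"
new_page = b"<table><tr><td>price</td><td>250</td></tr></table>"

# Hypothetical Last-Modified values returned with each version of the page.
old_stamp = "Sat, 15 Mar 2008 10:00:00 GMT"
new_stamp = "Sat, 15 Mar 2008 10:05:00 GMT"

print(changed_by_size(old_page, new_page))             # False: same length
print(changed_by_last_modified(old_stamp, new_stamp))  # True: newer timestamp
```

The size check misses the update, while the header comparison catches it, provided the server sends an accurate "Last-Modified" value.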

Further, it is recommended to use the "If-Modified-Since" header to minimize data transfer. Use of the ETag header may be necessary for cases involving proxies and caching servers.
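
As a rough sketch of what such a conditional request could look like with Python 3's standard library (this is not Specto's code; `build_conditional_request`, `fetch_if_changed`, and the lower-cased header dictionary are assumptions made for the example):

```python
import urllib.error
import urllib.request


def build_conditional_request(url, cached_headers):
    """Build a GET request carrying validators from a previous response.

    cached_headers is assumed to be a dict with lower-cased header names,
    as returned by fetch_if_changed() below.
    """
    request = urllib.request.Request(url)
    last_modified = cached_headers.get("last-modified")
    etag = cached_headers.get("etag")
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    if etag:
        request.add_header("If-None-Match", etag)
    return request


def fetch_if_changed(url, cached_headers, timeout=10):
    """Return (changed, body, headers); body is None when nothing changed."""
    request = build_conditional_request(url, cached_headers)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            headers = {k.lower(): v for k, v in response.headers.items()}
            return True, response.read(), headers
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # 304 Not Modified: the server sent no body, keep the cache.
            return False, None, cached_headers
        raise
```

On the first poll, `cached_headers` would be an empty dict, so a plain GET is sent. On later polls the headers saved from the previous response are passed back in, and a 304 reply means no page body had to be transferred.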

References:
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.29 (Last-Modified)
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.25 (If-Modified-Since)

http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.3.4 (Rules for When to Use Entity Tags and Last-Modified Dates)

Tags: web
Jeff Fortin Tam (kiddo) wrote :

Hi!
For what it's worth, Specto actually already uses this, I *think*. In the 0.2.2 series, if you look at the code in watch_web_static.py, you can see in lines 109-121:

        if (self.cached == 1) or (os.path.exists(self.cacheFullPath_)):
            self.cached = 1
            f = file(self.cacheFullPath_, "r")# Load up the cached version
            self.infoB_ = HTTPMessage(f)
            if self.infoB_.has_key('last-modified'):
                request.add_header("If-Modified-Since", self.infoB_['last-modified'])
            if self.infoB_.has_key('ETag'):
                request.add_header("If-None-Match", self.infoB_['ETag'])
        try:
            response = urllib2.urlopen(request)
        except (urllib2.URLError, BadStatusLine), e:
            self.error = True
            self.specto.logger.log(_("Watch: \"%s\" has error: ") % self.name + str(e), "error", self.__class__)

However,
- the code might not be elegant
- the code/logic might be wrong (after all, I took a long time doing it and I'm not sure I did it properly)

I don't know whether that conforms to your suggestions, or if you meant that
- some piece is missing?
- something is not working properly?

Also, I think ETag headers may not work properly with websites that serve advertising or other dynamic content, so, if I remember my own code correctly, the "error margin" (a difference percentage based on file sizes) would override them.
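
For readers unfamiliar with the idea, an error margin of this kind might look roughly like the sketch below. This is only an illustration of a size-based difference percentage; the function names and the threshold are hypothetical and are not taken from Specto's source.

```python
def size_difference_percent(old_size, new_size):
    """Percentage by which new_size differs from old_size."""
    if old_size == 0:
        # Avoid dividing by zero: an empty page that stays empty is unchanged.
        return 0.0 if new_size == 0 else 100.0
    return 100.0 * abs(new_size - old_size) / old_size


def exceeds_error_margin(old_size, new_size, margin_percent):
    """True when the size change is larger than the tolerated margin."""
    return size_difference_percent(old_size, new_size) > margin_percent


# A rotating advert that changes the page size by 2% stays under a 5% margin,
# so it would not be reported as an update.
print(exceeds_error_margin(1000, 1020, 5.0))  # False
print(exceeds_error_margin(1000, 1200, 5.0))  # True
```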

Daniel T Chen (crimsun) wrote :

Is this symptom still reproducible in 8.10 or 9.04?

Changed in specto:
status: New → Incomplete
Pedro Villavicencio (pedro) wrote :

We are closing this bug report because it lacks the information we need to investigate the problem, as described in the previous comments. Please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to New. Thanks again!

Changed in specto:
status: Incomplete → Invalid