Ibid

[url] Don't try and build an etree out of non-html

Bug #454768 reported by Stefano Rivera on 2009-10-18

Affects  Status        Importance  Assigned to     Milestone
Ibid     Fix Released  High        Stefano Rivera  Ibid 0.1.0 "Hazel"

Bug Description

Digging through the memory logs shows huge spikes in the number of "Element" objects, each accompanied by a large memory allocation.

These spikes correlate strongly with the presence of .jpg URLs in the logs.

Probable culprit:
    def _get_title(self, url):
        "Gets the title of a page"
        try:
            headers = {'User-Agent': 'Mozilla/5.0'}
            etree = get_html_parse_tree(url, None, headers, 'etree')
            title = etree.findtext('head/title')
            return title
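
One possible way to avoid building a tree for non-HTML resources, as the bug title suggests, is to inspect the Content-Type header before parsing. The following is only a minimal sketch in Python 3 using the standard library, not Ibid's actual fix: the helper names is_html and content_type_of, the HTML_TYPES set, and the use of a HEAD request are all assumptions made for illustration.

```python
from urllib.request import Request, urlopen

# Media types treated as HTML here (an assumption; adjust as needed).
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def is_html(content_type):
    """Return True if a Content-Type header value names an HTML media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in HTML_TYPES


def content_type_of(url, opener=urlopen):
    """Fetch only the headers of url and return its Content-Type, if any.

    opener is injectable so the function can be exercised without a network.
    """
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"}, method="HEAD")
    with opener(request) as response:
        return response.headers.get("Content-Type")
```

A caller such as _get_title could then return early when is_html(content_type_of(url)) is false, so that a .jpg URL never reaches the HTML parser. Some servers do not answer HEAD requests correctly, so checking the header on the GET response before reading the body may be the more robust variant.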