[url] Don't try and build an etree out of non-html

Bug #454768 reported by Stefano Rivera
This bug affects 1 person
Affects: Ibid
Status: Fix Released
Importance: High
Assigned to: Stefano Rivera

Bug Description

Digging through the memory logs shows huge spikes in memory allocation caused by "Element" objects.

These spikes correlate strongly with the presence of .jpg URLs in the logs.
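The report does not say which tool produced the memory logs. As a minimal sketch of one way to take that kind of census with only the standard library, the snippet below counts live, GC-tracked objects by type name before and after some piece of work and reports which types grew the most. The names `snapshot` and `growth` are hypothetical.

```python
import gc
from collections import Counter


def snapshot():
    """Count live objects tracked by the garbage collector, keyed by type name."""
    gc.collect()  # drop unreachable cycles so only live objects are counted
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=5):
    """Return the type names whose live-object count grew the most."""
    diff = after - before  # Counter subtraction keeps only positive differences
    return diff.most_common(top)
```

Taking a `snapshot()` before and after handling a suspect .jpg URL and passing both to `growth()` would show whether `Element` dominates the increase.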

Probable culprit:
    def _get_title(self, url):
        "Gets the title of a page"
        try:
            headers = {'User-Agent': 'Mozilla/5.0'}
            # Builds a full etree for any URL, even when the response is not HTML (e.g. a .jpg)
            etree = get_html_parse_tree(url, None, headers, 'etree')
            title = etree.findtext('head/title')
            return title
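The title of this report names the fix: only build a tree when the response is actually HTML. A minimal sketch of that idea, assuming a plain `urllib.request` fetch rather than Ibid's own `get_html_parse_tree` helper, checks the `Content-Type` header before reading the body and returns `None` for anything else (such as `image/jpeg`). It also swaps the full etree for the standard library's streaming `html.parser.HTMLParser`, keeping only the `<title>` text, and caps the number of bytes read. The names `is_html`, `extract_title` and `get_title`, and the 64 KiB cap, are illustrative assumptions, not Ibid's actual change.

```python
import urllib.request
from html.parser import HTMLParser

# MIME types treated as HTML (assumption: XHTML is also worth parsing).
HTML_TYPES = {'text/html', 'application/xhtml+xml'}


def is_html(content_type):
    """Return True if a Content-Type header value denotes an HTML document."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    mime = content_type.split(';', 1)[0].strip().lower()
    return mime in HTML_TYPES


class _TitleParser(HTMLParser):
    """Streams through markup, keeping only the first <title> text; no tree is built."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        if tag == 'title' and self.title is None:
            self._in_title = True
            self.title = ''

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html_text):
    """Return the stripped text of the first <title> element, or None."""
    parser = _TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title.strip() if parser.title is not None else None


def get_title(url, max_bytes=64 * 1024):
    """Fetch url and return its page title, skipping non-HTML responses."""
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(req, timeout=10) as response:
        if not is_html(response.headers.get('Content-Type')):
            return None  # e.g. image/jpeg: never read or parse the body
        charset = response.headers.get_content_charset() or 'utf-8'
        body = response.read(max_bytes)  # cap how much of a huge page is read
    try:
        text = body.decode(charset, errors='replace')
    except LookupError:  # unknown charset name in the header
        text = body.decode('utf-8', errors='replace')
    return extract_title(text)
```

Checking the header first means an image is rejected before any of its bytes are read, so no `Element` objects are ever created for it.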

Changed in ibid:
importance: Undecided → High
Changed in ibid:
status: In Progress → Fix Released