news download crashes with TypeError: '<' not supported between instances of 'float' and 'str'

Bug #1956097 reported by Rom Bualdo
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
calibre   Fix Released   Undecided    Unassigned

Bug Description

Hi there. Many news recipes for Argentina have been broken for a while. Without any programming knowledge, I managed to make one of them ("Clarín") work by simply removing the pre- and postprocessing lines. It worked (I published the updated recipe somewhere else, but I can't remember where), but only for a few days. Now I'm getting a different error, which I can't deal with myself. The error seems to be the one in the title:
TypeError: '<' not supported between instances of 'float' and 'str'
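
For reference, this is the message Python 3 raises whenever a float is ordered against a string. The snippet below is only a minimal sketch of that class of error, not calibre's code; the `compare` helper and the sample values are made up for illustration.

```python
def compare(a, b):
    """Return a < b, or the TypeError message if the operands cannot be ordered."""
    try:
        return a < b
    except TypeError as err:
        return str(err)


# A float compared with a string: Python 3 refuses to order them.
print(compare(1.5, "2.0"))

# Converting the string to a float first makes the comparison valid.
print(compare(1.5, float("2.0")))
```

In general, this kind of error goes away once both values are converted to the same type before they are compared.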

I have updated calibre to the latest version (5.34) and am running it on Windows 10.

So, I'll copy the modified recipe below and attach the crash report as a txt file.
Thanks to all of you who make Calibre and keep it working.

UPDATED RECIPE

#!/usr/bin/env python
# -*- mode: python -*-
# -*- coding: utf-8 -*-

from __future__ import unicode_literals
__license__ = 'GPL v3'
__copyright__ = '2008-2016, Darko Miletic <darko.miletic at gmail.com>'
'''
clarin.com
'''

try:
    from urllib.parse import urlencode
except ImportError:
    from urllib import urlencode
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class Clarin(BasicNewsRecipe):
    title = 'Clarín'
    __author__ = 'Darko Miletic, updated by GGsalas'
    description = 'Clarin.com. Noticias de la Argentina y el mundo. Información actualizada las 24 horas y en español. Informate ya'
    publisher = 'Grupo Clarin'
    category = 'news, politics, Argentina'
    oldest_article = 1
    max_articles_per_feed = 100
    use_embedded_content = False
    no_stylesheets = True
    encoding = 'utf8'
    delay = 1
    language = 'es_AR'
    publication_type = 'newspaper'
    needs_subscription = 'optional'
    INDEX = 'http://www.clarin.com'
    LOGIN = 'https://app-pase.clarin.com/pase-registracion/app/pase/ingresarNavegable?execution=e1s1'
    masthead_url = 'http://www.clarin.com/images/logo_clarin.svg'
    cover_url = strftime('http://tapas.clarin.com/tapa/%Y/%m/%d/%Y%m%d_thumb.jpg')

    compress_news_images = True
    scale_news_images_to_device = True
    compress_news_images_max_size = 10 # kB
    scale_news_images = True
    handle_gzip = True

    # To get all the data (images)
    auto_cleanup = False

    extra_css = """
      h1#title {
        line-height: 1em;
        margin: 0 0 .5em 0;
      }
      p.volanta {
        font-size: .7em;
        margin-bottom: .5em;
      }
      .bajada h2 {
        font-size: 1em;
        line-height: 1em;
        color: #666666;
        margin: 0 0 1em 0;
      }
      .figcaption {
        font-style: italic;
        font-size: .9em;
        margin-bottom: .5em;
      }
    """

    conversion_options = {
      'comment': description, 'tags': category, 'publisher': publisher, 'language': language
    }

    keep_only_tags = [
      dict(name='p' , attrs={'class' : 'volanta'}),
      dict(name='h1' , attrs={'id': 'title'}),
      dict(name='div', attrs={'class' : 'bajada'}),
      dict(name='div', attrs={'id' : 'galeria-trigger'}),
      dict(name='div', attrs={'class' : 'body-nota'})

    ]

    remove_tags = [
        dict(name=['meta', 'base', 'link', 'iframe', 'embed', 'object']),
        dict(attrs={'class': ['tags-bar', 'breadcrumb', 'share-bar', 'share', 'sp__SM']}),
        dict(name='div', attrs={'class': lambda x: x and 'r-nota' in x.split()}),
        dict(attrs={'id': ['relacionadas']}),
        dict(name='a', attrs={'class':'content-new'})
    ]

    remove_tags_after = dict(name='div', attrs={'id': 'relacionadas'})

    remove_attributes = ['lang']

    # Images on highlights view
    def populate_article_metadata(self, article, soup, first):
        if first and hasattr(self, 'add_toc_thumbnail'):
            picdiv = soup.find('img')
            if picdiv is not None:
                self.add_toc_thumbnail(article, picdiv['src'])

    feeds = [
      (u'Lo Ultimo', u'http://www.clarin.com/rss/lo-ultimo/'),
      (u'Politica', u'http://www.clarin.com/rss/politica/'),
      (u'Opinion', u'https://www.clarin.com/rss/opinion/'),
      (u'Cultura', u'https://www.clarin.com/rss/cultura/'),
      (u'Economia', u'https://www.clarin.com/rss/economia/'),
      (u'Tecnologia', u'https://www.clarin.com/rss/tecnologia/'),
      (u'RevistaN', u'https://www.clarin.com/rss/revista-enie/'),
      (u'Viva', u'https://www.clarin.com/rss/viva/'),
      (u'Deportes', u'http://www.clarin.com/rss/deportes/'),
      (u'Mundo', u'http://www.clarin.com/rss/mundo/'),
      (u'Espectaculos', u'http://www.clarin.com/rss/espectaculos/'),
      (u'Sociedad', u'http://www.clarin.com/rss/sociedad/'),
      (u'Ciudades', u'http://www.clarin.com/rss/ciudades/'),
      (u'Policiales', u'http://www.clarin.com/rss/policiales/'),
      (u'Internet', u'http://www.clarin.com/rss/internet/')
    ]

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        br.open(self.INDEX)
        return br
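
As an aside on the `remove_tags` entry that uses a lambda: the sketch below, runnable without calibre, shows how that predicate behaves on a few `class` attribute values. The sample strings are invented for illustration, and the sketch assumes the matcher receives the attribute value as a plain string or `None`.

```python
def has_r_nota(value):
    """Same test as the lambda in remove_tags: true when 'r-nota' is one of
    the whitespace-separated class names in the attribute value."""
    return bool(value and 'r-nota' in value.split())


# Hypothetical class attribute values, including a missing attribute (None).
samples = ['r-nota', 'col r-nota wide', 'r-nota-extra', '', None]
for value in samples:
    print(repr(value), '->', has_r_nota(value))
```

The `x and ...` guard matters because tags without a `class` attribute would otherwise trigger an AttributeError on `None.split()`.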

Rom Bualdo (rombualdo) wrote :
Rom Bualdo (rombualdo)
description: updated
Kovid Goyal (kovid) wrote : Fixed in master

Fixed in branch master. The fix will be in the next release. calibre is usually released every alternate Friday.

 status fixreleased

Changed in calibre:
status: New → Fix Released
Rom Bualdo (rombualdo) wrote :

Hey, thanks for that quick fix!
I wanted to add that, unexpectedly, two days later I was able to download this recipe again. It's impossible to know what happened in between. However, I'm sure you repaired something that also needed repairing, so it's good news anyway.

Thanks again for keeping this alive.
