Comics: add screenscraping (XPath) support as alternative to feeds

Bug #427054 reported by Mark Lee
This bug affects 1 person
Affects      Status        Importance  Assigned to   Milestone
Awn Extras   Fix Released  High        Gabor Karsay  0.4.2

Bug Description

I really, really want to get rid of the unmaintained comic applet, but to do so with as few complaints as possible, most, if not all, of the comics supported by the comic applet need to be supported by Comics! as well. Unfortunately, several of those comics do not have feeds and require screen scraping to retrieve the image.

I propose that, to keep roughly the same clean structure as the feed config files, the screen-scraping configuration files should contain a name, a URL, and an XPath expression pointing to the <img/> tag that holds the comic strip URL.
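A minimal sketch of what such a descriptor could look like and how it might be applied, assuming a simple "key = value" text format with hypothetical name, url and xpath keys (the actual Comics! config format may differ). It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath: expressions must be relative (for example .//div[@id='comic']/img rather than //div[@id='comic']/img), and the page must be well-formed markup. Real-world pages would likely need a tolerant HTML parser such as lxml instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical descriptor format: one "key = value" pair per line.
# Only illustrates the three proposed fields (name, URL, XPath).
SAMPLE_DESCRIPTOR = """\
name = Example Comic
url = http://example.com/comic/
xpath = .//div[@id='comic']/img
"""


def parse_descriptor(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values such as URLs with
        # query strings ("?a=b") stay intact.
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def find_comic_image(page, xpath):
    """Return the src attribute of the first element matched by xpath,
    or None if nothing matches.

    ElementTree understands only a subset of XPath and needs well-formed
    markup (e.g. XHTML); tag soup raises xml.etree.ElementTree.ParseError.
    """
    root = ET.fromstring(page)
    img = root.find(xpath)
    return None if img is None else img.get("src")


if __name__ == "__main__":
    page = """<html><body>
<div id="header"><img src="/logo.png"/></div>
<div id="comic"><img src="/strips/2009-09-09.png"/></div>
</body></html>"""
    desc = parse_descriptor(SAMPLE_DESCRIPTOR)
    print(desc["name"], find_comic_image(page, desc["xpath"]))
```

Pointing the XPath at the <img/> tag (rather than at the image URL itself) keeps the descriptor stable when the strip's file name changes every day.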

Tags: applet comics

Moses Palmér (mosespalmer) wrote: Re: [Bug 427054] [NEW] Comics: add screenscraping (XPath) support as alternative to feeds

Support for this has already been started.

The class feed.Feed is the abstract base class for feeds. Subclasses
need to implement the parse_file(self, file_name) method, where
file_name is the name of a local file containing a cached copy of the
data at the specified URL.
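To illustrate that contract, here is a minimal sketch of an abstract base class with an abstract parse_file(self, file_name) and a screen-scraping subclass. Only the class name Feed and the parse_file signature come from the description above; ScraperFeed, its xpath parameter and the image_url attribute are hypothetical, and the real feed.Feed certainly carries more state. The subclass assumes the cached page is well-formed (X)HTML so that ElementTree can parse it.

```python
import abc
import os
import tempfile
import xml.etree.ElementTree as ET


class Feed(abc.ABC):
    """Hypothetical stand-in for feed.Feed: only the part described above."""

    def __init__(self, name, url):
        self.name = name
        self.url = url

    @abc.abstractmethod
    def parse_file(self, file_name):
        """Parse file_name, a local cached copy of the data at self.url."""


class ScraperFeed(Feed):
    """Hypothetical screen-scraping subclass using ElementTree's XPath subset."""

    def __init__(self, name, url, xpath):
        super().__init__(name, url)
        self.xpath = xpath
        self.image_url = None  # hypothetical attribute set by parse_file()

    def parse_file(self, file_name):
        # Assumes well-formed markup; ET.parse raises ParseError on tag
        # soup, which a real scraper would need to handle.
        img = ET.parse(file_name).getroot().find(self.xpath)
        self.image_url = None if img is None else img.get("src")
        return self.image_url


if __name__ == "__main__":
    # Simulate the cached download with a temporary file.
    with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as cache:
        cache.write('<html><body><img id="strip" src="/today.png"/></body></html>')
    try:
        feed = ScraperFeed("Example", "http://example.com/", ".//img[@id='strip']")
        print(feed.parse_file(cache.name))  # prints /today.png
    finally:
        os.remove(cache.name)
```

Because parse_file only ever sees a local file, the download and caching logic can stay in the base class or its caller, and each subclass deals purely with parsing.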

Comics! decides which Feed subclass to use by reading the plug-in key of
the feed descriptor file, and uses RSSFeed as a fall-back. To implement
a new feed class, one would create a new module with two functions:
get_class(), which returns the Feed subclass that the module provides,
and matches_url(url), which returns whether the Feed is able to read
comics from a specified url.
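A rough sketch of that plug-in convention and the fallback rule. Each SimpleNamespace below stands in for a plug-in module exposing get_class() and matches_url(url); the registry PLUGINS, the descriptor key name "plugin", the helper functions and the URL-matching rules are all hypothetical placeholders, not the actual Comics! code.

```python
from types import SimpleNamespace


class RSSFeed:
    """Stand-in for the default RSS feed class."""


class ScraperFeed:
    """Stand-in for a hypothetical screen-scraping feed class."""


# Each namespace stands in for a plug-in module that would define
# module-level get_class() and matches_url(url). The URL rules are
# placeholders; a real plug-in might inspect the downloaded data instead.
rss_plugin = SimpleNamespace(
    get_class=lambda: RSSFeed,
    matches_url=lambda url: url.endswith((".rss", ".xml")),
)
scraper_plugin = SimpleNamespace(
    get_class=lambda: ScraperFeed,
    matches_url=lambda url: url.startswith(("http://", "https://")),
)

# Hypothetical registry mapping plug-in names to their modules.
PLUGINS = {"rss": rss_plugin, "scraper": scraper_plugin}


def feed_class_for(descriptor):
    """Return the Feed class named by the descriptor's plug-in key,
    falling back to RSSFeed when the key is missing or unknown."""
    plugin = PLUGINS.get(descriptor.get("plugin"))
    return RSSFeed if plugin is None else plugin.get_class()


def plugins_for_url(url):
    """Names of the plug-ins whose matches_url() accepts url."""
    return [name for name, module in PLUGINS.items() if module.matches_url(url)]


if __name__ == "__main__":
    print(feed_class_for({"plugin": "scraper"}).__name__)  # ScraperFeed
    print(feed_class_for({}).__name__)                     # RSSFeed
    print(plugins_for_url("http://example.com/comic/"))    # ['scraper']
```

Falling back to RSSFeed means existing descriptors written before the plug-in key existed keep working unchanged.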

Looking through the code, I notice that all parts are implemented
except for saving the plug-in name in comics_add.py.

At the moment, I do not really have any spare time. If anybody would
like to start implementing the different plug-ins, or has a comment
on the current system, I will be able to give guidance or revise it.

On Wed, 2009-09-09 at 21:50 +0000, Mark Lee wrote:
> Public bug reported:
>
> ** Affects: awn-extras
> Importance: High
> Assignee: Moses Palmér (mosespalmer)
> Status: New
>
>
> ** Tags: applet comics
>

Changed in awn-extras:
assignee: Moses Palmér (mosespalmer) → Gabor Karsay (gabor-karsay)
status: New → In Progress
Gabor Karsay (gabor-karsay) wrote:

Fix committed in rev. 1494.

Changed in awn-extras:
milestone: none → 0.4.2
status: In Progress → Fix Committed
Povilas Kanapickas (p12)
Changed in awn-extras:
status: Fix Committed → Fix Released