Awn Extras

Comics: add screenscraping (XPath) support as alternative to feeds

Bug #427054 reported by Mark Lee on 2009-09-09

This bug affects 1 person

Affects      Status         Importance   Assigned to    Milestone
Awn Extras   Fix Released   High         Gabor Karsay   Awn Extras 0.4.2

Bug Description

I really, really want to get rid of the unmaintained comic applet, but in order to do so with as few complaints as possible, most, if not all, of the comics supported by the comic applet need to be supported by Comics! as well. Unfortunately, several of the comics do not have feeds and require screen scraping to retrieve the image.

I propose that in order to keep roughly the same clean structure as the feed config files, the screen scraping configuration files should have a name, URL, and XPath to the <img/> tag containing the comic strip URL.
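A minimal sketch of the proposal above, assuming a descriptor with the three fields (name, URL, XPath). The field names, the example URL, and the helper `find_strip_url` are hypothetical and not taken from the Comics! code. The standard-library parser used here only handles well-formed markup and a subset of XPath, so real comic pages would probably need a more tolerant HTML parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical descriptor holding the three proposed fields.
descriptor = {
    "name": "Example Comic",
    "url": "http://comics.example.org/today",
    "xpath": ".//div[@id='strip']/img",
}

# Stand-in for the downloaded page (well-formed so the stdlib parser accepts it).
PAGE = """<html><body>
<div id="header"><img src="/logo.png"/></div>
<div id="strip"><img src="/strips/2009-09-09.png" alt="today"/></div>
</body></html>"""


def find_strip_url(page_source, page_url, xpath):
    """Return the absolute URL of the first <img/> matched by xpath, or None."""
    root = ET.fromstring(page_source)
    img = root.find(xpath)
    if img is None or img.get("src") is None:
        return None
    # The src attribute is often relative, so resolve it against the page URL.
    return urljoin(page_url, img.get("src"))


strip_url = find_strip_url(PAGE, descriptor["url"], descriptor["xpath"])
print(strip_url)
```

Keeping the XPath in the descriptor means a new comic needs only a new configuration file, not new code.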

Tags: applet comics

Related branches

lp:~gabor-karsay/awn-extras/comics

Merged into lp:awn-extras at revision 1494

Awn Extras Developers: Pending requested 2010-12-10

Moses Palmér (mosespalmer) wrote on 2009-09-11: Re: [Bug 427054] [NEW] Comics: add screenscraping (XPath) support as alternative to feeds

There is already support for this started.

The class feed.Feed is the abstract base class for feeds. Subclasses
need to implement the parse_file(self, file_name) method, where file
name is a local file name with a cached version of the data at the
specified URL.
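As a rough illustration of the paragraph above, a subclass might look like the following. The `Feed` class here is only a stand-in for feed.Feed. `XPathFeed`, its constructor argument, and the choice to return a list of image URLs are assumptions, since the description names only the parse_file(self, file_name) method.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


class Feed:
    """Stand-in for feed.Feed; only parse_file comes from the description."""

    def parse_file(self, file_name):
        raise NotImplementedError


class XPathFeed(Feed):
    """Hypothetical subclass that reads image URLs from a cached page."""

    def __init__(self, xpath):
        self.xpath = xpath

    def parse_file(self, file_name):
        # file_name is a local file holding a cached copy of the page.
        root = ET.parse(file_name).getroot()
        return [img.get("src") for img in root.findall(self.xpath)
                if img.get("src")]


# Simulate the cache by writing a small well-formed page to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as cached:
    cached.write("<html><body><img class='strip' src='a.png'/></body></html>")
try:
    images = XPathFeed(".//img[@class='strip']").parse_file(cached.name)
finally:
    os.remove(cached.name)
print(images)
```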

Comics! decides which Feed subclass to use by reading the plug-in key of
the feed descriptor file, and uses RSSFeed as a fall-back. To implement
a new feed class, one would create a new module with two functions:
get_class(), which returns the Feed subclass that the module provides,
and matches_url(url), which returns whether the Feed is able to read
comics from a specified url.
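The module layout and fall-back described above could be sketched as follows. Only get_class(), matches_url(url), the plug-in key, and the RSSFeed fall-back come from the description. The class bodies, the `PLUGINS` registry, the `"plugin"` key spelling, and `choose_feed_class` are hypothetical stand-ins, not the actual Comics! code.

```python
from urllib.parse import urlparse


class Feed:
    """Stand-in for feed.Feed."""


class RSSFeed(Feed):
    """Stand-in for the fall-back feed class."""


class ExampleScraperFeed(Feed):
    """Hypothetical Feed subclass provided by this plug-in module."""


def get_class():
    """Return the Feed subclass that this module provides."""
    return ExampleScraperFeed


def matches_url(url):
    """Return whether this plug-in can read comics from url."""
    return urlparse(url).hostname == "comics.example.org"


# Hypothetical registry mapping plug-in names to their get_class functions.
PLUGINS = {"example_scraper": get_class}


def choose_feed_class(descriptor):
    """Pick the Feed subclass named by the descriptor, else fall back to RSSFeed."""
    factory = PLUGINS.get(descriptor.get("plugin"))
    return factory() if factory is not None else RSSFeed


print(choose_feed_class({"plugin": "example_scraper"}).__name__)
print(choose_feed_class({}).__name__)
```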

When looking through the code, I notice that all parts are implemented,
except for saving the plug-in name in comics_add.py.

At the moment, I do not really have any spare time. If anybody would
like to start with the implementation of the different plug-ins / has a
comment on the current system, I will be able to give guidance / revise.

On Wed 2009-09-09 at 21:50 +0000, Mark Lee wrote:
> Public bug reported:
>
> I really, really want to get rid of the unmaintained comic applet, but
> in order to do so with as few complaints as possible, most, if not
> all, of the comics that are supported by the comic applet need to be
> supported by Comics! as well. Unfortunately, several of the comics do
> not have feeds, and require screen scraping in order to retrieve the
> image.
>
> I propose that in order to keep roughly the same clean structure as the
> feed config files, the screen scraping configuration files should have a
> name, URL, and XPath to the <img/> tag containing the comic strip URL.
>
> ** Affects: awn-extras
> Importance: High
> Assignee: Moses Palmér (mosespalmer)
> Status: New
>
>
> ** Tags: applet comics
>

Gabor Karsay (gabor-karsay) on 2010-11-13

Changed in awn-extras:
assignee:	Moses Palmér (mosespalmer) → Gabor Karsay (gabor-karsay)
status:	New → In Progress

Gabor Karsay (gabor-karsay) wrote on 2011-01-09:

Fix committed in rev. 1494.

Changed in awn-extras:
milestone:	none → 0.4.2
status:	In Progress → Fix Committed

Povilas Kanapickas (p12) on 2013-11-17

Changed in awn-extras:
status:	Fix Committed → Fix Released
