Comics: add screen-scraping (XPath) support as an alternative to feeds
Bug #427054 reported by Mark Lee
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Awn Extras | Fix Released | High | Gabor Karsay |
Bug Description
I really, really want to get rid of the unmaintained comic applet, but to do so with as few complaints as possible, most, if not all, of the comics supported by the comic applet need to be supported by Comics! as well. Unfortunately, several of these comics do not have feeds and require screen scraping to retrieve the image.
I propose that, to keep roughly the same clean structure as the feed config files, the screen-scraping configuration files should contain a name, a URL, and an XPath expression pointing to the <img/> tag that holds the comic strip URL.
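A minimal sketch of the proposed name/URL/XPath configuration and how the XPath would locate the strip image. The `key = value` layout, the helper names `parse_descriptor` and `find_strip_url`, and the example URL are all hypothetical, not the actual .feed format or Comics! code. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and needs well-formed markup; real-world pages would likely need a tolerant HTML parser such as lxml.html.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# Hypothetical screen-scraping descriptor: name, URL and XPath, as proposed.
# The "key = value" layout is an assumption, not the actual .feed format.
DESCRIPTOR = """\
name = Example Comic
url = http://example.com/comic/
xpath = .//div[@id='comic']/img
"""


def parse_descriptor(text):
    """Parse 'key = value' lines into a dict, skipping blanks and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition('=')  # split on the first '=' only
        settings[key.strip()] = value.strip()
    return settings


def find_strip_url(page, page_url, xpath):
    """Return the absolute src of the first <img/> matched by xpath, or None.

    ElementTree accepts only relative paths (e.g. './/div') on an element
    and needs well-formed markup.
    """
    root = ET.fromstring(page)
    img = root.find(xpath)
    if img is None or img.get('src') is None:
        return None
    # Resolve relative image paths against the page URL.
    return urljoin(page_url, img.get('src'))


if __name__ == '__main__':
    config = parse_descriptor(DESCRIPTOR)
    page = ("<html><body><div id='comic'>"
            "<img src='/strips/today.png'/></div></body></html>")
    print(find_strip_url(page, config['url'], config['xpath']))
```

Resolving the `src` attribute against the configured URL keeps the descriptor small: it only needs to say where the page is and which `<img/>` to pick.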
Related branches
lp:~gabor-karsay/awn-extras/comics
- Awn Extras Developers: Pending requested
Diff: 3450 lines (+1084/-1555) (has conflicts), 43 files modified:
applets/maintained/comics/Makefile.am (+3/-0)
applets/maintained/comics/comics.py (+78/-30)
applets/maintained/comics/comics_add.py (+150/-34)
applets/maintained/comics/comics_manage.py (+87/-45)
applets/maintained/comics/comics_view.py (+123/-84)
applets/maintained/comics/feed/__init__.py (+9/-8)
applets/maintained/comics/feed/basic.py (+94/-13)
applets/maintained/comics/feed/plugins/__init__.py (+18/-0)
applets/maintained/comics/feed/plugins/simple_screen_scraper.py (+112/-0)
applets/maintained/comics/feed/rss.py (+2/-2)
applets/maintained/comics/feeds/ben.feed (+7/-0)
applets/maintained/comics/feeds/buttersafe.feed (+3/-3)
applets/maintained/comics/feeds/calvinandhobbes.feed (+6/-0)
applets/maintained/comics/feeds/ferdnand.feed (+7/-0)
applets/maintained/comics/feeds/garfield.feed (+6/-0)
applets/maintained/comics/feeds/nancy.feed (+7/-0)
applets/maintained/comics/feeds/pcnp.feed (+7/-0)
applets/maintained/comics/feeds/peanuts.feed (+0/-2)
applets/maintained/comics/feeds/pearls.feed (+7/-5)
applets/maintained/comics/feeds/pickles.feed (+6/-0)
applets/maintained/comics/feeds/userfriendly.feed (+7/-0)
applets/maintained/comics/feeds/wizardofid.feed (+6/-3)
applets/maintained/comics/ui/add.ui (+281/-57)
applets/maintained/comics/ui/manage.ui (+58/-32)
applets/maintained/comics/ui/view.ui (+0/-154)
applets/unmaintained/comic/Makefile.am (+0/-24)
applets/unmaintained/comic/comic.desktop.in (+0/-11)
applets/unmaintained/comic/comic.py (+0/-226)
applets/unmaintained/comic/comicdialog.py (+0/-46)
applets/unmaintained/comic/getben.py (+0/-75)
applets/unmaintained/comic/getborn.py (+0/-75)
applets/unmaintained/comic/getdilbert.py (+0/-75)
applets/unmaintained/comic/getferdnand.py (+0/-75)
applets/unmaintained/comic/getgarfield.py (+0/-56)
applets/unmaintained/comic/getnancy.py (+0/-75)
applets/unmaintained/comic/getpcnp.py (+0/-75)
applets/unmaintained/comic/getpeanuts.py (+0/-75)
applets/unmaintained/comic/getpickles.py (+0/-75)
applets/unmaintained/comic/getwiz.py (+0/-75)
applets/unmaintained/comic/getxkcd.py (+0/-40)
debian/awn-applets-python-extras-trunk.install (+0/-3)
po/POTFILES.in (+0/-1)
po/POTFILES.skip (+0/-1)
Changed in awn-extras:
assignee: Moses Palmér (mosespalmer) → Gabor Karsay (gabor-karsay)
status: New → In Progress
Changed in awn-extras:
status: Fix Committed → Fix Released
Support for this has already been started.
The class feed.Feed is the abstract base class for feeds. Subclasses
need to implement the parse_file(self, file_name) method, where file_name
is the name of a local file containing a cached copy of the data at the
specified URL.
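A minimal sketch of the contract described above, assuming Python's `abc` module. Only `parse_file(self, file_name)` comes from the description; the constructor arguments, the `items` attribute, the `update_from_cache` helper, and the `XPathFeed` subclass are hypothetical illustrations, not the actual feed.Feed code.

```python
import abc
import xml.etree.ElementTree as ET


class Feed(abc.ABC):
    """Sketch of an abstract feed: subclasses turn a cached file into strips.

    The constructor and update_from_cache() are assumptions for this example.
    """

    def __init__(self, name, url):
        self.name = name
        self.url = url
        self.items = []  # hypothetical: parsed strip image URLs

    def update_from_cache(self, file_name):
        # Downloading and caching self.url happen elsewhere; the subclass
        # only ever sees the name of the local cached copy.
        self.items = self.parse_file(file_name)
        return self.items

    @abc.abstractmethod
    def parse_file(self, file_name):
        """Parse the cached copy of self.url and return strip image URLs."""


class XPathFeed(Feed):
    """Hypothetical screen-scraping subclass driven by an XPath expression."""

    def __init__(self, name, url, xpath):
        super().__init__(name, url)
        self.xpath = xpath

    def parse_file(self, file_name):
        # ElementTree needs well-formed (X)HTML and supports limited XPath.
        root = ET.parse(file_name).getroot()
        return [img.get('src') for img in root.iterfind(self.xpath)]
```

Because `parse_file` is abstract, `Feed` itself cannot be instantiated; each plug-in only has to supply the parsing step for its own data format.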
Comics! decides which Feed subclass to use by reading the plug-in key of
the feed descriptor file, and uses RSSFeed as a fallback. To implement
a new feed class, one would create a new module with two functions:
get_class(), which returns the Feed subclass that the module provides,
and matches_url(url), which returns whether that Feed subclass is able to
read comics from a specified URL.
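A minimal sketch of the plug-in selection described above. `SimpleNamespace` stands in for an imported plug-in module exposing `get_class()` and `matches_url(url)`; the stand-in classes, the `PLUGINS` registry, the `plugin` descriptor key spelling, the URL rule, and the helper names are hypothetical and not taken from the Comics! source.

```python
from types import SimpleNamespace


class Feed:
    """Stand-in base class for this example."""


class RSSFeed(Feed):
    """Stand-in for the RSSFeed fallback."""


class ScraperFeed(Feed):
    """Stand-in for a Feed subclass provided by a plug-in."""


# A plug-in module exports get_class() and matches_url(url).
# SimpleNamespace stands in for an imported module; the URL rule is made up.
scraper_plugin = SimpleNamespace(
    get_class=lambda: ScraperFeed,
    matches_url=lambda url: url.startswith('http://example.com/'),
)

# Hypothetical registry mapping plug-in names to loaded modules.
PLUGINS = {'scraper': scraper_plugin}


def feed_class_for(descriptor, plugins=PLUGINS):
    """Pick the Feed subclass named by the descriptor's plug-in key.

    Falls back to RSSFeed when the key is missing or names no known plug-in.
    """
    plugin = plugins.get(descriptor.get('plugin'))
    return plugin.get_class() if plugin is not None else RSSFeed


def plugin_for_url(url, plugins=PLUGINS):
    """Return the name of the first plug-in whose matches_url() accepts url."""
    for name, plugin in plugins.items():
        if plugin.matches_url(url):
            return name
    return None
```

Keeping the fallback in one place means existing descriptors without a plug-in key continue to load as RSS feeds.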
Looking through the code, I see that all parts are implemented except
for saving the plug-in name in comics_add.py.
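A minimal sketch of the missing step: recording the plug-in name when a comic is added. It assumes a hypothetical `key = value` descriptor layout and a `matchers` mapping from plug-in names to `matches_url`-style callables; neither the function name nor the format comes from the actual comics_add.py.

```python
def build_descriptor(name, url, matchers, extra=None):
    """Return descriptor text for a new comic, recording the plug-in name.

    matchers maps plug-in names to matches_url-style callables. If none
    accepts the URL, no plug-in key is written, so the reader falls back
    to RSSFeed.
    """
    lines = [f'name = {name}', f'url = {url}']
    for plugin_name, matches_url in matchers.items():
        if matches_url(url):
            lines.append(f'plugin = {plugin_name}')
            break  # record only the first matching plug-in
    # Plug-in specific settings, e.g. an XPath for a screen scraper.
    for key, value in (extra or {}).items():
        lines.append(f'{key} = {value}')
    return '\n'.join(lines) + '\n'
```

For example, with `matchers = {'scraper': lambda u: 'example.com' in u}`, adding `http://example.com/comic/` writes a `plugin = scraper` line, while an ordinary RSS URL produces a descriptor without one.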
At the moment, I do not really have any spare time. If anybody would
like to start implementing the different plug-ins, or has comments on
the current system, I can give guidance or review the changes.
On Wed 2009-09-09 at 21:50 +0000, Mark Lee wrote:
> ** Affects: awn-extras
> Importance: High
> Assignee: Moses Palmér (mosespalmer)
> Status: New
>
>
> ** Tags: applet comics
>