Dosage

Alien Loves Predator Scraper Fix

Bug #492143 reported by Ged Walsh on 2009-12-04

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Dosage	Fix Committed	Medium	Tristan Seligmann	Dosage 1.7.0 "Do you have anything else horrible?"

Bug Description

# compile and MULTILINE come from the standard re module; BasicScraper and
# indirectStarter are provided by Dosage.
from re import compile, MULTILINE

class AlienLovesPredator(BasicScraper):
    imageUrl = 'http://alienlovespredator.com/%s'
    imageSearch = compile(r'<img src="(.+?)"[^>]+>(<center>\n|\n|</center>\n)<div style="height: 2px;"> </div>', MULTILINE)
    prevSearch = compile(r'<a href="(.+?)"><img src="/images/nav_previous.jpg"')
    help = 'Index format: nnn'
    starter = indirectStarter('http://alienlovespredator.com/index.php', compile(r'<a href="(.+?)"><img src="/images/nav_previous.jpg"'))
    def namer(cls, imageUrl, pageUrl):
        # Build the file name from the last four path components of the page URL.
        vol = pageUrl.split('/')[-5]
        num = pageUrl.split('/')[-4]
        ccc = pageUrl.split('/')[-3]
        ddd = pageUrl.split('/')[-2]
        return '%s-%s-%s-%s' % (vol, num, ccc, ddd)

The site now uses random image names, so this scraper deliberately skips the latest strip; that way every remaining strip can be named from its page URL. The skipped strip is picked up on the next update.
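
To illustrate the naming scheme, here is a minimal standalone sketch of the same split logic. The page URL and image URL below are hypothetical and only assume that a strip's page URL ends in four path components followed by a trailing slash; the real site layout may differ.

```python
def namer(image_url, page_url):
    # The image URL is ignored on purpose: its file name is random,
    # so the name is derived from the page URL instead.
    parts = page_url.split('/')
    return '%s-%s-%s-%s' % (parts[-5], parts[-4], parts[-3], parts[-2])


# Hypothetical URLs, for illustration only.
page_url = 'http://alienlovespredator.com/2009/12/03/example-title/'
image_url = 'http://alienlovespredator.com/images/x9f3q.jpg'

print(namer(image_url, page_url))
```

Because the name depends only on the page URL, a strip served from index.php has no usable name, which is why the latest strip is skipped here.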

Related branches

lp:~dosage-dev/dosage/bunch-of-comics-4 (Merged)

Tristan Seligmann (mithrandi) wrote on 2009-12-04:

By using bounceStarter instead of indirectStarter, you can still fetch the latest strip. Essentially, bounceStarter follows the "previous" link (like you're doing with indirectStarter), but then follows the "next" link in order to get back to the very latest comic.
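
The difference between the two starters can be sketched with a tiny in-memory site. This is only an illustration of the behaviour described above, not Dosage's actual implementation: the PAGES dictionary, the /strip/N/ URLs, and the helper names follow, indirect_start and bounce_start are all hypothetical.

```python
import re

# Hypothetical site: URL -> HTML. Links sit on separate lines so that the
# non-greedy (.+?) group cannot run across two anchors.
PAGES = {
    '/index.php': '<a href="/strip/2/"><img src="/images/nav_previous.jpg">',
    '/strip/3/': '<a href="/strip/2/"><img src="/images/nav_previous.jpg">',
    '/strip/2/': '<a href="/strip/1/"><img src="/images/nav_previous.jpg">\n'
                 '<a href="/strip/3/"><img src="/images/nav_next.jpg">',
    '/strip/1/': '<a href="/strip/2/"><img src="/images/nav_next.jpg">',
}

PREV = re.compile(r'<a href="(.+?)"><img src="/images/nav_previous.jpg"')
NEXT = re.compile(r'<a href="(.+?)"><img src="/images/nav_next.jpg"')


def follow(url, pattern):
    """Return the first link on page `url` that matches `pattern`."""
    return pattern.search(PAGES[url]).group(1)


def indirect_start(base, search):
    # Start from the page that the matched link on the base page points to.
    return follow(base, search)


def bounce_start(base, prev_search, next_search):
    # Step back one strip, then forward again, which lands on the latest
    # strip under its own page URL rather than the index page.
    return follow(follow(base, prev_search), next_search)


print(indirect_start('/index.php', PREV))
print(bounce_start('/index.php', PREV, NEXT))
```

In this sketch the indirect approach starts at the second-newest strip, while the bounce approach reaches the newest strip at a URL that the namer can work with.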

Ged Walsh (bleedingheart) wrote on 2009-12-05:

Thanks, here is the fix using bounceStarter:

# compile and MULTILINE come from the standard re module; BasicScraper and
# bounceStarter are provided by Dosage.
from re import compile, MULTILINE

class AlienLovesPredator(BasicScraper):
    imageUrl = 'http://alienlovespredator.com/%s'
    imageSearch = compile(r'<img src="(.+?)"[^>]+>(<center>\n|\n|</center>\n)<div style="height: 2px;"> </div>', MULTILINE)
    prevSearch = compile(r'<a href="(.+?)"><img src="/images/nav_previous.jpg"')
    help = 'Index format: nnn'
    # bounceStarter takes the "next" link pattern so it can return to the latest strip.
    starter = bounceStarter('http://alienlovespredator.com/index.php', compile(r'<a href="(.+?)"><img src="/images/nav_next.jpg"'))
    def namer(cls, imageUrl, pageUrl):
        # Build the file name from the last four path components of the page URL.
        vol = pageUrl.split('/')[-5]
        num = pageUrl.split('/')[-4]
        ccc = pageUrl.split('/')[-3]
        ddd = pageUrl.split('/')[-2]
        return '%s-%s-%s-%s' % (vol, num, ccc, ddd)

Tristan Seligmann (mithrandi) on 2010-01-12

Changed in dosage:
assignee:	nobody → Tristan Seligmann (mithrandi)
importance:	Undecided → Medium
milestone:	none → 1.7.0
status:	New → In Progress

Tristan Seligmann (mithrandi) on 2010-01-15

Changed in dosage:
status:	In Progress → Fix Committed
