Alien Loves Predator Scraper Fix
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Dosage | Fix Committed | Medium | Tristan Seligmann | | |
Bug Description
class AlienLovesPreda
    imageUrl = 'http://
    imageSearch = compile(r'<img src="(.
    prevSearch = compile(r'<a href="(.+?)"><img src="/images/
    help = 'Index format: nnn'
    starter = indirectStarter('http://

    def namer(cls, imageUrl, pageUrl):
        vol = pageUrl.
        num = pageUrl.
        ccc = pageUrl.
        ddd = pageUrl.
        return '%s-%s-%s-%s' % (vol, num, ccc, ddd)
The site now uses random image names, so this scraper deliberately skips the latest strip; that way every remaining strip can be named from its page URL instead of its image URL. The skipped strip will be picked up on the next update.
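To illustrate the idea of naming a strip from its page URL rather than its image URL, here is a minimal sketch. It is not the scraper's actual namer: the function name `name_from_page_url`, the `example.com` URL, and the assumption that the last four path segments identify the strip are all hypothetical, since the snippet above does not show how the four parts are extracted.

```python
from urllib.parse import urlparse


def name_from_page_url(page_url, parts=4):
    """Build a file name from the last `parts` path segments of a page URL.

    Hypothetical helper: assumes the page URL, unlike the image URL,
    ends in stable segments that identify the strip.
    """
    # Drop empty segments produced by leading and trailing slashes.
    segments = [s for s in urlparse(page_url).path.split('/') if s]
    if len(segments) < parts:
        raise ValueError('expected at least %d path segments in %r'
                         % (parts, page_url))
    return '-'.join(segments[-parts:])


if __name__ == '__main__':
    # Made-up URL shape, for illustration only.
    print(name_from_page_url('http://example.com/comics/1/23/004/005/'))
```

Because the name depends only on the page URL, a strip whose page URL is not known (such as one served only from the front page) cannot be named this way, which is why the latest strip gets skipped.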
Changed in dosage: | |
assignee: | nobody → Tristan Seligmann (mithrandi) |
importance: | Undecided → Medium |
milestone: | none → 1.7.0 |
status: | New → In Progress |
Changed in dosage: | |
status: | In Progress → Fix Committed |
By using bounceStarter instead of indirectStarter, you can still fetch the latest strip. Essentially, bounceStarter follows the "previous" link (like you're doing with indirectStarter), but then follows the "next" link in order to get back to the very latest comic.
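The difference between the two strategies can be sketched as follows. This is an illustration of the bounce idea only, not Dosage's implementation: `bounce_start`, `indirect_start`, `follow`, the `PAGES` dictionary, and the link patterns are all hypothetical stand-ins, and an in-memory dictionary replaces real HTTP fetches.

```python
import re

# Hypothetical link patterns, for illustration only.
PREV_SEARCH = re.compile(r'<a class="prev" href="(.+?)">')
NEXT_SEARCH = re.compile(r'<a class="next" href="(.+?)">')

# Hypothetical site: '/' is the front page showing the latest strip ('/3').
PAGES = {
    '/': '<a class="prev" href="/2">Previous</a>',
    '/3': '<a class="prev" href="/2">Previous</a>',
    '/2': '<a class="prev" href="/1">Previous</a>'
          '<a class="next" href="/3">Next</a>',
    '/1': '<a class="next" href="/2">Next</a>',
}


def follow(url, pattern, fetch):
    """Fetch `url` and return the first link matched by `pattern`."""
    match = pattern.search(fetch(url))
    if match is None:
        raise ValueError('no matching link on %r' % url)
    return match.group(1)


def indirect_start(front_url, prev_search, fetch):
    """Start one strip back: follow only the "previous" link."""
    return follow(front_url, prev_search, fetch)


def bounce_start(front_url, prev_search, next_search, fetch):
    """Follow "previous", then "next", to land on the latest strip's own URL."""
    previous_url = follow(front_url, prev_search, fetch)
    return follow(previous_url, next_search, fetch)


if __name__ == '__main__':
    fetch = PAGES.__getitem__
    print(indirect_start('/', PREV_SEARCH, fetch))
    print(bounce_start('/', PREV_SEARCH, NEXT_SEARCH, fetch))
```

In this toy site the indirect start lands on `/2` and misses the latest strip, while the bounce lands on `/3`, which is a proper page URL that a page-URL-based namer can work with.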