Amazon metadata scraper does not work due to recent site layout changes

Bug #1379305 reported by Thomas Voigt on 2014-10-09
This bug affects 2 people
Affects: calibre
Importance: Undecided
Assigned to: Unassigned

Bug Description

Hi there!

Due to recent site layout changes on Amazon.com and Amazon.de, the metadata plugin no longer finds any book results.

For example, Amazon changed <div id="result_X" ...> to <li id="result_X" ...>, and the details page links no longer have class="title" and use <h2> instead of <h3>...

The following XPath expression works with the newest layout and finds all details page links:

//li[starts-with(@id, "result_")]//a[contains(@class,"s-access-detail-page")]

Amazon seems to be pretty consistent in using descriptive class attributes to mark specific items. So maybe this expression is more reliable:

//*[contains(@class, "s-result-item")]//a[contains(@class,"s-access-detail-page")]
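To make the difference between the two expressions concrete, here is a minimal sketch that emulates their matching logic with Python's standard library only. ElementTree's XPath subset has no starts-with() or contains(), so the predicates are written as plain Python. The SAMPLE markup, the hrefs and the helper names (links_under, by_id, by_class) are made up for illustration; this is not the plugin's code, and a real Amazon page is not well-formed XML, so it would need an HTML parser that supports full XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified, well-formed fragment imitating the new layout
# described in this report. It is NOT real Amazon markup.
SAMPLE = """
<html><body>
<ul>
  <li id="result_0" class="s-result-item">
    <h2><a class="a-link-normal s-access-detail-page" href="/dp/EXAMPLE0">Book A</a></h2>
  </li>
  <li id="result_1" class="s-result-item">
    <a class="a-link-normal" href="/other">Not a details link</a>
    <h2><a class="s-access-detail-page" href="/dp/EXAMPLE1">Book B</a></h2>
  </li>
</ul>
<div id="sidebar"><a class="s-access-detail-page" href="/dp/IGNORED">Outside</a></div>
</body></html>
"""


def links_under(root, is_result):
    """Return hrefs of <a> descendants whose class contains
    "s-access-detail-page", below any element accepted by is_result."""
    seen, hrefs = set(), []
    for item in root.iter():
        if not is_result(item):
            continue
        for a in item.iter("a"):
            # "//a" selects descendants only, so skip the container itself;
            # also skip anchors already collected via an outer container.
            if a is item or id(a) in seen:
                continue
            if "s-access-detail-page" in a.get("class", ""):
                seen.add(id(a))
                hrefs.append(a.get("href"))
    return hrefs


def by_id(el):
    # Emulates li[starts-with(@id, "result_")]
    return el.tag == "li" and el.get("id", "").startswith("result_")


def by_class(el):
    # Emulates *[contains(@class, "s-result-item")] (substring match)
    return "s-result-item" in el.get("class", "")


root = ET.fromstring(SAMPLE.strip())
print(links_under(root, by_id))
print(links_under(root, by_class))
```

On this sample both predicates select the same two links and skip the anchor outside the result list. The class-based variant keeps working if the container tag or its id scheme changes again, which is the reason it may be the more reliable choice.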

Please see the attached HTML file (http://www.amazon.com/s/?field-title=Darwin&search-alias=stripbooks) as Amazon may be delivering different layouts depending on user-agent, region or server load balancing.

Thanks and best regards,
Thomas

Calibre 2.4.0 or 2.5.0
Amazon Metadata Plugin 1.0.0

Thomas Voigt (tvoigt) wrote :

Fixed in branch master. The fix will be in the next release. calibre is usually released every Friday.

 status fixreleased

Changed in calibre:
status: New → Fix Released