Downloading covers from Google Images and Open Library fails

Bug #2043415 reported by Roy Kroeze
This bug affects 1 person
Affects: calibre
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

When trying to download covers, I no longer get results from Google Images or Open Library. I am using calibre 6.29 on Windows 11, set up to download covers from Google Images, Amazon and Open Library. Amazon still works, but the other two no longer return any results for any book I try. To reproduce, I open "Edit metadata" for a specific book and press the "Download cover" button.

As an example, I tried downloading a cover for "Pride and Prejudice". There were no results, and the log reads as follows (I deleted the Amazon part, as that one still works):

Starting cover download for: Pride and Prejudice
Query: Pride and Prejudice ['Jane Austen'] {'uri': 'http://www.gutenberg.org/1342'}

****************************** Google Images Covers ******************************
Request extra headers: [('User-agent', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36')]
Failed to download valid cover
Took 0.26390886306762695 seconds
Search URL: https://www.google.com/search?as_st=y&tbm=isch&as_q=Pride+Prejudice+Jane+Austen&as_epq=&as_oq=&as_eq=&cr=&as_sitesearch=&safe=images&tbs=isz:lt,islt:svga,iar:t,ift:jpg
No images found for, title: 'Pride Prejudice' and authors: ['Jane Austen']

********************************************************************************

****************************** Open Library Covers ******************************
Request extra headers: [('User-agent', 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko')]
Failed to download valid cover
Took 0.0 seconds

********************************************************************************
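
For readers who want to check what was actually requested, the Search URL in the log above can be split into its query parameters with Python's standard library. This is only an illustrative sketch: `non_empty_params` is a hypothetical helper name and is not part of calibre.

```python
from urllib.parse import parse_qs, urlsplit

# Search URL copied verbatim from the log above.
SEARCH_URL = (
    "https://www.google.com/search?as_st=y&tbm=isch"
    "&as_q=Pride+Prejudice+Jane+Austen"
    "&as_epq=&as_oq=&as_eq=&cr=&as_sitesearch=&safe=images"
    "&tbs=isz:lt,islt:svga,iar:t,ift:jpg"
)


def non_empty_params(url):
    """Return the query parameters that carry a value.

    parse_qs() drops blank parameters by default and decodes '+' to a space.
    """
    query = urlsplit(url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


params = non_empty_params(SEARCH_URL)
for key, value in params.items():
    print(f"{key} = {value}")
```

The query itself is well formed, which suggests the failure lies in the response rather than in how the request was built.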

Kovid Goyal (kovid) wrote :

It's working fine for me. See screenshot. Probably Google has temporarily blocked or captcha'd your IP. Try using a VPN, or wait a day or so for it to be unblocked. The other possibility is that the blocking is happening on your own computer; if you are on Windows, you can usually test that by rebooting into safe mode.

Changed in calibre:
status: New → Invalid
Kovid Goyal (kovid) wrote :

Oh, and there have been some reports of Google searches failing from the EU because of GDPR cookies, so if you are in the EU that might be the issue as well.

Roy Kroeze (bookswurm) wrote :

Hi Kovid, thanks for the replies.

I am located in the Netherlands, so in the EU. I used my VPN to switch to another EU country and got the same result (but obviously with a different IP address, so that rules out a temporary block on my IP).

When I use the VPN to connect through an endpoint in the USA, I do get results back. So it looks like the EU-specific GDPR cookies are the cause. Is this issue being worked on, by any chance?

Kovid Goyal (kovid) wrote :

Sadly I don't have access to an EU IP address, so it's waiting for someone
from the EU to do the work. The relevant code is in google_images.py.

Kovid Goyal (kovid) wrote :

If you can do a Google search in your browser and tell me what the
needed cookie is, it might help.

Roy Kroeze (bookswurm) wrote :

I don't have developer-skills myself, but I certainly want to help!

Do you need me to run a search query and then show you which cookies are listed in the web developer tools? Or do you have a particular method for me to find out which cookies are used?

Kovid Goyal (kovid) wrote :

I'm afraid it's going to need some developer skills. It's basically this
issue:
https://github.com/benbusby/whoogle-search/issues/1053

I have committed a fix based on the solution there, but I have no way to
test it.
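
As a rough illustration of the general idea discussed in this thread (sending a pre-set cookie along with the search request), here is a minimal standard-library sketch. The cookie name `EXAMPLE_CONSENT` and its value are placeholders invented for this example, not the real cookie, and `make_cookie` is a hypothetical helper. The actual fix lives in google_images.py and may look quite different.

```python
from http.cookiejar import Cookie, CookieJar
from urllib.request import Request


def make_cookie(name, value, domain=".google.com"):
    """Build a session cookie valid for every path on the given domain."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True,
        domain_initial_dot=domain.startswith("."),
        path="/", path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


jar = CookieJar()
# Placeholder name and value: assumptions for illustration only.
jar.set_cookie(make_cookie("EXAMPLE_CONSENT", "placeholder"))

# No network traffic happens here; the jar only decorates the request object.
request = Request("https://www.google.com/search?tbm=isch&as_q=Pride+Prejudice")
jar.add_cookie_header(request)
cookie_header = request.get_header("Cookie")
print(cookie_header)
```

Preloading the jar this way costs no extra round trips, which is the trade-off weighed against simulating a click on the consent dialog later in the thread.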

Kovid Goyal (kovid) wrote :

Fixed in branch master. The fix will be in the next release. calibre is usually released every alternate Friday.

Changed in calibre:
status: Invalid → Fix Released
Roy Kroeze (bookswurm) wrote :

You're a legend. Thank you so much.

Charles Haley (cbhaley) wrote :

@kovid: I tried it in the UK. I get the GDPR cookie dialog. Before your push, Google Images returned nothing. After, it works.

I am a bit nervous about the long hash-looking number hiding something they will change from time to time, but at the moment it clearly isn't dependent on something local to the machine.

Kovid Goyal (kovid) wrote :

Good to know, thanks. That's base64 encoded text containing a date and a
language preference with a bunch of other bytes. It might make sense to
update the date and regenerate but given the other unknown bytes I am
not sure.

echo CAESOAgUEitib3FfaWRlbnRpdHlmcm9udGVuZHVpc2VydmVyXzIwMjMxMTA3LjA1X3AwGgVlbi1VUyADGgYIgPHKqgY= | base64 -d | cat -v
^H^A^R8^H^T^R+boq_identityfrontenduiserver_20231107.05_p0^Z^Een-US ^C^Z^F^HM-^@M-qM-JM-*^F
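
The same inspection can be done in Python, which may be handier on Windows where `base64` and `cat -v` are not available. This is a small sketch using only the standard library; the variable names are illustrative and the cookie value is the one from the shell command above.

```python
import base64
import re
from datetime import datetime

# Cookie value copied from the shell command above.
COOKIE_VALUE = (
    "CAESOAgUEitib3FfaWRlbnRpdHlmcm9udGVuZHVpc2VydmVyXzIwMjMxMTA3"
    "LjA1X3AwGgVlbi1VUyADGgYIgPHKqgY="
)

decoded = base64.b64decode(COOKIE_VALUE)

# Collect runs of four or more identifier-like ASCII characters; the
# remaining bytes are the "unknown" binary framing mentioned above.
readable = [
    run.decode("ascii")
    for run in re.findall(rb"[A-Za-z0-9_.\-]{4,}", decoded)
]

# Pull out the eight-digit date embedded in the server build string.
date_match = re.search(rb"_(\d{8})\.", decoded)
embedded_date = datetime.strptime(date_match.group(1).decode("ascii"), "%Y%m%d").date()

print(readable)
print(embedded_date)
```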

Kovid Goyal (kovid) wrote :

I added a commit to update the date in the cookie; see if it works.
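
One way such a date update could be done is sketched below. This is an assumption-laden illustration, not the actual commit: `with_updated_date` is a hypothetical helper, it only swaps the eight-digit date inside the decoded bytes, and it leaves every other byte (including the unknown trailing ones) untouched. Whether Google accepts the regenerated value is not something this sketch can verify.

```python
import base64
import re
from datetime import date

# Cookie value quoted earlier in this thread.
ORIGINAL_VALUE = (
    "CAESOAgUEitib3FfaWRlbnRpdHlmcm9udGVuZHVpc2VydmVyXzIwMjMxMTA3"
    "LjA1X3AwGgVlbi1VUyADGgYIgPHKqgY="
)


def with_updated_date(cookie_value, new_date):
    """Return cookie_value with its embedded YYYYMMDD date replaced.

    The replacement has the same length as the original date, so any
    length bytes in the surrounding binary data remain consistent.
    """
    raw = base64.b64decode(cookie_value)
    stamp = new_date.strftime("%Y%m%d").encode("ascii")
    # Match only the eight digits between "_" and "." in the build string.
    updated, count = re.subn(rb"(?<=_)\d{8}(?=\.)", stamp, raw, count=1)
    if count != 1:
        raise ValueError("no embedded date found in cookie value")
    return base64.b64encode(updated).decode("ascii")


# Usage: regenerate the value with today's date.
regenerated = with_updated_date(ORIGINAL_VALUE, date.today())
print(regenerated)
```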

Charles Haley (cbhaley) wrote :

It still works.

I don't know if it will help but I captured the cookie sequence. In the attached file you will find the cookies that show when the consent screen is displayed, the cookies after rejected consent, the cookies after accepted consent, and the HTML of the consent dialog. My thought is that it might be more reliable to simulate clicking a button than to "guess" at the cookie contents.

Kovid Goyal (kovid) wrote :

Thanks, that's helpful.

The problem with "clicking the button" is that it's very slow, requiring three whole round trips to set up the browser object. And looking at the HTML, I am fairly sure this is implemented using JavaScript (search the HTML for "submit" and you will see it's implemented in JS), which means it won't work with mechanize, so one would have to reverse engineer the JS to figure out the actual request to send. That's a fair bit of work.
