Plot isn't being pulled from IMDB for some movies
Bug #224546 reported by Neil Burlock
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Griffith | Fix Released | Medium | Michael |
Bug Description
Get From Web skips the plot for some movies in Griffith 0.9.6 on the final release version of Hardy. The following error appears in the terminal while attempting to fetch info on a movie:
/usr/share/
plot_
Example movies that are causing this problem:
Corpse Bride
The Hitcher
The Departed
My locale is set to en_AU.UTF-8
I've tried the latest release and this problem still exists, so I did some investigating. What I've found is that the gutils.convert_entities function isn't working the way I think it's supposed to.
Certain IMDB plot summaries contain non-ASCII characters, for example Aeon Flux, which has a Latin "AE" character at index 3399/3400. The plot is read correctly and then passed to convert_entities, which returns it unchanged. The result is then assigned to plot_buffer via set_text, which causes the error.
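The failure mode described above can be reproduced in isolation. This is a minimal sketch in Python 3 (Griffith itself is Python 2 code), and the plot text and the helper name is_valid_utf8 are made up for illustration. It only shows that a Latin-1 encoded "AE" character is not valid UTF-8 on its own:

```python
# Hypothetical stand-in for a plot summary that arrived as Latin-1 bytes.
raw_plot = "Æon Flux is a mysterious assassin.".encode("latin-1")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes can be decoded as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The single Latin-1 byte 0xC6 ("Æ") is followed by an ASCII letter,
# which is not a legal UTF-8 sequence.
print(is_valid_utf8(raw_plot))  # False

# Decoding as Latin-1 and re-encoding as UTF-8 yields valid UTF-8.
print(is_valid_utf8(raw_plot.decode("latin-1").encode("utf-8")))  # True
```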
I've been able to get plots to import correctly on my system by changing the code to no longer call convert_entities - this is what I've done.
In populate_with_results, around line 207, I changed it from:

    if 'plot' in fields_to_fetch:
        plot_buffer = w['plot'].get_buffer()
        plot_buffer.set_text(gutils.convert_entities(self.movie.plot))
        fields_to_fetch.pop(fields_to_fetch.index('plot'))

to (adding import unicodedata to the top of the file):

    if 'plot' in fields_to_fetch:
        plot_buffer = w['plot'].get_buffer()
        plot_buffer.set_text(unicodedata.normalize('NFD', self.movie.plot.decode('latin-1')).encode('utf-8'))
        fields_to_fetch.pop(fields_to_fetch.index('plot'))
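For reference, here is a rough Python 3 rendering of the conversion in the replacement line, pulled out into a standalone function. The name latin1_to_utf8 is hypothetical and not part of Griffith. The sketch also shows what the NFD step does: it leaves "Æ" alone (that character has no canonical decomposition) but splits accented letters such as "é" into a base letter plus a combining mark, so it seems to be the decode/encode pair that does the actual repair.

```python
import unicodedata


def latin1_to_utf8(data: bytes) -> bytes:
    """Decode Latin-1 bytes, apply NFD normalization, re-encode as UTF-8."""
    text = data.decode("latin-1")
    return unicodedata.normalize("NFD", text).encode("utf-8")


# "Æ" (U+00C6) is unchanged by NFD and becomes the two bytes C3 86 in UTF-8.
print(latin1_to_utf8(b"\xc6on Flux"))  # b'\xc3\x86on Flux'

# "é" (U+00E9) is decomposed by NFD into "e" + U+0301 (combining acute).
print(latin1_to_utf8(b"\xe9"))  # b'e\xcc\x81'
```

If the decomposed form is not wanted, passing 'NFC' instead of 'NFD' keeps accented letters as single code points.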
This works without error on my system and correctly converts the AE character, so the plot is now saved in the DB. I'm not really familiar with Python, so I can't decipher what convert_entities is doing. After sticking in some print statements to trace the flow, though, I can see that for some reason the function thinks nothing needs to be done to the plot and returns it unchanged, in what appears to be its original latin-1 encoding. That would explain why set_text fails whenever the string contains non-ASCII characters, since it expects UTF-8.
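Since set_text expects UTF-8, one defensive option would be to convert only when the data is not already valid UTF-8. The following is a sketch under that assumption, written for Python 3. The function name ensure_utf8 and the Latin-1 fallback are my own choices, not anything taken from Griffith:

```python
def ensure_utf8(data: bytes, fallback: str = "latin-1") -> bytes:
    """Return data unchanged if it is valid UTF-8, else re-encode it.

    The fallback encoding is an assumption; Latin-1 maps every byte to a
    character, so the fallback decode itself cannot fail.
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return data.decode(fallback).encode("utf-8")


# Latin-1 input is converted.
print(ensure_utf8(b"\xc6on Flux"))  # b'\xc3\x86on Flux'

# Input that is already UTF-8 passes through untouched.
print(ensure_utf8("Æon Flux".encode("utf-8")))  # b'\xc3\x86on Flux'
```

This is only a heuristic: some Latin-1 byte sequences also happen to be valid UTF-8, in which case they would be passed through as-is.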
I found the fix on the following page, where it is explained in more detail why this works:
http://blog.magnetk.com/2008/05/06/finessing-international-characters-out-of-python
I don't have any idea why this is happening, but it has affected three different 64-bit Ubuntu installations, running Feisty and Hardy, since I started using Griffith early this year. The only thing I can think of is that convert_entities, the way it currently works, doesn't handle my Australian locale.