Griffith

Plot isn't being pulled from IMDB for some movies

Bug #224546 reported by Neil Burlock on 2008-04-30

Affects     Status          Importance    Assigned to    Milestone
Griffith    Fix Released    Medium        Michael

Bug Description

Get From Web skips the plot for some movies in Griffith 0.9.6 on the final release version of Hardy. The following error appears in the terminal while attempting to fetch info on a movie:

/usr/share/griffith/lib/add.py:516: GtkWarning: gtk_text_buffer_emit_insert: assertion `g_utf8_validate (text, len, NULL)' failed
plot_buffer.set_text(gutils.convert_entities(self.movie.plot))

Example movies that are causing this problem:

Corpse Bride
The Hitcher
The Departed

My locale is set to en_AU.UTF-8

Neil Burlock (malone) wrote on 2008-08-17:

I've tried the latest release and this problem still exists, so I did some investigating. What I've found is that the gutils.convert_entities function isn't working the way I think it's supposed to.

Certain IMDB plot summaries contain non-ASCII characters, for example Aeon Flux, which has a Latin "AE" character at index 3399/3400. The plot is read correctly and then passed to convert_entities, which returns it unchanged. The result is then assigned to the plot_buffer via set_text, which causes the error.
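
As a rough illustration of why GTK rejects the text, the check below mimics what the g_utf8_validate assertion is testing for. It is a minimal Python 3 sketch, not Griffith code: is_valid_utf8 is a hypothetical helper, and the byte 0xC6 (the capital AE ligature in Latin-1) is only an assumed example of what the plot might contain.

```python
def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False

# Assumed sample: in Latin-1 the AE ligature is the single byte 0xC6.
latin1_plot = b'\xc6on Flux'
# The same text in UTF-8: the ligature becomes the two bytes 0xC3 0x86.
utf8_plot = latin1_plot.decode('latin-1').encode('utf-8')

latin1_ok = is_valid_utf8(latin1_plot)  # lone 0xC6 is not a complete UTF-8 sequence
utf8_ok = is_valid_utf8(utf8_plot)
```

The Latin-1 bytes fail the check, while the same text re-encoded as UTF-8 passes.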

I've been able to get plots to import correctly on my system by changing the code to no longer call convert_entities - this is what I've done.

In populate_with_results, around line 207, I changed it from:

    if 'plot' in fields_to_fetch:
        plot_buffer = w['plot'].get_buffer()
        plot_buffer.set_text(gutils.convert_entities(self.movie.plot))
        fields_to_fetch.pop(fields_to_fetch.index('plot'))

to (adding import unicodedata to the top of the file):

    if 'plot' in fields_to_fetch:
        plot_buffer = w['plot'].get_buffer()
        plot_buffer.set_text(unicodedata.normalize('NFD', self.movie.plot.decode('latin-1')).encode('utf-8'))
        fields_to_fetch.pop(fields_to_fetch.index('plot'))

This works without error on my system and correctly converts the AE character, so the plot is now saved in the DB. I'm not really familiar with Python, so I can't decipher what convert_entities is doing. After sticking in some print statements to trace the flow, though, I can see that the function decides nothing needs to be done to the plot and returns it unchanged, apparently still in its original Latin-1 encoding. That is why set_text fails whenever the string contains non-ASCII characters: it expects UTF-8.
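
For reference, the conversion in the changed line can be sketched on its own as follows. This is a Python 3 sketch under assumptions: latin1_to_utf8 is a hypothetical name, the input is assumed to really be Latin-1 bytes, and the form parameter is added only for illustration. The decode and encode steps are what make the text valid UTF-8; the normalize step only changes how accented characters are composed.

```python
import unicodedata

def latin1_to_utf8(raw, form='NFD'):
    """Decode Latin-1 bytes, normalize the text, and return UTF-8 bytes."""
    text = raw.decode('latin-1')              # every byte maps to one code point
    text = unicodedata.normalize(form, text)  # NFD decomposes, NFC composes
    return text.encode('utf-8')

# Assumed sample: 0xE9 is "e" with an acute accent in Latin-1.
decomposed = latin1_to_utf8(b'caf\xe9')        # NFD: "e" followed by a combining accent
composed = latin1_to_utf8(b'caf\xe9', 'NFC')   # NFC: one precomposed character
```

Both results are valid UTF-8 and should be accepted by set_text. They differ only in whether the accent is stored as a separate combining character.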

I found the fix on the following page, where it is explained in more detail why this works:

http://blog.magnetk.com/2008/05/06/finessing-international-characters-out-of-python

I don't have any idea why this is happening, but it has affected three different 64-bit Ubuntu systems, running Feisty and Hardy, since I started using Griffith early this year. The only thing I can think of is that convert_entities, as it currently works, doesn't handle my Australian locale.

Owyn (i-leacy) wrote on 2009-03-29:

Problem also occurs with 0.9.9 on Windows, e.g. Match Point (2005).

Michael (mikej06) wrote on 2009-03-29:

Fixed in revision 1178.
The web page encoding was wrong for the plugin; changed from utf8 to iso8859-1.
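
The effect of a wrong page encoding can be sketched as below. This is a Python 3 illustration only, not the plugin's code: page_to_text is a hypothetical helper and the sample bytes are assumed, with 0xC6 standing in for a non-ASCII character served as ISO-8859-1.

```python
def page_to_text(raw, encoding):
    """Decode fetched page bytes, substituting U+FFFD for undecodable bytes."""
    return raw.decode(encoding, errors='replace')

# Assumed sample page fragment, encoded as ISO-8859-1.
page = b'<p>\xc6on Flux</p>'

wrong = page_to_text(page, 'utf8')        # 0xC6 is not valid UTF-8 in this position
right = page_to_text(page, 'iso8859-1')   # 0xC6 decodes to U+00C6

# Once decoded correctly, the text can be encoded as UTF-8 for the GUI.
for_gui = right.encode('utf-8')
```

With 'utf8' the character is lost to a replacement character. With 'iso8859-1' it survives and can then be encoded as UTF-8 for the text buffer.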

Changed in griffith:
assignee:	nobody → mikej06
importance:	Undecided → Medium
status:	New → Fix Committed

Owyn (i-leacy) wrote on 2009-04-03:

Thanks. Pulled 1179 plugin (to get country newline fix as well) and moved to 0.10-beta2 plugins.

Fixed this and several other lookup problems.

Piotr Ożarowski (piotr) wrote on 2009-07-12:

0.10-rc1 released

Changed in griffith:
status:	Fix Committed → Fix Released
