Exceptions when IMDB lacks plot summary, cast, cover url

Bug #633326 reported by Peter on 2010-09-08
This bug affects 1 person
Affects: Entertainer Media Center
Importance: Undecided
Assigned to: Unassigned

Bug Description

The following are several very similar bugs in the video metadata scanner,
entertainerlib/backend/components/mediacache/video_metadata_search.py

Basically, it does not cope well with missing data: this shows up as exceptions in the terminal, and the affected videos are not added to the library.

You can test this with the free short animated film "Big Buck Bunny" (see www.bigbuckbunny.org).

As you can see, IMDB does not (currently) have a plot summary for this film:
http://akas.imdb.com/title/tt1254207/plotsummary

I have attached a patch for the changes described below.

===================================================================================

I saw the following exception in the terminal output while running Entertainer 0.5.1 on Ubuntu Lucid:

2010-09-08 16:38:36,547 DEBUG [imdbpy.parser.http] /usr/lib/pymodules/python2.6/imdb/parser/http/__init__.py:412: fetching url http://akas.imdb.com/title/tt1254207/plotsummary (size: -1)
2010-09-08 16:38:37+0100 [-] Exception in thread Video metadata search thread:
2010-09-08 16:38:37+0100 [-] Traceback (most recent call last):
2010-09-08 16:38:37+0100 [-] File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
2010-09-08 16:38:37+0100 [-] self.run()
2010-09-08 16:38:37+0100 [-] File "/home/peterjc/Downloads/entertainer-0.5.1/entertainerlib/backend/components/mediacache/video_metadata_search.py", line 121, in run
2010-09-08 16:38:37+0100 [-] plot_string = movie['plot'][0]
2010-09-08 16:38:37+0100 [-] File "/usr/lib/pymodules/python2.6/imdb/utils.py", line 1366, in __getitem__
2010-09-08 16:38:37+0100 [-] rawData = self.data[key]
2010-09-08 16:38:37+0100 [-] KeyError: 'plot'
2010-09-08 16:38:37+0100 [-]

I would replace these lines in entertainerlib/backend/components/mediacache/video_metadata_search.py

            plot_string = movie['plot'][0]
            plot = plot_string[plot_string.rfind("::")+2:].lstrip()

with:

            try:
                plot_string = movie['plot'][0]
                plot = plot_string[plot_string.rfind("::")+2:].lstrip()
            except (KeyError, IndexError):
                # No plot key, or an empty plot list
                plot = ''

i.e. catch the KeyError raised for a missing plot and fall back to an empty plot string.
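For illustration, the same guard can also be written without try/except, using get with a default. This is only a sketch: `extract_plot` is a hypothetical helper, not part of Entertainer, and plain dicts stand in for the IMDbPY Movie object (assumed to support a dict-style get, as the `_get_role` patch below also relies on). It additionally handles a plot string with no "::" separator, because rfind returns -1 in that case and the slice `[-1+2:]` would silently drop the first character.

```python
def extract_plot(movie, default=''):
    """Return the first plot summary in `movie`, or `default` if there is none.

    `movie` is any mapping with a dict-style get method; in Entertainer this
    would be an IMDbPY Movie object (an assumption for this sketch).
    """
    plots = movie.get('plot') or []  # covers a missing key and an empty list
    if not plots:
        return default
    plot_string = plots[0]
    # Keep only the text after the last "::", as the original code does,
    # but only if a separator is actually present (rfind gives -1 otherwise).
    sep = plot_string.rfind('::')
    if sep != -1:
        plot_string = plot_string[sep + 2:]
    return plot_string.lstrip()


print(repr(extract_plot({})))                                 # ''
print(repr(extract_plot({'plot': []})))                       # ''
print(repr(extract_plot({'plot': ['header:: Plot text.']})))  # 'Plot text.'
print(repr(extract_plot({'plot': ['No separator']})))         # 'No separator'
```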

===================================================================================

Once the above is fixed, you see new errors about actors, writers and directors:

2010-09-08 16:47:36,510 DEBUG [imdbpy.parser.http] /usr/lib/pymodules/python2.6/imdb/parser/http/__init__.py:412: fetching url http://akas.imdb.com/title/tt1254207/plotsummary (size: -1)
Exception in thread Video metadata search thread:
Traceback (most recent call last):
  File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
    self.run()
  File "/home/peterjc/Downloads/entertainer-0.5.1/entertainerlib/backend/components/mediacache/video_metadata_search.py", line 136, in run
    p = self._get_persons(movie)
  File "/home/peterjc/Downloads/entertainer-0.5.1/entertainerlib/backend/components/mediacache/video_metadata_search.py", line 217, in _get_persons
    a1 = movie['actors'][0]['name']
  File "/usr/lib/pymodules/python2.6/imdb/utils.py", line 1366, in __getitem__
    rawData = self.data[key]
KeyError: 'cast'

This requires more substantial changes to _get_persons, which currently makes some big assumptions: that every film has at least five actors, two writers and two directors.
I suggest replacing the current method:

    def _get_persons(self, movie):
        """
        Get a list of persons. First five names are actors, then comes two
        directors and two writers.
        @param movie: Movie name
        @return: List of strings containing actors, directors and writers
        """
        a1 = movie['actors'][0]['name']
        a2 = movie['actors'][1]['name']
        a3 = movie['actors'][2]['name']
        a4 = movie['actors'][3]['name']
        a5 = movie['actors'][4]['name']
        w1 = movie['writer'][0]['name']
        w2 = movie['writer'][1]['name']
        d1 = movie['director'][0]['name']
        d2 = movie['director'][1]['name']
        return [a1, a2, a3, a4, a5, w1, w2, d1, d2]

with:

    def _get_role(self, movie, role, count=None, default=''):
        """
        Get a list of actors (expect up to five), writers (expect up to two)
        or directors (expect up to two)
        @param movie: IMDbPY Movie object
        @param role: string actors, writer, or director
        @param count: optional integer, force return of this many strings
        @param default: value to pad return list with if using count
        @return: List of strings containing people in that role
        """
        people = movie.get(role, [])
        names = [person['name'] for person in people]
        if count is not None:
            names = names[:count]  # truncate list (must assign the slice)
            names += [default] * (count - len(names))  # pad short list
        return names

    def _get_persons(self, movie):
        """
        Get a list of persons. First five names are actors, then come two
        writers and two directors.
        @param movie: IMDbPY Movie object
        @return: List of strings containing actors, directors and writers
        """
        a1, a2, a3, a4, a5 = self._get_role(movie, 'actors', count=5)
        w1, w2 = self._get_role(movie, 'writer', count=2)
        d1, d2 = self._get_role(movie, 'director', count=2)
        return [a1, a2, a3, a4, a5, w1, w2, d1, d2]
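To check the padding and truncation behaviour outside Entertainer, here is a standalone sketch of the same logic. Assumptions: plain dicts stand in for IMDbPY Movie and Person objects, and `get_role`/`get_persons` are hypothetical module-level versions of the methods above. Note that truncation must assign the slice back (`names = names[:count]`); a bare `names[:count]` expression discards its result, so a film with more than five actors would make the five-way unpacking fail with a ValueError.

```python
def get_role(movie, role, count=None, default=''):
    """Return the names of people in `role`, optionally forced to `count` items."""
    people = movie.get(role, [])
    names = [person['name'] for person in people]
    if count is not None:
        names = names[:count]                      # truncate long lists
        names += [default] * (count - len(names))  # pad short lists
    return names


def get_persons(movie):
    """Five actors, then two writers, then two directors, padded with ''."""
    return (get_role(movie, 'actors', count=5)
            + get_role(movie, 'writer', count=2)
            + get_role(movie, 'director', count=2))


movie = {
    'actors': [{'name': 'Actor %d' % i} for i in range(1, 8)],  # seven actors
    'director': [{'name': 'Director 1'}],                        # one director
    # no 'writer' key at all
}
print(get_persons(movie))
# ['Actor 1', 'Actor 2', 'Actor 3', 'Actor 4', 'Actor 5',
#  '', '', 'Director 1', '']
```

Returning a fixed-length list keeps the nine-slot layout the rest of the scanner expects, while an empty string marks each missing person.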

===================================================================================

Once those are fixed, you get a similar error for missing cover URLs:

2010-09-08 17:04:05,546 DEBUG [imdbpy.parser.http] /usr/lib/pymodules/python2.6/imdb/parser/http/__init__.py:412: fetching url http://akas.imdb.com/title/tt1254207/plotsummary (size: -1)
2010-09-08 17:04:06+0100 [-] Exception in thread Video metadata search thread:
2010-09-08 17:04:06+0100 [-] Traceback (most recent call last):
2010-09-08 17:04:06+0100 [-] File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner
2010-09-08 17:04:06+0100 [-] self.run()
2010-09-08 17:04:06+0100 [-] File "/home/peterjc/Downloads/entertainer-0.5.1/entertainerlib/backend/components/mediacache/video_metadata_search.py", line 143, in run
2010-09-08 17:04:06+0100 [-] self._download_cover_art(movie['cover url'], title)
2010-09-08 17:04:06+0100 [-] File "/usr/lib/pymodules/python2.6/imdb/utils.py", line 1366, in __getitem__
2010-09-08 17:04:06+0100 [-] rawData = self.data[key]
2010-09-08 17:04:06+0100 [-] KeyError: 'cover url'
2010-09-08 17:04:06+0100 [-]

I would replace these calls (two cases) in the run method:

    self._download_cover_art(movie['cover url'], title)

with:

    if 'cover url' in movie:
        self._download_cover_art(movie['cover url'], title)

and:

    self._download_cover_art(series['cover url'], series_title)

with:

    if 'cover url' in series:
        self._download_cover_art(series['cover url'], series_title)
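Since the same guard is needed in two places, it could be factored into one helper. This is only a sketch: `download_cover_if_present` is a hypothetical function, the download callable is passed in so the example runs without network access, and plain dicts stand in for the IMDbPY movie and series objects.

```python
def download_cover_if_present(container, title, download):
    """Call download(url, title) only if `container` has a 'cover url' entry.

    Returns True if a download was attempted, False otherwise.
    """
    if 'cover url' not in container:
        return False
    download(container['cover url'], title)
    return True


# Record the calls instead of downloading anything.
calls = []


def record(url, title):
    calls.append((url, title))


download_cover_if_present({'cover url': 'http://example.com/c.jpg'}, 'Film', record)
download_cover_if_present({}, 'Series', record)  # no cover url: skipped
print(calls)  # [('http://example.com/c.jpg', 'Film')]
```

In the run method this would become something like `download_cover_if_present(movie, title, self._download_cover_art)`, with the equivalent call using `series` and `series_title`.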

Peter (maubp) wrote :