oaibackup script error and issues

Bug #583365 reported by Richard H.
This bug affects 2 people
Affects           Status         Importance  Assigned to      Milestone
Document Library  Fix Committed  Medium      Sylvain Viollon  1.6
                  Invalid        Undecided   Unassigned

Bug Description

We're running the oaibackup script to see the output. We run it with the following:

/webs/documentlibrary/oaibackup/bin/oaibackup --m=dl --doclib -o /webs/dl-backup-test -u XXXX -p XXXX -f 2009-09-20 http://URL/oaipmh_private

This produces the following:

Traceback (most recent call last):
  File "/webs/documentlibrary/oaibackup/bin/oaibackup", line 15, in ?
    oaibackup.main.main()
  File "/webs/documentlibrary/oaibackup/src/oaibackup/main.py", line 84, in main
    from_dt, options.incremental, credentials, hooks)
  File "/webs/documentlibrary/oaibackup/src/oaibackup/backup.py", line 54, in backup
    dt = _harvest(c, output_dir, prefix, from_dt, hooks)
  File "/webs/documentlibrary/oaibackup/src/oaibackup/backup.py", line 91, in _harvest
    hook(output_dir, identifier, m)
  File "/webs/documentlibrary/oaibackup/src/oaibackup/documentlibrary.py", line 26, in documentlibrary_filesave_hook
    f = urllib2.urlopen(url)
  File "/webs/documentlibrary/lib/python2.4/urllib2.py", line 130, in urlopen
    return _opener.open(url, data)
  File "/webs/documentlibrary/lib/python2.4/urllib2.py", line 364, in open
    response = meth(req, response)
  File "/webs/documentlibrary/lib/python2.4/urllib2.py", line 471, in http_response
    response = self.parent.error(
  File "/webs/documentlibrary/lib/python2.4/urllib2.py", line 402, in error
    return self._call_chain(*args)
  File "/webs/documentlibrary/lib/python2.4/urllib2.py", line 337, in _call_chain
    result = func(*args)
  File "/webs/documentlibrary/lib/python2.4/urllib2.py", line 480, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 401: Unauthorized

There's an 'Unauthorized' error at the very end, but we do get a collection of files, each accompanied by a corresponding .dl ASCII text file (which contains 'xmlns' statements). These .dl files hold the metadata for each document included in the backup, but they are missing the 'category' information, i.e. where a document resides within the library (e.g. Human Resources/Staff Site/policies). This is vital info!

So, in summary:

(1) Why do we get an Unauthorized error? This might have interrupted the backup. We have no way to tell whether the backup completed or not.

(2) Why is the category info missing from the .dl files that contain the metadata?

Tags: summer10
Kit Blake (kitblake) wrote :

This needs investigation.

Richard H. (richard-hewison) wrote :

I should point out that the oaibackup script was run on our test server rather than the live one. I don't see how that would make a difference, but I'd best mention it just in case.

There were also a few other oddities in the resulting .dl text files: they list other users' names within <dl:group> tags, and I can't see the relevance when looking at the document in the actual library.

I will see if we can officially ask for time to be spent on this, and I will get back to you ASAP.

Kit Blake (kitblake)
tags: added: summer10
Sylvain Viollon (thefunny) wrote :

I improved the oaibackup script:

- The unauthorized error happened while downloading a file referenced in a record of the OAI feed. I improved the download step so that it no longer fails, but collects errors and displays them at the end of the script with more information (which URL was affected and what the problem was). With this feature you will be able to process the whole feed, i.e. make a full backup, and get a summary of the files you could not download.

- Categories are implemented in the document library as OAI sets (a standard OAI feature). Those were not saved before. I added the backup of the sets in a file called sets.oai, which lists all available categories, each associated with a unique identifier. Each record dump now includes in its header the sets the record belongs to. This set identifier stays unique and makes it possible to rename / re-title categories without problems on the client side. All those files (sets, record dumps) are valid XML, which means it is easy for you to reuse information from them afterwards.
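
To illustrate how the set identifiers in a record header could be resolved back to category titles on the client side, here is a minimal sketch. It assumes that sets.oai resembles a standard OAI-PMH ListSets response (set elements holding setSpec and setName) and that each record dump carries setSpec elements in its header. The actual layout written by oaibackup may differ, and the sample XML, the identifier "set-101", and the helper names load_sets and record_categories are made up for the example.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 namespace. That the backup files use it is an assumption.
OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical sample data, shaped like OAI-PMH ListSets / record elements.
SAMPLE_SETS = """<ListSets xmlns="http://www.openarchives.org/OAI/2.0/">
  <set>
    <setSpec>set-101</setSpec>
    <setName>Human Resources/Staff Site/policies</setName>
  </set>
</ListSets>"""

SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>doc-1</identifier>
    <setSpec>set-101</setSpec>
  </header>
</record>"""


def load_sets(sets_xml):
    """Map each set identifier (setSpec) to its category title (setName)."""
    root = ET.fromstring(sets_xml)
    return {
        s.findtext(OAI + "setSpec"): s.findtext(OAI + "setName")
        for s in root.iter(OAI + "set")
    }


def record_categories(record_xml, sets):
    """Return the category titles for one record dump.

    An identifier missing from the sets mapping is returned unchanged.
    """
    root = ET.fromstring(record_xml)
    specs = [e.text for e in root.iter(OAI + "setSpec")]
    return [sets.get(spec, spec) for spec in specs]


if __name__ == "__main__":
    print(record_categories(SAMPLE_RECORD, load_sets(SAMPLE_SETS)))
```

Because the lookup goes through the identifier rather than the title, renaming a category only changes the setName entry in the sets file, and the record dumps stay valid.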

Once you have tested the new version of this script, we will know which record (file) in the Document Library triggered the unauthorized error you got, and can check why you don't have access rights to that file. (We can't do it now, since we don't know which file triggered that error.)
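
The collect-and-report behaviour described above could look roughly like the following sketch. It is not the actual oaibackup code: the names fetch_url, download_all and format_summary are hypothetical, and it uses Python 3's urllib.request rather than the urllib2 module from Python 2.4 seen in the traceback.

```python
import urllib.error
import urllib.request


def fetch_url(url):
    """Default fetcher: return the body of url as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_all(urls, fetch=fetch_url):
    """Try every URL; record failures instead of aborting on the first one.

    Returns (contents, failures): contents maps url -> bytes, failures is a
    list of (url, message) pairs in the order the problems occurred.
    """
    contents, failures = {}, []
    for url in urls:
        try:
            contents[url] = fetch(url)
        except urllib.error.HTTPError as exc:
            # HTTPError subclasses URLError, so it has to be caught first.
            failures.append((url, "HTTP Error %d: %s" % (exc.code, exc.reason)))
        except urllib.error.URLError as exc:
            failures.append((url, "URL error: %s" % exc.reason))
    return contents, failures


def format_summary(failures):
    """Build the end-of-run report listing each file that was not downloaded."""
    if not failures:
        return "All files downloaded."
    lines = ["%d file(s) could not be downloaded:" % len(failures)]
    lines.extend("  %s (%s)" % (url, message) for url, message in failures)
    return "\n".join(lines)
```

With this shape, a single 401 no longer interrupts the run, and the summary names the URL that was refused, which is the information needed to check the access rights on that file.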

Changed in documentlibrary:
assignee: nobody → Sylvain Viollon (thefunny)
importance: Undecided → Medium
milestone: none → 1.6
status: New → Fix Committed
Sylvain Viollon (thefunny) wrote :

Not here
