KeyboardInterrupt thrown by urlgrab

Bug #776555 reported by Andrew
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
urlgrabber (Fedora)
Won't Fix
Undecided
urlgrabber (Ubuntu)
Won't Fix
Undecided
Unassigned

Bug Description

TypeError: start() got an unexpected keyword argument 'size'
Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/home/anrobins/development/jdev-manager/downloadUi.py", line 314, in run
    self._control._downloader.start()
  File "/home/anrobins/development/jdev-manager/download.py", line 53, in start
    progress_obj = self._progress_obj)
  File "/usr/lib/pymodules/python2.7/urlgrabber/grabber.py", line 618, in urlgrab
    return default_grabber.urlgrab(url, filename, **kwargs)
  File "/usr/lib/pymodules/python2.7/urlgrabber/grabber.py", line 982, in urlgrab
    return self._retry(opts, retryfunc, url, filename)
  File "/usr/lib/pymodules/python2.7/urlgrabber/grabber.py", line 886, in _retry
    r = apply(func, (opts,) + args, {})
  File "/usr/lib/pymodules/python2.7/urlgrabber/grabber.py", line 968, in retryfunc
    fo = PyCurlFileObject(url, filename, opts)
  File "/usr/lib/pymodules/python2.7/urlgrabber/grabber.py", line 1063, in __init__
    self._do_open()
  File "/usr/lib/pymodules/python2.7/urlgrabber/grabber.py", line 1351, in _do_open
    self._do_grab()
  File "/usr/lib/pymodules/python2.7/urlgrabber/grabber.py", line 1481, in _do_grab
    self._do_perform()
  File "/usr/lib/pymodules/python2.7/urlgrabber/grabber.py", line 1276, in _do_perform
    raise KeyboardInterrupt
KeyboardInterrupt

This code was working in maverick with python 2.6. It looks like grabber.py is calling the curl_obj with incorrect arguments.

This could be related to bug 776500:
https://bugs.launchpad.net/bugs/776500

Before I was using "check_timestamp", now I am trying "simple"

ProblemType: Bug
DistroRelease: Ubuntu 11.04
Package: python-urlgrabber 3.9.1-4
ProcVersionSignature: Ubuntu 2.6.38-9.43-generic 2.6.38.4
Uname: Linux 2.6.38-9-generic x86_64
Architecture: amd64
Date: Tue May 3 10:47:55 2011
InstallationMedia: Ubuntu 11.04 "Natty Narwhal" - Release amd64 (20110427.1)
PackageArchitecture: all
ProcEnviron:
 LANGUAGE=en_US:en
 PATH=(custom, user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: urlgrabber
UpgradeStatus: No upgrade log present (probably fresh install)

Revision history for this message
Andrew (andrew-rw-robinson) wrote :
Revision history for this message
Andrew (andrew-rw-robinson) wrote :

Update:

The problem is that the code is not matching the documentation.

In the python docs for grabber (pydoc urlgrabber.grabber), the progress_obj is clearly documented as to what arguments are supported:

      progress_obj = None

        a class instance that supports the following methods:
          po.start(filename, url, basename, length, text)
          # length will be None if unknown
          po.update(read) # read == bytes read so far
          po.end()

But, in the actual source shipped with natty, this is the code that calls this method:

    def _retrieve(self, buf):
        try:
            if not self._prog_running:
                if self.opts.progress_obj:
                    size = self.size + self._reget_length
                    self.opts.progress_obj.start(self._prog_reportname,
                                                 urllib.unquote(self.url),
                                                 self._prog_basename,
                                                 size=size,
                                                 text=self.opts.text)
                    self._prog_running = True
                    self.opts.progress_obj.update(self._amount_read)

            self._amount_read += len(buf)
            self.fo.write(buf)
            return len(buf)
        except KeyboardInterrupt:
            return -1

Note how "size" and "text" are passed as named arguments instead of positional arguments?

The correct code should be:

                    self.opts.progress_obj.start(self._prog_reportname,
                                                 urllib.unquote(self.url),
                                                 self._prog_basename,
                                                 size,
                                                 self.opts.text)

This would correctly match the documentation.

Alternative would be to use length=size and not size=size in the code as the documentation shows the parameter name as "length", not "size".

Revision history for this message
Andrew (andrew-rw-robinson) wrote :

Work-around is to define the start method as:
  def start(self, filename, url, basename, size, text):
    length = size
    ...
instead of:
  def start(self, filename, url, basename, length, text):
    ...

Revision history for this message
Andrew (andrew-rw-robinson) wrote :

Another work-around that should work with the correct code and the broken code:

  def start(self, filename, url, basename, *args, **kwargs):
    if len(args) >= 1:
      length = args[0]
      if len(args) >= 2:
        text = args[1]
      else:
        text = kwargs.get('text')
    else:
      length = kwargs.get('size') if kwargs.has_key('size') else kwargs.get('length')
      text = kwargs.get('text')

Revision history for this message
In , Andrew (andrew-redhat-bugs) wrote :

Created attachment 497172
Patch to fix the bug

Description of problem:
I found the bug in Ubuntu Natty:
https://bugs.launchpad.net/ubuntu/+source/urlgrabber/+bug/776555

The progress_obj is documented as:
po.start(filename, url, basename, length, text)

But it is called as:
po.start(filename, url, basename, size=..., text=...)

Regression (worked fine in Ubuntu Maverick)

Version-Release number of selected component (if applicable):
Ubuntu version of python-urlgrabber: 3.9.1-4

How reproducible:
100% reproducible

Steps to Reproduce:
1. Pass a progress_obj to the grabber
2. Implement the documented API
3. Code fails when called incorrectly by grabber.py

Actual results:
Code fails with a KeyboardInterrupt

Expected results:
Should work as documented, and as it used to work

Additional info:
See details in ubuntu bug:
https://bugs.launchpad.net/ubuntu/+source/urlgrabber/+bug/776555

Revision history for this message
Andrew (andrew-rw-robinson) wrote :

Since there is no activity from the maintainer yet, I filed an upstream bug:

https://bugzilla.redhat.com/show_bug.cgi?id=702457

tags: added: patch
Revision history for this message
In , James (james-redhat-bugs) wrote :

This code has been like this pretty much forever (2005 the text param. was added), I can see how you might be confused if you just looked at the documentation though. Alas. we can't just remove the param name from the call, as the API has other params:

    def start(self, filename=None, url=None, basename=None,
              size=None, now=None, text=None):

...so our only sane option is to treat it as a documentation bug, and fix that. Sorry, if that doesn't help you much.

Daniel T Chen (crimsun)
Changed in urlgrabber (Ubuntu):
status: New → Triaged
Revision history for this message
Daniel T Chen (crimsun) wrote :

Upstream has decided to fix the documentation instead; see changeset 674d545ee303aa99701ffb982536851572d8db77.

Changed in urlgrabber (Ubuntu):
status: Triaged → Won't Fix
Revision history for this message
In , James (james-redhat-bugs) wrote :

The documentation was fixed upstream.

This request was evaluated by Red Hat Engineering for inclusion in a Red
Hat Enterprise Linux maintenance release.

Red Hat does not currently plan to provide this change in a Red Hat
Enterprise Linux update release for currently deployed products.

With the goal of minimizing risk of change for deployed systems, and in
response to customer and partner requirements, Red Hat takes a
conservative approach when evaluating enhancements for inclusion in
maintenance updates for currently deployed products. The primary
objectives of update releases are to enable new hardware platform
support and to resolve critical defects.

Changed in urlgrabber (Fedora):
importance: Unknown → Undecided
status: Unknown → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.