MemoryError - recv for HTTP through Proxy

Bug #215426 reported by Eric Holmberg
Affects: Bazaar
Status: Fix Released
Importance: Medium
Assigned to: Eric Holmberg

Bug Description

When using a repository with a 4MB pack where the traffic has to go through a proxy server, the proxy server fragments the HTTP packets into 1460-byte blocks (as sniffed with Ethereal/Wireshark). This results in a very large number of HTTP Continuation packets when bzr is downloading the 4MB pack. Eventually, bzr runs out of memory. Actual memory used by bzr is only 24MB, with more than 500MB available (2GB total in the system), so the MemoryError seems to be more of a fixed-size buffer issue.
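
For a rough sense of scale, the number of 1460-byte blocks needed for one pack can be estimated as below. This is only a back-of-the-envelope sketch: it assumes "4MB" means 4 MiB and that every block carries a full 1460 bytes, and the helper name `segments_needed` is made up for illustration.

```python
def segments_needed(total_bytes, payload_per_segment):
    """Number of segments needed to carry total_bytes (ceiling division)."""
    return (total_bytes + payload_per_segment - 1) // payload_per_segment


PACK_SIZE = 4 * 1024 * 1024   # assumption: "4MB" read as 4 MiB
SEGMENT_PAYLOAD = 1460        # block size reported in the description

if __name__ == "__main__":
    print(segments_needed(PACK_SIZE, SEGMENT_PAYLOAD))
```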

OS: Windows XP
BZR versions: All (have tried >= 1.2 and most recently 1.3)
Python version: 2.5

bzr: ERROR: exceptions.MemoryError:

Traceback (most recent call last):
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\commands.
py", line 834, in run_bzr_catch_errors
    return run_bzr(argv)
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\commands.
py", line 790, in run_bzr
    ret = run(*run_argv)
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\commands.
py", line 492, in run_argv_aliases
    return self.run(**all_cmd_args)
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\builtins.
py", line 927, in run
    hardlink=hardlink)
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\bzrdir.py
", line 941, in sprout
    revision_id=revision_id)
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\decorator
s.py", line 127, in read_locked
    return unbound(self, *args, **kwargs)
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\repositor
y.py", line 1036, in sprout
    dest_repo.fetch(self, revision_id=revision_id)
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\repositor
y.py", line 949, in fetch
    return inter.fetch(revision_id=revision_id, pb=pb, find_ghosts=find_ghosts)
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\decorator
s.py", line 165, in write_locked
    return unbound(self, *args, **kwargs)
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\repositor
y.py", line 2759, in fetch
    revision_ids).pack()
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\repofmt\p
ack_repo.py", line 589, in pack
    return self._create_pack_from_packs()
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\repofmt\p
ack_repo.py", line 722, in _create_pack_from_packs
    self._copy_text_texts()
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\repofmt\p
ack_repo.py", line 686, in _copy_text_texts
    self.new_pack.text_index, readv_group_iter, total_items))
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\repofmt\p
ack_repo.py", line 807, in _copy_nodes_graph
    write_index, output_lines, pb, readv_group_iter, total_items):
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\repofmt\p
ack_repo.py", line 830, in _do_copy_nodes_graph
    izip(reader.iter_records(), node_vector):
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\pack.py",
 line 272, in _iter_records
    for record in self._iter_record_objects():
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\pack.py",
 line 277, in _iter_record_objects
    record_kind = self.reader_func(1)
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\pack.py",
 line 218, in reader_func
    return self._source.read(length)
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\pack.py",
 line 177, in read
    self._next()
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\pack.py",
 line 172, in _next
    length, data = self.readv_result.next()
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\transport
\http\__init__.py", line 250, in _readv
    data = rfile.read(size)
  File "c:\python25\lib\site-packages\bzr-1.3.1-py2.5-win32.egg\bzrlib\transport
\http\response.py", line 209, in read
    data = self._file.read(limited)
  File "C:\Python25\lib\socket.py", line 308, in read
    data = self._sock.recv(recv_size)
  File "C:\Python25\lib\httplib.py", line 529, in read
    s = self.fp.read(amt)
  File "C:\Python25\lib\socket.py", line 308, in read
    data = self._sock.recv(recv_size)
MemoryError

Andrew Bennetts (spiv) wrote :

This *might* be similar to bug 115781, specifically <https://bugs.edge.launchpad.net/bzr/+bug/115781/comments/10>. That is, maybe we should never try to read more than 64k from a socket on Windows? The fact that the bzr process is only taking 24MB does make it sound like it might be a kernel buffer issue. This is just guesswork, though.

Eric Holmberg (eholmberg) wrote :

That's a good idea -- it does look similar. I'll take a look tomorrow (Friday) and see if I find anything.

Another piece of information: I'm using NTLM Proxy (http://ntlmaps.sourceforge.net/), and if I enable debugging, the problem goes away. I am serving the files through a Pylons app, but also tried Apache 2.x as a sanity check. Pulling the same repository on Linux seems to be fine.

Eric Holmberg (eholmberg) wrote :

There is another bug that looks related - http://bugs.launchpad.net/bzr/+bug/198727.

In addition, Python seems to have a few bugs that look like they have been resolved, but the fixes have not yet been released.

 * http://bugs.python.org/issue1092502
 * http://bugs.python.org/issue1389051

Versions tried:
 * Fails: Windows Python 2.5 (r25:51908, Sep 19 2006, 09:52:17)
 * Fails: Windows Python 2.5.2 (r252:60911, Feb 21 2008, 13:11:45)
 * OK: Linux Python 2.4.4c1 (#2, Oct 11 2006, 21:51:02)
 * OK: Linux Python 2.5.2 (r252:60911, Mar 12 2008, 13:36:25)

Here is a temporary workaround that I'm using (I have also attached the patch to this posting). Please let me know if you would like me to do anything.

Regards,

Eric

--- bzrlib\transport\http\response.original.py Fri Apr 11 14:05:55 2008
+++ bzrlib\transport\http\response.py Fri Apr 11 14:07:20 2008
@@ -206,7 +206,18 @@
             limited = self._start + self._size - self._pos
             if size >= 0:
                 limited = min(limited, size)
-            data = self._file.read(limited)
+
+            lst = []
+            while limited > 0:
+                # limit reads to 4 MB for Windows problem
+                # See bug: http://bugs.launchpad.net/bzr/+bug/215426
+                # nBytesToRead = min(limited,1024*1024*4 + 1024*100) OK
+                # nBytesToRead = min(limited,1024*1024*4 + 1024*128) Fails
+                nBytesToRead = min(limited,1024*1024*4)
+                lst.append(self._file.read(nBytesToRead))
+                limited -= nBytesToRead
+            data = ''.join(lst)
+
         else:
             # Size of file unknown, the user may have specified a size or not,
             # we delegate that to the filesocket object (-1 means read until
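
For readers who want to try the idea outside bzrlib, the workaround in the patch can be sketched as a standalone helper. This is a minimal sketch, not the code that was merged: it is written for Python 3 (the patch targeted Python 2.5), and the names `read_in_chunks` and `MAX_READ_SIZE` are made up for illustration. Unlike the patch, it subtracts the number of bytes actually returned and stops at end of file, which is an assumption about how short reads should be handled.

```python
import io

# Cap taken from the patch above (4 MB per underlying read call).
MAX_READ_SIZE = 4 * 1024 * 1024


def read_in_chunks(fileobj, total, max_chunk=MAX_READ_SIZE):
    """Read up to `total` bytes from `fileobj`, never requesting more
    than `max_chunk` bytes in a single read() call."""
    chunks = []
    remaining = total
    while remaining > 0:
        data = fileobj.read(min(remaining, max_chunk))
        if not data:
            # End of stream reached before `total` bytes arrived.
            break
        chunks.append(data)
        remaining -= len(data)
    return b"".join(chunks)


if __name__ == "__main__":
    # Stand-in for the HTTP response body: 10 MiB of in-memory data.
    payload = b"x" * (10 * 1024 * 1024)
    result = read_in_chunks(io.BytesIO(payload), len(payload))
    print(len(result) == len(payload))
```

The caller still receives one contiguous byte string, so only the size of each individual read request changes.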

Alexander Belchenko (bialix) wrote : Re: [Bug 215426] Re: MemoryError - recv for HTTP through Proxy

Please, send your patch to Bazaar ML with prefix [MERGE] in the subject line.

Eric Holmberg wrote:
> ** Attachment added: "BZR-1.4rc1 Bug 215426 Patch"
> http://launchpadlibrarian.net/13364915/bzr-1.4rc1-bug215426.patch.txt
>

Eric Holmberg (eholmberg) wrote :

Patch has been submitted with subject line "[MERGE] [Bug 215426] Re: MemoryError - recv for HTTP through Proxy".

Eric Holmberg (eholmberg) wrote :

Patch #3, which includes a full unit test of the functionality (and tests that can be uncommented to reproduce the problem), has been submitted to the mailing list. Here's the link for those interested: http://bundlebuggy.aaronbentley.com/request/%3C6910002A52F85D46AF3E35ED5B254FE103FD2342%40wmhex005p.arrownao.corp.arrow.com%3E

Mark Hammond (mhammond) wrote :

FYI, it appears this problem isn't related to Windows at all - http://bugs.python.org/issue1092502 mentions MacOS and I believe another dupe (#1389051) implies Linux (but I can't explain why your Linux 2.5 test didn't see it). Also FYI, http://bugs.python.org/issue2632 is tracking a regression caused by the fix to 1092502, so the issue really isn't fixed upstream yet at all :( It might be worth updating the comments to reflect the above, and mention it can (theoretically) be removed once python bug 2632 is fixed and suitably old enough...

Eric Holmberg (eholmberg) wrote :

Thanks for the update Mark.

I wrote a quick binary search for the failing buffer size, and the result depends on the load on the TCP/IP stack at the moment the test is run. The failure on Windows seems to be related to the fact that I'm running the web server (Pylons), proxy client (NTLM Proxy), and the bzr client all on the same machine. That definitely seems like a worst-case scenario, since the packets end up going from bzr -> NTLM Proxy -> External Proxy Server -> NTLM Proxy -> Pylons and then back again.
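
A binary search of this kind might look roughly like the sketch below. It is not the script used in this report: `find_failure_threshold` and the `read_succeeds` callback are hypothetical names, and the search assumes that every size below some threshold succeeds and every size at or above it fails. As noted above, the real threshold moved with the load on the TCP/IP stack, so in practice that assumption holds only approximately.

```python
def find_failure_threshold(read_succeeds, low, high):
    """Return the smallest size in (low, high] for which read_succeeds(size)
    is False, assuming read_succeeds(low) is True, read_succeeds(high) is
    False, and there is a single switch-over point in between."""
    while high - low > 1:
        mid = (low + high) // 2
        if read_succeeds(mid):
            low = mid    # mid still works, so the threshold is above it
        else:
            high = mid   # mid fails, so the threshold is at or below it
    return high


if __name__ == "__main__":
    # Made-up stand-in for a real read attempt: pretend reads fail from
    # 4 MB + 110 KB upward. This value is NOT a measurement from the bug.
    fake_threshold = 4 * 1024 * 1024 + 110 * 1024

    def fake_read(size):
        return size < fake_threshold

    # Bounds borrowed from the patch comments: +100 KB worked, +128 KB failed.
    low = 4 * 1024 * 1024 + 100 * 1024
    high = 4 * 1024 * 1024 + 128 * 1024
    print(find_failure_threshold(fake_read, low, high) == fake_threshold)
```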

I was unable to reproduce the problem under a Linux virtual machine (running Ubuntu with the 2.6.24 i686 kernel and Python 2.5.2), but the Linux VM isn't using the proxy server, so it's not a completely fair test.

Vincent Ladeuil (vila) wrote :

Eric has a patch under review, so I am assigning the bug to him.

Changed in bzr:
importance: Undecided → Medium
status: New → In Progress
assignee: nobody → eholmberg

S. Dorscht (stdoonline) wrote :

If you need somebody to test the patch, just send it to me. Maybe it fixes https://bugs.launchpad.net/bzr/+bug/226541, which shows up under Mac OS X.

Vincent Ladeuil (vila) wrote :

The latest version of the patch is available from:

http://bundlebuggy.aaronbentley.com/request/%<email address hidden>%3E

Andrew Bennetts (spiv) wrote :

Eric's fix has landed in bzr.dev, and will be in the 1.6 release.

Thanks Eric!

Changed in bzr:
milestone: none → 1.6
status: In Progress → Fix Released