imports can hang when talking to free.hands.com

Bug #589534 reported by Robert Collins on 2010-06-04
Affects: Ubuntu Distributed Development
Importance: Critical
Assigned to: Unassigned

Bug Description

1000 26624 0.0 0.0 3944 564 ? S May30 0:00 \_ /bin/sh -c import_package.py exactimage
1000 26625 0.0 0.1 185336 43380 ? S May30 0:00 | \_ /usr/bin/python /srv/package-import.canonical.com/new/scripts/import_package.py exactimage

 strace -p 26625
Process 26625 attached - interrupt to quit
recvfrom(16,

import_pa 26625 pkg_import 16u IPv4 194966901 TCP jubany.canonical.com:40179->free.hands.com:www (ESTABLISHED)

1000 26685 0.0 0.0 3944 564 ? S May30 0:00 \_ /bin/sh -c import_package.py otags
1000 26686 0.0 0.1 184852 42940 ? S May30 0:00 | \_ /usr/bin/python /srv/package-import.canonical.com/new/scripts/import_package.py otags

import_pa 26686 pkg_import 13u IPv4 194967021 TCP jubany.canonical.com:40182->free.hands.com:www (ESTABLISHED)

On Fri, 04 Jun 2010 05:25:39 -0000, Robert Collins <email address hidden> wrote:
> Public bug reported:
>
> 1000 26624 0.0 0.0 3944 564 ? S May30 0:00 \_ /bin/sh -c import_package.py exactimage
> 1000 26625 0.0 0.1 185336 43380 ? S May30 0:00 | \_ /usr/bin/python /srv/package-import.canonical.com/new/scripts/import_package.py exactimage
>
> strace -p 26625
> Process 26625 attached - interrupt to quit
> recvfrom(16,
>
> import_pa 26625 pkg_import 16u IPv4 194966901 TCP
> jubany.canonical.com:40179->free.hands.com:www (ESTABLISHED)
>
> 1000 26685 0.0 0.0 3944 564 ? S May30 0:00 \_ /bin/sh -c import_package.py otags
> 1000 26686 0.0 0.1 184852 42940 ? S May30 0:00 | \_ /usr/bin/python /srv/package-import.canonical.com/new/scripts/import_package.py otags
>
> import_pa 26686 pkg_import 13u IPv4 194967021 TCP
> jubany.canonical.com:40182->free.hands.com:www (ESTABLISHED)

In particular this is the machine that we pull Debian packages from, so
it is either getting the Sources files, or a particular source package
when this happens.

What strategies would be good for preventing this from happening aside
from threads/similar and timeouts?

We should also look to see if we can work out what went wrong in this
case and fix that cause.

Thanks,

James
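
For the timeout option raised above, a minimal sketch follows. It assumes Python 3's urllib.request as a stand-in for the urllib2 calls the importer uses, and fetch_with_timeout is a hypothetical helper name, not something from import_package.py. The 300 second default is likewise only an illustrative value.

```python
import socket
import urllib.error
import urllib.request


def fetch_with_timeout(url, timeout=300.0):
    """Return the response body, or None if the server stalls or fails.

    Hypothetical helper: the name and the None-on-failure convention are
    assumptions made for this sketch.
    """
    try:
        # The timeout applies to each blocking socket operation
        # (connect, send, recv), not to the transfer as a whole.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except (socket.timeout, urllib.error.URLError):
        # A stalled recv raises socket.timeout; connection-level
        # failures are wrapped in URLError.
        return None
```

Because the timeout is per socket operation, this would catch a peer that goes completely silent, but not one that keeps drip-feeding a few bytes at a time; that case would need an overall deadline on top.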

Well, for recv to be hung, we need the tcp session to appear live and
the sender to not send us data.

Some possibilities:
 - it's not hung; they are deliberately drip-feeding us
 - the sender has died but some networking config has prevented us
from noticing (e.g. tcp keepalive is off and the other machine rebooted
without shutting down the links, or a firewall in the middle was
rebooted and lost its state, and because we're only receiving, we
don't get notified that it's rejecting packets.)

For the former, asking them is best. For all of the latter cases,
making sure we have a tcp keepalive that is set nice and low would be
a good strategy: we don't want any idle connections in this service,
so setting it to (say) 5 minutes might help.
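
As a rough illustration of the keepalive suggestion, here is a sketch of enabling it per socket in Python. The function name enable_keepalive is hypothetical, and the probe interval and count are assumed values; only the 5 minute idle time comes from the suggestion above. The TCP_KEEP* options are platform-dependent, so each one is guarded.

```python
import socket


def enable_keepalive(sock, idle=300, interval=30, probes=5):
    """Turn on TCP keepalive for sock (hypothetical helper).

    idle is the number of seconds of silence before the first probe;
    interval and probes are assumed values for this sketch.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning knobs are not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

With keepalive on, a peer that has silently disappeared should eventually cause the blocked recv to fail with an error instead of waiting indefinitely. It would not help with the drip-feeding case, since the connection is then genuinely alive.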

On Fri, 04 Jun 2010 19:52:50 -0000, Robert Collins <email address hidden> wrote:
> Well, for recv to be hung, we need the tcp session to appear live and
> the sender to not send us data.
>
> Some possibilities:
> - it's not hung; they are deliberately drip-feeding us
> - the sender has died but some networking config has prevented us
> from noticing (e.g. tcp keepalive is off and the other machine rebooted
> without shutting down the links, or a firewall in the middle was
> rebooted and lost its state, and because we're only receiving, we
> don't get notified that it's rejecting packets.)
>
> For the former, asking them is best, for all of the latter cases,
> making sure we have a tcp keepalive that is set nice and low would be
> a good strategy: we don't want any idle connections in this service,
> so setting it to (say) 5 minutes might help.

Would that be set at the software level, or in system configuration?

It's using urllib2 to get the lists, and bzr transports for the
packages.

Thanks,

James
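
If it were done at the software level for the list fetches, one possible shape is sketched below. It assumes Python 3's urllib.request and http.client in place of urllib2 and httplib, and the class and function names are hypothetical. It does not cover the bzr transports used for the packages, which would need their own hook.

```python
import http.client
import socket
import urllib.request


class KeepAliveHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that enables keepalive once connected (hypothetical)."""

    def connect(self):
        super().connect()
        self.sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
        # Idle time before the first probe; platform-dependent option.
        if hasattr(socket, "TCP_KEEPIDLE"):
            self.sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 300)


class KeepAliveHTTPHandler(urllib.request.HTTPHandler):
    """Handler that makes plain-HTTP requests use the connection class above."""

    def http_open(self, req):
        return self.do_open(KeepAliveHTTPConnection, req)


def build_keepalive_opener():
    # build_opener substitutes this handler for the default HTTPHandler.
    return urllib.request.build_opener(KeepAliveHTTPHandler)
```

Usage would be along the lines of build_keepalive_opener().open(url, timeout=300).read(). The configuration-level alternative is to lower the system-wide keepalive defaults, but those only take effect for sockets that have SO_KEEPALIVE turned on, so some software change is likely needed either way.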
