supermirror-pull mirror sometimes hangs
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Launchpad itself | Fix Released | High | Michael Hudson-Doyle | | |
Bug Description
We thought that not using pycurl any more might fix this, but it seems not.
Here's what gdb thinks we're doing:
(gdb) pystack
/usr/lib/
/usr/lib/
/usr/lib/
/usr/lib/
/usr/lib/
/usr/lib/
/srv/sm-
/srv/sm-
/usr/lib/
/usr/lib/
/usr/lib/
/srv/sm-
/srv/sm-
/srv/sm-
/srv/sm-
/srv/sm-
/srv/sm-
/srv/sm-
/srv/sm-
/srv/sm-
/srv/sm-
/srv/sm-
/srv/sm-
/srv/sm-
/srv/sm-
/srv/sm-
(gdb) bt
#0 0xb7fca1ee in __read_nocancel () from /lib/tls/
#1 0xb7c5936b in BIO_sock_
#2 0xb7c57013 in BIO_read () from /usr/lib/
#3 0xb7d14f0e in ssl23_read_bytes () from /usr/lib/
#4 0xb7d1426d in ssl23_connect () from /usr/lib/
#5 0xb7d205f4 in SSL_connect () from /usr/lib/
#6 0xb7d39c17 in PySocket_ssl (self=0x0, args=0xfffffe00) at /build/
#7 0x080b63c7 in PyEval_EvalFrame (f=0x841cf1c) at ../Python/
[...]
lsof says we're connected to i5387A6BB.
I don't really know how to go about fixing this, or even how to diagnose it properly. A watchdog script that kills any supermirror-pull process older than, say, an hour might do it.
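A minimal sketch of such a watchdog, to make the idea concrete. It is not the fix that was eventually released. The names `find_stale` and `kill_stale` are hypothetical, matching on the string "supermirror-pull" in the command line is an assumption, and so is a procps-style `ps` that supports the `etimes` (elapsed seconds) column.

```python
import os
import signal
import subprocess

MAX_AGE_SECONDS = 60 * 60  # "older than an hour", as suggested above


def find_stale(ps_output, name="supermirror-pull", max_age=MAX_AGE_SECONDS):
    """Return PIDs whose command line mentions `name` and whose elapsed
    time exceeds `max_age` seconds.

    `ps_output` is expected to hold lines of "PID ELAPSED_SECONDS COMMAND".
    """
    stale = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # blank or malformed line
        pid, elapsed, command = parts
        if name in command and int(elapsed) > max_age:
            stale.append(int(pid))
    return stale


def kill_stale():
    """Send SIGTERM to every stale process found in the current ps listing."""
    # Assumption: this ps accepts the `etimes` column; the trailing "="
    # on each column name suppresses the header line.
    output = subprocess.check_output(
        ["ps", "-eo", "pid=,etimes=,args="], universal_newlines=True
    )
    for pid in find_stale(output):
        if pid != os.getpid():  # never signal the watchdog itself
            os.kill(pid, signal.SIGTERM)
```

Calling `kill_stale()` from a periodic job (cron, for instance) would bound how long a hung puller can linger. It treats the symptom only: the process still blocks in the SSL handshake until the watchdog notices it.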
Changed in launchpad-bazaar: | |
assignee: | nobody → jml |
importance: | Undecided → High |
status: | New → Incomplete |
Changed in launchpad-bazaar: | |
status: | Triaged → In Progress |
Changed in launchpad-bazaar: | |
milestone: | 1.1.11 → 1.1.12 |
Changed in launchpad-bazaar: | |
assignee: | nobody → mwhudson |
Changed in launchpad-bazaar: | |
status: | Fix Committed → Fix Released |
"I told you so": when we wrote the branch-puller script I argued that the actual bzr pull should be done in a subprocess, so that we can (1) easily kill it if it hangs and (2) happily carry on if it fails with any exception whatsoever.
I still think this would be the right way to handle this problem.