add timeout for smart connections
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Landscape Client | Fix Released | High | Gustavo Niemeyer |
Smart Package Manager | Fix Released | Undecided | Gustavo Niemeyer |
smart (Ubuntu) | Fix Released | Undecided | Unassigned |
Intrepid | Fix Released | Undecided | Unassigned |
Jaunty | Fix Released | Undecided | Unassigned |
Bug Description
Smart hung on two machines (ls2 and ls5) while holding an open connection; in both cases smart update was run via cron. It is still being debugged, but adding some sort of timeout for the connections looks like it would help avoid this situation.
The Landscape team has proposed an SRU to fix this bug in intrepid and jaunty:
Statement explaining the impact
=======
This fix is critical: customer installations have already been bitten by this bug. Smart would simply hang, doing nothing but blocking itself. As a result, Landscape could not manage the packages on that machine.
How the bug has been addressed
=======
Introducing a timeout for the libcurl call.
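
As an illustration only, here is a minimal sketch of what setting timeouts on a libcurl handle can look like from Python via pycurl. It is not the actual patch: the helper names (timeout_options, apply_timeouts) and the numeric values are assumptions, and the report does not say which libcurl options the fix uses. CONNECTTIMEOUT, LOW_SPEED_LIMIT and LOW_SPEED_TIME are real libcurl options exposed by pycurl.

```python
# Hypothetical helpers, not taken from the smart source tree.
# The option names are real libcurl options that pycurl exposes as
# pycurl.CONNECTTIMEOUT, pycurl.LOW_SPEED_LIMIT and pycurl.LOW_SPEED_TIME.


def timeout_options(connect_timeout=30, stall_time=60, min_bytes_per_sec=1):
    """Map libcurl option names to illustrative (assumed) timeout values."""
    return {
        # Give up if the connection cannot be established in this many seconds.
        "CONNECTTIMEOUT": connect_timeout,
        # Abort if the transfer stays below min_bytes_per_sec bytes/second
        # for stall_time consecutive seconds.
        "LOW_SPEED_LIMIT": min_bytes_per_sec,
        "LOW_SPEED_TIME": stall_time,
    }


def apply_timeouts(handle, options=None):
    """Apply the options to an existing pycurl.Curl handle and return it."""
    import pycurl  # third-party module, imported only when actually needed

    for name, value in (options or timeout_options()).items():
        handle.setopt(getattr(pycurl, name), value)
    return handle
```

With pycurl installed, `apply_timeouts(pycurl.Curl())` would configure a fresh handle. The low-speed pair is one way to catch a transfer that has stalled without capping the total download time.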
Detailed instructions on how to reproduce the bug
=======
It can be reproduced artificially by inducing a timeout by means of traffic shaping. It has been done by the Landscape QA engineer and the fix is confirmed to work as expected.
Changed in landscape: | |
assignee: | nobody → niemeyer |
importance: | Undecided → High |
milestone: | none → thames-pre-8 |
Changed in smart: | |
assignee: | nobody → niemeyer |
Changed in landscape: | |
status: | New → Fix Committed |
Changed in smart: | |
status: | Fix Committed → Fix Released |
Changed in landscape: | |
status: | Fix Committed → Fix Released |
description: | updated |
affects: | landscape → landscape-client |
Changed in landscape-client: | |
milestone: | mountainview → none |
milestone: | none → 1.3.2.1 |
So, I managed to kill the stale connection using tcpkill and some tricks, but smart update is still running:
16959 ? S 0:00 \_ /USR/SBIN/CRON
16960 ? Ss 0:00 \_ /bin/sh -c cd / && run-parts --report /etc/cron.hourly
16961 ? S 0:00 \_ run-parts --report /etc/cron.hourly
16962 ? Ss 0:00 \_ /bin/sh /etc/cron.hourly/smartpm-core
16963 ? S 0:03 \_ /usr/bin/python /usr/bin/smart update
netstat doesn't show the connection anymore. In fact, there is no connection coming or going to the smart process.
I reproduced it locally by running smart update in one terminal and netcat in another, just listening. It hung, as expected, but when tcpkill killed the connection, smart update simply finished. So perhaps something else is going on here, either instead of or in addition to the stale connection.
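
The local experiment (a listener that accepts the connection but never answers) can be approximated with the Python standard library alone. This is a sketch under assumptions: it uses plain sockets rather than smart or libcurl, and the function names and the 0.5 second timeout are made up for illustration.

```python
import socket
import threading


def start_silent_listener():
    """Listen on an ephemeral local port; accept one connection, never reply."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    accepted = []

    def accept_once():
        conn, _ = server.accept()
        accepted.append(conn)  # keep the connection open and send nothing

    threading.Thread(target=accept_once, daemon=True).start()
    return server, accepted


def fetch_with_timeout(port, timeout):
    """Return True if any data arrives, False if the read times out."""
    with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
        client.sendall(b"GET / HTTP/1.0\r\n\r\n")
        try:
            return bool(client.recv(1024))
        except socket.timeout:
            return False


server, accepted = start_silent_listener()
port = server.getsockname()[1]
timed_out = not fetch_with_timeout(port, timeout=0.5)
server.close()
```

With `timeout=None` the `recv` call would block for as long as the listener stays silent, which mirrors the hang seen when the client had no timeout.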