Losing connections with "Connection reset by peer" message
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
openssh (Ubuntu) | Invalid | Undecided | Unassigned |
Bug Description
lsb_release -rd
Description: Ubuntu 9.10
Release: 9.10
ssh (openssh 1:5.1p1-6ubuntu2)
What should happen:
I log into a remote host from my Ubuntu machine via ssh. That login should remain - the connection should be maintained, and I should be able to return to it later and type commands into the shell, see their output, etc. That is, the login shell should remain viable over time.
What does happen:
After some time - minutes to hours - the connection is lost, with an error message like:
Read from remote host isildur: Connection reset by peer
Connection to isildur closed.
This happens apparently at random. It does not happen to all the ssh logins at once, but it does happen to all of them eventually. The connections are to many different machines, so I know it's not a problem specific to a single remote host that I am ssh-ing into. Two of the remote machines I've seen this happen with are other machines I own and control, and they are on a local network merely 3 feet from the Ubuntu machine. It's clear that the problem lies with the Ubuntu machine. This is also a new problem: it only started recently, after the most recent Update Manager update of the Ubuntu box.
I know this might not be an ssh bug, but rather a bug in the underlying networking infrastructure, but I don't know how to track it down. I've looked at all the log files in /var/log, and see no error messages that indicate why this is happening.
I may have not given you enough info to solve this problem, but I need to be told how and where to collect further info - like where the applicable log file can be found.
ProblemType: Bug
Architecture: amd64
Date: Tue Feb 16 11:36:19 2010
DistroRelease: Ubuntu 9.10
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
NonfreeKernelMo
Package: ssh (not installed)
ProcEnviron:
PATH=(custom, no user)
LANG=en_US.UTF-8
SHELL=/bin/bash
ProcVersionSign
SourcePackage: openssh
Uname: Linux 2.6.31-19-generic x86_64
I have more info to add.
This bug seems to involve wpa_supplicant. I have wpa_supplicant version 0.6.9-3ubuntu1 installed on the Ubuntu box in question. Internet traffic is over wired ethernet (eth0). ifconfig eth0 and route -n both report what one expects. However, I noticed wpa_supplicant was running. Why is wpa_supplicant running when the wireless device is not active? (I am using _only_ wired ethernet, eth0, at the moment.) I noticed the following in /var/log/syslog:
Feb 16 15:44:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:45:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:45:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:46:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:46:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:47:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:47:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:48:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:48:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:49:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:49:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:50:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:50:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:51:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:51:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
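To see how regular these events are, a small shell sketch like the following could tally the wpa_supplicant messages by type. The function name count_wpa_events is hypothetical, and the sketch assumes syslog lines of the form "... wpa_supplicant[PID]: EVENT" with single-word event names; it is not a tool shipped with Ubuntu.

```shell
# Hypothetical helper: tally wpa_supplicant events by type.
# Reads syslog-style lines on stdin and prints "EVENT COUNT" per event type.
# Assumes each event name is a single word (only the first word is kept).
count_wpa_events() {
    grep 'wpa_supplicant\[' \
        | sed 's/^.*wpa_supplicant\[[0-9]*]: //' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}
```

It could be run as `count_wpa_events < /var/log/syslog`. Lines from other daemons are ignored.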
ps -auxwww | grep wpa_supplicant yields:
root 30855 0.0 0.0 28328 2676 ? S 13:32 0:00 /sbin/wpa_supplicant -u -s
So why is it running?
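One way to start answering that, as a rough sketch, is to look up which process is the parent of wpa_supplicant. The helper name parent_pid is hypothetical, and it assumes a Linux /proc filesystem. A daemon started through D-Bus activation may show the bus daemon or init as its parent rather than NetworkManager itself, so the result is only a hint.

```shell
# Hypothetical helper: print the parent PID of the given PID.
# Reads the "PPid:" field from /proc/<pid>/status (Linux only).
parent_pid() {
    awk '/^PPid:/ { print $2 }' "/proc/$1/status"
}
```

For the process above this would be called as `parent_pid 30855`, and the resulting PID can then be looked up in the ps output.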
I did a kill -9 30855, and that process died, but a new one was created (by NetworkManager?). This happened each time I killed the current wpa_supplicant process, until I killed them in rapid succession (with many killall -9 wpa_supplicant). That does finally kill wpa_supplicant _without_ it getting immediately restarted, but now the NetworkManager icon is gone from the toolbar at the bottom of the screen (however, a ps -auxwww | grep NetworkManager reveals it is still running). The network device, eth0, was shut down, and the route table flushed. So, I manually restarted eth0 with ifconfig, and recreated the route table. I had network access. I logged into other machines from the Ubuntu box in question, and the remote sessions did not die as before. So, getting rid of wpa_supplicant apparently solved one problem, but now NetworkManager is screwed. Here is part of /var/log/messages:
Feb 16 13:19:39 fingon kernel: [907935.419193] Registered led device: iwl-phy0::radio
Feb 16 13:19:39 fingon kernel: [907935.419219] Registered led device: iwl-phy0::assoc
Feb 16 13:19:39 fingon kernel: [907935.419242] Registered led device: iwl-phy0::RX
Feb 16 13:19:39 fingon kernel: [907935.419262] Registered led device: iwl-phy0::TX
Feb 16 13:19:39 fingon kernel: [907935.472428] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Feb 16 13:19:40 fingon kernel: [907936.785985] Registered led device: iwl-phy0::radio
Feb 16 13:19:40 fingon kernel: [907936.786016] Registered led device: iwl-phy0::assoc
Feb 16 13:19:40 fingon kernel: [907936.786038] Register...