Losing connections with "Connection reset by peer" message

Bug #522819 reported by GregoryHuey
This bug affects 1 person
Affects: openssh (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

lsb_release -rd
Description: Ubuntu 9.10
Release: 9.10

ssh (openssh 1:5.1p1-6ubuntu2)

What should happen:
I log into a remote host from my Ubuntu machine via ssh. That login should remain - the connection should be maintained, and I should be able to return to it later and type commands into the shell, see their output, etc. That is, the login shell should remain viable over time.

What does happen:
After some time - minutes to hours - the connection is lost, with an error message like:

Read from remote host isildur: Connection reset by peer
Connection to isildur closed.

This happens apparently at random. It does not happen to all the ssh logins at once, but it does happen to all of them eventually. The connections are to many different machines, so I know it's not a problem specific to a single remote host that I am ssh-ing into. Two of the remote machines I've seen this happen with are other machines I own and control, and they are on a local network merely 3 feet from the Ubuntu machine. It's clear that the problem lies with the Ubuntu machine. This is also a new problem: it only started recently, after the most recent Update Manager update of the Ubuntu box.
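For readers unfamiliar with the message: "Connection reset by peer" is the text of the ECONNRESET socket error, which a program sees when a TCP reset (RST) arrives on an established connection. The sketch below is not taken from the report. It provokes a reset on the loopback interface using only the Python standard library, and the function name provoke_reset is hypothetical.

```python
import socket
import struct


def provoke_reset():
    """Open a loopback TCP connection, abort it from the far side, and
    report what the near side observes on its next read."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    # SO_LINGER enabled with a zero timeout makes close() abort the
    # connection with an RST instead of the normal FIN handshake.
    server_side.setsockopt(
        socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0)
    )
    server_side.close()

    try:
        client.recv(1024)  # blocks until the RST (or data/EOF) arrives
        return "closed cleanly"
    except ConnectionResetError:
        # errno ECONNRESET, the condition behind ssh's
        # "Read from remote host ...: Connection reset by peer"
        return "reset"
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    print(provoke_reset())
```

The point of the sketch is only that the message reports a reset received at the TCP level; it says nothing about which host or device generated that reset.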

I know this might not be a ssh bug, but rather a bug in the underlying networking infrastructure - but I don't know how to track it down. I've looked at all the log files in /var/log, and see no error messages that indicate why this is happening.

I may not have given you enough info to solve this problem, but I need to be told how and where to collect further info - like where the applicable log file can be found.

ProblemType: Bug
Architecture: amd64
Date: Tue Feb 16 11:36:19 2010
DistroRelease: Ubuntu 9.10
InstallationMedia: Ubuntu 9.10 "Karmic Koala" - Release amd64 (20091027)
NonfreeKernelModules: nvidia
Package: ssh (not installed)
ProcEnviron:
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.31-19.56-generic
SourcePackage: openssh
Uname: Linux 2.6.31-19-generic x86_64

Revision history for this message
GregoryHuey (ghlaunchpad-ubuntubugs) wrote :

I have more info to add.
This bug seems to involve wpa_supplicant. I have wpa_supplicant version 0.6.9-3ubuntu1 installed on the Ubuntu box in question. Internet traffic is over wired ethernet (eth0). ifconfig eth0 and route -n both report what one expects. However, I noticed wpa_supplicant was running. Why is wpa_supplicant running when the wireless device is not active? (I am using _only_ wired ethernet, eth0, at the moment). I noticed the following in /var/log/syslog :
Feb 16 15:44:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:45:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:45:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:46:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:46:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:47:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:47:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:48:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:48:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:49:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:49:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:50:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:50:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:51:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:51:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
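The entries above repeat once a minute. A minimal sketch, assuming the syslog line format shown in this comment, that measures the spacing between scan events so the timestamps can be compared against the times at which ssh sessions drop. The function name scan_intervals is hypothetical, and the year is taken from the report date because syslog lines omit it.

```python
from datetime import datetime

# Three scan events copied from the excerpt above.
SAMPLE = """\
Feb 16 15:45:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:45:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:46:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
Feb 16 15:46:43 fingon wpa_supplicant[30855]: WPS-AP-AVAILABLE
Feb 16 15:47:43 fingon wpa_supplicant[30855]: CTRL-EVENT-SCAN-RESULTS
"""


def scan_intervals(log_text, year=2010, event="CTRL-EVENT-SCAN-RESULTS"):
    """Return the seconds between consecutive wpa_supplicant scan events."""
    stamps = []
    for line in log_text.splitlines():
        if "wpa_supplicant" not in line or event not in line:
            continue
        month, day, clock = line.split()[:3]
        # %b expects English month abbreviations (C/English locale).
        stamps.append(
            datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
        )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


if __name__ == "__main__":
    print(scan_intervals(SAMPLE))  # [60.0, 60.0]
```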

ps -auxwww | grep wpa_supplicant yields:
root 30855 0.0 0.0 28328 2676 ? S 13:32 0:00 /sbin/wpa_supplicant -u -s

So why is it running?
I did a kill -9 30855, and that process died, but a new one was created (by NetworkManager?). This happened each time I killed the current wpa_supplicant process, until I killed them in rapid succession (with many killall -9 wpa_supplicant). That does finally kill wpa_supplicant _without_ it getting immediately restarted, but now the NetworkManager icon is gone from the toolbar at the bottom of the screen (however, a ps -auxwww | grep NetworkManager reveals it is still running). The network device, eth0, was shut down, and the route table flushed. So, I manually restarted eth0 with ifconfig, and recreated the route table. I had network access. I logged into other machines from the Ubuntu box in question, and the remote sessions did not die as before. So, getting rid of wpa_supplicant apparently solved one problem, but now NetworkManager is screwed - here is part of /var/log/messages:

Feb 16 13:19:39 fingon kernel: [907935.419193] Registered led device: iwl-phy0::radio
Feb 16 13:19:39 fingon kernel: [907935.419219] Registered led device: iwl-phy0::assoc
Feb 16 13:19:39 fingon kernel: [907935.419242] Registered led device: iwl-phy0::RX
Feb 16 13:19:39 fingon kernel: [907935.419262] Registered led device: iwl-phy0::TX
Feb 16 13:19:39 fingon kernel: [907935.472428] ADDRCONF(NETDEV_UP): wlan0: link is not ready
Feb 16 13:19:40 fingon kernel: [907936.785985] Registered led device: iwl-phy0::radio
Feb 16 13:19:40 fingon kernel: [907936.786016] Registered led device: iwl-phy0::assoc
Feb 16 13:19:40 fingon kernel: [907936.786038] Register...


GregoryHuey (ghlaunchpad-ubuntubugs) wrote :

Oh, one more thing. Normally the desktop background shows a different Hubble (?) image, changing about every 10 or 15 minutes. That has stopped. Now it's keeping the most recent background image - the image is not changing to the next in the sequence. What the heck is going on?

Thierry Carrez (ttx) wrote :

Not a bug in openssh, works fine everywhere. Looks like an issue with your local network (especially as other things, like the background download, also fail).

Changed in openssh (Ubuntu):
status: New → Invalid
Revision history for this message
GregoryHuey (ghlaunchpad-ubuntubugs) wrote :

No

This is certainly not a local network issue.

I am continuing to see network connections from this machine to another machine on the same local network being lost. The network connections are over wired ethernet (device eth0). I also have wireless on this laptop (a Thinkpad W700ds) which I sometimes use (device wlan0). When my network connections are dying I am using only eth0. wlan0 is not being used, but it's not deactivated either (that is, I have not done an ifconfig wlan0 down).

I think the cause might be a problem with wpa_supplicant (or perhaps this is a second problem with a common cause). I get the following messages in /var/log/syslog and daemon.log:

Mar 6 21:18:12 fingon wpa_supplicant[1379]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:18:12 fingon wpa_supplicant[1379]: WPS-AP-AVAILABLE
Mar 6 21:19:12 fingon wpa_supplicant[1379]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:19:12 fingon wpa_supplicant[1379]: WPS-AP-AVAILABLE
Mar 6 21:20:12 fingon wpa_supplicant[1379]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:20:12 fingon wpa_supplicant[1379]: WPS-AP-AVAILABLE
Mar 6 21:21:12 fingon wpa_supplicant[1379]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:21:12 fingon wpa_supplicant[1379]: WPS-AP-AVAILABLE
Mar 6 21:22:12 fingon wpa_supplicant[1379]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:22:12 fingon wpa_supplicant[1379]: WPS-AP-AVAILABLE
Mar 6 21:23:12 fingon wpa_supplicant[1379]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:23:12 fingon wpa_supplicant[1379]: WPS-AP-AVAILABLE
Mar 6 21:24:12 fingon wpa_supplicant[1379]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:24:12 fingon wpa_supplicant[1379]: WPS-AP-AVAILABLE
Mar 6 21:25:12 fingon wpa_supplicant[1379]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:25:12 fingon wpa_supplicant[1379]: WPS-AP-AVAILABLE
Mar 6 21:26:12 fingon wpa_supplicant[1379]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:26:12 fingon wpa_supplicant[1379]: WPS-AP-AVAILABLE
Mar 6 21:27:12 fingon wpa_supplicant[1379]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:27:12 fingon wpa_supplicant[1379]: WPS-AP-AVAILABLE
Mar 6 21:28:12 fingon wpa_supplicant[1379]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:28:12 fingon wpa_supplicant[1379]: WPS-AP-AVAILABLE
Mar 6 21:29:12 fingon wpa_supplicant[1379]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:29:12 fingon wpa_supplicant[1379]: WPS-AP-AVAILABLE

and

Mar 6 21:29:43 fingon wpa_supplicant[5778]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:29:43 fingon wpa_supplicant[5778]: WPS-AP-AVAILABLE
Mar 6 21:29:43 fingon wpa_supplicant[5778]: Failed to initiate AP scan.
Mar 6 21:29:48 fingon wpa_supplicant[5778]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:29:48 fingon wpa_supplicant[5778]: WPS-AP-AVAILABLE
Mar 6 21:30:07 fingon wpa_supplicant[5778]: CTRL-EVENT-SCAN-RESULTS
Mar 6 21:30:07 fingon wpa_supplicant[5778]: WPS-AP-AVAILABLE
Mar 6 21:30:34 fingon wpa_supplicant[5778]: Failed to initiate AP scan.
Mar 6 21:31:14 fingon wpa_supplicant[5778]: Failed to initiate AP scan.
Mar 6 21:32:04 fingon wpa_supplicant[5778]: Failed to initiate AP scan.
Mar 6 21:33:04 fingon wpa_supplicant[5778]: Failed to initiate AP scan.

and when I turn off the wifi with a manual switch on the machine, I see the following in syslog:

Mar 6 21:32:04 fingon wpa_supplicant[5778]: Fail...


Changed in openssh (Ubuntu):
status: Invalid → Incomplete
Revision history for this message
GregoryHuey (ghlaunchpad-ubuntubugs) wrote :

I set LogLevel in the ssh and sshd config files to "debug3" in an attempt to discover the cause for the network connection being dropped. I see a lot of debugging info, but nothing jumps out at me as a possible cause. The sshd info was not logged to a file that I could find, but here is the output of the ssh client. Note that this time the connections all died simultaneously, and within only a few minutes of being initiated.

ggh@fingon:~$ ssh -l hgreg isildur
debug2: ssh_connect: needpriv 0
debug1: Connecting to isildur [69.12.176.89] port 22.
debug1: Connection established.
debug1: identity file /home/ggh/.ssh/identity type -1
debug3: Not a RSA1 key file /home/ggh/.ssh/id_rsa.
debug2: key_type_from_name: unknown key type '-----BEGIN'
debug3: key_read: missing keytype
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug3: key_read: missing whitespace
debug2: key_type_from_name: unknown key type '-----END'
debug3: key_read: missing keytype
debug1: identity file /home/ggh/.ssh/id_rsa type 1
debug1: Checking blacklist file /usr/share/ssh/blacklist.RSA-4096
debug1: Checking blacklist file /etc/ssh/blacklist.RSA-4096
debug1: identity file /home/ggh/.ssh/id_dsa type -1
debug1: Remote protocol version 2.0, remote software version OpenSSH_4.2
debug1: match: OpenSSH_4.2 pat OpenSSH_4*
debug1: Enabling compatibility mode for protocol 2.0
debug1: Local version string SSH-2.0-OpenSSH_5.1p1 Debian-6ubuntu2
debug2: fd 3 setting O_NONBLOCK
debug1: SSH2_MSG_KEXI...

Thierry Carrez (ttx) wrote :

Re: "background image", not "background download":
I was assuming this changing Hubble background image was downloaded from somewhere on the Internet. So you mean there is no download: the background cycles through a set of locally installed images, and that started to fail as well?

GregoryHuey (ghlaunchpad-ubuntubugs) wrote :

Well, here is something strange. The machine in question is on the network via a Network Manager connection, "eth0 home". "eth0 home" is a static-IP configuration for eth0 - the device IP, its netmask, router, DNS servers, etc. are all given explicitly in Network Manager for this connection. The IP is supposed to be 69.12.176.91. However, I noticed that the IP was instead 69.12.176.95. This is the last IP of the 8-IP allowed range on my home network. It is what dhcp would have given, I think. But I verified dhclient is not running - "killall dhclient" yields "dhclient: no process found". I also verified that in Network Manager, the active connection is "eth0 home" under "wired networks". There is no way the eth0 IP should have been 69.12.176.95. I disconnected "eth0 home", and then reconnected it (under Network Manager), and the IP went to 69.12.176.91. The problem with my machine dropping connections - ssh logins to remote hosts - seems to have stopped. I have now gone 12 hrs without a dropped connection (i.e. the "Read from remote host isildur: Connection reset by peer"). So there are now two questions: How did this happen? And how would it cause the dropped connections?
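A check like the one described here can be scripted so that a drifted address is noticed sooner. This is only a sketch under stated assumptions: it parses the one-line output format of "ip -o -4 addr show dev eth0" (the report itself used ifconfig), the SAMPLE line is illustrative rather than captured from the reporter's machine, the /29 prefix is inferred from the "8-IP" range, and the function names are hypothetical.

```python
import ipaddress
import re

EXPECTED = "69.12.176.91"  # static address named in the report

# Illustrative line in the `ip -o -4 addr show dev eth0` format (assumed,
# not real output from the affected laptop).
SAMPLE = "2: eth0    inet 69.12.176.95/29 brd 69.12.176.95 scope global eth0"


def configured_addresses(ip_output):
    """Extract every IPv4 address/prefix that follows an 'inet' keyword."""
    return [
        ipaddress.ip_interface(match)
        for match in re.findall(r"\binet (\S+)", ip_output)
    ]


def has_expected_address(ip_output, expected):
    """True if the expected static address is among the configured ones."""
    return any(str(addr.ip) == expected for addr in configured_addresses(ip_output))


if __name__ == "__main__":
    if not has_expected_address(SAMPLE, EXPECTED):
        found = ", ".join(str(a) for a in configured_addresses(SAMPLE))
        print(f"eth0 does not hold {EXPECTED}; found: {found}")
```

In practice the text would come from running the command (for example via subprocess.run) rather than from a fixed string.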

I do use dhclient when I use this laptop with WiFi networks, and also one wired network - but never for my home network, "eth0 home", which is where this problem was happening - and dhclient was not running. How did the IP get set to the wrong value? Does hibernate & thaw kill a running version of dhclient when it shuts down the network connection(s)? If dhclient survived a hibernate & thaw, could it have changed the eth0 IP, then exited, leaving no trace that it was responsible?

Even so, simply having the wrong IP on eth0 shouldn't cause the network connections to be periodically dropped, or so I would think. The only issue I can imagine is that hostname reports fingon.cosmology.name, and nslookup of fingon.cosmology.name yields 69.12.176.91. Could having the wrong IP for one's hostname cause network connections to be dropped? I would not think so. But, clearly something strange & buggy is going on.

Anyone have any insight?

Thanks,
Greg Huey

Thierry Carrez (ttx) wrote :

Would 69.12.176.95 be the broadcast address on your network? Or would 69.12.176.95 also be assigned to something else on your network?

GregoryHuey (ghlaunchpad-ubuntubugs) wrote :

Yes, 69.12.176.95 is the broadcast address for my home network. I don't have a good idea what would happen if one attempted to use the broadcast address for the IP of a machine, but I imagine it might have problems like dropped connections. Ok, that might be half the mystery. The other half is how/why did Network Manager set the IP on eth0 to the broadcast address instead of 69.12.176.91 (which is static)? I'm paying close attention now to what the IP on eth0 gets set to when I enable "eth0 Home" under Network Manager. So far this problem has not repeated.
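The arithmetic behind this can be checked with Python's standard ipaddress module. A minimal sketch, assuming the "8-IP" home range corresponds to a /29 prefix (the report does not state the netmask); the function name classify is hypothetical.

```python
import ipaddress


def classify(address, prefix_len):
    """Say whether an address is the network, broadcast, or a host
    address within its own subnet."""
    iface = ipaddress.ip_interface(f"{address}/{prefix_len}")
    net = iface.network
    if iface.ip == net.broadcast_address:
        return "broadcast"
    if iface.ip == net.network_address:
        return "network"
    return "host"


if __name__ == "__main__":
    # /29 is an assumption: 8 addresses, 69.12.176.88 through 69.12.176.95.
    print(classify("69.12.176.91", 29))  # host (the intended static IP)
    print(classify("69.12.176.95", 29))  # broadcast (the IP eth0 actually had)
```

Under that assumption the subnet is 69.12.176.88/29, which also contains isildur's address 69.12.176.89 from the debug output above.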

Thanks,
Greg

Thierry Carrez (ttx) wrote :

At least that explains the openssh part of it, marking Invalid. You might want to open a bug against NetworkManager if you can reproduce weird behavior on that side.

Changed in openssh (Ubuntu):
status: Incomplete → Invalid