Heavy network activity (eg: torrent/nfs file transfers) causes Hard System Locks and/or Network Freezes.

Bug #147464 reported by Jonathan Strander
This bug affects 24 people
Affects              Status     Importance  Assigned to  Milestone
linux (Ubuntu)       Won't Fix  High        Unassigned
    Declined for Gutsy by Henrik Nilsen Omma
    Nominated for Hardy by hajkos
    Nominated for Intrepid by hajkos
    Nominated for Lucid by RussianNeuroMancer
    Nominated for Maverick by RussianNeuroMancer
nfs-utils (Ubuntu)   Invalid    Undecided   Unassigned
    Declined for Gutsy by Henrik Nilsen Omma
    Nominated for Hardy by hajkos
    Nominated for Intrepid by hajkos
    Nominated for Lucid by RussianNeuroMancer
    Nominated for Maverick by RussianNeuroMancer

Bug Description

Simply downloading a torrent at high speed (speed caps mitigate this somewhat), or transferring large files at high data rates over a network (for instance to NFS mounts, as some users have reported), for longer than a few minutes may cause one of two conditions:

1) The system hard-locks (no response to keyboard or mouse input, and the display manager freezes) and needs a hard reboot.
2) The network connection freezes but still reports a signal and a connection. The only fix I could find was a soft reboot.

This is essentially a show-stopper, since these are common activities.

More information:
Processor: Athlon XP 2000+
RAM: 1GB
Video: Nvidia Geforce 6600GT using driver 100.14.19
Network: DWL-520+ run using ndiswrapper
Kernel: linux-image-rt (linux-image-2.6.22-12-rt)

This occurs even when downloading to an EXT3 partition, so it is related to networking in general and not specifically to NFS. Capping torrent bandwidth sometimes allows hours of this kind of activity, but normally the lock-up happens within 15 minutes. As a quick test, the system can be frozen by opening up all available bandwidth and filling it with file transfers/downloads (hence torrents triggering the crash). There are three threads (possibly more) on the Gutsy Development forums:

http://ubuntuforums.org/showthread.php?t=530107
http://ubuntuforums.org/showthread.php?t=563251
http://ubuntuforums.org/showthread.php?t=563049

It would appear the common factor is wireless networking.

Tags: lucid
Jonathan Strander (mblackwell1024) wrote :

I have tested this with both the -rt and -generic kernels, and with and without Nvidia as the display driver (in case it was some kernel module craziness). The result is the same in every case, although without the Nvidia driver the system survives a longer stretch of continuous high network activity before it hangs.

Tomorrow I will attempt to boot with an older kernel and see if I have the same result.

Beyond that I have no idea how to trace this further.

Jonathan Strander (mblackwell1024) wrote :

This was also tested with linux-image-lowlatency, kernel version 2.6.20-16, and this crash did not occur.

Deepak Manoharan (deepakm+ubuntu) wrote :

I see the same issue, though in my case it's not related to wireless. A heavy UDP burst using iperf on a wired network makes Gutsy unresponsive. To recover, I had to stop the incoming traffic, and it then took a couple of minutes to return to normal.

I don't think I had this issue with 7.04, and I don't think I can test with it either.

-Deepak

Stavros Korokithakis (stavrosk) wrote :

I can confirm this. I have a server that backs up files from other computers to a NAS device (it retrieves the files through CIFS and stores them on the NAS through NFS). After a bit of heavy copying, the Ethernet adapter locks up: the system seems responsive (I can't be sure, because it's headless and I only see log entries), but the Ethernet device will not accept or emit packets. It's a pretty serious bug. I worked around it by setting large packet sizes, and that worked for a while, but now it's broken again.

Chow Loong Jin (hyperair) wrote :

Okay, so thank goodness I'm not the only one. I was already considering a reinstall, because I couldn't find anything related to my bug for days. If it helps, the syslog doesn't show anything related to the crash, though at one point I got this assertion failure:

Oct 12 16:29:03 Hyperair-PC kernel: [ 1185.457745] KERNEL: assertion (!atomic_read(&sk->sk_wmem_alloc)) failed at /build/buildd/linux-source-2.6.22-2.6.22/net/ipv4/af_inet.c (149)

Chow Loong Jin (hyperair) wrote :

Something that might be worth highlighting is that ndiswrapper version 1.45, which is included in the current Ubuntu kernel (2.6.22-14-generic), has a bug that makes it crash during large transfers. This was fixed in version 1.46, and the current version of ndiswrapper is 1.48. Perhaps the newer ndiswrapper could be included in the next kernel version?

More info: http://ndiswrapper.sourceforge.net/joomla/index.php?/content/view/16/2/

Stavros Korokithakis (stavrosk) wrote :

I have installed Gutsy Server on a new PC and am not getting this problem any more. I upgraded the old PC from Edgy to Feisty, and there was an update for nfs-utils (I think that's the name), but I haven't tested whether it is fixed there.

undine (niirti) wrote :

I can confirm the same bug on my Travelmate 4402WLMI. I am also using ndiswrapper for my BCM4318.

Chow Loong Jin (hyperair) wrote :

Okay, so I tried using the 4.18 ndiswrapper driver, but the same error still happened. So I'm now using the ACX driver for my DWL-520+; I'll confirm tomorrow whether the lockup still occurs.

Chow Loong Jin (hyperair) wrote :

Make that 1.48, I mistyped it ><

sanjuan (eric-sanjuan) wrote :

Hi,
I'm not sure it comes only from ndiswrapper. That module is not installed on a new Dell dual core 2 server (no graphical interface) running Gutsy (Linux 2.6.22-9-generic #1 x86_64 GNU/Linux), and I'm facing the same trouble at least once a week.

The network interface freezes, and the server only responds to ping. What is surprising is that if I wait an hour or two, I can connect to the computer normally again, and there is no trace of the freeze in the log files.

At the beginning the problem happened daily, every 24 hours. I stopped the cron jobs to see if they were the cause; the network still freezes, but less often.

Any idea if my problem is related to yours?

Chow Loong Jin (hyperair) wrote :

I have no idea, really. It only happens with torrents on my computer, and never otherwise. But then again, my maximum throughput is 70kB/s. I hate my ISP. You say the problem happens every 24 hours. I noticed that every 24 hours, apt-check is run, and that does generate quite a bit of network traffic. Could that be it?

My log files also show no trace of the error; they just show the time I restarted. That means the kernel is not completely frozen: syslogd still runs. The Magic SysRq keys refuse to work, though.

Chow Loong Jin (hyperair) wrote :

On a side note, I noticed that users of other distros have similar problems, and the common factor seems to be ndiswrapper.

Stavros Korokithakis (stavrosk) wrote :

Not so with me, I was using Ubuntu's native drivers.

sanjuan (eric-sanjuan) wrote :

Nor with me. I'm not using ndiswrapper, but it could be a similar problem with Ubuntu's native drivers and some recent network cards.

I agree that apt-get could cause such freezing because of the network traffic it generates.

Chow Loong Jin (hyperair) wrote :

Strange eh, how about everyone posting their network card models and the drivers used here so the developers can work something out?
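For anyone answering this, the kernel exposes which driver backs each interface in sysfs. The sketch below is a hypothetical helper (not an Ubuntu tool) that assumes the usual /sys/class/net/IFACE/device/driver symlink; lspci -k or ethtool -i IFACE report the same information in other forms.

```shell
#!/bin/sh
# Hypothetical helper: print "<interface> <driver>" for each network
# interface, reading the driver symlink that the kernel exposes in sysfs.
# Interfaces without a backing device (e.g. lo) are reported as "(none)".
# Usage: list_net_drivers            (reads /sys/class/net)
list_net_drivers() {
    net_dir=${1:-/sys/class/net}
    for iface in "$net_dir"/*; do
        [ -e "$iface" ] || continue      # glob matched nothing
        name=$(basename "$iface")
        if [ -e "$iface/device/driver" ]; then
            # The link points at a driver directory such as .../drivers/tg3
            driver=$(basename "$(readlink -f "$iface/device/driver")")
        else
            driver="(none)"
        fi
        printf '%s %s\n' "$name" "$driver"
    done
}
```

The optional directory argument only exists so the function can be tried against a fake sysfs tree; on a real system, run it with no argument and paste the output here.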

Chow Loong Jin (hyperair) wrote :

Okayy, so I tried downloading a torrent last night using the acx module instead of the ndiswrapper module. No crash. I guess I'll be using that for heavy traffic =\

jasongegere (jason-hgmail) wrote :

I have been doing some research trying to figure out why my installation of Feisty 7.04 locks up, and I ultimately found this bug report. I can confirm many of the items discussed here. At or around 2 GB of network traffic over SMB (Samba), the system locks up: a kernel panic with a beep, and the Num Lock and Scroll Lock lights flashing a few times. I can also make the system lock up while creating a new disk with VMware Server, again at around 2 GB. I am running an Intel Xeon 2.6 64-bit on an Asus motherboard.

The weird thing is I was able to download a 3.5gb Fedora Core 5 DVD image with Firefox.

Hope this helps. I would love to find a solution.
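To see whether lock-ups like this line up with a fixed traffic volume (the ~2 GB mentioned above), one option is to log the kernel's per-interface byte counters periodically. A minimal sketch, assuming the standard sysfs counters under /sys/class/net/IFACE/statistics/; iface_traffic_mib is a hypothetical name, and the counters start from zero at each boot.

```shell
#!/bin/sh
# Hypothetical helper: print received/transmitted totals for one interface
# in whole MiB (1 MiB = 1048576 bytes), from the kernel's sysfs counters.
# Usage: iface_traffic_mib eth0
iface_traffic_mib() {
    iface=$1
    net_dir=${2:-/sys/class/net}
    rx=$(cat "$net_dir/$iface/statistics/rx_bytes")
    tx=$(cat "$net_dir/$iface/statistics/tx_bytes")
    printf '%s rx=%d MiB tx=%d MiB\n' "$iface" $((rx / 1048576)) $((tx / 1048576))
}
```

For example, while sleep 60; do echo "$(date) $(iface_traffic_mib eth0)"; done >> ~/traffic.log appends one line a minute; after a lock-up, the last lines show roughly how much data had moved, although writes from the final seconds may never reach the disk.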

Chow Loong Jin (hyperair) wrote :

I never had a problem on Ubuntu Feisty, and certainly this problem only happens for me on Gutsy with ndiswrapper. I can't figure out why though. It seems that it was a bug to do with version 1.45, but even after installing 1.48 I still had the problem.

Adolfo González Blázquez (infinito) wrote :
Chow Loong Jin (hyperair) wrote :

Ah thanks for referring me there. That's definitely my bug.

Matthew Specker (matthew-specker) wrote :

I've been experiencing the same problem running Gutsy for the last month or so. Only occurs during large transfers over NFS. No ndiswrapper here. I'm running an Intel wireless card.

effell (effell) wrote :

I'm experiencing the same problem on a PCMCIA 3G/HSDPA wireless card (a Novatel Merlin) on a Thinkpad X31.
High bandwidth (eg bittorrent) quickly causes a system freeze.
I'm available for performing further testing, in case anyone needs.

Chow Loong Jin (hyperair) wrote :

Are you running via ndiswrapper? If so, go to http://ndiswrapper.sourceforge.net and download the 1.49rc4 version. That will fix the bug. I've been running it for days now.

Chow Loong Jin (hyperair) wrote :

Sorry I meant 1.49. I didn't realize it had already been released. And launchpad.net should allow editing of posts!

effell (effell) wrote :

Not running NDISwrapper, not even a wifi card. This is a 3G cellular data card, showing up on a couple of ttyUSB ports courtesy of the kernel Option module, it seems. From then on it accepts Hayes AT commands like a regular modem.

Chow Loong Jin (hyperair) wrote :

Then it might be that I don't get this bug because my connection's maximum download/upload speeds are only 512/256 kbps, i.e. 64/32 kBps. Miserable, huh?

effell (effell) wrote :

For the record, in my case the system locks up at or below 80kBps.
Running FC7 on the same card/network (but different Thinkpad) has been fine.

Chow Loong Jin (hyperair) wrote :

Well why don't you get an Ubuntu Gutsy LiveCD and try doing some heavy transfers on that other Thinkpad to see if the problem persists?

You could also try an FC7 LiveCD on the Thinkpad that locks up, to be sure whether it's really Ubuntu's fault.

effell (effell) wrote :

Just to clarify the above: I didn't mean that no lockup occurs above 80 kBps, only that it occurs even at speeds below 80 kBps.

The hardware seems fine: I ran the CPU self-test and memtest, and the network card runs fine on another laptop. The lock-up occurs only under heavy network traffic, so I'm leaning towards a software cause, and I think some sort of post-mortem info would be the most useful. I'll find out how to obtain that, but would appreciate pointers on how to do it in this specific circumstance, i.e. a full lock-up.

Still, for kicks I'll also add further data points by running live CDs later on.

JackC (jack-jncsoftware) wrote :

I'm experiencing the same issue. Large file copies will hard crash the machine in a matter of minutes. This is over a wired connection. No wireless card is even installed.

-emory- (emory-roane) wrote :

AHA! I think I solved the problem. What I think was happening was that Gutsy wasn't using swap correctly. Changing swappiness to 10 makes it work beautifully so far!
BTW, to do so, just run
sudo gedit /etc/sysctl.conf
then add the line vm.swappiness=10 at the bottom, if it isn't already there. Can anyone confirm this for me?
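For reference, the kernel parameter is named vm.swappiness in /etc/sysctl.conf (a bare swappiness=10 line is not a valid key), and sudo sysctl -p reloads the file without a reboot. The sketch below shows one way to make the edit idempotently; set_sysctl_conf is a hypothetical helper that assumes GNU sed -i. Note that the next post reports the lock-ups returned, so this is a tuning step rather than a confirmed fix.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a sysctl-style config file,
# rewriting an existing KEY line or appending one if KEY is absent.
# Usage (as root): set_sysctl_conf /etc/sysctl.conf vm.swappiness 10
#                  sysctl -p    # reload /etc/sysctl.conf
set_sysctl_conf() {
    conf=$1 key=$2 value=$3
    # Escape dots so "vm.swappiness" only matches literally in the regex.
    key_re=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*$key_re[[:space:]]*=" "$conf" 2>/dev/null; then
        sed -i "s|^[[:space:]]*$key_re[[:space:]]*=.*|$key=$value|" "$conf"
    else
        printf '%s=%s\n' "$key" "$value" >> "$conf"
    fi
}
```

Commented-out lines such as #vm.swappiness=60 are left alone, and running the function twice does not create a duplicate entry.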

-emory- (emory-roane) wrote :

Ouch... looks like I spoke a bit too soon. All worked well until I loaded Amarok. It could of course have been a coincidence that it started crashing as soon as that loaded; I'm not sure. Any luck for anyone else? I wonder why not everyone is experiencing this, though...

effell (effell) wrote :

A long shot, but here's a possible extra clue: twice, after freezing this way and being left alone for a while, my Thinkpad X31 would not even turn on with the battery plugged in; I had to remove it. Some sort of CMOS corruption?

Anyway, again if anyone working on this wants specific trace info or some test performed, please do ask.

jasongegere (jason-hgmail) wrote : Re: [Bug 147464] Re: Heavy network activity (eg: torrent/nfs file transfers) causes Hard System Locks and/or Network Freezes.

I posted earlier in this thread. My system lock-ups were also on a wired connection.

On Nov 3, 2007, at 10:45 PM, JackC <email address hidden> wrote:

> I'm experiencing the same issue. Large file copies will hard crash the
> machine in a matter of minutes. This is over a wired connection. No
> wireless card is even installed.

irwjager (jager49) wrote :

Vostro 1400 with Intel 3945 wireless, not using ndiswrapper. The system consistently locks up after roughly 30 minutes of BitTorrent traffic (~100 kb upload, ~40 kb download, single torrent).
No traces in any system log or ring buffer of a crash or problem. System completely freezes; mouse cursor just stops moving. System monitor shows no unusual activity up until freeze.

I'm not sure what to post here to help developers...

Chow Loong Jin (hyperair) wrote :

It occurred to me that the problem could be the network stack hanging. I mean, X does depend on the network stack, doesn't it? What happens if you're logged into a terminal instead? Would it still lock you out? What about background processes? One way to find out might be to run a program that writes the current time to a file at some interval, say every 5 seconds, for example:

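#!/usr/bin/perl
# Heartbeat: append the current Unix time to ~/test every 5 seconds.
# After a hard lock and reboot, the last line shows roughly when
# user-space processes stopped running.
use strict;
use warnings;

while (1) {
    open(my $fh, '>>', "$ENV{HOME}/test")
        or die "Cannot open $ENV{HOME}/test: $!";
    print $fh time(), "\n";
    close($fh);
    sleep(5);
}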

Then run it in the background (for example, by starting it with a trailing &).
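After a lock-up and reboot, the last timestamp written by the script above can be turned into a readable date and compared with the last syslog entries before the restart. A minimal sketch, assuming GNU date (for the -d @EPOCH form) and the ~/test file used above; last_heartbeat is a hypothetical name. Because a hard lock can stop the machine before buffered writes reach the disk, the recorded time may trail the actual freeze by a few seconds.

```shell
#!/bin/sh
# Hypothetical helper: print the last Unix timestamp in a heartbeat file
# as a date. -u gives UTC; drop it to get local time, as syslog uses.
# Usage: last_heartbeat "$HOME/test"
last_heartbeat() {
    epoch=$(tail -n 1 "$1")
    date -u -d "@$epoch" '+%Y-%m-%d %H:%M:%S %Z'
}
```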

Tareeq (spock-rock) wrote :

I would like to point out that I have encountered this as well; it seems to occur when using torrents while streaming from a Windows share over Samba. This pretty much hard-locks the machine, and syslog has nothing in it. I will post my syslog the next time it happens.

Tareeq

JackC (jack-jncsoftware) wrote :

It appears the problem is either with the 64-bit version or with something in the desktop install. I was running the 64-bit desktop install and getting a crash after every few minutes of heavy network use. I installed the 32-bit server edition on the same machine and transferred well over 50 GB last night without any problems.

-emory- (emory-roane) wrote :

Are we sure it's the 64-bit version? I'm running the 32-bit desktop and it's doing this... This is the ONLY thing I've noticed wrong with this release, too. In other news, I just did a fresh install and it's still misbehaving. I couldn't test on the live CD, though. Has anyone tried the alternate install disk?

Changed in nfs-utils:
status: New → Invalid
[103 of 183 comments hidden]
Tony (tonybaca) wrote :

I just had what I think is the same problem, on a fresh install of x86-64 8.10 Intrepid. I mounted an NFS volume from another system (SUSE) over a wired connection and tried to copy over 300 GB of video files, each about 1 GB. The system transferred about 30% and then hard-locked. top was still running, but if I stopped top I couldn't execute any other commands. SSH did not work either, so I was forced to power-cycle. I tried this several times, and each time it hard-locked the client computer. The attached file is something I found in the log.

hajkos (hajkos) wrote :

I'm seeing the same problem. Intrepid amd64 crashes under heavy load with several network connections (Samba, SSH, headless VirtualBox). The machine used to run Hardy, but I performed a dist-upgrade because the bug occurred on Hardy too; I thought it would help :( Sometimes I can push 50 GB over Samba in one go, sometimes it crashes after 100 MB. It's an ASUS-M2A-MX board with an integrated Attansic (atl1) NIC and an Intel 1000 (e1000) card. Detailed info is in the attached file.
Hope this helps.

The /var/log/messages shows the following:

Nov 30 12:08:14 homer kernel: [ 65.393126] e1000: eth1: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
Nov 30 12:08:14 homer kernel: [ 65.801001] NET: Registered protocol family 10
Nov 30 12:08:14 homer kernel: [ 65.801356] lo: Disabled Privacy Extensions
Nov 30 12:08:15 homer kernel: [ 71.976814] iSCSI Enterprise Target Software - version 0.4.15
Nov 30 12:08:15 homer kernel: [ 71.976892] iscsi_trgt: Registered io type fileio
Nov 30 12:08:15 homer kernel: [ 71.976894] iscsi_trgt: Registered io type blockio
Nov 30 12:08:15 homer kernel: [ 71.976896] iscsi_trgt: Registered io type nullio
Nov 30 12:10:08 homer kernel: [ 184.698833] vboxdrv: fAsync=1 offMin=0xfbbf3 offMax=0xfbbf3
Nov 30 12:10:08 homer kernel: [ 184.698842] vboxdrv: TSC mode is 'asynchronous', kernel timer mode is 'normal'.
Nov 30 12:28:14 homer -- MARK --
Nov 30 12:35:20 homer kernel: [ 1697.047794] md: md3: resync done.
Nov 30 12:35:20 homer kernel: [ 1697.203374] RAID1 conf printout:
Nov 30 12:35:20 homer kernel: [ 1697.203379] --- wd:2 rd:2
Nov 30 12:35:20 homer kernel: [ 1697.203381] disk 0, wo:0, o:1, dev:sda6
Nov 30 12:35:20 homer kernel: [ 1697.203383] disk 1, wo:0, o:1, dev:sdb6
Nov 30 12:38:46 homer kernel: [ 1903.073282] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Nov 30 12:38:46 homer kernel: [ 1903.074556] program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Nov 30 12:40:04 homer kernel: [ 1980.761244] kjournald starting. Commit interval 5 seconds
Nov 30 12:40:04 homer kernel: [ 1980.762724] EXT3 FS on sdd1, internal journal
Nov 30 12:40:04 homer kernel: [ 1980.762731] EXT3-fs: recovery complete.
Nov 30 12:40:04 homer kernel: [ 1980.762734] EXT3-fs: mounted filesystem with ordered data mode.

>>>>> STARTED COPYING OVER SAMBA; AFTER A FEW MINUTES THE CRASH OCCURRED HERE <<<<<<<<

Nov 30 13:08:13 homer syslogd 1.5.0#2ubuntu6: restart.
Nov 30 13:08:13 homer kernel: Inspecting /boot/System.map-2.6.27-9-server
Nov 30 13:08:14 homer kernel: Cannot find map file.
Nov 30 13:08:14 homer kernel: Loaded 49686 symbols from 66 modules.
Nov 30 13:08:14 homer kernel: [ 0.000000] Initializing cgroup subsys cpuset
Nov 30 13:08:14 homer kernel: [ 0.000000] Initializing cgroup subsys cpu
Nov 30 13:08:14 homer kernel: [ 0.000000] Linux version 2.6.27-9-server (buildd@yellow) (gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu11) ) #1 SMP Thu Nov 20 22:56:07 UTC 2008 (Ubuntu 2.6.27-9.19-server)
Nov 30 13:08:14 homer kernel: [ 0.000000] Command line: root=/dev/md2 ro quiet splash

Kunin (kunin) wrote :

I have had this same issue since 7.10 (currently on 8.04), and after almost a year I finally figured out that it always seems to hard-lock when I've got a ton of torrent activity. The up/down speed at which it locks is always different, so I'm thinking it has to do with the number of connections open at the time and not the amount of bandwidth being used.

To be sure, I tested every piece of hardware: I replaced my NIC (no effect, except that the upgrade gives me even better speeds) and replaced my PSU because of a slight fluctuation on one of the 5 V rails (within limits, but barely), which also had no effect. CPU, heat, RAM, HDs, etc. all tested fine. So this is definitely a Linux/Ubuntu issue.

My issues are exactly as above, seemingly random hard locks requiring the use of the reset/power button. I cannot confirm cases of num lock/caps lock/etc lights blinking as my wireless keyboard doesn't have those lights.

Please fix this issue. It should be of relatively high priority given its nature and the possible assumption by the public (read: non-Linux users who wish to try Ubuntu) that Ubuntu is not a stable OS. I personally have fiber at home, and until now (when I realized where the issue is) I would happily leave any and all torrents running until there was not a single peer left, with a constant 3-5 Mbps (and higher) upload going. I've tried other torrent clients (I prefer Azureus), with the same resulting hard locks.

effell (effell) wrote :

This is a serious bug. I experienced it with the generic kernel a good while ago
and managed to get around it by compiling my own kernel from kernel.org.

root@localhost:~$ uname -a
Linux localhost 2.6.23.9 #1 PREEMPT Fri Dec 7 07:16:40 WET 2007 i686 GNU/Linux

So, I'm still using Ubuntu 7.10 with this kernel and maybe I'm missing all kinds of
security upgrades. Not good.

PS This other kernel also solves bug #43092, which is also alive and kicking.

Gary Mansell (garymansell) wrote :

I think I spoke too early when I said that this was fixed in Intrepid x86_64....

I think I am seeing this again. I am downloading a file from an FTP site with Firefox, and my system keeps freezing for approx. 5-10 secs at a time every couple of minutes. The windows all go dark and then come back to life again, which is what I was seeing before.

The download is only running at 100KB/s, so it is only using minimal bandwidth on my corporate 30MB/s Internet connection. My laptop is plugged into a 1GB network port at the core of our 3Com network.

I declare this still a problem, but it shows up differently from what I was suffering before: it happens a lot less, and the system seems able to un-stick itself rather than requiring a hard reset.

Still Bl**dy frustrating though - my system has locked up about 10 times whilst writing this post. Almost unusable!!

Gary

Qiwichupa (scorpion-matrixagents) wrote :

I can confirm the same bug on my Dell Inspiron 1300 with Kubuntu 8.04 (latest updates included). The system goes down when copying files from a share, or when watching movies from a share (with rapid pause/play cycles).

Changed in linux:
status: New → Confirmed
Changed in linux:
status: Confirmed → New
xteejx (xteejx) wrote :

In order to get this to a Triaged state, the Kernel Team will need the following information.

Qiwichupa, or anyone else on Hardy: can you attach the output of the following commands as separate attachments, please:

uname -a > uname-a.log
cat /proc/version_signature > version.log
dmesg > dmesg.log
sudo lspci -vvnn > lspci-vvnn.log
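The four commands above can be wrapped so that one run produces every requested file. This is a hypothetical convenience helper, not something the Kernel Team provides; it assumes a root shell (for example after sudo -s) so that dmesg and lspci -vvnn can read everything, and it keeps going if a command fails, since /proc/version_signature, for instance, exists only on Ubuntu kernels.

```shell
#!/bin/sh
# Hypothetical helper: write the four logs requested for this bug into DIR
# (default: current directory). A failing or missing command leaves its
# error message in the log file instead of aborting the whole run.
# Usage (from a root shell): collect_bug_logs ~/bug-147464
collect_bug_logs() {
    dir=${1:-.}
    mkdir -p "$dir" || return 1
    uname -a                    > "$dir/uname-a.log"    2>&1 || true
    cat /proc/version_signature > "$dir/version.log"    2>&1 || true
    dmesg                       > "$dir/dmesg.log"      2>&1 || true
    lspci -vvnn                 > "$dir/lspci-vvnn.log" 2>&1 || true
}
```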

Note, if this is a problem in Jaunty you can use the following command:

apport-collect -p linux-image-`uname -r` 147464

Thank you.

Changed in linux (Ubuntu):
status: New → Incomplete
headlessspider (headlessspider) wrote : apport-collect data

Architecture: i386
DistroRelease: Ubuntu 9.04
HibernationDevice: RESUME=UUID=033dee7e-6241-4b76-8c46-194bd632bcaf
MachineType: Hewlett-Packard HP Compaq nx6120 (PV160PA#UUF)
Package: linux-image-2.6.28-11-generic 2.6.28-11.42
PackageArchitecture: i386
ProcCmdLine: root=UUID=8395a20c-0276-4ba5-a2af-a1dee65495f9 ro quiet splash
ProcEnviron:
 SHELL=/bin/bash
 LANG=en_PH.UTF-8
ProcVersionSignature: Ubuntu 2.6.28-11.42-generic
Uname: Linux 2.6.28-11-generic i686
UserGroups: adm admin audio cdrom dialout dip floppy fuse lpadmin plugdev video

[9 of 183 comments hidden]
headlessspider (headlessspider) wrote :

ey teej,

i have done what you requested (am running jaunty)

just an update. i was copying approximately 220 megabytes of files to a samba server via sshfs and it just stopped at the point where 32.4 megabytes had been transferred. i left it alone for about 20 minutes, and when i got back it was still stuck at that point.

i'm trying to confirm if it also happens on an hp compaq nc6320 also running on jaunty.

xteejx (xteejx) wrote :

headlessspider, thank you very much for updating us, very much appreciated. I shall mark this as Triaged and set High importance, this will undoubtedly affect a lot of users, especially if this has been a problem since Feisty.

Changed in linux (Ubuntu):
importance: Undecided → High
status: Incomplete → Triaged
xteejx (xteejx)
tags: removed: bittorrent freeze gutsy hang lock network nfs wireless
headlessspider (headlessspider) wrote :

i can confirm that this bug appears also in an hp compaq nc6320 running jaunty. so far, the only thing that's the same on an nc6320 and the nx6120 that i'm using is the network driver which is labeled tg3.

using kernel 2.6.28-15-generic on both laptops

-- noel

headlessspider (headlessspider) wrote :

one thing i did notice lately is that the copy would stop (not hang) for about 2 minutes, the system would say that eth1 is disconnected, then it promptly reconnects and the copy proceeds. should the copy stop again, the system disconnects and reconnects again, and so on until the copy is finished.

yes, the file gets copied eventually but everyone here knows that eth1 shouldn't reset just to have the file copy continue and finish.

I Kovalev (iakovalev) wrote :

I have been running Ubuntu (9.04 to 9.10) on a Toshiba Satellite L40-139 laptop for about a year. Heavy network traffic systematically triggers a crash (either a kernel panic or a segmentation fault). Applications that have triggered a crash include Firefox (all versions since 3.2.3), the system update manager, a Java-based application downloading via HTTP/FTP, a BitTorrent client, and wget. Usually the problem appears within 20-30 min of downloading at 150 kBytes/s or more. Just before the crash, tons of messages like:
Feb 3 02:30:04 kovalev-home kernel: [35474.511460] recvmsg bug: copied BFA5DAF9 seq BFA5E0A1
fill up the system logs, alongside application-related warnings. Sometimes (about half the time) there is no immediate crash after the warnings appear, but part of the downloaded data becomes corrupt.
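With a log flooded like this, a short summary (how many such lines, and when the first and last appeared relative to the crash) may be easier to attach than the whole file. A minimal sketch; recvmsg_summary is a hypothetical helper, and /var/log/kern.log is an assumption about where Ubuntu's syslog daemon writes kernel messages.

```shell
#!/bin/sh
# Hypothetical helper: summarize "recvmsg bug" kernel warnings in a log file:
# how many there are, plus the first and last occurrence.
# Usage: recvmsg_summary /var/log/kern.log
recvmsg_summary() {
    log=$1
    # grep -c exits non-zero on zero matches or a missing file; keep going.
    count=$(grep -c 'recvmsg bug' "$log" 2>/dev/null) || true
    count=${count:-0}
    echo "recvmsg bug lines: $count"
    if [ "$count" -gt 0 ]; then
        echo "first: $(grep -m 1 'recvmsg bug' "$log")"
        echo "last:  $(grep 'recvmsg bug' "$log" | tail -n 1)"
    fi
}
```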

The problem was first observed with kernel 2.6.28-13 and still persists through the current 2.6.31-19; all kernels were installed from the official Ubuntu repositories. The network connection is via Atheros AR5007eg wireless (reported as AR5001 by ath5k). I've experienced this issue with several different routers and providers, though I have never had a chance to try a wired connection. Since Windows Vista never crashed under similar network traffic on this laptop, the issue doesn't look like a hardware fault...

An example of warning message right before crash (triggered by bittorrent client):
---------------
Feb 2 12:51:38 kovalev-home kernel: [76478.454177] WARNING: at /build/buildd/linux-2.6.31/net/ipv4/tcp.c:1408 tcp_recvmsg+0xa49/0xb20()
Feb 2 12:51:38 kovalev-home kernel: [76478.454179] Hardware name: Satellite L40
Feb 2 12:51:38 kovalev-home kernel: [76478.454181] Modules linked in: sbp2 xt_tcpudp aes_i586 aes_generic binfmt_misc ppdev nls_iso8859_1 nls_cp437 vfat fat joydev snd_hda_codec_analog arc4 ecb pcmcia snd_hda_intel snd_hda_codec snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy iptable_filter snd_seq_oss ip_tables x_tables snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device psmouse serio_raw yenta_socket rsrc_nonstatic pcmcia_core snd shpchp ath5k mac80211 ath cfg80211 soundcore snd_page_alloc asus_laptop led_class video1394 raw1394 lp parport usbhid fbcon tileblit font bitblit softcursor 8139too 8139cp mii ohci1394 ieee1394 i915 drm i2c_algo_bit intel_agp agpgart video output
Feb 2 12:51:38 kovalev-home kernel: [76478.454230] Pid: 2357, comm: transmission Tainted: G W 2.6.31-18-generic #55-Ubuntu
Feb 2 12:51:38 kovalev-home kernel: [76478.454233] Call Trace:
Feb 2 12:51:38 kovalev-home kernel: [76478.454237] [<c014513d>] warn_slowpath_common+0x6d/0xa0
Feb 2 12:51:38 kovalev-home kernel: [76478.454242] [<c04d2439>] ? tcp_recvmsg+0xa49/0xb20
Feb 2 12:51:38 kovalev-home kernel: [76478.454245] [<c04d2439>] ? tcp_recvmsg+0xa49/0xb20
Feb 2 12:51:38 kovalev-home kernel: [76478.454250] [<c0145185>] warn_slowpath_null+0x15/0x20
Feb 2 12:51:38 kovalev-home kernel: [76478.454254] [<c04d2439>] tcp_recvmsg+0xa49/0xb20
Feb 2 12:51:38 kovalev-home kernel: [76478.454259] [<c0492433>] sock_common_recvmsg+0x43/0x60
Feb 2 12:51:38 kovalev-home kernel: [76478.454263] [<c0492...

I Kovalev (iakovalev) wrote :

System information in addition to my previous post attached:
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/147464/comments/165)

[5 of 183 comments hidden]
gr8viju (gr8viju) wrote :

Hi, I am also facing similar issues. I am using Karmic 32-bit, 2.6.31.19-Generic. The system freezes and nothing can be done except a hard reboot. I have an Intel 5150 WiMax card. This happens mostly with torrent clients and/or with Firefox playing Flash. Getting frustrated with it. If I download more than 10 files at once, it happens within 5-10 mins.

Yesterday I switched to the x86_64 version of Karmic, 2.6.31.19-Generic. Will post the results.

gr8viju (gr8viju) wrote :

I am getting system hangs in the x86_64 version too. One torrent at more than 450KBps and one can see the result. This has become very annoying.

I Kovalev (iakovalev) wrote :

I have been using Transmission (1.75 to 1.91) for over a year in Ubuntu 9.04 to 9.10 on my Toshiba Satellite L40 laptop, connected to the Internet via the built-in Atheros AR5007EG wireless card in the mini-PCI slot. While using Transmission I discovered that downloading from large swarms at high speed resulted in a regular system crash (kernel panic) within 30 min to 1 hour. Recently it turned out the issue is system-wide, i.e. any heavy network activity triggers a system crash. Searching the Linux support community made me think that a kernel bug which causes I/O stalls in some cases (apparently fixed in kernel 2.6.32) could be responsible.

Recently, however, I found a hardware problem in my laptop. The PCI chip (not the CPU!) lacks proper cooling due to limitations in the laptop design, a mechanical defect in its heatsink, and partial blocking of the ventilation slits by dust. After I fixed the mechanical problems my system became rock stable. Still, I was able to trigger a crash with heavy I/O activity: downloading at ~2.5 Mbit/s over wireless while simultaneously transferring lots of data (~10 MB/s) via an IEEE 1394 PCMCIA card for a prolonged time, provided the temperature of the cooling air was above 24 °C.

Problem solved.
Conclusion: before reporting a software problem, it is wise to check all your hardware carefully...
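
Since the cause here turned out to be heat, one low-effort check before blaming the kernel is to log temperatures while reproducing the transfer. The sketch below is a minimal Python example assuming the standard Linux sysfs thermal interface (`/sys/class/thermal/thermal_zone*/temp`, values in millidegrees Celsius). The function names are hypothetical, and many laptops expose no sensor near the PCI chip, so a normal reading does not prove the hardware is fine.

```python
import glob
import os
import time


def parse_millidegrees(text: str) -> float:
    """Convert a sysfs temp value such as '45000\\n' to degrees Celsius."""
    return int(text.strip()) / 1000.0


def read_zone_temps(base: str = "/sys/class/thermal") -> dict:
    """Return {zone_name: temperature_C} for every readable thermal zone.

    Assumes the sysfs layout base/thermal_zoneN/temp; returns an empty
    dict if no such files exist.
    """
    temps = {}
    for path in sorted(glob.glob(os.path.join(base, "thermal_zone*", "temp"))):
        zone = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                temps[zone] = parse_millidegrees(f.read())
        except (OSError, ValueError):
            # Some zones refuse reads or report non-numeric values; skip them.
            continue
    return temps


def log_temps(interval_s: float = 10.0, samples: int = 6) -> None:
    """Print one line of zone temperatures every interval_s seconds."""
    for _ in range(samples):
        stamp = time.strftime("%H:%M:%S")
        readings = " ".join(f"{z}={t:.1f}C" for z, t in read_zone_temps().items())
        print(stamp, readings or "(no thermal zones found)")
        time.sleep(interval_s)
```

For example, calling `log_temps(interval_s=30, samples=120)` in a second terminal covers the one-hour window in which the crashes described above appeared.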

Revision history for this message
RussianNeuroMancer (russianneuromancer) wrote :

Is this issue still present in the Lucid RC?

Revision history for this message
I Kovalev (iakovalev) wrote :

Well, in my case there was (and still is) a hardware issue: due to design flaws, the PCI chip on the motherboard (neither the CPU nor the graphics card!) overheated during prolonged heavy wireless transfers because of additional heat dissipated by nearby on-board inverters. I have to fix the issue completely (I am very close) before testing my system again.

Revision history for this message
gr8viju (gr8viju) wrote :

Well, in my case I don't think it is a hardware issue. I also run Windows 7 alongside Ubuntu, and there I have used uTorrent for almost 3 days in a row without any hang, whereas in Ubuntu 9.10 the hang occurred at download speeds above 500 kbps. So far in Lucid I have not encountered this problem; it works fine at speeds up to 1.5 Mbps. Will update on this soon.

Revision history for this message
I Kovalev (iakovalev) wrote :

I had experienced this problem (the system hard-locks during heavy network transfers) on a Toshiba Satellite L4 laptop in Jaunty and Karmic. It later turned out there was a hardware fault in one of the on-board inverters. I just had my motherboard replaced and the issue disappeared completely! No more hard locks, and no error messages in the system logs for several days.

Conclusion: when reporting a software failure, one has to be cautious about the condition of the hardware. It is even worth asking a specialist when possible. These faults are not easily identifiable, and their manifestations can be counter-intuitive.

Revision history for this message
Thomas (thoms) wrote :

I had some freezes as well with Ubuntu 9.10 64-bit and 10.04.

The problem was as follows:

When I started a download (in my case with JDownloader), after a while the whole system froze.

I checked my RAM modules and one seemed to be faulty, but after I replaced it with a new one the problem persisted.

When I used one single 1024MB module it worked perfectly (each of them on its own).

As soon as I inserted the second one, the freezing came back.

I have seen forum threads where people had problems with the same wireless card I use (Netgear WN311B).

In addition, the same Netgear card ran fine on another PC with 32-bit Ubuntu.

I was using the proprietary Broadcom driver.

Now I have replaced the wireless card with a plug-and-play one (supported out of the box) and all the freezing problems are gone.

I don't think the card is faulty. Perhaps something to do with the RAM addresses?

I have no idea, but the fault is reproducible.

Information about my current hardware:

Tower Yeong-Yang Midi-Tower YY-5707 Black, 350Watt, ATX

Mainboard ASUS P5QL-EPU, FSB1600, Intel P43 Chipset, SATA, PCI-E, GLAN

Processor Intel Core 2 Quad Q8400, Quad-Core, 1333MHz, 4MB Kentsfield, 2.66GHz, SpeedStep, I64bit, NX

Memory 4096MB DDR2 PC-6400 (2x2048MB), 240-pin, 800MHz

Hard disk 500GB, SATA-II, 7200rpm, 16MB cache, and a few more NTFS partitions

Graphics card nVidia GF GT210, 512MB DDR2, TV-Out, DVI, HDMI

Wireless card now: TP-Link TL-WN851N

I hope this information helps you find the source of the freezing problem.

Revision history for this message
Nathan Adams (nadams) wrote :

My system locks up when an application attempts to copy a large file to an NFS mount. The system becomes so unresponsive that I cannot even reboot from the command line over SSH.

For example, this happens when soundKonverter copies an Ogg file from /tmp to /home/music (an NFS share).

I am unable to kill the copy command:

nate 2791 1 0 10:32 ? 00:00:00 cp /tmp/kde-nate/soundkonverterkMTAC0.ogg /home/music/Foo Fighters/[Foo Fighters] One By One - 07. Halo.ogg

And I see lots of messages like this in /var/log/syslog:

May 22 10:44:31 nereidum kernel: [ 1680.390484] INFO: task cp:2791 blocked for more than 120 seconds.
May 22 10:44:31 nereidum kernel: [ 1680.390488] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 22 10:44:31 nereidum kernel: [ 1680.390492] cp D 00000000ffffffff 0 2791 2503 0x00000000
May 22 10:44:31 nereidum kernel: [ 1680.390501] ffff88012a457c48 0000000000000082 0000000000015bc0 0000000000015bc0
May 22 10:44:31 nereidum kernel: [ 1680.390508] ffff8801291331a0 ffff88012a457fd8 0000000000015bc0 ffff880129132de0
May 22 10:44:31 nereidum kernel: [ 1680.390516] 0000000000015bc0 ffff88012a457fd8 0000000000015bc0 ffff8801291331a0
May 22 10:44:31 nereidum kernel: [ 1680.390523] Call Trace:
May 22 10:44:31 nereidum kernel: [ 1680.390545] [<ffffffffa0cff2b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
May 22 10:44:31 nereidum kernel: [ 1680.390552] [<ffffffff8153eb87>] io_schedule+0x47/0x70
May 22 10:44:31 nereidum kernel: [ 1680.390573] [<ffffffffa0cff2be>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
May 22 10:44:31 nereidum kernel: [ 1680.390579] [<ffffffff8153f3df>] __wait_on_bit+0x5f/0x90
May 22 10:44:31 nereidum kernel: [ 1680.390587] [<ffffffff812b6234>] ? __lookup_tag+0x64/0x120
May 22 10:44:31 nereidum kernel: [ 1680.390608] [<ffffffffa0cff2b0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
May 22 10:44:31 nereidum kernel: [ 1680.390615] [<ffffffff8153f488>] out_of_line_wait_on_bit+0x78/0x90
May 22 10:44:31 nereidum kernel: [ 1680.390622] [<ffffffff81085360>] ? wake_bit_function+0x0/0x40
May 22 10:44:31 nereidum kernel: [ 1680.390643] [<ffffffffa0cff29f>] nfs_wait_on_request+0x2f/0x40 [nfs]
May 22 10:44:31 nereidum kernel: [ 1680.390665] [<ffffffffa0d036af>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
May 22 10:44:31 nereidum kernel: [ 1680.390688] [<ffffffffa0d04aee>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
May 22 10:44:31 nereidum kernel: [ 1680.390711] [<ffffffffa0d04ed9>] nfs_write_mapping+0x79/0xb0 [nfs]
May 22 10:44:31 nereidum kernel: [ 1680.390733] [<ffffffffa0d04f47>] nfs_wb_all+0x17/0x20 [nfs]
May 22 10:44:31 nereidum kernel: [ 1680.390751] [<ffffffffa0cf3eba>] nfs_do_fsync+0x2a/0x60 [nfs]
May 22 10:44:31 nereidum kernel: [ 1680.390770] [<ffffffffa0cf4105>] nfs_file_flush+0x75/0xa0 [nfs]
May 22 10:44:31 nereidum kernel: [ 1680.390777] [<ffffffff8114051c>] filp_close+0x3c/0x90
May 22 10:44:31 nereidum kernel: [ 1680.390783] [<ffffffff81140627>] sys_close+0xb7/0x120
May 22 10:44:31 nereidum kernel: [ 1680.390790] [<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
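
Warnings like the one above are easier to compare across reports if they are counted per task. A minimal Python sketch, assuming the message format shown in this excerpt (`INFO: task <name>:<pid> blocked for more than <N> seconds`); the 120-second threshold is the `hung_task_timeout_secs` value that the warning itself refers to. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed format, taken from the syslog excerpt above.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<comm>.+?):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def scan_hung_tasks(lines):
    """Return a Counter mapping (command, pid) to its number of hung-task warnings."""
    hits = Counter()
    for line in lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            hits[(m.group("comm"), int(m.group("pid")))] += 1
    return hits


def scan_file(path="/var/log/syslog"):
    """Scan a syslog file; errors='replace' tolerates stray non-UTF-8 bytes."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return scan_hung_tasks(f)
```

A single PID that keeps reappearing, as `cp:2791` does here, points at one stuck request rather than a general slowdown.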

$ uname -a
Linux nereidum 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13...
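
The `cp` process above is shown in state `D` (uninterruptible sleep) inside `nfs_wait_bit_uninterruptible`, which is consistent with it ignoring kill: a task in that state does not act on signals until the wait it is blocked in completes. The hypothetical helper below is a minimal Python sketch that lists such tasks by reading the state field of `/proc/<pid>/stat`; it assumes a Linux `/proc` layout.

```python
import os


def parse_stat(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the last ')' rather than on whitespace.
    """
    left, right = line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid_str), comm, state


def d_state_processes(proc_root="/proc"):
    """Return a sorted list of (pid, comm) for tasks in state 'D'."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited between listdir() and open()
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Running `print(d_state_processes())` during a stuck copy would be expected to list the `cp` task; if the same PIDs persist across several calls, the task is genuinely hung rather than just slow.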

Nathan Adams (nadams)
tags: added: lucid
Revision history for this message
Nathan Adams (nadams) wrote :

I just upgraded to Linux 2.6.33.3 and no joy: soundKonverter is still stuck trying to copy a file to an NFS mount.

Revision history for this message
Jeremy Foshee (jeremyfoshee) wrote :

Folks,
    I have closed this bug as Won't Fix due to the age of the original report. This bug has evolved into such a nebulous form as to be undefinable and unfixable.

Any of you who believe you are affected by this particular bug, please file a new bug so that it can be addressed on its own merits.

Thanks!

~JFo

Changed in linux (Ubuntu):
status: Triaged → Won't Fix