system freeze when using network with linux-rt on smp amd64 machines

Bug #354816 reported by ttoine on 2009-04-03
This bug affects 9 people
Affects: linux-rt (Ubuntu)
Importance: Low
Assigned to: Unassigned

Bug Description

I have a workstation with an Asus M2N Sli Deluxe motherboard and an AMD64 Athlon X2 CPU. I own two video cards, an nVidia Quadro NVS 285 by HP and an ATI FireGL 3600 (they are never both in the workstation at the same time).

I tested with both cards, with and without the restricted driver, on Jaunty Beta, with both the -generic and the -rt kernel.

No problem with the -generic kernel.

The problem I experience with the -rt kernel: everything works well (low-latency audio, watching video, office work, etc.) except going on the Internet. After a while, sometimes a few seconds, sometimes a few minutes, the system freezes: first the Gnome windows, then the mouse, then the keyboard. I have to power off my workstation. The problem occurs with both video cards, with the restricted or the free driver, and with 3D effects enabled or not.

I don't have a local network so I can't test that point.

The motherboard has two "nVidia Corporation MCP55 Ethernet (rev a2)" adapters (extracted from lspci).

Tell me if you need more info.

Toine

Alessio Igor Bogani (abogani) wrote :

@ttoine,

Using lspci -v, could you report exactly which kernel driver is involved?

Thanks!

Changed in linux-rt (Ubuntu):
status: New → Incomplete

ttoine (ttoine) wrote :

Alessio,

Please find attached the file with the result of lspci -v.

Let me know if you need further tests.

Toine

ttoine (ttoine) wrote :

Alessio,

nVidia Corporation MCP55 Ethernet (rev a2)
Kernel driver in use: forcedeth
Kernel modules: forcedeth

Toine
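
For anyone following along, the driver lines quoted above can be pulled out of lspci -v output with a small filter. This is only a sketch: the function name eth_drivers is made up for illustration, and it assumes the usual lspci -v layout (one unindented header line per device, followed by indented detail lines).

```shell
# Hypothetical helper: read `lspci -v` text on stdin and print the kernel
# driver bound to every device whose header line mentions "Ethernet".
eth_drivers() {
    awk '
        /^[^ \t]/ { is_eth = ($0 ~ /Ethernet/) }   # unindented line = new device
        is_eth && /Kernel driver in use:/ { print $NF }
    '
}

# Typical use on a live system:
#   lspci -v | eth_drivers
```

Matching on "Ethernet" anywhere in the header line (rather than on a specific device class name) is a deliberate choice, since the class label lspci prints for an onboard NIC can vary between chipsets.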

Alessio Igor Bogani (abogani) wrote :

@ttoine,

Could you confirm that you hit this bug only on the -rt kernel?
Are you sure that -generic isn't affected?

Thanks!

Alessio Igor Bogani (abogani) wrote :

@ttoine,

Could you also provide output of the "cat /proc/interrupts" command?

Thanks!

Alessio,

I confirm that the -generic kernel is not affected, only the -rt.

Toine

Hi,
I can confirm this issue. I get the same behaviour with the rt kernel.
It happened every time I accessed the network while the wifi card and the Ethernet card were both active.
Since I deactivated the wifi, it still happens eventually, but later than before.

Alessio Igor Bogani (abogani) wrote :

@ttoine:
Could you provide what I requested in comment #5, please?

@freyr,
Could you provide more information about your Ethernet adapter, as ttoine has already done in comment #3?
Please also provide the information requested in comment #5.

Thanks!

Alessio,

Sorry, I am just out of hospital. I will do it as soon as I can start my
workstation, I hope this weekend.


Hi,
The result of lspci -v is attached.

In my /proc/interrupts I have :

           CPU0       CPU1
   0:       406          0   XT-PIC-XT        timer
   1:        69          0   XT-PIC-XT        i8042
   2:         0          0   XT-PIC-XT        cascade
   5:     12622          0   XT-PIC-XT        ohci_hcd:usb2, ath
   7:         4          0   XT-PIC-XT        ehci_hcd:usb1, EMU10K1
   8:         1          0   XT-PIC-XT        rtc0
   9:         0          0   XT-PIC-XT        acpi
  10:      6369          0   XT-PIC-XT        sata_nv
  12:      1090          0   XT-PIC-XT        i8042
  14:      1607          0   XT-PIC-XT        pata_amd
  15:         0          0   XT-PIC-XT        pata_amd
2300:        10      13732   PCI-MSI-edge     eth0
 NMI:         0          0   Non-maskable interrupts
 LOC:    157316     157256   Local timer interrupts
 RES:     14036      18394   Rescheduling interrupts
 CAL:        37         50   Function call interrupts
 TLB:       898        851   TLB shootdowns
 SPU:         0          0   Spurious interrupts
 ERR:         1
 MIS:         0
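
A listing like this can be scanned for interrupt lines that several devices share. In the table above, eth0 has its own MSI vector, while ath (the wifi card) shares IRQ 5 with a USB controller. The filter below is a rough sketch, not a diagnosis: shared_irqs is a made-up name, and it assumes the column layout shown here (one count per CPU, then a single controller-type column, then the device names).

```shell
# Hypothetical helper: read /proc/interrupts text on stdin and print every
# numbered IRQ whose device column lists more than one device.
shared_irqs() {
    awk '
        NR == 1 { ncpu = NF; next }            # header row: one column per CPU
        $1 ~ /^[0-9]+:$/ {                     # numbered IRQ lines only
            # fields: IRQ, one count per CPU, controller type, then devices
            devs = ""
            for (i = ncpu + 3; i <= NF; i++)
                devs = devs (devs == "" ? "" : " ") $i
            if (devs ~ /,/) print $1, devs
        }
    '
}

# Typical use on a live system:
#   shared_irqs < /proc/interrupts
```

Kernels that print the controller type in two columns would need the field offset adjusted, so treat the output as a hint only.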

I found a strange message in /var/log/messages at the time of a freeze:

Apr 10 22:42:27 mao kernel: [ 1412.391213] Pid: 740, comm: kjournald2 Not tainted 2.6.28-3-rt #11-Ubuntu
Apr 10 22:42:27 mao kernel: [ 1412.391216] Call Trace:
Apr 10 22:42:27 mao kernel: [ 1412.391224] [<ffffffff806b2a99>] ? rt_spin_lock+0x9/0x10
Apr 10 22:42:27 mao kernel: [ 1412.391229] [<ffffffff802477b6>] __schedule_bug+0x76/0x80
Apr 10 22:42:27 mao kernel: [ 1412.391233] [<ffffffff806b0da8>] thread_return+0x145/0x37d
Apr 10 22:42:27 mao kernel: [ 1412.391236] [<ffffffff806b2eb3>] ? _spin_unlock+0x33/0x40
Apr 10 22:42:27 mao kernel: [ 1412.391239] [<ffffffff806b1fef>] rt_spin_lock_slowlock+0x1df/0x2b0
Apr 10 22:42:27 mao kernel: [ 1412.391242] [<ffffffff806b2a84>] __rt_spin_lock+0x64/0x70
Apr 10 22:42:27 mao kernel: [ 1412.391245] [<ffffffff806b2a99>] rt_spin_lock+0x9/0x10
Apr 10 22:42:27 mao kernel: [ 1412.391249] [<ffffffff802e9b7f>] kmem_cache_free+0x5f/0x250
Apr 10 22:42:27 mao kernel: [ 1412.391254] [<ffffffff803a210b>] __journal_remove_journal_head+0xcb/0x160
Apr 10 22:42:27 mao kernel: [ 1412.391257] [<ffffffff803a41f8>] jbd2_journal_put_journal_head+0x88/0x110
Apr 10 22:42:27 mao kernel: [ 1412.391260] [<ffffffff8039e030>] journal_wait_on_commit_record+0x70/0x110
Apr 10 22:42:27 mao kernel: [ 1412.391263] [<ffffffff8039f21f>] jbd2_journal_commit_transaction+0x10af/0x1110
Apr 10 22:42:27 mao kernel: [ 1412.391265] [<ffffffff806b2a99>] ? rt_spin_lock+0x9/0x10
Apr 10 22:42:27 mao kernel: [ 1412.391268] [<ffffffff803a2b94>] kjournald2+0xe4/0x250
Apr 10 22:42:27 mao kernel: [ 1412.391271] [<ffffffff8026cd70>] ? autoremove_wake_function+0x0/0x40
Apr 10 22:42:27 mao kernel: [ 1412.391274] [<ffffffff803a2ab0>] ? kjournald2+0x0/0x250
Apr 10 22:42:27 mao kernel: [ 1412.391277] [<ffffffff8026c879>] kthread+0x49/0x90
Apr 10 22:42:27 mao kernel: [ 1412.391280] [<ffffffff80214b49>] child_rip+0xa/0x11
Apr 10 22:42:27 mao kernel: [ 1412.391283] [<ffffffff8026c830>] ? kthread+0x0/0x90
Apr 10 22:42:27 mao kernel: [ 1412.391285] [<ffffffff80214b3f>] ? chil...


freyr (david-lapetina) wrote :

Hi again,
I've found a workaround: I added the nosmp and maxcpus=0 options to the kernel command line (I think both options are equivalent).
With that, it works. It does not freeze anymore.
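
To double-check that a boot really used this workaround, the running kernel's command line can be inspected. The kernel parameter documentation describes maxcpus=0 as equivalent to nosmp, so the sketch below accepts either one. The function name smp_disabled is an illustration only.

```shell
# Hypothetical helper: succeed if the given kernel command line contains
# nosmp or maxcpus=0 as a separate, whitespace-delimited option.
smp_disabled() {
    # $1: command line text, e.g. "$(cat /proc/cmdline)"
    case " $1 " in
        *" nosmp "*|*" maxcpus=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use on a live system:
#   smp_disabled "$(cat /proc/cmdline)" && echo "SMP was disabled at boot"
#   grep -c ^processor /proc/cpuinfo    # should then report 1
```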

Alessio Igor Bogani (abogani) wrote :

@freyr, @ttoine,

You should do this test:

1) Locate the forcedeth.ko module (of the 2.6.28-3-rt kernel) and move it out of the module tree (into your home directory, for example)
2) Execute sudo depmod -a
3) Reboot into that kernel
4) Check manually with lsmod that forcedeth isn't loaded.
5) Do everything you usually do, stressing the PC a lot (except, obviously, anything related to the wired network)
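
The steps above can be sketched roughly as follows. The commands that modify the system are left as comments, and only the lsmod check from step 4 is written as a function. module_loaded is a hypothetical name, and locating the file with modinfo -n is one possible way to do step 1, not something the test requires.

```shell
# Steps 1-3 (run by hand, they change the system):
#   sudo mv "$(modinfo -n forcedeth)" ~/    # move the .ko of the running kernel
#   sudo depmod -a
#   sudo reboot

# Step 4, hypothetical helper: succeed if module $1 appears in the
# lsmod-style listing supplied on stdin.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}

# Typical use on a live system:
#   lsmod | module_loaded forcedeth && echo "forcedeth is still loaded"
```

If the module still shows up after the reboot, one possible explanation (an assumption, not confirmed anywhere in this report) is that a copy is being loaded from the initramfs, in which case rebuilding it with sudo update-initramfs -u after moving the file may be needed.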

Thanks!

freyr (david-lapetina) wrote :

Hi,
It's quite strange: I moved forcedeth.ko from /lib/modules/2.6.28-3-rt/kernel/drivers/net/forcedeth.ko into my home directory, and did the same for the 2.6.28-11-generic kernel.
I ran sudo depmod -a.
But after a reboot, lsmod shows that the module is still loaded ...

freyr (david-lapetina) wrote :

ok.
So I tried rmmod forcedeth, which worked and deactivated eth0.
I used my wifi connection instead and removed the nosmp maxcpus=0 options I had previously added.
All is working fine, even with access to the network (by wifi ...).

freyr (david-lapetina) wrote :

Hi, in fact I was wrong. The kernel froze anyway, but it took much longer than before.
Sorry for the wrong information ...

Alessio,

Please find enclosed the result of cat /proc/interrupts.

I removed the forcedeth module as freyr suggested, and all is working
fine. To make sure the freeze is not simply delayed, I will
leave the workstation running some audio processes for a few days.
So we will see if it is OK without "forcedeth".

I have a question for you: while the Gnome System Monitor seems to show
sensible values, the panel widget shows that 50% of the CPU (one
core?) is always in use. Is that normal? It is not the case with the
-generic kernel. It is the same with or without the "forcedeth" module.

Let me know if you want me to do other tests or reports.

Toine

Just installed Jaunty AMD64 yesterday from daily ISO build, and experienced something that looks like this bug systematically.

My machine is an AMD64 X2 with an nvidia card (GeForce 6150 LE, onboard). I'm using the nvidia driver, version 180 (everything coming from the Ubuntu repositories without fiddling with anything).

Just to be clear about what happens (and others should clarify this):

- this happens on an SMP machine with the rt kernel. It looks as if something gets "filled up" when anything uses the network, even just having the machine mounted over ssh from a remote host, and the machine freezes completely after a couple of minutes.
- the machine never crashes with -generic kernel
- the machine never crashes with rt kernel, provided it is booted with nosmp option.

I did try to see if "forcedeth" module was the problem, but it was not. I deactivated my onboard card in the BIOS, and added a Realtek PCI network card. The machine still froze when using the network after a couple of minutes. I could not find any forcedeth module loaded.

Another thing that could be relevant: I can see that my second CPU is used at 100% (that's what htop says). That seems to corroborate what Toine has experienced.

Willing to test whatever other kernel package or modifications needed.

summary: - system freeze when browsing internet with linux-rt
+ system freeze when using network with linux-rt on smp amd64 machines
Changed in linux-rt (Ubuntu):
status: Incomplete → Confirmed

Alessio Igor Bogani (abogani) wrote :

Raise Importance to Low for the moment.

Changed in linux-rt (Ubuntu):
importance: Undecided → Low

Les (lnorbo) wrote :

My problem seems similar, but with different hardware.

When I use the internet, and especially if I try to download updates, my PC will freeze. First the application seems to freeze, then within 15 seconds, the mouse will as well.

Athlon 64 x2
AMD 690 chipset on a Gigabyte MB
4 gigs ram

Both the 32-bit and 64-bit versions do the EXACT same thing.

Thanks

nobody (tuimonen) wrote :

I can confirm this also.

Tried with a custom 2.6.29-rt in 64studio and the stock Ubuntu Studio 9.04 rt kernel; both hang with network activity at some point.

I think this may be a duplicate of bug #364530.

I own a Core 2 Quad 6600, and I think I'm experiencing this issue.

I tried first with a Kubuntu Jaunty + nVidia binary (9800 GT) + ubuntustudio-audio, getting a total freeze after entering the session. If I just 'ifconfig ethX down' before entering the session, it works. If I boot with nosmp, it works. I need to test rmmodding the r8169 driver (RTL8111/8168B rev 02). The -generic kernel works fine (I'm using it daily with no issues).

I've also tried a fresh ubuntustudio install with audio presets. I can't say it was up to date because the mouse and keyboard froze before the update finished. The progress bar was still "moving".

I'm including the result of 'lspci -v -s <my network device>' with this post.

Adding my 'cat /proc/interrupts' bit. I'll mention both the previous lspci and this attachment are from the generic kubuntu jaunty.

Reading my comments again, I realize I didn't clearly mention that I'm talking about the -rt kernel. To make it clear:

Two Jaunty installations:
 - Kubuntu + ubuntustudio + nvidia binary booted with the -rt kernel froze after entering the session. The -rt kernel with nosmp works fine. The -rt kernel with the eth interfaces down seems to work fine too. The -generic kernel works fine.
 - A fresh install of UbuntuStudio with the audio tasksel option freezes mouse and keyboard. It happened while update-manager was on screen and downloading, and the progress bar kept "moving" (the inside animation) but was not advancing.

Sorry about my verbosity (and my English too).

nobody (tuimonen) wrote :

The update manager was also the first place where I noticed the freeze, and indeed the progress bar was alive but not advancing. qjackctl also kept updating its display (the RT indicator blinking occasionally), but keyboard/mouse input was lost.

Also tested later from command line with apt-get upgrade and it froze too.

Kristof (christophe-perus) wrote :

I also experience the same here. I own an AMD 64 X2, the ethernet adapter is the one on my motherboard (nVidia nForce 10/100).

I'm running ubuntu studio, which uses the rt kernel.

Whatever network operation I perform (copying files, browsing the web, updating the system, etc) the system freezes after a very short time and the only solution is to reboot.

puppet (puppet-trash) wrote :

Same problem with almost the same hardware configuration. The standard 64-bit kernel works fine, but the rt one doesn't.

puppet (puppet-trash) wrote :

uh... and I confirm that the CPU seems to work much more with the rt kernel than with the normal one.

Just for your info, I installed the standard kernel simply by using apt-get:

luca@puppet:~$ sudo apt-get install linux

;-D

Does anyone know how to increase developers' interest in this bug?

In the meantime... I've compiled a 2.6.29.4 kernel with the rt16 patch, using a configuration very similar to the one Ubuntu uses in its -generic kernel (but with rt activated, 1000Hz, etc). I installed it yesterday and it has been running for a while without problems.

I was expecting some breakage due to the version bump and Ubuntu being geared towards a 2.6.28 kernel, but I did not find a single issue, except with the nVidia drivers shipped with Ubuntu (180 series). Updating to 185 solves that problem.

Does this light a bulb somewhere?

I've never packaged anything at all... I was thinking "I can try", and it is probably not hard to copy the package structure from a normal Ubuntu kernel... But somehow packaging a kernel seems a little over-optimistic to me. I used make-kpkg to build a binary .deb, that's all.

I'll research a little. I suppose you have a PPA in mind. I don't have plenty of time, exams are in two weeks or so, but I'll try to understand the depth of the task and answer realistically. At least I can provide the steps...

Also, I've not tested this kernel very well. All my desktop and hardware seems to be working fine but I have not stressed the system nor I've done something 'demanding'.

Sorry for my English.

I downloaded the package sources for linux-2.6.28-11.42 and looked around the rules, control and other debian files. A bit overwhelming. I think my first package should be something small and with no possible side effects... And a kernel is neither. Sorry. With more time on my hands, maybe I could try.

I can upload somewhere the binary build and you can trust me I have not applied vile patches... Or I can tell you what I've done (nothing weird, really simple).

It is easy: grab a vanilla 2.6.29 (the latest at post time is 2.6.29.4 [1]), apply the corresponding rt patch (2.6.29.4-rt16 [2]), copy the configuration from the last 2.6.28 Ubuntu kernel over the new one, enable full rt-preempt, set the config to 1000Hz, and run make-kpkg. I also used the PPA from Michael Marley to get updated nvidia drivers [3]. You can look at the outdated instructions on building an rt kernel for Ubuntu here [4] to get inspiration for the config step.

[1] - http://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.29.4.tar.bz2
[2] - http://www.kernel.org/pub/linux/kernel/projects/rt/patch-2.6.29.4-rt16.bz2
[3] - https://launchpad.net/~thefirstm/+archive/ppa
[4] - https://help.ubuntu.com/community/HowToVanillaKernelWithRealtimePreemption
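
As a rough outline, the steps described above could look like the sketch below. It is untested against these exact versions. The function names are invented for illustration, the URL patterns are simply those of links [1] and [2] (which may no longer resolve), and the config path passed as the third argument is a placeholder you would have to supply yourself.

```shell
# URL patterns copied from links [1] and [2]; helper names are hypothetical.
kernel_url()   { echo "http://www.kernel.org/pub/linux/kernel/v2.6/linux-$1.tar.bz2"; }
rt_patch_url() { echo "http://www.kernel.org/pub/linux/kernel/projects/rt/patch-$1-$2.bz2"; }

# Hypothetical wrapper around the described steps; it is only defined here,
# not run. The body is a subshell so the cd does not leak into the caller.
build_rt_kernel() (
    ver=$1 rt=$2 oldcfg=$3      # e.g. 2.6.29.4 rt16 /boot/config-<your 2.6.28 kernel>
    wget "$(kernel_url "$ver")" "$(rt_patch_url "$ver" "$rt")" &&
    tar xjf "linux-$ver.tar.bz2" &&
    cd "linux-$ver" &&
    bzcat "../patch-$ver-$rt.bz2" | patch -p1 &&
    cp "$oldcfg" .config &&
    make oldconfig &&
    make menuconfig &&          # enable full RT preemption and a 1000 Hz timer here
    fakeroot make-kpkg --initrd kernel_image kernel_headers
)

# Example call (the config path is a placeholder, not a real file name):
#   build_rt_kernel 2.6.29.4 rt16 /boot/config-of-your-2.6.28-kernel
```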

Sorry not to be more helpful. I can test and run anything you want on my computer, but I cannot try packaging a kernel right now.

PS: In this old bug they are doing similar things. Look for the comments from Realtime Dutchman and vivichrist.
https://bugs.launchpad.net/ubuntu/+source/linux-rt/+bug/290498/

lexum (justmoen) wrote :

I can confirm that building a custom linux-2.6.29.6-rt23 kernel on my machine solves the freezing problem caused by mcp55 and forcedeth.

Blake W (blake-weyman) wrote :

This bug affects me, in the 32bit version of Ubuntu Studio. Haven't tried 64bit. I have a Realtek "RTL8111/8168B PCI Express Gigabit Ethernet Controller".

I'm too confused by kernel compilation to risk compiling my own.

Alessio Igor Bogani (abogani) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. The issue that you reported is one that could be fixed in the Beta release of Karmic Koala. It would help us greatly if you could test with it! Thanks again and we appreciate your help.

puppet (puppet-trash) wrote :

I'm testing the beta version of Karmic Koala and this bug seems to be solved.

Thanks for your work, guys!

Alessio Igor Bogani (abogani) wrote :

puppet,

Thank you for testing this bug against Karmic!

Alessio Igor Bogani (abogani) wrote :

@ttoine

Antoine,

As the original reporter, can you confirm that this bug is fixed?

Thanks!

ttoine (ttoine) wrote :

Alessio,

I confirm, it is OK for me on Karmic Beta linux-rt kernel.

Toine

Alessio Igor Bogani (abogani) wrote :

So I close this bug.

Changed in linux-rt (Ubuntu):
status: Confirmed → Invalid

ebrjvd (jos-van-dyck) wrote :

I can confirm that the bug still persists in Beta Karmic Ubuntustudio 9.10 with kernel 2.6.31-9-rt using dual core AMD64.

With the old trick "nosmp acpi=off" it works however, so this issue is still unsolved.

Jos

Contrary to ebrjvd, but confirming what ttoine experienced, the issue seems completely solved for me. (See previous comments, I clearly had the issue in Jaunty.)

My machine has been up and running for 3 days. I did multiple recording sessions on it, downloaded tons of data, and reproduced what made the machine crash before, and it never froze.

I don't know why this bug was changed to Invalid. It was confirmed, and now we could say "Fix Released", since using the new Karmic version solves the issue. I just made that change so that it is seen as fixed.

If you are not OK with this, at least keep the bug confirmed.

Changed in linux-rt (Ubuntu):
status: Invalid → Fix Released

ebrjvd (jos-van-dyck) wrote :

After this morning's updates (24/10), both kernels 2.6.31-9-rt and 2.6.31-14-generic seem to be working on Ubuntustudio 9.10 beta.
Yesterday (23/10) the 2.6.31-9-rt kernel was still freezing, so something got fixed last night. Thanks.
Keep fingers crossed.

ebrjvd (jos-van-dyck) wrote :

After installing Ubuntustudio 9.10-RC, the problem with kernel 2.6.31-9-rt persists; the system stalls when the network is accessed (Firefox, system update, freedb access...).

With "nosmp" option, everything is fine and all network access is OK.

ttoine (ttoine) wrote :

I still have minor bugs with the current -rt kernel and the ATI Catalyst driver:
 - I have to deactivate the network if I don't want to have x-runs during a
session,
 - Sometimes, desktop effects generate x-runs too.
 - Lastly, the workstation doesn't power itself off after system shutdown; I
have to do it manually.

But it is quite stable at this time.
