modprobe -r iwl3945 causes total system freeze

Bug #345710 reported by TJ on 2009-03-20
This bug affects 4 people
Affects          Status         Importance   Assigned to    Milestone
linux (Fedora)   Fix Released   Medium
linux (Ubuntu)   Fix Released   High         TJ
Jaunty           Fix Released   High         Stefan Bader

Bug Description

SRU Justification:

Impact: Removing the iwl3945 module can cause a hard lockup of the system as there is a race condition which might get the rfkill_poll workqueue restarted/running after the module code has been unloaded.

Fix: Two patches from upstream, the first moving the call that restarts the workqueue on the way down to be executed before stopping the workqueue. The second patch makes the call stopping the workqueue wait for any running worker.

Testcase: Remove and load the module in a loop (verified to work with the patches applied).
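
The testcase can be sketched as a small script. This is only a minimal sketch, not the script used for the SRU verification: the DRY_RUN switch, the run helper, the iteration count and the 3-second pause are assumptions added here for illustration. By default it only prints the commands; running it as root with DRY_RUN=0 would really unload and reload the module, and on an unpatched kernel that may freeze the machine.

```shell
#!/bin/sh
# Sketch of the SRU testcase: unload and reload iwl3945 in a loop.
# DRY_RUN, run() and the pause length are illustrative assumptions.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

reload_loop() {
    count=$1
    i=1
    while [ "$i" -le "$count" ]; do
        run modprobe -r iwl3945 || return 1
        # The reported lock-up showed up a few seconds after removal,
        # so pause before loading the module again.
        run sleep 3 || return 1
        run modprobe iwl3945 || return 1
        i=$((i + 1))
    done
}

reload_loop 3
```

If the loop finishes without a freeze on a kernel that previously locked up, that matches the "verified" outcome described in the testcase.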

---

I can reliably reproduce a total system lock-up (no response to SysRq keys) whenever the module is removed. It occurs about 3 seconds after the removal has completed, whilst the notification of removal is still on-screen.

sudo modprobe -r iwl3945

sudo modprobe iwl3945

uname -a
Linux hephaestion 2.6.28-11-generic #35-Ubuntu SMP Wed Mar 18 21:55:34 UTC 2009 x86_64 GNU/Linux
lspci -nn | grep -i Wireless
06:00.0 Network controller [0280]: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection [8086:4222] (rev 02)

Because the lock-up is total the log files aren't synced and therefore the last 5 seconds or so of messages are not in the logs to help diagnose it. Tomorrow I shall use a netconsole connection to try to get more information.
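
A netconsole setup along these lines could capture the messages that never reach the log files. This is a hedged sketch: the netconsole= parameter layout (source port, source IP, device, target port, target IP, target MAC) follows the kernel's netconsole documentation, but every address, port and interface name below is a placeholder, not a value from this report. A wired interface is assumed, since the wireless driver is the one being unloaded.

```shell
#!/bin/sh
# Build the module parameter for netconsole.
# Arguments: src_port src_ip dev tgt_port tgt_ip tgt_mac (all placeholders here).
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# Hypothetical values: eth0 on the crashing machine, a second machine as receiver.
param=$(netconsole_param 6665 192.168.1.10 eth0 6666 192.168.1.20 00:11:22:33:44:55)

# Only print the command; run it by hand as root once the values are adjusted.
echo "modprobe netconsole $param"
```

The receiving machine then needs something listening for UDP on the target port before the module is removed on the test machine.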

Hi TJ,

Might be good to also test linux-backports-modules-jaunty too.

I should probably elaborate a bit more. I myself have the same card:

ogasawara@emiko:~$ sudo lspci -vnvn | grep "Wireless"
0c:00.0 Network controller [0280]: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection [8086:4222] (rev 02)

I was able to reproduce the lock-up without linux-backports-modules-jaunty installed. After installing linux-backports-modules-jaunty I can no longer trigger the lock-up. It would be good if you could confirm this as well. Thanks.

Changed in linux (Ubuntu):
status: New → Incomplete
TJ (tj) wrote :

I've found and fixed the bug. Submitting to mailing list and attaching patch here too.

Subject: [PATCH] UBUNTU: SAUCE: iwl3945: Don't queue rfkill_poll work when module is exiting

Bug: #345710

When the wireless interface is active and the iwl3945 module is unloaded the
call to ieee80211_unregister_hw() would call iwl3945_mac_stop() which would
restart the delayed workqueue for rfkill_poll. That workqueue had already been
cancelled so when the next work item was run (2 seconds later) the system would
suffer a hard lock-up because the module had been unloaded by then.

This patch implements STATUS_EXIT_PENDING checks in places where the rfkill_poll
work is scheduled, and moves the final workqueue cancellation to occur after the
call to ieee80211_unregister_hw().

Bug discovered, experienced and fix tested on my PC.

Signed-off-by: TJ <email address hidden>

Changed in linux (Ubuntu):
assignee: nobody → intuitivenipple
importance: Undecided → High
milestone: none → ubuntu-9.04-beta
status: Incomplete → In Progress
bloo (bloo) wrote :

How likely will the patch provided by TJ be included in Jaunty RC (if it is not already)?

Thanks!

Hi Marc,

TJ was asked to run this by upstream first which he did:

http://marc.info/?l=linux-wireless&m=123791044313158&w=2

It seems they asked for it to be resubmitted, but there has been no response yet:

http://marc.info/?l=linux-wireless&m=123809602824425&w=2

However, upstream did seem to agree this does need to be fixed, so I imagine it should be pulled in as soon as they get the patch modification they're wanting. It will then be pulled back into the Ubuntu kernel as a stable release update. In the meantime I'd suggest installing linux-backports-modules-jaunty. Thanks.

bloo (bloo) wrote :

Does installing linux-backports-modules-jaunty fix the problem? I need to know if this *currently* solves the problem or if it will solve the problem (so, if linux-backports-modules-jaunty has the included patch or not). I need to use Linux but I can't use my wireless because of that, I tried Fedora 10 too with the same problems.

As a second question, does the same apply for linux-backports-modules in Intrepid?

It would be nice if I could start using Ubuntu again and leave my XP with an Ubuntu VM...

Thank you very much for your comment, Leann. I hope upstream accepts the patch ASAP and it gets into Jaunty before the release, if we still have time...

Pacovi (pacovi) wrote :

I have the very same problem, and I can reproduce it even with linux-backports-modules in Jaunty

"sudo lspci -vnvn | grep Wireless" -> shows:
03:00.0 Network controller [0280]: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection [8086:4222] (rev 02)

"uname -a" shows:
Linux pacovi-laptop 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:57:59 UTC 2009 i686 GNU/Linux

I have the same wifi card.

06:00.0 Network controller [0280]: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection [8086:4222] (rev 02)

Description of problem:
$subj says it all.
While it looks like a duplicate of Bug 495223 = Bug 495003, those bugs should already have been fixed (kernel-2.6.29.1-64.bz495003.2.fc11) in my kernel version.

Version-Release number of selected component (if applicable):
kernel-2.6.29.2-126.fc11.x86_64

How reproducible:
Each time my network setup crashed (~5 times).
The artificial reproducer below was tried once on a "s" (init s) GRUB boot.

Steps to Reproduce:
modprobe iwl3945
ifconfig wlan0 up
rmmod iwl3945

Actual results:
Crash. An oops appears on the screen, but it is cut off because it is too large to fit. There is no other local machine to capture it, and kdump just hangs (nothing dumped).

Expected results:
No crash.

Additional info:
An easy workaround is to run this before rmmod:
ifconfig wlan0 down
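
The workaround can be wrapped in a small helper so the interface is always brought down before the module is removed. This is a minimal sketch: the safe_unload name, the run helper and the DRY_RUN switch are assumptions added for illustration, and wlan0 is simply the interface name used in the steps to reproduce. By default the commands are only printed.

```shell
#!/bin/sh
# Sketch of the workaround: bring the interface down first, then unload.
# safe_unload(), run() and DRY_RUN are illustrative names, not part of the report.
DRY_RUN=${DRY_RUN:-1}

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

safe_unload() {
    iface=${1:-wlan0}
    # Skip the rmmod if the interface could not be brought down.
    run ifconfig "$iface" down || return 1
    run rmmod iwl3945
}

safe_unload wlan0
```

This only avoids the crash path that is triggered while the device is up; it does not fix the underlying driver bug.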

I confirm that my iwl3945 machine crashes on rmmod when the device is up. Please try to capture an oops, as will I.

cfg80211: Using static regulatory domain info
cfg80211: Regulatory domain: US
 (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
 (2402000 KHz - 2472000 KHz @ 40000 KHz), (600 mBi, 2700 mBm)
 (5170000 KHz - 5190000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
 (5190000 KHz - 5210000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
 (5210000 KHz - 5230000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
 (5230000 KHz - 5330000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
 (5735000 KHz - 5835000 KHz @ 40000 KHz), (600 mBi, 3000 mBm)
cfg80211: Calling CRDA for country: US
lib80211: common routines for IEEE802.11 drivers
iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 1.2.26kds
iwl3945: Copyright(c) 2003-2008 Intel Corporation
iwl3945 0000:06:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
iwl3945: Tunable channels: 11 802.11bg, 13 802.11a channels
iwl3945: Detected Intel Wireless WiFi Link 3945ABG
wmaster0 (iwl3945): not using net_device_ops yet
wlan0 (iwl3945): not using net_device_ops yet
iwl3945 0000:06:00.0: firmware: requesting iwlwifi-3945-2.ucode
iwl3945 loaded firmware version 15.28.2.8
Registered led device: iwl-phy0:radio
Registered led device: iwl-phy0:assoc
Registered led device: iwl-phy0:RX
Registered led device: iwl-phy0:TX
ADDRCONF(NETDEV_UP): wlan0: link is not ready
------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x44/0x5c() (Not tainted)
Hardware name: VGN-FE570G
list_add corruption. prev->next should be next (c0977c34), but was (null). (prev=f4774adc).
Modules linked in: iwl3945 mac80211 lib80211 cfg80211 netconsole configfs i915 drm i2c_algo_bit rfcomm sco bridge stp bnep l2cap autofs4 sunrpc nf_conntrack_netbios_ns ip6t_REJECT ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_multipath uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device arc4 gspca_vc032x snd_pcm_oss snd_mixer_oss firewire_ohci snd_pcm gspca_main ecb tifm_7xx1 firewire_core iTCO_vendor_support snd_timer videodev yenta_socket tifm_core crc_itu_t snd rsrc_nonstatic v4l1_compat soundcore e100 mii i2c_i801 i2c_core btusb pcspkr serio_raw joydev snd_page_alloc bluetooth sony_laptop video output ata_generic pata_acpi [last unloaded: cfg80211]
Pid: 3000, comm: iwl3945/0 Not tainted 2.6.29.2-52.fc10.i686 #1
Call Trace:
 [<c042e5c4>] warn_slowpath+0x77/0xb4
 [<c0410063>] ? speedstep_detect_processor+0xf4/0x1fd
 [<c06d7a61>] ? _spin_lock_irqsave+0x2b/0x32
 [<c04369a9>] ? lock_timer_base+0x1f/0x3e
 [<c0436a10>] ? try_to_del_timer_sync+0x48/0x4f
 [<c0436a24>] ? del_timer_sync+0xd/0x18
 [<c041b455>] ? default_spin_lock_flags+0x8/0xb
 [<c06d7a61>] ? _spin_lock_irqsave+0x2b/0x32
 [<c0436b5a>] ? __mod_timer+0x9d/0xa8
 [<c043c8bf>] ? queue_delayed_work_on+0xad/0xba
 [<c0421da5>] ? update_curr+0x94/0x1a8
 [<c05356b0>] __list_add+0x44/0x5c
 [<c04362d1>] internal_add_timer+0x88/0x8c
 [<c0436b50>] __mod_timer+0x93/0xa8
 [<c06d6564>] schedule_timeout+0x98/0xbc
 [<f8e85c75>] ? iwl3945_enqueue_hcmd+0x27e/0x2ae [iwl3945]
 [<c04366af>] ? process_timeout+0x0/0xa
 [<c0436ad0>] ? __mod_timer+0x13/0xa8
 [<f8e85dc2>] iwl3945_send_cmd_sync+0x11d/0x292 [iw...

iwl3945 0000:06:00.0: PCI INT A disabled
cfg80211: Using static regulatory domain info
cfg80211: Regulatory domain: US
 (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp)
 (2402000 KHz - 2472000 KHz @ 40000 KHz), (600 mBi, 2700 mBm)
 (5170000 KHz - 5190000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
 (5190000 KHz - 5210000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
 (5210000 KHz - 5230000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
 (5230000 KHz - 5330000 KHz @ 40000 KHz), (600 mBi, 2300 mBm)
 (5735000 KHz - 5835000 KHz @ 40000 KHz), (600 mBi, 3000 mBm)
cfg80211: Calling CRDA for country: US
lib80211: common routines for IEEE802.11 drivers
iwl3945: Intel(R) PRO/Wireless 3945ABG/BG Network Connection driver for Linux, 1.2.26kds
iwl3945: Copyright(c) 2003-2008 Intel Corporation
iwl3945 0000:06:00.0: PCI INT A -> GSI 18 (level, low) -> IRQ 18
iwl3945: Tunable channels: 11 802.11bg, 13 802.11a channels
iwl3945: Detected Intel Wireless WiFi Link 3945ABG
wmaster0 (iwl3945): not using net_device_ops yet
wlan0 (iwl3945): not using net_device_ops yet
iwl3945 0000:06:00.0: firmware: requesting iwlwifi-3945-2.ucode
iwl3945 loaded firmware version 15.28.2.8
Registered led device: iwl-phy0:radio
Registered led device: iwl-phy0:assoc
Registered led device: iwl-phy0:RX
Registered led device: iwl-phy0:TX
ADDRCONF(NETDEV_UP): wlan0: link is not ready
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<c04368c3>] get_next_timer_interrupt+0xeb/0x1b2
*pde = 302e1067 *pte = 00000000
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/pci0000:00/0000:00:1c.2/0000:06:00.0/net/wlan0/address
Modules linked in: iwl3945 mac80211 lib80211 cfg80211 fuse i915 drm i2c_algo_bit rfcomm netconsole configfs sco bridge stp bnep l2cap autofs4 sunrpc nf_conntrack_netbios_ns ip6t_REJECT ip6table_filter ip6_tables ipv6 cpufreq_ondemand acpi_cpufreq dm_multipath uinput snd_hda_codec_idt snd_hda_intel snd_hda_codec snd_hwdep snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm firewire_ohci arc4 yenta_socket snd_timer firewire_core ecb iTCO_wdt e100 mii tifm_7xx1 rsrc_nonstatic gspca_vc032x snd iTCO_vendor_support tifm_core i2c_i801 i2c_core crc_itu_t serio_raw gspca_main sony_laptop soundcore snd_page_alloc pcspkr joydev btusb videodev bluetooth video output v4l1_compat ata_generic pata_acpi [last unloaded: cfg80211]

Pid: 0, comm: swapper Not tainted (2.6.29.2-52.fc10.i686 #1) VGN-FE570G
EIP: 0060:[<c04368c3>] EFLAGS: 00010017 CPU: 0
EIP is at get_next_timer_interrupt+0xeb/0x1b2
EAX: 00000000 EBX: 0000003b ECX: 00ffff3b EDX: 0000003b
ESI: 00000000 EDI: c0977280 EBP: c0892f6c ESP: c0892f34
 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 0, ti=c0892000 task=c0824350 task.ti=c0892000)
Stack:
 ffff3a8e c0977a8c 00000001 c0977c64 c09776cc c0892f50 ffff3b88 c0977a8c
 c0977c8c c0977e8c c097808c c17f8af4 149df780 0000003a c0892fb8 c0449242
 c0441dd4 00000000 00000000 00000001 c0892fa0 c0448d9f 149f17a0 0000003a
Call Trace:
 [<c0449242>] ? tick_nohz_stop_sched_tick+0x177/0x34d
 [<c0441dd4>] ? hrtimer_start_range_ns+0x10/0x12
 [<c0448d9f>] ? hrtimer_start_expires+0x1a/0x22
 [<c0402db7>] ? cpu_idle+0x26/0x8b
...

I can also confirm this on my setup.

Kernels affected by this bug :

- 2.6.29.1-30.fc10.i686
- 2.6.29.1-42.fc10.i686
- 2.6.29.2-52.fc10.i686
- 2.6.29.3-60.fc10.i686

Kernels Unaffected :

- 2.6.27.21-170.2.56.fc10.i686
- possibly lower versions (untested)

Sometimes I get blinking leds when the system freezes, sometimes not.
In lower runlevels, I can sometimes see the bottom of a kernel call trace followed by a kernel panic notice, even if more often than not the console seems to freeze before any text appears.

I can also confirm this on current rawhide.

I also ran into this problem. The last known working kernel for me was 2.6.29-0.237.rc7.git4.fc11. I'm seeing the failure under 2.6.29.3-140.fc11.

tonfa (bboissin) wrote :

Same problem here:

0c:00.0 Network controller: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection (rev 02)

the error is: "Oops: Unable to handle kernel paging request" (sorry I don't have a camera so I can't take a picture)

linux-backports-modules seems to fix it.

linux-2.6-iwl3945-use-cancel_delayed_work_sync-to-cancel-rfkill_poll.patch which has been applied since -158 should fix it:
* Fri May 22 2009 John W. Linville <email address hidden> - 2.6.29.3-158
- back-port "iwl3945: use cancel_delayed_work_sync to cancel rfkill_poll"

Is this still reproducible with a current F11 kernel?

(In reply to comment #7)
> linux-2.6-iwl3945-use-cancel_delayed_work_sync-to-cancel-rfkill_poll.patch
> which has been applied since -158 should fix it:

I'm not yet under F11 but I tried 2.6.27.24-170.2.68.fc10.i686 found in updates-testing, as it seems to contain the above patch, indicated by the build log located at http://kojipkgs.fedoraproject.org/packages/kernel/2.6.29.4/75.fc10/data/logs/i686/build.log (Patch685)

I booted under runlevel 1, modprobe iwl3945, ifconfig wlan0 up, rmmod iwl3945, and bingo, blinking keyboard leds and hard-lockup !

The kernel panic is visible here http://img172.imageshack.us/img172/8582/1003246.jpg

Still no change in 2.6.29.4-167.fc11.x86_64.

May I ask why there is still a Jaunty beta milestone here, given that the proper Jaunty release has now superseded the beta and alpha releases?

This bug appears to have been reported against 'rawhide' during the Fedora 11 development cycle.
Changing version to '11'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Stefan Bader (smb) wrote :

This should be fixed in Jaunty

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Stefan Bader (smb) wrote :

Too late, I meant this should be fixed in Karmic

Changed in linux (Ubuntu):
milestone: ubuntu-9.04-beta → none
Stefan Bader (smb) wrote :

For Jaunty I created some test kernels at http://people.ubuntu.com/~smb/bug345710/. Can someone verify that they fix the issue? Thanks

Changed in linux (Ubuntu Jaunty):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
importance: Undecided → High
status: New → In Progress
Stefan Bader (smb) wrote :

Waiting for test results.

Changed in linux (Ubuntu Jaunty):
status: In Progress → Incomplete
Pacovi (pacovi) wrote :

I have tested it and no results seen

uname -a shows:
Linux pacovi-portatil 2.6.28-13-generic #44bug345710v1 SMP Wed Jun 10 17:20:27 CEST 2009 i686 GNU/Linux

if you need more info only ask for it

attached is the wpa_supplicant.log

Pacovi (pacovi) wrote :

here is the syslog file

Stefan Bader (smb) wrote :

Just to double check that "no results seen" is meant as: after repeatedly loading and unloading the iwl3945 module with the test kernel, there was no system freeze. Is this correct?

Changed in linux (Fedora):
status: Unknown → In Progress
Andres Mujica (andres.mujica) wrote :

Just marked some dupes from this bug.

Upstream waited for the modified version http://article.gmane.org/gmane.linux.kernel.wireless.general/31209

and finally Rennard sent a one liner here:

http://osdir.com/ml/linux-wireless/2009-04/msg00840.html
http://osdir.com/ml/linux-wireless/2009-04/msg00832.html

However, from Red Hat's bugzilla it seems it didn't work out...

https://bugzilla.redhat.com/show_bug.cgi?id=499811#c8
https://bugzilla.redhat.com/show_bug.cgi?id=499811#c9

Stefan Bader (smb) wrote :

Thanks a lot for the references, Andres. Huaxu sent a patch on our mailing list which also moved the statement which removes the work queue to a later point.
Given your references this would sound like even the current upstream code will run into this. So we definitely need someone with that hardware doing some testing here. If this can be confirmed to be still a problem with the updated Jaunty code and either a mainline or Karmic kernel, we could then try the modification from Huaxu and if that helps ask for inclusion upstream.

Pacovi (pacovi) wrote :

> Just to double check that "no results seen" is meant as: after
> repeatedly loading and unloading the iwl3945 module with the test
> kernel, there was no system freeze. Is this correct?
>

Well, at least there's no system freeze now, that's right. I forgot to say so,
sorry...
I've double-checked it by loading and unloading the module, and now it doesn't
freeze anymore.

2009/6/24 Stefan Bader <email address hidden>

> Thanks a lot for the references Andres. Huaxu sent a patch on our mailing
> list which also moved the statement which remove the work queue to a later
> point.
> Given your references this would sound like even the current upstream code
> will run into this. So we definitely need someone with that hardware doing
> some testing here. If this can be confirmed to be still a problem with the
> updated Jaunty code and either a mainline or Karmic kernel, we could then
> try the modification from Huaxu and if that helps ask for inclusion
> upstream.
>

I can test what you want, but because of exams I won't have much time, I'll
try whatever as soon as I can.

mcarni (mcarni) wrote :

Hi,

I have the same wifi card, and the same problem when trying to rmmod iwl3945.
I would be happy to help with some testing, but I am afraid I need some assistance. I know some of the basics but I am not a Linux expert.
Should I try to use the kernel suggested by Stefan Bader? (If so is there a guide I can follow to replace the standard kernel with that one?) Or is there anything else you want me to do?

let me know

Michele

Andres Mujica (andres.mujica) wrote :

Michele, no problem. We need you to test Stefan's kernel. So what you've got to do is:

For an i386 kernel click on

http://people.ubuntu.com/~smb/bug345710/linux-image-2.6.28-13-generic_2.6.28-13.44bug345710v1_i386.deb

You will be prompted to install the package you're about to download using Gdebi; you must say yes.

After the download is completed, Gdebi will pop up with a warning that a more recent version is installed. Just click accept, and then click on Install Package.

Once it is installed, reboot your system. Be sure to select the new kernel that appears in the Grub menu. The system should boot normally.

Then make the test procedure again (rmmod iwl3945) and report here how it went.

(if you're using 64-bit, use the other kernel available there)

Andres Mujica (andres.mujica) wrote :

Well, I just tested (I hadn't had the chance before) and it panics here. There was no response to SysRq keys either.

uname -a
Linux tecnica 2.6.28-13-generic #44bug345710v1 SMP Wed Jun 10 17:20:27 CEST 2009 i686 GNU/Linux
lspci -nn | grep -i Wireless
0c:00.0 Network controller [0280]: Intel Corporation PRO/Wireless 3945ABG [Golan] Network Connection [8086:4222] (rev 02)

I don't have l-bm installed.

I'll check with Mainline kernel and with Karmic and report back.

Michele, please tell us about your test so we can confirm the issue is alive.

Stefan Bader (smb) wrote :

Andres and Michele, going forward I added a v2 of the kernel on my people page. That one includes the changes proposed by Huaxu Wan in his patch (it moves the statement slightly down, so it is executed a little later). If upstream and the v1 kernel do not work, can you compare them against v2? If v2 does work, we need to get that change upstream. Thanks.

mcarni (mcarni) wrote :

it is only a partial result, i am afraid I need some more guidance to complete the test.

So, with: Linux Alyosha3 2.6.28-13-generic #44-Ubuntu SMP Tue Jun 2 07:57:31 UTC 2009 i686 GNU/Linux
the system crashes every time I run "modprobe iwl3945"

I then downloaded the two kernels v01 and v02 for i386. I thought that I could install both and then choose which one to boot from with grub. It didn't go like that: I got no option from grub, but when I checked in a terminal I had v02 installed

Linux Alyosha3 2.6.28-13-generic #44bug345710v2 SMP Fri Jun 26 09:21:39 CEST 2009 i686 GNU/Linux
Good news is that I executed "rmmod iwl3945" and "modprobe iwl3945" several times and I experienced no problem at all.

Bad news is that I don't know how to test v01 since when I try to install it with Gdebi I get the following error "Error: A later version is already installed" and it doesn't let me proceed.
Sorry for the screw up, please feel free to tell me how to force the install of V01 and I will be happy to test it.

thanks

M

Stefan Bader (smb) wrote :

@mcarni, sure. Thanks for the tests so far. The problem is that those kernels
replace each other (as they have the same ABI). If they didn't, all the other
dependent kernel packages would be required as well, and I tried to avoid that.

I am not sure how this would work with Gdebi. What I usually do is right-click
and save the deb package. Then I call 'sudo dpkg --install <pkg>' which will
warn about downgrades but continues.
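
After installing one of the test packages with dpkg and rebooting, it helps to confirm which build is actually running, since the v1, v2 and v3 kernels replace each other. The helper below is a sketch under the assumption that the test builds carry a tag of the form "#44bug345710v2" in their version string, as the uname output pasted in this report shows; the test_build_tag name is made up for illustration.

```shell
#!/bin/sh
# Extract the test-build tag (v1, v2, v3) from a `uname -a` or `uname -v` string.
# Prints nothing for a stock kernel such as "#44-Ubuntu".
test_build_tag() {
    printf '%s\n' "$1" | sed -n 's/.*#[0-9]*bug345710\(v[0-9][0-9]*\).*/\1/p'
}

# Check the kernel that is currently running.
tag=$(test_build_tag "$(uname -v)")
echo "running test build: ${tag:-none (stock kernel)}"
```

Including that tag in each test report makes it unambiguous which kernel a result refers to.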

> Linux Alyosha3 2.6.28-13-generic #44bug345710v2 SMP Fri Jun 26 09:21:39 CEST 2009 i686 GNU/Linux
> Good news is that I executed "rmmod iwl3945" and "modprobe iwl3945" several times and I
> experienced no problem at all.

That is good news indeed. If v1 still crashes, this will be enough
information to send this upstream as well.

One more question with that: will it be ok to add your mail addresses (from
launchpad) as "Tested-by:" lines to the patch?

mcarni (mcarni) wrote :

"sudo dpkg --install" was much better, did warn me but continued.

I got v01 installed:

"Linux Alyosha3 2.6.28-13-generic #44bug345710v1 SMP Wed Jun 10 17:20:27 CEST 2009 i686 GNU/Linux"

I did "sudo rmmod iwl3945" and then as soon as I did "modprobe iwl3945" the system froze.

Now I went back to v02, tried rmmod and modprobe again, and everything is fine.

no problem with email address, I am glad to help, I wish it was so easy to get also pulseaudio and skype sorted...

M

Andres Mujica (andres.mujica) wrote :

I've just tested and can confirm that the v2 kernel solves the issue.

Pacovi (pacovi) wrote :

If now we still have problems to associate to the AP (but the module doesn't
cause system freeze) should we start a new bug?

2009/6/30 Andres Mujica <email address hidden>

> i've just tested and can confirm that v2 kernel solves the issue.
>
> --
> modprobe -r iwl3945 causes total system freeze
> https://bugs.launchpad.net/bugs/345710
> You received this bug notification because you are a direct subscriber
> of the bug.
>

Stefan Bader (smb) wrote :

OK, thanks for the testing. I will see that this gets upstream and into Jaunty and Karmic. As for the other problem of not being able to connect: yes, please open another bug, as that is a separate issue. If you do, please use "ubuntu-bug linux" (which gathers some default data into the report). It also helps if you can say whether you tested against a mainline (or Karmic) kernel and whether it works there.

Changed in linux (Ubuntu Jaunty):
status: Incomplete → In Progress
Stefan Bader (smb) wrote :

Setting this back to "in progress" as the 2nd half of the solution is not upstream/in Karmic, yet.

Changed in linux (Ubuntu):
status: Fix Released → In Progress
Stefan Bader (smb) on 2009-06-30
tags: added: upstream-pending
Stefan Bader (smb) wrote :

This sounds a bit stupid, but I have even created a v3. The reason is that after I got in touch with Huaxu, he told me that the problem does not seem to be in 2.6.30, but he wasn't sure why. Looking a bit closer, I realized that in newer kernels, instead of the workqueue in question being stopped later, the code after the stop was moved to a location above it. And this code might cause the queue to get scheduled again (I hope I do not lose everybody here).
So the v3 version takes the two patches from upstream, instead of taking one from upstream and making some other changes. I hope I can get you to try that as well... The kernels are again on my people page.

mcarni (mcarni) wrote :

No problem Stefan,
I must confess I got lost with your explanation, I will read it more carefully tomorrow morning.
I tested v03:

Linux Alyosha3 2.6.28-13-generic #44bug345710v3 SMP Wed Jul 1 11:57:21 CEST 2009 i686 GNU/Linux

launched a couple of rmmod iwl3945 and modprobe iwl3945 and I had no problem at all.

I hope this helps

M

Stefan Bader (smb) wrote :

Yes, thanks. That means I need to set the status for the karmic release back to fixed and I can prepare a SRU with only upstream patches, without any need to upstream something.
Maybe this helps to explain:

<P1>
<some code>
<stop the workqueue (and wait until done)>
<more code>
<remove the interface (which is suspected of restarting the workqueue)>
<even more code>
<P2>

The first patch to fix this waits until the workqueue has really stopped. The first attempt at the second patch moved the workqueue stop to <P2> (after the interface removal). The upstream change moved the interface removal to <P1>. Both have the same effect: the function calls are made in the right order. The upstream patch also adds some more checking and makes some calls only depending on that check, which should be even better.
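
The ordering argument can be illustrated with a toy model. This is not driver code, only a sketch: "stop" stands for cancelling the rfkill_poll work, "remove" stands for the interface removal that is suspected of re-queuing it, and the teardown function and its messages are invented for illustration.

```shell
#!/bin/sh
# Toy model of the teardown order. Each argument is one step, in order.
teardown() {
    armed=1                      # poll work is pending while the device is up
    for step in "$@"; do
        case $step in
            stop)   armed=0 ;;   # cancel the delayed work (and wait for it)
            remove) armed=1 ;;   # interface removal re-queues the work
        esac
    done
    # After the last step the module text is gone; any pending work is fatal.
    if [ "$armed" = 1 ]; then
        echo "work still queued after unload: lock-up"
    else
        echo "no work queued after unload: safe"
    fi
}

teardown stop remove    # original order: stop first, removal re-arms the work
teardown remove stop    # order after either fix: removal first, then stop
```

Moving the stop to <P2> and moving the removal to <P1> both turn the first sequence into the second, which is why the two approaches are equivalent as far as ordering goes.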

Stefan Bader (smb) wrote :

This was my mistake. It is actually fixed upstream, even though the solution is different from what I initially thought to be necessary.

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Stefan Bader (smb) on 2009-07-02
description: updated
tags: removed: upstream-pending
Pacovi (pacovi) wrote :

I've also tested it (the v3) and no problem when loading and unloading the
module.

Stefan Bader (smb) on 2009-07-03
Changed in linux (Ubuntu Jaunty):
status: In Progress → Fix Committed
Martin Pitt (pitti) wrote :

Accepted linux into jaunty-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you in advance!

tags: added: verification-needed

*** Bug 498769 has been marked as a duplicate of this bug. ***

mcarni (mcarni) wrote :

Martin,

I am not sure I did everything right, please let me know if I need to change something.
I enabled jaunty proposed and installed the latest linux image

Linux Alyosha3 2.6.28-13-generic #45-Ubuntu SMP Tue Jun 30 19:49:51 UTC 2009 i686 GNU/Linux

but unfortunately after typing "sudo rmmod iwl3945" the system froze.

I have now reverted to Stefan's v03 and there are no more system freezes on rmmod iwl3945

Hope this helps

M

> Linux Alyosha3 2.6.28-13-generic #45-Ubuntu SMP Tue Jun 30 19:49:51 UTC
> 2009 i686 GNU/Linux

I believe there must be something wrong. The version in proposed would be
-14.46. This looks like the version in updates/security.
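
A quick way to see which package version a running kernel corresponds to is to combine the ABI from `uname -r` with the upload number from `uname -v`. The helper below is a sketch based on the pattern visible in this report ("2.6.28-13-generic #45-Ubuntu" showing up as "Version 2.6.28-13.45"); the kernel_upload_version name is made up for illustration, and test builds with a different tag format are deliberately rejected.

```shell
#!/bin/sh
# Derive "<release>-<abi>.<upload>" from `uname -r` and `uname -v` strings.
# $1 = release string, e.g. 2.6.28-13-generic
# $2 = version string, e.g. "#45-Ubuntu SMP Tue Jun 30 19:49:51 UTC 2009"
kernel_upload_version() {
    release=${1%-*}    # strip the flavour suffix, e.g. "-generic"
    upload=$(printf '%s\n' "$2" | sed -n 's/^#\([0-9][0-9]*\)-Ubuntu.*/\1/p')
    # Fail if the version string is not a plain "#NN-Ubuntu" build.
    [ -n "$upload" ] || return 1
    echo "$release.$upload"
}

# Sample strings taken from the uname output quoted in this report.
kernel_upload_version "2.6.28-13-generic" "#45-Ubuntu SMP Tue Jun 30 19:49:51 UTC 2009"
```

If the result is not the version expected from -proposed, the new kernel has not been installed or booted yet.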

mcarni (mcarni) wrote :

I enabled "proposed updates" (jaunty-proposed) in my sources and then I enabled also "unsupported updates" (jaunty-backports), just in case.
But still I see only "Version 2.6.28-13.45" in my update manager.

Let me know. So far I have done everything using the GUI; I can try the CLI but I will need some guidance

Thanks

M

Stefan Bader (smb) wrote :

Doh, sorry. I think I know what is wrong. You don't see the files automatically
until the meta-package gets done. You can remove the backports from your list.
I hope meta gets done soon...

Martin Pitt (pitti) wrote :

Stefan Bader [2009-07-14 19:50 -0000]:
> I hope meta gets done soon...

Accepted last night, should be available now.

mcarni (mcarni) wrote :

Hi,

I updated to the image in the proposed repos:

Linux Alyosha3 2.6.28-14-generic #46-Ubuntu SMP Wed Jul 8 07:21:34 UTC 2009 i686 GNU/Linux

I did "rmmod" and "modprobe iwl3945" and system is up and running.

well done!!

thanks

M

Martin Pitt (pitti) on 2009-07-18
tags: added: verification-done
removed: verification-needed

This bug should be fixed by:

commit d552bfb65241a35d48e44ddb0d27e0454f579ab4
Author: Kolekar, Abhijeet <email address hidden>
Date: Fri Dec 19 10:37:41 2008 +0800

    iwl3945: release resources before shutting down

The cause is described in BZ 501117. While this bug (way to reproduce and oops message) is a bit different from bug 501117, the fix should work here. I tested on my laptop on i686. If someone else would like to test, here is a koji build:

http://koji.fedoraproject.org/koji/taskinfo?taskID=1494153

(In reply to comment #12)

The patch works for me too - Toshiba Satellite A200 with 3945ABG (i686). Hope we'll see it in testing soon. :)

As the bug is now fixed in 2.6.29.6-217.2.3.fc11, I'm closing this bug report.

Changed in linux (Fedora):
status: In Progress → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.28-15.48

---------------
linux (2.6.28-15.48) jaunty-proposed; urgency=low

  [ Andy Whitcroft ]

  * SAUCE: pnp: add PNP resource range checking function
    - LP: #349314
  * SAUCE: i915: enable MCHBAR if needed
    - LP: #349314

  [ Brad Figg ]

  * SAUCE: Add information to recognize Toshiba Satellite Pro M10 Alps
    Touchpad
    - LP: #330885

  [ Colin Ian King ]

  * Input: atkbd - add forced release keys quirk for Samsung Q45
    - LP: #347623

  [ Manoj Iyer ]

  * SAUCE: Added quirk to enable the installer to recognize NetXen NIC.
    - LP: #389603

  [ Stefan Bader ]

  * SAUCE: input: Blacklist digitizers from joydev.c
    - LP: #300143

  [ Tim Gardner ]

  * Revert "SAUCE: md: wait for possible pending deletes after stopping an
    array"
    - LP: #334994

  [ Upstream Kernel Changes ]

  * bonding: Fix updating of speed/duplex changes
    - LP: #371651
  * net: fix sctp breakage
    - LP: #371651
  * ipv6: don't use tw net when accounting for recycled tw
    - LP: #371651
  * ipv6: Plug sk_buff leak in ipv6_rcv (net/ipv6/ip6_input.c)
    - LP: #371651
  * netfilter: nf_conntrack_tcp: fix unaligned memory access in tcp_sack
    - LP: #371651
  * xfrm: spin_lock() should be spin_unlock() in xfrm_state.c
    - LP: #371651
  * bridge: bad error handling when adding invalid ether address
    - LP: #371651
  * bas_gigaset: correctly allocate USB interrupt transfer buffer
    - LP: #371651
  * USB: EHCI: add software retry for transaction errors
    - LP: #371651
  * USB: fix USB_STORAGE_CYPRESS_ATACB
    - LP: #371651
  * USB: usb-storage: increase max_sectors for tape drives
    - LP: #371651
  * USB: gadget: fix rndis regression
    - LP: #371651
  * USB: add quirk to avoid config and interface strings
    - LP: #371651
  * cifs: fix buffer format byte on NT Rename/hardlink
    - LP: #371651
  * b43: fix b43_plcp_get_bitrate_idx_ofdm return type
    - LP: #371651
  * Add a missing unlock_kernel() in raw_open()
    - LP: #371651
  * x86, PAT, PCI: Change vma prot in pci_mmap to reflect inherited prot
    - LP: #371651
  * security/smack: fix oops when setting a size 0 SMACK64 xattr
    - LP: #371651
  * x86, setup: mark %esi as clobbered in E820 BIOS call
    - LP: #371651
  * dock: fix dereference after kfree()
    - LP: #371651
  * mm: define a UNIQUE value for AS_UNEVICTABLE flag
    - LP: #371651
  * mm: do_xip_mapping_read: fix length calculation
    - LP: #371651
  * vfs: skip I_CLEAR state inodes
    - LP: #371651
  * net/netrom: Fix socket locking
    - LP: #371651
  * kprobes: Fix locking imbalance in kretprobes
    - LP: #371651
  * netfilter: {ip, ip6, arp}_tables: fix incorrect loop detection
    - LP: #371651
  * ALSA: hda - add missing comma in ad1884_slave_vols
    - LP: #371651
  * SCSI: libiscsi: fix iscsi pool error path
    - LP: #371651
  * SCSI: libiscsi: fix iscsi pool error path again
    - LP: #371651
  * posixtimers, sched: Fix posix clock monotonicity
    - LP: #371651
  * sched: do not count frozen tasks toward load
    - LP: #371651
  * spi: spi_write_then_read() bugfixes
    - LP: #371651
  * powerpc: Fix data-corrupting bug in __futex_atomic_op
    - LP...

Changed in linux (Ubuntu Jaunty):
status: Fix Committed → Fix Released
Changed in linux (Fedora):
importance: Unknown → Medium