Kernel Panic, in gdth (RAID) driver on reboot

Bug #199934 reported by Phoenix on 2008-03-08
This bug affects 1 person
Affects              Importance  Assigned to
linux (Ubuntu)       High        Stefan Bader
  Hardy              High        Stefan Bader
linux-meta (Ubuntu)  Undecided   Unassigned
  Hardy              Undecided   Unassigned

Bug Description

BTW: this is hardy

[ 205.798259] Oops: 0000 [#1] SMP
[ 205.837054] Modules linked in: bridge dm_crypt dm_mod ipv6 kvm_intel kvm sha256_generic aes_i586 aes_generic geode_aes af_pp
[ 206.669196]
[ 206.686942] Pid: 7818, comm: reboot Not tainted (2.6.24-11-server #1)
[ 206.763841] EIP: 0060:[<f88e1444>] EFLAGS: 00010002 CPU: 0
[ 206.829427] EIP is at gdth_copy_internal_data+0xe4/0x200 [gdth]
[ 206.900196] EAX: 00000000 EBX: 000001d8 ECX: 000001d8 EDX: 000001d8
[ 206.975118] ESI: 000001d8 EDI: 00000000 EBP: 00000000 ESP: f61998b4
[ 207.049937] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 207.114378] Process reboot (pid: 7818, ti=f6198000 task=f61e8000 task.ti=f6198000)
[ 207.202689] Stack: 00000000 00000000 f6199904 c0412424 c0c002a8 00000001 f7c08f30 00000000
[ 207.303342] 00000000 00000001 c0c00278 000001d8 f6199922 f5534624 f5534634 00000000
[ 207.404000] c0223ecd 00000001 f6199922 0000000f 000001d8 c0c002a8 f608f800 00000000
[ 207.504659] Call Trace:
[ 207.536006] [<c0223ecd>] number+0x35d/0x370
[ 207.587161] [<f88e291c>] gdth_next+0x27c/0xad0 [gdth]
[ 207.648595] [<c02248e3>] vsnprintf+0x453/0x610
[ 207.702870] [<c0224765>] vsnprintf+0x2d5/0x610
[ 207.757142] [<f88e18ef>] gdth_putq+0x8f/0x150 [gdth]
[ 207.817640] [<f88e31a5>] __gdth_queuecommand+0x35/0x40 [gdth]
[ 207.887479] [<f88e3927>] __gdth_execute+0x117/0x150 [gdth]
[ 207.954209] [<f88e3ab4>] gdth_execute+0x34/0x50 [gdth]
[ 208.016777] [<f88e3b30>] gdth_flush+0x60/0xc0 [gdth]
[ 208.077383] [<c030e279>] fn_hash_lookup+0x19/0xc0
[ 208.134764] [<c03302f8>] _spin_lock_bh+0x8/0x20
[ 208.190075] [<c02d7b41>] rt_intern_hash+0x371/0x480
[ 208.249535] [<c030f790>] fib4_rule_action+0x0/0x70
[ 208.307960] [<c02c86e3>] fib_rules_lookup+0xa3/0x100
[ 208.368462] [<c0223ecd>] number+0x35d/0x370
[ 208.419625] [<c0224765>] vsnprintf+0x2d5/0x610
[ 208.473891] [<c0131c8a>] release_console_sem+0x1aa/0x1d0
[ 208.538545] [<f8c28af0>] br_nf_local_in+0x0/0x70 [bridge]
[ 208.604328] [<c02d52d7>] nf_iterate+0x57/0x80
[ 208.657567] [<f88e4d09>] gdth_halt+0x79/0x100 [gdth]
[ 208.718063] [<f8c23faf>] br_flood+0x9f/0xc0 [bridge]
[ 208.778665] [<f8c240e0>] __br_forward+0x0/0x70 [bridge]
[ 208.842383] [<f8c24bd9>] br_handle_frame_finish+0x139/0x140 [bridge]
[ 208.919485] [<c02b8d10>] netif_receive_skb+0x0/0x440
[ 208.979983] [<f8c24d15>] br_handle_frame+0x135/0x1e0 [bridge]
[ 209.049820] [<f8c24aa0>] br_handle_frame_finish+0x0/0x140 [bridge]
[ 209.124951] [<c014cf1a>] clocksource_get_next+0x3a/0x40
[ 209.188559] [<c014b853>] update_wall_time+0x2e3/0x840
[ 209.249993] [<f8a8535e>] sky2_rx_submit+0x2e/0x90 [sky2]
[ 209.314645] [<c010e7f3>] sched_clock+0x13/0x30
[ 209.368919] [<c0127227>] update_curr+0x147/0x150
[ 209.425265] [<c014c164>] getnstimeofday+0x34/0xf0
[ 209.482648] [<c011c27c>] lapic_next_event+0xc/0x10
[ 209.541070] [<c014e5d3>] clockevents_program_event+0xa3/0x150
[ 209.610911] [<c0149a0e>] ktime_get_ts+0x1e/0x60
[ 209.666222] [<c013ad70>] run_timer_softirq+0x30/0x1e0
[ 209.727756] [<c014f6f8>] tick_program_event+0x38/0x60
[ 209.789294] [<c01371ad>] tasklet_action+0x4d/0xc0
[ 209.846678] [<c0136c52>] __do_softirq+0x82/0x110
[ 209.903028] [<c0137001>] irq_exit+0x51/0x80
[ 209.954185] [<c011c825>] smp_apic_timer_interrupt+0x55/0x80
[ 210.021949] [<c0108ebc>] apic_timer_interrupt+0x28/0x30
[ 210.085562] [<c0224fd0>] delay_tsc+0x20/0x50
[ 210.137758] [<c0224f76>] __delay+0x6/0x10
[ 210.186841] [<f8875b5d>] md_notify_reboot+0xcd/0xe0 [md_mod]
[ 210.255746] [<c0332730>] notifier_call_chain+0x30/0x60
[ 210.318320] [<c014aa8a>] __blocking_notifier_call_chain+0x4a/0x70
[ 210.392311] [<c014aac7>] blocking_notifier_call_chain+0x17/0x20
[ 210.464222] [<c013fa11>] kernel_restart_prepare+0x11/0x30
[ 210.529908] [<c013fb7b>] kernel_restart+0xb/0x50
[ 210.586255] [<c01efeb9>] security_capable+0x9/0x10
[ 210.644677] [<c0139908>] __capable+0x8/0x20
[ 210.695734] [<c013fda6>] sys_reboot+0x1d6/0x200
[ 210.751045] [<c02bb7ad>] netdev_run_todo+0x13d/0x250
[ 210.811544] [<c02b9bdd>] dev_change_flags+0x11d/0x190
[ 210.873081] [<c02b8194>] __dev_get_by_name+0x74/0x90
[ 210.933579] [<c030417c>] devinet_ioctl+0x1ec/0x690
[ 210.992001] [<c02bab1e>] dev_ioctl+0x14e/0x530
[ 211.046276] [<c01be66e>] invalidate_inode_buffers+0xe/0xc0
[ 211.113000] [<c03302f8>] _spin_lock_bh+0x8/0x20
[ 211.168313] [<c01ababc>] dput+0x1c/0x100
[ 211.216354] [<c019b7f4>] __fput+0x114/0x170
[ 211.267515] [<c01b0783>] mntput_no_expire+0x13/0x70
[ 211.326975] [<c0198729>] filp_close+0x49/0x80
[ 211.380210] [<c0199dae>] sys_close+0x6e/0xd0
[ 211.432408] [<c0108412>] syscall_call+0x7/0xb
[ 211.485645] =======================
[ 211.528395] Code: 84 b3 00 00 00 66 83 44 24 3c 01 f6 45 00 02 0f 84 f3 00 00 00 31 ed 0f b7 74 24 44 66 39 74 24 3c 0f 84
[ 211.760530] EIP: [<f88e1444>] gdth_copy_internal_data+0xe4/0x200 [gdth] SS:ESP 0068:f61998b4
[ 211.861503] Kernel panic - not syncing: Fatal exception
[ 211.923974] Rebooting in 15 seconds..

Furthermore, the announced reboot after 15 seconds never happens.
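When reading a long trace like the one above, it can help to list only the frames that belong to the suspect module. The following is a minimal sketch, assuming the oops text has been saved to a file or is piped in; the helper name `module_frames` is made up for illustration.

```shell
#!/bin/sh
# List the call-trace frames that belong to one kernel module.
# Usage: module_frames MODULE < oops.txt
# (module_frames is a hypothetical helper name, not an existing tool.)
module_frames() {
    # Frames look like: "[ 207.587161] [<f88e291c>] gdth_next+0x27c/0xad0 [gdth]"
    # Keep lines tagged with [MODULE], then print only the symbol name.
    grep -F "[$1]" |
        sed -n 's/.*\] \([A-Za-z0-9_]*\)+0x[0-9a-f]*\/0x[0-9a-f]* \[.*\]$/\1/p'
}

# Sample lines copied from the trace above.
module_frames gdth <<'EOF'
[ 207.536006] [<c0223ecd>] number+0x35d/0x370
[ 207.587161] [<f88e291c>] gdth_next+0x27c/0xad0 [gdth]
[ 207.757142] [<f88e18ef>] gdth_putq+0x8f/0x150 [gdth]
[ 208.016777] [<f88e3b30>] gdth_flush+0x60/0xc0 [gdth]
[ 208.657567] [<f88e4d09>] gdth_halt+0x79/0x100 [gdth]
[ 208.718063] [<f8c23faf>] br_flood+0x9f/0xc0 [bridge]
EOF
```

Read from the bottom up, the gdth frames give the path through the driver on shutdown: gdth_halt, gdth_flush, gdth_putq, gdth_next.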

KWAndi (lst-hoe01) wrote :

We have nearly the same error with Ubuntu 8.04 (Server) on reboot/halt with an Intel SRCU32 (ICP-Vortex GDT8523RZ) RAID controller on a Tyan Thunder h2000M board. The same machine works fine with Ubuntu 6.06 LTS, in both the 32-bit and the AMD64 version. In our case the last line is "Segmentation fault".

KWAndi (lst-hoe01) on 2008-04-23
Changed in linux-meta:
status: New → Confirmed
MartinK (kopp01) wrote :

I have the same problem with all my ICP-Vortex RAID controllers, and not only on Ubuntu 8.04 Server: Fedora 8 with the latest kernel is affected too.
This bug was introduced with kernel 2.6.24 (with kernel 2.6.23.15-137 on Fedora 8 everything was OK).
According to the changelog on www.kernel.org, related fixes went into 2.6.25.
Hopefully the next kernel update to 2.6.25 fixes this.

MartinK (kopp01) wrote :

From the www.kernel.org changelog for kernel 2.6.25; this bug should therefore be fixed in 2.6.25:

commit 1b96f8955aaeeb05f7fb7ff548aa12415fbf3904
Author: Sven Schnelle <email address hidden>
Date: Mon Mar 10 22:50:04 2008 +0100

    [SCSI] gdth: Allocate sense_buffer to prevent NULL pointer dereference

    Fix NULL pointer dereference during execution of Internal commands,
    where gdth only allocates scp, but not scp->sense_buffer. The rest of
    the code assumes that sense_buffer is allocated, which leads to a kernel
    oops e.g. on reboot (during cache flush).

    Signed-off-by: Sven Schnelle <email address hidden>
    Signed-off-by: James Bottomley <email address hidden>

Changed in linux-meta:
status: Confirmed → In Progress
Fabien (fabien-ubuntu) wrote :

I can confirm the problem with ICP GDT-8524RZ.

Fabien (fabien-ubuntu) wrote :

BTW, it's weird that this bug is not assigned and has no status... It's quite critical: these RAID controllers are only found in servers. Not being able to reboot or power off a server without a kernel crash is really problematic!

Fabien (fabien-ubuntu) wrote :

I think there are enough reporters now to confirm the problem :)

Changed in linux-meta:
status: In Progress → Confirmed
Fabien (fabien-ubuntu) wrote :

This is a diff between ubuntu linux_2.6.24-16.30 source and the official source of linux 2.6.24.7 for the gdth* files...

Fabien (fabien-ubuntu) wrote :

WOW, it doesn't just affect halt/reboot!
Try to run icpcon with linux_2.6.24-16.30 and it will instantly crash the kernel (soft lockup messages and so on).

Ok, here's how to fix everything :
1) get *official* 2.6.24.7 kernel and build a new kernel package (make-kpkg works fine if you don't enable XEN in your configuration)

2) install it

3) remove bogus ubuntu kernel :
# apt-get remove linux-image-2.6.24-16-server
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be REMOVED:
  linux-image-2.6.24-16-server linux-image-server linux-server linux-ubuntu-modules-2.6.24-16-server
0 upgraded, 0 newly installed, 4 to remove and 0 not upgraded.
After this operation, 77.9MB disk space will be freed.
Do you want to continue [Y/n]?
(Reading database ... 34538 files and directories currently installed.)
Removing linux-server ...
Removing linux-image-server ...
Removing linux-ubuntu-modules-2.6.24-16-server ...

4) Relax :)
No, I'm not sarcastic ;-)

To be more serious, 2.6.24.7 works perfectly and I can run icpcon and reboot my server like in the good old days. Great, isn't it ?

MartinK (kopp01) wrote :

I can confirm that using kernel 2.6.25.2 from www.kernel.org fixes the problem.
What is the next step for Ubuntu 8.04?

KWAndi (lst-hoe01) wrote :

Is anything happening with this? We would like to switch to Ubuntu 8.04 LTS for our servers, but most of them are equipped with ICP/Intel RAID controllers...

Changed in linux:
status: New → Confirmed
JoeX (jdinner) wrote :

What is required or missing to get this into Ubuntu 8.04 LTS?

Fabien (fabien-ubuntu) wrote :

There are more than 1300 bugs registered for the linux package in Ubuntu, so I guess it's not that easy...
But it's true that this doesn't look very "professional" for a long-term support distribution.

I think the best option is to build our own kernel and that's it :)

dmatrix7 (dmatrix7) wrote :

Confirmed problem here too on Sun Fire VX60s with Intel RAID controller. I have 6 of these servers that I would like to move to Ubuntu 8.04 LTS, but cannot with this bug. Any ETA on a fix?

Building your own kernel (2.6.25 or later) fixes the problem, but a self-built kernel receives no further Ubuntu updates.
I wonder why Ubuntu 8.04 is still using kernel 2.6.24; Fedora 8 switched to 2.6.25 some time ago.

Changed in linux:
assignee: nobody → colin-king
importance: Undecided → High
Colin Ian King (colin-king) wrote :

Hi,

I've put up some kernel packages (linux - 2.6.24-20.35cking7 with a linux-image that contains two relevant patches) in my PPA at https://launchpad.net/~colin-king/+archive

Please try this kernel and report any success/regressions. If this fixes the problem I can add the patch to Hardy.

Thanks. Colin

KWAndi (lst-hoe01) wrote :

Hello

I have set up a test server and used

deb http://ppa.launchpad.net/colin-king/ubuntu hardy main
deb-src http://ppa.launchpad.net/colin-king/ubuntu hardy main

in /etc/apt/sources.list to get the patched kernel with "apt-get update" and "apt-get upgrade". Unfortunately the error is still the same...

Regards

Andi

Fabien (fabien-ubuntu) wrote :

Hi,

Sorry, but I won't be able to test them : the servers are in production now...
I will try it if I find an available server.

dmatrix7 (dmatrix7) wrote :

No luck here either with these patched kernels. I confirmed that I have the proper ones installed, but both kernels still panic on shutdown/reboot.

ii linux-image-2.6.24-19-server 2.6.24-19.34cking9 Linux kernel image for version 2.6.24 on x86
ii linux-image-2.6.24-20-server 2.6.24-20.35cking7 Linux kernel image for version 2.6.24 on x86
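
Having a patched package installed does not by itself show that the machine booted into it, so it can be worth matching the running release against the package list. A minimal sketch, assuming a Debian/Ubuntu system with `dpkg`; the helper name `image_version` is made up for illustration.

```shell
#!/bin/sh
# Print the package version of the kernel image that matches a given release.
# Reads `dpkg -l`-style lines on stdin; $1 is a release string as printed by `uname -r`.
# (image_version is a hypothetical helper name.)
image_version() {
    awk -v pkg="linux-image-$1" '$1 == "ii" && $2 == pkg { print $3 }'
}

# On a live system this would be fed from dpkg:
#   dpkg -l 'linux-image-*' | image_version "$(uname -r)"

# Demonstration with the two lines quoted above.
image_version 2.6.24-20-server <<'EOF'
ii linux-image-2.6.24-19-server 2.6.24-19.34cking9 Linux kernel image for version 2.6.24 on x86
ii linux-image-2.6.24-20-server 2.6.24-20.35cking7 Linux kernel image for version 2.6.24 on x86
EOF
```

If the printed version carries the test suffix (here `cking7`), the running kernel is the patched build.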

MartinK (kopp01) wrote :

Still the same problems for me with both kernels

linux-image-2.6.24-19-server 2.6.24-19.34cking9
linux-image-2.6.24-20-server 2.6.24-20.35cking7

Colin Ian King (colin-king) wrote :

Thanks for testing this out. I shall re-investigate. Colin

JoeX (jdinner) wrote :

Same result here.
The easiest way to get it fixed is to take over all the gdth*.* source files from the vanilla kernel.
Unfortunately the latest changes were bundled with some cleanup work,
so extracting only the relevant part might get a bit messy.

Regards, Joe

Colin Ian King (colin-king) wrote :

Hi,

I've built another kernel, linux - 2.6.24-20.37cking4, which is available in my PPA. It includes the commits that exist between the Hardy 8.04.1 kernel and the 2.6.24.7 kernel, which seems to work. For reference, the commits included in this build are as follows:

230e886e7bd663ff2e83cdeede12d7f09b9d3711
99109301d103fbf0de43fc5a580a406c12a501e0
b31ddd31c266c2ad1b708cad0d3d8e0aa7fa2737
ee54cc6af95a7fa09da298493b853a9e64fa8abd
4c9c8d782c8dddc5e97d33210e8a993cec6bc168
cff2680643f9288a1cd4e27c241e1da51f476d66
d35055a0f2637f29f95001a67b464fe833b09ebc
2d6f0d0cd94f9b8b24102300d8dd9cbbd1688826
a85591fd0baf4ed3f03ee1aaac6a985e400cf089

However, I have not applied commit 1b96f8955aaeeb05f7fb7ff548aa12415fbf3904, as this requires changes to include/scsi/scsi_cmnd.h that affect all SCSI subsystems, which is too intrusive a change (it would need to be verified across every SCSI subsystem it affects).

I hope these patches get the driver into a more stable state.

Please test and let me know the results as I do not have the hardware here to test these patches.

Thanks, Colin.

Quoting Colin King <email address hidden>:

> Hi,
>
> I've built another kernel linux - 2.6.24-20.37cking4 which is available
> in my PPA. It includes the following commits that exist from the Hardy
> 8.04.1 kernel up to the 2.6.24.7 kernel that seems to work. For
> reference sake, the commits including in this build are as follows:

Hmm...
I tried "apt-get upgrade" but no new kernel is offered.

How can I get the new version on a (test) machine
that already has the previous test versions installed?

Many thanks for your efforts

Andi


MartinK (kopp01) wrote :

The problem is still the same...

The upgrade worked without problems and the new kernel linux - 2.6.24-20.37cking4 is installed.

The last instruction executed is still:
gdth_copy_internal_data+0xe4/0x200 [gdth]
which results in /etc/rc6.d/S90reboot: line 17: 4621 Segmentation fault reboot -d -f -i

The same happens with icpcon.

Last instruction executed:
gdth_copy_internal_data+0xe4/0x200 [gdth] -> Segmentation fault

Thanks for the effort anyway.

BTW: kernel 2.6.26 is out.

Colin Ian King (colin-king) wrote :

MartinK,

Thanks for trying this out. Can you give me the dmesg dump of the crash? I would like to compare the disassembly and stack dump in detail against my PPA build to tease this one out a bit.

I suspect that commit 1b96f8955aaeeb05f7fb7ff548aa12415fbf3904 may be the fix to this, but it's hard to include this because of the changes it makes to a whole lot of the SCSI drivers across the board.

Thanks, Colin

MartinK (kopp01) wrote :

This is the dump from an icpcon crash (I don't know how to capture a dump from a reboot crash...)

[ 104.159853] Pid: 4307, comm: icpcon Not tainted (2.6.24-20-server #1)
[ 104.162724] EIP: 0060:[<e0899444>] EFLAGS: 00010002 CPU: 0
[ 104.165601] EIP is at gdth_copy_internal_data+0xe4/0x200 [gdth]
[ 104.168254] EAX: 00000000 EBX: 000001d8 ECX: 000001d8 EDX: 000001d8
[ 104.171108] ESI: 000001d8 EDI: 00000000 EBP: 00000000 ESP: de88ba34
[ 104.174029] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 104.176949] Process icpcon (pid: 4307, ti=de88a000 task=de1585c0 task.ti=de88a000)
[ 104.179993] Stack: 00000000 00000000 c049b480 c04d2f00 c0c002a8 de88bb14 00000000 c1407c80
[ 104.185927] 00000000 00000001 c0c00278 000001d8 c1394c40 ffffffff 00000000 00000000
[ 104.192478] de13accc 00000001 00000000 de88baf0 000001d8 c0c002a8 de5d7200 00000000
[ 104.199295] Call Trace:
[ 104.205871] [<e089a91c>] gdth_next+0x27c/0xad0 [gdth]
[ 104.209403] [<c01962fe>] __slab_alloc+0x2ee/0x4a0
[ 104.212930] [<c01757cd>] mempool_alloc+0x2d/0xe0
[ 104.216379] [<e08998ef>] gdth_putq+0x8f/0x150 [gdth]
[ 104.219507] [<e089b1a5>] __gdth_queuecommand+0x35/0x40 [gdth]
[ 104.222853] [<e089b927>] __gdth_execute+0x117/0x150 [gdth]
[ 104.226165] [<e089c312>] gdth_ioctl+0x782/0x10a0 [gdth]
[ 104.229457] [<c01757cd>] mempool_alloc+0x2d/0xe0
[ 104.232785] [<c01751d3>] filemap_fault+0x213/0x420
[ 104.235696] [<c012617c>] kmap_atomic+0x1c/0x30
[ 104.238774] [<c012617c>] kmap_atomic+0x1c/0x30
[ 104.241718] [<c018089f>] __do_fault+0x2af/0x4c0
[ 104.244585] [<c018389b>] handle_mm_fault+0x21b/0xb80
[ 104.247377] [<c022023f>] kobject_get+0xf/0x20
[ 104.250149] [<c01a69f8>] do_ioctl+0x78/0x90
[ 104.252594] [<c01a6c3e>] vfs_ioctl+0x22e/0x2b0
[ 104.255237] [<c01a6d16>] sys_ioctl+0x56/0x70
[ 104.257817] [<c010839a>] sysenter_past_esp+0x6b/0xa1
[ 104.260380] [<c0330000>] _spin_lock_irq+0x10/0x20
[ 104.262941] =======================
[ 104.265412] Code: 84 b3 00 00 00 66 83 44 24 3c 01 f6 45 00 02 0f 84 f3 00 00 00 31 ed 0f b7 74 24 44 66 39 74 24 3c 0f 84 91 00 00 00 01 5c 24 10 <0f> b7 7d 10 0f b7 44 24 3e 0f b7 d7 01 d0 0f b7 54 24 2c 66 2b
[ 104.272938] EIP: [<e0899444>] gdth_copy_internal_data+0xe4/0x200 [gdth] SS:ESP 0068:de88ba34
[ 104.277926] ---[ end trace a8732abe4e478729 ]---

rogerb (info-connective) wrote :

Hi

Sorry for my bad/short english ;-)

I can confirm the problem as described above with ICP-Vortex GDT8546RZ.

After hitting that bug I switched to Debian testing. Its 2.6.24 kernel works without crashing on reboot. Perhaps you can find the difference.

cat /proc/version shows:

Linux version 2.6.24-1-amd64 (Debian 2.6.24-7) (<email address hidden>) (gcc version 4.1.3 20080114 (prerelease) (Debian 4.1.2-19)) #1 SMP Sat May 10 09:28:10 UTC 2008

Package:

linux-image-2.6.24-1-amd64

Regards,

Roger

Colin Ian King (colin-king) wrote :

MartinK,

Teasing out which ioctl() is causing the driver to crash is not as straightforward as I first imagined.

Is it possible to attach strace to icpcon and do a shutdown so that I can see which ioctl() is causing the driver to crash? Something like:

sudo strace -o strace_icpcon.log -p `ps -e | grep icpcon | awk '{print $1}'`

Attaching the strace log would be most helpful.

Thanks, Colin

MartinK (kopp01) wrote :

Hi Colin

Running strace on a running icpcon process is not possible: icpcon crashes the system upon invocation (see the dump above).

However, I made an strace_icpcon.log on a 2.6.26 kernel (kernel.org), where there is no crash. I just entered icpcon and exited without even connecting to the controller. Note the difference: kernel 2.6.24-20 crashes without even showing the first screen.

Hth Martin
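
When a program takes the system down as soon as it starts, attaching with `-p` comes too late. One alternative, sketched here as an assumption rather than something tried in this thread, is to launch the program under strace from its first instruction and log only ioctl() calls. The wrapper name `trace_ioctls` is made up; `-f`, `-e trace=ioctl` and `-o` are standard strace options.

```shell
#!/bin/sh
# Start a command under strace and record only its ioctl() calls.
# Usage: trace_ioctls LOGFILE COMMAND [ARGS...]
# (trace_ioctls is a hypothetical wrapper name.)
trace_ioctls() {
    log=$1
    shift
    # -f follows child processes, -e trace=ioctl filters the syscalls,
    # -o writes the trace to a file instead of stderr.
    strace -f -e trace=ioctl -o "$log" "$@"
}

# Example invocation (typically needs root to open the controller device):
#   trace_ioctls strace_icpcon.log icpcon
```

If the kernel dies during the call, the tail of the log may never reach the disk, so a trace taken on a working kernel, as done above, remains the more reliable reference.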

Phoenix (phoenix-dominion) wrote :

I was wondering: I reported this bug well in advance of the Hardy release, but it didn't get any attention. Could I have done anything to spare some folks the headache of running into this? I mean, we are talking about server hardware and not about "my USB Rocket Launcher stopped working"...

dmatrix7 (dmatrix7) wrote :

Still looking for a fix for this. If there is none, I guess I will have to go with software RAID and remove these cards from my servers.

Stefan Bader (smb) wrote :

I am working on (yet another) kernel to get this fixed. The fix mentioned in comment #3 really won't help, since the buffer in question is static in the codebase of Hardy. However, the one in comment #4 really should be a good candidate. The strange thing is that in all the backtraces given, the jump location gdth_next+0x27c/0xad0 [gdth] matches the module without the change. So I would like to verify the assembly before the upload. I hope to have something by tomorrow.

Stefan Bader (smb) wrote :

The kernel is currently (still) building on my PPA (https://launchpad.net/~stefan-bader-canonical/+archive). It is based on the latest kernel in -proposed. Since that required a new ABI version, this also requires an updated version of LUM/LBM or LRM to be complete. The packages should be in the archive but might not yet be shown since the meta package has to be updated.

It would be great if someone could try the test kernel. Should the problem persist, please provide a new backtrace, so I can check against that.

Mike Bianchi (mbianchi-foveal) wrote :

On Thu, Aug 14, 2008 at 11:56:35PM -0000, Stefan Bader wrote:
> The kernel is currently (still) building on my PPA
> (https://launchpad.net/~stefan-bader-canonical/+archive), It is based on
> the latest kernel in -proposed. Since that required a new ABI version
> this also requires an updated version of LUM/LBM or LRM to be complete.
> The packages should be in the archive but might not yet be shown since
> the meta package has to be updated.
>
> It would be great if someone could try the test kernel. Should the
> problem persist, please provide a new backtrace, so I can check against
> that.

Stefan,

Thanks for working on this.

I may be able to test for you.
I can see your PPA area,
 https://launchpad.net/~stefan-bader-canonical/+archive

But cannot see how to download a kernel (either as source or package).

What am I missing? I saw something of having to set up my own PPA, but I
don't understand how that helps.

--
 Mike Bianchi
 Foveal Systems

Stefan Bader (smb) wrote :

Hi Mike,

if you follow the link to the PPA there are two lines under the heading apt sources.list entries which have to be added to /etc/apt/sources.list to get the kernel offered by a normal update.

Or if you prefer to install manually follow the first of those links into pool/main/l/linux/. There you get all the packages.

Changed in linux:
assignee: colin-king → stefan-bader-canonical
dmatrix7 (dmatrix7) wrote :

This PPA kernel allows my systems to reboot cleanly without any kernel panics. I have not tested any icpcon utilities.
Any ETA on this fixed kernel getting into the main repos?

Stefan Bader (smb) on 2008-08-15
Changed in linux-meta:
status: Confirmed → Invalid
Stefan Bader (smb) wrote :

I am starting the SRU process. A second successful verification would be great. Thanks. ETA: if things go well the next -proposed kernel. How long this will be: hard to predict.

Changed in linux:
status: Confirmed → In Progress
dmatrix7 (dmatrix7) wrote :

icpcon utilities also tested and working fine.

Stefan Bader (smb) on 2008-08-15
Changed in linux-meta:
status: New → Invalid
dmatrix7 (dmatrix7) wrote :

Actually checking my syslog I see this warning when I start storcon.

Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097092] WARNING: at /build/buildd/linux-2.6.24/arch/x86/kernel/pci-dma_32.c:66 dma_free_coherent()
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097102] Pid: 4793, comm: storcon-2.16.6 Not tainted 2.6.24-21-server #1
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097122] [parport_pc:dma_free_coherent+0x9c/0xa0] dma_free_coherent+0x9c/0xa0
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097142] [<f88cd660>] gdth_ioctl_free+0x60/0x90 [gdth]
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097157] [<f88d12f5>] gdth_ioctl+0x7f5/0x10a0 [gdth]
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097212] [kmap_atomic_prot+0xfc/0x130] kmap_atomic_prot+0xfc/0x130
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097223] [loop:kmap_atomic+0x1c/0x30] kmap_atomic+0x1c/0x30
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097231] [__do_fault+0x2af/0x4c0] __do_fault+0x2af/0x4c0
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097249] [handle_mm_fault+0x21b/0xb80] handle_mm_fault+0x21b/0xb80
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097258] [edac_core:kobject_get+0xf/0x120] kobject_get+0xf/0x20
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097278] [do_ioctl+0x78/0x90] do_ioctl+0x78/0x90
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097285] [vfs_ioctl+0x22e/0x2b0] vfs_ioctl+0x22e/0x2b0
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097293] [sys_ioctl+0x56/0x70] sys_ioctl+0x56/0x70
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097300] [syscall_call+0x7/0x0b] syscall_call+0x7/0xb
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097312] [<c0330000>] _read_unlock_bh+0x0/0x10
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097322] =======================
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097466] WARNING: at /build/buildd/linux-2.6.24/arch/x86/kernel/pci-dma_32.c:66 dma_free_coherent()
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097469] Pid: 4793, comm: storcon-2.16.6 Not tainted 2.6.24-21-server #1
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097473] [parport_pc:dma_free_coherent+0x9c/0xa0] dma_free_coherent+0x9c/0xa0
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097481] [<f88cd660>] gdth_ioctl_free+0x60/0x90 [gdth]
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097492] [<f88d12f5>] gdth_ioctl+0x7f5/0x10a0 [gdth]
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097550] [kmap_atomic_prot+0xfc/0x130] kmap_atomic_prot+0xfc/0x130
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097559] [loop:kmap_atomic+0x1c/0x30] kmap_atomic+0x1c/0x30
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097565] [__do_fault+0x2af/0x4c0] __do_fault+0x2af/0x4c0
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097581] [handle_mm_fault+0x21b/0xb80] handle_mm_fault+0x21b/0xb80
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097592] [edac_core:kobject_get+0xf/0x120] kobject_get+0xf/0x20
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097610] [do_ioctl+0x78/0x90] do_ioctl+0x78/0x90
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097618] [vfs_ioctl+0x22e/0x2b0] vfs_ioctl+0x22e/0x2b0
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097626] [sys_ioctl+0x56/0x70] sys_ioctl+0x56/0x70
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097632] [syscall_call+0x7/0x0b] syscall_call+0x7/0xb
Aug 15 14:19:42 quinte-vm4 k...

Stefan Bader (smb) wrote :

Right, that was mentioned in the patch. From what I can see, the warning is issued when dma_free_coherent is called with interrupts disabled. I have not yet seen a patch that mentions this in its description, but I will keep it in mind.
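
To judge how often such a warning fires and from which source location, the syslog can be summarised per location. A minimal sketch, assuming the log text is piped in on stdin; the helper name `count_warnings` is made up for illustration.

```shell
#!/bin/sh
# Count kernel WARNING lines per source location.
# Reads syslog/dmesg text on stdin; prints "<count> <file:line function>".
# (count_warnings is a hypothetical helper name.)
count_warnings() {
    awk '/WARNING: at / {
             sub(/.*WARNING: at /, "")   # keep only the location and function
             seen[$0]++
         }
         END { for (loc in seen) print seen[loc], loc }'
}

# Sample: lines copied from the syslog excerpt above.
count_warnings <<'EOF'
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097092] WARNING: at /build/buildd/linux-2.6.24/arch/x86/kernel/pci-dma_32.c:66 dma_free_coherent()
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097102] Pid: 4793, comm: storcon-2.16.6 Not tainted 2.6.24-21-server #1
Aug 15 14:19:42 quinte-vm4 kernel: [ 3494.097466] WARNING: at /build/buildd/linux-2.6.24/arch/x86/kernel/pci-dma_32.c:66 dma_free_coherent()
EOF
```

On a live system the input would come from something like `grep gdth -B1 -A20 /var/log/syslog` or plain `dmesg`.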

Stefan Bader (smb) wrote :

Ok, just found it:

commit ff83efacf2b77a1fe8942db6613825a4b80ee5e2
Author: James Bottomley <email address hidden>
Date: Sun Feb 17 11:24:51 2008 -0600

    [SCSI] gdth: don't call pci_free_consistent under spinlock

    The spinlock is held over too large a region: pscratch is a permanent
    address (it's allocated at boot time and never changes). All you need
    the smp lock for is mediating the scratch in use flag, so fix this by
    moving the spinlock into the case where we set the pscratch_busy flag
    to false.

    Cc: Stable Tree <email address hidden>
    Signed-off-by: James Bottomley <email address hidden>

Mike Bianchi (mbianchi-foveal) wrote :

I'm trying to test if the gdth SCSI disk controller still dumps at the end
of /sbin/poweroff or /sbin/reboot .

I don't routinely compile kernels but I have been successful in the past,
so I thought I would try.

I have built the 2.6.24.2 kernel:
 cp /boot/config-2.6.24-19-generic /usr/src/linux-2.6.24.2/.config

 make all
 make install
 make modules_install

and got
 /boot/System.map-2.6.24.2
 /boot/config-2.6.24.2
 /boot/vmlinuz-2.6.24.2

But I don't think I built the initramfs properly. I used
 update-initramfs -c -k 2.6.24.2
to create
 /boot/initrd.img-2.6.24.2

But the results look _very_big_ when compared to the others:

-rw-r--r-- 1 root root 7907485 2008-08-13 14:56 initrd.img-2.6.24-19-generic
-rw-r--r-- 1 root root 49159103 2008-08-16 16:30 initrd.img-2.6.24.2

(hd0,5) is a SCSI disk
from /boot/grub/menu.lst

 title Ubuntu 8.04.1, kernel 2.6.24.2 Hardy
 root (hd0,5)
 kernel /boot/vmlinuz-2.6.24.2 \
  root=UUID=c3bf9923-8fd9-45fe-9c7d-929644751f79 ro vga=791
 initrd /boot/initrd.img-2.6.24.2
 quiet
 savedefault

And when I start the kernel, I see Grub flash Starting Up
and then nothing; no image, no disk rattle.

Does anyone see what I'm missing?

--
 Mike Bianchi
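
One common cause of an oversized initrd with a hand-built kernel is that the modules were installed with their debug symbols still attached. Whether that applies to the build above is a guess, not a confirmed diagnosis, and it does not by itself explain the failed boot. The sketch below shows a build sequence that strips modules at install time; `INSTALL_MOD_STRIP=1` is a kbuild variable, while the helper names `run` and `build_and_install` are made up for illustration.

```shell
#!/bin/sh
# Sketch of a kernel build sequence; run from the top of the kernel source tree.
# With DRY_RUN set, commands are only printed, not executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Usage: build_and_install VERSION    (e.g. 2.6.24.2)
build_and_install() {
    version=$1
    run make all
    # Strip debug symbols from the modules while installing them,
    # which keeps /lib/modules and the resulting initrd small.
    run make INSTALL_MOD_STRIP=1 modules_install
    run make install
    run update-initramfs -c -k "$version"
}

# Print the commands instead of running them.
DRY_RUN=1
build_and_install 2.6.24.2
```

Unsetting `DRY_RUN` (and running as root) would execute the same four steps for real.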

KWAndi (lst-hoe01) wrote :

@Stefan Bader :

Installing your kernel 2.6.24-24-server fixed the problem with the SRCU32 and the older GDT6518RS. The console utility "icpcon" is working as it should. How can one get notified when the fix reaches the main repository?

Thanks

Andreas

Stefan Bader (smb) wrote :

The relevant fixes are already in Intrepid.

Changed in linux:
assignee: nobody → stefan-bader-canonical
importance: Undecided → High
status: New → In Progress
status: In Progress → Fix Released
Stefan Bader (smb) wrote :

SRU justification:

Impact: There are changes in the stable 2.6.24 tree that avoid problems with
the gdth SCSI driver. Without these changes, the kernel will panic on reboot or
when issuing certain control commands (see bug report)

Fix: The panic is fixed by:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.24.y.git;a=commitdiff;h=94429518999e4e6b8f84807afa4bf089a63da0b4

but that patch mentions a WARN_ON being triggered. This is addressed by another (earlier) patch in the stable tree:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.24.y.git;a=commitdiff;h=461bab342d2601c5e032f85b27e66beafef66ff8

Testcase: See bug report.

Mike Bianchi (mbianchi-foveal) wrote :

I acquired
 linux-image-2.6.24-21-generic
 linux-headers-2.6.24-21
 linux-headers-2.6.24-21-generic
 linux-ubuntu-modules-2.6.24-21-generic
 linux-restricted-modules-2.6.24-21-generic
from
 deb http://us.archive.ubuntu.com/ubuntu/ hardy-proposed restricted main multiverse universe

The kernel halt and reboot now work as expected.

Steve Langasek (vorlon) wrote :

Accepted into -proposed, please test and give feedback here. Please see https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you in advance!

KWAndi (lst-hoe01) wrote :

Tested with kernel 2.6.24-21.42 from -proposed: halt/reboot and the "icpcon" utility work fine on 32-bit Ubuntu 8.04 server. I will test the 64-bit (AMD64) version soon.

Andi

MartinK (kopp01) wrote :

Works fine! (icpcon and reboot)
Tested with kernel 2.6.24-21-server (32-bit) from hardy-proposed with an ICP GDT6518RS.
Thanks a lot!

Martin

Richard M (rnmixon) wrote :

I'm a bit of a newbie with Ubuntu. I was moving one of my servers over to Ubuntu Server 8.04 when this issue hit us with our Intel SRCU42L RAID controller.

I followed the instructions at:
  https://wiki.ubuntu.com/Testing/EnableProposed

and then went into aptitude after enabling -proposed, like so:
  sudo aptitude -t hardy-proposed

Is this list of packages all that is needed (from the post by Mike Bianchi on 8/21):
  linux-image-2.6.24-21-generic
  linux-headers-2.6.24-21
  linux-headers-2.6.24-21-generic
  linux-ubuntu-modules-2.6.24-21-generic
  linux-restricted-modules-2.6.24-21-generic

I cannot find the last package. Substituting "proposed" for "restricted" does not work either.

Thanks in advance for any help.

Richard M (rnmixon) wrote :

OK, I just tried it, in spite of not finding "linux-restricted-modules-2.6.24-21-generic". Restart now works, and I'm trying to install a version of srcd and storcon.

KWAndi (lst-hoe01) wrote :

Tested with AMD x64 and it is working fine. Controller used Intel SRCU32.

Thanks

Richard M (rnmixon) wrote :

Does anyone know where to find an AMD x64 version of srcd and storcon? The Intel site only appears to have 32-bit and I cannot find the original ICP manufacturer site. Thanks for any ideas/suggestions.

Martin Pitt (pitti) wrote :

linux 2.6.24-21 copied to hardy-updates.

Changed in linux:
status: Fix Committed → Fix Released
JayReeder (jayreeder-yahoo) wrote :

Getting this same bug now with 10.10

Stefan Bader (smb) wrote :

Strictly speaking it cannot be the same bug, because the patches fixing this bug went upstream around 2.6.25. So could you please open a new bug with "ubuntu-bug linux" (if possible)? At a minimum the panic message, and ideally the full dmesg, should be attached. Also mention (if known) whether this worked in 10.04.
