kernel 2.6.20-15-generic oops in ext3

Bug #108647 reported by Ruben Garcia
Affects                       Status     Importance  Assigned to  Milestone
linux (Ubuntu)                Invalid    Undecided   Unassigned
linux-source-2.6.20 (Ubuntu)  Won't Fix  Medium      Unassigned

Bug Description

Binary package hint: linux-image-2.6.20-15-generic

The kernel has been oopsing and hanging the machine with the last few kernels.

I've had the machine's ram, cpu and disks tested for a week with no errors found.

I had to reinstall Ubuntu, so I used the new 7.04 for i386, with kernel 2.6.20-15-generic.
The machine is an AMD Athlon(tm) 64 Processor 3200+.

This oops was generated after some hours of running.
I had to copy it by hand from the monitor, since the machine was hung and no copy was in /var/log/messages.

I don't have any other machines with serial ports, so I cannot get the full oops.

I installed the kernel-debug package, but I couldn't find any documentation on how to use it to narrow the error down.

I couldn't find any similar-looking oops either in the Ubuntu bug list or on lkml (I googled the names of the functions in the backtrace, but nothing caught my eye).

I'm running some heavy reading and writing on the disks, so I suspect locking errors leading to corruption (wild guess).

If you need more info (lspci, modules, etc), I'll post it.

The oops follows:

------------probably some part of the oops was lost
ext3_journal_dirty_data+0x0/0x50 [ext3]
ext3_ordered_commit_write+0x0/0xf0 [ext3]
generic_file_buffered_write+0x33b/0x6d0
ext3_mark_inode_dirty+0x32/0x50 [ext3]
current_fs_time+0x50/0x60
__generic_file_aio_write_nolock+0x2ed/0x610
find_extend_vma+0x1d/0x70
get_futex_key+0x40/0x110
try_to_del_timer_sync+0x47/0x50
generic_file_aio_write+0x55/0xd0
ext3_file_write+0x30/0xc0 [ext3]
do_sync_write+0xd5/0x120
autoremove_wake_function+0x0/0x50
vfs_write+0xbe/0x190
sys_futex+0x91/0x140
do_sync_write+0x0/0x120
sys_write+0x41/0x70
sysenter_past_esp+0x69/0xa9

Code:
F4 89 55 F0 89 75 F8 BE 40 6E 42 10 9C 58 FA 66 66 66 90 66 66 66 90 66 66 66 90 66 66 90 8B 55 F0
89 F3 89 02 8B 47 04 8B 40 10 <03> 1C 85 00 FC 3C C0 89 D8 E8 DA FF 1C 00 8B 47 04 8B 40 10 8B

EIP: [<C011E718>] task_rq_lock+0x38/0x80
SS:ESP 0068:F029FBB0
<0> Kernel panic - not syncing: Fatal exception in interrupt.

------
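
Hand-copied traces can easily pick up stray spaces or typos. The following is a minimal Python sketch, assuming each frame has the shape `symbol+0xOFFSET/0xSIZE` with an optional leading `[<address>]` and an optional trailing `[module]`. It tolerates whitespace around `+` and `/` so that frames can be normalized before searching for them in other reports. The names `parse_frame` and `normalize` are invented for illustration and are not part of any kernel tool.

```python
import re

# One frame: optional [<address>], symbol, +0xOFFSET/0xSIZE, optional [module].
# Whitespace around '+' and '/' is tolerated because the trace was typed by hand.
FRAME_RE = re.compile(
    r"^\s*(?:\[<(?P<addr>[0-9a-fA-F]+)>\]\s*)?"
    r"(?P<symbol>[A-Za-z_.][\w.]*)\s*\+\s*"
    r"0x(?P<offset>[0-9a-fA-F]+)\s*/\s*0x\s*(?P<size>[0-9a-fA-F]+)"
    r"(?:\s*\[(?P<module>\w+)\])?\s*$"
)


def parse_frame(line):
    """Return a dict describing one frame, or None if the line does not parse."""
    m = FRAME_RE.match(line)
    if m is None:
        return None
    return {
        "symbol": m.group("symbol"),
        "offset": int(m.group("offset"), 16),
        "size": int(m.group("size"), 16),
        "module": m.group("module"),
    }


def normalize(line):
    """Rewrite a frame as symbol+0xoff/0xsize [module]; leave other lines alone."""
    frame = parse_frame(line)
    if frame is None:
        return line.strip()
    text = "%s+0x%x/0x%x" % (frame["symbol"], frame["offset"], frame["size"])
    if frame["module"]:
        text += " [%s]" % frame["module"]
    return text
```

A line that does not parse (for example one with a mistyped hex digit) is returned unchanged, which makes transcription errors easy to spot.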

LEDS:
Num_lock OFF
Caps_lock BLINKING
Scroll_lock BLINKING

Ruben Garcia (rubengarciahernandez) wrote :

I can reproduce the oops by letting my machine run for some days; the oops eventually hangs the machine.

However, I do not know a way to trigger it easily.

Dan O'Huiginn (daniel-ohuiginn) wrote :

Thanks for taking the time to report this bug. To help us track it down please include the following additional information, if you have not already done so (please pay attention to lspci's additional options), as required by the Ubuntu Kernel Team:
1. Please include the output of the command "uname -a" in your next response. It should be one long line of text which includes the exact kernel version you're running, as well as the CPU architecture.
2. Please run the command "dmesg > dmesg.log" and attach the resulting file "dmesg.log" to this bug report.
3. Please run the command "lspci -vvnn > lspci-vvnn.log" and attach the resulting file "lspci-vvnn.log" to this bug report.
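
The three requests above can be gathered in one go. This is a minimal Python sketch, assuming the `uname`, `dmesg` and `lspci` tools are on the PATH; the file name `uname.log` and the function name `collect` are choices made here for illustration (the request only asks for the `uname -a` output inline).

```python
import subprocess
from pathlib import Path

# Commands quoted from the request, keyed by the log file they fill.
# "uname.log" is a name chosen for this sketch, not one the request specifies.
COMMANDS = {
    "uname.log": ["uname", "-a"],
    "dmesg.log": ["dmesg"],
    "lspci-vvnn.log": ["lspci", "-vvnn"],
}


def collect(outdir, commands=COMMANDS):
    """Run each command, save its stdout, and return {filename: exit status}.

    A status of None means the tool could not be started at all.
    """
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    results = {}
    for filename, argv in commands.items():
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, errors="replace"
            )
        except OSError:
            # Tool not installed or not executable; record it and carry on.
            results[filename] = None
            continue
        (outdir / filename).write_text(proc.stdout)
        results[filename] = proc.returncode
    return results
```

Calling `collect("bug-108647-logs")` would leave the three files in that directory, ready to attach. On systems where reading the kernel log is restricted, `dmesg` may need to be run as root.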

For your reference, the full description of procedures for kernel-related bug reports is available at http://wiki.ubuntu.com/KernelTeamBugPolicies. Thanks in advance!

Changed in linux-source-2.6.20:
assignee: nobody → daniel-ohuiginn
status: Unconfirmed → Needs Info
Ruben Garcia (rubengarciahernandez) wrote :

uname -a
Linux casa 2.6.20-15-generic #2 SMP Sun Apr 15 07:36:31 UTC 2007 i686 GNU/Linux

Ruben Garcia (rubengarciahernandez) wrote :

Perhaps unrelated oops
BUG: unable to handle kernel paging request at virtual address fb81ff04

while playing slime forest adventure
The system continued working and I didn't notice until I closed the game window.

I attach a dmesg which ends in the oops.
Same machine as above.
Same usage pattern.

Ruben Garcia (rubengarciahernandez) wrote :

After some time, I got another oops in the same place, with a different Call Trace.

Are oopses that follow an earlier oops useful? I don't know whether the recovery procedure after an oops leaves the system in a state as though nothing had happened.

I'm attaching this one. Please reply if these are not trustworthy and I'll reboot and post only the first oops.

Dan O'Huiginn (daniel-ohuiginn) wrote :

Thanks for all this, Ruben. I'm assigning this to the Kernel team; I believe there is now enough information for them to start pinning down the bug.

Changed in linux-source-2.6.20:
assignee: daniel-ohuiginn → ubuntu-kernel-team
importance: Undecided → Medium
status: Needs Info → Confirmed
Ruben Garcia (rubengarciahernandez) wrote :

I got another of the crashing oopses (the kind that made me open this bug).

The only difference is the stack pointer value:

SS:ESP 0068:cd27dbb0

Ruben Garcia (rubengarciahernandez) wrote :

I suppose there was no answer because I was using the nvidia driver and the kernel was tainted.

I've now rebooted and am using the nv driver (but I can't get Xinerama to work, so only one display).

I'll post some clean oopses soon, I'm pretty sure.

Ruben Garcia (rubengarciahernandez) wrote :

Here are two more oopses (both recoverable), one of which mentions ext3; the other is related to swap.

Clean boot with nothing tainted.

I also have two crashes from yesterday and the day before, which I copied by hand. Also untainted.
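
Whether a running kernel counts as tainted can be checked before posting an oops. The sketch below is a minimal Python example, assuming the standard `/proc/sys/kernel/tainted` file is available; it interprets only bit 0 of the mask (a proprietary module such as the nvidia driver has been loaded) and reports any other non-zero value generically. The function names are illustrative.

```python
from pathlib import Path

TAINT_PROPRIETARY = 1 << 0  # bit 0: a proprietary module has been loaded


def read_taint(path="/proc/sys/kernel/tainted"):
    """Return the kernel taint bitmask as an integer."""
    return int(Path(path).read_text().strip())


def describe_taint(mask):
    """Give a short description; only bit 0 is interpreted in this sketch."""
    if mask == 0:
        return "not tainted"
    if mask & TAINT_PROPRIETARY:
        return "tainted (proprietary module loaded), mask=%d" % mask
    return "tainted for another reason, mask=%d" % mask
```

The taint mask is not cleared by unloading the module, so a reboot without the proprietary driver is needed to get back to an untainted kernel.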

Ruben Garcia (rubengarciahernandez) wrote :

First crash:

----------------------------------------------probably some part of the oops was lost
c03c8328 c0426380 c0427ce0 c012b422 0000000a 00000000 00000046 c0426380
c0426300 00004664 c012b4f5 00000000 c0115485 00000000 c01042f8 00000000

Call trace:
[<c01386d9>] __rcu_process_callbacks+0x69/0x1d0
[<c013885c>] rcu_process_callbacks+0x1c/0x40
[<c012b903>] tasklet_action+0x63/0xe0
[<c012b422>] __do_softirq+0x82/0x100
[<c012b4f5>] do_softirq+0x55/0x60
[<c01154b5>] smp_apic_timer_interrupt+0x75/0x80
[<c01042f8>] apic_timer_interrupt+0x28/0x30
[<c0101e00>] default_idle+0x0/0x60
[<c011c092>] native_safe_halt+0x2/0x10
[<c0101e3d>] default_idle+0x3d/0x60
[<c0101409>] cpu_idle+0x49/0xd0
[<c03d77f5>] start_kernel+0x365/0x420
[<c03d7230>] unknown_bootoption+0x0/0x260

Code: 65 72 72 6f 72 3a 20 25 64 0a 00 41 43 50 49 5f
56 53 42 00 50 4e 50 30 43 30 44 2c 50 4e 50 30
43 30 43 2c 50 4e 50 30 43 30 45 <00> 3c 33 3e
41 43 50 49 3a 20 6b 73 65 74 5f 72 65 67 69 73 74

EIP: [<c0377695>] .LC4+0x106ad/0x1f578
SS:ESP 0068:c03d3f24
<0> Kernel panic - not syncing: Fatal exception in interrupt.
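
The `Code:` bytes of this first crash are worth decoding as text. The short Python sketch below does that; the function name `decode_code_bytes` is made up for illustration, and the byte wrapped in `<..>` is the one at the faulting address.

```python
def decode_code_bytes(dump):
    """Turn an oops 'Code:' hex dump into printable text.

    The byte wrapped in <..> marks the faulting address; non-printable
    bytes are shown as '.'.
    """
    chars = []
    for token in dump.split():
        value = int(token.strip("<>"), 16)
        chars.append(chr(value) if 0x20 <= value < 0x7F else ".")
    return "".join(chars)


# Bytes exactly as transcribed in the first crash.
CODE = """
65 72 72 6f 72 3a 20 25 64 0a 00 41 43 50 49 5f
56 53 42 00 50 4e 50 30 43 30 44 2c 50 4e 50 30
43 30 43 2c 50 4e 50 30 43 30 45 <00> 3c 33 3e
41 43 50 49 3a 20 6b 73 65 74 5f 72 65 67 69 73 74
"""

print(decode_code_bytes(CODE))
```

The bytes decode to printable strings (`error: %d`, PNP device IDs, an `ACPI:` log message) rather than plausible instructions. That fits the reported EIP inside `.LC4`, which is a compiler-generated label for constant data, not a function. One possible reading, not confirmed anywhere in this report, is that execution jumped into data through a corrupted pointer or return address.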

Ruben Garcia (rubengarciahernandez) wrote :

Second crash:
-----------------------------------------------probably some part of the oops was lost
K.ti=f0264000)
Stack: 00000010 000000d0 080a2708 00000000 dffffdc0 dfff0e00 dfffde40 00000000
       c0176475 00016058 00000000 f0265f30 00000082 ffffde5c 00000246 000000d0
       dffffdc0 fffffff4 c0172c00 007d0f00 dff77a90 df860ea0 c0123f1b 00000000
Call trace:
[<c0176475>] do_sync_read+0xd5/0x120
[<c0172c00>] kmem_cache_alloc+0x80/0x90
[<c0123f1b>] copy_process+0x8b/0x11e0
[<c013adf0>] autoremove_wake_function+0x0/0x50
[<c0138320>] alloc_pid+0x180/0x290
[<c01252fa>] do_fork+0x77/0x1e0
[<c018c974>] mntput_no_expire+0x24/0xa0
[<c0101236>] sys_clone+0x36/0x40
[<c0103280>] syscall_call+0x7/0xb
===============================================
Code: 8b 77 14 8B 44 24 34 03 57 0c 8b 34 b0 8d 41 01 89 77 14 89
54 8d 14 89 45 00 8b 44 24 10 8b 77 10 3b 70 3c 72 be 8b 17 8b 47 04 <89> 42 04
89 10 83 7f 14 ff c7 07 00 01 10 00 c7 47 04 00 02 20
EIP: [<c0172d3a>] cache_alloc_refill+0x12a/0x550
SS:ESP 0068:f0265eb0
<7> APIC error on CPU0: 80 (80)
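
The two numbers in the `APIC error on CPU0` line are readings of the local APIC Error Status Register. The Python sketch below decodes such a value, assuming the usual bit assignments documented for the x86 local APIC; the table and the function name `decode_esr` are written for this note and are not taken from the report.

```python
# Local APIC Error Status Register bits (assumed standard x86 layout).
ESR_BITS = {
    0: "send checksum error",
    1: "receive checksum error",
    2: "send accept error",
    3: "receive accept error",
    4: "redirectable IPI (reserved on some CPUs)",
    5: "send illegal vector",
    6: "received illegal vector",
    7: "illegal register address",
}


def decode_esr(value):
    """Return the names of all error bits set in an ESR value."""
    return [name for bit, name in ESR_BITS.items() if value & (1 << bit)]


print(decode_esr(0x80))
```

Under that assumption, the value 80 (hex) has only bit 7 set, which corresponds to an access to an invalid APIC register. This would be in line with the stray-pointer symptoms of the other crashes, but it does not identify a cause on its own.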

Ruben Garcia (rubengarciahernandez) wrote :

When I got this last hang, I tried booting with noapic, but the machine rebooted just after decompressing the kernel (every time). Is this normal?

Ruben Garcia (rubengarciahernandez) wrote :

I updated to Linux casa 2.6.20-16-generic and the crash seems to have disappeared.
I have an uptime of 3 days.

I have had one recoverable oops, but I think it will be better to close this bug report and open a new one against the new kernel version.
Please close this bug.

Stefan Kull (stefan-kull) wrote :

Random hangs that leave the Caps_lock and Scroll_lock LEDs blinking...
...the only way out is to reboot by unplugging the batteries and power adapter.

Ubuntu 8.04 Beta (2.6.24-12-generic) using both Gnome and KDE4
I previously used the same Dell Latitude C810 with 7.10 without any problem...
Differences between my 7.10 installation and the 8.04 beta: with 7.10 I used wired LAN; now with the 8.04 beta I am using WiFi.

LEDS:
Num_lock OFF
Caps_lock BLINKING
Scroll_lock BLINKING

HW:
Dell Latitude C810 (1GHz, 512MB and built-in 32MB Nvidia graphics)
D-Link DWL-G630 (pc-card Wifi adapter)

Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this bug to the new "linux" package. However, development has already begun for the upcoming Intrepid Ibex 8.10 release. It would be helpful if you could test the upcoming release and verify if this is still an issue - http://www.ubuntu.com/testing . If the issue still exists, please update this report by changing the Status of the "linux" task from "Incomplete" to "New". We appreciate your patience and understanding as we make this transition. Thanks!

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Andres Mujica (andres.mujica) wrote :

I'm marking this as Invalid, in line with Ruben's last comment.

Changed in linux (Ubuntu):
status: Incomplete → Invalid