RAID array causing "BUG: soft lockup" errors/system freeze
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux (Ubuntu) | Fix Released | High | Andy Whitcroft | |
| Intrepid | Fix Released | High | Andy Whitcroft | |
Bug Description
Running Ubuntu 8.10 Intrepid Desktop 64-bit with all current updates.
Kernel is 2.6.27-9-generic #1 SMP Thu Nov 20 22:15:32 UTC 2008 x86_64 GNU/Linux.
System is Abit IP35-Pro motherboard, Q9550 CPU, 8 GB RAM, 4 Western Digital WD6401AALS-00L3B2 drives (640 GB, SATA).
MemTest and drive tests return no errors. Using a RAID5 array across the 4 drives with XFS and Ext3 as the file systems.
When the system comes under heavy I/O load on the RAID array it eventually freezes (requiring a hard reboot). How long the errors take to show up is inconsistent, and the specific applications generating the I/O load don't seem to matter, but the error always shows up eventually. It has already happened dozens of times after I start a load on the machine.
Examples of scenarios where the bug has appeared:
* Running Bonnie++ benchmarks. I have seen this kill the system within 30 seconds. Other times it causes no problems.
* Running a large SQLite import while simultaneously tar/gzip'ing a large directory structure.
* Running a large MySQL import while tar/gzip'ing a large directory structure.
* Multiple tar/gzip processes running at the same time.
The specific process that gets the "BUG: soft lockup" error varies. I have seen it lock up in kswapd, pdflush, gzip, sqlite3, and bonnie++ (see attached logs).
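For anyone trying to reproduce this, the following is a rough sketch of the kind of load that triggers it for me; the mount point /mnt/raid, the directory names, and the bonnie++ size (roughly twice the 8 GB of RAM) are just examples from my setup, not exact commands from the attached logs:
# Sustained benchmark load on the array (file size in MiB, about 2x RAM)
bonnie++ -d /mnt/raid/tmp -s 16384 -u nobody
# Several compression jobs writing to the array at the same time
tar czf /mnt/raid/backup1.tar.gz /mnt/raid/data1 &
tar czf /mnt/raid/backup2.tar.gz /mnt/raid/data2 &
wait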
I have attached some of my stack traces. They repeat many times and I have only included the first distinct events before my system crashed.
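If the later (repeating) traces are needed, one way to capture the full output after the local console freezes is the kernel's netconsole module; the addresses, interface, and MAC below are placeholders for my own network, not values taken from the logs:
# Stream kernel messages over UDP to another machine (listen there with netcat on UDP port 6666)
modprobe netconsole netconsole=6665@192.168.1.10/eth0,6666@192.168.1.20/00:11:22:33:44:55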
affects: mdadm (Ubuntu) → linux (Ubuntu)
Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Triaged
tags: added: regression-release
Changed in linux (Ubuntu Intrepid):
importance: Undecided → High
I meant to note that the RAID5 array is a software RAID array created with:
mdadm --create /dev/md1 --chunk=256 --level=5 --raid-devices=4 /dev/sdb2 /dev/sdc2 /dev/sdd2 /dev/sde2
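The resulting layout and chunk size can be confirmed with the standard md tools:
cat /proc/mdstat
mdadm --detail /dev/md1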
This same system was previously running XFS on a single drive without problems, so it may be some sort of RAID/XFS interaction or a bandwidth issue, since throughput on the array is much higher than before.
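If it does turn out to be an XFS-on-md interaction, the filesystem's stripe alignment may be relevant; for a 256 KiB chunk and 3 data disks (4-drive RAID5) the geometry would look roughly like this, though I haven't confirmed the filesystem was created with these options or that alignment has anything to do with the lockups:
# Report the geometry of the existing filesystem (mount point is an example)
xfs_info /mnt/raid
# At creation time the aligned geometry would be specified like this
# (stripe unit = md chunk size, stripe width = number of data disks)
mkfs.xfs -f -d su=256k,sw=3 /dev/md1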