SATA hotplug causes I/O stack to freeze
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) | Confirmed | Medium | Unassigned | | |
Bug Description
PROBLEM:
After updating to 3.2.0-30-generic on an Ubuntu 12.04 x86_64 server with 24 SATA drives and md RAID, a hotplug event for any drive (even a drive which is not part of any active md RAID set) makes the machine hang badly. I can ping it and open a TCP connection to port 22, but an ssh session never starts. The console reports messages such as
"BUG: soft lockup - CPU#1 stuck for 22s! [kworker/
For more details including stacktrace see
http://
This has a screenshot as attachment:
http://
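To confirm that a hang is the same symptom, the kernel log can be scanned for the soft-lockup message quoted above. The following is only a minimal sketch: the function name `find_soft_lockups` is hypothetical, and the pattern covers just the fixed part of the message shown in this report (the CPU number and the stuck time), not the task name that follows it.

```python
import re

# Matches the fixed part of the soft-lockup message quoted in this report.
# The task name in square brackets is deliberately not parsed.
SOFT_LOCKUP = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")


def find_soft_lockups(log_text):
    """Return a list of (cpu, seconds) tuples, one per soft-lockup message."""
    return [(int(cpu), int(secs)) for cpu, secs in SOFT_LOCKUP.findall(log_text)]
```

The function takes the log as a string, so it can be fed saved console output or the contents of a kernel log file captured after a reboot.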
CAUSE AND FIX:
Patch provided at http://
The patch description says this was a regression introduced by commit 3b661a9 "[SCSI] fix hot unplug vs async scan race",
which is 1675b80 in the ubuntu-precise repository.
After building a kernel with this patch applied, I found that:
- Hot-plugging two inactive drives while I/O was in progress on the other drives was fine. The other drives, in an md raid0 set, continued to work without a hitch (the load was generated by bonnie++).
- Even removing those two drives while dd'ing from them was fine. I/O to the other md RAID set was also unaffected.
NOTE ABOUT TESTED KERNEL:
I built this test kernel from apt-get install linux-source (version 3.2.0-30.48), untarring /usr/src/
However the deb package built is "linux-
and uname -a also reports "3.2.27-
I was surprised to find my kernel called 3.2.27 instead of 3.2.0, and so I wonder if there are other changes in this kernel apart from the patch I applied.
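The version mismatch noted above can be checked mechanically by comparing the leading numeric part of two kernel release strings. This is a rough sketch, not a statement about how Ubuntu names its kernels: `upstream_version` and `same_upstream` are hypothetical helper names, and only the leading `X.Y.Z` triple is compared.

```python
import re


def upstream_version(release):
    """Extract the leading X.Y.Z triple from a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {release!r}")
    return tuple(int(part) for part in match.groups())


def same_upstream(release_a, release_b):
    """True if both release strings start with the same X.Y.Z triple."""
    return upstream_version(release_a) == upstream_version(release_b)
```

On a running system, `platform.release()` from the Python standard library returns the same string as `uname -r` and can be passed in as one of the arguments.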
ADDITIONAL SYSTEM DETAILS:
Ubuntu 12.04 x86_64 server
24 SATA drives
2 x LSI HBAs (1 x 8 port, 1 x 16 port)
03:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
09:00.0 Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2116 PCI-Express Fusion-MPT SAS-2 [Meteor] (rev 02)
# ./sas2flash -listall
LSI Corporation SAS2 Flash Utility
Version 12.00.00.00 (2011.11.08)
Copyright (c) 2008-2011 LSI Corporation. All rights reserved
Adapter Selected is a LSI SAS: SAS2008(B2)
Num Ctlr FW Ver NVDATA x86-BIOS PCI Addr
-------
0 SAS2008(B2) 12.00.00.00 0c.00.00.05 07.23.01.00 00:03:00:00
1 SAS2116_1(B1) 12.00.00.00 0c.00.00.01 07.23.01.00 00:09:00:00
Finished Processing Commands Successfully.
Exiting SAS2Flash.
---
AlsaDevices:
total 0
crw-rw---T 1 root audio 116, 1 Sep 10 10:46 seq
crw-rw---T 1 root audio 116, 33 Sep 10 10:46 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.0.1-0ubuntu12
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
DistroRelease: Ubuntu 12.04
HibernationDevice: RESUME=
InstallationMedia: Ubuntu-Server 11.10 "Oneiric Ocelot" - Release amd64 (20111011)
MachineType: TYAN S5510
Package: linux (not installed)
PciMultimedia:
ProcEnviron:
LANGUAGE=en_GB:en
TERM=xterm
PATH=(custom, no user)
LANG=en_GB.UTF-8
SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.79.1
RfKill: Error: [Errno 2] No such file or directory
Tags: precise
Uname: Linux 3.2.27-
UpgradeStatus: Upgraded to precise on 2012-05-22 (111 days ago)
UserGroups: adm admin cdrom dialout kvm libvirtd lpadmin plugdev sambashare
dmi.bios.date: 04/12/2012
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: V1.05a
dmi.board.
dmi.board.name: S5510
dmi.board.vendor: TYAN
dmi.board.version: empty
dmi.chassis.
dmi.chassis.type: 3
dmi.chassis.vendor: empty
dmi.chassis.
dmi.modalias: dmi:bvnAmerican
dmi.product.name: S5510
dmi.product.
dmi.sys.vendor: TYAN
Changed in linux (Ubuntu):
importance: Undecided → Medium
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1049013
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.