(docker/lxc) container restart causes kernel to lockup
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) | | High | Unassigned | | |
Bug Description
After restarting some 'ghost' docker containers on precise with the raring-lts kernel, the kernel locks up and shows:
[1095015.
... (for each core)
Here is the original, more docker focused bug report: https:/
I could reproduce this bug with various kernel versions. I've set the softlockup_panic=1 kernel parameter to get some stack traces. See this gist for stack traces from the 3.5 and 3.8 kernels (will add 3.11 any minute): https:/
It also contains a small script to reproduce this, although so far I could only reproduce it on our Dell R710 systems, not in a Vagrant VM.
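For anyone trying to capture the same traces, here is a minimal sketch of how one might confirm that softlockup_panic=1 is actually on the kernel command line. It is not part of the reporter's gist. The helper names `parse_cmdline` and `softlockup_panic_requested` and the example command line are hypothetical, and the parser assumes parameters without quoted spaces.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        # "name=value" becomes (name, value); a bare flag becomes (name, "")
        name, _, value = token.partition("=")
        params[name] = value
    return params


def softlockup_panic_requested(cmdline: str) -> bool:
    """True if the command line asks the kernel to panic on a soft lockup."""
    return parse_cmdline(cmdline).get("softlockup_panic") == "1"


# Hypothetical command line for illustration, not the reporter's actual one.
example = "ro quiet softlockup_panic=1"
print(softlockup_panic_requested(example))  # prints True
```

On a running system the string to check can be read from /proc/cmdline. Kernels built with the soft lockup detector also expose the current value as the kernel.softlockup_panic sysctl.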
---
AlsaDevices:
total 0
crw-rw---T 1 root audio 116, 1 Feb 3 09:45 seq
crw-rw---T 1 root audio 116, 33 Feb 3 09:45 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.0.1-0ubuntu17.6
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: [Errno 2] No such file or directory
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 12.04
HibernationDevice: RESUME=
IwConfig: Error: [Errno 2] No such file or directory
MachineType: Dell Inc. PowerEdge R710
MarkForUpload: True
Package: linux (not installed)
PciMultimedia:
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.79.9
RfKill: Error: [Errno 2] No such file or directory
Tags: precise
Uname: Linux 3.8.0-35-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:
dmi.bios.date: 07/24/2012
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 6.3.0
dmi.board.name: 0YDJK3
dmi.board.vendor: Dell Inc.
dmi.board.version: A09
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.
dmi.product.name: PowerEdge R710
dmi.sys.vendor: Dell Inc.
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
Changed in linux (Ubuntu): | |
importance: | Undecided → High |
tags: | added: kernel-da-key kernel-stable-key |
Joseph Salisbury (jsalisbury) wrote : | #2 |
Do you know if this is a regression? Was there a prior kernel version that did not exhibit this bug?
Also, it would be good to know if the latest mainline kernel also has the bug. It can be downloaded from:
http://
fish (discordianfish) wrote : | #3 |
The systems are new, so I'm not aware of any state where this doesn't happen. I'll try the mainline kernel soon and will report whether I can reproduce it there as well.
fish (discordianfish) wrote : | #4 |
Looks like 3.14 has no support for aufs, so I can't try to reproduce it with those (aufs-based) containers.
fish (discordianfish) wrote : | #5 |
I could *not* reproduce this issue on my laptop, so it might be specific to some aspect of our servers. Those are Dell PowerEdge R710 with Intel(R) Xeon(R) CPU L5520 @ 2.27GHz and 24GB RAM.
fish (discordianfish) wrote : AcpiTables.txt | #6 |
apport information
tags: | added: apport-collected |
description: | updated |
fish (discordianfish) wrote : BootDmesg.txt | #7 |
apport information
fish (discordianfish) wrote : CurrentDmesg.txt | #8 |
apport information
fish (discordianfish) wrote : Lspci.txt | #9 |
apport information
fish (discordianfish) wrote : Lsusb.txt | #10 |
apport information
fish (discordianfish) wrote : ProcCpuinfo.txt | #11 |
apport information
fish (discordianfish) wrote : ProcEnviron.txt | #12 |
apport information
apport information
fish (discordianfish) wrote : ProcModules.txt | #14 |
apport information
fish (discordianfish) wrote : UdevDb.txt | #15 |
apport information
fish (discordianfish) wrote : UdevLog.txt | #16 |
apport information
fish (discordianfish) wrote : WifiSyslog.gz | #17 |
apport information
Changed in linux (Ubuntu): | |
status: | Incomplete → Triaged |
fish (discordianfish) wrote : | #18 |
Let me know if there is anything I can do to help.
fish (discordianfish) wrote : | #19 |
I tried the same containers with docker's btrfs driver instead of the default aufs driver and couldn't reproduce it. So it might be an issue with aufs.
fish (discordianfish) wrote : | #20 |
I just had the same issue when rebooting the system although aufs wasn't directly involved (it was loaded but not used). I'll blacklist it now and see if it happens again.
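The "loaded but not used" state described above can be checked from /proc/modules, where the third field of each line is the module's use count. The following is a rough sketch under that assumption. `module_use_count` is a hypothetical helper, the sample text is made up rather than taken from the attached ProcModules.txt, and it assumes a kernel that reports a numeric use count.

```python
from typing import Optional


def module_use_count(proc_modules_text: str, module: str) -> Optional[int]:
    """Return the module's use count from /proc/modules text, or None if not loaded."""
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # Fields: name, size, use count, dependents, state, load address
        if len(fields) >= 3 and fields[0] == module:
            return int(fields[2])
    return None


# Hypothetical /proc/modules excerpt for illustration only.
sample = (
    "aufs 202783 0 - Live 0x0000000000000000\n"
    "btrfs 785967 1 - Live 0x0000000000000000\n"
)

count = module_use_count(sample, "aufs")
if count is None:
    print("aufs is not loaded")
elif count == 0:
    print("aufs is loaded but has no users")
else:
    print(f"aufs is loaded with {count} user(s)")
```

A use count of 0 generally means nothing currently holds a reference to the module, which would match aufs being loaded without any aufs mounts. After blacklisting, the function should return None for aufs.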
For what it's worth, some bugs can be easier to reproduce on machines with lots of cores (that might explain why you couldn't reproduce it on your local laptop).
I recall that bug #1011792 never happened on our local 4-core VM, but the same workload would lock up an 8-core VM in a few hours.
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1275809
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.