Spontaneous AWS instance reboots (Sphinx with RAID10)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux-ec2 (Ubuntu) | New | Undecided | Unassigned |
Bug Description
We're experiencing regular (about once every two days) spontaneous reboots on one of our Amazon instances running Ubuntu 10.04.2 LTS (2.6.32-316-ec2 kernel). The instance runs a Sphinx search daemon in combination with a Linux RAID10 array. The full console log is attached, but the relevant part is:
...
[10783.890731] BUG: unable to handle kernel paging request at ffffffff97fc3640
[10783.890731] IP: [<ffffffff814ae
[10783.890731] PGD 1002067 PUD 3aee067 PMD 0
[10783.890731] Thread overran stack, or stack corrupted
[10783.890731] Oops: 0000 [#1] SMP
...
Strangely, the instance reboots itself after the crash even though kernel.panic is 0, which should disable the automatic reboot after a panic. These are our kernel panic settings:
search00:/$ sudo sysctl -a |grep panic
kernel.panic = 0
kernel.
kernel.
kernel.
kernel.
kernel.
vm.panic_on_oom = 0
error: permission denied on key 'net.ipv4.
fs.xfs.panic_mask = 0
error: permission denied on key 'net.ipv6.
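For anyone who wants to re-check these values without the "permission denied" noise, here is a minimal sketch that reads the panic-related keys directly from /proc/sys and skips anything unreadable. The function name `panic_settings` and its optional root-directory argument are hypothetical helpers for illustration, not part of sysctl.

```shell
# Print panic-related sysctl keys as "key = value", reading /proc/sys directly.
# Keys that cannot be read are skipped silently instead of reporting an error.
panic_settings() {
    # Assumption: the root is overridable so the function can be tried on a copy.
    root="${1:-/proc/sys}"
    find "$root" -type f -name '*panic*' 2>/dev/null | sort | while read -r f; do
        if value=$(cat "$f" 2>/dev/null); then
            # Turn the path below the root into dotted sysctl notation.
            key=$(printf '%s' "${f#"$root"/}" | tr '/' '.')
            printf '%s = %s\n' "$key" "$value"
        fi
    done
}
```

Calling `panic_settings` with no argument lists the live values. A `kernel.panic` value of 0 means the kernel should not reboot on its own after a panic.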
We've also ruled out a memory defect in the underlying hardware: we launched the same image on different hardware and still got spontaneous reboots. Another Sphinx instance with the same configuration (and the same instance type) but without the RAID10 array works flawlessly.
ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-
ProcVersionSign
Uname: Linux 2.6.32-316-ec2 x86_64
Architecture: amd64
Date: Wed Jun 29 09:45:41 2011
Ec2AMI: ami-f5d5e081
Ec2AMIManifest: (unknown)
Ec2Availability
Ec2InstanceType: c1.xlarge
Ec2Kernel: aki-4feec43b
Ec2Ramdisk: unavailable
ProcEnviron:
LANG=en_GB.UTF-8
SHELL=/bin/bash
SourcePackage: linux-ec2