Spontaneous aws instance reboots (sphinx with raid10)

Bug #803381 reported by Rudolfs Osins
This bug affects 1 person
Affects: linux-ec2 (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

We're experiencing regular spontaneous reboots (about once every 2 days) on one of our Amazon instances running Ubuntu 10.04.2 LTS (2.6.32-316-ec2 kernel). We're running a sphinx search daemon in combination with Linux raid10. The full console log is attached, but the relevant part is:
...
[10783.890731] BUG: unable to handle kernel paging request at ffffffff97fc3640
[10783.890731] IP: [<ffffffff814ae5ad>] schedule+0x28d/0x5f5
[10783.890731] PGD 1002067 PUD 3aee067 PMD 0
[10783.890731] Thread overran stack, or stack corrupted
[10783.890731] Oops: 0000 [#1] SMP
...

Strangely, the instance reboots itself after the crash even though we have the following kernel panic settings:

search00:/$ sudo sysctl -a |grep panic
kernel.panic = 0
kernel.panic_on_oops = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.panic_on_io_nmi = 0
kernel.softlockup_panic = 0
kernel.hung_task_panic = 0
vm.panic_on_oom = 0
error: permission denied on key 'net.ipv4.route.flush'
fs.xfs.panic_mask = 0
error: permission denied on key 'net.ipv6.route.flush'
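To double-check a listing like the one above without reading it by eye, the output can be parsed and scanned for any panic-related key that is not zero. This is only a minimal sketch, assuming Python is available on the instance; the helper names `parse_sysctl` and `nonzero_panic_settings` are hypothetical and not part of any tool, and the sample text is copied from the output above.

```python
def parse_sysctl(text):
    """Parse `sysctl -a` style output into a dict of key -> value strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and diagnostics such as "error: permission denied on key ..."
        if not line or line.startswith("error:") or " = " not in line:
            continue
        key, value = line.split(" = ", 1)
        settings[key.strip()] = value.strip()
    return settings


def nonzero_panic_settings(settings):
    """Return the panic-related keys whose value is not "0"."""
    return {k: v for k, v in settings.items() if "panic" in k and v != "0"}


# Sample input: the sysctl output quoted in this report.
REPORTED = """\
kernel.panic = 0
kernel.panic_on_oops = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.panic_on_io_nmi = 0
kernel.softlockup_panic = 0
kernel.hung_task_panic = 0
vm.panic_on_oom = 0
error: permission denied on key 'net.ipv4.route.flush'
fs.xfs.panic_mask = 0
error: permission denied on key 'net.ipv6.route.flush'
"""

settings = parse_sysctl(REPORTED)
# An empty dict means no panic-related setting asks for a reboot or panic.
print(nonzero_panic_settings(settings))
```

For the values reported here the result is empty, which matches the observation that none of these settings explains the reboot.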

We've also ruled out a memory defect in the underlying hardware: we launched the same image on different hardware and still got spontaneous reboots. We have another sphinx instance with the same configuration (and the same instance type) but without the raid10 array, and it works flawlessly.

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: linux-image-2.6.32-316-ec2 2.6.32-316.31
ProcVersionSignature: Ubuntu 2.6.32-316.31-ec2 2.6.32.38+drm33.16
Uname: Linux 2.6.32-316-ec2 x86_64
Architecture: amd64
Date: Wed Jun 29 09:45:41 2011
Ec2AMI: ami-f5d5e081
Ec2AMIManifest: (unknown)
Ec2AvailabilityZone: eu-west-1a
Ec2InstanceType: c1.xlarge
Ec2Kernel: aki-4feec43b
Ec2Ramdisk: unavailable
ProcEnviron:
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-ec2
