IBM POWER8 unhandled signal 11 / SEGV
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | Invalid | Undecided | Unassigned |
apparmor (Ubuntu) | Invalid | Undecided | Unassigned |
linux (Ubuntu) | Confirmed | Medium | Unassigned |
linux-meta-lts-vivid (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
Hi,
We have a few IBM POWER8 servers which we're currently using as OpenStack nova compute nodes. It seems we're regularly running into issues where processes are segfaulting:
| hloeung@gligar:~$ zgrep -E '(SEGV)|(unhandled signal 11)' /var/log/
| Oct 16 23:31:38 gligar kernel: [88351.465559] neutron-
| Oct 16 23:31:38 gligar kernel: [88351.566909] init: neutron-
| Oct 16 23:31:38 gligar kernel: [88351.746611] apport[29500]: unhandled signal 11 at 8850e467250040a8 nip 0000000010201f80 lr 0000000010202984 code 30001
| Oct 16 23:31:39 gligar kernel: [88352.245829] neutron-
| Oct 16 23:31:50 gligar kernel: [88364.040340] neutron-
| Oct 16 23:31:51 gligar kernel: [88364.174218] neutron-
| Oct 16 23:31:52 gligar kernel: [88365.195380] neutron-
| Oct 16 23:31:52 gligar kernel: [88365.362374] neutron-
| Oct 16 23:32:27 gligar kernel: [88400.966976] neutron-
| Oct 16 23:32:47 gligar kernel: [88420.953053] neutron-
| Oct 16 23:34:49 gligar kernel: [88542.778503] neutron-
| Oct 16 23:35:23 gligar kernel: [88576.700721] neutron-
| Oct 16 23:35:23 gligar kernel: [88576.804961] init: neutron-
| Oct 16 23:36:01 gligar kernel: [88614.995497] nova-compute[
| Oct 16 23:36:02 gligar kernel: [88615.110735] nova-compute[4331]: unhandled signal 11 at 88befae9220010a8 nip 00000000100b5c8c lr 000000001014c734 code 30001
| Oct 16 23:36:02 gligar kernel: [88615.219436] init: nova-compute main process (4331) killed by SEGV signal
| Oct 17 03:59:56 gligar kernel: [104449.890256] landscape-
| Oct 17 04:05:00 gligar kernel: [104753.718195] sudo[63915]: unhandled signal 11 at 08e06105d1dcfff8 nip 00003fffb15cf7e4 lr 00003fffb15cfa00 code 30001
| hloeung@floette:~$ zgrep -E '(SEGV)|(unhandled signal 11)' /var/log/
| Oct 14 16:55:30 floette kernel: [149326.697938] rsync[9915]: unhandled signal 11 at 00003ffff7cb0000 nip 00003fffa242d054 lr 00003fffa2426560 code 30001
| Oct 14 21:05:57 floette kernel: [164353.333697] apparmor_
| Oct 14 22:21:24 floette kernel: [168880.481778] neutron-
| Oct 14 22:21:26 floette kernel: [168882.078608] neutron-
| Oct 14 22:21:37 floette kernel: [168893.597834] init: neutron-
| Oct 14 22:21:39 floette kernel: [168894.949777] nova-rootwrap[
| Oct 14 22:21:43 floette kernel: [168898.973700] neutron-
| Oct 14 22:21:44 floette kernel: [168900.785421] neutron-
| Oct 14 22:21:46 floette kernel: [168902.724121] neutron-
| hloeung@patrat:~$ zgrep -E '(SEGV)|(unhandled signal 11)' /var/log/
| Oct 15 00:48:13 patrat kernel: [553143.677075] rsync[89656]: unhandled signal 11 at 00003fffe6a50000 nip 00003fff77e0d054 lr 00003fff77e06560 code 30001
| Oct 16 02:42:03 wailmer kernel: [862104.157449] nova-compute[
| Oct 16 02:42:03 wailmer kernel: [862104.264242] init: nova-compute main process (11431) killed by SEGV signal
| Oct 16 06:38:22 wailmer kernel: [876282.603855] qemu-img[78662]: unhandled signal 11 at 11b625104e000000 nip 00003fffb6224bb4 lr 00003fffb620c42c code 30001
| Oct 16 06:38:23 wailmer kernel: [876283.336045] qemu-system-
| Oct 16 06:39:40 wailmer kernel: [876360.399550] neutron-
| Oct 16 06:39:47 wailmer kernel: [876367.577184] neutron-
| Oct 16 06:39:49 wailmer kernel: [876369.478066] neutron-
| Oct 16 06:39:58 wailmer kernel: [876378.286827] init: neutron-
| Oct 16 06:39:59 wailmer kernel: [876379.211801] sudo[79703]: unhandled signal 11 at 886baddd38005000 nip 886baddd38005000 lr 00003fff7da870a8 code 30001
| Oct 16 06:40:00 wailmer kernel: [876380.344562] libvirtd[109725]: unhandled signal 11 at 88806be02f000000 nip 00003fff78a70684 lr 00003fff78ab7a5c code 30001
| Oct 16 06:40:06 wailmer kernel: [876386.781123] init: libvirt-bin main process (109725) killed by SEGV signal
| Oct 16 06:40:06 wailmer kernel: [876386.818672] sudo[79919]: unhandled signal 11 at 11bda1eb70000000 nip 00003fff82094ac4 lr 00003fff8207c42c code 30001
| Oct 16 06:40:06 wailmer kernel: [876386.921414] neutron-
| Oct 16 06:40:06 wailmer kernel: [876387.024431] init: neutron-
These servers are all running Trusty with the hwe-v kernel (3.19.0-31-generic #36~14.
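To compare hosts and processes, the complete "unhandled signal" lines in the excerpts above can be tallied with a short script. The following is only a sketch: the names `SIGNAL_RE`, `parse_line` and `tally` are hypothetical, and the regular expression is inferred from the complete log lines quoted in this report rather than from any kernel documentation. Lines that are truncated in the excerpt (for example those ending in `neutron-`) do not match and are skipped.

```python
import re
from collections import Counter

# Pattern inferred from the complete lines quoted in this report, e.g.
#   "<date> gligar kernel: [88351.746611] apport[29500]: unhandled signal 11
#    at 8850e467250040a8 nip 0000000010201f80 lr 0000000010202984 code 30001"
# This is an assumption about the layout, not an official format definition.
SIGNAL_RE = re.compile(
    r"(?P<host>\S+) kernel: \[\s*(?P<uptime>\d+\.\d+)\] "
    r"(?P<proc>[^\[\s]+)\[(?P<pid>\d+)\]: unhandled signal (?P<sig>\d+) "
    r"at (?P<addr>[0-9a-f]+) nip (?P<nip>[0-9a-f]+) lr (?P<lr>[0-9a-f]+) "
    r"code (?P<code>[0-9a-f]+)"
)


def parse_line(line):
    """Return the fields of one 'unhandled signal' line as a dict, or None."""
    match = SIGNAL_RE.search(line)
    return match.groupdict() if match else None


def tally(lines):
    """Count matching lines per (host, process name) pair."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:
            counts[(record["host"], record["proc"])] += 1
    return counts
```

Passing the output lines of the `zgrep` commands above to `tally` gives a per-host, per-process count. Because `parse_line` uses `search` rather than `match`, any prefix before the hostname (timestamps, quoting characters) is tolerated.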
ProblemType: Crash
DistroRelease: Ubuntu 14.04
Package: nova-compute 1:2015.
ProcVersionSign
Uname: Linux 3.19.0-30-generic ppc64le
ApportVersion: 2.14.1-0ubuntu3.16
Architecture: ppc64el
CrashDB:
{
}
Date: Fri Oct 16 23:30:00 2015
ExecutablePath: /usr/bin/
InterpreterPath: /usr/bin/python2.7
PackageArchitec
ProcCmdline: /usr/bin/python /usr/bin/
ProcEnviron:
TERM=linux
PATH=(custom, no user)
ProcLoadAvg: 1.98 1.32 1.28 3/1516 7754
ProcSwaps:
Filename Type Size Used Priority
/swap.img file 8388544 0 -1
ProcVersion: Linux version 3.19.0-30-generic (buildd@fisher04) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #34~14.04.1-Ubuntu SMP Fri Oct 2 22:21:52 UTC 2015
Signal: 6
SourcePackage: nova
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: libvirtd
cpu_cores: Number of cores present = 20
cpu_coreson: Number of cores online = 20
cpu_smt: SMT is off
---
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Oct 22 03:34 seq
crw-rw---- 1 root audio 116, 33 Oct 22 03:34 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.18
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 14.04
Lsusb:
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Package: linux-meta-
PciMultimedia:
ProcEnviron:
TERM=xterm
PATH=(custom, no user)
XDG_RUNTIME_
LANG=en_GB
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: root=UUID=
ProcLoadAvg: 3.77 2.83 2.55 3/1574 89091
ProcSwaps:
Filename Type Size Used Priority
/swap.img file 8388544 0 -1
ProcVersion: Linux version 3.19.0-31-generic (buildd@fisher04) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #36~14.04.1-Ubuntu SMP Thu Oct 8 10:25:49 UTC 2015
ProcVersionSign
RelatedPackageV
linux-
linux-
linux-firmware 1.127.16
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty uec-images
Uname: Linux 3.19.0-31-generic ppc64le
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm
_MarkForUpload: True
cpu_cores: Number of cores present = 20
cpu_coreson: Number of cores online = 20
cpu_dscr: DSCR is 0
cpu_freq:
min: 2.016 GHz (cpu 80)
max: 3.691 GHz (cpu 32)
avg: 3.527 GHz
cpu_runmode:
Could not retrieve current diagnostics mode,
No firmware implementation of function
cpu_smt: SMT is off
information type: Private → Public
Changed in apparmor (Ubuntu):
status: New → Invalid
tags: added: kernel-key
tags: added: kernel-da-key removed: kernel-key
Changed in linux (Ubuntu):
assignee: Chris J Arges (arges) → nobody
These machines are in scalingstack, so they run a great many instances, most of which live for less than 15 minutes. The configuration is currently 3.19-on-3.19, and it shows rare memory corruption in guests, as well as frequent segfaults and occasional kernel hangs on the host.