P8 node modoc reboots automatically when running the sru_misc test suite
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
ubuntu-kernel-tests | Triaged | Undecided | Unassigned |
linux (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
Tested with 5 attempts; 4 of them hung around the following test in ubuntu_
# selftests: net: reuseport_bpf_cpu
First attempt:
23:21:32 DEBUG| [stdout] ok 2 selftests: net: reuseport_bpf_cpu
23:21:32 DEBUG| [stdout] # selftests: net: reuseport_bpf_numa
23:21:32 DEBUG| [stdout] # ---- IPv4 UDP ----
(hangs here)
Second attempt:
10:17:35 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
10:17:35 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
10:17:35 DEBUG| [stdout] # ---- IPv4 UDP ----
10:17:35 DEBUG| [stdout] # send cpu 0, receive socket 0
(lines skipped)
10:17:35 DEBUG| [stdout] # send cpu 159, receive socket 159
10:17:35 DEBUG| [stdout] # ---- IPv6 TCP ----
(hangs here)
Third attempt (failed because of a test timeout):
12:46:16 DEBUG| [stdout] # [FAIL]
12:46:16 DEBUG| [stdout] # -------
12:46:16 DEBUG| [stdout] # running psock_tpacket test
12:46:16 DEBUG| [stdout] # -------
13:14:13 INFO | Timer expired (1800 sec.), nuking pid 161853
Fourth attempt:
07:41:51 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
07:41:51 DEBUG| [stdout] # ---- IPv4 UDP ----
07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
(lines skipped)
07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
07:41:51 DEBUG| [stdout] # ---- IPv6 UDP ----
07:41:51 DEBUG| [stdout] # send cpu 0, receive socket 0
07:41:51 DEBUG| [stdout] # send cpu 1, receive socket 1
(lines skipped)
07:41:51 DEBUG| [stdout] # send cpu 157, receive socket 157
07:41:51 DEBUG| [stdout] # send cpu 159, receive socket 159
07:41:51 DEBUG| [stdout] # ---- IPv4 TCP ----
(test hangs here)
Fifth attempt:
04:29:17 DEBUG| [stdout] ok 1 selftests: net: reuseport_bpf
04:29:17 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu
04:29:17 DEBUG| [stdout] # ---- IPv4 UDP ----
04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
(lines skipped)
04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
04:29:17 DEBUG| [stdout] # ---- IPv6 UDP ----
04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
(lines skipped)
04:29:17 DEBUG| [stdout] # send cpu 159, receive socket 159
04:29:17 DEBUG| [stdout] # ---- IPv4 TCP ----
04:29:17 DEBUG| [stdout] # send cpu 0, receive socket 0
(lines skipped)
04:29:17 DEBUG| [stdout] # send cpu 15, receive socket 15
(test hangs here)
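To compare where each attempt stopped, a minimal sketch that scans an autotest log and reports the last selftest started and the last "---- ... ----" section reached. It assumes the `HH:MM:SS DEBUG| [stdout] ` prefix shown in the excerpts above; `last_progress` is a hypothetical helper, not part of autotest or the kernel selftests.

```python
import re
from typing import Iterable, Optional, Tuple

# Autotest stdout prefix as seen in the logs above, e.g.
# "07:41:51 DEBUG| [stdout] # selftests: net: reuseport_bpf_cpu"
PREFIX = re.compile(r"^\d{2}:\d{2}:\d{2} DEBUG\| \[stdout\] ")


def last_progress(lines: Iterable[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (last selftest started, last "---- ... ----" section seen).

    Hypothetical helper, assuming the log format shown in this report.
    """
    test: Optional[str] = None
    section: Optional[str] = None
    for raw in lines:
        m = PREFIX.match(raw)
        if not m:
            continue  # skip non-stdout lines and notes like "(lines skipped)"
        body = raw[m.end():].strip()
        if body.startswith("# selftests:"):
            test = body[len("# selftests:"):].strip()
            section = None  # a new selftest starts without a section
        elif body.startswith("# ----"):
            name = body.strip("# -")
            if name:  # ignore bare separators such as "# -------"
                section = name
    return test, section
```

Applied to the fifth attempt, this would report `("net: reuseport_bpf_cpu", "IPv4 TCP")`, which makes it easy to tabulate the hang point across runs.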
I tried running the tests in this sru-misc suite one by one on this node, in the following order:
'hwclock',
but I couldn't reproduce this issue.
I tried to watch dmesg when this happens, but nothing is printed there; the system silently reboots on its own.
This is what can be seen in syslog after the reboot:
Mar 12 04:27:39 modoc kernel: [ 536.668305] Injecting error (-12) to MEM_GOING_OFFLINE
Mar 12 04:27:39 modoc kernel: [ 536.684547] Injecting error (-12) to MEM_GOING_OFFLINE
Mar 12 04:27:39 modoc kernel: [ 536.700907] Injecting error (-12) to MEM_GOING_OFFLINE
Mar 12 04:27:39 modoc kernel: [ 536.717246] Injecting error (-12) to MEM_GOING_OFFLINE
Mar 12 04:27:39 modoc kernel: [ 536.719288] page:c00c000000
Mar 12 04:27:39 modoc kernel: [ 536.719289] anon
Mar 12 04:27:39 modoc kernel: [ 536.719291] flags: 0x3ffff80008002
Mar 12 04:27:39 modoc kernel: [ 536.719294] raw: 003ffff800080024 5deadbeef0000100 5deadbeef0000122 c000000f8cfe0fd1
Mar 12 04:27:39 modoc kernel: [ 536.719295] raw: 0000000007611c3e 0000000000000000 00000001ffffffff c000000fcfd1c000
Mar 12 04:27:39 modoc kernel: [ 536.719296] page dumped because: unmovable page
Mar 12 04:27:39 modoc kernel: [ 536.719296] page->mem_
Mar 12 04:27:39 modoc kernel: [ 536.735465] Injecting error (-12) to MEM_GOING_OFFLINE
Mar 12 04:27:39 modoc kernel: [ 536.751848] Injecting error (-12) to MEM_GOING_OFFLINE
Mar 12 04:27:39 modoc kernel: [ 536.768210] Injecting error (-12) to MEM_GOING_OFFLINE
Mar 12 04:27:39 modoc kernel: [ 536.784450] Injecting error (-12) to MEM_GOING_OFFLINE
Mar 12 04:27:39 modoc kernel: [ 536.800756] Injecting error (-12) to MEM_GOING_OFFLINE
Mar 12 04:27:39 modoc kernel: [ 536.817006] Injecting error (-12) to MEM_GOING_OFFLINE
Mar 12 04:27:39 modoc kernel: [ 536.833133] Injecting error (-12) to MEM_GOING_OFFLINE
Mar 12 04:27:39 modoc kernel: [ 536.849205] Injecting error (-12) to MEM_GOING_OFFLINE
Mar 12 04:27:39 modoc kernel: [ 536.865448] Injecting error (-12) to MEM_GOING_OFFLINE
^@^@^@^
Mar 12 04:35:41 modoc kernel: [ 0.000000] hash-mmu: Page sizes from device-tree:
Mar 12 04:35:41 modoc kernel: [ 0.000000] hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
Mar 12 04:35:41 modoc kernel: [ 0.000000] hash-mmu: base_shift=12: shift=16, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=7
Mar 12 04:35:41 modoc kernel: [ 0.000000] hash-mmu: base_shift=12: shift=24, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=56
Mar 12 04:35:41 modoc systemd[1]: Started udev Kernel Device Manager.
In the log above, the run of "^@" characters marks the reboot. Judging by the "Injecting error (-12) to MEM_GOING_OFFLINE" messages right before it, it looks like the memory-hotplug test was running at the time.
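"^@" is caret notation for a NUL byte; runs of them in a log file are typically left behind when the machine goes down without a clean shutdown. A minimal sketch for pulling out the last lines written before such a run and the first lines written after it, assuming the log is read as raw bytes; `split_at_nul_run` is a hypothetical helper name.

```python
from typing import List, Tuple


def split_at_nul_run(data: bytes, context: int = 3) -> Tuple[List[str], List[str]]:
    """Return (last `context` lines before the first NUL run,
    first `context` lines after it). Returns ([], []) if there is no NUL byte.
    """
    start = data.find(b"\x00")
    if start == -1:
        return [], []
    end = start
    while end < len(data) and data[end] == 0:  # skip the whole NUL run
        end += 1
    before = data[:start].decode("utf-8", errors="replace").splitlines()
    after = data[end:].decode("utf-8", errors="replace").splitlines()
    # max() avoids the before[-0:] pitfall, which would return every line
    return before[max(0, len(before) - context):], after[:context]
```

On the node this could be fed the bytes of /var/log/syslog opened in binary mode ("rb"); the "before" lines show the last messages that reached disk before the silent reboot.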
We may need to use IPMI to check whether anything is printed on the console.
ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: linux-image-
ProcVersionSign
Uname: Linux 5.3.0-42-generic ppc64le
AlsaDevices:
total 0
crw-rw---- 1 root audio 116, 1 Mar 12 04:33 seq
crw-rw---- 1 root audio 116, 33 Mar 12 04:33 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.11-0ubuntu8.5
Architecture: ppc64el
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Thu Mar 12 09:42:24 2020
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig'
Lsusb:
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
PciMultimedia:
ProcEnviron:
TERM=xterm-
PATH=(custom, no user)
LANG=C.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: root=UUID=
ProcLoadAvg: 0.07 0.02 0.00 1/1461 86637
ProcLocks:
1: POSIX ADVISORY WRITE 3799 00:18:841 0 EOF
2: POSIX ADVISORY WRITE 3526 00:18:743 0 EOF
3: FLOCK ADVISORY WRITE 3720 00:18:837 0 EOF
ProcSwaps:
Filename Type Size Used Priority
/swap.img file 8388544 0 -2
ProcVersion: Linux version 5.3.0-42-generic (buildd@
RelatedPackageV
linux-
linux-
linux-firmware 1.183.4
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
VarLogDump_list: total 0
cpu_cores: Number of cores present = 20
cpu_coreson: Number of cores online = 20
cpu_dscr: DSCR is 0
cpu_freq:
min: 3.694 GHz (cpu 159)
max: 3.695 GHz (cpu 1)
avg: 3.694 GHz
cpu_runmode:
Could not retrieve current diagnostics mode,
No kernel interface to firmware
cpu_smt: SMT=8
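For reference, the CPU numbering in the selftest output follows from this topology: 20 cores online × 8 hardware threads per core (SMT=8) = 160 logical CPUs, numbered 0 to 159, which is why reuseport_bpf_cpu goes from "send cpu 0" to "send cpu 159".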
description: updated
tags: added: kqa-blocker; removed: sru-20200217
description: updated
Changed in ubuntu-kernel-tests:
status: New → Triaged
tags: removed: kqa-blocker
This change was made by a bot.