[LTCTest][OPAL][OP920] INFO: rcu_sched self-detected stall on CPU
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
The Ubuntu-power-systems project | In Progress | High | Canonical Kernel Team |
linux (Ubuntu) | In Progress | High | Unassigned |
Bionic | In Progress | High | Unassigned |
Bug Description
== SRU Justification ==
IBM is seeing kernel traces (rcu_sched self-detected stalls) during testing.
The cause is a missing backport of a kernel fix in the OPAL RTC driver,
commit 682e6b4da5cb. That commit was also cc'd to upstream stable, but
it has not yet landed in Bionic. It fixes upstream commit 628daa8d5abf.
Commit 34dd25de9fe3 is also needed as a prerequisite because it defines
OPAL_BUSY_DELAY_MS.
== Fixes ==
34dd25de9fe3 ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops")
682e6b4da5cb ("rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops")
== Regression Potential ==
Low. Limited to powerpc. Fixes a current regression.
== Test Case ==
A test kernel was built with these patches and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.
== Comment: #0 - PAVAMAN SUBRAMANIYAM <email address hidden> - 2018-05-23 01:15:30 ==
Install a P9 OpenPOWER system with the latest OP920 firmware images provided at the following link:
http://
root@witherspoon:~# cat /etc/os-release
ID="openbmc-
NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro)"
VERSION="ibm-v2.1"
VERSION_
PRETTY_
BUILD_ID=
root@witherspoon:~# cat /var/lib/
IBM-witherspoon
occ-77bb5e6
sbe-a596975
Seeing the following messages in the dmesg logs.
[ 16.377405] ipmi_si: Unable to find any System Interface(s)
[ 17.384118] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[ 1372.711730] INFO: rcu_sched self-detected stall on CPU
[ 1372.711787] 32-....: (5249 ticks this GP) idle=182/
[ 1372.711863] (t=5250 jiffies g=22430 c=22429 q=953)
[ 1372.711921] Task dump for CPU 32:
[ 1372.711922] kworker/32:1 R running task 0 1123 2 0x00000804
[ 1372.711930] Workqueue: events rtc_timer_do_work
[ 1372.711931] Call Trace:
[ 1372.711934] [c000003fd2b97350] [c00000000014a8f8] sched_show_
[ 1372.711939] [c000003fd2b973c0] [c0000000001aa8bc] rcu_dump_
[ 1372.711942] [c000003fd2b97410] [c0000000001a9988] rcu_check_
[ 1372.711945] [c000003fd2b97540] [c0000000001b7c28] update_
[ 1372.711948] [c000003fd2b97570] [c0000000001cf974] tick_sched_
[ 1372.711950] [c000003fd2b975a0] [c0000000001cfa70] tick_sched_
[ 1372.711953] [c000003fd2b975e0] [c0000000001b87d4] __hrtimer_
[ 1372.711956] [c000003fd2b97660] [c0000000001b972c] hrtimer_
[ 1372.711959] [c000003fd2b97730] [c0000000000249f0] __timer_
[ 1372.711962] [c000003fd2b97780] [c000000000024e08] timer_interrupt
[ 1372.711965] [c000003fd2b977b0] [c000000000009054] decrementer_
[ 1372.711970] --- interrupt: 901 at opal_get_
[ 1372.711972] [c000003fd2b97aa0] [c000000000a457b8] opal_get_
[ 1372.711975] [c000003fd2b97ae0] [c000000000a3f98c] __rtc_read_
[ 1372.711977] [c000003fd2b97b60] [c000000000a41738] rtc_timer_
[ 1372.711980] [c000003fd2b97c90] [c000000000134378] process_
[ 1372.711982] [c000003fd2b97d20] [c000000000134718] worker_
[ 1372.711985] [c000003fd2b97dc0] [c00000000013d348] kthread+0x1a8/0x1b0
[ 1372.711988] [c000003fd2b97e30] [c00000000000b658] ret_from_
== Comment: #1 - PAVAMAN SUBRAMANIYAM <email address hidden> - 2018-05-23 01:31:06 ==
== Comment: #2 - Application Cdeadmin <email address hidden> - 2018-05-23 01:33:40 ==
cde00 (<email address hidden>) added native attachment /tmp/AIXOS07311
== Comment: #3 - Application Cdeadmin <email address hidden> - 2018-05-24 16:45:41 ==
==== State: Open by: jayeshp on 24 May 2018 16:42:57 ====
#=#=# 2018-05-24 16:42:54 (CDT) #=#=#
New Fix_Potential = [P920.10W]
#=#=#=#
== Comment: #4 - Stewart Smith <email address hidden> - 2018-05-30 21:15:15 ==
This'll be a missing backport of some kernel fixes in the RTC driver.
It's at least this commit:
commit 682e6b4da5cbe8e
Author: Nicholas Piggin <email address hidden>
Date: Tue Apr 10 21:49:32 2018 +1000
rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops
The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or
OPAL_BUSY_EVENT from firmware, which causes large scheduling
latencies, up to 50 seconds have been observed here when RTC stops
responding (BMC reboot can do it).
Fix this by converting it to the standard form OPAL_BUSY loop that
sleeps.
Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
Cc: <email address hidden> # v3.2+
Signed-off-by: Nicholas Piggin <email address hidden>
Acked-by: Alexandre Belloni <email address hidden>
Signed-off-by: Michael Ellerman <email address hidden>
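To illustrate the "OPAL_BUSY loop that sleeps" described in the commit message above, here is a minimal user-space sketch in C. It is not the kernel patch. The firmware call is replaced by a hypothetical mock (mock_opal_rtc_read), the helper names (sleep_ms, rtc_read_with_backoff) are made up for this example, and the numeric values of the return codes and of OPAL_BUSY_DELAY_MS are assumptions. The point is only the shape of the loop: when the call reports busy, the caller sleeps before retrying instead of spinning on the CPU.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Names follow the bug report; the numeric values are assumptions for this sketch. */
#define OPAL_SUCCESS         0
#define OPAL_BUSY           -2
#define OPAL_BUSY_EVENT    -12
#define OPAL_BUSY_DELAY_MS  10

/* Hypothetical mock state: how many more calls report busy, and how often we slept. */
static int mock_busy_calls_left;
static int mock_sleep_count;

/* Hypothetical stand-in for the firmware RTC read: busy N times, then success. */
static int64_t mock_opal_rtc_read(uint32_t *ymd)
{
    if (mock_busy_calls_left > 0) {
        mock_busy_calls_left--;
        return OPAL_BUSY;
    }
    *ymd = 0x20180523; /* arbitrary placeholder date value */
    return OPAL_SUCCESS;
}

/* Sleep for ms milliseconds, yielding the CPU, and count the call. */
static void sleep_ms(long ms)
{
    struct timespec ts = { ms / 1000, (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
    mock_sleep_count++;
}

/* Retry while busy, sleeping between attempts rather than busy-waiting. */
static int64_t rtc_read_with_backoff(uint32_t *ymd)
{
    int64_t rc = OPAL_BUSY;

    assert(ymd != NULL);
    while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
        rc = mock_opal_rtc_read(ymd);
        if (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT)
            sleep_ms(OPAL_BUSY_DELAY_MS);
    }
    return rc;
}
```

With mock_busy_calls_left set to 3, rtc_read_with_backoff() sleeps three times and then returns OPAL_SUCCESS. A loop without the sleep would retry back to back for as long as the firmware stays busy, which matches the scheduling latency the commit message describes.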
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → High
tags: added: p9 triage-g
Changed in ubuntu-power-systems:
status: New → Triaged
Changed in linux (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
status: Triaged → In Progress
Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
Changed in ubuntu-power-systems:
status: Triaged → In Progress
tags: added: cscc