IPL: ppc64_cpu --frequency hang with INFO: rcu_sched detected stalls on CPUs/tasks on w34 and wsbmc016 with 920.1714.20170330n

Bug #1773964 reported by bugproxy on 2018-05-29
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
The Ubuntu-power-systems project
Critical
Canonical Kernel Team
linux (Ubuntu)
Critical
Joseph Salisbury
Bionic
Critical
Joseph Salisbury

Bug Description

== SRU Justification ==
IBM is requesting these two commits in powerpc to fix an error they are
seeing:
IPL: ppc64_cpu --frequency hang with INFO: rcu_sched detected stalls on
CPUs/tasks on w34 and wsbmc016 with 920.1714.20170330n

The first commit reduces some existing high latency delays. The second
commit fixes the fact that the OPAL RTC driver does not sleep in case it gets OPAL_BUSY or
OPAL_BUSY_EVENT from firmware.

== Fixes ==
34dd25de9fe3 ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops")
682e6b4da5cb ("rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops")

== Regression Potential ==
Low. The first commit is specific to powerpc. The second commit is, commit 628daa8d5abf
and it been cc'd to upstream stable to fix a regression, so it has had
additional upstream review.

== Test Case ==
A test kernel was built with these patches and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

== Comment: #0 - Application Cdeadmin <> - 2018-05-22 09:36:12 ==

== Comment: #1 - Application Cdeadmin <> - 2018-05-22 09:36:14 ==
==== State: Open by: swanman on 22 May 2018 09:17:33 ====

Both w34 and wsbmc016 are stopped on this failure, where we get hung up on the following FFDC gathering step in the OS:

#(CDT) 2018/05/22 05:07:19.398218 - 0.211662 - Issuing: OS Execute Command ppc64_cpu --frequency ignore_err=1
#(CDT) 2018/05/22 05:07:19.437808 - 0.024108 - Issuing: ppc64_cpu --frequency

, and the last console output shows:
# tail -200 /var/log/obmc-console.log
s this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=241989
[13668.719850] (detected by 3, t=3075620 jiffies, g=30289, c=30288, q=796881)
[13732.630691] INFO: rcu_sched detected stalls on CPUs/tasks:
[13732.630727] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=242996
[13732.630757] (detected by 3, t=3091597 jiffies, g=30289, c=30288, q=799822)
[13795.649568] INFO: rcu_sched detected stalls on CPUs/tasks:
[13795.649597] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=243942
[13795.649631] (detected by 7, t=3107351 jiffies, g=30289, c=30288, q=802916)
[13860.140483] INFO: rcu_sched detected stalls on CPUs/tasks:
[13860.140516] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=244940
[13860.140557] (detected by 3, t=3123473 jiffies, g=30289, c=30288, q=806646)
[13924.731404] INFO: rcu_sched detected stalls on CPUs/tasks:
[13924.731438] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=245874
[13924.731484] (detected by 3, t=3139620 jiffies, g=30289, c=30288, q=809909)
[13988.642290] INFO: rcu_sched detected stalls on CPUs/tasks:
[13988.642321] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=246892
[13988.642362] (detected by 3, t=3155596 jiffies, g=30289, c=30288, q=813048)
[14052.661176] INFO: rcu_sched detected stalls on CPUs/tasks:
[14052.661211] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=247857
[14052.661266] (detected by 3, t=3171600 jiffies, g=30289, c=30288, q=816185)
[14115.680024] INFO: rcu_sched detected stalls on CPUs/tasks:
[14115.680051] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=248819
[14115.680085] (detected by 4, t=3187355 jiffies, g=30289, c=30288, q=819190)
[14180.154911] INFO: rcu_sched detected stalls on CPUs/tasks:
[14180.154945] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=249833
[14180.154986] (detected by 132, t=3203473 jiffies, g=30289, c=30288, q=822138)
[14244.968067]I F: Ne talls onCPUs/tasks
[14244669838] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=2[14244.669944] (detected by 3, t=3219600 jiffies, g=30289, c=30288, q=825384)
[14308.748680] Id detected stalls on CPUs/tasks:
[14308.748719] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=251755
[14308.748755] (detected by 3, t=3235620 jiffies, g=30289, c=30288, q=828445)
[14333.081775] Watchdog CPU:132 Hard LOCKUP
[14344.750295] Watchdog CPU:132 became unstuck
[14372.735558] INFO: rcu_sched detected stalls on CPUs/tasks:
[14372.735587] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=252722
[14372.735611] (detected by 6, t=3251616 jiffies, g=30289, c=30288, q=831375)
[14435.910193] INFO: rcu_sched detected stalls on CPUs/tasks:
[14435.910219] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=253675
[14435.910253] (detected by 3, t=3267409 jiffies, g=30289, c=30288, q=834375)
[14500.167195] INFO: rcu_sched detected stalls on CPUs/tasks:
[14500.167230] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=254601
[14500.167273] (detected by 134, t=3283473 jiffies, g=30289, c=30288, q=837600)
[14564.664395] INFO: rcu_sched detected stalls on CPUs/tasks:
[14564.664427] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=255493
[14564.664470] (detected by 3, t=3299596 jiffies, g=30289, c=30288, q=840539)
[14628.665755] INFO: rcu_sched detected stalls on CPUs/tasks:
[14628.665790] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=256471
[14628.665824] (detected by 3, t=3315596 jiffies, g=30289, c=30288, q=843505)
[14692.759268] INFO: rcu_sched detected stalls on CPUs/tasks:
[14692.759299] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=257372
[14692.759330] (detected by 3, t=3331620 jiffies, g=30289, c=30288, q=846522)
[14755.916887] INFO: rcu_sched detected stalls on CPUs/tasks:
[14755.916918] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=258248
[14755.916943] (detected by 3, t=3347409 jiffies, g=30289, c=30288, q=849740)
[14820.174651] INFO: rcu_sched detected stalls on CPUs/tasks:
[14820.174680] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=259206
[14820.174710] (detected by 3, t=3363473 jiffies, g=30289, c=30288, q=852406)
[14884.688523] IO:N Fcr_suhecd 8 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=260089
[14884.688651] y 3, t=3379600 jiffies, g=30289, c=30288, q=855646)
[14948.674473] I stalls on CPUs/tasks:
[14948.674517] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=261047
[14948.674568] (detected by 3, t=3395597 jiffies, g=30289, c=30288, q=858963)
[15012.676503] INFO: rcu_sched detected stalls on CPUs/tasks:
[15012.676531] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=262011
[15012.676579] (detected by 3, t=3411596 jiffies, g=30289, c=30288, q=861906)
[15075.694581] INFO: rcu_sched detected stalls on CPUs/tasks:
[15075.694612] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=262943
[15075.694645] (detected by 7, t=3427351 jiffies, g=30289, c=30288, q=864958)
[15140.184 77P] 88FdelCU75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirqfqs=263861
[15140.184895] (detected by 132, t=3443473 jiffies, g=30289, c=30288, q=868102)
[15204.774985] Isched detected stalls on CPUs/tasks:
[15204.775026] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=264786
[15204.775075] (detected by 3, t=3459620 jiffies, g=30289, c=30288, q=871323)
[158.2706298 4]FOIl asksP
1526[870277]. 75-...: (5904ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=265714
[15268.709381] ce by 3, t=3475603 jiffies, g=30289, c=30288, q=874564)
[15332.707555] Iched detected stalls on CPUs/tasks:
[15332.707594] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=266614
[15332.707641] (detected by 3, t=3491602 jiffies, g=30289, c=30288, q=877578)
[15395.729876] INFO: rcu_sched detected stalls on CPUs/tasks:
[15395.729904] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=267583
[15395.729947] (detected by 7, t=3507357 jiffies, g=30289, c=30288, q=880546)
[15460.196261] INFO: rcu_sched detected stalls on CPUs/tasks:
[15460.196293] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=268585
[15460.196324] (detected by 3, t=3523473 jiffies, g=30289, c=30288, q=883448)
[15524.714688] INFO: rcu_sched detected stalls on CPUs/tasks:
[15524.714728] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=269506
[15524.714766] (detected by 3, t=3539601 jiffies, g=30289, c=30288, q=886049)
[15588.717133] INFO: rcu_sched detected stalls on CPUs/tasks:
[15588.717167] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=270465
[15588.717201] (detected by 6, t=3555602 jiffies, g=30289, c=30288, q=889197)
[15652.791581] INFO: rcu_sched detected stalls on CPUs/tasks:
[15652.791613] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=271387
[15652.791653] (detected by 3, t=3571620 jiffies, g=30289, c=30288, q=891919)
[15715.950025] INFO: rcu_sched detected stalls on CPUs/tasks:
[15715.950057] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=272317
[15715.950081] (detected by 3, t=3587409 jiffies, g=30289, c=30288, q=894689)
[15780.208530] INFO: rcu_sched detected stalls on CPUs/tasks:
[15780.208562] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=273264
[15780.208592] (detected by 3, t=3603473 jiffies, g=30289, c=30288, q=897742)
[15844.799063] INFO: rcu_sched detected stalls on CPUs/tasks:
[15844.799092] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=274155
[15844.799125] (detected by 3, t=3619620 jiffies, g=30289, c=30288, q=900825)
[15908.713583]I O: rcu_NFhsoas:
[15908.13616] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=275069
[15908.713715] 3 t=3635597 jiffies, g=30289, c=30288, q=903951)
[15972.712119] Ited stalls on CPUs/tasks:
[15972.712163] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=276037
[15972.712200] (detected by 3, t=3651597 jiffies, g=30289, c=30288, q=906870)
[16035.730638] INFO: rcu_sched detected stalls on CPUs/tasks:
[16035.730674] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=277022
[16035.730712] (detected by 130, t=3667351 jiffies, g=30289, c=30288, q=909454)
[16100.221204] INFO: rcu_sched detected stalls on CPUs/tasks:
[16100.221234] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=277993
[16100.221274] (detected by 3, t=3683473 jiffies, g=30289, c=30288, q=912612)
[16164.751789] INFO:rO _cuscd a:
s[1616.16164.751[]2 2 75-....: (504 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=278956
[16164.751920] td y 3, t=3699604 jiffies, g=30289, c=30288, q=915666)
[16228.722360] I stalls on CPUs/tasks:
[16228.722404] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=279926
[16228.722443] (detected by 3, t=3715596 jiffies, g=30289, c=30288, q=918675)
[16292.720966] INFO: rcu_sched detected stalls on CPUs/tasks:
[16292.721030] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=280867
[16292.721072] (detected by 25, t=3731596 jiffies, g=30289, c=30288, q=921494)
[16355.743495] INFO: rcu_sched detected stalls on CPUs/tasks:
[16355.743522] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=281853
[16355.743547] (detected by 6, t=3747351 jiffies, g=30289, c=30288, q=924261)
[16420.234094] INFO: rcu_sched detected stalls on CPUs/tasks:
[16420.234121] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=282785
[16420.234162] (detected by 3, t=3763473 jiffies, g=30289, c=30288, q=927318)
[16484.755k5] IN75-....:7 (9504t icks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=283749
[16484.752853] eted by3, t=3779601 jiffies, g=30289, c=30288, q=930422)
[16548.827505] Icted stalls on CPUs/tasks:
[16548.827547] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=284725
[16548.827592] (detected by 3, t=3795620 jiffies, g=30289, c=30288, q=933304)
[16612.758266] INFO: rcu_sched detected stalls on CPUs/tasks:
[16612.758301] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=285631
[16612.758345] (detected by 3, t=3811601 jiffies, g=30289, c=30288, q=936097)
[16675.780986] INFO: rcu_sched detected stalls on CPUs/tasks:
[16675.781016] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=286540
[16675.781053] (detected by 130, t=3827357 jiffies, g=30289, c=30288, q=938924)
[16740.247733] INFO: rcu_sched detected stalls on CPUs/tasks:
[16740.247767] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=287491
[16740.247807] (detected by 3, t=3843473 jiffies, g=30289, c=30288, q=942104)
[16804.762481] NIO:rFttslsa no PCUs/aks
[s160:4.2511] 7675-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=288410
[16804.762616] dteced by 3, t=3859601 jiffies, g=30289, c=30288, q=945426)
[16868.841201] I detected stalls on CPUs/tasks:
[16868.841242] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=289383
[16868.841281] (detected by 3, t=3875620 jiffies, g=30289, c=30288, q=948853)
[16932.751907] INFO: rcu_sched detected stalls on CPUs/tasks:
[16932.751938] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=290305
[16932.751978] (detected by 3, t=3891597 jiffies, g=30289, c=30288, q=952379)
[16995.770578] INFO: rcu_sched detected stalls on CPUs/tasks:
[16995.770613] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=291234
[16995.770651] (detected by 130, t=3907351 jiffies, g=30289, c=30288, q=955472)
[17060.261283] INFO: rcu_sched detected stalls on CPUs/tasks:
[17060.261314] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=292159
[17060.261354] (detected by 3, t=3923473 jiffies, g=30289, c=30288, q=959072)
[17124.791994] INFO: rcu_sched detected stalls on CPUs/tasks:
[17124.792029] 75-....: (5904 ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=293083
[17124.792069] (detected by 3, t=3939604 jiffies, g=30289, c=30288, q=962221)
71888.57141688]075- ..:.59 (0 4ticks this GP) idle=82a/140000000000000/0 softirq=2689/2689 fqs=294027
[17188.854815] t=3955620 jiffies, g=30289, c=30288, q=965699)

-----------------------------------
Keyword 'Check State' failed after retrying for 15 minutes. The last error was:
#(CDT) 2018/05/22 05:05:49.896460 - 10.672026 - **ERROR** The current state of the machine does NOT match the match state:
state:
state[rest]: 1
state[host]: Off
state[operating_system]: Inactive
state[boot_progress]: Unspecified
state[chassis]: On
state[os_login]: 0
state[os_run_cmd]: 0
state[os_ping]: 0

boot_end_time: (CDT) 2018/05/22 05:05:49.900677
#(CDT) 2018/05/22 05:05:49.900677 - 0.004217 - BOOT_FAILED: "REST OBMC Reboot (run)" failed.
build_name: 920.1808.20180518b
platform: witherspoon

== Comment: #2 - Application Cdeadmin <> - 2018-05-22 09:56:15 ==

==== State: Open by: swanman on 22 May 2018 09:53:43 ====

#=#=# 2018-05-22 09:53:40 (CDT) #=#=#
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#

== Comment: #3 - Application Cdeadmin <> - 2018-05-22 12:42:02 ==
==== State: Open by: swanman on 22 May 2018 12:31:06 ====

Unfortunately we lost the hang state on wsbmc016, but here is what I was able to save off prior ...

Looks like someone stopped the test on wsbmc016, so we lost the failed state
#(CDT) 2018/05/21 17:12:19.564819 - 0.014776 - Issuing: ppc64_cpu --frequency
#(CDT) 2018/05/22 11:32:36.223134 - 66386.779704 - Executing: obmc_boot::exit_function "EXIT" "0"
#(CDT) 2018/05/22 11:32:36.286650 - 0.063516 - Issuing: kill_autoend autoend_pid

root@witherspoon:~# tail -200 /var/log/obmc-console.log
c000000fdd92f720] [c0000000002061ac] tick_sched_timer+0x8c/0x1b0
[54800.623485] [c000000fdd92f760] [c0000000001ea198] __hrtimer_run_queues+0x118/0x3e0
[54800.623593] [c000000fdd92f800] [c0000000001eb3ac] hrtimer_interrupt+0xfc/0x310
[54800.623692] [c000000fdd92f8d0] [c000000000025ed0] __timer_interrupt+0x90/0x270
[54800.623791] [c000000fdd92f920] [c0000000000262e0] timer_interrupt+0xa0/0xe0
[54800.623856] [c000000fdd92f950] [c000000000008f84] decrementer_common+0x114/0x120
[54800.623973] --- interrupt: 901 at smp_call_function_single+0x180/0x2b0
[54800.623973] LR = smp_call_function_single+0x15c/0x2b0
[54800.624130] [c000000fdd92fc40] [c000000fdd92fc90] 0xc000000fdd92fc90 (unreliable)
[54800.624236] [c000000fdd92fca0] [c0000000002d7b64] perf_install_in_context+0x1d4/0x250
[54800.624323] [c000000fdd92fd30] [c0000000002e9784] SyS_perf_event_open+0xb14/0xda0
[54800.624430] [c000000fdd92fe30] [c00000000000b004] system_call+0x38/0xe0
[54980.670091] INFO: rcu_sched self-detected stall on CPU
[54980.670137] 49-...: (5479214 ticks this GP) idle=72b/140000000000001/0 softirq=706/706 fqs=2714915
[54980.670211] (t=5479215 jiffies g=5834 c=5833 q=3610838)
[54980.670267] Task dump for CPU 49:
[54980.670291] ppc64_cpu R running task 0 6588 6516 0x00042084
[54980.670401] Call Trace:
[54980.670429] [c000000fdd92f500] [c000000000bfdddc] sched_show_task.part.61+0x80/0x98 (unreliable)
[54980.670565] [c000000fdd92f570] [c000000000bfe50c] rcu_dump_cpu_stacks+0xd8/0x13c
[54980.670655] [c000000fdd92f5c0] [c0000000001dfc84] rcu_check_callbacks+0xbf4/0xcd0
[54980.670774] [c000000fdd92f6e0] [c0000000001e9264] update_process_times+0x84/0xf0
[54980.670890] [c000000fdd92f720] [c0000000002061ac] tick_sched_timer+0x8c/0x1b0
[54980.671009] [c000000fdd92f760] [c0000000001ea198] __hrtimer_run_queues+0x118/0x3e0
[54980.671154] [c000000fdd92f800] [c0000000001eb3ac] hrtimer_interrupt+0xfc/0x310
[54980.671270] [c000000fdd92f8d0] [c000000000025ed0] __timer_interrupt+0x90/0x270
[54980.671367] [c000000fdd92f920] [c0000000000262e0] timer_interrupt+0xa0/0xe0
[54980.671465] [c000000fdd92f950] [c000000000008f84] decrementer_common+0x114/0x120
[54980.671574] --- interrupt: 901 at smp_call_function_single+0x188/0x2b0
[54980.671574] LR = smp_call_function_single+0x15c/0x2b0
[54980.671689] [c000000fdd92fc40] [c000000fdd92fc90] 0xc000000fdd92fc90 (unreliable)
[54980.671806] [c000000fdd92fca0] [c0000000002d7b64] perf_install_in_context+0x1d4/0x250
[54980.671913] [c000000fdd92fd30] [c0000000002e9784] SyS_perf_event_open+0xb14/0xda0
[54980.672029] [c000000fdd92fe30] [c00000000000b004] system_call+0x38/0xe0
[55160.717672] INFO: rcu_sched self-detected stall on CPU
[55160.717720] 49-...: (5497218 ticks this GP) idle=72b/140000000000001/0 softirq=706/706 fqs=2723789
[55160.717803] (t=5497219 jiffies g=5834 c=5833 q=3621302)
[55160.717869] Task dump for CPU 49:
[55160.717903] ppc64_cpu R running task 0 6588 6516 0x00042084
[55160.718019] Call Trace:
[55160.718059] [c000000fdd92f500] [c000000000bfdddc] sched_show_task.part.61+0x80/0x98 (unreliable)
[55160.718193] [c000000fdd92f570] [c000000000bfe50c] rcu_dump_cpu_stacks+0xd8/0x13c
[55160.718290] [c000000fdd92f5c0] [c0000000001dfc84] rcu_check_callbacks+0xbf4/0xcd0
[55160.718421] [c000000fdd92f6e0] [c0000000001e9264] update_process_times+0x84/0xf0
[55160.718540] [c000000fdd92f720] [c0000000002061ac] tick_sched_timer+0x8c/0x1b0
[55160.718641] [c000000fdd92f760] [c0000000001ea198] __hrtimer_run_queues+0x118/0x3e0
[55160.718739] [c000000fdd92f800] [c0000000001eb3ac] hrtimer_interrupt+0xfc/0x310
[55160.718838] [c000000fdd92f8d0] [c000000000025ed0] __timer_interrupt+0x90/0x270
[55160.718934] [c000000fdd92f920] [c0000000000262e0] timer_interrupt+0xa0/0xe0
[55160.719013] [c000000fdd92f950] [c000000000008f84] decrementer_common+0x114/0x120
[55160.719113] --- interrupt: 901 at smp_call_function_single+0x184/0x2b0
[55160.719113] LR = smp_call_function_single+0x15c/0x2b0
[55160.719277] [c000000fdd92fc40] [c000000fdd92fc90] 0xc000000fdd92fc90 (unreliable)
[55160.719373] [c000000fdd92fca0] [c0000000002d7b64] perf_install_in_context+0x1d4/0x250
[55160.719463] [c000000fdd92fd30] [c0000000002e9784] SyS_perf_event_open+0xb14/0xda0
[55160.719559] [c000000fdd92fe30] [c00000000000b004] system_call+0x38/0xe0
[55340.765251] INFO: rcu_sched self-detected stall on CPU
[55340.765305] 49-...: (5515222 ticks this GP) idle=72b/140000000000001/0 softirq=706/706 fqs=2732670
[55340.765392] (t=5515223 jiffies g=5834 c=5833 q=3631900)
[55340.765458] Task dump for CPU 49:
[55340.765492] ppc64_cpu R running task 0 6588 6516 0x00042084
[55340.765575] Call Trace:
[55340.765603] [c000000fdd92f500] [c000000000bfdddc] sched_show_task.part.61+0x80/0x98 (unreliable)
[55340.765727] [c000000fdd92f570] [c000000000bfe50c] rcu_dump_cpu_stacks+0xd8/0x13c
[55340.765818] [c000000fdd92f5c0] [c0000000001dfc84] rcu_check_callbacks+0xbf4/0xcd0
[55340.765927] [c000000fdd92f6e0] [c0000000001e9264] update_process_times+0x84/0xf0
[55340.766023] [c000000fdd92f720] [c0000000002061ac] tick_sched_timer+0x8c/0x1b0
[55340.766123] [c000000fdd92f760] [c0000000001ea198] __hrtimer_run_queues+0x118/0x3e0
[55340.766239] [c000000fdd92f800] [c0000000001eb3ac] hrtimer_interrupt+0xfc/0x310
[55340.766368] [c000000fdd92f8d0] [c000000000025ed0] __timer_interrupt+0x90/0x270
[55340.766509] [c000000fdd92f920] [c0000000000262e0] timer_interrupt+0xa0/0xe0
[55340.766609] [c000000fdd92f950] [c000000000008f84] decrementer_common+0x114/0x120
[55340.766726] --- interrupt: 901 at smp_call_function_single+0x180/0x2b0
[55340.766726] LR = smp_call_function_single+0x15c/0x2b0
[55340.766886] [c000000fdd92fc40] [c000000fdd92fc90] 0xc000000fdd92fc90 (unreliable)
[55340.766993] [c000000fdd92fca0] [c0000000002d7b64] perf_install_in_context+0x1d4/0x250
[55340.767093] [c000000fdd92fd30] [c0000000002e9784] SyS_perf_event_open+0xb14/0xda0
[55340.767171] [c000000fdd92fe30] [c00000000000b004] system_call+0x38/0xe0
[55520.812831] INFO: rcu_sched self-detected stall on CPU
[55520.812888] 49-...: (5533226 ticks this GP) idle=72b/140000000000001/0 softirq=706/706 fqs=2741558
[55520.812971] (t=5533227 jiffies g=5834 c=5833 q=3642556)
[55520.813046] Task dump for CPU 49:
[55520.813090] ppc64_cpu R running task 0 6588 6516 0x00042084
[55520.813189] Call Trace:
[55520.813208] [c000000fdd92f500] [c000000000bfdddc] sched_show_task.part.61+0x80/0x98 (unreliable)
[55520.813344] [c000000fdd92f570] [c000000000bfe50c] rcu_dump_cpu_stacks+0xd8/0x13c
[55520.813461] [c000000fdd92f5c0] [c0000000001dfc84] rcu_check_callbacks+0xbf4/0xcd0
[55520.813551] [c000000fdd92f6e0] [c0000000001e9264] update_process_times+0x84/0xf0
[55520.813672] [c000000fdd92f720] [c0000000002061ac] tick_sched_timer+0x8c/0x1b0
[55520.813768] [c000000fdd92f760] [c0000000001ea198] __hrtimer_run_queues+0x118/0x3e0
[55520.813903] [c000000fdd92f800] [c0000000001eb3ac] hrtimer_interrupt+0xfc/0x310
[55520.814011] [c000000fdd92f8d0] [c000000000025ed0] __timer_interrupt+0x90/0x270
[55520.814110] [c000000fdd92f920] [c0000000000262e0] timer_interrupt+0xa0/0xe0
[55520.814198] [c000000fdd92f950] [c000000000008f84] decrementer_common+0x114/0x120
[55520.814296] --- interrupt: 901 at smp_call_function_single+0x180/0x2b0
[55520.814296] LR = smp_call_function_single+0x15c/0x2b0
[55520.814449] [c000000fdd92fc40] [c000000fdd92fc90] 0xc000000fdd92fc90 (unreliable)
[55520.814553] [c000000fdd92fca0] [c0000000002d7b64] perf_install_in_context+0x1d4/0x250
[55520.814658] [c000000fdd92fd30] [c0000000002e9784] SyS_perf_event_open+0xb14/0xda0
[55520.814756] [c000000fdd92fe30] [c00000000000b004] system_call+0x38/0xe0
[55700.860411] INFO: rcu_sched self-detected stall on CPU
[55700.860464] 49-...: (5551230 ticks this GP) idle=72b/140000000000001/0 softirq=706/706 fqs=2750435
[55700.860577] (t=5551231 jiffies g=5834 c=5833 q=3653059)
[55700.860635] Task dump for CPU 49:
[55700.860688] ppc64_cpu R running task 0 6588 6516 0x00042084
[55700.860777] Call Trace:
[55700.860815] [c000000fdd92f500] [c000000000bfdddc] sched_show_task.part.61+0x80/0x98 (unreliable)
[55700.860949] [c000000fdd92f570] [c000000000bfe50c] rcu_dump_cpu_stacks+0xd8/0x13c
[55700.861049] [c000000fdd92f5c0] [c0000000001dfc84] rcu_check_callbacks+0xbf4/0xcd0
[55700.861166] [c000000fdd92f6e0] [c0000000001e9264] update_process_times+0x84/0xf0
[55700.861286] [c000000fdd92f720] [c0000000002061ac] tick_sched_timer+0x8c/0x1b0
[55700.861386] [c000000fdd92f760] [c0000000001ea198] __hrtimer_run_queues+0x118/0x3e0
[55700.861484] [c000000fdd92f800] [c0000000001eb3ac] hrtimer_interrupt+0xfc/0x310
[55700.861588] [c000000fdd92f8d0] [c000000000025ed0] __timer_interrupt+0x90/0x270
[55700.861684] [c000000fdd92f920] [c0000000000262e0] timer_interrupt+0xa0/0xe0
[55700.861773] [c000000fdd92f950] [c000000000008f84] decrementer_common+0x114/0x120
[55700.861879] --- interrupt: 901 at smp_call_function_single+0x188/0x2b0
[55700.861879] LR = smp_call_function_single+0x15c/0x2b0
[55700.862026] [c000000fdd92fc40] [c000000fdd92fc90] 0xc000000fdd92fc90 (unreliable)
[55700.862127] [c000000fdd92fca0] [c0000000002d7b64] perf_install_in_context+0x1d4/0x250
[55700.862227] [c000000fdd92fd30] [c0000000002e9784] SyS_perf_event_open+0xb14/0xda0
[55700.862331] [c000000fdd92fe30] [c00000000000b004] system_call+0x38/0xe0
[55880.907991] INFO: rcu_sched self-detected stall on CPU
[55880.908047] 49-...: (5569234 ticks this GP) idle=72b/140000000000001/0 softirq=706/706 fqs=2759321
[55880.908131] (t=5569235 jiffies g=5834 c=5833 q=3663812)
[55880.908187] Task dump for CPU 49:
[55880.908230] ppc64_cpu R running task 0 6588 6516 0x00042084
[55880.908328] Call Trace:
[55880.908357] [c000000fdd92f500] [c000000000bfdddc] sched_show_task.part.61+0x80/0x98 (unreliable)
[55880.908471] [c000000fdd92f570] [c000000000bfe50c] rcu_dump_cpu_stacks+0xd8/0x13c
[55880.908593] [c000000fdd92f5c0] [c0000000001dfc84] rcu_check_callbacks+0xbf4/0xcd0
[55880.908707] [c000000fdd92f6e0] [c0000000001e9264] update_process_times+0x84/0xf0
[55880.908822] [c000000fdd92f720] [c0000000002061ac] tick_sched_timer+0x8c/0x1b0
[55880.908939] [c000000fdd92f760] [c0000000001ea198] __hrtimer_run_queues+0x118/0x3e0
[55880.909070] [c000000fdd92f800] [c0000000001eb3ac] hrtimer_interrupt+0xfc/0x310
[55880.909170] [c000000fdd92f8d0] [c000000000025ed0] __timer_interrupt+0x90/0x270
[55880.909263] [c000000fdd92f920] [c0000000000262e0] timer_interrupt+0xa0/0xe0
[55880.909350] [c000000fdd92f950] [c000000000008f84] decrementer_common+0x114/0x120
[55880.909443] --- interrupt: 901 at smp_call_function_single+0x188/0x2b0
[55880.909600] [c000000fdd92fc40] [c000000fdd92fc90] 0xc000000fdd92fc90 (unreliable)
[55880.909678] [c000000fdd92fca0] [c0000000002d7b64] perf_install_in_context+0x1d4/0x250
[55880.909809] [c000000fdd92fd30] [c0000000002e9784] SyS_perf_event_open+0xb14/0xda0
[55880.909906] [c000000fdd92fe30] [c00000000000b004] system_call+0x38/0xe0
[56060.955572] INFO: rcu_sched self-detected stall on CPU
[56060.955620] 49-...: (5587238 ticks this GP) idle=72b/140000000000001/0 softirq=706/706 fqs=2768201
[56060.955694] (t=5587239 jiffies g=5834 c=5833 q=3673981)
[56060.955760] Task dump for CPU 49:
[56060.955826] ppc64_cpu R running task 0 6588 6516 0x00042084
[56060.955944] Call Trace:
[56060.955993] [c000000fdd92f500] [c000000000bfdddc] sched_show_task.part.61+0x80/0x98 (unreliable)
[56060.956127] [c000000fdd92f570] [c000000000bfe50c] rcu_dump_cpu_stacks+0xd8/0x13c
[56060.956244] [c000000fdd92f5c0] [c0000000001dfc84] rcu_check_callbacks+0xbf4/0xcd0
[56060.956352] [c000000fdd92f6e0] [c0000000001e9264] update_process_times+0x84/0xf0
[56060.956455] [c000000fdd92f720] [c0000000002061ac] tick_sched_timer+0x8c/0x1b0
[56060.956550] [c000000fdd92f760] [c0000000001ea198] __hrtimer_run_queues+0x118/0x3e0
[56060.956647] [c000000fdd92f800] [c0000000001eb3ac] hrtimer_interrupt+0xfc/0x310
[56060.956749] [c000000fdd92f8d0] [c000000000025ed0] __timer_interrupt+0x90/0x270
[56060.956848] [c000000fdd92f920] [c0000000000262e0] timer_interrupt+0xa0/0xe0
[56060.956935] [c000000fdd92f950] [c000000000008f84] decrementer_common+0x114/0x120
[56060.957039] --- interrupt: 901 at smp_call_function_single+0x184/0x2b0
[56060.957039] LR = smp_call_function_single+0x15c/0x2b0
[56060.957194] [c000000fdd92fc40] [c000000fdd92fc90] 0xc000000fdd92fc90 (unreliable)
[56060.957293] [c000000fdd92fca0] [c0000000002d7b64] perf_install_in_context+0x1d4/0x250
[56060.957394] [c000000fdd92fd30] [c0000000002e9784] SyS_perf_event_open+0xb14/0xda0
[56060.957492] [c000000fdd92fe30] [c00000000000b004] system_call+0x38/0xe0
root@witherspoon:~#

== Comment: #9 - Shilpasri G. Bhat <email address hidden> - 2018-05-28 00:40:16 ==
Stewart has pointed to these missing kernel commits required for this bug:

1) 682e6b4da5cbe8e9a53f979a58c2a9d7dc997175 rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops
2) 34dd25de9fe3f60bfdb31b473bf04b28262d0896 powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-168095 severity-critical targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Changed in ubuntu-power-systems:
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g

------- Comment From <email address hidden> 2018-05-29 10:55 EDT-------
== Comment: #3 - Application Cdeadmin <> - 2018-05-22 12:42:02 ====== State: Open by: swanman on 22 May 2018 09:17:33 ====

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-29 11:05 EDT-------
== Comment: #1 - Application Cdeadmin <> - 2018-05-22 09:36:14 ====== State: Open by: swanman on 22 May 2018 09:17:33 ====

==== State: Working by: cde00 on 29 May 2018 09:57:15 ====

Changed in linux (Ubuntu):
importance: Undecided → Critical
status: New → Triaged
Changed in linux (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → Critical
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
status: Triaged → In Progress
Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commits 34dd25de9fe3 and 682e6b4da5cb. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1773964

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15(Bionic) you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15(Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.

Thanks in advance!

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-29 12:22 EDT-------
==== State: Working by: cde00 on 29 May 2018 11:22:32 ====

Changed in ubuntu-power-systems:
status: Triaged → In Progress
bugproxy (bugproxy) wrote :
Download full text (3.2 KiB)

------- Comment From <email address hidden> 2018-05-30 17:35 EDT-------
==== State: Working by: dlshaw on 30 May 2018 16:24:44 ====

#=#=#=# David L. Shaw (<email address hidden>) - 2018-05-30 16:24:41 (CDT) - POWER Firmware #=#=#=#

I think I installed this test kernel correctly on w34's OS (w34L.aus.stglabs.ibm.com). Let me know if I did something wrong and need to fix it.
Before installing this test kernel it had Ubuntu 18.04 (Bionic) kernel: 4.15.0-22-generic

Ran the following to install test kernel:
root@w34L:~/SW430227# ls -lt
total 48220
-rw-r--r-- 1 root root 31438592 May 29 11:14 linux-modules-extra-4.15.0-23-generic_4.15.0-23.26~lp1773964_ppc64el.deb
-rw-r--r-- 1 root root 6152468 May 29 11:14 linux-image-unsigned-4.15.0-23-generic_4.15.0-23.26~lp1773964_ppc64el.deb
-rw-r--r-- 1 root root 11767420 May 29 11:14 linux-modules-4.15.0-23-generic_4.15.0-23.26~lp1773964_ppc64el.deb

root@w34L:~/SW430227# dpkg -i *.deb
Selecting previously unselected package linux-image-unsigned-4.15.0-23-generic.
(Reading database ... 175537 files and directories currently installed.)
Preparing to unpack linux-image-unsigned-4.15.0-23-generic_4.15.0-23.26~lp1773964_ppc64el.deb ...
Unpacking linux-image-unsigned-4.15.0-23-generic (4.15.0-23.26~lp1773964) ...
Selecting previously unselected package linux-modules-4.15.0-23-generic.
Preparing to unpack linux-modules-4.15.0-23-generic_4.15.0-23.26~lp1773964_ppc64el.deb ...
Unpacking linux-modules-4.15.0-23-generic (4.15.0-23.26~lp1773964) ...
Selecting previously unselected package linux-modules-extra-4.15.0-23-generic.
Preparing to unpack linux-modules-extra-4.15.0-23-generic_4.15.0-23.26~lp1773964_ppc64el.deb ...
Unpacking linux-modules-extra-4.15.0-23-generic (4.15.0-23.26~lp1773964) ...
Setting up linux-modules-4.15.0-23-generic (4.15.0-23.26~lp1773964) ...
Setting up linux-image-unsigned-4.15.0-23-generic (4.15.0-23.26~lp1773964) ...
I: /boot/vmlinux.old is now a symlink to vmlinux-4.15.0-22-generic
I: /boot/initrd.img.old is now a symlink to initrd.img-4.15.0-22-generic
I: /boot/vmlinux is now a symlink to vmlinux-4.15.0-23-generic
I: /boot/initrd.img is now a symlink to initrd.img-4.15.0-23-generic
Setting up linux-modules-extra-4.15.0-23-generic (4.15.0-23.26~lp1773964) ...
Processing triggers for linux-image-unsigned-4.15.0-23-generic (4.15.0-23.26~lp1773964) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-4.15.0-23-generic
W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
/etc/kernel/postinst.d/zz-update-grub:
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.15.0-23-generic
Found initrd image: /boot/initrd.img-4.15.0-23-generic
Found linux image: /boot/vmlinux-4.15.0-22-generic
Found initrd image: /boot/initrd.img-4.15.0-22-generic
Found linux image: /boot/vmlinux-4.15.0-20-generic
Found initrd image: /boot/initrd.img-4.15.0-20-generic
Found linux image: /boot/vmlinux-4.15.0-15-generic
Found initrd image: /boot/initrd.img-4.15.0-15-generic
Found linux image: /boot/vmlinux-4.15.0-13-generic
Found initrd image: /boot/initrd.img-4.15.0-13-generic
done

Rebooted the system and the new kernel is:
Linux w34L 4.15.0-23-gen...

Read more...

Joseph Salisbury (jsalisbury) wrote :

It looks like you booted into the correct test kernel. Did it resolve the bug?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-05-30 19:55 EDT-------
==== State: Working by: dlshaw on 30 May 2018 18:45:42 ====

#=#=#=# David L. Shaw (<email address hidden>) - 2018-05-30 18:45:40 (CDT) - POWER Firmware #=#=#=#

The failure was intermittent. Automated tests are running on the system. Time will tell if this resolves the issue.

#=#=#=# end of update #=#=#=#

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-06-05 14:15 EDT-------
hi David,

> The failure was intermittent. Automated tests are running on the system.
> Time will tell if this resolves the issue.

Any update on the failures?

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-06-05 14:35 EDT-------
==== State: Working by: yadlapat on 05 June 2018 13:32:15 ====

Opened https://w3.rchland.ibm.com/projects/bestquest/?defect=SW432166 to track if this patch is needed for Pegas

Manoj Iyer (manjo) on 2018-06-11
Changed in ubuntu-power-systems:
status: In Progress → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-06-11 11:24 EDT-------
Stephanie Swanson has confirmed to have tested with the new kernel and the error was not reproduced.
She has also mentioned that the original bug was occurring intermittently and there has been no recreates of this issue with the new kernel.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-06-12 10:46 EDT-------
==== State: Working by: yadlapat on 12 June 2018 09:43:39 ====

#=#=# 2018-06-12 09:43:37 (CDT) #=#=#
New Priority = [2]
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#

description: updated
Changed in ubuntu-power-systems:
status: Incomplete → In Progress
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-06-15 08:07 EDT-------
I'm going to be on vacation next weeks, let the kernel backports team know if you all need any kind of support.

https://github.ibm.com/gwalbon/kernel-backports/wiki/Bugzilla-::-Request-Backports

Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Bug attachments