[LTCTest][OPAL][OP920] INFO: rcu_sched self-detected stall on CPU

Bug #1777857 reported by bugproxy
This bug affects 1 person
Affects                            Status       Importance  Assigned to            Milestone
The Ubuntu-power-systems project   In Progress  High        Canonical Kernel Team
linux (Ubuntu)                     In Progress  High        Unassigned
  Bionic                           In Progress  High        Unassigned

Bug Description

== SRU Justification ==
IBM is seeing kernel traces (rcu_sched self-detected stalls) during
testing. The cause is a missing backport of a kernel fix in the OPAL RTC
driver, commit 682e6b4da5cb. That commit was also cc'd to upstream
stable, but it has not yet landed in Bionic. It fixes upstream commit
628daa8d5abf.

Commit 34dd25de9fe3 is also needed as a prereq to define
OPAL_BUSY_DELAY_MS.

== Fixes ==
34dd25de9fe3 ("powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops")
682e6b4da5cb ("rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops")
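
One way to confirm that a given kernel tree already carries both fixes is to search its commit subjects. The sketch below is only an illustration: it assumes the backports keep the upstream subject lines (a cherry-pick gets a new hash, so matching on hash alone can miss it) and that the input is the text printed by `git log --format=%s`. The function name `missing_fixes` is hypothetical.

```python
# Subjects of the two upstream commits listed under "== Fixes ==".
REQUIRED_SUBJECTS = [
    "powerpc/powernv: define a standard delay for OPAL_BUSY type retry loops",
    "rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops",
]


def missing_fixes(log_subjects, required=REQUIRED_SUBJECTS):
    """Return the required subjects that do not appear in the log text.

    log_subjects is assumed to be the output of `git log --format=%s`
    (one commit subject per line). Assumption: a backport keeps the
    upstream subject, possibly with some prefix in front of it.
    """
    lines = [line.strip() for line in log_subjects.splitlines()]
    return [s for s in required if not any(line.endswith(s) for line in lines)]
```

An empty result means both subjects were found; any subject returned is a candidate for backporting.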

== Regression Potential ==
Low. Limited to powerpc. Fixes a current regression.

== Test Case ==
A test kernel was built with these patches and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug.

== Comment: #0 - PAVAMAN SUBRAMANIYAM <email address hidden> - 2018-05-23 01:15:30 ==
Install the latest OP920 firmware images, provided at the following link, on P9 OpenPOWER hardware:
http://pfd.austin.ibm.com/releasenotes/openpower9/OP920/OP920_1808A/OP920_1808N_RelNote_Main.html

root@witherspoon:~# cat /etc/os-release
ID="openbmc-phosphor"
NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro)"
VERSION="ibm-v2.1"
VERSION_ID="ibm-v2.1-438-g0030304-r12-0-g5ee4fb0"
PRETTY_NAME="Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) ibm-v2.1"
BUILD_ID="ibm-v2.1-438-g0030304-r12"
root@witherspoon:~# cat /var/lib/phosphor-software-manager/pnor/ro/VERSION
IBM-witherspoon-ibm-OP9-v2.0-2.14
        op-build-v2.0-11-gb248194-dirty
        buildroot-2018.02.1-6-ga8d1126
        skiboot-v6.0.1
        hostboot-8ab6717d-pfc036fa
        occ-77bb5e6
        linux-4.16.8-openpower2-pb532d68
        petitboot-v1.7.1-p1188545
        machine-xml-7cd20a6
        hostboot-binaries-276bb70
        capp-ucode-p9-dd2-v4
        sbe-a596975
        hcode-b8173e8

The following messages appear in the dmesg log:

[ 16.377405] ipmi_si: Unable to find any System Interface(s)
[ 17.384118] nf_conntrack version 0.5.0 (65536 buckets, 262144 max)
[ 1372.711730] INFO: rcu_sched self-detected stall on CPU
[ 1372.711787] 32-....: (5249 ticks this GP) idle=182/140000000000001/0 softirq=1093/1093 fqs=2623
[ 1372.711863] (t=5250 jiffies g=22430 c=22429 q=953)
[ 1372.711921] Task dump for CPU 32:
[ 1372.711922] kworker/32:1 R running task 0 1123 2 0x00000804
[ 1372.711930] Workqueue: events rtc_timer_do_work
[ 1372.711931] Call Trace:
[ 1372.711934] [c000003fd2b97350] [c00000000014a8f8] sched_show_task.part.16+0xd8/0x110 (unreliable)
[ 1372.711939] [c000003fd2b973c0] [c0000000001aa8bc] rcu_dump_cpu_stacks+0xd4/0x138
[ 1372.711942] [c000003fd2b97410] [c0000000001a9988] rcu_check_callbacks+0x8e8/0xb40
[ 1372.711945] [c000003fd2b97540] [c0000000001b7c28] update_process_times+0x48/0x90
[ 1372.711948] [c000003fd2b97570] [c0000000001cf974] tick_sched_handle.isra.5+0x34/0xd0
[ 1372.711950] [c000003fd2b975a0] [c0000000001cfa70] tick_sched_timer+0x60/0xe0
[ 1372.711953] [c000003fd2b975e0] [c0000000001b87d4] __hrtimer_run_queues+0x144/0x370
[ 1372.711956] [c000003fd2b97660] [c0000000001b972c] hrtimer_interrupt+0xfc/0x350
[ 1372.711959] [c000003fd2b97730] [c0000000000249f0] __timer_interrupt+0x90/0x260
[ 1372.711962] [c000003fd2b97780] [c000000000024e08] timer_interrupt+0x98/0xe0
[ 1372.711965] [c000003fd2b977b0] [c000000000009054] decrementer_common+0x114/0x120
[ 1372.711970] --- interrupt: 901 at opal_get_rtc_time+0x98/0x110
                   LR = opal_return+0x14/0x48
[ 1372.711972] [c000003fd2b97aa0] [c000000000a457b8] opal_get_rtc_time+0x98/0x110 (unreliable)
[ 1372.711975] [c000003fd2b97ae0] [c000000000a3f98c] __rtc_read_time+0x7c/0x180
[ 1372.711977] [c000003fd2b97b60] [c000000000a41738] rtc_timer_do_work+0x78/0x250
[ 1372.711980] [c000003fd2b97c90] [c000000000134378] process_one_work+0x298/0x5a0
[ 1372.711982] [c000003fd2b97d20] [c000000000134718] worker_thread+0x98/0x630
[ 1372.711985] [c000003fd2b97dc0] [c00000000013d348] kthread+0x1a8/0x1b0
[ 1372.711988] [c000003fd2b97e30] [c00000000000b658] ret_from_kernel_thread+0x5c/0x84
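
To spot this kind of stall in a saved dmesg log, and to see which function the timer interrupt landed in, a small parser can help. This is a minimal sketch that assumes the log lines look like the ones quoted above; the name `find_stalls` and both regular expressions are illustrative, not part of any existing tool.

```python
import re

# Matches e.g. "[ 1372.711730] INFO: rcu_sched self-detected stall on CPU"
STALL_RE = re.compile(r"\[\s*(\d+\.\d+)\]\s+INFO: \w+ self-detected stall on CPU")
# Matches e.g. "--- interrupt: 901 at opal_get_rtc_time+0x98/0x110"
INTERRUPT_RE = re.compile(r"--- interrupt: \w+ at ([\w.]+)\+0x")


def find_stalls(dmesg_text):
    """Return a list of (timestamp_seconds, interrupted_symbol) tuples.

    interrupted_symbol is the function named on the first "--- interrupt:"
    line that follows a stall report, or None if no such line appears
    before the next stall report (or the end of the log).
    """
    stalls = []
    for line in dmesg_text.splitlines():
        match = STALL_RE.search(line)
        if match:
            stalls.append([float(match.group(1)), None])
            continue
        match = INTERRUPT_RE.search(line)
        if match and stalls and stalls[-1][1] is None:
            stalls[-1][1] = match.group(1)
    return [tuple(entry) for entry in stalls]
```

Run against the trace above, the interrupted symbol is `opal_get_rtc_time`, which points at the RTC read path rather than at the RCU code itself.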

== Comment: #1 - PAVAMAN SUBRAMANIYAM <email address hidden> - 2018-05-23 01:31:06 ==

== Comment: #2 - Application Cdeadmin <email address hidden> - 2018-05-23 01:33:40 ==
cde00 (<email address hidden>) added native attachment /tmp/AIXOS07311082/dmesg.txt on 2018-05-23 01:33:33

== Comment: #3 - Application Cdeadmin <email address hidden> - 2018-05-24 16:45:41 ==
==== State: Open by: jayeshp on 24 May 2018 16:42:57 ====

#=#=# 2018-05-24 16:42:54 (CDT) #=#=#
New Fix_Potential = [P920.10W]
#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#=#

== Comment: #4 - Stewart Smith <email address hidden> - 2018-05-30 21:15:15 ==
This'll be a missing backport of some kernel fixes in the RTC driver.

It's at least this commit:
commit 682e6b4da5cbe8e9a53f979a58c2a9d7dc997175
Author: Nicholas Piggin <email address hidden>
Date: Tue Apr 10 21:49:32 2018 +1000

    rtc: opal: Fix OPAL RTC driver OPAL_BUSY loops

    The OPAL RTC driver does not sleep in case it gets OPAL_BUSY or
    OPAL_BUSY_EVENT from firmware, which causes large scheduling
    latencies, up to 50 seconds have been observed here when RTC stops
    responding (BMC reboot can do it).

    Fix this by converting it to the standard form OPAL_BUSY loop that
    sleeps.

    Fixes: 628daa8d5abf ("powerpc/powernv: Add RTC and NVRAM support plus RTAS fallbacks")
    Cc: <email address hidden> # v3.2+
    Signed-off-by: Nicholas Piggin <email address hidden>
    Acked-by: Alexandre Belloni <email address hidden>
    Signed-off-by: Michael Ellerman <email address hidden>
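
The idea behind the fix, retrying while firmware reports busy but sleeping between attempts instead of spinning, can be simulated in userspace. The following is a sketch only, not the driver code: `read_rtc_with_retry`, the two callables it takes, and the numeric values of the constants are assumptions made for illustration; only the constant names come from this report.

```python
import time

# Status names as used in the commit message. The numeric values are
# stand-ins for this simulation, not copied from the kernel headers.
OPAL_SUCCESS = 0
OPAL_BUSY = -2
OPAL_BUSY_EVENT = -12
# Delay between retries in milliseconds; the value is an assumption.
OPAL_BUSY_DELAY_MS = 10


def read_rtc_with_retry(opal_rtc_read, opal_poll_events, sleep=time.sleep):
    """Call opal_rtc_read() until it stops reporting a busy status.

    opal_rtc_read is a callable returning (rc, value); opal_poll_events is
    a callable taking no arguments. Both are hypothetical stand-ins for
    firmware calls. Sleeping between attempts gives other work a chance
    to run, which a tight retry loop does not.
    """
    rc, value = OPAL_BUSY, None
    while rc in (OPAL_BUSY, OPAL_BUSY_EVENT):
        rc, value = opal_rtc_read()
        if rc == OPAL_BUSY_EVENT:
            sleep(OPAL_BUSY_DELAY_MS / 1000.0)
            opal_poll_events()
        elif rc == OPAL_BUSY:
            sleep(OPAL_BUSY_DELAY_MS / 1000.0)
    return rc, value
```

Passing `sleep` as a parameter keeps the sketch testable: a fake can record the requested delays without actually waiting.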

bugproxy (bugproxy) wrote : dmesg log is attached

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-168118 severity-high targetmilestone-inin1804
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → kernel-package (Ubuntu)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
affects: kernel-package (Ubuntu) → linux (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → High
tags: added: p9 triage-g
Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Triaged
Changed in linux (Ubuntu):
status: New → Triaged
importance: Undecided → High
Changed in linux (Ubuntu Bionic):
status: New → Triaged
importance: Undecided → High
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Joseph Salisbury (jsalisbury)
status: Triaged → In Progress
Changed in linux (Ubuntu Bionic):
status: Triaged → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit 682e6b4da5cbe8e9a53f979a58c2a9d7dc997175. Commit 34dd25de9fe3f was also required as a prereq.

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1777857

Can you test this kernel and see if it resolves this bug?

Note about installing test kernels:
• If the test kernel is prior to 4.15 (Bionic), you need to install the linux-image and linux-image-extra .deb packages.
• If the test kernel is 4.15 (Bionic) or newer, you need to install the linux-modules, linux-modules-extra and linux-image-unsigned .deb packages.
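
The two rules above can be written down as a small helper. This is a sketch under the assumption that the version string starts with `major.minor`, as in `4.15.0-23`; the function name `packages_for_test_kernel` is hypothetical.

```python
def packages_for_test_kernel(version):
    """Return the .deb package name prefixes to install for a test kernel.

    version is a string such as "4.15.0-23"; only the major and minor
    numbers are compared against 4.15.
    """
    major, minor = (int(part) for part in version.split(".")[:2])
    if (major, minor) < (4, 15):
        return ["linux-image", "linux-image-extra"]
    return ["linux-modules", "linux-modules-extra", "linux-image-unsigned"]
```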

Thanks in advance!

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-06-20 10:13 EDT-------
== Comment: #3 - Application Cdeadmin <email address hidden> - 2018-05-24 16:45:41 ==
==== State: Open by: cde00 on 20 June 2018 09:13:00 ====

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: Triaged → In Progress
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-06-22 01:58 EDT-------
I have downloaded and installed the test kernel on the machine.

root@ltc-wspoon11:~/lp1777857# ls -l
total 48232
-rwxrwxrwx 1 root root 6153264 Jun 20 09:02 linux-image-unsigned-4.15.0-23-generic_4.15.0-23.26~lp1777857_ppc64el.deb
-rwxrwxrwx 1 root root 11767916 Jun 20 09:02 linux-modules-4.15.0-23-generic_4.15.0-23.26~lp1777857_ppc64el.deb
-rwxrwxrwx 1 root root 31445160 Jun 20 09:02 linux-modules-extra-4.15.0-23-generic_4.15.0-23.26~lp1777857_ppc64el.deb

root@ltc-wspoon11:~/lp1777857# dpkg -i linux-modules-extra-4.15.0-23-generic_4.15.0-23.26~lp1777857_ppc64el.deb
(Reading database ... 203277 files and directories currently installed.)
Preparing to unpack linux-modules-extra-4.15.0-23-generic_4.15.0-23.26~lp1777857_ppc64el.deb ...
Unpacking linux-modules-extra-4.15.0-23-generic (4.15.0-23.26~lp1777857) over (4.15.0-23.25) ...
Setting up linux-modules-extra-4.15.0-23-generic (4.15.0-23.26~lp1777857) ...
Processing triggers for linux-image-4.15.0-23-generic (4.15.0-23.25) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-4.15.0-23-generic
W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
/etc/kernel/postinst.d/kdump-tools:
kdump-tools: Generating /var/lib/kdump/initrd.img-4.15.0-23-generic
W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
/etc/kernel/postinst.d/zz-update-grub:
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.15.0-23-generic
Found initrd image: /boot/initrd.img-4.15.0-23-generic
Found linux image: /boot/vmlinux-4.15.0-22-generic
Found initrd image: /boot/initrd.img-4.15.0-22-generic
done

root@ltc-wspoon11:~/lp1777857# dpkg -i linux-modules-4.15.0-23-generic_4.15.0-23.26~lp1777857_ppc64el.deb
(Reading database ... 203277 files and directories currently installed.)
Preparing to unpack linux-modules-4.15.0-23-generic_4.15.0-23.26~lp1777857_ppc64el.deb ...
Unpacking linux-modules-4.15.0-23-generic (4.15.0-23.26~lp1777857) over (4.15.0-23.25) ...
Setting up linux-modules-4.15.0-23-generic (4.15.0-23.26~lp1777857) ...
Processing triggers for linux-image-4.15.0-23-generic (4.15.0-23.25) ...
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-4.15.0-23-generic
W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
/etc/kernel/postinst.d/kdump-tools:
kdump-tools: Generating /var/lib/kdump/initrd.img-4.15.0-23-generic
W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
/etc/kernel/postinst.d/zz-update-grub:
Generating grub configuration file ...
Found linux image: /boot/vmlinux-4.15.0-23-generic
Found initrd image: /boot/initrd.img-4.15.0-23-generic
Found linux image: /boot/vmlinux-4.15.0-22-generic
Found initrd image: /boot/initrd.img-4.15.0-22-generic
done

root@ltc-wspoon11:~/lp1777857# dpkg -i --force-all linux-image-unsigned-4.15.0-23-generic_4.15.0-23.26~lp1777857_ppc64el.deb
dpkg: regarding linux-image-unsigned-4.15.0-23-generic_4.15.0-23.26~lp1777857_ppc64el.deb containing linux-image-unsigned-4.15.0-23-generic:
linux-image-unsigned-4.15.0-23-generic conflicts with li...


Joseph Salisbury (jsalisbury) wrote :
description: updated
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2018-08-07 02:43 EDT-------
(In reply to comment #12)
> SRU Request Submitted:
> https://lists.ubuntu.com/archives/kernel-team/2018-June/093483.html

Any update on this?

Andrew Cloke (andrew-cloke) wrote :

This bug has been marked as a duplicate of LP#1773964 in Launchpad. Please see that bug for updates.
Thanks.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-03-15 02:15 EDT-------
*** This bug has been marked as a duplicate of bug 168095 ***

Brad Figg (brad-figg)
tags: added: cscc