Comment 35 for bug 1762844

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2018-04-20 16:48 EDT-------
Boslcp3 is back with the new kernel from #94.

root@boslcp3:~# cat /proc/cmdline
root=UUID=bab108a0-d0a6-4609-87f1-6e33d0ad633c ro splash quiet crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M@128M

I will launch our test soon.

------- Comment From <email address hidden> 2018-04-20 20:05 EDT-------
(In reply to comment #98)
> Boslcp3 is back with the new kernel from #94.
>
> root@boslcp3:~# uname -a
> Linux boslcp3 4.15.0-18-generic #19 SMP Fri Apr 20 12:45:38 CDT 2018 ppc64le ppc64le ppc64le GNU/Linux
> root@boslcp3:~# cat /proc/cmdline
> root=UUID=bab108a0-d0a6-4609-87f1-6e33d0ad633c ro splash quiet crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M@128M
>
> I will launch our test soon.

It is not looking good on boslcp3. Within 3 hours of starting the test, the system is still pingable but I cannot ssh to it. On the console, I see these messages all over:
************************************************************
[ 8785.370897] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 8785.370962] 1-...0: (4 GPs behind) idle=ca2/140000000000001/0 softirq=15273/15273 fqs=1075891
[ 8785.371035] (detected by 3, t=2179442 jiffies, g=2107, c=2106, q=386665)
[ 8785.371090] Task dump for CPU 1:
[ 8785.371123] kworker/1:3 R running task 0 4111 2 0x00000804
[ 8785.371195] Call Trace:
[ 8785.371221] [c0000000d5c4fa00] [c000000008133cf8] worker_thread+0x98/0x630 (unreliable)
[ 8848.390897] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 8848.390964] 1-...0: (4 GPs behind) idle=ca2/140000000000001/0 softirq=15273/15273 fqs=1083603
[ 8848.391037] (detected by 3, t=2195197 jiffies, g=2107, c=2106, q=389679)
[ 8848.391092] Task dump for CPU 1:
[ 8848.391125] kworker/1:3 R running task 0 4111 2 0x00000804
[ 8848.391197] Call Trace:
[ 8848.391223] [c0000000d5c4fa00] [c000000008133cf8] worker_thread+0x98/0x630 (unreliable)
[ 8857.031091] systemd[1]: systemd-journald.service: Start operation timed out. Terminating.
***********************************************************

root@boslcp3:~# uname -a
Linux boslcp3 4.15.0-18-generic #19 SMP Fri Apr 20 12:45:38 CDT 2018 ppc64le ppc64le ppc64le GNU/Linux

------- Comment From <email address hidden> 2018-04-21 08:08 EDT-------
(In reply to comment #103)
> Updated boslcp3 with latest PNOR:0420 & restarted tests on guests with kernel '4.15.0-18-generic'.
>
> $ ./ipmis bmc-boslcp3 fru print 47
> Product Name : OpenPOWER Firmware
> Product Version : open-power-SUPERMICRO-P9DSU-V1.11-20180420-imp
> Product Extra : op-build-4d27fab
> Product Extra : skiboot-v5.11-70-g5307c0ec7899-pc34e21f
> Product Extra : hostboot-742640c
> Product Extra : linux-4.15.14-openpower1-p81c2d44
> Product Extra : petitboot-v1.7.1-p8b80147
> Product Extra : machine-xml-32ce616
> Product Extra : occ-4f49f6
>
> root@boslcp3:~# uname -a
> Linux boslcp3 4.15.0-18-generic #19 SMP Fri Apr 20 12:45:38 CDT 2018 ppc64le ppc64le ppc64le GNU/Linux
> root@boslcp3:~# uname -r
> 4.15.0-18-generic
>
> Guests kernel:
> ****************
> root@boslcp3g3:~# uname -a
> Linux boslcp3g3 4.15.0-15-generic #16+bug166877 SMP Wed Apr 18 14:47:30 CDT 2018 ppc64le ppc64le ppc64le GNU/Linux
> root@boslcp3g3:~# uname -r
> 4.15.0-15-generic
>
> Regards,
> Indira

Two of the guests are impacted in today's new run (bug #167104). The third one is unreachable but is not dumping any console logs; we are waiting to see what happens.

The host is running fine so far, but without the guests running I'm not sure how soon we can verify this.

Please look into bug #167104 so that we can recreate and verify this quickly.
The host continue