------- Comment From <email address hidden> 2018-04-20 20:05 EDT-------
(In reply to comment #98)
> Boslcp3 is back with the new kernel from #94.
>
> root@boslcp3:~# uname -a
> Linux boslcp3 4.15.0-18-generic #19 SMP Fri Apr 20 12:45:38 CDT 2018 ppc64le
> ppc64le ppc64le GNU/Linux
> root@boslcp3:~# cat /proc/cmdline
> root=UUID=bab108a0-d0a6-4609-87f1-6e33d0ad633c ro splash quiet crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M@128M
>
> I will launch our test soon.
It is not looking good on boslcp3. Within 3 hours of starting the test, the system is still pingable but I cannot ssh to it. On the console, I see messages like these all over:
************************************************************
[ 8785.370897] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 8785.370962] 1-...0: (4 GPs behind) idle=ca2/140000000000001/0 softirq=15273/15273 fqs=1075891
[ 8785.371035] (detected by 3, t=2179442 jiffies, g=2107, c=2106, q=386665)
[ 8785.371090] Task dump for CPU 1:
[ 8785.371123] kworker/1:3 R running task 0 4111 2 0x00000804
[ 8785.371195] Call Trace:
[ 8785.371221] [c0000000d5c4fa00] [c000000008133cf8] worker_thread+0x98/0x630 (unreliable)
[ 8848.390897] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 8848.390964] 1-...0: (4 GPs behind) idle=ca2/140000000000001/0 softirq=15273/15273 fqs=1083603
[ 8848.391037] (detected by 3, t=2195197 jiffies, g=2107, c=2106, q=389679)
[ 8848.391092] Task dump for CPU 1:
[ 8848.391125] kworker/1:3 R running task 0 4111 2 0x00000804
[ 8848.391197] Call Trace:
[ 8848.391223] [c0000000d5c4fa00] [c000000008133cf8] worker_thread+0x98/0x630 (unreliable)
[ 8857.031091] systemd[1]: systemd-journald.service: Start operation timed out. Terminating.
***********************************************************
root@boslcp3:~# uname -a
Linux boslcp3 4.15.0-18-generic #19 SMP Fri Apr 20 12:45:38 CDT 2018 ppc64le ppc64le ppc64le GNU/Linux
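As a rough cross-check on the two stall reports above, the change in the `t=` field divided by the change in printk timestamp gives the tick rate, and `t=` divided by that rate gives the stall length in seconds. This is a minimal sketch that assumes `t=` counts jiffies since the stalled grace period began, which is our reading of the kernel's RCU stall-warning documentation. The helper names `parse` and `implied_hz` are hypothetical.

```python
import re

# Two "detected by" lines copied from the console log above.
REPORTS = [
    "[ 8785.371035] (detected by 3, t=2179442 jiffies, g=2107, c=2106, q=386665)",
    "[ 8848.391037] (detected by 3, t=2195197 jiffies, g=2107, c=2106, q=389679)",
]

# printk timestamp, then the t= (jiffies) and g= (grace period) fields.
PATTERN = re.compile(r"\[\s*(\d+\.\d+)\].*?t=(\d+) jiffies, g=(\d+)")


def parse(line):
    """Return (timestamp_s, t_jiffies, grace_period) from one 'detected by' line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"not an RCU stall 'detected by' line: {line!r}")
    return float(match.group(1)), int(match.group(2)), int(match.group(3))


def implied_hz(first, second):
    """Jiffies elapsed per second of printk time between two reports."""
    ts1, t1, _ = parse(first)
    ts2, t2, _ = parse(second)
    return (t2 - t1) / (ts2 - ts1)


if __name__ == "__main__":
    hz = implied_hz(*REPORTS)
    ts, t, gp = parse(REPORTS[-1])
    # Assumption: t= counts jiffies since grace period `gp` began.
    stall_s = t / round(hz)
    print(f"implied tick rate: {hz:.1f} Hz")
    print(f"grace period {gp}: stalled ~{stall_s:.0f} s at uptime {ts:.0f} s")
```

With the two reports quoted here, the result is about 250 jiffies per second. Grace period 2107 had been stalled for roughly 8,781 s at the 8848 s mark. If that reading of `t=` holds, the stalled grace period would have started about 67 s into uptime.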
------- Comment From <email address hidden> 2018-04-21 08:08 EDT-------
(In reply to comment #103)
> Updated boslcp3 with latest PNOR:0420 & restarted tests on guests with
> kernel '4.15.0-18-generic'.
>
> $ ./ipmis bmc-boslcp3 fru print 47
> Product Name : OpenPOWER Firmware
> Product Version : open-power-SUPERMICRO-P9DSU-V1.11-20180420-imp
> Product Extra : op-build-4d27fab
> Product Extra : skiboot-v5.11-70-g5307c0ec7899-pc34e21f
> Product Extra : hostboot-742640c
> Product Extra : linux-4.15.14-openpower1-p81c2d44
> Product Extra : petitboot-v1.7.1-p8b80147
> Product Extra : machine-xml-32ce616
> Product Extra : occ-4f49f6
>
> root@boslcp3:~# uname -a
> Linux boslcp3 4.15.0-18-generic #19 SMP Fri Apr 20 12:45:38 CDT 2018 ppc64le
> ppc64le ppc64le GNU/Linux
> root@boslcp3:~# uname -r
> 4.15.0-18-generic
>
> Guests kernel:
> ****************
> root@boslcp3g3:~# uname -a
> Linux boslcp3g3 4.15.0-15-generic #16+bug166877 SMP Wed Apr 18 14:47:30 CDT
> 2018 ppc64le ppc64le ppc64le GNU/Linux
> root@boslcp3g3:~# uname -r
> 4.15.0-15-generic
>
> Regards,
> Indira
Two of the guests are impacted in today's new run (bug #167104). The third one is unreachable but is not dumping any console logs, so we are waiting to see what happens.
The host is running fine so far, but without the guest run I'm not sure how soon we can verify this.
Please look at bug #167104 so that we can recreate/verify this quickly.
The host continue
------- Comment From <email address hidden> 2018-04-20 16:48 EDT-------
Boslcp3 is back with the new kernel from #94.
root@boslcp3:~# cat /proc/cmdline
root=UUID=bab108a0-d0a6-4609-87f1-6e33d0ad633c ro splash quiet crashkernel=2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M@128M
I will launch our test soon.
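The `crashkernel=` value in the boslcp3 command line uses the range form `<start>-[<end>]:<size>[,...][@<offset>]`. According to the kernel's kdump documentation, the kernel reserves the size of the range that contains the system's RAM size. Each range includes its start and excludes its end, and an empty end means no upper bound. The following is a minimal Python sketch of that selection. It assumes only the K/M/G/T size suffixes matter here, and `to_bytes` and `select_reservation` are hypothetical helpers, not kernel code.

```python
from typing import Optional, Tuple

# Binary size suffixes; assumed sufficient for this command line.
UNITS = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30, "T": 1 << 40}

SPEC = "2G-4G:320M,4G-32G:512M,32G-64G:1024M,64G-128G:2048M,128G-:4096M@128M"


def to_bytes(text: str) -> int:
    """Convert '320M' / '4G' style sizes to bytes (bare numbers are bytes)."""
    text = text.strip()
    if text and text[-1].upper() in UNITS:
        return int(text[:-1]) * UNITS[text[-1].upper()]
    return int(text)


def select_reservation(spec: str, system_ram: int) -> Tuple[Optional[int], Optional[int]]:
    """Return (size_bytes, offset_bytes); size is None if no range matches.

    Ranges are start-inclusive and end-exclusive; an empty end is unbounded.
    """
    ranges, _, offset = spec.partition("@")
    offset_bytes = to_bytes(offset) if offset else None
    for entry in ranges.split(","):
        span, size = entry.split(":")
        start, end = span.split("-")
        lo = to_bytes(start)
        hi = to_bytes(end) if end else None
        if system_ram >= lo and (hi is None or system_ram < hi):
            return to_bytes(size), offset_bytes
    return None, offset_bytes


if __name__ == "__main__":
    # Sample RAM sizes only; the report does not state boslcp3's memory size.
    for ram_gib in (3, 16, 64, 256):
        size, offset = select_reservation(SPEC, ram_gib * UNITS["G"])
        print(f"{ram_gib} GiB RAM -> {size // UNITS['M']} MiB at offset {offset // UNITS['M']} MiB")
```

Because the report does not give boslcp3's memory size, the loop only tries a few sample sizes. Note that with this spec, a machine with less than 2G of RAM would get no crash-kernel reservation at all.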