Ubuntu16.04.3: System running network stress crashes with Alignment exception

Bug #1710690 reported by bugproxy
This bug affects 1 person
Affects                           Status   Importance  Assigned to            Milestone
The Ubuntu-power-systems project  Invalid  High        Canonical Kernel Team
linux (Ubuntu)                    Invalid  High        Canonical Kernel Team

Bug Description

==== State: Open by: nguyenp on 11 August 2017 11:03:32 ====

Contact:
=======
Paul Nguyen
<email address hidden>

BMC:
====
bos1u1

Firmware Revision : 00.25
Firmware Build Time : 20170807
BMC MAC address : 0c:c4:7a:f4:4d:60
PNOR Build Time : 20170729
CPLD Version : B2.91.00

Ubuntu 16.04.3:
===========
bos1u1p1

       ver 1.5.4.5 - OS, HTX, Firmware and Machine details

                           OS: GNU/Linux
                   OS Version: Ubuntu 16.04.3 LTS \n \l
               Kernel Version: 4.11.0-12-generic
                  HTX Version: htxubuntu-448
                    Host Name: bos1u1p1
            Machine Serial No: C819UAF32B00002
           Machine Type/Model: 9006-12C

root@bos1u1p1:~# dpkg -l |grep mlx
ii libmlx4-1 41mlnx1-OFED.4.1.0.1.0.41014 ppc64el Userspace driver for Mellanox ConnectX InfiniBand HCAs
ii libmlx4-1-dbg 41mlnx1-OFED.4.1.0.1.0.41014 ppc64el Debugging symbols for the libmlx4 driver
ii libmlx4-dev 41mlnx1-OFED.4.1.0.1.0.41014 ppc64el Development files for the libmlx4 driver
ii libmlx5-1 41mlnx1-OFED.4.1.0.1.3.0.1.41014 ppc64el Userspace driver for Mellanox ConnectX InfiniBand HCAs
ii libmlx5-1-dbg 41mlnx1-OFED.4.1.0.1.3.0.1.41014 ppc64el Debugging symbols for the libmlx5 driver
ii libmlx5-dev 41mlnx1-OFED.4.1.0.1.3.0.1.41014 ppc64el Development files for the libmlx5 driver

root@bos1u1p1:~# lsscsi
[0:2:0:0] disk SEAGATE ST4000NM0034 E005 /dev/sda
[0:3:123:0] enclosu ADAPTEC Smart Adapter 2.99 -
root@bos1u1p1:~# lspci
0000:00:00.0 PCI bridge: IBM Device 04c1
0001:00:00.0 PCI bridge: IBM Device 04c1
0002:00:00.0 PCI bridge: IBM Device 04c1
0002:01:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0002:01:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0002:01:00.2 Ethernet controller: Intel Corporation Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0002:01:00.3 Ethernet controller: Intel Corporation Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0003:00:00.0 PCI bridge: IBM Device 04c1
0003:01:00.0 Serial Attached SCSI controller: Adaptec Series 8 12G SAS/PCIe 3 (rev 01)
0004:00:00.0 PCI bridge: IBM Device 04c1
0004:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0004:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
0005:00:00.0 PCI bridge: IBM Device 04c1
0005:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0030:00:00.0 PCI bridge: IBM Device 04c1
0030:01:00.0 Infiniband controller: Mellanox Technologies Device 1019
0030:01:00.1 Infiniband controller: Mellanox Technologies Device 1019
0031:00:00.0 PCI bridge: IBM Device 04c1
0031:01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
0031:01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
0032:00:00.0 PCI bridge: IBM Device 04c1
0033:00:00.0 PCI bridge: IBM Device 04c1
0033:01:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
0033:01:00.1 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)

root@bos1u1p1:~# ifconfig -a
enP2p1s0f0 Link encap:Ethernet HWaddr ac:1f:6b:09:c0:9e
          inet addr:9.3.20.217 Bcast:9.3.21.255 Mask:255.255.254.0
          inet6 addr: fe80::ae1f:6bff:fe09:c09e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:62603 errors:0 dropped:0 overruns:0 frame:0
          TX packets:105 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:4784741 (4.7 MB) TX bytes:14043 (14.0 KB)

enP2p1s0f1 Link encap:Ethernet HWaddr ac:1f:6b:09:c0:9f
          BROADCAST MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

enP2p1s0f2 Link encap:Ethernet HWaddr ac:1f:6b:09:c0:a0
          inet addr:108.1.1.217 Bcast:108.1.1.255 Mask:255.255.255.0
          inet6 addr: fe80::ae1f:6bff:fe09:c0a0/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:127350969 errors:0 dropped:65 overruns:0 frame:0
          TX packets:124182822 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:186712761859 (186.7 GB) TX bytes:181731375235 (181.7 GB)

enP2p1s0f3 Link encap:Ethernet HWaddr ac:1f:6b:09:c0:a1
          inet addr:108.1.2.217 Bcast:108.1.2.255 Mask:255.255.255.0
          inet6 addr: fe80::ae1f:6bff:fe09:c0a1/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:124182726 errors:0 dropped:0 overruns:0 frame:0
          TX packets:127351289 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:181731250053 (181.7 GB) TX bytes:186713217880 (186.7 GB)

enP49p1s0f0 Link encap:Ethernet HWaddr 0c:c4:7a:eb:17:ea
          inet addr:104.1.1.217 Bcast:104.1.1.255 Mask:255.255.255.0
          inet6 addr: fe80::ec4:7aff:feeb:17ea/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:126415178 errors:0 dropped:0 overruns:0 frame:0
          TX packets:124809946 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:185222061670 (185.2 GB) TX bytes:183034803740 (183.0 GB)

enP49p1s0f1 Link encap:Ethernet HWaddr 0c:c4:7a:eb:17:eb
          inet addr:104.1.2.217 Bcast:104.1.2.255 Mask:255.255.255.0
          inet6 addr: fe80::ec4:7aff:feeb:17eb/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:124809938 errors:0 dropped:0 overruns:0 frame:0
          TX packets:126415188 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:183034803062 (183.0 GB) TX bytes:185222062528 (185.2 GB)

enP51p1s0f0 Link encap:Ethernet HWaddr 0c:c4:7a:b4:28:6c
          inet addr:105.1.1.217 Bcast:105.1.1.255 Mask:255.255.255.0
          inet6 addr: fe80::ec4:7aff:feb4:286c/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:8067390 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10234208 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8355184401 (8.3 GB) TX bytes:13586437864 (13.5 GB)
          Memory:620c180800000-620c18081ffff

enP51p1s0f1 Link encap:Ethernet HWaddr 0c:c4:7a:b4:28:6d
          inet addr:102.1.2.217 Bcast:102.1.2.255 Mask:255.255.255.0
          inet6 addr: fe80::ec4:7aff:feb4:286d/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:10234110 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8067404 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:13586422574 (13.5 GB) TX bytes:8355185459 (8.3 GB)
          Memory:620c180820000-620c18083ffff

ib0 Link encap:UNSPEC HWaddr 00-00-00-86-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:103.1.1.217 Bcast:103.1.1.255 Mask:255.255.255.0
          inet6 addr: fe80::268a:703:a3:2a38/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
          RX packets:24 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:2400 (2.4 KB) TX bytes:1160 (1.1 KB)

ib1 Link encap:UNSPEC HWaddr 00-00-08-46-FE-80-00-00-00-00-00-00-00-00-00-00
          inet addr:103.1.2.217 Bcast:103.1.2.255 Mask:255.255.255.0
          inet6 addr: fe80::268a:703:a3:2a39/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
          RX packets:24 errors:0 dropped:0 overruns:0 frame:0
          TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:256
          RX bytes:2400 (2.4 KB) TX bytes:1220 (1.2 KB)

lo Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:65536 Metric:1
          RX packets:2173 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2173 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:349296 (349.2 KB) TX bytes:349296 (349.2 KB)

root@bos1u1p1:~#

               HTX Device Status Summary Current Time: 222 08/10/17 15:35:12
Cycle Count(Min/Max)=0/183 System Start Time: 222 08/10/17 15:11:22
Page Number(Cur/Max)=1/2
--------------------------------------------------------------------------------
            Last Update Cycle Curr. | Last Update Cycle Curr.
ST Device Day Time Count Stanza | ST Device Day Time Count Stanza
RN cache0 222 15:34:50 15 6 | RN cpu15 222 15:35:03 5 4
RN cache1 222 15:35:03 51 5 | RN enP2p1s0f222 15:35:11 183 1
RN cpu0 222 15:34:47 2 4 | RN enP2p1s0f222 15:35:11 173 1
RN cpu1 222 15:33:59 2 13 | RN enP49p1s0222 15:35:11 181 1
RN cpu2 222 15:34:05 2 8 | RN enP49p1s0222 15:35:10 177 1
RN cpu3 222 15:32:59 2 13 | RN enP51p1s0222 15:34:45 4 1
RN cpu4 222 15:34:44 2 10 | RN enP51p1s0222 15:34:46 10 1
RN cpu5 222 15:34:33 2 13 | RN fpu0 222 15:35:06 3 25
RN cpu6 222 15:35:01 2 6 | RN fpu1 222 15:34:58 4 5
RN cpu7 222 15:32:42 2 13 | RN fpu2 222 15:35:10 3 30
RN cpu8 222 15:35:08 3 10 | RN fpu3 222 15:34:59 4 19
RN cpu9 222 15:35:02 3 13 | RN fpu4 222 15:34:57 3 35
RN cpu10 222 15:35:03 5 2 | RN fpu5 222 15:35:07 4 11
RN cpu11 222 15:34:47 5 1 | RN fpu6 222 15:35:05 3 28
RN cpu12 222 15:34:53 5 2 | RN fpu7 222 15:35:09 4 19
RN cpu13 222 15:35:09 5 2 | RN fpu8 222 15:35:04 6 5
RN cpu14 222 15:35:10 5 3 | RN fpu9 222 15:35:09 6 17
RN fpu10 222 15:35:28 9 2 | RN sctu10 222 15:35:24 55 1
RN fpu11 222 15:35:29 8 41 | RN sctu11 222 15:35:28 62 4
RN fpu12 222 15:35:25 8 43 | RN sctu13 222 15:35:28 66 3
RN fpu13 222 15:35:26 8 43 | RN sctu14 222 15:35:31 59 2
RN fpu14 222 15:35:29 9 7 | RN sctu15 222 15:35:28 71 2
RN fpu15 222 15:35:28 9 10 | RN tlbie 222 15:34:48 26 2
RN mem 222 15:35:10 1 4 |
RN mlx5_0 222 15:35:22 6 1 |
RN mlx5_1 222 15:35:25 6 1 |
RN rng 222 15:35:21 196 1 |
RN sctu1 222 15:35:20 64 2 |
RN sctu2 222 15:35:25 50 4 |
RN sctu3 222 15:35:27 67 4 |
RN sctu5 222 15:35:29 70 3 |
RN sctu6 222 15:35:22 57 4 |
RN sctu7 222 15:35:23 68 4 |
RN sctu9 222 15:35:20 62 1 |

Problem Description:
====================
On my Boston LC system, I am running firmware BMC 0.25 and PNOR 0807. The system is running Ubuntu 16.04.3.

- The system was running network stress when it crashed with an alignment exception. The system is currently in the xmon debugger and is available for a developer to examine.

[ 1105.304668] Unable to handle kernel paging request for unaligned access at address 0xc00a000000000122
[ 1105.304850] Faulting instruction address: 0xc000000000a1e6b4
1e:mon>

1e:mon> e
cpu 0x1e: Vector: 600 (Alignment) at [c000000007f0b1c0]
    pc: c000000000a1e6b4: skb_release_data+0xd4/0x1b0
    lr: c000000000a1e82c: __kfree_skb+0x2c/0x50
    sp: c000000007f0b440
   msr: 9000000000009033
   dar: c00a000000000122
 dsisr: 8000000
  current = 0xc000001a93574400
  paca = 0xc000000007b90e00 softe: 0 irq_happened: 0x01
Linux version 4.11.0-12-generic (buildd@bos01-ppc64el-026) (gcc version 5.4.0 20160609 (Ubuntu/IBM 5.4.0-6ubuntu1~16.04.4) ) #17~16.04.1-Ubuntu SMP Fri Jul 28 13:52:51 UTC 2017 (Ubuntu 4.11.0-12.17~16.04.1-generic 4.11.12)
1e:mon> t
[c000000007f0b480] c000000000a1e82c __kfree_skb+0x2c/0x50
[c000000007f0b4b0] c000000000abd77c tcp_clean_rtx_queue+0x2cc/0xd30
[c000000007f0b5d0] c000000000ac0f68 tcp_ack+0x5a8/0xa30
[c000000007f0b710] c000000000ac3004 tcp_rcv_established+0x1b4/0x830
[c000000007f0b790] c000000000ad05d4 tcp_v4_do_rcv+0x1b4/0x2f0
[c000000007f0b7d0] c000000000ad4168 tcp_v4_rcv+0xe18/0xe50
[c000000007f0b8c0] c000000000a9e7f0 ip_local_deliver_finish+0x170/0x350
[c000000007f0b910] c000000000a9f120 ip_local_deliver+0x60/0x130
[c000000007f0b970] c000000000a9ec28 ip_rcv_finish+0x258/0x510
[c000000007f0ba00] c000000000a9f550 ip_rcv+0x360/0x470
[c000000007f0ba70] c000000000a38e5c __netif_receive_skb_core+0x97c/0xdf0
[c000000007f0bb30] c000000000a3cbc8 netif_receive_skb_internal+0x38/0xd0
[c000000007f0bb70] c000000000a3db6c napi_gro_receive+0x11c/0x1d0
[c000000007f0bbb0] c008000010430c1c i40e_clean_rx_irq+0x74c/0xb00 [i40e]
[c000000007f0bca0] c00800001043134c i40e_napi_poll+0x37c/0x8f0 [i40e]
[c000000007f0bd50] c000000000a3d47c net_rx_action+0x39c/0x4a0
[c000000007f0be50] c000000000bcdc9c __do_softirq+0x19c/0x3fc
[c000000007f0bf40] c0000000000f3748 irq_exit+0xe8/0x120
[c000000007f0bf60] c000000000016b00 __do_irq+0x90/0x1d0
[c000000007f0bf90] c00000000002a5d0 call_do_irq+0x14/0x24
[c000001a935dfde0] c000000000016ce0 do_IRQ+0xa0/0x150
[c000001a935dfe30] c000000000009b94 h_virt_irq_common+0x114/0x120
--- Exception: ea1 at 00003fff920d8ce4
SP (3fff9184e5c0) is in userspace
1e:mon> r
R00 = c000000000a1e82c R16 = c000000007f0b678
R01 = c000000007f0b440 R17 = c000000007f0b660
R02 = c0000000014fdf00 R18 = 00000000000000694
R04 = c000001c86fad600 R20 = 0000000000000000
R05 = 00000000003417f0 R21 = 00000000001fffff
R06 = c000001d9701b2a0 R22 = c000001c7aae3d58
R07 = c000001d9701b000 R23 = ffffffffffffff92
R08 = 0000000000000280 R24 = 000000000001fe28
R09 = 000000000120c00a R25 = 00000000f8eb2d17
R10 = c00a000000000122 R26 = 0003126941e19a70
R11 = 0003126941e19a26 R27 = 0000000000000000
R12 = 0000000039059303 R28 = 0003126941e199ed
R13 = c000000007b90e00 R29 = c000001c86fad600
R14 = c000001c7aae3c00 R30 = c000001d9701b280
R15 = c000001c86fad600 R31 = 0000000000000000
pc = c000000000a1e6b4 skb_release_data+0xd4/0x1b0
cfar= c000000000a1e670 skb_release_data+0x90/0x1b0
lr = c000000000a1e82c __kfree_skb+0x2c/0x50
msr = 9000000000009033 cr = 39059305
ctr = c000000000a97200 xer = 00000000a0000000 trap = 600
dar = c00a000000000122 dsisr = 08000000
1e:mon> S
msr = 9000000000001033 sprg0 = 000000000000091c
pvr = 00000000004e0100 sprg1 = 0000000000000000
dec = ffff6e56be3f98df sprg2 = 0000000000000000
sp = c000000007f0ac00 sprg3 = 000000000000001e
toc = c0000000014fdf00 dar = c00a000000000122
srr0 = 0000000000091484 srr1 = 0000000000001033 dsisr = 08000000
dscr = 0000000000000000 ppr = 0000000000000000 pir = 0000005e
sdr1 = 0000000000000000 hdar = 0000000000000000 hdsisr = 00000000
hsrr0 = 00000000300050b0 hsrr1 = 0000000000001002 hdec = 7600c75f
lpcr = 0000000001d2f012 pcr = 0000000000000000 lpidr = 00000000
hsprg0 = 0000000007b90e00 hsprg1 = 0000000007b90e00
dabr = 0000000000000000 dabrx = 0000000000091484
dpdes = 0000000000000000 tir = 0000000000000002 cir = 00000000
fscr = 0000000000000180 tar = 0000000000000000 pspb = 00000000
mmcr0 = 0000000000000000 mmcr1 = 0000000000000000 mmcr2 = 0000000000000000
pmc1 = 00000000 pmc2 = 00000000 pmc3 = 00000000 pmc4 = 00000000
mmcra = 0000000000000000 siar = 0000000000000000 pmc5 = 80000001
sdar = 0000000000000000 sier = 0000000000000000 pmc6 = 8000000b
ebbhr = 0000000000000000 ebbrr = 0000000000000000 bescr = 0000000000000000
hfscr = 000000000000059f dhdes = 0000000000091484 rpr = 0000000000000000
dawr = 0000000000000000 dawrx = 000000000000fc00 ciabr = 0000000000000000
1e:mon>
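
For anyone post-processing the xmon `t` output above, here is a minimal Python sketch that splits a backtrace frame into its parts. The function name `parse_frame` and the regular expression are illustrative assumptions, not part of xmon or any kernel tooling. They assume only the `[stack] address symbol+0xoffset/0xsize [module]` layout seen in this report.

```python
import re

# Layout assumed from the frames in this report:
#   [stack pointer] return address symbol+0xoffset/0xsize [optional module]
FRAME_RE = re.compile(
    r"^\[(?P<sp>[0-9a-f]+)\]\s+(?P<addr>[0-9a-f]+)\s+"
    r"(?P<sym>[.\w]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?$"
)


def parse_frame(line):
    """Return the fields of one backtrace line, or None if it does not match."""
    m = FRAME_RE.match(line.strip())
    if m is None:
        return None
    d = m.groupdict()
    return {
        "sp": int(d["sp"], 16),
        "addr": int(d["addr"], 16),
        "symbol": d["sym"],
        "offset": int(d["off"], 16),
        "size": int(d["size"], 16),
        "module": d["module"],  # None for symbols built into the kernel image
    }


frames = [
    "[c000000007f0b480] c000000000a1e82c __kfree_skb+0x2c/0x50",
    "[c000000007f0bbb0] c008000010430c1c i40e_clean_rx_irq+0x74c/0xb00 [i40e]",
]
for text in frames:
    f = parse_frame(text)
    # addr - offset gives the start address of the function
    print(f["symbol"], hex(f["addr"] - f["offset"]), f["module"])
```

Frames with a trailing `[i40e]` come from the loadable NIC driver. The remaining frames are in the core kernel network stack.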

== Comment: #7 - VIPIN K. PARASHAR <email address hidden> - 2017-08-14 12:50:36 ==

1e:mon> di %pc
c000000000a1e6b4 7d205028 lwarx r9,0,r10
c000000000a1e6b8 3129ffff addic r9,r9,-1
c000000000a1e6bc 7d20512d stwcx. r9,0,r10
c000000000a1e6c0 40c2fff4 bne- c000000000a1e6b4 # skb_release_data+0xd4/0x1b0
c000000000a1e6c4 7d2907b4 extsw r9,r9
c000000000a1e6c8 7c0004ac hwsync
c000000000a1e6cc 2fa90000 cmpdi cr7,r9,0
c000000000a1e6d0 409effb0 bne cr7,c000000000a1e680 # skb_release_data+0xa0/0x1b0
c000000000a1e6d4 4b88b495 bl c0000000002a9b68 # __put_page+0x8/0x80
c000000000a1e6d8 60000000 nop
c000000000a1e6dc 893e0000 lbz r9,0(r30)
c000000000a1e6e0 395f0001 addi r10,r31,1
c000000000a1e6e4 7d5f07b4 extsw r31,r10
c000000000a1e6e8 7f895000 cmpw cr7,r9,r10
c000000000a1e6ec 419dffa8 bgt cr7,c000000000a1e694 # skb_release_data+0xb4/0x1b0
c000000000a1e6f0 893e0001 lbz r9,1(r30)
1e:mon>
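
As a cross-check of the disassembly above, the raw instruction words can be decoded by hand. The sketch below is only an illustration: the helper name `decode_x_form` and the small `XO_NAMES` table are assumptions made for this example. It splits a 32-bit Power ISA X-form word into its fields and recognises only the two extended opcodes relevant here (20 for lwarx, 150 for stwcx.).

```python
def decode_x_form(word):
    """Split a 32-bit Power ISA X-form instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # primary opcode (31 for both words below)
        "rt": (word >> 21) & 0x1F,      # target (or source) register
        "ra": (word >> 16) & 0x1F,      # base register, 0 means "no base"
        "rb": (word >> 11) & 0x1F,      # index register holding the address
        "xo": (word >> 1) & 0x3FF,      # extended opcode
        "rc": word & 0x1,               # record bit (the trailing "." form)
    }


# Only the two extended opcodes needed for this crash are listed.
XO_NAMES = {20: "lwarx", 150: "stwcx."}

for word in (0x7D205028, 0x7D20512D):
    f = decode_x_form(word)
    name = XO_NAMES.get(f["xo"], "unknown")
    print(f"{word:08x}: {name} r{f['rt']},{f['ra']},r{f['rb']}")
# prints:
# 7d205028: lwarx r9,0,r10
# 7d20512d: stwcx. r9,0,r10
```

Both words use r10 as the address register, so the load-reserve/store-conditional pair at skb_release_data+0xd4 operates on whatever address r10 holds.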

>> c000000000a1e6b4 7d205028 lwarx r9,0,r10 <---

R10 = c00a000000000122

R10 is used as the address for the load, but it does not contain a word-aligned address.
Thus the Alignment interrupt is triggered.
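
The alignment claim is easy to verify with a few lines of Python. This is a minimal sketch. The helper name `misalignment` is made up for this example, and the 4-byte size is the word operand size of lwarx.

```python
def misalignment(addr, size):
    """Return how many bytes addr sits past the previous size-byte boundary."""
    return addr % size


dar = 0xC00A000000000122  # value of R10 and DAR in the register dump
WORD = 4                  # lwarx operates on a 4-byte word

offset = misalignment(dar, WORD)
print(f"{dar:#x} is {offset} byte(s) past a word boundary")
# prints: 0xc00a000000000122 is 2 byte(s) past a word boundary
```

A non-zero result means the address is not word aligned, which matches the Alignment (vector 600) exception reported by xmon.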

== Comment: #11 - VIPIN K. PARASHAR <email address hidden> - 2017-08-14 13:12:54 ==
(In reply to comment #5)

> Before I did apt-get update and upgrade on this system, I was running with
> this level:
>
> 4.10.0-22-generic #24~16.04.1-Ubuntu SMP Mon May 22 22:11:01 UTC 2017
> ppc64le ppc64le ppc64le GNU/Linux
>
> With the previous level above, I did not see the problem.

The crash is seen with kernel 4.11.0-12-generic in the TCP/IP stack code.
It is likely due to code changes introduced after 4.10.0-22-generic.

bugproxy (bugproxy) wrote : xmon crash info

Default Comment by Bridge

tags: added: architecture-ppc64le bugnameltc-157626 severity-high targetmilestone-inin16043
bugproxy (bugproxy) wrote : Kernel log

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2017-08-14 14:42 EDT-------

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Server Team (canonical-server)
importance: Undecided → High
Changed in ubuntu-power-systems:
assignee: Canonical Server Team (canonical-server) → Canonical Kernel Team (canonical-kernel-team)
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.13 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.13-rc4

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-da-key needs-bisect
Changed in linux (Ubuntu):
status: New → Incomplete
Manoj Iyer (manjo)
tags: added: triage-g
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-08-30 16:34 EDT-------
This was root-caused to a bug in the aacraid driver, which is used on all Boston systems. It is an issue that can cause random memory corruption and is resolved with this patch:

https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git/commit/?h=4.13/scsi-fixes&id=1ae948fa4f00f3a2823e7cb19a3049ef27dd6947
("scsi: aacraid: Fix command send race condition")

Can we get an Ubuntu 16.04.3 test kernel built with this change so that we can load it on our test systems?

Changed in linux (Ubuntu):
status: Incomplete → Triaged
Joseph Salisbury (jsalisbury) wrote :

I built a Zesty test kernel with commit 1ae948fa4f00f3a2823e7cb19a3049ef27dd6947 from linux-next. The test kernel can be downloaded from:

http://kernel.ubuntu.com/~jsalisbury/lp1710690/

Can you test this kernel and see if it resolves this bug?

Frank Heimes (fheimes)
Changed in ubuntu-power-systems:
status: New → Triaged
Manoj Iyer (manjo)
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Changed in ubuntu-power-systems:
status: Triaged → Incomplete
Manoj Iyer (manjo)
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2017-12-11 11:00 EDT-------
Closing/rejecting this bug.
No testing available for now.

Please continue to bug Breno in case of problems :- )

Manoj Iyer (manjo) wrote :

Closing the bug based on IBM's comments.

Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in ubuntu-power-systems:
status: Incomplete → Invalid