Ubuntu 16.04.3: System running network stress crashes with Alignment exception
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
The Ubuntu-power-systems project | Invalid | High | Canonical Kernel Team |
linux (Ubuntu) | Invalid | High | Canonical Kernel Team |
Bug Description
==== State: Open by: nguyenp on 11 August 2017 11:03:32 ====
Contact:
=======
Paul Nguyen
<email address hidden>
BMC:
====
bos1u1
Firmware Revision : 00.25
Firmware Build Time : 20170807 BMC MAC address : 0c:c4:7a:f4:4d:60
PNOR Build Time : 20170729
CPLD Version : B2.91.00
Ubuntu 16.04.3:
===========
bos1u1p1
ver 1.5.4.5 - OS, HTX, Firmware and Machine details
Machine Serial No: C819UAF32B00002
Machine Type/Model: 9006-12C
root@bos1u1p1:~# dpkg -l |grep mlx
ii libmlx4-1 41mlnx1-
InfiniBand HCAs
ii libmlx4-1-dbg 41mlnx1-
ii libmlx4-dev 41mlnx1-
ii libmlx5-1 41mlnx1-
InfiniBand HCAs
ii libmlx5-1-dbg 41mlnx1-
ii libmlx5-dev 41mlnx1-
root@bos1u1p1:~# lsscsi
[0:2:0:0] disk SEAGATE ST4000NM0034 E005 /dev/sda
[0:3:123:0] enclosu ADAPTEC Smart Adapter 2.99 -
root@bos1u1p1:~# lspci
0000:00:00.0 PCI bridge: IBM Device 04c1
0001:00:00.0 PCI bridge: IBM Device 04c1
0002:00:00.0 PCI bridge: IBM Device 04c1
0002:01:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0002:01:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0002:01:00.2 Ethernet controller: Intel Corporation Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0002:01:00.3 Ethernet controller: Intel Corporation Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0003:00:00.0 PCI bridge: IBM Device 04c1
0003:01:00.0 Serial Attached SCSI controller: Adaptec Series 8 12G SAS/PCIe 3 (rev 01)
0004:00:00.0 PCI bridge: IBM Device 04c1
0004:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0004:02:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
0005:00:00.0 PCI bridge: IBM Device 04c1
0005:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0030:00:00.0 PCI bridge: IBM Device 04c1
0030:01:00.0 Infiniband controller: Mellanox Technologies Device 1019
0030:01:00.1 Infiniband controller: Mellanox Technologies Device 1019
0031:00:00.0 PCI bridge: IBM Device 04c1
0031:01:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
0031:01:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
0032:00:00.0 PCI bridge: IBM Device 04c1
0033:00:00.0 PCI bridge: IBM Device 04c1
0033:01:00.0 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
0033:01:00.1 Ethernet controller: Intel Corporation 82575EB Gigabit Network Connection (rev 02)
root@bos1u1p1:~# ifconfig -a
enP2p1s0f0 Link encap:Ethernet HWaddr ac:1f:6b:09:c0:9e
inet addr:9.3.20.217 Bcast:9.3.21.255 Mask:255.255.254.0
inet6 addr: fe80::ae1f:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:62603 errors:0 dropped:0 overruns:0 frame:0
TX packets:105 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:4784741 (4.7 MB) TX bytes:14043 (14.0 KB)
enP2p1s0f1 Link encap:Ethernet HWaddr ac:1f:6b:09:c0:9f
BROADCAST MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)
enP2p1s0f2 Link encap:Ethernet HWaddr ac:1f:6b:09:c0:a0
inet addr:108.1.1.217 Bcast:108.1.1.255 Mask:255.255.255.0
inet6 addr: fe80::ae1f:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:127350969 errors:0 dropped:65 overruns:0 frame:0
TX packets:124182822 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:186712761859 (186.7 GB) TX bytes:181731375235 (181.7 GB)
enP2p1s0f3 Link encap:Ethernet HWaddr ac:1f:6b:09:c0:a1
inet addr:108.1.2.217 Bcast:108.1.2.255 Mask:255.255.255.0
inet6 addr: fe80::ae1f:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:124182726 errors:0 dropped:0 overruns:0 frame:0
TX packets:127351289 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:181731250053 (181.7 GB) TX bytes:186713217880 (186.7 GB)
enP49p1s0f0 Link encap:Ethernet HWaddr 0c:c4:7a:eb:17:ea
inet addr:104.1.1.217 Bcast:104.1.1.255 Mask:255.255.255.0
inet6 addr: fe80::ec4:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:126415178 errors:0 dropped:0 overruns:0 frame:0
TX packets:124809946 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:185222061670 (185.2 GB) TX bytes:183034803740 (183.0 GB)
enP49p1s0f1 Link encap:Ethernet HWaddr 0c:c4:7a:eb:17:eb
inet addr:104.1.2.217 Bcast:104.1.2.255 Mask:255.255.255.0
inet6 addr: fe80::ec4:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:124809938 errors:0 dropped:0 overruns:0 frame:0
TX packets:126415188 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:183034803062 (183.0 GB) TX bytes:185222062528 (185.2 GB)
enP51p1s0f0 Link encap:Ethernet HWaddr 0c:c4:7a:b4:28:6c
inet addr:105.1.1.217 Bcast:105.1.1.255 Mask:255.255.255.0
inet6 addr: fe80::ec4:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:8067390 errors:0 dropped:0 overruns:0 frame:0
TX packets:10234208 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:8355184401 (8.3 GB) TX bytes:13586437864 (13.5 GB)
enP51p1s0f1 Link encap:Ethernet HWaddr 0c:c4:7a:b4:28:6d
inet addr:102.1.2.217 Bcast:102.1.2.255 Mask:255.255.255.0
inet6 addr: fe80::ec4:
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:10234110 errors:0 dropped:0 overruns:0 frame:0
TX packets:8067404 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:13586422574 (13.5 GB) TX bytes:8355185459 (8.3 GB)
ib0 Link encap:UNSPEC HWaddr 00-00-00-
inet addr:103.1.1.217 Bcast:103.1.1.255 Mask:255.255.255.0
inet6 addr: fe80::268a:
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
RX packets:24 errors:0 dropped:0 overruns:0 frame:0
TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:2400 (2.4 KB) TX bytes:1160 (1.1 KB)
ib1 Link encap:UNSPEC HWaddr 00-00-08-
inet addr:103.1.2.217 Bcast:103.1.2.255 Mask:255.255.255.0
inet6 addr: fe80::268a:
UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
RX packets:24 errors:0 dropped:0 overruns:0 frame:0
TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:2400 (2.4 KB) TX bytes:1220 (1.2 KB)
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:65536 Metric:1
RX packets:2173 errors:0 dropped:0 overruns:0 frame:0
TX packets:2173 errors:0 dropped:0 overruns:0 carrier:0
RX bytes:349296 (349.2 KB) TX bytes:349296 (349.2 KB)
root@bos1u1p1:~#
HTX Device Status Summary Current Time: 222 08/10/17 15:35:12
Cycle Count(Min/
Page Number(Cur/Max)=1/2
-------
Last Update Cycle Curr. | Last Update Cycle Curr.
ST Device Day Time Count Stanza | ST Device Day Time Count Stanza
RN cache0 222 15:34:50 15 6 | RN cpu15 222 15:35:03 5 4
RN cache1 222 15:35:03 51 5 | RN enP2p1s0f222 15:35:11 183 1
RN cpu0 222 15:34:47 2 4 | RN enP2p1s0f222 15:35:11 173 1
RN cpu1 222 15:33:59 2 13 | RN enP49p1s0222 15:35:11 181 1
RN cpu2 222 15:34:05 2 8 | RN enP49p1s0222 15:35:10 177 1
RN cpu3 222 15:32:59 2 13 | RN enP51p1s0222 15:34:45 4 1
RN cpu4 222 15:34:44 2 10 | RN enP51p1s0222 15:34:46 10 1
RN cpu5 222 15:34:33 2 13 | RN fpu0 222 15:35:06 3 25
RN cpu6 222 15:35:01 2 6 | RN fpu1 222 15:34:58 4 5
RN cpu7 222 15:32:42 2 13 | RN fpu2 222 15:35:10 3 30
RN cpu8 222 15:35:08 3 10 | RN fpu3 222 15:34:59 4 19
RN cpu9 222 15:35:02 3 13 | RN fpu4 222 15:34:57 3 35
RN cpu10 222 15:35:03 5 2 | RN fpu5 222 15:35:07 4 11
RN cpu11 222 15:34:47 5 1 | RN fpu6 222 15:35:05 3 28
RN cpu12 222 15:34:53 5 2 | RN fpu7 222 15:35:09 4 19
RN cpu13 222 15:35:09 5 2 | RN fpu8 222 15:35:04 6 5
RN cpu14 222 15:35:10 5 3 | RN fpu9 222 15:35:09 6 17
RN fpu10 222 15:35:28 9 2 | RN sctu10 222 15:35:24 55 1
RN fpu11 222 15:35:29 8 41 | RN sctu11 222 15:35:28 62 4
RN fpu12 222 15:35:25 8 43 | RN sctu13 222 15:35:28 66 3
RN fpu13 222 15:35:26 8 43 | RN sctu14 222 15:35:31 59 2
RN fpu14 222 15:35:29 9 7 | RN sctu15 222 15:35:28 71 2
RN fpu15 222 15:35:28 9 10 | RN tlbie 222 15:34:48 26 2
RN mem 222 15:35:10 1 4 |
RN mlx5_0 222 15:35:22 6 1 |
RN mlx5_1 222 15:35:25 6 1 |
RN rng 222 15:35:21 196 1 |
RN sctu1 222 15:35:20 64 2 |
RN sctu2 222 15:35:25 50 4 |
RN sctu3 222 15:35:27 67 4 |
RN sctu5 222 15:35:29 70 3 |
RN sctu6 222 15:35:22 57 4 |
RN sctu7 222 15:35:23 68 4 |
RN sctu9 222 15:35:20 62 1 |
Problem Description:
=======
My Boston LC system is running firmware BMC 0.25 and PNOR 0807, with Ubuntu 16.04.3.
- The system was running network stress when it crashed with an alignment exception. The system is currently in the xmon debugger and is available for developers to examine.
[ 1105.304668] Unable to handle kernel paging request for unaligned access at address 0xc00a000000000122
[ 1105.304850] Faulting instruction address: 0xc000000000a1e6b4
1e:mon>
1e:mon> e
cpu 0x1e: Vector: 600 (Alignment) at [c000000007f0b1c0]
pc: c000000000a1e6b4: skb_release_
lr: c000000000a1e82c: __kfree_
sp: c000000007f0b440
msr: 9000000000009033
dar: c00a000000000122
dsisr: 8000000
current = 0xc000001a93574400
paca = 0xc000000007b90e00 softe: 0 irq_happened: 0x01on 4.11.0-12-generic (buildd@
1e:mon> t
[c000000007f0b480] c000000000a1e82c __kfree_
[c000000007f0b4b0] c000000000abd77c tcp_clean_
[c000000007f0b5d0] c000000000ac0f68 tcp_ack+0x5a8/0xa30
[c000000007f0b710] c000000000ac3004 tcp_rcv_
[c000000007f0b790] c000000000ad05d4 tcp_v4_
[c000000007f0b7d0] c000000000ad4168 tcp_v4_
[c000000007f0b8c0] c000000000a9e7f0 ip_local_
[c000000007f0b910] c000000000a9f120 ip_local_
[c000000007f0b970] c000000000a9ec28 ip_rcv_
[c000000007f0ba00] c000000000a9f550 ip_rcv+0x360/0x470
[c000000007f0ba70] c000000000a38e5c __netif_
[c000000007f0bb30] c000000000a3cbc8 netif_receive_
[c000000007f0bb70] c000000000a3db6c napi_gro_
[c000000007f0bbb0] c008000010430c1c i40e_clean_
[c000000007f0bca0] c00800001043134c i40e_napi_
[c000000007f0bd50] c000000000a3d47c net_rx_
[c000000007f0be50] c000000000bcdc9c __do_softirq+
[c000000007f0bf40] c0000000000f3748 irq_exit+0xe8/0x120
[c000000007f0bf60] c000000000016b00 __do_irq+0x90/0x1d0
[c000000007f0bf90] c00000000002a5d0 call_do_
[c000001a935dfde0] c000000000016ce0 do_IRQ+0xa0/0x150
[c000001a935dfe30] c000000000009b94 h_virt_
--- Exception: ea1 at 00003fff920d8ce4
SP (3fff9184e5c0) is in userspace
1e:mon> r
R00 = c000000000a1e82c R16 = c000000007f0b678
R01 = c000000007f0b440 R17 = c000000007f0b660
R02 = c0000000014fdf00 R18 = 00000000000000694
R04 = c000001c86fad600 R20 = 0000000000000000
R05 = 00000000003417f0 R21 = 00000000001fffff
R06 = c000001d9701b2a0 R22 = c000001c7aae3d58
R07 = c000001d9701b000 R23 = ffffffffffffff92
R08 = 0000000000000280 R24 = 000000000001fe28
R09 = 000000000120c00a R25 = 00000000f8eb2d17
R10 = c00a000000000122 R26 = 0003126941e19a70
R11 = 0003126941e19a26 R27 = 0000000000000000
R12 = 0000000039059303 R28 = 0003126941e199ed
R13 = c000000007b90e00 R29 = c000001c86fad600
R14 = c000001c7aae3c00 R30 = c000001d9701b280
R15 = c000001c86fad600 R31 = 0000000000000000
pc = c000000000a1e6b4 skb_release_
cfar= c000000000a1e670 skb_release_
lr = c000000000a1e82c __kfree_
msr = 9000000000009033 cr = 39059305
ctr = c000000000a97200 xer = 00000000a0000000 trap = 600
dar = c00a000000000122 dsisr = 08000000
1e:mon> S
msr = 9000000000001033 sprg0 = 000000000000091c
pvr = 00000000004e0100 sprg1 = 0000000000000000
dec = ffff6e56be3f98df sprg2 = 0000000000000000
sp = c000000007f0ac00 sprg3 = 000000000000001e
toc = c0000000014fdf00 dar = c00a000000000122
srr0 = 0000000000091484 srr1 = 0000000000001033 dsisr = 08000000
dscr = 0000000000000000 ppr = 0000000000000000 pir = 0000005e
sdr1 = 0000000000000000 hdar = 0000000000000000 hdsisr = 00000000
hsrr0 = 00000000300050b0 hsrr1 = 0000000000001002 hdec = 7600c75f
lpcr = 0000000001d2f012 pcr = 0000000000000000 lpidr = 00000000
hsprg0 = 0000000007b90e00 hsprg1 = 0000000007b90e00
dabr = 0000000000000000 dabrx = 0000000000091484
dpdes = 0000000000000000 tir = 0000000000000002 cir = 00000000
fscr = 0000000000000180 tar = 0000000000000000 pspb = 00000000
mmcr0 = 0000000000000000 mmcr1 = 0000000000000000 mmcr2 = 0000000000000000
pmc1 = 00000000 pmc2 = 00000000 pmc3 = 00000000 pmc4 = 00000000
mmcra = 0000000000000000 siar = 0000000000000000 pmc5 = 80000001
sdar = 0000000000000000 sier = 0000000000000000 pmc6 = 8000000b
ebbhr = 0000000000000000 ebbrr = 0000000000000000 bescr = 0000000000000000
hfscr = 000000000000059f dhdes = 0000000000091484 rpr = 0000000000000000
dawr = 0000000000000000 dawrx = 000000000000fc00 ciabr = 0000000000000000
1e:mon>
== Comment: #7 - VIPIN K. PARASHAR <email address hidden> - 2017-08-14 12:50:36 ==
1e:mon> di %pc
c000000000a1e6b4 7d205028 lwarx r9,0,r10
c000000000a1e6b8 3129ffff addic r9,r9,-1
c000000000a1e6bc 7d20512d stwcx. r9,0,r10
c000000000a1e6c0 40c2fff4 bne- c000000000a1e6b4 # skb_release_
c000000000a1e6c4 7d2907b4 extsw r9,r9
c000000000a1e6c8 7c0004ac hwsync
c000000000a1e6cc 2fa90000 cmpdi cr7,r9,0
c000000000a1e6d0 409effb0 bne cr7,c000000000a
c000000000a1e6d4 4b88b495 bl c0000000002a9b68 # __put_page+0x8/0x80
c000000000a1e6d8 60000000 nop
c000000000a1e6dc 893e0000 lbz r9,0(r30)
c000000000a1e6e0 395f0001 addi r10,r31,1
c000000000a1e6e4 7d5f07b4 extsw r31,r10
c000000000a1e6e8 7f895000 cmpw cr7,r9,r10
c000000000a1e6ec 419dffa8 bgt cr7,c000000000a
c000000000a1e6f0 893e0001 lbz r9,1(r30)
1e:mon>
>> c000000000a1e6b4 7d205028 lwarx r9,0,r10 <---
R10 = c00a000000000122
R10 supplies the address for the lwarx load, but it does not contain a word-aligned address (the low bits 0x122 leave a remainder of 2 when divided by 4).
Thus an Alignment interrupt is triggered.
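The reasoning above can be checked by hand. The following is a minimal sketch, not part of the original analysis: it decodes the instruction word 7d205028 from the disassembly using the standard PowerPC X-form field layout and tests the R10 value from the register dump for 4-byte alignment. The helper names `decode_x_form` and `is_aligned` are made up for illustration.

```python
# Sketch: decode the faulting instruction word and check the address in R10.
# Helper names are hypothetical; the values come from the xmon output above.

def decode_x_form(word: int) -> dict:
    """Split a 32-bit PowerPC X-form instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # primary opcode (31 for lwarx)
        "rt": (word >> 21) & 0x1F,      # target register
        "ra": (word >> 16) & 0x1F,      # base register (0 means the value 0)
        "rb": (word >> 11) & 0x1F,      # index register
        "xo": (word >> 1) & 0x3FF,      # extended opcode (20 for lwarx)
    }


def is_aligned(addr: int, size: int) -> bool:
    """Return True if addr is a multiple of size."""
    return addr % size == 0


LWARX_WORD = 0x7D205028          # instruction word at the faulting pc
R10 = 0xC00A000000000122         # R10 (and DAR) from the register dump

fields = decode_x_form(LWARX_WORD)
print(fields)                    # rt=9, ra=0, rb=10: lwarx r9,0,r10
print(hex(R10 % 4))              # remainder when divided by the word size
print(is_aligned(R10, 4))        # False: not a word-aligned address
```

With RA = 0 the effective address is simply the content of R10, which matches the DAR value reported by xmon.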
== Comment: #11 - VIPIN K. PARASHAR <email address hidden> - 2017-08-14 13:12:54 ==
(In reply to comment #5)
> Before I did apt-get update and upgrade on this system, I was running with
> this level:
>
> 4.10.0-22-generic #24~16.04.1-Ubuntu SMP Mon May 22 22:11:01 UTC 2017
> ppc64le ppc64le ppc64le GNU/Linux
>
> With the previous level above, I did not see the problem.
The crash is seen with kernel 4.11.0-12-generic, in the TCP/IP stack code.
It is likely due to code changes introduced after 4.10.0-22-generic.
Changed in ubuntu-power-systems:
assignee: nobody → Canonical Server Team (canonical-server)
importance: Undecided → High
Changed in ubuntu-power-systems:
assignee: Canonical Server Team (canonical-server) → Canonical Kernel Team (canonical-kernel-team)
tags: added: triage-g
Changed in linux (Ubuntu):
status: Incomplete → Triaged
Changed in ubuntu-power-systems:
status: New → Triaged
Changed in linux (Ubuntu):
status: Triaged → Incomplete
Changed in ubuntu-power-systems:
status: Triaged → Incomplete
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)