"Kernel unaligned access at TPC" causing network/system to become slow and/or unresponsive

Bug #569610 reported by Luke J Militello on 2010-04-25

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
linux (Ubuntu)	Incomplete	Medium	Unassigned	

Bug Description

Binary package hint: linux-image-2.6.24-27-sparc64-smp

These lines kept showing up in dmesg and the logs over the past few weeks, and then they started causing the system and its network to become slow and unresponsive, requiring a hard reboot to recover.

Apr 7 16:01:48 Hal kernel: [1076377.260268] Kernel unaligned access at TPC[6a1504] tcp_transmit_skb+0x1ac/0x8c0
Apr 7 16:01:48 Hal kernel: [1076377.260330] Kernel unaligned access at TPC[6a150c] tcp_transmit_skb+0x1b4/0x8c0
Apr 7 16:01:48 Hal kernel: [1076377.260355] Kernel unaligned access at TPC[68e724] ip_queue_xmit+0x14c/0x5a0
Apr 7 16:01:48 Hal kernel: [1076377.260390] Kernel unaligned access at TPC[68e734] ip_queue_xmit+0x15c/0x5a0
Apr 7 16:01:48 Hal kernel: [1076377.260418] Kernel unaligned access at TPC[58e904] ip_fast_csum+0xc/0x80
Apr 7 16:32:21 Hal kernel: [1078210.103332] Kernel unaligned access at TPC[6a1504] tcp_transmit_skb+0x1ac/0x8c0
Apr 7 16:32:21 Hal kernel: [1078210.103406] Kernel unaligned access at TPC[6a150c] tcp_transmit_skb+0x1b4/0x8c0
Apr 7 16:32:21 Hal kernel: [1078210.103437] Kernel unaligned access at TPC[68e724] ip_queue_xmit+0x14c/0x5a0
Apr 7 16:32:21 Hal kernel: [1078210.103474] Kernel unaligned access at TPC[68e734] ip_queue_xmit+0x15c/0x5a0
Apr 7 16:32:21 Hal kernel: [1078210.103501] Kernel unaligned access at TPC[58e904] ip_fast_csum+0xc/0x80
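
Background, not a diagnosis: on SPARC, a multi-byte load or store must use an address that is a multiple of its size. When kernel code performs an unaligned one, the CPU traps, and the kernel emulates the access in software and prints a "Kernel unaligned access at TPC[...]" line. TPC is the trapping program counter, repeated as symbol+offset/function-size. The functions listed here all sit on the transmit path that builds and checksums the TCP and IP headers. The minimal Python sketch below only illustrates the alignment arithmetic. It assumes a hypothetical 4-byte-aligned buffer at 0x1000 and shows how a 14-byte Ethernet header leaves the 32-bit words of an IPv4 header on a 2-byte boundary unless two bytes of padding come first (the idea behind the kernel's NET_IP_ALIGN). Whether anything like this happens on this system is not established.

```python
ETH_HLEN = 14        # bytes in an Ethernet II header
IPV4_MIN_HLEN = 20   # bytes in an IPv4 header without options


def is_naturally_aligned(addr: int, size: int) -> bool:
    """True if an access of `size` bytes at `addr` is naturally aligned."""
    return addr % size == 0


def ip_header_word_addrs(buf_base: int, pad: int) -> list[int]:
    """Addresses of the 32-bit words of the IPv4 header when the frame
    starts `pad` bytes into a buffer located at `buf_base`."""
    ip_start = buf_base + pad + ETH_HLEN
    return [ip_start + off for off in range(0, IPV4_MIN_HLEN, 4)]


def misaligned_word_loads(buf_base: int, pad: int) -> int:
    """Count how many 4-byte header loads would be unaligned."""
    return sum(not is_naturally_aligned(a, 4)
               for a in ip_header_word_addrs(buf_base, pad))


if __name__ == "__main__":
    base = 0x1000  # hypothetical, 4-byte-aligned buffer address
    for pad in (0, 2):
        print(f"pad={pad}: {misaligned_word_loads(base, pad)} of 5 word loads unaligned")
```

With no padding all five word loads are unaligned; with two bytes of padding none are.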

I'm not sure whether this is a kernel problem or a result of running ifenslave with interface bonding in fault-tolerant (active-backup) mode.

My system is a Sun Enterprise 420R.
I'm running Ubuntu 8.04.4 with 4 GB of RAM.

CPU Info...

cpu : TI UltraSparc II (BlackBird)
fpu : UltraSparc II integrated FPU
prom : OBP 3.23.0 1999/06/30 13:53
type : sun4u
ncpus probed : 4
ncpus active : 4
D$ parity tl1 : 0
I$ parity tl1 : 0
Cpu0ClkTck : 000000001ad35932
Cpu1ClkTck : 000000001ad35932
Cpu2ClkTck : 000000001ad35932
Cpu3ClkTck : 000000001ad35932
MMU Type : Spitfire
State:
CPU0: online
CPU1: online
CPU2: online
CPU3: online

Network card info via dmesg...

[ 139.086540] PCI: Enabling device: (0000:02:00.0), cmd 2
[ 139.092436] eth1: Sun Cassini+ (64bit/33MHz PCI/Cu) Ethernet[15] 00:03:ba:85:5b:01
[ 139.266151] PCI: Enabling device: (0000:02:01.0), cmd 2
[ 139.272054] eth2: Sun Cassini+ (64bit/33MHz PCI/Cu) Ethernet[16] 00:03:ba:85:5b:02
[ 139.446977] PCI: Enabling device: (0000:03:02.0), cmd 2
[ 139.452954] eth3: Sun Cassini+ (64bit/33MHz PCI/Cu) Ethernet[17] 00:03:ba:85:5b:03
[ 139.627883] PCI: Enabling device: (0000:03:03.0), cmd 2
[ 139.633963] eth4: Sun Cassini+ (64bit/33MHz PCI/Cu) Ethernet[18] 00:03:ba:85:5b:04
[ 139.807491] PCI: Enabling device: (0000:05:00.0), cmd 2
[ 139.813690] eth5: Sun Cassini+ (64bit/33MHz PCI/Cu) Ethernet[19] 00:03:ba:85:20:e5
[ 139.990262] PCI: Enabling device: (0000:05:01.0), cmd 2
[ 139.996483] eth6: Sun Cassini+ (64bit/33MHz PCI/Cu) Ethernet[20] 00:03:ba:85:20:e6
[ 140.170039] PCI: Enabling device: (0000:06:02.0), cmd 2
[ 140.176496] eth7: Sun Cassini+ (64bit/33MHz PCI/Cu) Ethernet[21] 00:03:ba:85:20:e7
[ 140.350911] PCI: Enabling device: (0000:06:03.0), cmd 2
[ 140.357394] eth8: Sun Cassini+ (64bit/33MHz PCI/Cu) Ethernet[22] 00:03:ba:85:20:e8

Bonding info via dmesg...

[ 143.262268] Ethernet Channel Bonding Driver: v3.2.3 (December 6, 2007)
[ 143.339438] bonding: MII link monitoring set to 100 ms

Bonding options set in '/etc/network/interfaces'...

post-up ifenslave bond0 eth1 eth5
pre-down ifenslave -d bond0 eth1 eth5

Bonding options loaded via '/etc/modules'...

bonding mode=active-backup miimon=100 max_bonds=4

Dave Gilbert (ubuntu-treblig) on 2010-04-25:

tags:	added: sparc

Jeremy Foshee (jeremyfoshee) wrote on 2010-04-28:

Hi Luke,

Please be sure to confirm this issue exists with the latest development release of Ubuntu. ISO CD images are available from http://cdimage.ubuntu.com/releases/ . If the issue remains, please run the following command from a Terminal (Applications->Accessories->Terminal). It will automatically gather and attach updated debug information to this report.

apport-collect -p linux 569610

Also, if you could test the latest upstream kernel available that would be great. It will allow additional upstream developers to examine the issue. Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Once you've tested the upstream kernel, please remove the 'needs-upstream-testing' tag. This can be done by clicking on the yellow pencil icon next to the tag located at the bottom of the bug description and deleting the 'needs-upstream-testing' text. Please let us know your results.

Thanks in advance.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags:	added: needs-kernel-logs
tags:	added: needs-upstream-testing
tags:	added: kj-triage
Changed in linux (Ubuntu):
status:	New → Incomplete

Luke J Militello (kilahurtz) wrote on 2010-04-28:

Thanks for the quick response, Jeremy. Unfortunately, I cannot test the latest development release image, as this is a production system. However, I can test the latest upstream kernel for you. I should also mention that the kernel I am currently running is 2.6.24-27.69. I looked around the upstream kernel site and wasn't sure which kernel you would like me to try.

I'm guessing 2.6.34-999.201004261005 (daily/current), since I don't see any recent build of mainline 2.6.24.6 (2.6.24-27.69).

Either way, it looks like I may have to compile it myself, since there is no "sparc64-smp" deb package.

Also, I have no reliable way to reproduce this issue; it just starts appearing all of a sudden. I only noticed it when my system became sluggish after a while, so I'll have to watch the logs to catch it in action. When I do, should I run the command you posted above?

Thanks again.

Luke J Militello (kilahurtz) wrote on 2010-05-01:

I did some searching and found these two entries in the kernel changelogs. I have no idea whether they help, but I figured it couldn't hurt to ask...

~ ChangeLog-2.6.28 ~

commit 33cf71cee14743185305c61625c4544885055733
Author: Petr Tesarik <ptesarik@suse.cz>
Date:   Fri Nov 21 16:42:58 2008 -0800

    tcp: Do not use TSO/GSO when there is urgent data

    This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014

    Since most (if not all) implementations of TSO and even the in-kernel
    software GSO do not update the urgent pointer when splitting a large
    segment, it is necessary to turn off TSO/GSO for all outgoing traffic
    with the URG pointer set.

    Looking at tcp_current_mss (and the preceding comment) I even think
    this was the original intention. However, this approach is insufficient,
    because TSO/GSO is turned off only for newly created frames, not for
    frames which were already pending at the arrival of a message with
    MSG_OOB set. These frames were created when TSO/GSO was enabled,
    so they may be large, and they will have the urgent pointer set
    in tcp_transmit_skb().

    With this patch, such large packets will be fragmented again before
    going to the transmit routine.

    As a side note, at least the following NICs are known to screw up
    the urgent pointer in the TCP header when doing TSO:

        Intel 82566MM (PCI ID 8086:1049)
        Intel 82566DC (PCI ID 8086:104b)
        Intel 82541GI (PCI ID 8086:1076)
        Broadcom NetXtreme II BCM5708 (PCI ID 14e4:164c)

    Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
    Signed-off-by: David S. Miller <davem@davemloft.net>
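
The core of the first entry is that the TCP urgent pointer is an offset from the segment's own sequence number (RFC 793), so splitting one large segment into MSS-sized pieces means recomputing it for every piece. The minimal Python sketch below shows that arithmetic with made-up numbers: copying the header unchanged into every piece, which is what the changelog says TSO/GSO implementations effectively do, leaves the later pieces pointing at the wrong byte. It ignores the 16-bit width of the field and sequence-number wraparound, and it is not the kernel's fix; per the changelog, the patch instead fragments such packets again before they reach the transmit routine. The `Segment` and `split` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int      # sequence number of the first payload byte
    length: int   # payload bytes in this piece
    urg: bool     # URG flag
    urg_ptr: int  # urgent pointer, relative to this piece's seq


def split(seq: int, length: int, urg_ptr: int, mss: int,
          fix_urg: bool) -> list[Segment]:
    """Split one large segment (URG set, pointer `urg_ptr`) into MSS pieces.

    fix_urg=False copies the original pointer into every piece unchanged.
    fix_urg=True recomputes it against each piece's own sequence number
    and clears URG once the piece starts at or past the urgent byte.
    """
    urgent_end = seq + urg_ptr  # absolute sequence number being pointed at
    pieces = []
    for off in range(0, length, mss):
        seg_seq = seq + off
        seg_len = min(mss, length - off)
        if not fix_urg:
            pieces.append(Segment(seg_seq, seg_len, True, urg_ptr))
        elif urgent_end > seg_seq:  # no wraparound handling (sketch only)
            pieces.append(Segment(seg_seq, seg_len, True, urgent_end - seg_seq))
        else:
            pieces.append(Segment(seg_seq, seg_len, False, 0))
    return pieces


if __name__ == "__main__":
    for fix in (False, True):
        print(fix, split(1000, 4000, 2000, 1448, fix_urg=fix))
```

In the unfixed case the second piece (seq 2448) keeps a pointer of 2000 and so targets sequence 4448 instead of 3000.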

~ ChangeLog-2.6.29 ~

commit ef711cf1d156428d4c2911b8c86c6ce90519dc45
Author: Eric Dumazet <dada1@cosmosbay.com>
Date:   Fri Nov 14 00:53:54 2008 -0800

    net: speedup dst_release()

    During tbench/oprofile sessions, I found that dst_release() was in third position.

    CPU: Core 2, speed 2999.68 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
    samples  %        symbol name
    483726    9.0185  __copy_user_zeroing_intel
    191466    3.5697  __copy_user_intel
    185475    3.4580  dst_release
    175114    3.2648  ip_queue_xmit
    153447    2.8608  tcp_sendmsg
    108775    2.0280  tcp_recvmsg
    102659    1.9140  sysenter_past_esp
    101450    1.8914  tcp_current_mss
    95067     1.7724  __copy_from_user_ll
    86531     1.6133  tcp_transmit_skb

    Of course, all CPUS fight on the dst_entry associated with 127.0.0.1

    Instead of first checking the refcount value, then decrement it,
    we use atomic_dec_return() to help CPU to make the right memory transaction
    (ie getting the cache line in exclusive mode)

    dst_release() is now at the fifth position, and tbench a litle bit faster ;)

    CPU: Core 2, speed 3000.1 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
    samples  %        symbol name
    647107    8.8072  __copy_user_zeroing_intel
    258840    3.5229  ip_queue_xmit
    258302    3.5155  __copy_user_intel
    209629    2.8531  tcp_sendmsg
    165632    2.2543  dst_release
    149232    2.0311  tcp_current_mss
    147821    2.0119  tcp_recvmsg
    137893    1.8767  sysenter_past_esp
    127473    1.7349  __copy_from_user_ll
    121308    1.6510  ip_finish_output
    118510    1.6129  tcp_transmit_skb
    109295    1.4875  tcp_v4_rcv

    Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

Luke J Militello (kilahurtz) wrote on 2010-05-01:

Kernel Change Log Entries (4.0 KiB, text/plain)

Changed in linux (Ubuntu):
status:	Incomplete → In Progress

Luke J Militello (kilahurtz) wrote on 2010-05-01:

Also, I'm marking this "In Progress" because I don't want the LP Janitor to expire it. I will need some guidance on how to proceed, since I cannot simply install a deb package of the upstream kernel on the SPARC port. If someone can tell me which upstream version to try and where to find the proper Ubuntu source along with the Ubuntu kernel config files, I'm happy to compile and test it for you.

Luke J Militello (kilahurtz) wrote on 2010-05-01:

Oh, and this is also a headless server install with no GUI.

As for the Apport command, how should I go about running it so I don't have to babysit the log files?

I've read that Apport is not enabled by default on stable releases, and I've never used it before either.
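
On the question of not babysitting the logs: independently of Apport, a small script run periodically (for example from cron) could scan the kernel log for this message and report which instruction sites are firing. The sketch below is a hypothetical helper, not an Ubuntu or Apport tool. It assumes the messages reach /var/log/kern.log in exactly the format quoted in the bug description, and it is written for Python 3; the Python 2.5 shipped with Ubuntu 8.04 would need changes (no f-strings, no `collections.Counter`, no `errors=` argument to `open()`).

```python
import re
import sys
from collections import Counter

# Matches the sparc64 message as quoted in this report, e.g.
# "Kernel unaligned access at TPC[6a1504] tcp_transmit_skb+0x1ac/0x8c0"
UNALIGNED_RE = re.compile(
    r"Kernel unaligned access at TPC\[([0-9a-f]+)\] "
    r"(\S+)\+0x([0-9a-f]+)/0x([0-9a-f]+)"
)


def parse_line(line):
    """Return (tpc, symbol, offset, function_size) for a match, else None."""
    m = UNALIGNED_RE.search(line)
    if m is None:
        return None
    tpc, sym, off, size = m.groups()
    return int(tpc, 16), sym, int(off, 16), int(size, 16)


def summarize(lines):
    """Count unaligned-access reports per (symbol, offset)."""
    counts = Counter()
    for line in lines:
        hit = parse_line(line)
        if hit is not None:
            _, sym, off, _ = hit
            counts[(sym, off)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical default path; pass another log file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/kern.log"
    with open(path, errors="replace") as f:
        counts = summarize(f)
    for (sym, off), n in counts.most_common():
        print(f"{n:6d}  {sym}+{off:#x}")
    # A non-zero exit status lets a cron job or monitoring tool react.
    sys.exit(1 if counts else 0)
```

Saved under a hypothetical name such as unaligned_watch.py, it could be run as `python3 unaligned_watch.py /var/log/kern.log`; an exit status of 1 means at least one report was found.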

Jeremy Foshee (jeremyfoshee) wrote on 2010-05-01:

Hi Luke,
The In Progress status is for use when a bug is assigned to a specific Kernel Team member and they are working the issue.

I've set the bug to Triaged. Thank you for the further data.

Thanks!

~JFo

Changed in linux (Ubuntu):
importance:	Undecided → Medium
status:	In Progress → Triaged

Luke J Militello (kilahurtz) wrote on 2010-05-04:

Jeremy, is there anything you would like me to do in the meantime, such as testing a different kernel?

Jeremy Foshee (jeremyfoshee) wrote on 2010-05-04:

Luke,
Nothing that I can think of currently. The Kernel Team may have some requests of you once they've had the opportunity to investigate this bug further.

Thanks!

~JFo

Luke J Militello (kilahurtz) wrote on 2010-05-24:

#10

The system had been running smoothly for 10 days since my last reboot when this showed up again; however, it appeared only once and the system is still stable.

[100375.580397] Kernel unaligned access at TPC[6a1504] tcp_transmit_skb+0x1ac/0x8c0
[100375.580458] Kernel unaligned access at TPC[6a150c] tcp_transmit_skb+0x1b4/0x8c0
[100375.580483] Kernel unaligned access at TPC[68e724] ip_queue_xmit+0x14c/0x5a0
[100375.580514] Kernel unaligned access at TPC[68e734] ip_queue_xmit+0x15c/0x5a0
[100375.580536] Kernel unaligned access at TPC[58e904] ip_fast_csum+0xc/0x80

penalvch (penalvch) wrote on 2013-09-30:

#11

Luke J Militello, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering whether this is still an issue. If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily kernel folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.12-rc2

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.