I did some searching and found these two entries in the kernel changelogs for 2.6.28 and 2.6.29. I have no idea whether they help, but I figured it couldn't hurt to ask...
~ ChangeLog-2.6.28 ~
11696 commit 33cf71cee14743185305c61625c4544885055733
11697 Author: Petr Tesarik <email address hidden>
11698 Date: Fri Nov 21 16:42:58 2008 -0800
11699
11700 tcp: Do not use TSO/GSO when there is urgent data
11701
11702 This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014
11703
11704 Since most (if not all) implementations of TSO and even the in-kernel
11705 software GSO do not update the urgent pointer when splitting a large
11706 segment, it is necessary to turn off TSO/GSO for all outgoing traffic
11707 with the URG pointer set.
11708
11709 Looking at tcp_current_mss (and the preceding comment) I even think
11710 this was the original intention. However, this approach is insufficient,
11711 because TSO/GSO is turned off only for newly created frames, not for
11712 frames which were already pending at the arrival of a message with
11713 MSG_OOB set. These frames were created when TSO/GSO was enabled,
11714 so they may be large, and they will have the urgent pointer set
11715 in tcp_transmit_skb().
11716
11717 With this patch, such large packets will be fragmented again before
11718 going to the transmit routine.
11719
11720 As a side note, at least the following NICs are known to screw up
11721 the urgent pointer in the TCP header when doing TSO:
11722
11723 Intel 82566MM (PCI ID 8086:1049)
11724 Intel 82566DC (PCI ID 8086:104b)
11725 Intel 82541GI (PCI ID 8086:1076)
11726 Broadcom NetXtreme II BCM5708 (PCI ID 14e4:164c)
11727
11728 Signed-off-by: Petr Tesarik <email address hidden>
11729 Signed-off-by: David S. Miller <email address hidden>
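To make the 2.6.28 entry a bit more concrete, here is a rough userspace model of the idea, not the kernel code itself. It assumes a queued frame that may be larger than one MSS, and caps what is handed to the NIC at a single MSS whenever urgent data is pending, so that no TSO/GSO splitting is needed. The names send_limit, packets_handed_to_nic and urgent_pending are made up for illustration and do not appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: how many bytes of an already queued frame may be
 * passed to the NIC as one packet. With urgent data pending, a frame
 * larger than one MSS is cut back to a single MSS so the segmentation
 * offload never has to split it (and never mishandles the urgent pointer).
 */
static size_t send_limit(size_t frame_len, size_t mss, int urgent_pending)
{
    assert(mss > 0);
    if (urgent_pending && frame_len > mss)
        return mss;       /* re-fragment: one MSS at a time */
    return frame_len;     /* large frame allowed, offload splits it later */
}

/* Count how many packets the stack itself hands to the NIC for one frame. */
static size_t packets_handed_to_nic(size_t frame_len, size_t mss,
                                    int urgent_pending)
{
    size_t packets = 0;

    while (frame_len > 0) {
        size_t chunk = send_limit(frame_len, mss, urgent_pending);
        frame_len -= chunk;
        packets++;
    }
    return packets;
}
```

In this model a 4000-byte frame with an MSS of 1460 goes out as one large frame when no urgent data is pending, and as three MSS-sized packets when it is, which matches the "fragmented again before going to the transmit routine" wording above.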
~ ChangeLog-2.6.29 ~
168533 commit ef711cf1d156428d4c2911b8c86c6ce90519dc45
168534 Author: Eric Dumazet <email address hidden>
168535 Date: Fri Nov 14 00:53:54 2008 -0800
168536
168537 net: speedup dst_release()
168538
168539 During tbench/oprofile sessions, I found that dst_release() was in third position.
168540
168541 CPU: Core 2, speed 2999.68 MHz (estimated)
168542 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
168543 samples % symbol name
168544 483726 9.0185 __copy_user_zeroing_intel
168545 191466 3.5697 __copy_user_intel
168546 185475 3.4580 dst_release
168547 175114 3.2648 ip_queue_xmit
168548 153447 2.8608 tcp_sendmsg
168549 108775 2.0280 tcp_recvmsg
168550 102659 1.9140 sysenter_past_esp
168551 101450 1.8914 tcp_current_mss
168552 95067 1.7724 __copy_from_user_ll
168553 86531 1.6133 tcp_transmit_skb
168554
168555 Of course, all CPUS fight on the dst_entry associated with 127.0.0.1
168556
168557 Instead of first checking the refcount value, then decrement it,
168558 we use atomic_dec_return() to help CPU to make the right memory transaction
168559 (ie getting the cache line in exclusive mode)
168560
168561 dst_release() is now at the fifth position, and tbench a litle bit faster ;)
168562
168563 CPU: Core 2, speed 3000.1 MHz (estimated)
168564 Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
168565 samples % symbol name
168566 647107 8.8072 __copy_user_zeroing_intel
168567 258840 3.5229 ip_queue_xmit
168568 258302 3.5155 __copy_user_intel
168569 209629 2.8531 tcp_sendmsg
168570 165632 2.2543 dst_release
168571 149232 2.0311 tcp_current_mss
168572 147821 2.0119 tcp_recvmsg
168573 137893 1.8767 sysenter_past_esp
168574 127473 1.7349 __copy_from_user_ll
168575 121308 1.6510 ip_finish_output
168576 118510 1.6129 tcp_transmit_skb
168577 109295 1.4875 tcp_v4_rcv
168578
168579 Signed-off-by: Eric Dumazet <email address hidden>
168580 Signed-off-by: David S. Miller <email address hidden>
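The refcount change described in the 2.6.29 entry can be sketched in userspace with C11 atomics. This is only an analogy under stated assumptions: struct ref_obj and the three functions are hypothetical stand-ins, and atomic_fetch_sub() minus one is used here in place of the kernel's atomic_dec_return(). The first release function reads the counter and then decrements it (two accesses to the contended cache line), the second does a single read-modify-write and checks the value it gets back.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a reference-counted object such as dst_entry. */
struct ref_obj {
    atomic_int refcnt;
};

static void ref_init(struct ref_obj *obj, int initial)
{
    assert(initial >= 0);
    atomic_init(&obj->refcnt, initial);
}

/*
 * Check-then-decrement: a plain load to validate the count, followed by
 * a separate atomic decrement. Returns 1 if the count looked sane
 * (at least 1) before the decrement, 0 otherwise.
 */
static int ref_release_check_then_dec(struct ref_obj *obj)
{
    int sane = atomic_load(&obj->refcnt) >= 1;  /* first access: read */

    atomic_fetch_sub(&obj->refcnt, 1);          /* second access: RMW */
    return sane;
}

/*
 * Decrement-and-return: one read-modify-write, then validate the result.
 * Returns the new count; a negative value means an unbalanced release.
 */
static int ref_release_dec_return(struct ref_obj *obj)
{
    /* atomic_fetch_sub() returns the old value, so subtract 1 again. */
    return atomic_fetch_sub(&obj->refcnt, 1) - 1;
}
```

Both variants leave the counter in the same state; the difference the commit message points at is only in how many times the shared cache line is touched per release.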