Comment 3 for bug 569610

Luke J Militello (kilahurtz) wrote:

I did some searching and found these two entries in the kernel changelogs. I have no idea whether they help, but I figured it couldn't hurt to share them...

~ ChangeLog-2.6.28 ~

commit 33cf71cee14743185305c61625c4544885055733
Author: Petr Tesarik <email address hidden>
Date:   Fri Nov 21 16:42:58 2008 -0800

    tcp: Do not use TSO/GSO when there is urgent data

    This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014

    Since most (if not all) implementations of TSO and even the in-kernel
    software GSO do not update the urgent pointer when splitting a large
    segment, it is necessary to turn off TSO/GSO for all outgoing traffic
    with the URG pointer set.

    Looking at tcp_current_mss (and the preceding comment) I even think
    this was the original intention. However, this approach is insufficient,
    because TSO/GSO is turned off only for newly created frames, not for
    frames which were already pending at the arrival of a message with
    MSG_OOB set. These frames were created when TSO/GSO was enabled,
    so they may be large, and they will have the urgent pointer set
    in tcp_transmit_skb().

    With this patch, such large packets will be fragmented again before
    going to the transmit routine.

    As a side note, at least the following NICs are known to screw up
    the urgent pointer in the TCP header when doing TSO:

        Intel 82566MM (PCI ID 8086:1049)
        Intel 82566DC (PCI ID 8086:104b)
        Intel 82541GI (PCI ID 8086:1076)
        Broadcom NetXtreme II BCM5708 (PCI ID 14e4:164c)

    Signed-off-by: Petr Tesarik <email address hidden>
    Signed-off-by: David S. Miller <email address hidden>

~ ChangeLog-2.6.29 ~

commit ef711cf1d156428d4c2911b8c86c6ce90519dc45
Author: Eric Dumazet <email address hidden>
Date:   Fri Nov 14 00:53:54 2008 -0800

    net: speedup dst_release()

    During tbench/oprofile sessions, I found that dst_release() was in third position.

    CPU: Core 2, speed 2999.68 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
    samples  %        symbol name
    483726    9.0185  __copy_user_zeroing_intel
    191466    3.5697  __copy_user_intel
    185475    3.4580  dst_release
    175114    3.2648  ip_queue_xmit
    153447    2.8608  tcp_sendmsg
    108775    2.0280  tcp_recvmsg
    102659    1.9140  sysenter_past_esp
    101450    1.8914  tcp_current_mss
    95067     1.7724  __copy_from_user_ll
    86531     1.6133  tcp_transmit_skb

    Of course, all CPUs fight on the dst_entry associated with 127.0.0.1

    Instead of first checking the refcount value, then decrementing it,
    we use atomic_dec_return() to help the CPU make the right memory transaction
    (i.e. getting the cache line in exclusive mode)

    dst_release() is now at the fifth position, and tbench a little bit faster ;)

    CPU: Core 2, speed 3000.1 MHz (estimated)
    Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000
    samples  %        symbol name
    647107    8.8072  __copy_user_zeroing_intel
    258840    3.5229  ip_queue_xmit
    258302    3.5155  __copy_user_intel
    209629    2.8531  tcp_sendmsg
    165632    2.2543  dst_release
    149232    2.0311  tcp_current_mss
    147821    2.0119  tcp_recvmsg
    137893    1.8767  sysenter_past_esp
    127473    1.7349  __copy_from_user_ll
    121308    1.6510  ip_finish_output
    118510    1.6129  tcp_transmit_skb
    109295    1.4875  tcp_v4_rcv

    Signed-off-by: Eric Dumazet <email address hidden>
    Signed-off-by: David S. Miller <email address hidden>