Comment 79 for bug 60764

Kit Scuzz (kitsczud) wrote :

I'm also suffering from this bug, and I'm willing to do as much as is humanly possible to fix it in a reasonable time frame.
---------------------------------------------------------------------------
First and foremost, I am seeing this bug on two separate machines. The first is an Ubuntu 10.04 laptop (32-bit). Information relevant to the laptop:
kit@kacertop:~$ uname -a
Linux kacertop 2.6.32-28-generic #55-Ubuntu SMP Mon Jan 10 21:21:01 UTC 2011 i686 GNU/Linux
kit@kacertop:~$ lspci | grep -i eth
02:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8071 PCI-E Gigabit Ethernet Controller (rev 16)

The second is a file server running Debian squeeze (64-bit). Information relevant to the file server:
kit@AlfredTCP:~$ uname -a
Linux AlfredTCP 2.6.32-5-amd64 #1 SMP Wed Jan 12 03:40:32 UTC 2011 x86_64 GNU/Linux
kit@AlfredTCP:~$ lspci | grep -i eth
03:07.0 Ethernet controller: D-Link System Inc DGE-530T Gigabit Ethernet Adapter (rev 11) (rev 11)
---------------------------------------------------------------------------
As you can see, both of these NICs use Marvell chipsets, and both are gigabit. I had the exact same issue with a Realtek gigabit chipset (whose model number I have forgotten).

I've tested this under a number of conditions and, as others here have reported, it seems to depend on network saturation (speeds > 2Mbit). The period between failures is highly variable, and the failures seem to be related more to reception than to transmission. I've tried the proposed fix of turning off TCP checksumming, which did not solve the issue.

I have replaced all of the components in the chain (the router, switches, MoCA, cables, and ethernet card) and I still have the issue, so if it is a hardware issue then it is with the wiring in my house, and it's propagating through the MoCA.

While trying to understand the issue, I've whipped up a couple of tests. Netcat will transfer a whole file, but when transferring large volumes at high speeds the file regularly contains corruption (with different areas corrupted each time). I checked using md5sums on both ends of the transfer. Rsync and scp will fail with "corrupted mac on input" or "connection reset by peer" depending on which end of the transfer you're on (the computers I listed above always see the "corrupted mac on input"). I ended up creating the following program to try to hunt down the corruption. It causes both computers to transfer blocks of data with a crc32 at the end of each packet. When a machine detects a corrupt packet, it prints the contents of the packet and dies. Anyone interested in taking a look can download it here: http://www.scuzzstuff.org/temp/check_network_interface.zip
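
For anyone who doesn't want to download the zip, here is a rough Python sketch of the block-plus-crc32 idea described above. It is not the code from check_network_interface.zip: the function names, the block size, and the big-endian layout are all assumptions made for illustration, and the socket send/receive part is left out entirely.

```python
import struct
import zlib

# The four fill patterns mentioned in this comment.
PATTERNS = (0xDEADBEEF, 0xABADBABE, 0xCAFEF00D, 0xDEFEC8ED)
WORDS_PER_BLOCK = 64  # hypothetical block size, not taken from the real tool


def make_block(index):
    """Build one block: a repeated 32-bit pattern followed by its CRC32."""
    pattern = PATTERNS[index % len(PATTERNS)]
    payload = struct.pack(">I", pattern) * WORDS_PER_BLOCK
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + struct.pack(">I", crc)


def check_block(block):
    """Return True if the trailing CRC32 matches the payload before it."""
    payload, trailer = block[:-4], block[-4:]
    (expected,) = struct.unpack(">I", trailer)
    return (zlib.crc32(payload) & 0xFFFFFFFF) == expected


if __name__ == "__main__":
    good = make_block(0)
    bad = bytearray(good)
    bad[10] ^= 0xFF  # simulate one byte corrupted in transit
    print("clean block ok:    ", check_block(good))
    print("corrupted block ok:", check_block(bytes(bad)))
```

The receiving side would call something like check_block() on every block it reads and dump the block as hex on the first failure.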

Sloshing through a packet which should only contain 0xDEADBEEF, 0xABADBABE, 0xCAFEF00D, or 0xDEFEC8ED, I received the following:
"6d6435736d6435736d6435736d6435736d6435736d6435736d643573
6d6435736d6435736d6435736d6435736d6435736d6435736d643573
6d643573aaaa30aaaa30aaaa30aaaa306d6435736d6435736d643573
6d6435736d6435736d6435736d6435736d6435736d6435736d643573
6d6435736d6435736d643573"
I'm uncertain why I have "0x6d643573", but the corruption is obvious at the 0xaaaa.
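
A small Python sketch for picking a dump like this apart, in case it helps anyone comparing their own captures. It treats the dominant 4-byte word as the reference and lists the byte offsets that deviate from it; the helper name find_deviations is made up for this example. One thing it makes visible is that the bytes 6d 64 35 73 are the ASCII characters "md5s". That is only an observation about the bytes, not an explanation of how they got into the packet.

```python
# Hex dump copied from the comment above, joined into one string.
DUMP = (
    "6d6435736d6435736d6435736d6435736d6435736d6435736d643573"
    "6d6435736d6435736d6435736d6435736d6435736d6435736d643573"
    "6d643573aaaa30aaaa30aaaa30aaaa306d6435736d6435736d643573"
    "6d6435736d6435736d6435736d6435736d6435736d6435736d643573"
    "6d6435736d6435736d643573"
)

data = bytes.fromhex(DUMP)
word = bytes.fromhex("6d643573")  # the word that fills most of the packet


def find_deviations(buf, ref):
    """Return byte offsets where buf differs from ref repeated end to end."""
    return [i for i, b in enumerate(buf) if b != ref[i % len(ref)]]


if __name__ == "__main__":
    offsets = find_deviations(data, word)
    print("repeating word as ASCII:", word.decode("ascii"))
    print("packet length in bytes: ", len(data))
    print("deviating byte offsets: ", offsets)
    if offsets:
        run = data[offsets[0]:offsets[-1] + 1]
        print("deviating bytes (hex):  ", run.hex())
```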

If there's anyone who can help me track this down more quickly, completely, or intelligently, I'm desperate for advice. I would be willing to take my first foray into Linux kernel programming to be able to make use of my fileserver the way I want to.

Thanks.