network device (nVidia MCP55, forcedeth) stops sending packets

Bug #131737 reported by A. Karl Kornel
This bug affects 1 person
Affects                       Status        Importance  Assigned to  Milestone
linux (Ubuntu)                Fix Released  Undecided   Unassigned
linux-meta (Ubuntu)           Invalid       Undecided   Unassigned
linux-source-2.6.15 (Ubuntu)  Invalid       Undecided   Unassigned

Bug Description

Binary package hint: linux-source-2.6.15

The Sun Fire X2200 M2 (BIOS version S39_3B16, running Dapper and kernel 2.6.15-28-amd64-generic) has four built-in network interfaces. Two network interfaces (eth0 and eth1) are nVidia MCP55 devices, and two (eth2 and eth3) are Broadcom devices. eth0 and eth1 are connected to Cisco switches, each on a different VLAN (although the interfaces don't know that, as VLAN tagging is not in use). eth3 is unused by the OS (it is active as the interface to the remote access system), and eth2 is directly connected to another machine (via crossover cable). eth1 is the primary interface, where most of the traffic flows.

eth0 and eth1 worked well until a few weeks ago, when their network connections were moved. They were originally connected to gigabit ports on a blade of a Cisco switch. The problem began when they were moved to a 10/100 blade on the same switch. The switch ports are hard-coded to 100 Mbps full-duplex, but the `ethtool` command reports that the network interface is running at 100 Mbps half-duplex. Also, under certain circumstances, eth0 and eth1 stop functioning. When that happens, no outgoing traffic is actually sent, and incoming traffic can only be received on eth1. The problem always affects both interfaces at the same time, never just one of them. I have not experienced any problems (so far) with eth2 and eth3 (although the OS doesn't use eth3).

I have found two ways at present to trigger the problem. The first method is to flood the interface with a large amount of data. For example, downloading many large files at once will eventually expose the problem. Most recently, the problem appeared during a reinstallation of Ubuntu: I had installed the OS from DVD, and was in the process of upgrading/installing about 100 packages.

The second method is to attempt to change the network configuration. The network devices, as I said, seem to run at 100 Mbps half-duplex, even though the switch ports are set to run at 100 Mbps full-duplex. If I run the command `ethtool -s eth1 autoneg off speed 100 duplex full` to force the interface to 100 Mbps full-duplex, the connection on eth1 is lost. Restarting autonegotiation does nothing.
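
The mismatch described above (switch port fixed at 100 Mbps full-duplex, interface reporting half-duplex) can be checked mechanically. The following Python sketch parses text in the `Key: value` layout that `ethtool <interface>` prints and reports any setting that differs from what the switch port is configured for. The function names and the `SAMPLE` text are illustrative assumptions; the sample was not captured from this machine.

```python
def parse_ethtool(text):
    """Collect 'Key: value' pairs from `ethtool <iface>` style output."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        # Skip header lines such as "Settings for eth1:" that carry no value.
        if sep and value.strip():
            settings[key.strip()] = value.strip()
    return settings


def check_link(text, expected_speed="100Mb/s", expected_duplex="Full"):
    """Return a list of mismatches between reported and expected settings."""
    settings = parse_ethtool(text)
    problems = []
    for key, expected in (("Speed", expected_speed), ("Duplex", expected_duplex)):
        actual = settings.get(key, "unknown")
        if actual != expected:
            problems.append(f"{key} is {actual}, expected {expected}")
    return problems


# Hypothetical sample in ethtool's output layout (not taken from the report).
SAMPLE = """\
Settings for eth1:
    Speed: 100Mb/s
    Duplex: Half
    Auto-negotiation: on
    Link detected: yes
"""

if __name__ == "__main__":
    for problem in check_link(SAMPLE):
        print(problem)
```

In practice the text would come from running `ethtool eth1` as root and passing its output to `check_link`.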

This problem was noticed several weeks ago, when the machine's network connections were switched from gigabit ports to 10/100 ports on the Cisco switch. The diagnosis was temporarily delayed by a hard drive failure (see Sun Online Support Center SR#65605769).

In an attempt to fix the problem, thinking that the Cisco blade may be at fault, I have had the malfunctioning network interfaces moved to a different 10/100 blade on our Cisco switch. The problem is still present after the move; the network devices show the same problems regardless of which 10/100 blade they are on. Unfortunately, all the gigabit ports have been taken since the move, so I am unable to move eth0 and eth1 back to gigabit ports to see if this fixes the problem.

To get the network interfaces working again, besides moving network connections, I have tried four different things. First, I tried executing the command `/etc/init.d/networking restart`, which is supposed to deactivate and then reactivate the network devices, but that didn't work. I then tried executing the commands `/etc/init.d/networking stop`, `rmmod forcedeth`, `modprobe forcedeth`, and `/etc/init.d/networking start`, in that order. These commands should deactivate the network devices, unload and then reload the driver for eth0 and eth1, and then restart networking. This also didn't work. I then tried shutting down the machine (using the command `shutdown -h now`) and restarting it, but that also did not work! Eventually, the only way to get eth0 and eth1 working again was to shut down the system and actually unplug it for approximately five minutes. That fixed the problem temporarily.

Examining the /var/log/messages file, which records pretty much everything reported to syslog, I found a number of messages relating to eth0 and eth1:

[Boot time]
Aug 8 15:15:12 npbnagb kernel: [ 54.426790] forcedeth.c: Reverse Engineered nForce ethernet driver. Version 0.54.
Aug 8 15:15:12 npbnagb kernel: [ 54.686201] forcedeth: using HIGHDMA
Aug 8 15:15:12 npbnagb kernel: [ 55.204363] eth0: forcedeth.c: subsystem: 0108e:534b bound to 0000:00:08.0
Aug 8 15:15:12 npbnagb kernel: [ 55.204811] forcedeth: using HIGHDMA
Aug 8 15:15:12 npbnagb kernel: [ 55.723467] eth1: forcedeth.c: subsystem: 0108e:534b bound to 0000:00:09.0

[When both network devices stop working]
Aug 8 16:42:24 npbnagb kernel: [ 5307.159987] NETDEV WATCHDOG: eth0: transmit timed out
Aug 8 16:42:24 npbnagb kernel: [ 5307.159992] eth0: Got tx_timeout. irq: 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.159995] eth0: Ring at 7e300000: next 2090 nic 1834
Aug 8 16:42:24 npbnagb kernel: [ 5307.159997] eth0: Dumping tx registers
Aug 8 16:42:24 npbnagb kernel: [ 5307.160003] 0: 00002000 000000ff 00000003 011e03ca 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160008] 20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160013] 40: 0420e20e 0000a855 00002e20 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160021] 60: 00000000 00000000 00000000 0000ffff 0000ffff 0000ffff 0000ffff 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160033] 80: 003b0f3e 40000001 00040000 007f0028 0000061c 00000001 00000000 00002dd9
Aug 8 16:42:24 npbnagb kernel: [ 5307.160044] a0: 0016070f 00000016 e0361600 00000a7f 00000001 00000000 1f00cccd 0000f480
Aug 8 16:42:24 npbnagb kernel: [ 5307.160055] c0: 10000101 00000001 00000001 00000001 00000001 00000001 00000001 00000001
Aug 8 16:42:24 npbnagb kernel: [ 5307.160065] e0: 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001
Aug 8 16:42:24 npbnagb kernel: [ 5307.160073] 100: 7e300800 7e300000 007f00ff 00000000 00010064 00000000 00000010 7e300e80
Aug 8 16:42:24 npbnagb kernel: [ 5307.160080] 120: 7e300000 7e657802 a0000029 7a1d9010 8000061c 7e300aac 7e300720 00200010
Aug 8 16:42:24 npbnagb kernel: [ 5307.160085] 140: 00304120 00002600 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160091] 160: 00000000 00000000 00000000 00000000 01000080 0000c000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160102] 180: 00000016 00000008 0294796d 00008103 00000045 00000081 00000080 00008183
Aug 8 16:42:24 npbnagb kernel: [ 5307.160112] 1a0: 00000016 00000008 0294796d 00008103 00000045 00000081 00000080 00008183
Aug 8 16:42:24 npbnagb kernel: [ 5307.160123] 1c0: 00000016 00000008 0294796d 00008103 00000045 00000081 00000080 00008183
Aug 8 16:42:24 npbnagb kernel: [ 5307.160134] 1e0: 00000016 00000008 0294796d 00008103 00000045 00000081 00000080 00008183
Aug 8 16:42:24 npbnagb kernel: [ 5307.160144] 200: 00007770 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160154] 220: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160164] 240: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160175] 260: 00000000 00000000 fe027001 00000100 00000011 000000a3 fe027011 000001a3
Aug 8 16:42:24 npbnagb kernel: [ 5307.160186] 280: 0007530c 0000072b 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160195] 2a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160206] 2c0: 00000000 00000000 00000737 00000000 00019203 000000d8 0000072b 0082aed6
Aug 8 16:42:24 npbnagb kernel: [ 5307.160216] 2e0: 00000000 00000000 00000000 00000000 00000000 00000001 00000001 00000001
Aug 8 16:42:24 npbnagb kernel: [ 5307.160221] 300: 80212000 00000000 00000000 00000000 00000000 00002000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160226] 320: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160232] 340: 00000000 00000000 00000000 00000000 00000000 00000020 00a23dbc 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160236] 360: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160241] 380: 00000000 00000000 00000000 00000000 00000000 00000000 00000002 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160246] 3a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160250] 3c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160255] 3e0: 02211000 00000001 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160265] 400: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160275] 420: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160285] 440: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160295] 460: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160305] 480: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160314] 4a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160324] 4c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160334] 4e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160344] 500: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160354] 520: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160364] 540: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160373] 560: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160383] 580: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160393] 5a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160403] 5c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160413] 5e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160423] 600: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Aug 8 16:42:24 npbnagb kernel: [ 5307.160425] eth0: Dumping tx ring
Aug 8 16:42:24 npbnagb kernel: [ 5307.160430] 000: 00000000 74de0202 a0000029 // 00000000 79819a02 a0000029 // 00000000 79819c02 a0000029 // 00000000 79819602 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160437] 004: 00000000 79819402 a0000029 // 00000000 776a8e02 a0000029 // 00000000 79ed6602 a0000029 // 00000000 776a8c02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160443] 008: 00000000 776a8802 a0000029 // 00000000 776a8602 a0000029 // 00000000 776a8202 a0000029 // 00000000 79843e02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160450] 00c: 00000000 7779e602 a0000029 // 00000000 7e634602 a0000029 // 00000000 7e65ba02 a0000029 // 00000000 7e634002 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160457] 010: 00000000 7766d002 a0000029 // 00000000 79843602 a0000029 // 00000000 7766d402 a0000029 // 00000000 7766d802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160463] 014: 00000000 7766da02 a0000029 // 00000000 7766d602 a0000029 // 00000000 7766dc02 a0000029 // 00000000 7766de02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160470] 018: 00000000 79843c02 a0000029 // 00000000 776a8402 a0000029 // 00000000 79843402 a0000029 // 00000000 776a8a02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160476] 01c: 00000000 776a8002 a0000029 // 00000000 79819e02 a0000029 // 00000000 74de0802 a0000029 // 00000000 7766d202 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160483] 020: 00000000 79843a02 a0000029 // 00000000 7a501e02 a0000029 // 00000000 7a501c02 a0000029 // 00000000 7a501802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160490] 024: 00000000 7a501a02 a0000029 // 00000000 7a501602 a0000029 // 00000000 7a501402 a0000029 // 00000000 7a501202 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160496] 028: 00000000 7a501002 a0000029 // 00000000 7be98e02 a0000029 // 00000000 79832c02 a000011c // 00000000 7702dc02 a000011c
Aug 8 16:42:24 npbnagb kernel: [ 5307.160503] 02c: 00000000 7702d002 a000011c // 00000000 79832402 a000011c // 00000000 7d1f8402 a000011c // 00000000 7e6fb802 a000011c
Aug 8 16:42:24 npbnagb kernel: [ 5307.160509] 030: 00000000 79832802 a000011c // 00000000 7d1f8002 a000011c // 00000000 7e6fbc02 a000011c // 00000000 7bce6c02 a000011c
Aug 8 16:42:24 npbnagb kernel: [ 5307.160516] 034: 00000000 7e193402 a000011c // 00000000 7bce6002 a000011c // 00000000 7e69d002 a000011c // 00000000 7e69d402 a000011c
Aug 8 16:42:24 npbnagb kernel: [ 5307.160522] 038: 00000000 7e6d1a02 a0000029 // 00000000 7e698c02 a000011c // 00000000 7e6d1002 a0000029 // 00000000 7e698802 a000011c
Aug 8 16:42:24 npbnagb kernel: [ 5307.160529] 03c: 00000000 7e6d1e02 a0000029 // 00000000 7cb62c02 a000011c // 00000000 7e6d1c02 a0000029 // 00000000 7e115402 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160536] 040: 00000000 7d065602 a0000029 // 00000000 7d065002 a0000029 // 00000000 7d065802 a0000029 // 00000000 7d065402 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160542] 044: 00000000 7e662002 a0000029 // 00000000 7e662e02 a0000029 // 00000000 7e662802 a0000029 // 00000000 7e662a02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160549] 048: 00000000 7e662402 a0000029 // 00000000 7d18be02 a0000029 // 00000000 7d18b802 a0000029 // 00000000 7d18b602 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160555] 04c: 00000000 7d18bc02 a0000029 // 00000000 7e6d9202 a0000029 // 00000000 7e6d9e02 a0000029 // 00000000 7e6d9602 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160562] 050: 00000000 7e6d9a02 a0000029 // 00000000 7e6d9402 a0000029 // 00000000 7e664402 a0000029 // 00000000 7e664c02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160568] 054: 00000000 7e664a02 a0000029 // 00000000 7e664002 a0000029 // 00000000 7e664e02 a0000029 // 00000000 7e664802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160575] 058: 00000000 37b5aa02 a0000029 // 00000000 7e435e02 a0000029 // 00000000 7e6d1802 a0000029 // 00000000 7e657802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160582] 05c: 00000000 7e657a02 a0000029 // 00000000 7e657002 a0000029 // 00000000 7ef5a002 a0000029 // 00000000 37b78002 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160588] 060: 00000000 7d66da02 a0000029 // 00000000 7d66d802 a0000029 // 00000000 7d66d002 a0000029 // 00000000 7d66dc02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160595] 064: 00000000 7d66d602 a0000029 // 00000000 7e641002 a0000029 // 00000000 7e641a02 a0000029 // 00000000 7e637a02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160601] 068: 00000000 7e637802 a0000029 // 00000000 7d065a02 a0000029 // 00000000 7d065e02 a0000029 // 00000000 7cb61602 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160608] 06c: 00000000 7cb61e02 a0000029 // 00000000 7cb61402 a0000029 // 00000000 7cb61802 a0000029 // 00000000 7e6d6c02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160614] 070: 00000000 7e6d6e02 a0000029 // 00000000 7e6d6602 a0000029 // 00000000 7e6d6202 a0000029 // 00000000 7e6d6a02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160621] 074: 00000000 37b6fc02 a0000029 // 00000000 37b6f002 a0000029 // 00000000 37b6f202 a0000029 // 00000000 7e657e02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160627] 078: 00000000 7e657402 a0000029 // 00000000 7e657c02 a0000029 // 00000000 7e657602 a0000029 // 00000000 7e63ca02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160634] 07c: 00000000 7e65a202 a0000029 // 00000000 7e65a002 a0000029 // 00000000 7e65a402 a0000029 // 00000000 7e65a602 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160641] 080: 00000000 7e65f202 a0000029 // 00000000 7e65aa02 a0000029 // 00000000 7e65fa02 a0000029 // 00000000 7e65fe02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160647] 084: 00000000 7e65f002 a0000029 // 00000000 7e63bc02 a0000029 // 00000000 7e63b402 a0000029 // 00000000 7e6d5a02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160654] 088: 00000000 7e6d5002 a0000029 // 00000000 7e6d5e02 a0000029 // 00000000 7e6d5402 a0000029 // 00000000 7d0ea602 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160660] 08c: 00000000 7d0ea802 a0000029 // 00000000 7d0eaa02 a0000029 // 00000000 7d0ea002 a0000029 // 00000000 7d0ea402 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160667] 090: 00000000 7e649602 a0000029 // 00000000 7e649e02 a0000029 // 00000000 7e649c02 a0000029 // 00000000 7e649402 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160674] 094: 00000000 7cb61a02 a0000029 // 00000000 7cb61c02 a0000029 // 00000000 7e115002 a0000029 // 00000000 7e6d2802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160680] 098: 00000000 7e6d2002 a0000029 // 00000000 7e6d2602 a0000029 // 00000000 7e6d2402 a0000029 // 00000000 7e65b602 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160687] 09c: 00000000 7e65be02 a0000029 // 00000000 7e6d2a02 a0000029 // 00000000 7e634202 a0000029 // 00000000 7e634c02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160693] 0a0: 00000000 7e65e202 a0000029 // 00000000 7e65e802 a0000029 // 00000000 7e65ea02 a0000029 // 00000000 7e115202 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160700] 0a4: 00000000 7e115e02 a0000029 // 00000000 7e115602 a0000029 // 00000000 7e115802 a0000029 // 00000000 7e115a02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160706] 0a8: 00000000 77068e02 a0000029 // 00000000 77068c02 a0000029 // 00000000 77068a02 a0000029 // 00000000 77068802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160713] 0ac: 00000000 77068602 a0000029 // 00000000 77068402 a0000029 // 00000000 77068202 a0000029 // 00000000 74982e02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160720] 0b0: 00000000 77068002 a0000029 // 00000000 74982c02 a0000029 // 00000000 74982a02 a0000029 // 00000000 74982802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160726] 0b4: 00000000 74982602 a0000029 // 00000000 74982402 a0000029 // 00000000 74982202 a0000029 // 00000000 74982002 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160733] 0b8: 00000000 74980e02 a0000029 // 00000000 74980a02 a0000029 // 00000000 74980c02 a0000029 // 00000000 74980802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160739] 0bc: 00000000 74980602 a0000029 // 00000000 74980402 a0000029 // 00000000 74980202 a0000029 // 00000000 74980002 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160746] 0c0: 00000000 776cae02 a0000029 // 00000000 776cac02 a0000029 // 00000000 776ca602 a0000029 // 00000000 776ca802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160752] 0c4: 00000000 776ca402 a0000029 // 00000000 776ca202 a0000029 // 00000000 776ca002 a0000029 // 00000000 7e63b802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160759] 0c8: 00000000 7e63be02 a0000029 // 00000000 7e63b602 a0000029 // 00000000 7779ee02 a0000029 // 00000000 7779e802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160766] 0cc: 00000000 7779ea02 a0000029 // 00000000 7779e402 a0000029 // 00000000 7779e002 a0000029 // 00000000 7779e202 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160772] 0d0: 00000000 7e6d5202 a0000029 // 00000000 7779ec02 a0000029 // 00000000 7e63ba02 a0000029 // 00000000 776caa02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160779] 0d4: 00000000 79d7dc02 a0000029 // 00000000 79d7de02 a0000029 // 00000000 79d7da02 a0000029 // 00000000 79d7d802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160785] 0d8: 00000000 79d7d602 a0000029 // 00000000 79d7d402 a0000029 // 00000000 79d7d202 a0000029 // 00000000 7e6d5c02 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160792] 0dc: 00000000 79d7d002 a0000029 // 00000000 79843002 a0000029 // 00000000 79843802 a0000029 // 00000000 79843202 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160799] 0e0: 00000000 77012e02 a0000029 // 00000000 77012c02 a0000029 // 00000000 77012a02 a0000029 // 00000000 77012802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160805] 0e4: 00000000 77012602 a0000029 // 00000000 77012402 a0000029 // 00000000 77012202 a0000029 // 00000000 77012002 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160812] 0e8: 00000000 79ed6c02 a0000029 // 00000000 79ed6e02 a0000029 // 00000000 79ed6a02 a0000029 // 00000000 79ed6802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160818] 0ec: 00000000 79ed6402 a0000029 // 00000000 79ed6202 a0000029 // 00000000 79ed6002 a0000029 // 00000000 79819002 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160825] 0f0: 00000000 79819202 a0000029 // 00000000 74de0c02 a0000029 // 00000000 74de0a02 a0000029 // 00000000 74de0402 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160831] 0f4: 00000000 74de0e02 a0000029 // 00000000 79819802 a0000029 // 00000000 74de0002 a0000029 // 00000000 74de0602 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160838] 0f8: 00000000 74cbee02 a0000029 // 00000000 74cbec02 a0000029 // 00000000 74cbea02 a0000029 // 00000000 74cbe802 a0000029
Aug 8 16:42:24 npbnagb kernel: [ 5307.160845] 0fc: 00000000 74cbe402 a0000029 // 00000000 74cbe602 a0000029 // 00000000 74cbe202 a0000029 // 00000000 74cbe002 a0000029
[The above messages repeat every few minutes, as network packets attempt to go out]
[IMPORTANT NOTE: Although both eth0 and eth1 go down, I never see the above dumps for eth1, just eth0]
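
For anyone sifting through a long /var/log/messages, the following Python sketch counts watchdog timeouts per interface and reads the two counters on the `Ring at` line. It assumes, without confirmation from the driver source, that `next` counts descriptors the driver has queued and `nic` counts descriptors the hardware has completed. Under that assumption the difference above is 2090 - 1834 = 256, which matches the 256 descriptors in the tx ring dump (64 rows of 4), i.e. the ring appears to be completely full. The function name `summarize` is illustrative; the two sample lines are copied from the log above.

```python
import re

WATCHDOG = re.compile(r"NETDEV WATCHDOG: (\w+): transmit timed out")
RING = re.compile(r"(\w+): Ring at ([0-9a-f]+): next (\d+) nic (\d+)")


def summarize(lines):
    """Return (timeouts per interface, next-minus-nic per interface)."""
    timeouts = {}
    outstanding = {}
    for line in lines:
        match = WATCHDOG.search(line)
        if match:
            iface = match.group(1)
            timeouts[iface] = timeouts.get(iface, 0) + 1
            continue
        match = RING.search(line)
        if match:
            iface, _addr, next_tx, nic_tx = match.groups()
            # Assumed meaning: descriptors queued but not yet completed.
            outstanding[iface] = int(next_tx) - int(nic_tx)
    return timeouts, outstanding


LOG = [
    "Aug 8 16:42:24 npbnagb kernel: [ 5307.159987] NETDEV WATCHDOG: eth0: transmit timed out",
    "Aug 8 16:42:24 npbnagb kernel: [ 5307.159995] eth0: Ring at 7e300000: next 2090 nic 1834",
]

if __name__ == "__main__":
    print(summarize(LOG))
```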

Here is how eth0 and eth1 are identified by the command `lspci -v` (run as root):
0000:00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
        Subsystem: Sun Microsystems Computer Corp.: Unknown device 534b
        Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 58
        Memory at fcff8000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at b880 [size=8]
        Memory at fcffa800 (32-bit, non-prefetchable) [size=256]
        Memory at fcffa400 (32-bit, non-prefetchable) [size=16]
        Capabilities: [44] Power Management version 2
        Capabilities: [70] #11 [8007]
        Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
        Capabilities: [6c] #08 [a802]
0000:00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
        Subsystem: Sun Microsystems Computer Corp.: Unknown device 534b
        Flags: bus master, 66MHz, fast devsel, latency 0, IRQ 50
        Memory at fcff7000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at b800 [size=8]
        Memory at fcffa000 (32-bit, non-prefetchable) [size=256]
        Memory at fcff6c00 (32-bit, non-prefetchable) [size=16]
        Capabilities: [44] Power Management version 2
        Capabilities: [70] #11 [8007]
        Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/3 Enable-
        Capabilities: [6c] #08 [a802]

Here is how the system, eth0, and eth1 are identified by the command `lshw` (run as root):
npbnagb
    description: Rack Mount Chassis
    product: Sun Fire X2200 M2
    vendor: Sun Microsystems
    version: Rev 50
    serial: 0710QAT050
    width: 32 bits
    capabilities: smbios-2.4 dmi-2.4
    configuration: boot=normal chassis=rackmount uuid=80E78BB6-BA79-0010-84C1-001636E08DAC
  *-core
       description: Motherboard
       product: S39
       vendor: Sun Microsystems
       physical id: 0
       version: Rev 50
       serial: 2029QTF0702MT1206
       slot: To Be Filled By O.E.M.
     *-firmware
          description: BIOS
          vendor: Sun Microsystems
          physical id: 0
          version: S39_3B16 (02/16/2007)
          size: 64KB
          capacity: 448KB
          capabilities: isa pci pnp apm upgrade shadowing escd cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer int10video acpi usb ls120boot zipboot biosbootspecification
<<snip information on CPU, memory, serial, and USB>>
     *-bridge:0
          description: Ethernet interface
          product: MCP55 Ethernet
          vendor: nVidia Corporation
          physical id: 8
          bus info: pci@00:08.0
          logical name: eth0
          version: a3
          serial: 00:16:36:e0:7f:0a
          size: 100000000
          capacity: 1000000000
          width: 32 bits
          clock: 66MHz
          capabilities: bridge bus_master cap_list ethernet physical mii 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegociation
          configuration: autonegociation=on broadcast=yes driver=forcedeth driverversion=0.54 duplex=half ip=10.231.25.86 link=yes multicast=yes port=MII speed=100MB/s
          resources: iomemory:fcff8000-fcff8fff ioport:b880-b887 iomemory:fcffa800-fcffa8ff iomemory:fcffa400-fcffa40f irq:58
     *-bridge:1
          description: Ethernet interface
          product: MCP55 Ethernet
          vendor: nVidia Corporation
          physical id: 9
          bus info: pci@00:09.0
          logical name: eth1
          version: a3
          serial: 00:16:36:e0:7f:0b
          size: 100000000
          capacity: 1000000000
          width: 32 bits
          clock: 66MHz
          capabilities: bridge bus_master cap_list ethernet physical mii 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegociation
          configuration: autonegociation=on broadcast=yes driver=forcedeth driverversion=0.54 duplex=half ip=10.1.16.85 link=yes multicast=yes port=MII speed=100MB/s
          resources: iomemory:fcff7000-fcff7fff ioport:b800-b807 iomemory:fcffa000-fcffa0ff iomemory:fcff6c00-fcff6c0f irq:50

Tags: cft-2.6.27
A. Karl Kornel (akkornel) wrote :

FYI, I also opened an SR with Sun. It's Sun Online Support Center SR#65611163. I'll relay reports as appropriate!

A. Karl Kornel (akkornel) wrote :

BTW, I've got an identical Sun Fire X2200 M2, which is not showing this problem. At least, I think they're identical: I purchased them and configured them at exactly the same time. As they are the members of a two-node cluster, I've always kept their configurations essentially identical.

A. Karl Kornel (akkornel) wrote :

Update: I've had the problem reappear two more times in the last 4 hours. The first time, I was `apt-get`ing about 50 packages; the network link (eth1) died partway through the download. The second time, I wasn't transferring much data, but I did have lots of connections open to various places.

When the network links did fail (again, eth0 and eth1 failed at the same time), eth1 recorded a number of error packets (36) immediately. Shortly afterwards, both eth0 and eth1 started racking up dropped packets until I shut the machine down. Five minutes of the power cord out, and it was good to go again!
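
The error and dropped-packet counters mentioned above can be sampled before and after a failure to see which ones move. The following Python sketch reads them from `/sys/class/net/<interface>/statistics`, assuming the running kernel exposes that directory; the function names and the example numbers in the comment are illustrative, not values from this machine.

```python
import os

COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped")


def read_counters(iface, root="/sys/class/net"):
    """Read the selected per-interface counters from sysfs as integers."""
    stats_dir = os.path.join(root, iface, "statistics")
    result = {}
    for name in COUNTERS:
        with open(os.path.join(stats_dir, name)) as handle:
            result[name] = int(handle.read().strip())
    return result


def delta(before, after):
    """Return how much each counter grew between two samples."""
    return {name: after[name] - before[name] for name in COUNTERS}


# Typical use: take one sample while the link works and one after it fails,
# e.g. before = read_counters("eth1"); ...; delta(before, read_counters("eth1"))
```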

A. Karl Kornel (akkornel) wrote :

As a stopgap measure, I've installed an Intel PRO/1000 dual-port PCI Express card, giving me two Ethernet ports to supplant the ones that are malfunctioning.

Paul Weaver (paul-weaver-uk) wrote :

I've had a similar issue with an MCP55 ethernet driver, on a different motherboard (a generic PC one) and a more recent kernel. After about 250 days of uptime, receiving about 70-80 GBytes/day and transmitting about 90-100 GBytes/day, the network just fell off, not even responding to pings.

`ifdown eth1` worked, but `ifup eth1` hung; a reboot brought the machine back.

Jul 11 12:52:00 newsjtcfs99 -- MARK --
Jul 11 13:12:01 newsjtcfs99 -- MARK --
Jul 11 13:25:42 newsjtcfs99 kernel: [1482177.289006] NETDEV WATCHDOG: eth1: transmit timed out
Jul 11 13:25:42 newsjtcfs99 kernel: [1482177.289012] eth1: Got tx_timeout. irq: 00000036
Jul 11 13:25:42 newsjtcfs99 kernel: [1482177.289014] eth1: Ring at 7bfa2000
Jul 11 13:25:42 newsjtcfs99 kernel: [1482177.289015] eth1: Dumping tx registers
Jul 11 13:25:42 newsjtcfs99 kernel: [1482177.289020] 0: 00002036 000000ff 00000003 030903ca 00000000 00000000 00000000 00000000
Jul 11 13:25:42 newsjtcfs99 kernel: [1482177.289025] 20: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
...
Jul 11 13:25:42 newsjtcfs99 kernel: [1482177.289422] eth1: Dumping tx ring
Jul 11 13:25:42 newsjtcfs99 kernel: [1482177.289426] 000: 00000000 3661494a 00000000 // 00000000 217a3b40 20000117 // 00000000 387ab8da 20000040 // 00000000 73dbf8ce 20000046

Attached lshw, lspci -vvv, and var/log/messages

I'm also running the x86_64 version, but a later kernel (7.10's)

Linux newsjtcfs99 2.6.22-14-generic #1 SMP Sun Oct 14 21:45:15 GMT 2007 x86_64 GNU/Linux

The network is full-duplex 1 Gbit, plugged into a Cisco 6513.

seisen1 (seisen-deactivatedaccount-deactivatedaccount) wrote :

Is this still a problem in the latest release of Ubuntu, Hardy Heron?

Barney Livingston (ubuntu-barnoid) wrote :

Yes. I'm seeing it on a Tyan Tomcat n3400b motherboard with two nVidia MCP55 interfaces running Hardy.

Zooko Wilcox-O'Hearn (zooko) wrote :

I'm having this problem with Kernel 2.6.26.1 (from the kernel stable team). Note that this means that patch http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.26.y.git;a=commit;h=4db0ee176e256444695ee2d7b004552e82fec987 must not fix the problem, since that patch is already in my kernel. There are some new patches in Linus's tree that might help:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=9a33e883564c2db8e1b3b645de4579a98ac084d2

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=9c6624352cdba7ef4859dae44eb48d538ac78d1b

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=1ef6841b4c4d9cc26e53271016c1d432ea65ed24

I think I will try these next.

Here's my lspci:

00:00.0 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a1)
00:01.0 ISA bridge: nVidia Corporation MCP55 LPC Bridge (rev a2)
00:01.1 SMBus: nVidia Corporation MCP55 SMBus (rev a2)
00:01.2 RAM memory: nVidia Corporation MCP55 Memory Controller (rev a2)
00:02.0 USB Controller: nVidia Corporation MCP55 USB Controller (rev a1)
00:02.1 USB Controller: nVidia Corporation MCP55 USB Controller (rev a2)
00:04.0 IDE interface: nVidia Corporation MCP55 IDE (rev a1)
00:05.0 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:05.1 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:05.2 IDE interface: nVidia Corporation MCP55 SATA Controller (rev a2)
00:06.0 PCI bridge: nVidia Corporation MCP55 PCI bridge (rev a2)
00:06.1 Audio device: nVidia Corporation MCP55 High Definition Audio (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
02:00.0 VGA compatible controller: ATI Technologies Inc RV610 video device [Radeon HD 2400 PRO]
02:00.1 Audio device: ATI Technologies Inc RV610 audio device [Radeon HD 2400 PRO]
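To pick the network device out of a listing like this, a small parser can help. The following is only a sketch: the sample lines are copied from the listing above, while the function name `find_ethernet_devices` and the regular expression are illustrative assumptions, not part of any tool. Note that it matches on the description rather than the class, because here the MCP55 Ethernet device is reported under the class "Bridge".

```python
import re

# Three lines copied from the lspci listing above.
LSPCI_SAMPLE = """\
00:06.1 Audio device: nVidia Corporation MCP55 High Definition Audio (rev a2)
00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a2)
00:0f.0 PCI bridge: nVidia Corporation MCP55 PCI Express bridge (rev a2)
"""

# slot (bus:device.function), then the class name, then the description.
LINE_RE = re.compile(
    r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f])\s+(?P<cls>[^:]+):\s+(?P<desc>.*)$"
)

def find_ethernet_devices(lspci_text):
    """Return (slot, class, description) for lines whose description mentions Ethernet."""
    matches = []
    for line in lspci_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m and "ethernet" in m.group("desc").lower():
            matches.append((m.group("slot"), m.group("cls"), m.group("desc")))
    return matches

devices = find_ethernet_devices(LSPCI_SAMPLE)
for slot, cls, desc in devices:
    print(slot, cls, desc)
```

The same function could be fed the full text of a saved lspci log instead of the sample string.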

Revision history for this message
seisen1 (seisen-deactivatedaccount-deactivatedaccount) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. Unfortunately we can't fix it, because your description does not yet have enough information.

Please include the following additional information, if you have not already done so (pay attention to lspci's additional options), as required by the Ubuntu Kernel Team:
1. Please include the output of the command "uname -a" in your next response. It should be one long line of text which includes the exact kernel version you're running, as well as the CPU architecture.
2. Please run the command "dmesg > dmesg.log" after a fresh boot and attach the resulting file "dmesg.log" to this bug report.
3. Please run the command "sudo lspci -vvnn > lspci-vvnn.log" and attach the resulting file "lspci-vvnn.log" to this bug report.

For your reference, the full description of procedures for kernel-related bug reports is available at https://wiki.ubuntu.com/KernelTeamBugPolicies Thanks in advance!
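The three commands requested above can be collected in one go. This is a minimal sketch, not an official Ubuntu tool: the helper names `run_command` and `collect_logs`, the `runner` parameter, and the file name `uname.log` are assumptions made for illustration; `dmesg.log` and `lspci-vvnn.log` are the file names from the request.

```python
import subprocess
from pathlib import Path

# Output file name -> command line. "uname.log" is an assumed name;
# the other two names are the ones asked for in the comment above.
REQUESTED = {
    "uname.log": ["uname", "-a"],
    "dmesg.log": ["dmesg"],
    "lspci-vvnn.log": ["sudo", "lspci", "-vvnn"],
}

def run_command(argv):
    """Run argv and return its standard output as text; raises if the command fails."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout

def collect_logs(out_dir, commands=REQUESTED, runner=run_command):
    """Write each command's output to out_dir/<name> and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, argv in commands.items():
        path = out_dir / name
        path.write_text(runner(argv))
        written.append(path)
    return written
```

Calling `collect_logs("bug-131737-logs")` right after a fresh boot would leave three files ready to attach. The `runner` argument exists only so the command execution can be swapped out, for example when trying the function on a machine without `lspci`.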

Changed in linux-source-2.6.15:
status: New → Invalid
Changed in linux:
status: New → Incomplete
Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

Hi Zooko,

Thanks for testing the upstream kernel; that's great and very helpful. If the new patches you test don't help, could you also open an upstream bug report at bugzilla.kernel.org? It seems you've shown that this exists in the upstream kernel as well. Thanks.

Revision history for this message
Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.
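When reporting results, it helps to be sure which kernel was actually running. The sketch below compares a kernel release string against 2.6.27; the function names `kernel_version` and `is_at_least` are illustrative assumptions, and the release strings are ones that appear in this report. It only tells you which kernel is being tested, not whether the bug is present.

```python
import re

def kernel_version(release):
    """Extract the leading numeric version, e.g. '2.6.27-3-server' -> (2, 6, 27)."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing third component (e.g. '3.5') is treated as 0.
    return tuple(int(part) if part else 0 for part in m.groups())

def is_at_least(release, minimum=(2, 6, 27)):
    """True if the release string names a kernel at or above the given minimum."""
    return kernel_version(release) >= minimum

# Release strings mentioned in this bug report.
for release in ("2.6.26.1", "2.6.27-3-server", "3.5.0-32"):
    print(release, is_at_least(release))
```

On a running system, the release string is what `uname -r` prints; in Python, `platform.release()` returns the same value.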

Revision history for this message
Barney Livingston (ubuntu-barnoid) wrote :

I can confirm that with the Intrepid 2.6.27-3-server kernel, our server, which used to have this bug, is still working after 14 days of uptime and several GB through each interface.

Revision history for this message
jaduncan (jaduncan) wrote :

Success!

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Revision history for this message
Kevin Lyda (lyda) wrote :

I'm getting this error with Ubuntu 12.04.2 with linux-image 3.5.0-32.

I get a tx_timeout and then the ethernet card is unresponsive.

Revision history for this message
Thomas Hotz (thotz-deactivatedaccount) wrote :

I also found a similar error with Ubuntu 12.04, so I confirm this bug.

Changed in linux-meta (Ubuntu):
status: New → Invalid
Revision history for this message
C de-Avillez (hggdh2) wrote :

contact on #ubuntu-bugs:

08:00:46 thotz | hello, I have a question on bug #131737: it's marked as fixed release, but someone reported that it happens in Ubuntu 12.04 LTS again. Is it possible to reopen a fixed bug? Thank you for the info!
08:00:47 ubot2` | Launchpad bug 131737 in linux (Ubuntu) "network device (nVidia MCP55, forcedeth) stops sending packets" [Undecided,Fix released] https://launchpad.net/bugs/131737
08:34:45 hggdh | thotz: given the age of this bug, I strongly recommend opening a new bug related to this (you can refer to this bug for completeness)
08:35:55 hggdh | thotz: after 4 years, the kernel code, API, and ABI have changed a lot

So: @Kevin Lyda, @Thomas Hotz: please open a new bug.

Revision history for this message
Thomas Hotz (thotz-deactivatedaccount) wrote :

I just found this question: http://askubuntu.com/questions/254195/network-not-working-on-nvidia-mcp55-under-ubuntu-12-04-12-10 So it seems that more people are affected.

Kevin Lyda, please make a new bug report as mentioned above, and mention the new bug report number here. Thank you!
