e1000 locks up under load in Feisty.

Bug #106869 reported by Dylan
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | Won't Fix | Undecided | Unassigned |
linux-source-2.6.20 (Ubuntu) | Won't Fix | Undecided | Unassigned |

Bug Description

Binary package hint: linux-image-2.6.20-15-generic

On bootup, everything is normal. My server, a Slackware machine running 2.6.20.7, responds fine (it also worked fine under 2.6.19.1).

My client is an Ubuntu machine, recently upgraded to Feisty Fawn from Edgy Eft. Under Edgy (and 2 previous Ubuntu installations), it was able to access NFS resources from the server fine. Since the upgrade, however, the network has been less than reliable. After a few days of troubleshooting (including upgrading the kernel on the server and testing with a number of other machines), I've determined the problem to be the e1000 module and kernel included in Ubuntu.

Here's an example of it working: if I yank the power on the switch connecting the machines, the Slackware server gets this in the dmesg:
e1000: eth0: e1000_watchdog: NIC Link is Down
e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex

While the client gets this:
[ 198.768000] e1000: eth0: e1000_watchdog: NIC Link is Down
[ 201.072000] e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
[ 201.072000] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 202.168000] e1000: eth0: e1000_watchdog: NIC Link is Down
[ 204.528000] e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex

However, after executing something like a "find . -name thumbs.db -print0 | xargs -0 rm" or trying to move a few GB of files around via NFS, the client no longer seems to have a connection. The dmesg doesn't report the link as going down, but all network connections fail. Removing and reinserting the kernel module does not fix it. Neither does manually configuring an address. Other clients are still able to access the server just fine, however.

Most telling is that, while this is occurring, I can yank the power on the switch, and the client machine (Ubuntu) will not detect it. Instead, it will sit there spinning its wheels, complaining into the dmesg that the NFS server is not responding.

The only way to get the machine back on to the network is to reboot it.
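
One way to compare what the kernel believes about the link with what the switch is actually doing is to read the interface's sysfs attributes while the machine is in the failed state. The following is a minimal sketch, assuming the running kernel exposes `carrier` and `operstate` under `/sys/class/net/<iface>/`; the function name `read_link_state` and the `sysfs_root` parameter are made up for illustration and are not part of any tool mentioned in this report.

```python
from pathlib import Path


def read_link_state(iface="eth0", sysfs_root="/sys/class/net"):
    """Return the kernel's view of the link for one interface.

    'carrier' reads "1" when the driver reports link and "0" when it does
    not. Reading it raises OSError while the interface is administratively
    down, so any attribute that cannot be read is reported as None.
    """
    base = Path(sysfs_root) / iface
    state = {}
    for name in ("carrier", "operstate"):
        try:
            state[name] = (base / name).read_text().strip()
        except OSError:
            state[name] = None
    return state
```

Calling `read_link_state("eth0")` before and after pulling power on the switch would show whether the driver's idea of the carrier changes at all. If `carrier` stays at "1" with the switch powered off, that matches the behaviour described above, where the client never logs the link going down.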

Full dmesg of the affected kernel/machine combination:

[ 0.000000] Linux version 2.6.20-15-generic (root@vernadsky) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #2 SMP Sat Apr 14 00:54:01 UTC 2007 (Ubuntu 2.6.20-15.25-generic)
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] sanitize start
[ 0.000000] sanitize end
[ 0.000000] copy_e820_map() start: 0000000000000000 size: 000000000009fc00 end: 000000000009fc00 type: 1
[ 0.000000] copy_e820_map() type is E820_RAM
[ 0.000000] copy_e820_map() start: 000000000009fc00 size: 0000000000000400 end: 00000000000a0000 type: 2
[ 0.000000] copy_e820_map() start: 00000000000e8000 size: 0000000000018000 end: 0000000000100000 type: 2
[ 0.000000] copy_e820_map() start: 0000000000100000 size: 000000007feb0000 end: 000000007ffb0000 type: 1
[ 0.000000] copy_e820_map() type is E820_RAM
[ 0.000000] copy_e820_map() start: 000000007ffb0000 size: 0000000000010000 end: 000000007ffc0000 type: 3
[ 0.000000] copy_e820_map() start: 000000007ffc0000 size: 0000000000030000 end: 000000007fff0000 type: 4
[ 0.000000] copy_e820_map() start: 000000007fff0000 size: 0000000000010000 end: 0000000080000000 type: 2
[ 0.000000] copy_e820_map() start: 00000000ff380000 size: 0000000000c80000 end: 0000000100000000 type: 2
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009fc00 (usable)
[ 0.000000] BIOS-e820: 000000000009fc00 - 00000000000a0000 (reserved)
[ 0.000000] BIOS-e820: 00000000000e8000 - 0000000000100000 (reserved)
[ 0.000000] BIOS-e820: 0000000000100000 - 000000007ffb0000 (usable)
[ 0.000000] BIOS-e820: 000000007ffb0000 - 000000007ffc0000 (ACPI data)
[ 0.000000] BIOS-e820: 000000007ffc0000 - 000000007fff0000 (ACPI NVS)
[ 0.000000] BIOS-e820: 000000007fff0000 - 0000000080000000 (reserved)
[ 0.000000] BIOS-e820: 00000000ff380000 - 0000000100000000 (reserved)
[ 0.000000] 1151MB HIGHMEM available.
[ 0.000000] 896MB LOWMEM available.
[ 0.000000] found SMP MP-table at 000ff780
[ 0.000000] Entering add_active_range(0, 0, 524208) 0 entries of 256 used
[ 0.000000] Zone PFN ranges:
[ 0.000000] DMA 0 -> 4096
[ 0.000000] Normal 4096 -> 229376
[ 0.000000] HighMem 229376 -> 524208
[ 0.000000] early_node_map[1] active PFN ranges
[ 0.000000] 0: 0 -> 524208
[ 0.000000] On node 0 totalpages: 524208
[ 0.000000] DMA zone: 32 pages used for memmap
[ 0.000000] DMA zone: 0 pages reserved
[ 0.000000] DMA zone: 4064 pages, LIFO batch:0
[ 0.000000] Normal zone: 1760 pages used for memmap
[ 0.000000] Normal zone: 223520 pages, LIFO batch:31
[ 0.000000] HighMem zone: 2303 pages used for memmap
[ 0.000000] HighMem zone: 292529 pages, LIFO batch:31
[ 0.000000] DMI present.
[ 0.000000] ACPI: RSDP (v000 ACPIAM ) @ 0x000fa8e0
[ 0.000000] ACPI: RSDT (v001 A M I OEMRSDT 0x08000622 MSFT 0x00000097) @ 0x7ffb0000
[ 0.000000] ACPI: FADT (v002 A M I OEMFACP 0x08000622 MSFT 0x00000097) @ 0x7ffb0200
[ 0.000000] ACPI: MADT (v001 A M I OEMAPIC 0x08000622 MSFT 0x00000097) @ 0x7ffb0390
[ 0.000000] ACPI: MCFG (v001 A M I OEMMCFG 0x08000622 MSFT 0x00000097) @ 0x7ffb0400
[ 0.000000] ACPI: OEMB (v001 A M I AMI_OEM 0x08000622 MSFT 0x00000097) @ 0x7ffc0040
[ 0.000000] ACPI: DSDT (v001 939DV 939DV111 0x00000111 INTL 0x20051117) @ 0x00000000
[ 0.000000] ACPI: PM-Timer IO Port: 0x808
[ 0.000000] ACPI: Local APIC address 0xfee00000
[ 0.000000] ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
[ 0.000000] Processor #0 15:11 APIC version 16
[ 0.000000] ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
[ 0.000000] Processor #1 15:11 APIC version 16
[ 0.000000] ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
[ 0.000000] IOAPIC[0]: apic_id 2, version 17, address 0xfec00000, GSI 0-23
[ 0.000000] ACPI: IOAPIC (id[0x03] address[0xfec10000] gsi_base[24])
[ 0.000000] IOAPIC[1]: apic_id 3, version 17, address 0xfec10000, GSI 24-39
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[ 0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 low level)
[ 0.000000] ACPI: IRQ0 used by override.
[ 0.000000] ACPI: IRQ2 used by override.
[ 0.000000] ACPI: IRQ9 used by override.
[ 0.000000] Enabling APIC mode: Flat. Using 2 I/O APICs
[ 0.000000] Using ACPI (MADT) for SMP configuration information
[ 0.000000] Allocating PCI resources starting at 88000000 (gap: 80000000:7f380000)
[ 0.000000] Detected 2000.096 MHz processor.
[ 18.135681] Built 1 zonelists. Total pages: 520113
[ 18.135685] Kernel command line: root=/dev/md0 ro quiet splash
[ 18.135811] mapped APIC to ffffd000 (fee00000)
[ 18.135814] mapped IOAPIC to ffffc000 (fec00000)
[ 18.135816] mapped IOAPIC to ffffb000 (fec10000)
[ 18.135819] Enabling fast FPU save and restore... done.
[ 18.135821] Enabling unmasked SIMD FPU exception support... done.
[ 18.135829] Initializing CPU#0
[ 18.135874] PID hash table entries: 4096 (order: 12, 16384 bytes)
[ 18.137961] Console: colour VGA+ 80x25
[ 18.138372] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[ 18.138788] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[ 18.200612] Memory: 2066492k/2096832k available (1992k kernel code, 29040k reserved, 893k data, 328k init, 1179328k highmem)
[ 18.200622] virtual kernel memory layout:
[ 18.200623] fixmap : 0xfff4e000 - 0xfffff000 ( 708 kB)
[ 18.200624] pkmap : 0xff800000 - 0xffc00000 (4096 kB)
[ 18.200626] vmalloc : 0xf8800000 - 0xff7fe000 ( 111 MB)
[ 18.200627] lowmem : 0xc0000000 - 0xf8000000 ( 896 MB)
[ 18.200628] .init : 0xc03d7000 - 0xc0429000 ( 328 kB)
[ 18.200629] .data : 0xc02f2264 - 0xc03d16d4 ( 893 kB)
[ 18.200630] .text : 0xc0100000 - 0xc02f2264 (1992 kB)
[ 18.200633] Checking if this processor honours the WP bit even in supervisor mode... Ok.
[ 18.279611] Calibrating delay using timer specific routine.. 4002.87 BogoMIPS (lpj=8005743)
[ 18.279653] Security Framework v1.0.0 initialized
[ 18.279660] SELinux: Disabled at boot.
[ 18.279675] Mount-cache hash table entries: 512
[ 18.279808] CPU: After generic identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
[ 18.279816] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 18.279819] CPU: L2 Cache: 512K (64 bytes/line)
[ 18.279821] CPU 0(2) -> Core 0
[ 18.279823] CPU: After all inits, caps: 178bfbff e3d3fbff 00000000 00000410 00000001 00000000 00000003
[ 18.279833] Compat vDSO mapped to ffffe000.
[ 18.279837] Remapping vsyscall page to ffffe000
[ 18.279847] Checking 'hlt' instruction... OK.
[ 18.295720] SMP alternatives: switching to UP code
[ 18.296112] Early unpacking initramfs... done
[ 18.623972] ACPI: Core revision 20060707
[ 18.624493] ACPI: Looking for DSDT in initramfs... file /DSDT.aml not found, using machine DSDT.
[ 18.626263] CPU0: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 01
[ 18.626284] SMP alternatives: switching to SMP code
[ 18.626403] Booting processor 1/1 eip 3000
[ 18.636431] Initializing CPU#1
[ 18.714663] Calibrating delay using timer specific routine.. 4000.44 BogoMIPS (lpj=8000886)
[ 18.714670] CPU: After generic identify, caps: 178bfbff e3d3fbff 00000000 00000000 00000001 00000000 00000003
[ 18.714677] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[ 18.714679] CPU: L2 Cache: 512K (64 bytes/line)
[ 18.714681] CPU 1(2) -> Core 1
[ 18.714682] CPU: After all inits, caps: 178bfbff e3d3fbff 00000000 00000410 00000001 00000000 00000003
[ 18.715012] CPU1: AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ stepping 01
[ 18.715023] Total of 2 processors activated (8003.31 BogoMIPS).
[ 18.715571] ENABLING IO-APIC IRQs
[ 18.715809] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
[ 18.862528] checking TSC synchronization across 2 CPUs:
[ 0.000001] CPU#0 had 67 usecs TSC skew, fixed it up.
[ 0.000003] CPU#1 had -67 usecs TSC skew, fixed it up.
[ 0.003991] Brought up 2 CPUs
[ 0.097830] migration_cost=277
[ 0.098075] Booting paravirtualized kernel on bare hardware
[ 0.098141] Time: 22:23:08 Date: 03/15/107
[ 0.098172] NET: Registered protocol family 16
[ 0.098254] EISA bus registered
[ 0.098257] ACPI: bus type pci registered
[ 0.099068] PCI: PCI BIOS revision 3.00 entry at 0xf0031, last bus=4
[ 0.099070] PCI: Using configuration type 1
[ 0.099071] Setting up standard PCI resources
[ 0.109164] ACPI: Interpreter enabled
[ 0.109167] ACPI: Using IOAPIC for interrupt routing
[ 0.109813] ACPI: PCI Root Bridge [PCI0] (0000:00)
[ 0.109819] PCI: Probing PCI hardware (bus 00)
[ 0.110226] PCI quirk: region 0800-083f claimed by ali7101 ACPI
[ 0.110595] Boot video device is 0000:03:00.0
[ 0.110822] PCI: Transparent bridge - 0000:00:06.0
[ 0.110857] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0._PRT]
[ 0.122748] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.P0P4._PRT]
[ 0.122995] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.HTT_._PRT]
[ 0.127321] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEB1._PRT]
[ 0.127524] ACPI: PCI Interrupt Routing Table [\_SB_.PCI0.PEB2._PRT]
[ 0.128240] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 *5 6 7 10 11 12 14 15)
[ 0.128521] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 6 7 *10 11 12 14 15)
[ 0.128796] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 6 7 *10 11 12 14 15), disabled.
[ 0.129075] ACPI: PCI Interrupt Link [LNKD] (IRQs *3 4 5 6 7 10 11 12 14 15)
[ 0.129351] ACPI: PCI Interrupt Link [LNKE] (IRQs 3 4 5 6 7 10 *11 12 14 15)
[ 0.129630] ACPI: PCI Interrupt Link [LNKF] (IRQs 3 4 5 6 *7 10 11 12 14 15)
[ 0.129908] ACPI: PCI Interrupt Link [LNKG] (IRQs 3 4 5 6 7 10 *11 12 14 15)
[ 0.130187] ACPI: PCI Interrupt Link [LNKH] (IRQs 3 4 5 6 7 10 11 12 14 15) *9
[ 0.130517] ACPI: PCI Interrupt Link [LNKP] (IRQs 3 *4 5 6 7 10 11 12 14 15)
[ 0.130592] Linux Plug and Play Support v0.97 (c) Adam Belay
[ 0.130602] pnp: PnP ACPI init
[ 0.133505] pnp: PnP ACPI: found 11 devices
[ 0.133509] PnPBIOS: Disabled by ACPI PNP
[ 0.133560] PCI: Using ACPI for IRQ routing
[ 0.133562] PCI: If a device doesn't work, try "pci=routeirq". If it helps, post a report
[ 0.140969] NET: Registered protocol family 8
[ 0.140971] NET: Registered protocol family 20
[ 0.141639] pnp: 00:08: ioport range 0x290-0x29f has been reserved
[ 0.141860] PCI: Bridge: 0000:00:01.0
[ 0.141862] IO window: disabled.
[ 0.141865] MEM window: disabled.
[ 0.141868] PREFETCH window: d7e00000-d7efffff
[ 0.141871] PCI: Bridge: 0000:00:02.0
[ 0.141872] IO window: disabled.
[ 0.141875] MEM window: disabled.
[ 0.141877] PREFETCH window: d7f00000-d7ffffff
[ 0.141881] PCI: Bridge: 0000:00:05.0
[ 0.141882] IO window: disabled.
[ 0.141887] MEM window: f5000000-f7efffff
[ 0.141890] PREFETCH window: d8000000-dfffffff
[ 0.141896] PCI: Bridge: 0000:00:06.0
[ 0.141898] IO window: e000-efff
[ 0.141903] MEM window: f7f00000-f7ffffff
[ 0.141907] PREFETCH window: 88000000-880fffff
[ 0.141923] ACPI: PCI Interrupt 0000:00:01.0[A] -> GSI 29 (level, low) -> IRQ 16
[ 0.141929] PCI: Setting latency timer of device 0000:00:01.0 to 64
[ 0.141936] ACPI: PCI Interrupt 0000:00:02.0[A] -> GSI 34 (level, low) -> IRQ 17
[ 0.141940] PCI: Setting latency timer of device 0000:00:02.0 to 64
[ 0.141946] PCI: Setting latency timer of device 0000:00:05.0 to 64
[ 0.141953] PCI: Setting latency timer of device 0000:00:06.0 to 64
[ 0.141993] NET: Registered protocol family 2
[ 0.191712] IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
[ 0.191885] TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
[ 0.192723] TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
[ 0.193154] TCP: Hash tables configured (established 131072 bind 65536)
[ 0.193157] TCP reno registered
[ 0.207737] checking if image is initramfs... it is
[ 0.855106] Freeing initrd memory: 8111k freed
[ 1.480000] audit: initializing netlink socket (disabled)
[ 1.480000] audit(1176675788.664:1): initialized
[ 1.480000] highmem bounce pool size: 64 pages
[ 1.480000] VFS: Disk quotas dquot_6.5.1
[ 1.480000] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[ 1.480000] io scheduler noop registered
[ 1.480000] io scheduler anticipatory registered
[ 1.480000] io scheduler deadline registered
[ 1.480000] io scheduler cfq registered (default)
[ 1.528000] PCI: Setting latency timer of device 0000:00:01.0 to 64
[ 1.528000] assign_interrupt_mode Found MSI capability
[ 1.528000] Allocate Port Service[0000:00:01.0:pcie00]
[ 1.528000] PCI: Setting latency timer of device 0000:00:02.0 to 64
[ 1.528000] assign_interrupt_mode Found MSI capability
[ 1.528000] Allocate Port Service[0000:00:02.0:pcie00]
[ 1.528000] isapnp: Scanning for PnP cards...
[ 1.880000] isapnp: No Plug & Play device found
[ 1.904000] Real Time Clock Driver v1.12ac
[ 1.904000] Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
[ 1.904000] mice: PS/2 mouse device common for all mice
[ 1.904000] RAMDISK driver initialized: 16 RAM disks of 65536K size 1024 blocksize
[ 1.904000] input: Macintosh mouse button emulation as /class/input/input0
[ 1.904000] Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
[ 1.904000] ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
[ 1.904000] PNP: PS/2 Controller [PNP0303:PS2K] at 0x60,0x64 irq 1
[ 1.904000] PNP: PS/2 controller doesn't have AUX irq; using default 12
[ 1.908000] serio: i8042 KBD port at 0x60,0x64 irq 1
[ 1.908000] serio: i8042 AUX port at 0x60,0x64 irq 12
[ 1.908000] EISA: Probing bus 0 at eisa.0
[ 1.908000] EISA: Detected 0 cards.
[ 1.928000] input: AT Translated Set 2 keyboard as /class/input/input1
[ 1.936000] TCP cubic registered
[ 1.936000] NET: Registered protocol family 1
[ 1.936000] Starting balanced_irq
[ 1.936000] Using IPI No-Shortcut mode
[ 1.936000] ACPI: (supports S0 S1 S3 S4 S5)
[ 1.936000] Magic number: 3:272:401
[ 1.936000] Freeing unused kernel memory: 328k freed
[ 1.940000] Time: acpi_pm clocksource has been installed.
[ 3.164000] md: raid10 personality registered for level 10
[ 3.172000] SCSI subsystem initialized
[ 3.180000] libata version 2.20 loaded.
[ 3.184000] sata_uli 0000:00:12.1: version 1.1
[ 3.184000] ACPI: PCI Interrupt 0000:00:12.1[A] -> GSI 19 (level, low) -> IRQ 18
[ 3.184000] ata1: SATA max UDMA/133 cmd 0x0001dc00 ctl 0x0001d882 bmdma 0x0001d400 irq 18
[ 3.184000] ata2: SATA max UDMA/133 cmd 0x0001d800 ctl 0x0001d482 bmdma 0x0001d408 irq 18
[ 3.184000] scsi0 : sata_uli
[ 3.656000] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 3.672000] ata1.00: ata_hpa_resize 1: sectors = 390721968, hpa_sectors = 390721968
[ 3.672000] ata1.00: ATA-6: WDC WD2000JD-22HBB0, 08.02D08, max UDMA/133
[ 3.672000] ata1.00: 390721968 sectors, multi 16: LBA48
[ 3.684000] ata1.00: ata_hpa_resize 1: sectors = 390721968, hpa_sectors = 390721968
[ 3.684000] ata1.00: configured for UDMA/133
[ 3.684000] scsi1 : sata_uli
[ 4.156000] ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 4.172000] ata2.00: ata_hpa_resize 1: sectors = 390721968, hpa_sectors = 390721968
[ 4.172000] ata2.00: ATA-6: WDC WD2000JD-22HBB0, 08.02D08, max UDMA/133
[ 4.172000] ata2.00: 390721968 sectors, multi 16: LBA48
[ 4.184000] ata2.00: ata_hpa_resize 1: sectors = 390721968, hpa_sectors = 390721968
[ 4.184000] ata2.00: configured for UDMA/133
[ 4.184000] scsi 0:0:0:0: Direct-Access ATA WDC WD2000JD-22H 08.0 PQ: 0 ANSI: 5
[ 4.184000] SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
[ 4.184000] sda: Write Protect is off
[ 4.184000] sda: Mode Sense: 00 3a 00 00
[ 4.184000] SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 4.184000] SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
[ 4.184000] sda: Write Protect is off
[ 4.184000] sda: Mode Sense: 00 3a 00 00
[ 4.184000] SCSI device sda: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 4.184000] sda: sda1 sda2 < sda5 sda6 >
[ 4.216000] sd 0:0:0:0: Attached scsi disk sda
[ 4.216000] scsi 1:0:0:0: Direct-Access ATA WDC WD2000JD-22H 08.0 PQ: 0 ANSI: 5
[ 4.216000] SCSI device sdb: 390721968 512-byte hdwr sectors (200050 MB)
[ 4.216000] sdb: Write Protect is off
[ 4.216000] sdb: Mode Sense: 00 3a 00 00
[ 4.216000] SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 4.216000] SCSI device sdb: 390721968 512-byte hdwr sectors (200050 MB)
[ 4.216000] sdb: Write Protect is off
[ 4.216000] sdb: Mode Sense: 00 3a 00 00
[ 4.216000] SCSI device sdb: write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[ 4.216000] sdb: sdb1 sdb2 < sdb5 sdb6 >
[ 4.240000] sd 1:0:0:0: Attached scsi disk sdb
[ 4.244000] Capability LSM initialized
[ 4.260000] device-mapper: ioctl: 4.11.0-ioctl (2006-10-12) initialised: <email address hidden>
[ 4.628000] md: bind<dm-0>
[ 4.628000] md: bind<dm-1>
[ 4.636000] md: raid1 personality registered for level 1
[ 4.636000] raid1: raid set md0 active with 2 out of 2 mirrors
[ 4.792000] usbcore: registered new interface driver usbfs
[ 4.792000] usbcore: registered new interface driver hub
[ 4.792000] usbcore: registered new device driver usb
[ 4.792000] ohci_hcd: 2006 August 04 USB 1.1 'Open' Host Controller (OHCI) Driver
[ 4.792000] ACPI: PCI Interrupt 0000:00:13.0[A] -> GSI 20 (level, low) -> IRQ 19
[ 4.792000] ohci_hcd 0000:00:13.0: OHCI Host Controller
[ 4.792000] ohci_hcd 0000:00:13.0: new USB bus registered, assigned bus number 1
[ 4.792000] ohci_hcd 0000:00:13.0: irq 19, io mem 0xf4fff000
[ 4.860000] ieee1394: Initialized config rom entry `ip1394'
[ 4.868000] Intel(R) PRO/1000 Network Driver - version 7.3.15-k2-NAPI
[ 4.868000] Copyright (c) 1999-2006 Intel Corporation.
[ 4.880000] usb usb1: configuration #1 chosen from 1 choice
[ 4.880000] hub 1-0:1.0: USB hub found
[ 4.880000] hub 1-0:1.0: 3 ports detected
[ 4.988000] ACPI: PCI Interrupt 0000:00:13.1[B] -> GSI 21 (level, low) -> IRQ 20
[ 4.988000] ohci_hcd 0000:00:13.1: OHCI Host Controller
[ 4.988000] ohci_hcd 0000:00:13.1: new USB bus registered, assigned bus number 2
[ 4.988000] ohci_hcd 0000:00:13.1: irq 20, io mem 0xf4ffe000
[ 5.048000] usb usb2: configuration #1 chosen from 1 choice
[ 5.048000] hub 2-0:1.0: USB hub found
[ 5.048000] hub 2-0:1.0: 3 ports detected
[ 5.156000] ACPI: PCI Interrupt 0000:00:13.3[D] -> GSI 23 (level, low) -> IRQ 21
[ 5.156000] ehci_hcd 0000:00:13.3: EHCI Host Controller
[ 5.156000] ehci_hcd 0000:00:13.3: new USB bus registered, assigned bus number 3
[ 5.156000] ehci_hcd 0000:00:13.3: debug port 1
[ 5.180000] ehci_hcd 0000:00:13.3: irq 21, io mem 0xf4ffdc00
[ 5.180000] ehci_hcd 0000:00:13.3: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
[ 5.184000] usb usb3: configuration #1 chosen from 1 choice
[ 5.184000] hub 3-0:1.0: USB hub found
[ 5.184000] hub 3-0:1.0: 8 ports detected
[ 5.292000] ALI15X3: IDE controller at PCI slot 0000:00:12.0
[ 5.292000] ACPI: PCI Interrupt 0000:00:12.0[A] -> GSI 19 (level, low) -> IRQ 18
[ 5.292000] ALI15X3: chipset revision 199
[ 5.292000] ALI15X3: not 100% native mode: will probe irqs later
[ 5.292000] ide0: BM-DMA at 0xff00-0xff07, BIOS settings: hda:pio, hdb:pio
[ 5.292000] ide1: BM-DMA at 0xff08-0xff0f, BIOS settings: hdc:pio, hdd:pio
[ 5.292000] Probing IDE interface ide0...
[ 5.308000] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 5.308000] sd 1:0:0:0: Attached scsi generic sg1 type 0
[ 5.860000] Probing IDE interface ide1...
[ 6.436000] ACPI: PCI Interrupt 0000:04:06.2[B] -> GSI 22 (level, low) -> IRQ 22
[ 6.484000] ohci1394: fw-host0: OHCI-1394 1.1 (PCI): IRQ=[22] MMIO=[f7fff800-f7ffffff] Max Packet=[2048] IR/IT contexts=[4/8]
[ 6.488000] ACPI: PCI Interrupt 0000:04:07.0[A] -> GSI 22 (level, low) -> IRQ 22
[ 6.748000] e1000: 0000:04:07.0: e1000_probe: (PCI:33MHz:32-bit) 00:0e:0c:b2:c7:c5
[ 6.944000] e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
[ 7.440000] kjournald starting. Commit interval 5 seconds
[ 7.440000] EXT3-fs: mounted filesystem with ordered data mode.
[ 7.768000] ieee1394: Host added: ID:BUS[0-00:1023] GUID[0000000000000000]
[ 18.360000] e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
[ 19.640000] Linux agpgart interface v0.102 (c) Dave Jones
[ 19.684000] pci_hotplug: PCI Hot Plug PCI Core version: 0.5
[ 19.692000] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[ 19.936000] input: PC Speaker as /class/input/input2
[ 20.000000] ali1563: SMBus control = 0403
[ 20.000000] ali1563_probe: Returning 0
[ 20.036000] agpgart: Detected AGP bridge 20
[ 20.036000] Setting up ULi AGP.
[ 20.044000] agpgart: AGP aperture is 128M @ 0xc8000000
[ 20.056000] NET: Registered protocol family 17
[ 20.100000] ali15x3_smbus 0000:00:07.1: ALI15X3_smb region uninitialized - upgrade BIOS or use force_addr=0xaddr
[ 20.100000] ali15x3_smbus 0000:00:07.1: ALI15X3 not detected, module not inserted.
[ 20.120000] ali1535_smbus 0000:00:07.1: ALI1535_smb region uninitialized - upgrade BIOS?
[ 20.120000] ali1535_smbus 0000:00:07.1: ALI1535 not detected, module not inserted.
[ 20.348000] nvidia: module license 'NVIDIA' taints kernel.
[ 20.628000] ACPI: PCI Interrupt 0000:03:00.0[A] -> GSI 16 (level, low) -> IRQ 23
[ 20.628000] NVRM: loading NVIDIA UNIX x86 Kernel Module 1.0-9755 Mon Feb 26 23:21:15 PST 2007
[ 20.904000] ACPI: PCI Interrupt 0000:04:06.0[A] -> GSI 21 (level, low) -> IRQ 20
[ 21.132000] Intel ISA PCIC probe: not found.
[ 21.244000] lp: driver loaded but no devices found
[ 21.340000] Adding 1502036k swap on /dev/disk/by-uuid/fdc09087-4ba2-42a5-a1f6-fb70e0d47400. Priority:1 extents:1 across:1502036k
[ 21.340000] Adding 1502036k swap on /dev/disk/by-uuid/3f20f453-ed33-431b-91e9-54bf09e7bb6f. Priority:1 extents:1 across:1502036k
[ 21.428000] EXT3 FS on md0, internal journal
[ 21.572000] md: md1 stopped.
[ 21.812000] md: bind<sdb6>
[ 21.816000] md: bind<sda6>
[ 21.844000] raid10: raid set md1 active with 2 out of 2 devices
[ 22.820000] kjournald starting. Commit interval 5 seconds
[ 22.828000] EXT3 FS on md1, internal journal
[ 22.828000] EXT3-fs: mounted filesystem with ordered data mode.
[ 23.144000] NET: Registered protocol family 10
[ 23.144000] lo: Disabled Privacy Extensions
[ 23.656000] input: Power Button (FF) as /class/input/input3
[ 23.656000] ACPI: Power Button (FF) [PWRF]
[ 23.656000] input: Power Button (CM) as /class/input/input4
[ 23.660000] ACPI: Power Button (CM) [PWRB]
[ 23.676000] Using specific hotkey driver
[ 23.728000] No dock devices found.
[ 23.768000] ibm_acpi: ec object not found
[ 23.912000] pcc_acpi: loading...
[ 24.152000] powernow-k8: Found 2 AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ processors (version 2.00.00)
[ 24.152000] powernow-k8: 0 : fid 0xc (2000 MHz), vid 0x8
[ 24.152000] powernow-k8: 1 : fid 0xa (1800 MHz), vid 0xa
[ 24.152000] powernow-k8: 2 : fid 0x2 (1000 MHz), vid 0x12
[ 27.428000] usb 3-1: new high speed USB device using ehci_hcd and address 3
[ 27.564000] usb 3-1: configuration #1 chosen from 1 choice
[ 27.564000] hub 3-1:1.0: USB hub found
[ 27.564000] hub 3-1:1.0: 2 ports detected
[ 27.876000] usb 3-1.1: new high speed USB device using ehci_hcd and address 4
[ 27.980000] usb 3-1.1: configuration #1 chosen from 1 choice
[ 27.980000] hub 3-1.1:1.0: USB hub found
[ 27.980000] hub 3-1.1:1.0: 4 ports detected
[ 28.292000] usb 3-1.2: new high speed USB device using ehci_hcd and address 5
[ 28.468000] usb 3-1.2: configuration #1 chosen from 1 choice
[ 28.672000] usb 3-1.1.4: new low speed USB device using ehci_hcd and address 6
[ 28.784000] usb 3-1.1.4: configuration #1 chosen from 1 choice
[ 33.248000] usbcore: registered new interface driver libusual
[ 33.280000] Initializing USB Mass Storage driver...
[ 33.280000] scsi2 : SCSI emulation for USB Mass Storage devices
[ 33.280000] usbcore: registered new interface driver usb-storage
[ 33.280000] USB Mass Storage support registered.
[ 33.280000] usb-storage: device found at 5
[ 33.280000] usb-storage: waiting for device to settle before scanning
[ 33.404000] input: Logitech USB RECEIVER as /class/input/input5
[ 33.404000] lmpcm_usb.c: Detected device: Logitech USB RECEIVER
[ 33.404000] usbcore: registered new interface driver lmpcm_usb
[ 33.404000] ubuntu/misc/lmpcm_usb.c: v0.5.5:USB Logitech MediaPlay Cordless Mouse driver
[ 33.468000] usbcore: registered new interface driver hiddev
[ 33.468000] usbcore: registered new interface driver usbhid
[ 33.468000] drivers/usb/input/hid-core.c: v2.6:USB HID core driver
[ 33.968000] ppdev: user-space parallel port driver
[ 34.912000] apm: BIOS version 1.2 Flags 0x02 (Driver version 1.16ac)
[ 34.912000] apm: disabled - APM is not SMP safe.
[ 38.288000] usb-storage: device scan complete
[ 38.292000] scsi 2:0:0:0: Direct-Access SMSC 223 U HS-CF 3.60 PQ: 0 ANSI: 0
[ 38.296000] scsi 2:0:0:1: Direct-Access SMSC 223 U HS-MS 3.60 PQ: 0 ANSI: 0
[ 38.296000] scsi 2:0:0:2: Direct-Access SMSC 223 U HS-SM 3.60 PQ: 0 ANSI: 0
[ 38.300000] scsi 2:0:0:3: Direct-Access SMSC 223 U HS-SD/MMC 3.60 PQ: 0 ANSI: 0
[ 38.308000] sd 2:0:0:0: Attached scsi removable disk sdc
[ 38.308000] sd 2:0:0:0: Attached scsi generic sg2 type 0
[ 38.312000] sd 2:0:0:1: Attached scsi removable disk sdd
[ 38.312000] sd 2:0:0:1: Attached scsi generic sg3 type 0
[ 38.316000] sd 2:0:0:2: Attached scsi removable disk sde
[ 38.316000] sd 2:0:0:2: Attached scsi generic sg4 type 0
[ 38.328000] Bluetooth: Core ver 2.11
[ 38.328000] NET: Registered protocol family 31
[ 38.328000] Bluetooth: HCI device and connection manager initialized
[ 38.328000] Bluetooth: HCI socket layer initialized
[ 38.364000] sd 2:0:0:3: Attached scsi removable disk sdf
[ 38.364000] sd 2:0:0:3: Attached scsi generic sg5 type 0
[ 38.432000] Bluetooth: L2CAP ver 2.8
[ 38.432000] Bluetooth: L2CAP socket layer initialized
[ 38.636000] Bluetooth: RFCOMM socket layer initialized
[ 38.636000] Bluetooth: RFCOMM TTY layer initialized
[ 38.636000] Bluetooth: RFCOMM ver 1.8
[ 40.076000] agpgart: Found an AGP 3.0 compliant device at 0000:00:04.0.
[ 40.076000] agpgart: Putting AGP V3 device at 0000:00:04.0 into 8x mode
[ 40.076000] agpgart: Putting AGP V3 device at 0000:03:00.0 into 8x mode
[ 41.064000] /dev/vmmon[5929]: Module vmmon: registered with major=10 minor=165
[ 41.064000] /dev/vmmon[5929]: Module vmmon: initialized
[ 41.212000] /dev/vmnet: open called by PID 5957 (vmnet-bridge)
[ 41.212000] /dev/vmnet: hub 0 does not exist, allocating memory.
[ 41.212000] /dev/vmnet: port on hub 0 successfully opened
[ 41.212000] bridge-eth0: enabling the bridge
[ 41.212000] bridge-eth0: up
[ 41.212000] bridge-eth0: already up
[ 41.212000] bridge-eth0: attached
[ 48.456000] eth0: no IPv6 routers present
[ 191.800000] e1000: eth0: e1000_watchdog: NIC Link is Down
[ 197.676000] e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
[ 197.680000] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 198.768000] e1000: eth0: e1000_watchdog: NIC Link is Down
[ 201.072000] e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
[ 201.072000] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 202.168000] e1000: eth0: e1000_watchdog: NIC Link is Down
[ 204.528000] e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
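
The link transitions at the end of this log can be pulled out mechanically, which makes it easier to compare a healthy run against a failed one. This is a rough sketch that only understands the `e1000_watchdog` message format shown above with a bracketed timestamp; the helper names `link_events` and `down_durations` are illustrative, and the sample lines are copied from the client log in this report.

```python
import re

# Matches lines such as:
# [ 198.768000] e1000: eth0: e1000_watchdog: NIC Link is Down
LINK_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+e1000: (?P<iface>\w+): e1000_watchdog: "
    r"NIC Link is (?P<state>Up|Down)"
)


def link_events(dmesg_text):
    """Return (timestamp, interface, state) for each e1000 link message."""
    events = []
    for line in dmesg_text.splitlines():
        m = LINK_RE.search(line)
        if m:
            events.append((float(m.group("ts")), m.group("iface"), m.group("state")))
    return events


def down_durations(events):
    """Pair each Down with the next Up and return outage lengths in seconds."""
    outages = []
    down_at = None
    for ts, _iface, state in events:
        if state == "Down":
            down_at = ts
        elif down_at is not None:
            outages.append(round(ts - down_at, 3))
            down_at = None
    return outages


SAMPLE = """\
[ 198.768000] e1000: eth0: e1000_watchdog: NIC Link is Down
[ 201.072000] e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
[ 201.072000] ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[ 202.168000] e1000: eth0: e1000_watchdog: NIC Link is Down
[ 204.528000] e1000: eth0: e1000_watchdog: NIC Link is Up 1000 Mbps Full Duplex
"""

print(down_durations(link_events(SAMPLE)))
```

In the failed state the interesting result is an empty list: the switch loses power but no Down event is ever logged. Lines without a bracketed timestamp, like the ones from the Slackware server, are skipped by this pattern.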

Dylan (dylang) wrote :

More interesting bits: lots of network traffic can cause the system to just be wonky, without actually killing the link.

Here is a summary of my experiences:
 - Transferring big files over NFS will cause the NIC to no longer work.
 - Streaming video (Mythfrontend) will experience problems where the keyboard appears to get modifier keys (shift, etc) stuck on, and then finally cause the NIC to no longer work after a few more minutes.
 - Minor traffic (~20k/s of torrents) will cause the modifier keys to appear stuck on periodically.

It seems that bidirectional traffic makes it worse than uni-directional, and that higher rates of transfer lead to more rapid failures.

Dylan (dylang) wrote :

Here is a transcription of the lockup from the perspective of MythTV!

2007-04-27 00:11:47.138 Using runtime prefix = /usr
2007-04-27 00:11:47.144 DPMS is active.
2007-04-27 00:11:47.497 New DB connection, total: 1
2007-04-27 00:11:47.572 Connected to database 'mythconverg' at host: apollo
2007-04-27 00:11:47.600 Total desktop dim: 1920x1200, with 1 screen[s].
2007-04-27 00:11:47.630 Using screen 0, 1920x1200 at 0,0
2007-04-27 00:11:47.652 Current Schema Version: 1160
2007-04-27 00:11:47.652 mythfrontend version: 0.20.20060828-3 www.mythtv.org
2007-04-27 00:11:47.652 Enabled verbose msgs: important general
2007-04-27 00:11:49.193 Total desktop dim: 1920x1200, with 1 screen[s].
2007-04-27 00:11:49.196 Using screen 0, 1920x1200 at 0,0
2007-04-27 00:11:49.200 Switching to square mode (Titivillus)
2007-04-27 00:11:49.289 Using the Qt painter
mythtv: could not connect to socket
mythtv: No such file or directory
lirc_init failed for mythtv, see preceding messages
2007-04-27 00:11:51.025 Joystick disabled.
2007-04-27 00:11:51.198 Loading from: /usr/share/mythtv/themes/default/base.xml
2007-04-27 00:11:52.080 Registering Internal as a media playback plugin.
2007-04-27 00:12:29.375 XMLParse::LoadTheme using /usr/share/mythtv/themes/Titivillus/ui.xml
2007-04-27 00:12:32.205 Connecting to backend server: 192.168.0.3:6543 (try 1 of 5)
2007-04-27 00:12:32.207 Using protocol version 31
2007-04-27 00:12:43.744 New DB connection, total: 2
2007-04-27 00:12:43.826 Connected to database 'mythconverg' at host: apollo
2007-04-27 00:12:43.902 TV: Attempting to change from None to WatchingPreRecorded
2007-04-27 00:12:44.013 DPMS Deactivated
0: start_time: 0.036 duration: 165.324
1: start_time: 0.026 duration: 165.307
stream: start_time: 0.289 duration: 1837.046 bitrate=4896 kb/s
2007-04-27 00:12:44.440 AFD: Opened codec 0x8523800, id(MPEG2VIDEO) type(Video)
2007-04-27 00:12:44.481 AFD: Opened codec 0x8461790, id(MP2) type(Audio)
2007-04-27 00:12:44.505 Opening OSS audio device '/dev/dsp'.
2007-04-27 00:12:44.749 VideoOutputXv: XvMCTex: Init failed
2007-04-27 00:12:44.761 VideoOutputXv: XVideo Adaptor Name: 'NV17 Video Texture'
X Error: BadMatch (invalid parameter attributes) 8
  Major opcode: 140
  Minor opcode: 14
  Resource id: 0x290
2007-04-27 00:12:47.695 TV: Changing from None to WatchingPreRecorded
2007-04-27 00:12:47.696 New DB connection, total: 3
2007-04-27 00:12:47.716 Realtime priority would require SUID as root.
2007-04-27 00:12:47.741 Connected to database 'mythconverg' at host: apollo
2007-04-27 00:12:47.835 Video timing method: USleep with busy wait
2007-04-27 00:12:49.758 AO: Using time stretch 1.5
2007-04-27 00:20:17.793 NVP: prebuffering pause
2007-04-27 00:20:18.180 RingBuf(myth://192.168.0.3:6543/1032_20070426220000.mpg): Waited 1.0 seconds for data to become available...
2007-04-27 00:20:18.712 NVP: Prebuffer wait timed out 10 times.
2007-04-27 00:20:19.204 RingBuf(myth://192.168.0.3:6543/1032_20070426220000.mpg): Waited 2.0 seconds for data to become available...
2007-04-27 00:20:19.632 NVP: Prebuffer wait timed out 10 times.
2007-04-27 00:20:20.552 NVP: Prebuffer wait timed out 10 times.
2007-04-27 00:20:21.252 RingBuf(myth://192.168.0.3:...

Dylan (dylang) wrote :

I've tried a few things. I blacklisted the ipv6 module, which seems to make the condition harder to trigger with MythTV traffic (I was able to watch a number of shows without the client locking up), but it did not help with NFS: any NFS transfer to my Slackware server still locked up the client.

On a lark, I swapped the network card from the new e1000 to an old e100 card I had in a drawer. It locks up identically under NFS load:
[ 341.152000] NETDEV WATCHDOG: eth0: transmit timed out
[ 341.172000] e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
[ 346.168000] NETDEV WATCHDOG: eth0: transmit timed out
[ 346.188000] e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex

I suspect a deeper bug in the kernel, perhaps relating to interrupts, or the networking parts common to both code paths (where the skbs are transmitted).
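For anyone collecting similar evidence, here is a minimal sketch of pulling the transmit-timeout events out of saved dmesg output and measuring the gap between them. The function names and the `SAMPLE` constant are hypothetical helpers, not part of any existing tool, and the regular expression assumes the exact message format quoted above.

```python
import re

# Matches lines such as "[ 341.152000] NETDEV WATCHDOG: eth0: transmit timed out".
# The format is assumed from the dmesg excerpt in this report.
WATCHDOG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+NETDEV WATCHDOG: (?P<iface>\S+): transmit timed out"
)


def find_tx_timeouts(dmesg_text):
    """Return a list of (timestamp_seconds, interface) for each watchdog timeout."""
    events = []
    for line in dmesg_text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match:
            events.append((float(match.group("ts")), match.group("iface")))
    return events


def intervals(events):
    """Return the gaps in seconds between consecutive timeout events."""
    times = [ts for ts, _ in events]
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


# The e100 excerpt from this comment, used as sample input.
SAMPLE = """\
[ 341.152000] NETDEV WATCHDOG: eth0: transmit timed out
[ 341.172000] e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
[ 346.168000] NETDEV WATCHDOG: eth0: transmit timed out
[ 346.188000] e100: eth0: e100_watchdog: link up, 100Mbps, full-duplex
"""

print(find_tx_timeouts(SAMPLE))
print(intervals(find_tx_timeouts(SAMPLE)))
```

Timeouts that recur at a steady interval while the link is still reported as up would be consistent with a stalled transmit queue rather than a physical link problem, though this alone does not prove it.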

Dylan (dylang) wrote :

I have a stronger idea of where this bug lurks now. I upgraded another e1000 equipped machine, and it exhibited similar symptoms.

Affected:
 SMP Linux 2.6.20-15, 2.6.20-16 (Ubuntu)
 SMP Linux 2.6.20.7 (kernel.org)
 SMP Linux 2.6.22.1 (kernel.org)

Unaffected:
 SMP Linux 2.6.17-11 (Ubuntu)
 UMP Linux 2.6.22.1 (kernel.org)

Hardware affected:
 AMD X2 3600+, 3800+, 4200+, etc. (the 1.9-2.5 GHz range I can test) with both cores enabled and kernel >= 2.6.20-15 + e1000 (including the latest driver from http://sourceforge.net/projects/e1000 , which was 7.6.5).

Hardware unaffected:
 The exact same hardware with the dual-core mode disabled in the BIOS. I can easily sling around bits anywhere from 300Mbps to 600Mbps with jumbo frames, rock solid.

So... what's broken in SMP mode? What broke between 2.6.17-11 and 2.6.20-15?
I don't feel like wading through kernel diffs alone.
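As a rough aid for others comparing notes, the sketch below checks whether a kernel falls into the combination reported above (an SMP build at version 2.6.20 or later). The function names are hypothetical, the 2.6.20 threshold is only the earliest affected version listed here and not a confirmed regression point, and the check assumes the kernel build string contains the token `SMP` for multiprocessor builds.

```python
import re


def parse_release(release):
    """Return the leading numeric version, e.g. '2.6.20-15-generic' -> (2, 6, 20)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def is_smp(version):
    """True when the kernel build string contains the token 'SMP'."""
    return "SMP" in version.split()


def matches_report(release, version):
    """True for the combination listed as affected: SMP build and kernel >= 2.6.20.

    The threshold is an assumption taken from the list above; the real
    boundary lies somewhere between 2.6.17 and 2.6.20 and is not known.
    """
    return is_smp(version) and parse_release(release) >= (2, 6, 20)


print(matches_report("2.6.20-15-generic", "#1 SMP"))
print(matches_report("2.6.17-11-generic", "#1 SMP"))
```

On a live system the two strings could be taken from `platform.release()` and `platform.version()`. A match only means the configuration resembles the ones listed as affected; it says nothing about whether the lockup will actually occur.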

Matthew Lenz (matthew-nocturnal) wrote :

Having the same problem with a Dell workstation purchased in April of this year. It's a Core 2 Duo E6600 system.

Matthew Tighe (tighem) wrote :

You guys still running Feisty? Still seeing the problem on whichever version you are running?

Changed in linux-source-2.6.20:
status: New → Confirmed
Dylan (dylang) wrote :

7.10 64-bit does not exhibit this behaviour. 32-bit Linux does, but 64-bit doesn't (on my machine).

Matthew Tighe (tighem) wrote :

The bug may already be fixed in later kernel versions, but this still needs to be confirmed. See comments above.

Changed in linux-source-2.6.20:
assignee: nobody → linuxkernels
assignee: linuxkernels → linux-kernel-worm
assignee: linux-kernel-worm → kernel-bugs
Changed in linux-source-2.6.20:
assignee: kernel-bugs → nobody
Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this bug to the new "linux" package. However, development has already begun for the upcoming Intrepid Ibex 8.10 release. It would be helpful if you could test the upcoming release and verify if this is still an issue - http://www.ubuntu.com/testing . If the issue still exists, please update this report by changing the Status of the "linux" task from "Incomplete" to "New". We appreciate your patience and understanding as we make this transition. Thanks!

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Leann Ogasawara (leannogasawara) wrote :

*This is an automated response*

This bug report is being closed because we received no response to the previous request for information. Please reopen this if it is still an issue in the actively developed pre-release of Jaunty Jackalope 9.04 - http://cdimage.ubuntu.com/releases/jaunty . To reopen the bug report simply change the Status of the "linux" task back to "New".

Changed in linux:
status: Incomplete → Won't Fix