[e1000e] ethtool -t eth0 offline loses routing table

Bug #1395269 reported by Peter Cordes
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

ethtool -t eth0 offline runs the tests, but afterwards the routing table contains only the entry for the local network. In my case, I had to run sudo route add default gw 10.0.0.1 to get the default route back. The online test doesn't do this.
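Until the driver behaviour changes, the manual workaround can be scripted. This is only a minimal sketch, assuming IPv4, the net-tools route command used above, and a single default route on the interface; the function names are hypothetical and not part of ethtool:

```sh
# Hypothetical helper: print the default gateway for interface $1,
# reading `route -n` output on stdin. Prints nothing if there is none.
default_gw_from_route_n() {
    awk -v iface="$1" '
        $1 == "0.0.0.0" && $3 == "0.0.0.0" && $4 ~ /G/ && $8 == iface { print $2; exit }
    '
}

# Hypothetical wrapper: run the offline self-test on interface $1 and
# re-add its default route if the test flushed it. Must run as root.
offline_test_keep_default_gw() {
    iface=$1
    gw=$(route -n | default_gw_from_route_n "$iface")
    ethtool -t "$iface" offline
    status=$?    # keep ethtool's exit status for the caller
    if [ -n "$gw" ] && [ -z "$(route -n | default_gw_from_route_n "$iface")" ]; then
        route add default gw "$gw" dev "$iface"
    fi
    return "$status"
}
```

Source the functions in a root shell and call offline_test_keep_default_gw eth0.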

Ubuntu 14.04, ethtool 1:3.13-1

Linux tesla 3.13.0-39-generic #66-Ubuntu SMP Tue Oct 28 13:30:27 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

ethtool -i eth0:
driver: e1000e
version: 2.3.2-k
firmware-version: 1.1-0
bus-info: 0000:00:19.0

Relevant kernel log:
[637008.472410] e1000e 0000:00:19.0 eth0: offline testing starting
[637009.077985] e1000e 0000:00:19.0 eth0: testing unshared interrupt
[637022.468941] e1000e 0000:00:19.0: irq 45 for MSI/MSI-X
[637022.572094] e1000e 0000:00:19.0: irq 45 for MSI/MSI-X
[637022.572257] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[637037.432893] e1000e: eth0 NIC Link is Up 10 Mbps Full Duplex, Flow Control: Rx/Tx
[637037.433003] e1000e 0000:00:19.0 eth0: Link Speed was downgraded by SmartSpeed
[637037.433005] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO
[637037.433035] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[637037.982611] net_ratelimit: 3 callbacks suppressed
[637037.982623] IPv4: martian source 10.0.0.17 from 80.73.161.44, on dev eth0
[637037.982628] ll header: 00000000: 00 19 d1 11 b4 9b 00 03 6d 11 34 1b 08 00 ........m.4...

(The martian packets are from TCP connections that my router is still NATing to this machine, even though, without its default route, the machine no longer accepts them.)

And yes, my e1000e is autonegotiating to 10baseT/Full on the same cables and switch that still work at 1000baseT with another machine, hence the self-tests. I thought this machine used to run at 1000baseT; it would be weird if I had gone 5 years without noticing my desktop being slow. That's not what this bug report is about, though.

The e1000e hardware is on an Intel DG965WH motherboard (ICH8 / G965 graphics, first-gen Core 2):
00:19.0 Ethernet controller: Intel Corporation 82566DC Gigabit Network Connection (rev 02)
        Subsystem: Intel Corporation Device 0001
        Flags: bus master, fast devsel, latency 0, IRQ 45
        Memory at e0300000 (32-bit, non-prefetchable) [size=128K]
        Memory at e0324000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at 20e0 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Kernel driver in use: e1000e

$ ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes: 10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supported pause frame use: No
        Supports auto-negotiation: Yes
        Advertised link modes: 10baseT/Full
                                100baseT/Full
                                1000baseT/Full
        Advertised pause frame use: No
        Advertised auto-negotiation: Yes
        Speed: 10Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        MDI-X: on (auto)
Cannot get wake-on-lan settings: Operation not permitted
        Current message level: 0x00000007 (7)
                               drv probe link
        Link detected: yes

This appears to be the same problem someone reported to Red Hat a while ago, which was marked as fixed for the igb driver:
https://bugzilla.redhat.com/show_bug.cgi?format=multiple&id=661976
Their BTS doesn't have much useful information, because the bug it was marked a duplicate of is now private, so nobody can look at it.

Possibly this has to be fixed per driver, unless the right fix is for ethtool to save and restore the routing table entries for that interface.

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: ethtool 1:3.13-1
ProcVersionSignature: Ubuntu 3.13.0-39.66-generic 3.13.11.8
Uname: Linux 3.13.0-39-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.5
Architecture: amd64
Date: Sat Nov 22 03:00:35 2014
Dependencies:
 gcc-4.9-base 4.9.1-0ubuntu1
 libc6 2.19-0ubuntu6.3
 libgcc1 1:4.9.1-0ubuntu1
 multiarch-support 2.19-0ubuntu6.3
SourcePackage: ethtool
UpgradeStatus: Upgraded to trusty on 2014-07-14 (130 days ago)
---
ApportVersion: 2.14.1-0ubuntu3.5
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: peter 2715 F.... pulseaudio
 /dev/snd/pcmC0D0p: peter 2715 F...m pulseaudio
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=UUID=0da07ae0-ff5a-43c6-9702-519aff370fd5
IwConfig: Error: [Errno 2] No such file or directory
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: root=LABEL=t-root2 ro
ProcVersionSignature: Ubuntu 3.13.0-39.66-generic 3.13.11.8
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-39-generic N/A
 linux-backports-modules-3.13.0-39-generic N/A
 linux-firmware 1.127.4
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty
Uname: Linux 3.13.0-39-generic x86_64
UpgradeStatus: Upgraded to trusty on 2014-07-14 (131 days ago)
UserGroups: adm admin audio cdrom dialout dip floppy fuse lpadmin plugdev sambashare scanner src staff users vboxusers video
_MarkForUpload: True
dmi.bios.date: 11/17/2008
dmi.bios.vendor: Intel Corp.
dmi.bios.version: MQ96510J.86A.1754.2008.1117.0002
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: DG965WH
dmi.board.vendor: Intel Corporation
dmi.board.version: AAD41692-304
dmi.chassis.type: 3
dmi.modalias: dmi:bvnIntelCorp.:bvrMQ96510J.86A.1754.2008.1117.0002:bd11/17/2008:svn:pn:pvr:rvnIntelCorporation:rnDG965WH:rvrAAD41692-304:cvn:ct3:cvr:

Peter Cordes (peter-cordes) wrote :
Ben Hutchings (benh-debian) wrote :

This is not ethtool's decision. Maybe the driver is taking the interface down and up.

affects: ethtool (Ubuntu) → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1395269

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Peter Cordes (peter-cordes) wrote : AlsaInfo.txt

apport information

summary: - ethtool -t eth0 offline loses routing table
+ [e1000e] ethtool -t eth0 offline loses routing table
tags: added: apport-collected
description: updated
Peter Cordes (peter-cordes) wrote : BootDmesg.txt

apport information

Peter Cordes (peter-cordes) wrote : CurrentDmesg.txt

apport information

Peter Cordes (peter-cordes) wrote : Lspci.txt

apport information

Peter Cordes (peter-cordes) wrote : Lsusb.txt

apport information

Peter Cordes (peter-cordes) wrote : ProcCpuinfo.txt

apport information

Peter Cordes (peter-cordes) wrote : ProcEnviron.txt

apport information

Peter Cordes (peter-cordes) wrote : ProcInterrupts.txt

apport information

Peter Cordes (peter-cordes) wrote : ProcModules.txt

apport information

Peter Cordes (peter-cordes) wrote : PulseList.txt

apport information

Peter Cordes (peter-cordes) wrote : UdevDb.txt

apport information

Peter Cordes (peter-cordes) wrote : UdevLog.txt

apport information

Peter Cordes (peter-cordes) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Peter Cordes (peter-cordes) wrote :

So the idea is for drivers not to tell the kernel that the interface went down while they run self-tests? According to the Red Hat bug, igb had this problem fixed, but I guess e1000e didn't.

Yes, I'm pretty sure my e1000e interface goes down during the offline portion of the full set of self-tests. Connected to my switch, it takes longer than usual to autonegotiate a link. I should have posted this in the initial report, but here's the actual output:

sudo ethtool -t eth0 offline
The test result is FAIL
The test extra info:
Register test (offline) 0
Eeprom test (offline) 0
Interrupt test (offline) 0
Loopback test (offline) 0
Link test (on/offline) 1

If this doesn't usually happen with e1000e, the long autonegotiation is probably the corner case that triggers it: it takes so long that the link test fails. (Also, would it make sense to run the link test first, before the offline tests that trigger autonegotiation? Or do we WANT to flag problems like sketchy setups that need a SmartSpeed fallback to 10baseT to get a working link?)

The other solution would be to save and restore the routing table entries for that interface. But that might cause problems in some corner cases, so it could be a lot of work to implement safely in the face of complex routing tables and/or changes made during the self-test while the interface was still online. Oh, and there's more than just IPv4 to save and restore routes for: a custom protocol that ethtool doesn't know about would not have its routing table saved and restored.
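For illustration only, an IPv4-only version of the save/restore idea can be done outside ethtool with iproute2. This is a rough sketch with hypothetical function names; it deliberately ignores IPv6, other protocols, and routes changed during the test, which are exactly the corner cases mentioned above:

```sh
# Hypothetical helper: turn `ip -4 route show dev IFACE` output (stdin) into
# commands that re-create each route on interface $1. Any "dev X" already in
# a line is dropped so that exactly one "dev IFACE" is appended.
restore_cmds_from_ip_route() {
    sed -e 's/ dev [^ ]*//' \
        -e 's/[[:space:]]*$//' \
        -e '/^$/d' \
        -e 's|^|ip -4 route replace |' \
        -e "s|\$| dev $1|"
}

# Hypothetical wrapper: snapshot the interface's IPv4 routes, run the offline
# test, then replay the snapshot. "replace" also succeeds for routes that
# survived the test. Must run as root, and assumes the driver has brought
# the interface back up (admin-up) by the time ethtool returns.
offline_test_restore_routes() {
    iface=$1
    saved=$(ip -4 route show dev "$iface")
    ethtool -t "$iface" offline
    status=$?
    printf '%s\n' "$saved" | restore_cmds_from_ip_route "$iface" | sh
    return "$status"
}
```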

Anyway, thanks for looking into this. It's not a problem for me now that I know about it; I just wanted to get it reported so that at least the docs could include a warning. I think that's all that really needs doing, since checking every driver would be a lot of work.

How about:
 ethtool(8):
...
       offline
              Perform all tests, including ones that interrupt normal operation. Some drivers may bring the interface down/up during this process, flushing routing table entries. They shouldn't, but be prepared just in case. Report problems with specific drivers against the Linux kernel (not ethtool).

 The "report a bug" sentence is probably too much, and could go.

Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.18 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.18-rc6-vivid/

Peter Cordes (peter-cordes) wrote :

Yup, I'll test this sometime this week and post again. Good suggestion; I hadn't even thought of doing that, derp. :P

Peter Cordes (peter-cordes) wrote :

Found a time when I didn't mind rebooting, and tested
linux-image-3.18.0-031800rc7-generic version 3.18.0-031800rc7.201411302035

peter@tesla:~$ uname -a
Linux tesla 3.18.0-031800rc7-generic #201411302035 SMP Mon Dec 1 01:36:38 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

peter@tesla:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 eth0
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 eth0

peter@tesla:~$ sudo ethtool -t eth0 offline
The test result is FAIL
The test extra info:
Register test (offline) 0
Eeprom test (offline) 0
Interrupt test (offline) 0
Loopback test (offline) 0
Link test (on/offline) 1

peter@tesla:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 eth0

peter@tesla:~$ sudo route add default gw 10.0.0.1
peter@tesla:~$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.0.0.1        0.0.0.0         UG    0      0        0 eth0
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 eth0

dmesg (relevant section):

[ 2039.585309] e1000e 0000:00:19.0 eth0: offline testing starting
[ 2039.805954] e1000e: eth0 NIC Link is Down
[ 2040.225964] e1000e 0000:00:19.0 eth0: testing unshared interrupt
[ 2053.684783] e1000e 0000:00:19.0: irq 29 for MSI/MSI-X
[ 2053.788074] e1000e 0000:00:19.0: irq 29 for MSI/MSI-X
[ 2068.760891] e1000e: eth0 NIC Link is Up 10 Mbps Full Duplex, Flow Control: Rx/Tx
[ 2068.761001] e1000e 0000:00:19.0 eth0: Link Speed was downgraded by SmartSpeed
[ 2068.761004] e1000e 0000:00:19.0 eth0: 10/100 speed: disabling TSO

So it takes about 18 seconds for my NIC to fall back to a 10baseT link to my gigabit switch. I think my switch may be dying; I've seen some issues on other machines, too.
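The link outage can also be measured directly from the dmesg timestamps. A small sketch with a hypothetical helper name; it assumes the usual "[seconds.microseconds]" prefix and matches messages by plain substring:

```sh
# Hypothetical helper: print the seconds between the first dmesg line that
# contains $1 and the first later line that contains $2 (dmesg text on stdin).
dmesg_gap() {
    awk -v from="$1" -v to="$2" '
        {
            # Pull the timestamp out of the leading "[ 2039.585309]" prefix.
            ts = $0
            sub(/^\[ */, "", ts)
            sub(/\].*/, "", ts)
        }
        !found && index($0, from) { start = ts; found = 1; next }
        found && index($0, to) { printf "%.3f\n", ts - start; exit }
    '
}
```

On the excerpt above, dmesg | dmesg_gap 'NIC Link is Down' 'NIC Link is Up' prints 28.955, and dmesg_gap 'offline testing starting' 'NIC Link is Up' prints 29.176.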

If anyone is interested in fixing this, you could reproduce it by unplugging the Ethernet cable as you press Return on the ethtool command, and waiting 10 seconds before plugging it back in. Or just run ethtool -t offline with the cable unplugged, I guess.
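The unplugged-cable recipe can be wrapped in a script that snapshots the IPv4 routing table before and after the test and prints whatever went missing. A sketch, assuming iproute2 and a POSIX shell; both function names are hypothetical:

```sh
# Hypothetical helper: print lines of file $1 (routes before) that are absent
# from file $2 (routes after). -x matches whole lines, -F disables regexes.
routes_lost() {
    grep -vxFf "$2" "$1"
}

# Hypothetical driver for the reproduction above: run the offline test with
# the cable unplugged and report which IPv4 routes disappeared. Needs root.
reproduce_route_loss() {
    iface=${1:-eth0}
    before=$(mktemp)
    after=$(mktemp)
    ip -4 route show > "$before"
    echo "Unplug the cable from $iface, then press Enter to run the offline test."
    read -r dummy
    ethtool -t "$iface" offline
    echo "Plug the cable back in, wait for the link to come up, then press Enter."
    read -r dummy
    ip -4 route show > "$after"
    echo "Routes lost during the test:"
    routes_lost "$before" "$after"
    rm -f "$before" "$after"
}
```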

tags: added: e1000e kernel-bug-exists-upstream