108e:abcd niu 10g ethernet driver lock-up (Transmit timed out, resetting) and NETDEV WATCHDOG

Bug #1164497 reported by arbuntu
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Incomplete
Importance: Low
Assigned to: Unassigned

Bug Description

Using niu driver for this card:
"Oracle/SUN Multithreaded 10-Gigabit Ethernet Network Controller" (PCI id = 108e:abcd)
after a period (often less than 24 hours) the interface will hang, with errors every 5 seconds
"niu: xxx: eth2: Transmit timed out, resetting"

Sometimes also in syslog are messages
WARNING: at sch_generic:255 dev_watchdog
NETDEV WATCHDOG: eth2 (niu): transmit queue 10 timed out
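
For anyone wanting to count how often this happens, here is a minimal sketch that tallies the two messages quoted above. It assumes the log lines have exactly the wording shown in this report; the function name `summarize` and the sample lines are illustrative only.

```python
import re
from collections import Counter

# Patterns modelled on the two messages quoted in this report.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)
TX_TIMEOUT_RE = re.compile(
    r"niu[: ].*?(?P<iface>eth\d+): Transmit timed out, resetting"
)


def summarize(lines):
    """Count watchdog hits per (iface, driver, queue) and resets per iface."""
    watchdog = Counter()
    resets = Counter()
    for line in lines:
        m = WATCHDOG_RE.search(line)
        if m:
            key = (m.group("iface"), m.group("driver"), int(m.group("queue")))
            watchdog[key] += 1
            continue
        m = TX_TIMEOUT_RE.search(line)
        if m:
            resets[m.group("iface")] += 1
    return watchdog, resets


# Demonstration on the lines quoted in the description.
demo = [
    "niu: xxx: eth2: Transmit timed out, resetting",
    "NETDEV WATCHDOG: eth2 (niu): transmit queue 10 timed out",
]
demo_watchdog, demo_resets = summarize(demo)
print(dict(demo_watchdog), dict(demo_resets))
```

To run it against a real log, pass an open file (for example the system syslog) to `summarize`, since it accepts any iterable of lines.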

I've seen this in kernel 3.5.0-26-generic #42~precise1-Ubuntu SMP installed from 12.04 server.
I've also seen it in kernel 3.2.0-39-generic #62-Ubuntu SMP after I installed this older version as a test.
(I've not *yet* seen it in kernel 3.2.0-38-generic #61-Ubuntu SMP which is running on a different computer, same hardware but installed from 12.04 then upgraded rather than being installed from 12.04.2 directly. The problem computer is also less heavily loaded in general than the one which has worked fine so far.)
---
AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 Dec 8 04:50 seq
 crw-rw---T 1 root audio 116, 33 Dec 8 04:50 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.0.1-0ubuntu17.6
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 12.04
HibernationDevice: RESUME=UUID=5c24e548-cf18-4644-8f93-49c6e74b52a0
InstallationMedia: Ubuntu-Server 12.04.2 LTS "Precise Pangolin" - Release amd64 (20130214)
MachineType: SUN MICROSYSTEMS SUN FIRE X2250
MarkForUpload: True
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 LANGUAGE=en_GB:en
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_GB.UTF-8
 SHELL=/bin/bash
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.5.0-41-generic root=UUID=eeb10158-22d5-486b-bb2e-8745655a4a59 ro consoleblank=0 enable_mtrr_cleanup mtrr_gran_size=1M mtrr_chunk_size=256M
ProcVersionSignature: Ubuntu 3.5.0-41.64~precise1-generic 3.5.7.21
RelatedPackageVersions:
 linux-restricted-modules-3.5.0-41-generic N/A
 linux-backports-modules-3.5.0-41-generic N/A
 linux-firmware 1.79.7
RfKill: Error: [Errno 2] No such file or directory
Tags: precise
Uname: Linux 3.5.0-41-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: sysadmin vboxuser
dmi.bios.date: 03/01/2010
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: S86_3A19
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: SUN FIRE X2250
dmi.board.vendor: SUN MICROSYSTEMS
dmi.board.version: 50
dmi.chassis.type: 23
dmi.chassis.vendor: SUN MICROSYSTEMS
dmi.chassis.version: 50
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrS86_3A19:bd03/01/2010:svnSUNMICROSYSTEMS:pnSUNFIREX2250:pvr50:rvnSUNMICROSYSTEMS:rnSUNFIREX2250:rvr50:cvnSUNMICROSYSTEMS:ct23:cvr50:
dmi.product.name: SUN FIRE X2250
dmi.product.version: 50
dmi.sys.vendor: SUN MICROSYSTEMS

Brad Figg (brad-figg)
affects: linux-meta (Ubuntu) → linux (Ubuntu)
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1164497

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: quantal
Joseph Salisbury (jsalisbury) wrote : Re: niu 10g ethernet driver lock-up (Transmit timed out, resetting) and NETDEV WATCHDOG

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.9 kernel[0] (Not a kernel in the daily directory) and install both the linux-image and linux-image-extra .deb packages.

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.9-rc5-raring/

Changed in linux (Ubuntu):
importance: Undecided → Medium
arbuntu (arb)
tags: added: precise
arbuntu (arb) wrote :

Thank you for your suggestions.

I have tried kernel 3.9.0-030900rc5-generic, and the computer has now been running for 5 days! (Previously the problem would have surfaced after a day or two). I had some issues initially after booting: it would boot but when trying to login the session would hang whilst running .profile, so I think it's not completely working.

I've just had another instance of:

niu 0000:09:00.0: eth2: Transmit timed out, resetting

This time though eth2 recovered and the network is still functioning, so that's an improvement but not a complete solution.

I don't know whether that means the problem is fixed or not, so I don't know whether to tag "kernel-bug-exists-upstream" or "kernel-fixed-upstream" (or even "kernel-unable-to-test-upstream" given the initial problems).

Here's the syslog:

Apr 10 02:58:58 metope2 kernel: [388060.816009] ------------[ cut here ]------------
Apr 10 02:58:58 metope2 kernel: [388060.816031] WARNING: at /home/apw/COD/linux/net/sched/sch_generic.c:255 dev_watchdog+0x262/0x270()
Apr 10 02:58:58 metope2 kernel: [388060.816037] Hardware name: SUN FIRE X2250
Apr 10 02:58:58 metope2 kernel: [388060.816039] NETDEV WATCHDOG: eth2 (niu): transmit queue 7 timed out
Apr 10 02:58:58 metope2 kernel: [388060.816042] Modules linked in: nfsv3 autofs4 nfsd nfs_acl auth_rpcgss nfs fscache lockd sunrpc tpm_infineon xt_hl ip6t_rt nf_conntrack_ipv6 nf_defrag_ipv6 ast ipt_REJECT xt_LOG ttm xt_limit drm_kms_helper drm xt_tcpudp xt_addrtype coretemp kvm_intel kvm i2c_algo_bit nf_conntrack_ipv4 nf_defrag_ipv4 xt_state sysimgblt sysfillrect ip6table_filter gpio_ich ip6_tables syscopyarea microcode ioatdma nf_conntrack_netbios_ns nf_conntrack_broadcast nf_nat_ftp nf_nat i5400_edac tpm_tis lpc_ich nf_conntrack_ftp edac_core nf_conntrack psmouse dca shpchp i5k_amb joydev serio_raw iptable_filter mac_hid lp ip_tables parport x_tables hid_generic usbhid hid raid10 raid456 async_pq async_xor xor async_memcpy async_raid6_recov e1000e ptp pps_core raid6_pq async_tx niu raid1 raid0 multipath linear
Apr 10 02:58:58 metope2 kernel: [388060.816126] Pid: 0, comm: swapper/1 Tainted: G I 3.9.0-030900rc5-generic #201303311835
Apr 10 02:58:58 metope2 kernel: [388060.816133] Call Trace:
Apr 10 02:58:58 metope2 kernel: [388060.816136] <IRQ> [<ffffffff8105a53f>] warn_slowpath_common+0x7f/0xc0
Apr 10 02:58:58 metope2 kernel: [388060.816147] [<ffffffff8105a636>] warn_slowpath_fmt+0x46/0x50
Apr 10 02:58:58 metope2 kernel: [388060.816156] [<ffffffff81077574>] ? wake_up_worker+0x24/0x30
Apr 10 02:58:58 metope2 kernel: [388060.816163] [<ffffffff8160f3f2>] dev_watchdog+0x262/0x270
Apr 10 02:58:58 metope2 kernel: [388060.816169] [<ffffffff81077fb0>] ? __queue_work+0x2a0/0x2a0
Apr 10 02:58:58 metope2 kernel: [388060.816172] [<ffffffff8160f190>] ? pfifo_fast_dequeue+0xe0/0xe0
Apr 10 02:58:58 metope2 kernel: [388060.816180] [<ffffffff8106a3a6>] call_timer_fn+0x46/0x160
Apr 10 02:58:58 metope2 kernel: [388060.816185] [<ffffffff8106be77>] run_timer_softirq+0x267/0x2c0
Apr 10 02:58:58 metope2 kernel: [388060.816191] [<ffffffff8101bad9>] ? read_tsc+0x9/0x20
Apr 10 02:58:58 metope2 kernel: [388060.816198] [<ffffffff8160f190>] ? pfifo_f...


Changed in linux (Ubuntu):
status: Incomplete → Confirmed
arbuntu (arb)
tags: added: kernel-bug-exists-upstream
arbuntu (arb) wrote :

Sorry to report that the recent kernel also has the problem (3.9.0-030900rc5-generic).

penalvch (penalvch) wrote :

arbuntu, thank you for taking the time to report this bug and helping to make Ubuntu better. Please run the following command in a terminal; it will automatically gather debugging information:
apport-collect 1164497
When reporting bugs in the future please use apport by using 'ubuntu-bug' and the name of the package affected. You can learn more about this functionality at https://wiki.ubuntu.com/ReportingBugs.

tags: added: needs-kernel-logs needs-upstream-testing regression-potential
tags: added: kernel-bug-exists-upstream-v3.9-rc5
removed: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
summary: - niu 10g ethernet driver lock-up (Transmit timed out, resetting) and
- NETDEV WATCHDOG
+ 108e:abcd niu 10g ethernet driver lock-up (Transmit timed out,
+ resetting) and NETDEV WATCHDOG
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
arbuntu (arb) wrote :

[Unexpired -- nobody has bothered to investigate it yet the problem still exists]

Changed in linux (Ubuntu):
status: Expired → Confirmed
arbuntu (arb) wrote :

Turned on debugging using "ethtool -s eth0 msglvl $((0x7fff))" and saw the following messages when it hung:

[3408740.816032] niu: niu_interrupt() ldg[ffff8807141d16d0](18)
v0[8000000000] v1[0] v2[0]
[3408740.816036] niu 0000:09:00.0: eth2: niu_txchan_intr() cs[b860b860000c000]
[3408740.816038] niu 0000:09:00.0: eth2: niu_poll_core() v0[0000008000000000]
[3408740.816040] niu 0000:09:00.0: eth2: niu_tx_work() pkt_cnt[0] cons[119]
[3408740.816042] niu: niu_interrupt() ldg[ffff8807141d16d0](18)
v0[8000000000] v1[0] v2[0]
[3408740.820004] [sched_delayed] sched: RT throttling activated
[3408740.824021] niu 0000:09:00.0: eth2: Disable interrupts
[3408740.824044] niu 0000:09:00.0: eth2: Disable RX MAC
[3408740.824048] niu 0000:09:00.0: eth2: Disable IPP
[3408740.824054] niu 0000:09:00.0: eth2: Stop TX channels
[3408740.824641] niu 0000:09:00.0: eth2: Stop RX channels
[3408740.824652] niu 0000:09:00.0: eth2: Reset TX channels
[3408740.825212] niu 0000:09:00.0: eth2: Reset RX channels
[3408740.825999] niu 0000:09:00.0: eth2: Initialize TXC
[3408740.826002] niu 0000:09:00.0: eth2: Initialize TX channels
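
For reference, the mask 0x7fff passed to ethtool above turns on the low 15 message-level bits. The sketch below decodes such a mask into the flag names ethtool accepts; the bit values follow the NETIF_MSG_* definitions in the kernel's include/linux/netdevice.h, and the helper name `decode_msglvl` is illustrative.

```python
# NETIF_MSG_* bits, named as ethtool's msglvl option names them.
NETIF_MSG_BITS = [
    ("drv", 0x0001),
    ("probe", 0x0002),
    ("link", 0x0004),
    ("timer", 0x0008),
    ("ifdown", 0x0010),
    ("ifup", 0x0020),
    ("rx_err", 0x0040),
    ("tx_err", 0x0080),
    ("tx_queued", 0x0100),
    ("intr", 0x0200),
    ("tx_done", 0x0400),
    ("rx_status", 0x0800),
    ("pktdata", 0x1000),
    ("hw", 0x2000),
    ("wol", 0x4000),
]


def decode_msglvl(mask):
    """Return the names of the message-level flags set in mask."""
    return [name for name, bit in NETIF_MSG_BITS if mask & bit]


# The mask used in the comment above enables every flag in the table.
print(decode_msglvl(0x7FFF))
```

A narrower mask such as 0x0680 (tx_err, intr and tx_done) might be enough to capture the transmit path without the volume of the full setting, though which flags the niu driver actually honours has not been checked here.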

penalvch (penalvch) wrote :
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
arbuntu (arb) wrote : AcpiTables.txt

apport information

tags: added: apport-collected
description: updated
arbuntu (arb) wrote : BootDmesg.txt

apport information

arbuntu (arb) wrote : CurrentDmesg.txt

apport information

arbuntu (arb) wrote : IwConfig.txt

apport information

arbuntu (arb) wrote : Lspci.txt

apport information

arbuntu (arb) wrote : Lsusb.txt

apport information

arbuntu (arb) wrote : ProcCpuinfo.txt

apport information

arbuntu (arb) wrote : ProcInterrupts.txt

apport information

arbuntu (arb) wrote : ProcModules.txt

apport information

arbuntu (arb) wrote : UdevDb.txt

apport information

arbuntu (arb) wrote : UdevLog.txt

apport information

arbuntu (arb) wrote : WifiSyslog.txt

apport information

arbuntu (arb) wrote :

Hopefully that is enough information now for you to debug this issue.
FYI see also https://bugzilla.kernel.org/show_bug.cgi?id=56631
Thank you

Changed in linux (Ubuntu):
status: Expired → Confirmed
penalvch (penalvch) wrote :

arbuntu, as per your https://launchpadlibrarian.net/161783564/BootDmesg.txt :
[ 9.468539] WARNING: at /build/buildd/linux-lts-quantal-3.5.0/drivers/iommu/intel-iommu.c:3294 quirk_ioat_snb_local_iommu+0xab/0xc0()
[ 9.468544] Hardware name: SUN FIRE X2250
[ 9.468545] BIOS assigned incorrect VT-d unit for Intel(R) QuickData Technology device

Also, as per http://www.oracle.com/technetwork/systems/patches/firmware/release-history-jsp-138416.html#X2250 an update is available for your BIOS (3A20). If you update to it during your maintenance window, following https://help.ubuntu.com/community/BiosUpdate , does it change anything? If it doesn't, could you please describe what happened and provide the output of the following terminal command:
sudo dmidecode -s bios-version && sudo dmidecode -s bios-release-date
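
The same check can be scripted. The sketch below is only an illustration: it assumes the release identifier appears as the suffix of the version string, as in S86_3A19 from this report, and that an updated BIOS would therefore report a string ending in 3A20. The function names are hypothetical, and `read_bios_version` needs root and the dmidecode tool.

```python
import subprocess


def read_bios_version():
    """Return the output of 'dmidecode -s bios-version' (requires root)."""
    result = subprocess.run(
        ["dmidecode", "-s", "bios-version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def has_bios_release(version, release="3A20"):
    """True if the version string ends with the given release identifier.

    Assumes the release is the suffix of the string (e.g. 'S86_3A19').
    """
    return version.strip().upper().endswith(release.upper())


# The version recorded in this report's apport data predates 3A20.
print(has_bios_release("S86_3A19"))
```

On the affected machine, `has_bios_release(read_bios_version())` would report whether the update has been applied, under the suffix assumption stated above.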

For more on BIOS updates and linux, please see https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette .

Thank you for your understanding.

tags: added: bios-outdated-3a20
Changed in linux (Ubuntu):
importance: Medium → Low
status: Confirmed → Incomplete