eth0: Reset adapter

Bug #1640856 reported by Ingo Voland
This bug affects 2 people
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Confirmed  Medium      Unassigned

Bug Description

Using Ubuntu 14.04 LTS.

We get sporadic network adapter resets, resulting in a loss of service. Kernel log:

Oct 28 10:57:08 puno01 kernel: [2306272.568264] ffff881f35400000 ffff881f37e30940 0000000000000008 0000000000000024
Oct 28 10:57:08 puno01 kernel: [2306272.568274] Call Trace:
Oct 28 10:57:08 puno01 kernel: [2306272.568276] <IRQ> [<ffffffff81729f76>] dump_stack+0x64/0x82
Oct 28 10:57:08 puno01 kernel: [2306272.568299] [<ffffffff8106987d>] warn_slowpath_common+0x7d/0xa0
Oct 28 10:57:08 puno01 kernel: [2306272.568305] [<ffffffff810698ec>] warn_slowpath_fmt+0x4c/0x50
Oct 28 10:57:08 puno01 kernel: [2306272.568312] [<ffffffff8164d786>] dev_watchdog+0x276/0x280
Oct 28 10:57:08 puno01 kernel: [2306272.568318] [<ffffffff8164d510>] ? dev_graft_qdisc+0x80/0x80
Oct 28 10:57:08 puno01 kernel: [2306272.568327] [<ffffffff810767d6>] call_timer_fn+0x36/0x150
Oct 28 10:57:08 puno01 kernel: [2306272.568332] [<ffffffff8164d510>] ? dev_graft_qdisc+0x80/0x80
Oct 28 10:57:08 puno01 kernel: [2306272.568338] [<ffffffff810777cf>] run_timer_softirq+0x21f/0x310
Oct 28 10:57:08 puno01 kernel: [2306272.568347] [<ffffffff8106ee9c>] __do_softirq+0xfc/0x310
Oct 28 10:57:08 puno01 kernel: [2306272.568354] [<ffffffff8106f425>] irq_exit+0x105/0x110
Oct 28 10:57:08 puno01 kernel: [2306272.568365] [<ffffffff8143a755>] xen_evtchn_do_upcall+0x35/0x50
Oct 28 10:57:08 puno01 kernel: [2306272.568378] [<ffffffff8173c53e>] xen_do_hypervisor_callback+0x1e/0x30
Oct 28 10:57:08 puno01 kernel: [2306272.568380] <EOI> [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Oct 28 10:57:08 puno01 kernel: [2306272.568393] [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
Oct 28 10:57:08 puno01 kernel: [2306272.568402] [<ffffffff8100ad70>] ? xen_safe_halt+0x10/0x20
Oct 28 10:57:08 puno01 kernel: [2306272.568414] [<ffffffff8101daff>] ? default_idle+0x1f/0x100
Oct 28 10:57:08 puno01 kernel: [2306272.568420] [<ffffffff8101e416>] ? arch_cpu_idle+0x26/0x30
Oct 28 10:57:08 puno01 kernel: [2306272.568429] [<ffffffff810c1be1>] ? cpu_startup_entry+0xc1/0x2b0
Oct 28 10:57:08 puno01 kernel: [2306272.568438] [<ffffffff81011178>] ? cpu_bringup_and_idle+0x18/0x20
Oct 28 10:57:08 puno01 kernel: [2306272.568441] ---[ end trace 52192aef6937b28d ]---
Oct 28 10:57:08 puno01 kernel: [2306272.568633] igb 0000:04:00.0 eth0: Reset adapter
Oct 28 10:57:08 puno01 kernel: [2306272.586016] xenbr0: port 1(eth0) entered disabled state
Oct 28 10:57:08 puno01 kernel: [2306272.600491] igb 0000:04:00.1 eth1: Reset adapter
Oct 28 10:57:12 puno01 kernel: [2306276.123402] igb: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX
:

We moved from kernel 3.13.0-88-generic to 4.4.0-45-generic, but the issue keeps reappearing sporadically (approximately 1-2 times a month). So the adapter reset happened on

cat /proc/version_signature
Ubuntu 3.13.0-88.135-generic 3.13.11-ckt39

and on

 cat /proc/version_signature
Ubuntu 4.4.0-45.66~14.04.1-generic 4.4.21

lspci-vnvn.log is attached.
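
Since the resets are sporadic, it may help to count them per interface across a longer log. The following is a minimal sketch, not a verified tool: the regular expression is derived only from the "Reset adapter" lines quoted above, and the function names `find_adapter_resets` and `count_resets_per_interface` are hypothetical helpers introduced for illustration.

```python
import re
from collections import Counter

# Pattern derived from the log lines in this report, e.g.
#   "... kernel: [2306272.568633] igb 0000:04:00.0 eth0: Reset adapter"
# Other drivers or kernel versions may format this message differently.
RESET_RE = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] "
    r"(?P<driver>\w+) (?P<pci>[0-9a-f:.]+) (?P<iface>\w+): Reset adapter"
)


def find_adapter_resets(lines):
    """Return (uptime_seconds, driver, pci_address, iface) for each reset line."""
    events = []
    for line in lines:
        m = RESET_RE.search(line)
        if m:
            events.append(
                (float(m["uptime"]), m["driver"], m["pci"], m["iface"])
            )
    return events


def count_resets_per_interface(lines):
    """Count 'Reset adapter' events per network interface."""
    return Counter(iface for _, _, _, iface in find_adapter_resets(lines))
```

To use it, pass any iterable of log lines, for example an open file handle on the kernel log (the path depends on the syslog configuration and is an assumption for any given system), and print the resulting counter.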
---
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Nov 5 04:35 seq
 crw-rw---- 1 root audio 116, 33 Nov 5 04:35 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.1-0ubuntu3.18
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 14.04
IwConfig: Error: [Errno 2] No such file or directory
Lsusb: Error: [Errno 2] No such file or directory
MachineType: Supermicro X10DRi
Package: linux (not installed)
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 LANG=de_DE.utf8
 LC_MESSAGES=POSIX
 SHELL=/bin/bash
ProcFB: 0 astdrmfb
ProcKernelCmdLine: placeholder root=UUID=fc9c0437-9d90-40fd-b0ee-73b4afb00a13 ro nomdmonddf nomdmonisw
ProcVersionSignature: Ubuntu 4.4.0-45.66~14.04.1-generic 4.4.21
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-45-generic N/A
 linux-backports-modules-4.4.0-45-generic N/A
 linux-firmware 1.127.22
RfKill: Error: [Errno 2] No such file or directory
Tags: trusty
Uname: Linux 4.4.0-45-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 04/14/2015
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 1.1
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: X10DRi
dmi.board.vendor: Supermicro
dmi.board.version: 1.02B
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 23
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr1.1:bd04/14/2015:svnSupermicro:pnX10DRi:pvr123456789:rvnSupermicro:rnX10DRi:rvr1.02B:cvnToBeFilledByO.E.M.:ct23:cvrToBeFilledByO.E.M.:
dmi.product.name: X10DRi
dmi.product.version: 123456789
dmi.sys.vendor: Supermicro

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1640856

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Ingo Voland (ivoland) wrote : BootDmesg.txt

apport information

tags: added: apport-collected
description: updated
Ingo Voland (ivoland) wrote : CRDA.txt

apport information

Ingo Voland (ivoland) wrote : CurrentDmesg.txt

apport information

Ingo Voland (ivoland) wrote : Lspci.txt

apport information

Ingo Voland (ivoland) wrote : ProcCpuinfo.txt

apport information

Ingo Voland (ivoland) wrote : ProcInterrupts.txt

apport information

Ingo Voland (ivoland) wrote : ProcModules.txt

apport information

Ingo Voland (ivoland) wrote : UdevDb.txt

apport information

Ingo Voland (ivoland) wrote : UdevLog.txt

apport information

Ingo Voland (ivoland) wrote : WifiSyslog.txt

apport information

Ingo Voland (ivoland)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.9 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.9-rc5

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Ingo Voland (ivoland) wrote :

We activated the suggested kernel:

 cat /proc/version
Linux version 4.9.0-040900rc5-generic (kernel@tangerine) (gcc version 6.2.0 20161005 (Ubuntu 6.2.0-5ubuntu12) ) #201611131431 SMP Sun Nov 13 19:33:15 UTC 2016

However, the same issue reappeared:

Dec 11 21:58:00 puno01 kernel: [2306342.662457] ------------[ cut here ]------------
Dec 11 21:58:00 puno01 kernel: [2306342.662470] WARNING: CPU: 0 PID: 0 at /home/kernel/COD/linux/net/sched/sch_generic.c:316 dev_watchdog+0x22c/0x230
Dec 11 21:58:00 puno01 kernel: [2306342.662472] NETDEV WATCHDOG: eth0 (igb): transmit queue 7 timed out
Dec 11 21:58:00 puno01 kernel: [2306342.662474] Modules linked in: btrfs ufs qnx4 hfsplus hfs minix ntfs msdos jfs xfs dm_snapshot dm_bufio xen_netback xen_blkback xen_gntdev xen_evtchn xenfs xen_privcmd bridge stp llc intel_rapl sb_edac edac_core x86_pkg_temp_thermal ast ttm drm_kms_helper coretemp drm ipmi_ssif crct10dif_pclmul crc32_pclmul ghash_clmulni_intel fb_sys_fops syscopyarea aesni_intel sysfillrect sysimgblt aes_x86_64 lrw glue_helper ablk_helper cryptd intel_rapl_perf joydev input_leds ipmi_si ipmi_msghandler shpchp wmi lpc_ich ioatdma mac_hid acpi_power_meter raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear raid10 igb i2c_algo_bit dca ahci ptp libahci pps_core hid_generic mpt3sas raid_class fjes mptsas mptscsih mptbase scsi_transport_sas usbhid hid
Dec 11 21:58:00 puno01 kernel: [2306342.662555] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 4.9.0-040900rc5-generic #201611131431
Dec 11 21:58:00 puno01 kernel: [2306342.662557] Hardware name: Supermicro X10DRi/X10DRi, BIOS 1.1 04/14/2015
Dec 11 21:58:00 puno01 kernel: [2306342.662561] ffff88200e403da0 ffffffff81416e32 ffff88200e403df0 0000000000000000
Dec 11 21:58:00 puno01 kernel: [2306342.662565] ffff88200e403de0 ffffffff8108364b 0000013c00000000 0000000000000007
Dec 11 21:58:00 puno01 kernel: [2306342.662569] 0000000000000008 0000000000000000 ffff881ff433c000 ffff881ff6413940
Dec 11 21:58:00 puno01 kernel: [2306342.662574] Call Trace:
Dec 11 21:58:00 puno01 kernel: [2306342.662577] <IRQ>
Dec 11 21:58:00 puno01 kernel: [2306342.662586] [<ffffffff81416e32>] dump_stack+0x63/0x81
Dec 11 21:58:00 puno01 kernel: [2306342.662594] [<ffffffff8108364b>] __warn+0xcb/0xf0
Dec 11 21:58:00 puno01 kernel: [2306342.662598] [<ffffffff810836cf>] warn_slowpath_fmt+0x5f/0x80
Dec 11 21:58:00 puno01 kernel: [2306342.662603] [<ffffffff810f8c92>] ? hrtimer_interrupt+0xc2/0x180
Dec 11 21:58:00 puno01 kernel: [2306342.662606] [<ffffffff81799eac>] dev_watchdog+0x22c/0x230
Dec 11 21:58:00 puno01 kernel: [2306342.662609] [<ffffffff81799c80>] ? qdisc_rcu_free+0x40/0x40
Dec 11 21:58:00 puno01 kernel: [2306342.662616] [<ffffffff810f5f25>] call_timer_fn+0x35/0x120
Dec 11 21:58:00 puno01 kernel: [2306342.662619] [<ffffffff810f64b5>] run_timer_softirq+0x215/0x4b0
Dec 11 21:58:00 puno01 kernel: [2306342.662624] [<ffffffff810e5eca>] ? handle_percpu_irq+0x3a/0x50
Dec 11 21:58:00 puno01 kernel: [2306342.662631] [<ffffffff8188eb94>] __do_softirq+0x104/0x28c
Dec 11 21:58:00 puno01 kernel: [2306342.662634] [<ffffffff81089ab6>] irq_exit+0xb6/0xc0
Dec...
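
The "NETDEV WATCHDOG" line in the trace above names the interface, the driver, and the transmit queue that timed out, which is useful to track across incidents. As a rough sketch, assuming the message format matches the line quoted in this comment, it could be extracted like this (`parse_watchdog` is a hypothetical helper name, not part of any existing tool):

```python
import re

# Pattern taken from the line in this report:
#   "NETDEV WATCHDOG: eth0 (igb): transmit queue 7 timed out"
# Other kernel versions may append further details after "timed out";
# search() without an end anchor tolerates that.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def parse_watchdog(line):
    """Return iface, driver and queue number from a watchdog line, or None."""
    m = WATCHDOG_RE.search(line)
    if m is None:
        return None
    return {
        "iface": m["iface"],
        "driver": m["driver"],
        "queue": int(m["queue"]),
    }
```

Collecting the queue numbers over several incidents would show whether the timeout always hits the same transmit queue or varies between them.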


Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-bug-exists-upstream
penalvch (penalvch) wrote :

Ingo Voland, to clarify:
1) Did this problem not occur in a kernel release prior to 3.13.0-88-generic?
2) To keep this relevant to upstream, one would want to periodically check for, and test the latest mainline kernel (now 4.9) as it is released.

Could you please advise?

tags: added: bios-outdated-2.1 kernel-bug-exists-upstream-4.9-rc5 needs-upstream-testing
removed: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Markus Heinrichs (mheinrichs) wrote :

I work together with Ingo. In the team, we get the feeling that we keep testing upstream kernels on our production systems but still have no idea how to really tackle the issue. Unfortunately, we cannot risk our customer data with more new kernel versions. We will let this ticket rest, but please note that the issue is still open. I am afraid it will just surface elsewhere, leading to data loss.

penalvch (penalvch)
Changed in linux (Ubuntu):
status: Expired → Confirmed