NIC RTL8168 randomly disconnected from pci bus

Bug #2034916 reported by LittleBigBrain
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux-hwe-6.2 (Ubuntu) | Incomplete | Undecided | Unassigned | |

Bug Description

After updating to kernel 6.2.0-26 or newer, my device has network problems from time to time. Eventually I found that the NIC actually falls off the PCIe bus:

```
lspci -vnnk
03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev ff) (prog-if ff)
        DeviceName: GLAN
        !!! Unknown header type 7f
        Kernel modules: r8169
```

Only a reboot can reinitialize the NIC.
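The `rev ff` and `!!! Unknown header type 7f` lines are what lspci typically prints when every read of the device's configuration space comes back as all ones, i.e. nothing answers at that address any more (lspci masks off bit 7, the multi-function flag, so a header byte of `0xff` is shown as `7f`). A minimal Python sketch of the same check, assuming the standard PCI header offsets and a sysfs path built from the `03:00.0` address above in PCI domain `0000` (the helper name is hypothetical):

```python
from pathlib import Path

# Standard offsets in the PCI configuration-space header.
VENDOR_ID_OFFSET = 0x00
REVISION_ID_OFFSET = 0x08
HEADER_TYPE_OFFSET = 0x0E

# Assumption: PCI domain 0000, i.e. the 03:00.0 device from the lspci output.
CONFIG_PATH = Path("/sys/bus/pci/devices/0000:03:00.0/config")


def describe_config(config: bytes) -> dict:
    """Decode the fields lspci reported as 'rev ff' and 'header type 7f'."""
    vendor_id = int.from_bytes(config[VENDOR_ID_OFFSET:VENDOR_ID_OFFSET + 2], "little")
    revision = config[REVISION_ID_OFFSET]
    # Bit 7 only flags a multi-function device, so mask it off like lspci does.
    header_type = config[HEADER_TYPE_OFFSET] & 0x7F
    return {
        "vendor_id": vendor_id,
        "revision": revision,
        "header_type": header_type,
        # 0xFFFF is never a valid vendor ID: the read came back as all ones,
        # meaning nothing on the bus answered for this address.
        "dropped_off_bus": vendor_id == 0xFFFF,
    }


if __name__ == "__main__":
    if not CONFIG_PATH.exists():
        print(f"{CONFIG_PATH} not found (device removed, or a different address)")
    else:
        # Unprivileged reads return only the first 64 bytes, which is enough here.
        print(describe_config(CONFIG_PATH.read_bytes()))
```

On a healthy controller `vendor_id` should read back as `0x10ec` (Realtek); when the NIC has dropped off as described, all three fields are expected to be all ones (header type `0x7f` after masking).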

```
6.2.0-32-generic #32~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Fri Aug 18 10:40:13 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
```

The NIC is this one: https://linux-hardware.org/?id=pci:10ec-8168-1043-208f
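The linux-hardware.org ID appears to encode the vendor, device, subsystem vendor and subsystem device IDs: `10ec:8168` is the Realtek controller that lspci also shows, and `1043` is the ASUS PCI vendor ID. A small Python sketch, assuming that four-field `pci:VVVV-DDDD-SSSS-ssss` layout, that parses the ID and looks up matching devices through the standard sysfs `vendor`, `device`, `subsystem_vendor` and `subsystem_device` attributes (`parse_hw_id` and `find_devices` are hypothetical helpers):

```python
from pathlib import Path
from typing import NamedTuple

SYSFS_PCI = Path("/sys/bus/pci/devices")
ATTRS = ("vendor", "device", "subsystem_vendor", "subsystem_device")


class PciId(NamedTuple):
    vendor: int
    device: int
    subsystem_vendor: int
    subsystem_device: int


def parse_hw_id(hw_id: str) -> PciId:
    """Parse an ID like 'pci:10ec-8168-1043-208f' (layout assumed)."""
    bus, _, rest = hw_id.partition(":")
    fields = rest.split("-")
    if bus != "pci" or len(fields) != 4:
        raise ValueError(f"unexpected hardware id: {hw_id!r}")
    return PciId(*(int(field, 16) for field in fields))


def find_devices(wanted: PciId, root: Path = SYSFS_PCI) -> list:
    """Return PCI addresses whose four sysfs IDs all match `wanted`."""
    if not root.is_dir():
        return []
    matches = []
    for dev in sorted(root.iterdir()):
        try:
            # Each attribute holds text such as '0x10ec\n'; int() accepts the 0x prefix.
            found = PciId(*(int((dev / name).read_text(), 16) for name in ATTRS))
        except (OSError, ValueError):
            continue
        if found == wanted:
            matches.append(dev.name)
    return matches


if __name__ == "__main__":
    print(find_devices(parse_hw_id("pci:10ec-8168-1043-208f")))
```

On the affected laptop this would be expected to list the `03:00.0` address shown by lspci; an empty list would mean no device with all four IDs is currently present.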

dmesg:

```
Sep 08 10:51:39 : ------------[ cut here ]------------
Sep 08 10:51:39 : NETDEV WATCHDOG: eno2 (r8169): transmit queue 0 timed out
Sep 08 10:51:39 : WARNING: CPU: 10 PID: 0 at net/sched/sch_generic.c:525 dev_watchdog+0x21f/0x230
Sep 08 10:51:39 : Modules linked in: rfcomm xt_CHECKSUM nvme_fabrics nft_chain_nat xt_MASQUERADE nf_nat xfrm_user xfrm_algo cmac algif_hash algif_skcipher af_alg bnep ip6t_REJECT nf_reject_ipv6 xt_hl ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog nft_limit xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat sunrpc nf_tables nfnetlink binfmt_misc dm_crypt nls_iso8859_1 snd_hda_codec_realtek snd_hda_codec_generic snd_sof_pci_intel_cnl snd_sof_intel_hda_common soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus snd_soc_core snd_hda_codec_hdmi snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec intel_tcc_cooling x86_pkg_temp_thermal snd_hda_core intel_powerclamp snd_hwdep btusb btrtl ucsi_ccg snd_pcm cmdlinepart btbcm kvm_intel
Sep 08 10:51:39 : typec_ucsi btintel spi_nor hid_multitouch mei_pxp mei_hdcp kvm iwlmvm typec ee1004 mtd intel_rapl_msr snd_seq_midi btmtk i915 btrfs mac80211 snd_seq_midi_event irqbypass bluetooth snd_rawmidi rapl blake2b_generic drm_buddy libarc4 snd_seq ttm intel_cstate xor mfd_aaeon snd_seq_device ecdh_generic asus_nb_wmi input_leds iwlwifi joydev raid6_pq processor_thermal_device_pci_legacy ecc wmi_bmof mxm_wmi snd_timer drm_display_helper processor_thermal_device snd r8169 cec processor_thermal_rfim i2c_i801 rc_core cfg80211 spi_intel_pci i2c_nvidia_gpu processor_thermal_mbox soundcore spi_intel i2c_ccgx_ucsi realtek intel_lpss_pci mei_me i2c_algo_bit i2c_smbus processor_thermal_rapl intel_lpss mei intel_rapl_common idma64 intel_pch_thermal intel_soc_dts_iosf int3403_thermal int3400_thermal int340x_thermal_zone acpi_thermal_rel acpi_tad acpi_pad asus_wireless mac_hid nvidia_uvm(POE) sch_fq_codel pstore_blk ramoops pstore_zone reed_solomon efi_pstore xfs libcrc32c wacom hid_asus
Sep 08 10:51:39 : asus_wmi ledtrig_audio sparse_keymap platform_profile hid_generic uas usbhid usb_storage nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel drm_kms_helper nvme crypto_simd cryptd drm nvme_core serio_raw ahci i2c_hid_acpi syscopyarea nvme_common libahci i2c_hid sysfillrect xhci_pci sysimgblt xhci_pci_renesas hid video wmi pinctrl_cannonlake overlay iptable_filter ip6table_filter ip6_tables br_netfilter bridge stp llc arp_tables coretemp msr parport_pc ppdev lp parport ip_tables x_tables autofs4
Sep 08 10:51:39 : CPU: 10 PID: 0 Comm: swapper/10 Tainted: P OE 6.2.0-32-generic #32~22.04.1-Ubuntu
Sep 08 10:51:39 : Hardware name: ASUSTeK COMPUTER INC. ROG Strix G731GW_G731GW/G731GW, BIOS G731GW.309 01/29/2021
Sep 08 10:51:39 : RIP: 0010:dev_watchdog+0x21f/0x230
Sep 08 10:51:39 : Code: 00 e9 31 ff ff ff 4c 89 e7 c6 05 66 83 78 01 01 e8 56 00 f8 ff 44 89 f1 4c 89 e6 48 c7 c7 08 4f 84 91 48 89 c2 e8 61 df 2b ff <0f> 0b e9 22 ff ff ff 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
Sep 08 10:51:39 : RSP: 0018:fffface9c0378e70 EFLAGS: 00010246
Sep 08 10:51:39 : RAX: 0000000000000000 RBX: ffff9944cf17c4c8 RCX: 0000000000000000
Sep 08 10:51:39 : RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Sep 08 10:51:39 : RBP: fffface9c0378e98 R08: 0000000000000000 R09: 0000000000000000
Sep 08 10:51:39 : R10: 0000000000000000 R11: 0000000000000000 R12: ffff9944cf17c000
Sep 08 10:51:39 : R13: ffff9944cf17c41c R14: 0000000000000000 R15: 0000000000000000
Sep 08 10:51:39 : FS: 0000000000000000(0000) GS:ffff9953fdc80000(0000) knlGS:0000000000000000
Sep 08 10:51:39 : CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 08 10:51:39 : CR2: 000024e000389000 CR3: 0000000d32e10002 CR4: 00000000003706e0
Sep 08 10:51:39 : Call Trace:
Sep 08 10:51:39 : <IRQ>
Sep 08 10:51:39 : ? show_regs+0x72/0x90
Sep 08 10:51:39 : ? dev_watchdog+0x21f/0x230
Sep 08 10:51:39 : ? __warn+0x8d/0x160
Sep 08 10:51:39 : ? dev_watchdog+0x21f/0x230
Sep 08 10:51:39 : ? report_bug+0x1bb/0x1d0
Sep 08 10:51:39 : ? handle_bug+0x46/0x90
Sep 08 10:51:39 : ? exc_invalid_op+0x19/0x80
Sep 08 10:51:39 : ? asm_exc_invalid_op+0x1b/0x20
Sep 08 10:51:39 : ? dev_watchdog+0x21f/0x230
Sep 08 10:51:39 : ? __pfx_dev_watchdog+0x10/0x10
Sep 08 10:51:39 : call_timer_fn+0x29/0x160
Sep 08 10:51:39 : ? __pfx_dev_watchdog+0x10/0x10
Sep 08 10:51:39 : __run_timers.part.0+0x1fb/0x2b0
Sep 08 10:51:39 : ? ktime_get+0x43/0xc0
Sep 08 10:51:39 : ? __pfx_tick_sched_timer+0x10/0x10
Sep 08 10:51:39 : ? lapic_next_deadline+0x2c/0x50
Sep 08 10:51:39 : ? clockevents_program_event+0xb2/0x140
Sep 08 10:51:39 : run_timer_softirq+0x2a/0x60
Sep 08 10:51:39 : __do_softirq+0xda/0x330
Sep 08 10:51:39 : ? hrtimer_interrupt+0x12b/0x250
Sep 08 10:51:39 : __irq_exit_rcu+0xa2/0xd0
Sep 08 10:51:39 : irq_exit_rcu+0xe/0x20
Sep 08 10:51:39 : sysvec_apic_timer_interrupt+0x96/0xb0
Sep 08 10:51:39 : </IRQ>
Sep 08 10:51:39 : <TASK>
Sep 08 10:51:39 : asm_sysvec_apic_timer_interrupt+0x1b/0x20
Sep 08 10:51:39 : RIP: 0010:cpuidle_enter_state+0xde/0x6f0
Sep 08 10:51:39 : Code: 79 51 6f e8 a4 34 45 ff 8b 53 04 49 89 c7 0f 1f 44 00 00 31 ff e8 52 13 44 ff 80 7d d0 00 0f 85 e8 00 00 00 fb 0f 1f 44 00 00 <45> 85 f6 0f 88 0f 02 00 00 4d 63 ee 49 83 fd 09 0f 87 c4 04 00 00
Sep 08 10:51:39 : RSP: 0018:fffface9c015fe28 EFLAGS: 00000246
Sep 08 10:51:39 : RAX: 0000000000000000 RBX: ffffcce9bfc80100 RCX: 0000000000000000
Sep 08 10:51:39 : RDX: 000000000000000a RSI: 0000000000000000 RDI: 0000000000000000
Sep 08 10:51:39 : RBP: fffface9c015fe78 R08: 0000000000000000 R09: 0000000000000000
Sep 08 10:51:39 : R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff922c2840
Sep 08 10:51:39 : R13: 0000000000000004 R14: 0000000000000004 R15: 000002c008948c36
Sep 08 10:51:39 : ? cpuidle_enter_state+0xce/0x6f0
Sep 08 10:51:39 : cpuidle_enter+0x2e/0x50
Sep 08 10:51:39 : cpuidle_idle_call+0x14f/0x1e0
Sep 08 10:51:39 : do_idle+0x82/0x110
Sep 08 10:51:39 : cpu_startup_entry+0x20/0x30
Sep 08 10:51:39 : start_secondary+0x122/0x160
Sep 08 10:51:39 : secondary_startup_64_no_verify+0xe5/0xeb
Sep 08 10:51:39 : </TASK>
Sep 08 10:51:39 : ---[ end trace 0000000000000000 ]---
```
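Since the NIC drops out only intermittently, it can help to list every transmit-queue watchdog timeout in a saved kernel log and compare the times with when the device disappears from lspci. A rough Python sketch that extracts the interface, driver and queue from lines in the format of the trace above; the regex is written against this single message and is an assumption, not a stable kernel interface, and the script name in the usage comment is hypothetical:

```python
import re
import sys
from typing import NamedTuple

# Written against the message in this report, e.g.
# "NETDEV WATCHDOG: eno2 (r8169): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


class WatchdogEvent(NamedTuple):
    prefix: str  # journal prefix: timestamp, plus host/tag if not redacted
    iface: str
    driver: str
    queue: int


def find_watchdog_events(log_text: str) -> list:
    """Return one WatchdogEvent per matching line, in log order."""
    events = []
    for line in log_text.splitlines():
        match = WATCHDOG_RE.search(line)
        if match is None:
            continue
        # Drop the " : " (or "kernel: ") separator that precedes the message.
        prefix = line[: match.start()].rstrip(" :")
        events.append(WatchdogEvent(prefix, match["iface"], match["driver"], int(match["queue"])))
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 watchdog_events.py saved-kernel-log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for event in find_watchdog_events(fh.read()):
            print(event)
```

The log file could be, for example, the saved output of `journalctl -k`; on the trace above the sketch would report a single event for `eno2` / `r8169` / queue 0.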

Tags: jammy
description: updated
Paul White (paulw2u)
affects: ubuntu → linux-hwe-6.2 (Ubuntu)
tags: added: jammy
Juerg Haefliger (juergh) wrote :

Maybe related: bug 2031537.

LittleBigBrain (braingateway) wrote :

I am not sure, because the lspci output in bug 2031537 does not show `!!! Unknown header type 7f`.

Juerg Haefliger (juergh) wrote :

Please provide logs from when the problem occurs: `apport-collect 2034916`

Changed in linux-hwe-6.2 (Ubuntu):
status: New → Incomplete
LittleBigBrain (braingateway) wrote :

Unfortunately I don't have those logs right now, and apport-collect won't work while the network is dead. But I added some trace info from dmesg; I hope that helps a little.
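Because apport-collect needs a working network, one option is to save comparable information locally while the NIC is down and attach it once connectivity is back. A rough Python sketch of that idea, assuming `lspci` and `journalctl` are installed and the NIC is still at `03:00.0`; the command selection, script name and `collect` helper are hypothetical, and the result is plain text, not apport's own report format:

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical selection of commands; 03:00.0 is the NIC address from lspci above.
COMMANDS = [
    ["lspci", "-vnnk", "-s", "03:00.0"],
    ["journalctl", "-k", "-b", "0", "--no-pager"],  # kernel messages, current boot
]


def collect(commands, run=subprocess.run) -> str:
    """Run each command and return its output under a '$ command' header."""
    sections = []
    for cmd in commands:
        header = "$ " + " ".join(cmd)
        try:
            result = run(cmd, capture_output=True, text=True, check=False)
            body = result.stdout + result.stderr
        except FileNotFoundError:
            body = f"{cmd[0]}: command not found\n"
        sections.append(f"{header}\n{body}")
    return "\n".join(sections)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 nic_diag.py /path/to/nic-diag.txt
    # (root or journal-group membership may be needed for the full kernel log)
    Path(sys.argv[1]).write_text(collect(COMMANDS))
```

The `run` parameter only exists so the command runner can be swapped out; in normal use the default `subprocess.run` is used.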

description: updated
Juerg Haefliger (juergh) wrote :

Same NIC and same symptom:
> 03:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168]
> Sep 08 10:51:39 : NETDEV WATCHDOG: eno2 (r8169): transmit queue 0 timed out

Looks to be the same issue.
