Timestamp when failure occurred:
It occurred at about2023-11-28 06:38.
Is the issue intermittent or is it 100% Reproducible? Frequency of Occurrence?
It occurred once.
Steps to Reproduce OR Events leading up to failure:
Impact of Failure:
It caused pod not able to startup.
Did the system automatically recover?
No.
Log/File location:
Triage:
Was the issue reproduced internally?
No.
Time-line based on log analysis:
The pod 'csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4' was created at about 2023-11-28 06:38, but if failed to set up VF interface with "device or resource busy" error.
2023-11-28T06:38:47Z csblrlabdevcc2ec17-5017cuup1-965cc6598 ReplicaSet Created pod: csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4 SuccessfulCreate Normal
2023-11-28T06:38:47Z csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4 Pod Successfully assigned csblrlabdevcc2ec17-5017/csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4 to controller-0 Scheduled Normal
2023-11-28T06:38:48Z csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4 Pod Add netf1u [2001:4000:aa:93::2e/64] from default/sriov-f1u-dpdk-dynamic AddedInterface Normal
2023-11-28T06:38:48Z csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4 Pod Add eth0 [172.16.192.110/32] from chain AddedInterface Normal
2023-11-28T06:38:48Z csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4 Pod Add netcore [] from default/sriov-core-dpdk AddedInterface Normal
2023-11-28T06:38:48Z csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4 Pod Add netcore1 [] from default/sriov-core1-dpdk AddedInterface Normal
2023-11-28T06:38:52Z csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4 Pod Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8e6188a1d5d154c68d581e163b91ff8acc7d22d29cf7cdf429ae334a86c06bb7": plugin type="multus" name="multus-cni-network" failed (add): [csblrlabdevcc2ec17-5017/csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4/:sriov-e1c-dynamic]: error adding container to network "sriov-e1c-dynamic": failed to set up pod interface "nete1c" from the device "ens1f0": error setting up interface in container namespace: "error bringing interface up in container ns: \"device or resource busy\"" FailedCreatePodSandBox Warning
2023-11-28T06:38:53Z csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4 Pod Add eth0 [172.16.192.109/32] from chain AddedInterface Normal
2023-11-28T06:38:53Z csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4 Pod Add netcore [] from default/sriov-core-dpdk AddedInterface Normal
2023-11-28T06:38:53Z csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4 Pod Add netcore1 [] from default/sriov-core1-dpdk AddedInterface Normal
2023-11-28T06:38:53Z csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4 Pod Add netf1u [2001:4000:aa:93::2e/64] from default/sriov-f1u-dpdk-dynamic AddedInterface Normal
2023-11-28T06:38:57Z csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4 Pod Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "19e763ec4a673124f1365a0d6299e2fe84d22532ea430be581a4bb01eccc1306": plugin type="multus" name="multus-cni-network" failed (add): [csblrlabdevcc2ec17-5017/csblrlabdevcc2ec17-5017cuup1-965cc6598-b5rg4/:sriov-e1c-dynamic]: error adding container to network "sriov-e1c-dynamic": failed to set up pod interface "nete1c" from the device "ens1f0": error setting up interface in container namespace: "error bringing interface up in container ns: \"device or resource busy\"" FailedCreatePodSandBox Warning
......
The physical interface is ens1f0 which is E810 NIC. And I see kernel printed 'iavf 0000:8a:02.0: Never saw reset' message repeatedly at that time.
--------------------------------------------------------------------
Tue Nov 28 06:44:15 UTC 2023 : : ethtool -i ens1f0
--------------------------------------------------------------------
driver: ice
version: 1.9.11
firmware-version: 3.10 0x8000ad69 1.3106.0
expansion-rom-version:
bus-info: 0000:8a:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
8a:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller E810-C for QSFP [8086:1592] (rev 02)
2023-11-28T06:38:48.078 controller-0 kernel: info [901389.921256] ice 0000:8a:00.0: Setting VLAN 93, QoS 0, TPID 0x8100 on VF 28
2023-11-28T06:38:48.156 controller-0 kernel: info [901390.001943] iavf 0000:8a:02.0 temp_49: renamed from nete1c
2023-11-28T06:38:48.218 controller-0 kernel: info [901390.063910] iavf 0000:8a:02.0 nete1c: renamed from temp_49
2023-11-28T06:38:52.083 controller-0 kernel: info [901393.929164] iavf 0000:8a:02.0: Never saw reset
2023-11-28T06:38:52.125 controller-0 kernel: info [901393.971110] ice 0000:8a:00.0: Clearing port VLAN on VF 28
2023-11-28T06:38:53.120 controller-0 kernel: info [901394.965999] IPv6: ADDRCONF(NETDEV_CHANGE): cali44564bbb5a9: link becomes ready
2023-11-28T06:38:53.205 controller-0 kernel: info [901395.050824] ixgbe 0000:01:00.1: Setting VLAN 96, QOS 0x0 on VF 25
2023-11-28T06:38:53.209 controller-0 kernel: info [901395.054924] ixgbe 0000:01:00.1: Setting VLAN 96, QOS 0x0 on VF 24
2023-11-28T06:38:53.212 controller-0 kernel: info [901395.058439] ice 0000:8a:00.0: Setting VLAN 93, QoS 0, TPID 0x8100 on VF 28
2023-11-28T06:38:53.291 controller-0 kernel: info [901395.136650] iavf 0000:8a:02.0 temp_49: renamed from nete1c
2023-11-28T06:38:53.336 controller-0 kernel: info [901395.181662] iavf 0000:8a:02.0 nete1c: renamed from temp_49
2023-11-28T06:38:56.655 controller-0 kernel: info [901398.501216] iavf 0000:8a:02.0: Never saw reset
2023-11-28T06:38:56.762 controller-0 kernel: info [901398.608537] ice 0000:8a:00.0: Clearing port VLAN on VF 28
2023-11-28T06:38:58.213 controller-0 kernel: info [901400.059428] IPv6: ADDRCONF(NETDEV_CHANGE): cali44564bbb5a9: link becomes ready
2023-11-28T06:38:58.213 controller-0 kernel: info [901400.059525] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
2023-11-28T06:38:58.335 controller-0 kernel: info [901400.180888] ixgbe 0000:01:00.1: Setting VLAN 96, QOS 0x0 on VF 24
2023-11-28T06:38:58.341 controller-0 kernel: info [901400.186931] ixgbe 0000:01:00.1: Setting VLAN 96, QOS 0x0 on VF 25
2023-11-28T06:38:58.344 controller-0 kernel: info [901400.190446] ice 0000:8a:00.0: Setting VLAN 93, QoS 0, TPID 0x8100 on VF 28
2023-11-28T06:38:58.419 controller-0 kernel: info [901400.264792] iavf 0000:8a:02.0 temp_49: renamed from nete1c
2023-11-28T06:38:58.459 controller-0 kernel: info [901400.304810] iavf 0000:8a:02.0 nete1c: renamed from temp_49
2023-11-28T06:39:01.227 controller-0 kernel: info [901403.073478] iavf 0000:8a:02.0: Never saw reset
The 'iavf 0000:8a:02.0: Never saw reset' message occured at about 2023-11-27 15:36 at the first time, then the "device or resource busy" error occurred after a while. It looks like a NIC driver issue same as CGTS-54060, please Eng check it.
2023-11-27T15:36:37.494 controller-0 kernel: info [847259.529235] iavf 0000:8a:02.0: Never saw reset
2023-11-27T15:37:04.483 controller-0 containerd[2670404]: info time="2023-11-27T15:37:04.483227587Z" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:csblrlabdevcc2ec17-5017cuup1-6f6bb67578-hzhpg,Uid:ca3b75e9-157d-4ea9-a10d-3dbc1dcd4a56,Namespace:csblrlabdevcc2ec17-5017,Attempt:0,} failed, error" error="failed to setup network for sandbox \"416a904855de8fca43b649415881f6ce94f95eb5b290fd2dd2daf99536b058da\": plugin type=\"multus\" name=\"multus-cni-network\" failed (add): [csblrlabdevcc2ec17-5017/csblrlabdevcc2ec17-5017cuup1-6f6bb67578-hzhpg/:sriov-e1c-dynamic]: error adding container to network \"sriov-e1c-dynamic\": failed to set up pod interface \"nete1c\" from the device \"ens1f0\": error setting up interface in container namespace: \"error bringing interface up in container ns: \\\"device or resource busy\\\"\""
Key failure logs:
Summary of triage:
Fix proposed to branch: master /review. opendev. org/c/starlingx /kernel/ +/914047
Review: https:/