Activity log for bug #1944586

Date Who What changed Old value New value Message
2021-09-22 15:42:52 Eric Desrochers bug added bug
2021-09-22 15:43:21 Eric Desrochers tags seg sts
2021-09-22 15:43:44 Eric Desrochers description It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. " It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. 
" 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 [6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] 
fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]---
2021-09-22 15:44:18 Eric Desrochers description It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. " 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 
[6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper 
i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. " 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 
RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 [6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? 
kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Other informations] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html
2021-09-22 15:46:25 Eric Desrochers summary kernel bug found when disconnecting one fiber channel interface on Cisco Chassis with fnic DRV_VERSION below 1.6.0.47 kernel bug found when disconnecting one fiber channel interface on Cisco Chassis with fnic DRV_VERSION " 1.6.0.47"
2021-09-22 15:49:19 Eric Desrochers description It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. " 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 
[6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper 
i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Other informations] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html [Impact] It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. 
" 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 [6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] 
fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Test Plan] [Where problems could occur] [Other informations] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html
2021-09-22 16:00:08 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2021-09-22 16:00:10 Ubuntu Kernel Bot tags seg sts focal seg sts
2021-09-22 18:38:13 Eric Desrochers summary kernel bug found when disconnecting one fiber channel interface on Cisco Chassis with fnic DRV_VERSION " 1.6.0.47" kernel bug found when disconnecting one fiber channel interface on Cisco Chassis with fnic DRV_VERSION "1.6.0.47"
2021-09-27 13:26:19 Eric Desrochers description [Impact] It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. " 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 
[6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper 
i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Test Plan] [Where problems could occur] [Other informations] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html [Impact] It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. 
" 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 [6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] 
fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Test Plan] <sbparke> ??? [Where problems could occur] Cisco "fNIC" driver enables FCoE support for the Cisco UCS Virtual Interface Card family of products. 
If a problem arises, it would be limited to these VICs, which are specially designed for Cisco UCS blade and rack servers, and in the worst case possibly to commands that terminate I/O (again, only on the Cisco UCS hardware family). [Other information] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html
2021-09-27 13:30:14 Eric Desrochers description [Impact] It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. " 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 
[6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper 
i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Test Plan] <sbparke> ??? [Where problems could occur] Cisco "fNIC" driver enables FCoE support for the Cisco UCS Virtual Interface Card family of products. If a problem arises, it would be limited to these VICs, which are specially designed for Cisco UCS blade and rack servers, and in the worst case possibly to commands that terminate I/O (again, only on the Cisco UCS hardware family). [Other information] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html [Impact] It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. 
" 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 [6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] 
fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Test Plan] <sbparke> ??? [Where problems could occur] Cisco "fNIC" driver enables FCoE support for the Cisco UCS Virtual Interface Card family of products. 
If a problem arises, it would be limited to these VICs, which are specially designed for Cisco UCS blade and rack servers, and in the worst case possibly to commands that terminate I/O (again, only on the Cisco UCS hardware family). Note that a Field Engineer and I tested the patch on Cisco UCS hardware, and with the patch applied the problem did not reproduce. [Other information] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html
2021-09-27 13:31:45 Eric Desrochers linux (Ubuntu): status Incomplete In Progress
2021-09-27 13:31:51 Eric Desrochers nominated for series Ubuntu Focal
2021-09-27 13:31:51 Eric Desrochers bug task added linux (Ubuntu Focal)
2021-09-27 13:31:57 Eric Desrochers linux (Ubuntu): status In Progress Fix Released
2021-09-27 13:32:02 Eric Desrochers linux (Ubuntu Focal): status New In Progress
2021-09-27 13:32:06 Eric Desrochers linux (Ubuntu Focal): assignee Eric Desrochers (slashd)
2021-09-27 13:32:12 Eric Desrochers linux (Ubuntu Focal): importance Undecided Critical
2021-09-27 13:32:15 Eric Desrochers linux (Ubuntu Focal): importance Critical High
2021-09-27 13:33:10 Eric Desrochers description [Impact] It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. " 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 
[6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper 
i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Test Plan] <sbparke> ??? [Where problems could occur] Cisco "fNIC" driver enables FCoE support for the Cisco UCS Virtual Interface Card family of products. If a problem arises, it would be limited to these VICs, which are specially designed for Cisco UCS blade and rack servers, and in the worst case possibly to commands that terminate I/O (again, only on the Cisco UCS hardware family). Note that a Field Engineer and I tested the patch on Cisco UCS hardware, and with the patch applied the problem did not reproduce. [Other information] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html [Impact] It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. 
" 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 [6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] 
fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Test Plan] <sbparke> ??? [Where problems could occur] Cisco "fNIC" driver enables FCoE support for the Cisco UCS Virtual Interface Card family of products. 
If a problem arises, it would be limited to these VICs, which are specially designed for Cisco UCS blade and rack servers, and in the worst case possibly to commands that terminate I/O (again, only on the Cisco UCS hardware family). Note that a Field Engineer and I tested the patch on Cisco UCS hardware, and with the patch applied the problem did not reproduce, nor did we observe any subsequent issues or regressions. [Other information] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html
2021-09-27 14:43:18 Steven Parker description [Impact] It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. " 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 
[6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper 
i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Test Plan] <sbparke> ??? [Where problems could occur] Cisco "fNIC" driver enables FCoE support for the Cisco UCS Virtual Interface Card family of products. If a problem arises, it would be limited to these VICs, which are specially designed for Cisco UCS blade and rack servers, and in the worst case possibly to commands that terminate I/O (again, only on the Cisco UCS hardware family). Note that a Field Engineer and I tested the patch on Cisco UCS hardware, and with the patch applied the problem did not reproduce, nor did we observe any subsequent issues or regressions. [Other information] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html [Impact] It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. 
" 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 [6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] 
fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Test Plan] There are two ways to replicate the bug: Specific hardware: Chassis Cisco UCS 5108 AC2 Chassis Blades Cisco UCS B200 IO module Cisco UCS 2408 Server loads - Ubuntu 20.04 cluster running deployed maas, 
juju and openstack. 1) Reset a single chassis I/O module or fail over a fabric interconnect (FI) for all chassis in the cluster. We have performed both tests. Failing over a single chassis I/O module results in at least one node locking up. After patching the kernel, multiple tests that reset the I/O module did not result in further failures. Our chassis hold 8 blades. 2) The larger test, which we call a fiber channel failover test, reboots the actual FI (fabric interconnect) for one channel serving 3 chassis. This test covers 3 chassis with 8 blades each. In this test at least one, and often as many as 4, nodes will lock up. After loading the patched kernel we ran this test 3 times with no failures. [Where problems could occur] Cisco "fNIC" driver enables FCoE support for the Cisco UCS Virtual Interface Card family of products. If a problem arises, it would be limited to these VICs, which are specially designed for Cisco UCS blade and rack servers, and in the worst case possibly to commands that terminate I/O (again, only on the Cisco UCS hardware family). Note that a Field Engineer and I tested the patch on Cisco UCS hardware, and with the patch applied the problem did not reproduce, nor did we observe any subsequent issues or regressions. [Other information] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html
2021-09-27 14:44:32 Steven Parker description [Impact] It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. " 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 
[6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper 
i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Test Plan] There are two ways to replicate the bug: Specific hardware: Chassis Cisco UCS 5108 AC2 Chassis Blades Cisco UCS B200 IO module Cisco UCS 2408 Server loads - Ubuntu 20.04 cluster running deployed maas, juju and openstack. 1) Reset a single chassis I/O module or fail over a fabric interconnect (FI) for all chassis in the cluster. We have performed both tests. Fail over of single chassis I/O module results in at least one node locking up. After patching the kernel multiple tests by resetting the I/O module did not result in further failures. Or chassis hold 8 blades. 2) The larger test reboots the actual FI (fabric interconnect) for one channel serving 3 chassis. What we call a fiber channel fail over test. This test covers 3 chassis with 8 blades each. In this test at least one and often as many as 4 nodes will lock up. After loading the patched kernel we ran this test 3 times with no failures. [Where problems could occur] Cisco "fNIC" driver enables FCoE support for the Cisco UCS Virtual Interface Card family of products. If a problem arise it would be limited to these VIC which are specially designed for Cisco UCS blade and rack servers and possibly command to terminate I/O in any case at worst case (again only on Cisco UCS hw family. Note that Field Engineer and I did test the patch on Cisco UCS hw and the patch didn't reproduce the problem nor produce observable subsequent issues/regressions. 
[Other informations] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html [Impact] It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. " 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 
0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 [6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Test Plan] There are two ways to replicate the bug: Reset a single chassis I/O module or fail over a fabric interconnect (FI) for all chassis in the cluster. We have performed both tests. Specific hardware:    Chassis Cisco UCS 5108 AC2 Chassis    Blades Cisco UCS B200    IO module Cisco UCS 2408 Server loads - Ubuntu 20.04 cluster running deployed maas, juju and openstack. Tests 1) Fail over of single chassis I/O module results in at least one node locking up. After patching the kernel multiple tests by resetting the I/O module did not result in further failures. Or chassis hold 8 blades. 2) The larger test reboots the actual FI (fabric interconnect) for one channel serving 3 chassis. What we call a fiber channel fail over test. This test covers 3 chassis with 8 blades each. In this test at least one and often as many as 4 nodes will lock up. After loading the patched kernel we ran this test 3 times with no failures. [Where problems could occur] Cisco "fNIC" driver enables FCoE support for the Cisco UCS Virtual Interface Card family of products. If a problem arise it would be limited to these VIC which are specially designed for Cisco UCS blade and rack servers and possibly command to terminate I/O in any case at worst case (again only on Cisco UCS hw family. Note that Field Engineer and I did test the patch on Cisco UCS hw and the patch didn't reproduce the problem nor produce observable subsequent issues/regressions. 
[Other informations] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html
2021-09-27 14:45:49 Steven Parker description [Impact] It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. " 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 
[6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper 
i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Test Plan] There are two ways to replicate the bug: Reset a single chassis I/O module or fail over a fabric interconnect (FI) for all chassis in the cluster. We have performed both tests. Specific hardware:    Chassis Cisco UCS 5108 AC2 Chassis    Blades Cisco UCS B200    IO module Cisco UCS 2408 Server loads - Ubuntu 20.04 cluster running deployed maas, juju and openstack. Tests 1) Fail over of single chassis I/O module results in at least one node locking up. After patching the kernel multiple tests by resetting the I/O module did not result in further failures. Or chassis hold 8 blades. 2) The larger test reboots the actual FI (fabric interconnect) for one channel serving 3 chassis. What we call a fiber channel fail over test. This test covers 3 chassis with 8 blades each. In this test at least one and often as many as 4 nodes will lock up. After loading the patched kernel we ran this test 3 times with no failures. [Where problems could occur] Cisco "fNIC" driver enables FCoE support for the Cisco UCS Virtual Interface Card family of products. If a problem arise it would be limited to these VIC which are specially designed for Cisco UCS blade and rack servers and possibly command to terminate I/O in any case at worst case (again only on Cisco UCS hw family. Note that Field Engineer and I did test the patch on Cisco UCS hw and the patch didn't reproduce the problem nor produce observable subsequent issues/regressions. 
[Other informations] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html [Impact] It has been brought to my attention the following: " We have been experiencing node lockups and degradation when testing fiber channel fail over for multi-path PURESTORAGE drives. Testing usually consists of either failing over the fabric or the local I/O module for the Cisco chassis which houses a number of individual blades. After rebooting a local Chassis I/O module we see commands like multipath -ll hanging. Resetting the blades individual fiber channel interface results in the following messages. " 6051160.241383] rport-9:0-1: blocked FC remote port time out: removing target and saving binding [6051160.252901] BUG: kernel NULL pointer dereference, address: 0000000000000040 [6051160.262267] #PF: supervisor read access in kernel mode [6051160.269314] #PF: error_code(0x0000) - not-present page [6051160.276016] PGD 0 P4D 0 [6051160.279807] Oops: 0000 [#1] SMP NOPTI [6051160.284642] CPU: 10 PID: 49346 Comm: kworker/10:2 Tainted: P O 5.4.0-77-generic #86-Ubuntu [6051160.295967] Hardware name: Cisco Systems Inc UCSB-B200-M5/UCSB-B200-M5, BIOS B200M5.4.1.1d.0.0609200543 06/09/2020 [6051160.308199] Workqueue: fc_dl_9 fc_timeout_deleted_rport [scsi_transport_fc] [6051160.316640] RIP: 0010:fnic_terminate_rport_io+0x10f/0x510 [fnic] [6051160.324050] Code: 48 89 c3 48 85 c0 0f 84 7b 02 00 00 48 05 20 01 00 00 48 89 45 b0 0f 84 6b 02 00 00 48 8b 83 58 01 00 00 48 8b 80 b8 01 00 00 <48> 8b 78 40 e8 68 e6 06 00 85 c0 0f 84 4c 02 00 00 48 8b 83 58 01 [6051160.346553] RSP: 0018:ffffbc224f297d90 EFLAGS: 00010082 [6051160.353115] RAX: 0000000000000000 RBX: ffff90abdd4c4b00 RCX: ffff90d8ab2c2bb0 [6051160.361983] RDX: ffff90d8b5467400 RSI: 0000000000000000 RDI: ffff90d8ab3b4b40 [6051160.370812] RBP: ffffbc224f297df8 R08: ffff90d8c08978c8 R09: ffff90d8b8850800 [6051160.379518] R10: ffff90d8a59d64c0 R11: 
0000000000000001 R12: ffff90d8ab2c31f8 [6051160.388242] R13: 0000000000000000 R14: 0000000000000246 R15: ffff90d8ab2c27b8 [6051160.396953] FS: 0000000000000000(0000) GS:ffff90d8c0880000(0000) knlGS:0000000000000000 [6051160.406838] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [6051160.414168] CR2: 0000000000000040 CR3: 0000000fc1c0a004 CR4: 00000000007626e0 [6051160.423146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [6051160.431884] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [6051160.440615] PKRU: 55555554 [6051160.444337] Call Trace: [6051160.447841] fc_terminate_rport_io+0x56/0x70 [scsi_transport_fc] [6051160.455263] fc_timeout_deleted_rport.cold+0x1bc/0x2c7 [scsi_transport_fc] [6051160.463623] process_one_work+0x1eb/0x3b0 [6051160.468784] worker_thread+0x4d/0x400 [6051160.473660] kthread+0x104/0x140 [6051160.478102] ? process_one_work+0x3b0/0x3b0 [6051160.483439] ? kthread_park+0x90/0x90 [6051160.488213] ret_from_fork+0x1f/0x40 [6051160.492901] Modules linked in: dm_service_time zfs(PO) zunicode(PO) zlua(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_nat iptable_raw iptable_mangle iptable_nat nf_nat vhost_vsock vmw_vsock_virtio_transport_common vsock unix_diag nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 vhost_net vhost tap 8021q garp mrp bluetooth ecdh_generic ecc tcp_diag inet_diag sctp nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge stp llc nls_iso8859_1 dm_queue_length dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp ipmi_ssif coretemp kvm_intel kvm rapl input_leds joydev intel_cstate mei_me ioatdma mei dca ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq 
async_xor [6051160.492928] async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear fnic mgag200 drm_vram_helper i2c_algo_bit ttm drm_kms_helper crct10dif_pclmul syscopyarea hid_generic crc32_pclmul libfcoe sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops crypto_simd libfc usbhid cryptd scsi_transport_fc hid drm glue_helper enic ahci lpc_ich libahci wmi [6051160.632623] CR2: 0000000000000040 [6051160.637043] ---[ end trace 236e6f4850146477 ]--- [Test Plan] There are two ways to replicate the bug: Reset a single chassis I/O module or fail over a fabric interconnect (FI) for all chassis in the cluster. We have performed both tests. Specific hardware:    Chassis Cisco UCS 5108 AC2 Chassis    Blades Cisco UCS B200    IO module Cisco UCS 2408 Server loads - Ubuntu 20.04 cluster running deployed maas, juju and openstack. Tests 1) Fail over of single chassis I/O module results in at least one node locking up. After patching the kernel multiple tests by resetting the I/O module did not result in further failures. Each chassis holds 8 blades. 2) The larger test reboots the actual FI (fabric interconnect) for one channel serving 3 chassis. What we call a fiber channel fail over test. This test covers 3 chassis with 8 blades each. In this test at least one and often as many as 4 nodes will lock up. After loading the patched kernel we ran this test 3 times with no failures. [Where problems could occur] Cisco "fNIC" driver enables FCoE support for the Cisco UCS Virtual Interface Card family of products. If a problem arise it would be limited to these VIC which are specially designed for Cisco UCS blade and rack servers and possibly command to terminate I/O in any case at worst case (again only on Cisco UCS hw family. Note that Field Engineer and I did test the patch on Cisco UCS hw and the patch didn't reproduce the problem nor produce observable subsequent issues/regressions. 
[Other informations] https://support.oracle.com/knowledge/Oracle%20Linux%20and%20Virtualization/2792832_1.html#FIX https://www.spinics.net/lists/linux-scsi/msg142179.html
2021-10-12 22:58:20 Kelsey Steele linux (Ubuntu Focal): status In Progress Fix Committed
2021-10-19 12:01:27 Ubuntu Kernel Bot tags focal seg sts focal seg sts verification-needed-focal
2021-11-01 12:57:15 Eric Desrochers tags focal seg sts verification-needed-focal focal seg sts verification-done-focal
2021-11-08 14:21:32 Launchpad Janitor linux (Ubuntu Focal): status Fix Committed Fix Released
2021-11-08 14:21:32 Launchpad Janitor cve linked 2019-19449
2021-11-08 14:21:32 Launchpad Janitor cve linked 2020-36385
2021-11-08 14:21:32 Launchpad Janitor cve linked 2021-3428
2021-11-08 14:21:32 Launchpad Janitor cve linked 2021-3759