Activity log for bug #2073092

Date Who What changed Old value New value Message
2024-07-15 08:12:06 gerald.yang bug added bug
2024-07-15 08:12:15 gerald.yang linux (Ubuntu): status New In Progress
2024-07-15 08:12:18 gerald.yang linux (Ubuntu): assignee gerald.yang (gerald-yang-tw)
2024-07-15 08:14:06 gerald.yang summary [SRU] Fix conntrack use-after-free net/sched: Fix conntrack use-after-free
2024-07-15 08:14:22 gerald.yang description [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? 
handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed conntrack instead of the correct conntrack. We sent a patch to netdev and got merged: https://patchwork.kernel.org/project/netdevbpf/patch/20240710053747.13223-1-chengen.du@canonical.com/ Cherry-pick this comment to fix the slab use-after-free issue. [Testcase] Built a test kernel and verified on our environment. [Where problems could occur] This patch ensure when a clash happens and the duplicated conntrack is freed, the freed conntrack won't be used and the rest of code path will follow the original path. It won't cause other issue. BugLink: https://bugs.launchpad.net/bugs/2073092 [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? 
exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed 
conntrack instead of the correct conntrack. We sent a patch to netdev and it was merged: https://patchwork.kernel.org/project/netdevbpf/patch/20240710053747.13223-1-chengen.du@canonical.com/ Cherry-pick this commit to fix the slab use-after-free issue. [Testcase] Built a test kernel and verified it in our environment. [Where problems could occur] This patch ensures that when a clash happens and the duplicate conntrack is freed, the freed conntrack is not used and the rest of the code path follows the original path. It won't cause other issues.
2024-07-15 08:14:51 gerald.yang nominated for series Ubuntu Noble
2024-07-15 08:14:51 gerald.yang bug task added linux (Ubuntu Noble)
2024-07-15 08:14:51 gerald.yang nominated for series Ubuntu Oracular
2024-07-15 08:14:51 gerald.yang bug task added linux (Ubuntu Oracular)
2024-07-15 08:14:51 gerald.yang nominated for series Ubuntu Jammy
2024-07-15 08:14:51 gerald.yang bug task added linux (Ubuntu Jammy)
2024-07-15 08:15:01 gerald.yang linux (Ubuntu Jammy): status New In Progress
2024-07-15 08:15:04 gerald.yang linux (Ubuntu Noble): status New In Progress
2024-07-15 08:15:06 gerald.yang linux (Ubuntu Noble): assignee gerald.yang (gerald-yang-tw)
2024-07-15 08:15:08 gerald.yang linux (Ubuntu Jammy): assignee gerald.yang (gerald-yang-tw)
2024-07-15 08:17:49 gerald.yang description BugLink: https://bugs.launchpad.net/bugs/2073092 [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? 
handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed conntrack instead of the correct conntrack. We sent a patch to netdev and got merged: https://patchwork.kernel.org/project/netdevbpf/patch/20240710053747.13223-1-chengen.du@canonical.com/ Cherry-pick this comment to fix the slab use-after-free issue. [Testcase] Built a test kernel and verified on our environment. [Where problems could occur] This patch ensure when a clash happens and the duplicated conntrack is freed, the freed conntrack won't be used and the rest of code path will follow the original path. It won't cause other issue. BugLink: https://bugs.launchpad.net/bugs/2073092 [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? 
exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 
__kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed conntrack instead of the correct conntrack. We sent a patch to netdev and got merged: https://patchwork.kernel.org/project/netdevbpf/patch/20240710053747.13223-1-chengen.du@canonical.com/ Cherry-pick this comment to fix the slab use-after-free issue. [Testcase] Built a test kernel and verified on our environment. 
[Where problems could occur] This patch ensures that when a clash happens and the duplicate conntrack is freed, the freed conntrack is not used and the rest of the code path follows the original path. It won't cause other issues.
2024-07-15 08:18:17 gerald.yang description BugLink: https://bugs.launchpad.net/bugs/2073092 [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? 
handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] 
nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed conntrack instead of the correct conntrack. We sent a patch to netdev and got merged: https://patchwork.kernel.org/project/netdevbpf/patch/20240710053747.13223-1-chengen.du@canonical.com/ Cherry-pick this comment to fix the slab use-after-free issue. [Testcase] Built a test kernel and verified on our environment. [Where problems could occur] This patch ensure when a clash happens and the duplicated conntrack is freed, the freed conntrack won't be used and the rest of code path will follow the original path. It won't cause other issue. BugLink: https://bugs.launchpad.net/bugs/2073092 [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? 
srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 
netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed conntrack instead of the correct conntrack. We sent a patch to netdev and got merged: https://patchwork.kernel.org/project/netdevbpf/patch/20240710053747.13223-1-chengen.du@canonical.com/ Cherry-pick this comment to fix the conntrack slab use-after-free issue. [Testcase] Built a test kernel and verified on our environment. [Where problems could occur] This patch ensure when a clash happens and the duplicated conntrack is freed, the freed conntrack won't be used and the rest of code path will follow the original path. It won't cause other issue.
2024-07-15 08:56:09 gerald.yang description BugLink: https://bugs.launchpad.net/bugs/2073092 [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? 
handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] 
nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed conntrack instead of the correct conntrack. We sent a patch to netdev and got merged: https://patchwork.kernel.org/project/netdevbpf/patch/20240710053747.13223-1-chengen.du@canonical.com/ Cherry-pick this comment to fix the conntrack slab use-after-free issue. [Testcase] Built a test kernel and verified on our environment. [Where problems could occur] This patch ensure when a clash happens and the duplicated conntrack is freed, the freed conntrack won't be used and the rest of code path will follow the original path. It won't cause other issue. BugLink: https://bugs.launchpad.net/bugs/2073092 [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? 
srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 
netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed conntrack instead of the correct conntrack. We sent a patch to netdev to fix it and got merged: https://patchwork.kernel.org/project/netdevbpf/patch/20240710053747.13223-1-chengen.du@canonical.com/ Cherry-pick this comment to fix the conntrack slab use-after-free issue. [Testcase] Built a test kernel and verified on our environment. [Where problems could occur] This patch ensure when a clash happens and the duplicated conntrack is freed, call nf_ct_get to get the correct conntrack, the freed conntrack won't be used and the rest of code path will follow the original path. It won't cause other issue.
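The description above says that after a clash is resolved and the duplicate conntrack is freed, nf_ct_get is called to obtain the correct conntrack. The pattern can be illustrated with a small userspace model. This is only a sketch under stated assumptions: toy_conn, toy_skb, toy_ct_get, toy_confirm and toy_process are hypothetical stand-ins invented for illustration. They are not the kernel's structures or functions, and this is not the merged patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-ins; NOT the kernel's struct nf_conn or struct sk_buff. */
struct toy_conn {
    int id;
    bool offload_requested;
};

struct toy_skb {
    struct toy_conn *ct;   /* entry currently attached to the packet */
};

/* Hypothetical analogue of nf_ct_get(): return the attached entry. */
static struct toy_conn *toy_ct_get(const struct toy_skb *skb)
{
    return skb->ct;
}

/*
 * Hypothetical analogue of confirmation with clash resolution: if an
 * equivalent entry already exists, free the packet's duplicate and
 * attach the surviving entry instead.
 */
static void toy_confirm(struct toy_skb *skb, struct toy_conn *existing)
{
    if (existing && existing != skb->ct) {
        free(skb->ct);        /* the duplicate is gone after this point */
        skb->ct = existing;   /* the packet now refers to the survivor */
    }
}

/* Returns the id of the entry that ends up marked for offload, or -1. */
static int toy_process(struct toy_skb *skb, struct toy_conn *existing)
{
    struct toy_conn *ct = toy_ct_get(skb);

    if (!ct)
        return -1;

    toy_confirm(skb, existing);

    /* Re-fetch: the pointer read before confirmation may now dangle. */
    ct = toy_ct_get(skb);
    assert(ct != NULL);

    ct->offload_requested = true;
    return ct->id;
}
```

Without the second toy_ct_get call, toy_process would write through the pointer that toy_confirm has just freed, which is the same kind of access the KASAN report flags in tcf_ct_flow_table_process_conn.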
2024-07-15 08:57:08 gerald.yang description BugLink: https://bugs.launchpad.net/bugs/2073092 [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? 
handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] 
nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed conntrack instead of the correct conntrack. We sent a patch to netdev to fix it and got merged: https://patchwork.kernel.org/project/netdevbpf/patch/20240710053747.13223-1-chengen.du@canonical.com/ Cherry-pick this comment to fix the conntrack slab use-after-free issue. [Testcase] Built a test kernel and verified on our environment. [Where problems could occur] This patch ensure when a clash happens and the duplicated conntrack is freed, call nf_ct_get to get the correct conntrack, the freed conntrack won't be used and the rest of code path will follow the original path. It won't cause other issue. BugLink: https://bugs.launchpad.net/bugs/2073092 [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? 
refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 
nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed conntrack instead of the correct conntrack. We sent a patch to netdev to fix it and got merged: https://patchwork.kernel.org/project/netdevbpf/patch/20240710053747.13223-1-chengen.du@canonical.com/ Cherry-pick this comment to fix the conntrack slab use-after-free issue. [Testcase] Built a test kernel and verified on our environment. 
[Where problems could occur] This patch ensures that when a clash happens and the duplicated conntrack is freed, nf_ct_get is called to get the correct conntrack, so the freed conntrack is not used and the rest of the code path follows the original path. This won't cause other issues.
2024-07-15 10:58:30 gerald.yang linux (Ubuntu Oracular): status In Progress Invalid
2024-07-16 04:54:57 gerald.yang description BugLink: https://bugs.launchpad.net/bugs/2073092 [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? 
handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] 
nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed conntrack instead of the correct conntrack. We sent a patch to netdev to fix it and got merged: https://patchwork.kernel.org/project/netdevbpf/patch/20240710053747.13223-1-chengen.du@canonical.com/ Cherry-pick this comment to fix the conntrack slab use-after-free issue. [Testcase] Built a test kernel and verified on our environment. [Where problems could occur] This patch ensure when a clash happens and the duplicated conntrack is freed, call nf_ct_get to get the correct conntrack, the freed conntrack won't be used and the rest of code path will follow the original path. This won't cause other issues. BugLink: https://bugs.launchpad.net/bugs/2073092 [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? 
refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 
nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed conntrack instead of the correct conntrack. We sent a patch to upstream and got merged: commit 26488172b0292bed837b95a006a3f3431d1898c3 Author: Chengen Du <chengen.du@canonical.com> Date: Wed Jul 10 13:37:47 2024 +0800 net/sched: Fix UAF when resolving a clash Cherry-pick this comment to fix the conntrack slab use-after-free issue. [Testcase] Built a test kernel and verified on our environment which is constantly hitting this issue. 
[Where problems could occur] This patch ensures that when a clash happens and the duplicated conntrack is freed, nf_ct_get is called to get the correct conntrack, so the freed conntrack is not used and the rest of the code path follows the original path. This won't cause other issues.
2024-07-16 04:56:08 gerald.yang description BugLink: https://bugs.launchpad.net/bugs/2073092 [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? 
handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] 
nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed conntrack instead of the correct conntrack. We sent a patch to upstream and got merged: commit 26488172b0292bed837b95a006a3f3431d1898c3 Author: Chengen Du <chengen.du@canonical.com> Date: Wed Jul 10 13:37:47 2024 +0800 net/sched: Fix UAF when resolving a clash Cherry-pick this comment to fix the conntrack slab use-after-free issue. [Testcase] Built a test kernel and verified on our environment which is constantly hitting this issue. [Where problems could occur] This patch ensure when a clash happens and the duplicated conntrack is freed, call nf_ct_get to get the correct conntrack, the freed conntrack won't be used and the rest of code path will follow the original path. This won't cause other issues. BugLink: https://bugs.launchpad.net/bugs/2073092 [Impact] Hit conntrack refcount use-after-free issue: refcount_t: addition on 0; use-after-free. Call Trace: <IRQ> ? show_regs+0x6d/0x80 ? __warn+0x89/0x160 ? refcount_warn_saturate+0x12e/0x150 ? report_bug+0x17e/0x1b0 ? handle_bug+0x46/0x90 ? exc_invalid_op+0x18/0x80 ? asm_exc_invalid_op+0x1b/0x20 ? 
refcount_warn_saturate+0x12e/0x150 flow_offload_alloc+0xe5/0xf0 [nf_flow_table] tcf_ct_flow_table_process_conn+0xc2/0x1e0 [act_ct] tcf_ct_act+0x6c8/0xaa0 [act_ct] tcf_action_exec+0xbc/0x1a0 fl_classify+0x1f8/0x200 [cls_flower] __tcf_classify+0x169/0x200 tcf_classify+0xff/0x250 sch_handle_ingress.constprop.0+0x11f/0x290 ? srso_alias_return_thunk+0x5/0x7f __netif_receive_skb_core.constprop.0+0x60b/0xd70 ? __udp4_lib_lookup+0x25f/0x2a0 __netif_receive_skb_list_core+0xfd/0x250 netif_receive_skb_list_internal+0x1a3/0x2d0 ? srso_alias_return_thunk+0x5/0x7f ? dev_gro_receive+0x196/0x350 napi_complete_done+0x74/0x1c0 gro_cell_poll+0x7c/0xb0 __napi_poll+0x33/0x1f0 net_rx_action+0x181/0x2e0 __do_softirq+0xdc/0x349 ? srso_alias_return_thunk+0x5/0x7f ? handle_irq_event+0x52/0x80 ? handle_edge_irq+0xda/0x250 __irq_exit_rcu+0x75/0xa0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa4/0xb0 </IRQ> <TASK> [Fix] I enabled kasan and get: BUG: KASAN: slab-use-after-free in tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] Read of size 1 at addr ffff888c07603600 by task handler130/6469 Call Trace: <IRQ> dump_stack_lvl+0x48/0x70 print_address_description.constprop.0+0x33/0x3d0 print_report+0xc0/0x2b0 kasan_report+0xd0/0x120 __asan_load1+0x6c/0x80 tcf_ct_flow_table_process_conn+0x12b/0x380 [act_ct] tcf_ct_act+0x886/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 __irq_exit_rcu+0x82/0xc0 irq_exit_rcu+0xe/0x20 common_interrupt+0xa1/0xb0 </IRQ> Allocated by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_alloc_info+0x1e/0x40 __kasan_krealloc+0x133/0x190 krealloc+0xaa/0x130 
nf_ct_ext_add+0xed/0x230 [nf_conntrack] tcf_ct_act+0x1095/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 Freed by task 6469: kasan_save_stack+0x38/0x70 kasan_set_track+0x25/0x40 kasan_save_free_info+0x2b/0x60 ____kasan_slab_free+0x180/0x1f0 __kasan_slab_free+0x12/0x30 slab_free_freelist_hook+0xd2/0x1a0 __kmem_cache_free+0x1a2/0x2f0 kfree+0x78/0x120 nf_conntrack_free+0x74/0x130 [nf_conntrack] nf_ct_destroy+0xb2/0x140 [nf_conntrack] __nf_ct_resolve_clash+0x529/0x5d0 [nf_conntrack] nf_ct_resolve_clash+0xf6/0x490 [nf_conntrack] __nf_conntrack_confirm+0x2c6/0x770 [nf_conntrack] tcf_ct_act+0x12ad/0x1350 [act_ct] tcf_action_exec+0xf8/0x1f0 fl_classify+0x355/0x360 [cls_flower] __tcf_classify+0x1fd/0x330 tcf_classify+0x21c/0x3c0 sch_handle_ingress.constprop.0+0x2c5/0x500 __netif_receive_skb_core.constprop.0+0xb25/0x1510 __netif_receive_skb_list_core+0x220/0x4c0 netif_receive_skb_list_internal+0x446/0x620 napi_complete_done+0x157/0x3d0 gro_cell_poll+0xcf/0x100 __napi_poll+0x65/0x310 net_rx_action+0x30c/0x5c0 __do_softirq+0x14f/0x491 When resolving a clash, a duplicate conntrack will be freed, but in tcf_ct_act, it still uses the freed conntrack instead of the correct conntrack. We sent a patch to upstream to fix it and got merged: commit 26488172b0292bed837b95a006a3f3431d1898c3 Author: Chengen Du <chengen.du@canonical.com> Date: Wed Jul 10 13:37:47 2024 +0800     net/sched: Fix UAF when resolving a clash Cherry-pick this comment to fix the conntrack slab use-after-free issue. [Testcase] Built a test kernel and verified on our environment which is constantly hitting this issue. 
[Where problems could occur] This patch ensures that when a clash happens and the duplicated conntrack is freed, nf_ct_get is called to get the correct conntrack, so the freed conntrack is not used and the rest of the code path follows the original path. This won't cause other issues.
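The pattern described in the [Fix] and [Where problems could occur] text above can be illustrated with a small userspace sketch. This is not the kernel patch: `struct conn`, `struct packet`, `packet_get_conn`, `confirm`, `process`, and the two demo functions are hypothetical stand-ins invented for illustration. It only assumes what the description states, namely that clash resolution frees the duplicate entry and that the caller has to look the entry up again (the role the description gives to nf_ct_get) instead of reusing its earlier pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a conntrack entry and a packet; not kernel types. */
struct conn { int id; };
struct packet { struct conn *conn; };

/* Plays the role the description assigns to nf_ct_get():
 * return the entry the packet currently points at. */
static struct conn *packet_get_conn(const struct packet *pkt)
{
    return pkt->conn;
}

/* Models confirmation with clash resolution (an assumption of this sketch):
 * if another entry already exists, the packet's duplicate is freed and the
 * packet is re-pointed at the existing entry. */
static void confirm(struct packet *pkt, struct conn *existing)
{
    if (existing != NULL && existing != pkt->conn) {
        free(pkt->conn);
        pkt->conn = existing;
    }
}

/* Returns the id of the entry that is safe to use after confirmation. */
static int process(struct packet *pkt, struct conn *existing)
{
    struct conn *ct = packet_get_conn(pkt);

    confirm(pkt, existing);

    /* The earlier pointer may now refer to freed memory. Fetch the entry
     * again from the packet instead of reusing the cached pointer. */
    ct = packet_get_conn(pkt);
    return ct->id;
}

/* Clash case: the packet's own entry (id 2) is freed, entry 1 is used. */
static int demo_clash(void)
{
    struct conn winner = { .id = 1 };
    struct packet pkt;

    pkt.conn = malloc(sizeof(*pkt.conn));
    if (pkt.conn == NULL)
        return -1;
    pkt.conn->id = 2;
    return process(&pkt, &winner);
}

/* No clash: the packet keeps its own entry (id 2). */
static int demo_no_clash(void)
{
    struct packet pkt;
    int id;

    pkt.conn = malloc(sizeof(*pkt.conn));
    if (pkt.conn == NULL)
        return -1;
    pkt.conn->id = 2;
    id = process(&pkt, NULL);
    free(pkt.conn);
    return id;
}
```

Without the second `packet_get_conn` call in `process`, the clash case would read `id` through a pointer to freed memory, which is the kind of access the KASAN report flags.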
2024-07-18 07:03:44 Stefan Bader bug task added linux-hwe-6.8 (Ubuntu)
2024-07-18 07:03:52 Stefan Bader linux-hwe-6.8 (Ubuntu Noble): status New Invalid
2024-07-18 07:03:57 Stefan Bader linux-hwe-6.8 (Ubuntu Oracular): status New Invalid
2024-07-18 07:04:14 Stefan Bader linux-hwe-6.8 (Ubuntu Jammy): importance Undecided High
2024-07-18 07:04:14 Stefan Bader linux-hwe-6.8 (Ubuntu Jammy): status New Triaged
2024-07-18 07:04:58 Stefan Bader linux (Ubuntu Noble): importance Undecided High
2024-07-18 07:05:04 Stefan Bader linux (Ubuntu Jammy): importance Undecided High
2024-07-18 07:40:12 Stefan Bader linux-hwe-6.8 (Ubuntu Jammy): status Triaged Fix Committed
2024-07-19 09:36:54 Stefan Bader linux (Ubuntu Noble): status In Progress Fix Committed
2024-07-19 09:37:39 Stefan Bader linux (Ubuntu Jammy): status In Progress Fix Committed
2024-07-22 23:00:08 Ubuntu Kernel Bot tags kernel-spammed-jammy-linux-hwe-6.8-v2 verification-needed-jammy-linux-hwe-6.8
2024-07-27 03:28:34 gerald.yang tags kernel-spammed-jammy-linux-hwe-6.8-v2 verification-needed-jammy-linux-hwe-6.8 kernel-spammed-jammy-linux-hwe-6.8-v2 verification-done-jammy-linux-hwe-6.8
2024-07-29 08:20:06 gerald.yang linux-hwe-6.8 (Ubuntu Jammy): assignee gerald.yang (gerald-yang-tw)
2024-08-08 15:13:39 Ubuntu Kernel Bot tags kernel-spammed-jammy-linux-hwe-6.8-v2 verification-done-jammy-linux-hwe-6.8 kernel-spammed-jammy-linux-hwe-6.8-v2 kernel-spammed-noble-linux-v2 verification-done-jammy-linux-hwe-6.8 verification-needed-noble-linux
2024-08-09 03:27:23 Ubuntu Kernel Bot tags kernel-spammed-jammy-linux-hwe-6.8-v2 kernel-spammed-noble-linux-v2 verification-done-jammy-linux-hwe-6.8 verification-needed-noble-linux kernel-spammed-jammy-linux-hwe-6.8-v2 kernel-spammed-jammy-linux-v2 kernel-spammed-noble-linux-v2 verification-done-jammy-linux-hwe-6.8 verification-needed-jammy-linux verification-needed-noble-linux
2024-08-09 03:40:33 gerald.yang tags kernel-spammed-jammy-linux-hwe-6.8-v2 kernel-spammed-jammy-linux-v2 kernel-spammed-noble-linux-v2 verification-done-jammy-linux-hwe-6.8 verification-needed-jammy-linux verification-needed-noble-linux kernel-spammed-jammy-linux-hwe-6.8-v2 kernel-spammed-jammy-linux-v2 kernel-spammed-noble-linux-v2 verification-done-jammy-linux verification-done-jammy-linux-hwe-6.8 verification-done-noble-linux
2024-08-13 14:03:05 Launchpad Janitor linux-hwe-6.8 (Ubuntu Jammy): status Fix Committed Fix Released
2024-08-13 14:03:05 Launchpad Janitor cve linked 2024-25742
2024-08-13 14:03:05 Launchpad Janitor cve linked 2024-35984
2024-08-13 14:03:05 Launchpad Janitor cve linked 2024-35990
2024-08-13 14:03:05 Launchpad Janitor cve linked 2024-35992
2024-08-13 14:03:05 Launchpad Janitor cve linked 2024-35997
2024-08-13 14:03:05 Launchpad Janitor cve linked 2024-36008
2024-08-13 14:03:05 Launchpad Janitor cve linked 2024-36016
2024-08-26 08:51:48 Ubuntu Kernel Bot tags kernel-spammed-jammy-linux-hwe-6.8-v2 kernel-spammed-jammy-linux-v2 kernel-spammed-noble-linux-v2 verification-done-jammy-linux verification-done-jammy-linux-hwe-6.8 verification-done-noble-linux kernel-spammed-jammy-linux-hwe-6.8-v2 kernel-spammed-jammy-linux-nvidia-tegra-v2 kernel-spammed-jammy-linux-v2 kernel-spammed-noble-linux-v2 verification-done-jammy-linux verification-done-jammy-linux-hwe-6.8 verification-done-noble-linux verification-needed-jammy-linux-nvidia-tegra
2024-08-26 13:37:51 Ubuntu Kernel Bot tags kernel-spammed-jammy-linux-hwe-6.8-v2 kernel-spammed-jammy-linux-nvidia-tegra-v2 kernel-spammed-jammy-linux-v2 kernel-spammed-noble-linux-v2 verification-done-jammy-linux verification-done-jammy-linux-hwe-6.8 verification-done-noble-linux verification-needed-jammy-linux-nvidia-tegra kernel-spammed-jammy-linux-gcp-6.8-v2 kernel-spammed-jammy-linux-hwe-6.8-v2 kernel-spammed-jammy-linux-nvidia-tegra-v2 kernel-spammed-jammy-linux-v2 kernel-spammed-noble-linux-v2 verification-done-jammy-linux verification-done-jammy-linux-hwe-6.8 verification-done-noble-linux verification-needed-jammy-linux-gcp-6.8 verification-needed-jammy-linux-nvidia-tegra
2024-08-27 18:46:39 Ubuntu Kernel Bot tags kernel-spammed-jammy-linux-gcp-6.8-v2 kernel-spammed-jammy-linux-hwe-6.8-v2 kernel-spammed-jammy-linux-nvidia-tegra-v2 kernel-spammed-jammy-linux-v2 kernel-spammed-noble-linux-v2 verification-done-jammy-linux verification-done-jammy-linux-hwe-6.8 verification-done-noble-linux verification-needed-jammy-linux-gcp-6.8 verification-needed-jammy-linux-nvidia-tegra kernel-spammed-jammy-linux-gcp-6.8-v2 kernel-spammed-jammy-linux-hwe-6.8-v2 kernel-spammed-jammy-linux-nvidia-tegra-igx-v2 kernel-spammed-jammy-linux-nvidia-tegra-v2 kernel-spammed-jammy-linux-v2 kernel-spammed-noble-linux-v2 verification-done-jammy-linux verification-done-jammy-linux-hwe-6.8 verification-done-noble-linux verification-needed-jammy-linux-gcp-6.8 verification-needed-jammy-linux-nvidia-tegra verification-needed-jammy-linux-nvidia-tegra-igx