[roce-1126]RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver

Bug #1853989 reported by Fred Kimmy on 2019-11-26
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
kunpeng920
Undecided
Ike Panhc
Ubuntu-18.04
Undecided
Ike Panhc
Ubuntu-18.04-hwe
Undecided
Unassigned
Ubuntu-19.04
Undecided
Ike Panhc
Ubuntu-19.10
Undecided
Ike Panhc
Ubuntu-20.04
Undecided
Unassigned
Upstream-kernel
Undecided
Unassigned
linux (Ubuntu)
Undecided
Unassigned
Bionic
Undecided
Ike Panhc
Disco
Undecided
Ike Panhc
Eoan
Undecided
Ike Panhc

Bug Description

[Impact]
  KASAN reports slab-out-of-bounds in RDMA/hns driver

[Testcase]
  Enable KASAN and modprobe RDMA/hns driver

[Regression Risk]
  Only RDMA/hns driver modified. lowest risk to other drivers/platforms

[Bug Description]
KASAN: slab-out-of-bounds in hns_roce_table_mhop_put+0x584/0x828
    [hns_roce]
    Read of size 8 at addr ffff802185e08300 by task rmmod/270

    Call trace:
    dump_backtrace+0x0/0x1e8
    show_stack+0x14/0x20
    dump_stack+0xc4/0xfc
    print_address_description+0x60/0x270
    __kasan_report+0x164/0x1b8
    kasan_report+0xc/0x18
    __asan_load8+0x84/0xa8
    hns_roce_table_mhop_put+0x584/0x828 [hns_roce]
    hns_roce_table_put+0x174/0x1a0 [hns_roce]
    hns_roce_mr_free+0x124/0x210 [hns_roce]
    hns_roce_dereg_mr+0x90/0xb8 [hns_roce]
    ib_dealloc_pd_user+0x60/0xf0
    ib_mad_port_close+0x128/0x1d8
    ib_mad_remove_device+0x94/0x118
    remove_client_context+0xa0/0xe0
    disable_device+0xfc/0x1c0
    __ib_unregister_device+0x60/0xe0
    ib_unregister_device+0x24/0x38
    hns_roce_exit+0x3c/0x138 [hns_roce]
    __hns_roce_hw_v2_uninit_instance.isra.30+0x28/0x50 [hns_roce_hw_v2]
    hns_roce_hw_v2_uninit_instance+0x44/0x60 [hns_roce_hw_v2]
    hclge_uninit_client_instance+0x15c/0x238 [hclge]
    hnae3_uninit_client_instance+0x84/0xa8 [hnae3]
    hnae3_unregister_client+0x84/0x158 [hnae3]
    hns_roce_hw_v2_exit+0x14/0x20 [hns_roce_hw_v2]
    __arm64_sys_delete_module+0x20c/0x308
    el0_svc_handler+0xbc/0x210
    el0_svc+0x8/0xc

    Allocated by task 255:
    __kasan_kmalloc.isra.0+0xd0/0x180
    kasan_kmalloc+0xc/0x18
    __kmalloc+0x16c/0x328
    hns_roce_init_hem_table+0x20c/0x428 [hns_roce]
    hns_roce_init+0x214/0xfe0 [hns_roce]
    __hns_roce_hw_v2_init_instance+0x284/0x330 [hns_roce_hw_v2]
    hns_roce_hw_v2_init_instance+0xd0/0x1b8 [hns_roce_hw_v2]
    hclge_init_roce_client_instance+0x180/0x310 [hclge]
    hclge_init_client_instance+0xcc/0x508 [hclge]
    hnae3_init_client_instance.part.3+0x3c/0x80 [hnae3]
    hnae3_register_client+0x134/0x1a8 [hnae3]
    0xffff200009c00014
    do_one_initcall+0x9c/0x3e0
    do_init_module+0xd4/0x2d8
    load_module+0x3284/0x3690
    __se_sys_init_module+0x274/0x308
    __arm64_sys_init_module+0x40/0x50
    el0_svc_handler+0xbc/0x210
    el0_svc+0x8/0xc

    Freed by task 0:
    (stack is not available)

    The buggy address belongs to the object at ffff802185e06300
    which belongs to the cache kmalloc-8k of size 8192
    The buggy address is located 0 bytes to the right of
    8192-byte region [ffff802185e06300, ffff802185e08300)
    The buggy address belongs to the page:
    page:ffff7fe008617800 refcount:1 mapcount:0 mapping:ffff802340020e00 index:0x0
    compound_mapcount: 0
    flags: 0x5fffe00000010200(slab|head)
    raw: 5fffe00000010200 dead000000000100 dead000000000200 ffff802340020e00
    raw: 0000000000000000 00000000803e003e 00000001ffffffff 0000000000000000
    page dumped because: kasan: bad access detected
     Memory state around the buggy address:
    ffff802185e08200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ffff802185e08280: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    >ffff802185e08300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ^
    ffff802185e08380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
    ffff802185e08400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ==================================================================
    Disabling lock debugging due to kernel taint

[Steps to Reproduce]
Enable KASAN and configure PAGE_SIZE to 64K, insmod hns roce driver and then rmmod it.

[Actual Results]
Call trace because of slab-out-of-bound.

[Expected Results]
Success

[Reproducibility]
Inevitably

[Additional information]
Hardware: D06 CS
Firmware: NA
Kernel: NA

[Resolution]
Not configure eq->next when number of eq_buf is 1 in eq_mhop_alloc().

RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver
RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1853989

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu Bionic):
status: New → Incomplete
Changed in linux (Ubuntu Disco):
status: New → Incomplete
Changed in linux (Ubuntu Eoan):
status: New → Incomplete
Ike Panhc (ikepanhc) wrote :
Changed in linux (Ubuntu Eoan):
assignee: nobody → Ike Panhc (ikepanhc)
status: Incomplete → In Progress
Changed in linux (Ubuntu Disco):
assignee: nobody → Ike Panhc (ikepanhc)
status: Incomplete → In Progress
Changed in linux (Ubuntu Bionic):
assignee: nobody → Ike Panhc (ikepanhc)
status: Incomplete → In Progress
Changed in linux (Ubuntu):
status: Incomplete → Invalid
Changed in kunpeng920:
status: New → In Progress
assignee: nobody → Ike Panhc (ikepanhc)
Ike Panhc (ikepanhc) on 2019-12-20
description: updated
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Disco):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Eoan):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-disco' to 'verification-done-disco'. If the problem still exists, change the tag 'verification-needed-disco' to 'verification-failed-disco'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-disco
Ike Panhc (ikepanhc) wrote :

Thanks. Ubuntu-5.0.0-40.44 works for me.

tags: added: verification-done-disco
removed: verification-needed-disco
Launchpad Janitor (janitor) wrote :
Download full text (22.6 KiB)

This bug was fixed in the package linux - 5.0.0-40.44

---------------
linux (5.0.0-40.44) disco; urgency=medium

  * disco/linux: 5.0.0-40.44 -proposed tracker (LP: #1859724)

  * use-after-free in i915_ppgtt_close (LP: #1859522) // CVE-2020-7053
    - SAUCE: drm/i915: Fix use-after-free when destroying GEM context

  * CVE-2019-14615
    - drm/i915/gen9: Clear residual context state on context switch

  * System hang with kernel traces while entering reboot process on a Disco
    ARM64 moonshot node (LP: #1859582)
    - Revert "RDMA/cm: Fix memory leak in cm_add/remove_one"

linux (5.0.0-39.43) disco; urgency=medium

  * disco/linux: 5.0.0-39.43 -proposed tracker (LP: #1858547)

  * [Regression] usb usb2-port2: Cannot enable. Maybe the USB cable is bad?
    (LP: #1856608)
    - SAUCE: Revert "usb: handle warm-reset port requests on hub resume"

  * PAN is broken for execute-only user mappings on ARMv8 (LP: #1858815)
    - arm64: Revert support for execute-only user mappings

  * Fix unusable USB hub on Dell TB16 after S3 (LP: #1855312)
    - SAUCE: USB: core: Make port power cycle a seperate helper function
    - SAUCE: USB: core: Attempt power cycle port when it's in eSS.Disabled state

  * [sas-1126]scsi: hisi_sas: Fix out of bound at debug_I_T_nexus_reset()
    (LP: #1853992)
    - scsi: hisi_sas: Fix out of bound at debug_I_T_nexus_reset()

  * [sas-1126]scsi: hisi_sas: Assign NCQ tag for all NCQ commands (LP: #1853995)
    - scsi: hisi_sas: Assign NCQ tag for all NCQ commands

  * [sas-1126]scsi: hisi_sas: Fix the conflict between device gone and host
    reset (LP: #1853997)
    - scsi: hisi_sas: Fix the conflict between device gone and host reset

  * scsi: hisi_sas: Check sas_port before using it (LP: #1855952)
    - scsi: hisi_sas: Check sas_port before using it

  * CVE-2019-18885
    - btrfs: refactor btrfs_find_device() take fs_devices as argument
    - btrfs: merge btrfs_find_device and find_device

  * Integrate Intel SGX driver into linux-azure (LP: #1844245)
    - [Packaging] Add systemd service to load intel_sgx

  * [SRU][B/OEM-B/OEM-OSP1/D/E/F] Add LG I2C touchscreen multitouch support
    (LP: #1857541)
    - SAUCE: HID: multitouch: Add LG MELF0410 I2C touchscreen support

  * cifs: DFS Caching feature causing problems traversing multi-tier DFS setups
    (LP: #1854887)
    - cifs: Fix retrieval of DFS referrals in cifs_mount()

  * qede driver causes 100% CPU load (LP: #1855409)
    - qede: Handle infinite driver spinning for Tx timestamp.

  * [roce-1126]RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver
    (LP: #1853989)
    - RDMA/hns: Bugfix for slab-out-of-bounds when unloading hip08 driver
    - RDMA/hns: bugfix for slab-out-of-bounds when loading hip08 driver

  * [roce-1126]RDMA/hns: Fixs hw access invalid dma memory error (LP: #1853990)
    - RDMA/hns: Fixs hw access invalid dma memory error

  * [hns-1126]net: hns3: revert to old channel when setting new channel num fail
    (LP: #1853983)
    - net: hns3: revert to old channel when setting new channel num fail

  * [hns-1126]net: hns3: fix port setting handle for fibre port
    (LP: #1853984)
    - net: hns3: fix port setting handle for fibre...

Changed in linux (Ubuntu Disco):
status: Fix Committed → Fix Released
tags: added: verification-needed-bionic

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-eoan' to 'verification-done-eoan'. If the problem still exists, change the tag 'verification-needed-eoan' to 'verification-failed-eoan'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-eoan
Ike Panhc (ikepanhc) wrote :

Both 4.15.0-87.87 and 5.3.0-40.32 work fine for me. Thanks.

tags: added: verification-done-bionic verification-done-eoan
removed: verification-needed-bionic verification-needed-eoan
Changed in kunpeng920:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :
Download full text (78.1 KiB)

This bug was fixed in the package linux - 5.3.0-40.32

---------------
linux (5.3.0-40.32) eoan; urgency=medium

  * eoan/linux: 5.3.0-40.32 -proposed tracker (LP: #1861214)

  * No sof soundcard for 'ASoC: CODEC DAI intel-hdmi-hifi1 not registered' after
    modprobe sof (LP: #1860248)
    - ASoC: SOF: Intel: fix HDA codec driver probe with multiple controllers

  * ocfs2-tools is causing kernel panics in Ubuntu Focal (Ubuntu-5.4.0-9.12)
    (LP: #1852122)
    - ocfs2: fix the crash due to call ocfs2_get_dlm_debug once less

  * QAT drivers for C3XXX and C62X not included as modules (LP: #1845959)
    - [Config] CRYPTO_DEV_QAT_C3XXX=m, CRYPTO_DEV_QAT_C62X=m and
      CRYPTO_DEV_QAT_DH895xCC=m

  * Eoan update: upstream stable patchset 2020-01-24 (LP: #1860816)
    - scsi: lpfc: Fix discovery failures when target device connectivity bounces
    - scsi: mpt3sas: Fix clear pending bit in ioctl status
    - scsi: lpfc: Fix locking on mailbox command completion
    - Input: atmel_mxt_ts - disable IRQ across suspend
    - f2fs: fix to update time in lazytime mode
    - iommu: rockchip: Free domain on .domain_free
    - iommu/tegra-smmu: Fix page tables in > 4 GiB memory
    - dmaengine: xilinx_dma: Clear desc_pendingcount in xilinx_dma_reset
    - scsi: target: compare full CHAP_A Algorithm strings
    - scsi: lpfc: Fix SLI3 hba in loop mode not discovering devices
    - scsi: csiostor: Don't enable IRQs too early
    - scsi: hisi_sas: Replace in_softirq() check in hisi_sas_task_exec()
    - powerpc/pseries: Mark accumulate_stolen_time() as notrace
    - powerpc/pseries: Don't fail hash page table insert for bolted mapping
    - powerpc/tools: Don't quote $objdump in scripts
    - dma-debug: add a schedule point in debug_dma_dump_mappings()
    - leds: lm3692x: Handle failure to probe the regulator
    - clocksource/drivers/asm9260: Add a check for of_clk_get
    - clocksource/drivers/timer-of: Use unique device name instead of timer
    - powerpc/security/book3s64: Report L1TF status in sysfs
    - powerpc/book3s64/hash: Add cond_resched to avoid soft lockup warning
    - ext4: update direct I/O read lock pattern for IOCB_NOWAIT
    - ext4: iomap that extends beyond EOF should be marked dirty
    - jbd2: Fix statistics for the number of logged blocks
    - scsi: tracing: Fix handling of TRANSFER LENGTH == 0 for READ(6) and WRITE(6)
    - scsi: lpfc: Fix duplicate unreg_rpi error in port offline flow
    - f2fs: fix to update dir's i_pino during cross_rename
    - clk: qcom: Allow constant ratio freq tables for rcg
    - clk: clk-gpio: propagate rate change to parent
    - irqchip/irq-bcm7038-l1: Enable parent IRQ if necessary
    - irqchip: ingenic: Error out if IRQ domain creation failed
    - fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long
    - scsi: lpfc: fix: Coverity: lpfc_cmpl_els_rsp(): Null pointer dereferences
    - PCI: rpaphp: Fix up pointer to first drc-info entry
    - scsi: ufs: fix potential bug which ends in system hang
    - powerpc/pseries/cmm: Implement release() function for sysfs device
    - PCI: rpaphp: Don't rely on firmware feature to imply drc-info support
    - PCI: rpaphp: Annotate and corr...

Changed in linux (Ubuntu Eoan):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :
Download full text (79.8 KiB)

This bug was fixed in the package linux - 4.15.0-88.88

---------------
linux (4.15.0-88.88) bionic; urgency=medium

  * bionic/linux: 4.15.0-88.88 -proposed tracker (LP: #1862824)

  * Segmentation fault (kernel oops) with memory-hotplug in
    ubuntu_kernel_selftests on Bionic kernel (LP: #1862312)
    - Revert "mm/memory_hotplug: fix online/offline_pages called w.o.
      mem_hotplug_lock"
    - mm/memory_hotplug: fix online/offline_pages called w.o. mem_hotplug_lock

linux (4.15.0-87.87) bionic; urgency=medium

  * bionic/linux: 4.15.0-87.87 -proposed tracker (LP: #1861165)

  * Bionic update: upstream stable patchset 2020-01-22 (LP: #1860602)
    - scsi: lpfc: Fix discovery failures when target device connectivity bounces
    - scsi: mpt3sas: Fix clear pending bit in ioctl status
    - scsi: lpfc: Fix locking on mailbox command completion
    - Input: atmel_mxt_ts - disable IRQ across suspend
    - iommu/tegra-smmu: Fix page tables in > 4 GiB memory
    - scsi: target: compare full CHAP_A Algorithm strings
    - scsi: lpfc: Fix SLI3 hba in loop mode not discovering devices
    - scsi: csiostor: Don't enable IRQs too early
    - powerpc/pseries: Mark accumulate_stolen_time() as notrace
    - powerpc/pseries: Don't fail hash page table insert for bolted mapping
    - powerpc/tools: Don't quote $objdump in scripts
    - dma-debug: add a schedule point in debug_dma_dump_mappings()
    - clocksource/drivers/asm9260: Add a check for of_clk_get
    - powerpc/security/book3s64: Report L1TF status in sysfs
    - powerpc/book3s64/hash: Add cond_resched to avoid soft lockup warning
    - ext4: update direct I/O read lock pattern for IOCB_NOWAIT
    - jbd2: Fix statistics for the number of logged blocks
    - scsi: tracing: Fix handling of TRANSFER LENGTH == 0 for READ(6) and WRITE(6)
    - scsi: lpfc: Fix duplicate unreg_rpi error in port offline flow
    - f2fs: fix to update dir's i_pino during cross_rename
    - clk: qcom: Allow constant ratio freq tables for rcg
    - irqchip/irq-bcm7038-l1: Enable parent IRQ if necessary
    - irqchip: ingenic: Error out if IRQ domain creation failed
    - fs/quota: handle overflows of sysctl fs.quota.* and report as unsigned long
    - scsi: lpfc: fix: Coverity: lpfc_cmpl_els_rsp(): Null pointer dereferences
    - scsi: ufs: fix potential bug which ends in system hang
    - powerpc/pseries/cmm: Implement release() function for sysfs device
    - powerpc/security: Fix wrong message when RFI Flush is disable
    - scsi: atari_scsi: sun3_scsi: Set sg_tablesize to 1 instead of SG_NONE
    - clk: pxa: fix one of the pxa RTC clocks
    - bcache: at least try to shrink 1 node in bch_mca_scan()
    - HID: logitech-hidpp: Silence intermittent get_battery_capacity errors
    - libnvdimm/btt: fix variable 'rc' set but not used
    - HID: Improve Windows Precision Touchpad detection.
    - scsi: pm80xx: Fix for SATA device discovery
    - scsi: ufs: Fix error handing during hibern8 enter
    - scsi: scsi_debug: num_tgts must be >= 0
    - scsi: NCR5380: Add disconnect_mask module parameter
    - scsi: iscsi: Don't send data to unbound connection
    - scsi: target: iscsi: Wait for all commands to finish before freeing a
...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers