system crash when removing ipmi_msghandler module

Bug #1950666 reported by Ioanna Alifieraki
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
In Progress
Medium
Ioanna Alifieraki
Focal
Fix Released
Medium
Ioanna Alifieraki
Hirsute
Won't Fix
Medium
Ioanna Alifieraki
Impish
Fix Released
Medium
Ioanna Alifieraki
Jammy
In Progress
Medium
Ioanna Alifieraki

Bug Description

[IMPACT]

Commit 3b9a907223d7 (ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier)
pushes the removal of an ipmi_user into the system's workqueue.

Whenever an ipmi_user struct is about to be removed it is scheduled as a work on the system's workqueue to guarantee the free operation won't be executed in atomic context. When the work is executed the free_user_work() function is invoked which frees the ipmi_user.

When ipmi_msghandler module is removed in cleanup_ipmi() function, there is no check if there are any pending works to be executed.
Therefore, there is a potential race condition :
An ipmi_user is scheduled for removal and shortly after to remove the ipmi_msghandler module.
If the scheduled work delays execution for any reason and the module is removed first, then when the work is executed the pages of free_user_work() are gone and the system crashes with the following :

BUG: unable to handle page fault for address: ffffffffc05c3450
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 635420e067 P4D 635420e067 PUD 6354210067 PMD 4711e51067 PTE 0
Oops: 0010 [#1] SMP PTI
CPU: 19 PID: 29646 Comm: kworker/19:1 Kdump: loaded Not tainted 5.4.0-77-generic #86~18.04.1-Ubuntu
Hardware name: Ciara Technologies ORION RS610-G4-DTH4S/MR91-FS1-Y9, BIOS F29 05/23/2019
Workqueue: events 0xffffffffc05c3450
RIP: 0010:0xffffffffc05c3450
Code: Bad RIP value.
RSP: 0018:ffffb721333c3e88 EFLAGS: 00010286
RAX: ffffffffc05c3450 RBX: ffff92a95f56a740 RCX: ffffb7221cfd14e8
RDX: 0000000000000001 RSI: ffff92616040d4b0 RDI: ffffb7221cf404e0
RBP: ffffb721333c3ec0 R08: 000073746e657665 R09: 8080808080808080
R10: ffffb721333c3de0 R11: fefefefefefefeff R12: ffff92a95f570700
R13: ffff92a0a40ece40 R14: ffffb7221cf404e0 R15: 0ffff92a95f57070
FS: 0000000000000000(0000) GS:ffff92a95f540000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc05c3426 CR3: 00000081e9bfc005 CR4: 00000000007606e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
? process_one_work+0x20f/0x400
worker_thread+0x34/0x410
kthread+0x121/0x140
? process_one_work+0x400/0x400
? kthread_park+0x90/0x90
ret_from_fork+0x35/0x40
Modules linked in: xt_REDIRECT xt_owner ipt_rpfilter xt_CT xt_multiport xt_set ip_set_hash_ip veth xt_statistic ipt_REJECT
... megaraid_sas ahci libahci wmi [last unloaded: ipmi_msghandler]
CR2: ffffffffc05c3450

[TEST CASE]

The user who reported the issue can reproduce reliably by stopping the ipmi related services and then removing the ipmi modules.
I could reproduce the issue only when turning the normal 'work' to delayed work.

[WHERE PROBLEMS COULD OCCUR]

The fixing patch creates a dedicated workqueue for the remove_work struct of ipmi_user when loading the ipmi_msghandler modules and destroys the workqueue when removing the module. Therefore any potential problems would occur during these two operations or when scheduling works on the dedicated workqueue.

[OTHER]

Upstream patches :
1d49eb91e86e (ipmi: Move remove_work to dedicated workqueue)
5a3ba99b62d8 (ipmi: msghandler: Make symbol 'remove_work_wq' static)

CVE References

Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
Changed in linux (Ubuntu Hirsute):
importance: Undecided → Medium
Changed in linux (Ubuntu Impish):
importance: Undecided → Medium
Changed in linux (Ubuntu Jammy):
importance: Undecided → Medium
Changed in linux (Ubuntu Focal):
status: New → Confirmed
Changed in linux (Ubuntu Hirsute):
status: New → Confirmed
Changed in linux (Ubuntu Impish):
status: New → Confirmed
Changed in linux (Ubuntu Jammy):
status: New → In Progress
assignee: nobody → Ioanna Alifieraki (joalif)
Changed in linux (Ubuntu Impish):
assignee: nobody → Ioanna Alifieraki (joalif)
Changed in linux (Ubuntu Hirsute):
assignee: nobody → Ioanna Alifieraki (joalif)
Changed in linux (Ubuntu Focal):
assignee: nobody → Ioanna Alifieraki (joalif)
description: updated
description: updated
Changed in linux (Ubuntu Focal):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Hirsute):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Impish):
status: Confirmed → Fix Committed
Revision history for this message
Thadeu Lima de Souza Cascardo (cascardo) wrote :

https://<email address hidden>/

Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.13.0-24.24 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-impish' to 'verification-done-impish'. If the problem still exists, change the tag 'verification-needed-impish' to 'verification-failed-impish'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-impish
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.11.0-47.52 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-hirsute
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.4.0-97.110 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Revision history for this message
Ioanna Alifieraki (joalif) wrote :

VERIFICATION

User that reported the issue has tested from -proposed and confirmed that it works.

tags: added: verification-done-focal verification-done-hirsute verification-done-impish
removed: verification-needed-focal verification-needed-hirsute verification-needed-impish
Revision history for this message
Brian Murray (brian-murray) wrote :

The Hirsute Hippo has reached End of Life, so this bug will not be fixed for that release.

Changed in linux (Ubuntu Hirsute):
status: Fix Committed → Won't Fix
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (31.9 KiB)

This bug was fixed in the package linux - 5.4.0-97.110

---------------
linux (5.4.0-97.110) focal; urgency=medium

  * icmp_redirect from selftests fails on F/kvm (unary operator expected)
    (LP: #1938964)
    - selftests: icmp_redirect: pass xfail=0 to log_test()

  * Focal: CIFS stable updates (LP: #1954926)
    - cifs: use the expiry output of dns_query to schedule next resolution
    - cifs: set a minimum of 120s for next dns resolution
    - cifs: To match file servers, make sure the server hostname matches

  * seccomp_bpf in seccomp from ubuntu_kernel_selftests failed to build on B-5.4
    (LP: #1896420)
    - SAUCE: selftests/seccomp: fix "storage size of 'md' isn't known" build issue
    - SAUCE: selftests/seccomp: Fix s390x regs not defined issue

  * system crash when removing ipmi_msghandler module (LP: #1950666)
    - ipmi: Move remove_work to dedicated workqueue
    - ipmi: msghandler: Make symbol 'remove_work_wq' static

  * zcrypt DD: Toleration for new IBM Z Crypto Hardware - (Backport to Ubuntu
    20.04) (LP: #1954680)
    - s390/AP: support new dynamic AP bus size limit

  * [UBUNTU 20.04] KVM hardware diagnose data improvements for guest kernel -
    kernel part (LP: #1953334)
    - s390/setup: diag 318: refactor struct
    - s390/kvm: diagnose 0x318 sync and reset
    - KVM: s390: remove diag318 reset code
    - KVM: s390: add debug statement for diag 318 CPNC data

  * Updates to ib_peer_memory requested by Nvidia (LP: #1947206)
    - SAUCE: RDMA/core: Updated ib_peer_memory

  * Include Infiniband Peer Memory interface (LP: #1923104)
    - IB: Allow calls to ib_umem_get from kernel ULPs
    - SAUCE: RDMA/core: Introduce peer memory interface

  * Focal update: v5.4.162 upstream stable release (LP: #1954834)
    - arm64: zynqmp: Do not duplicate flash partition label property
    - arm64: zynqmp: Fix serial compatible string
    - ARM: dts: NSP: Fix mpcore, mmc node names
    - scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()
    - arm64: dts: hisilicon: fix arm,sp805 compatible string
    - RDMA/bnxt_re: Check if the vlan is valid before reporting
    - usb: musb: tusb6010: check return value after calling
      platform_get_resource()
    - usb: typec: tipd: Remove WARN_ON in tps6598x_block_read
    - arm64: dts: qcom: msm8998: Fix CPU/L2 idle state latency and residency
    - arm64: dts: freescale: fix arm,sp805 compatible string
    - ASoC: SOF: Intel: hda-dai: fix potential locking issue
    - clk: imx: imx6ul: Move csi_sel mux to correct base register
    - ASoC: nau8824: Add DMI quirk mechanism for active-high jack-detect
    - scsi: advansys: Fix kernel pointer leak
    - firmware_loader: fix pre-allocated buf built-in firmware use
    - ARM: dts: omap: fix gpmc,mux-add-data type
    - usb: host: ohci-tmio: check return value after calling
      platform_get_resource()
    - ARM: dts: ls1021a: move thermal-zones node out of soc/
    - ARM: dts: ls1021a-tsn: use generic "jedec,spi-nor" compatible for flash
    - ALSA: ISA: not for M68K
    - tty: tty_buffer: Fix the softlockup issue in flush_to_ldisc
    - MIPS: sni: Fix the build
    - scsi: target: Fix ordered tag handling
    - scsi: target: Fix al...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :
Download full text (74.6 KiB)

This bug was fixed in the package linux - 5.13.0-28.31

---------------
linux (5.13.0-28.31) impish; urgency=medium

  * amd_sfh: Null pointer dereference on early device init causes early panic
    and fails to boot (LP: #1956519)
    - HID: amd_sfh: Fix potential NULL pointer dereference

  * impish: ddebs build take too long and times out (LP: #1957810)
    - [Packaging] enforce xz compression for ddebs

  * audio mute/ mic mute are not working on a HP machine (LP: #1955691)
    - ALSA: hda/realtek: fix mute/micmute LEDs for a HP ProBook

  * rtw88_8821ce causes freeze (LP: #1927808)
    - rtw88: Disable PCIe ASPM while doing NAPI poll on 8821CE

  * alsa/sdw: fix the audio sdw codec parsing logic in the acpi table
    (LP: #1955686)
    - ALSA: hda: intel-sdw-acpi: harden detection of controller
    - ALSA: hda: intel-sdw-acpi: go through HDAS ACPI at max depth of 2

  * icmp_redirect from selftests fails on F/kvm (unary operator expected)
    (LP: #1938964)
    - selftests: icmp_redirect: pass xfail=0 to log_test()

  * Impish update: upstream stable patchset 2021-12-17 (LP: #1955180)
    - arm64: zynqmp: Do not duplicate flash partition label property
    - arm64: zynqmp: Fix serial compatible string
    - ARM: dts: sunxi: Fix OPPs node name
    - arm64: dts: allwinner: h5: Fix GPU thermal zone node name
    - arm64: dts: allwinner: a100: Fix thermal zone node name
    - staging: wfx: ensure IRQ is ready before enabling it
    - ARM: dts: NSP: Fix mpcore, mmc node names
    - scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()
    - arm64: dts: rockchip: Disable CDN DP on Pinebook Pro
    - arm64: dts: hisilicon: fix arm,sp805 compatible string
    - RDMA/bnxt_re: Check if the vlan is valid before reporting
    - bus: ti-sysc: Add quirk handling for reinit on context lost
    - bus: ti-sysc: Use context lost quirk for otg
    - usb: musb: tusb6010: check return value after calling
      platform_get_resource()
    - usb: typec: tipd: Remove WARN_ON in tps6598x_block_read
    - ARM: dts: ux500: Skomer regulator fixes
    - staging: rtl8723bs: remove possible deadlock when disconnect (v2)
    - ARM: BCM53016: Specify switch ports for Meraki MR32
    - arm64: dts: qcom: msm8998: Fix CPU/L2 idle state latency and residency
    - arm64: dts: qcom: ipq6018: Fix qcom,controlled-remotely property
    - arm64: dts: freescale: fix arm,sp805 compatible string
    - ASoC: SOF: Intel: hda-dai: fix potential locking issue
    - clk: imx: imx6ul: Move csi_sel mux to correct base register
    - ASoC: nau8824: Add DMI quirk mechanism for active-high jack-detect
    - scsi: advansys: Fix kernel pointer leak
    - ALSA: intel-dsp-config: add quirk for APL/GLK/TGL devices based on ES8336
      codec
    - firmware_loader: fix pre-allocated buf built-in firmware use
    - ARM: dts: omap: fix gpmc,mux-add-data type
    - usb: host: ohci-tmio: check return value after calling
      platform_get_resource()
    - ARM: dts: ls1021a: move thermal-zones node out of soc/
    - ARM: dts: ls1021a-tsn: use generic "jedec,spi-nor" compatible for flash
    - ALSA: ISA: not for M68K
    - tty: tty_buffer: Fix the softlockup issue in flush_to_ldisc
    - MIPS: sni:...

Changed in linux (Ubuntu Impish):
status: Fix Committed → Fix Released
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-ibm-5.4/5.4.0-1014.15~18.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.