Kernel crash vmcore controller reboot related to BIOS/Kernel CVE

Bug #2031597 reported by Jiping Ma
Affects     Status         Importance   Assigned to   Milestone
StarlingX   Fix Released   Medium       Jiping Ma

Bug Description

Brief Description

Standby controller crashes with a vmcore generated.

A vmcore was generated on controller-1 of Radio site 44895, running 21.12 Patch6 and Samsung R2.

This might be related to the Spectre/Meltdown mitigation (BIOS/kernel CVE) level of the BIOS and kernel. It happens when pods/containers are using RT cores.
Information has been requested from Dell on the BIOS/kernel CVE level of the Dell XR11 servers.
Interestingly, after the incident we saw an excessive etcd slowdown on controller-0; the smu-eru database shows a steep drop in free memory after 23:29:06.

Severity

Major: service-impacting reboot.

Steps to Reproduce

Experienced on a Radio site running StarlingX on controller-1.
More sites might be affected; this JIRA will be updated as soon as that is discovered.

Expected Behavior

The system should protect itself from rebooting and instead crash the application process that is causing the issue in the kernel.

Actual Behavior

The system crashes and a vmcore is generated.

Reproducibility

Such a vmcore has been seen 5 times.
Free memory being consumed at 2 MB per second was seen on another site, on 88807 worker-1 and 96090 controller-1.
etcd slowdowns were seen on almost all sites.

kernel BUG at kernel/locking/rtmutex.c:1331!
invalid opcode: 0000 [#1] PREEMPT_RT SMP NOPTI
......
Call Trace:
 rt_spin_lock_slowlock_locked+0xb2/0x2a0
 ? update_load_avg+0x80/0x690
 rt_spin_lock_slowlock+0x50/0x80
 ? update_load_avg+0x80/0x690
 rt_spin_lock+0x2a/0x30
 free_unref_page+0xc5/0x280
 __vunmap+0x17f/0x240
 put_task_stack+0xc6/0x130
 __put_task_struct+0x3d/0x180
 rt_mutex_adjust_prio_chain+0x365/0x7b0
 task_blocks_on_rt_mutex+0x1eb/0x370
 rt_spin_lock_slowlock_locked+0xb2/0x2a0
 rt_spin_lock_slowlock+0x50/0x80
 rt_spin_lock+0x2a/0x30
 free_unref_page_list+0x128/0x5e0
 release_pages+0x2b4/0x320
 tlb_flush_mmu+0x44/0x150
 tlb_finish_mmu+0x3c/0x70
 zap_page_range+0x12a/0x170
 ? find_vma+0x16/0x70
 do_madvise+0x99d/0xba0
 ? do_epoll_wait+0xa2/0xe0
 ? __x64_sys_madvise+0x26/0x30
 __x64_sys_madvise+0x26/0x30
 do_syscall_64+0x33/0x40
 entry_SYSCALL_64_after_hwframe+0x44/0xa9

Jiping Ma (jma11)
Changed in starlingx:
assignee: nobody → Jiping Ma (jma11)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kernel (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/kernel/+/891652

Changed in starlingx:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to kernel (master)

Reviewed: https://review.opendev.org/c/starlingx/kernel/+/891652
Committed: https://opendev.org/starlingx/kernel/commit/b541465cc394fb7ca765a452908f0077d6c33e80
Submitter: "Zuul (22348)"
Branch: master

commit b541465cc394fb7ca765a452908f0077d6c33e80
Author: Jiping Ma <email address hidden>
Date: Wed Aug 16 21:28:54 2023 -0700

    kernel-rt: beware of __put_task_struct() calling context

    Under PREEMPT_RT, __put_task_struct() indirectly acquires sleeping
    locks. Therefore, it can't be called from a non-preemptible context.

    Instead of calling __put_task_struct() directly, we defer it using
    call_rcu(). A more natural approach would use a workqueue, but since
    in PREEMPT_RT, we can't allocate dynamic memory from atomic context,
    the code would become more complex because we would need to put the
    work_struct instance in the task_struct and initialize it when we
    allocate a new task_struct.

    We hit the same panic 5 times: __put_task_struct() is called while
    the process holds a lock, which triggers the kernel BUG_ON. The call
    trace is below.

    We also need to cherry-pick the following commits, because the
    necessary context is not in 5.10.18x; for example, there is no
    definition of DEFINE_WAIT_OVERRIDE_MAP.

    * commit 5f2962401c6e
      ("locking/lockdep: Exclude local_lock_t from IRQ inversions")
    * commit 175b1a60e880
      ("locking/lockdep: Clean up check_redundant() a bit")
    * commit bc2dd71b2836
      ("locking/lockdep: Add a skip() function to __bfs()")
    * commit 0cce06ba859a
      ("debugobjects,locking: Annotate debug_object_fill_pool() wait type
       violation")

    kernel BUG at kernel/locking/rtmutex.c:1331!
    invalid opcode: 0000 [#1] PREEMPT_RT SMP NOPTI
    ......
    Call Trace:
     rt_spin_lock_slowlock_locked+0xb2/0x2a0
     ? update_load_avg+0x80/0x690
     rt_spin_lock_slowlock+0x50/0x80
     ? update_load_avg+0x80/0x690
     rt_spin_lock+0x2a/0x30
     free_unref_page+0xc5/0x280
     __vunmap+0x17f/0x240
     put_task_stack+0xc6/0x130
     __put_task_struct+0x3d/0x180
     rt_mutex_adjust_prio_chain+0x365/0x7b0
     task_blocks_on_rt_mutex+0x1eb/0x370
     rt_spin_lock_slowlock_locked+0xb2/0x2a0
     rt_spin_lock_slowlock+0x50/0x80
     rt_spin_lock+0x2a/0x30
     free_unref_page_list+0x128/0x5e0
     release_pages+0x2b4/0x320
     tlb_flush_mmu+0x44/0x150
     tlb_finish_mmu+0x3c/0x70
     zap_page_range+0x12a/0x170
     ? find_vma+0x16/0x70
     do_madvise+0x99d/0xba0
     ? do_epoll_wait+0xa2/0xe0
     ? __x64_sys_madvise+0x26/0x30
     __x64_sys_madvise+0x26/0x30
     do_syscall_64+0x33/0x40
     entry_SYSCALL_64_after_hwframe+0x44/0xa9

    Verification:
    - build-pkgs; build-iso; install and boot up on aio-sx lab.
    - Cannot reproduce the issue during the stress-ng test for almost 24 hours.
      while true; do sudo stress-ng --sched rr --mmapfork 23 -t 20; done
      while true; do sudo stress-ng --sched fifo --mmapfork 23 -t 20; done

    Closes-Bug: 2031597
    Signed-off-by: Jiping Ma <email address hidden>
    Change-Id: If022441d61492eaec88eede8603a6cb052af99d1
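
The core of the fix described in the commit message above is to defer __put_task_struct() with call_rcu() whenever the last reference is dropped from a non-preemptible context on PREEMPT_RT. The following is a minimal sketch of that approach for illustration only; the actual kernel-rt patch (which also adds a DEFINE_WAIT_OVERRIDE_MAP-based lockdep annotation, hence the cherry-picked lockdep commits) may differ in detail.

/* Illustration only: sketch of the deferral approach described above.
 * In the real kernel this logic lives around put_task_struct() in
 * include/linux/sched/task.h and kernel/fork.c. */

/* RCU callback: runs in process context, where taking the sleeping
 * locks needed to free the task stack and task_struct is safe. */
void __put_task_struct_rcu_cb(struct rcu_head *rhp)
{
	struct task_struct *task = container_of(rhp, struct task_struct, rcu);

	__put_task_struct(task);
}

static inline void put_task_struct(struct task_struct *t)
{
	if (!refcount_dec_and_test(&t->usage))
		return;

	if (!IS_ENABLED(CONFIG_PREEMPT_RT) || preemptible()) {
		/* Preemptible (or non-RT) context: freeing immediately is fine. */
		__put_task_struct(t);
		return;
	}

	/*
	 * On PREEMPT_RT, __put_task_struct() indirectly acquires sleeping
	 * locks (e.g. via free_unref_page() when releasing the task stack,
	 * as in the call trace above), so it must not run while preemption
	 * is disabled, such as inside the rt_mutex priority-chain walk.
	 * Defer the final free to process context instead.
	 */
	call_rcu(&t->rcu, __put_task_struct_rcu_cb);
}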

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Medium
tags: added: stx.9.0 stx.distro.other