Warnings are reported by the preempt-rt kernel when dumping vmcore

Bug #1997932 reported by Jiping Ma
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Jiping Ma
Milestone: none

Bug Description

Brief Description

The preempt-rt kernel reports warnings when dumping vmcore files because kernel command line arguments such as nohz_full=, isolcpus=, and rcu_nocbs= are passed to the kexec/kdump kernel.

Severity

Minor: the vmcore dump appears to work, but the warnings may cause confusion.

Steps to Reproduce

I was able to reproduce this issue with the 5.10.74 preempt-rt kernel. The kdump kernel's serial console output is shown below, followed by the reproduction steps:

[ 9.724046] io scheduler kyber registered
         Starting udev Kernel Device Manager...
[ 9.724064] io scheduler bfq registered
[ 9.735536] ------------[ cut here ]------------
[ 9.735537] WARNING: CPU: 0 PID: 0 at kernel/time/tick-sched.c:139 tick_sched_do_timer+0x5e/0x70
[ 9.735543] Modules linked in:
[ 9.735545] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.10.74-200.1638.tis.rt.el7.x86_64 #1
[ 9.735547] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0016.033120161139 03/31/2016
[ 9.735548] RIP: 0010:tick_sched_do_timer+0x5e/0x70
[ 9.735550] Code: 01 00 75 26 89 15 36 74 6f 01 48 8b 05 2b 87 d5 01 48 89 f1 48 29 c1 48 3b 0d 5e 85 d5 01 7c cb 48 89 f7 e8 64 fe ff ff eb c1 <0f> 0b eb d6 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
[ 9.735552] RSP: 0000:ffffc90000003ef0 EFLAGS: 00010002
[ 9.735553] RAX: 00000000ffffffff RBX: ffff888078c1ad00 RCX: 0000000000000018
[ 9.735554] RDX: 0000000000000000 RSI: 0000000243e95b9b RDI: ffff888078c1ad00
[ 9.735555] RBP: ffffffff82603e08 R08: 0000000000000000 R09: 0000000000ed744d
[ 9.735556] R10: 0000ecd267612d86 R11: 0000000000000000 R12: 0000000243e95b9b
[ 9.735557] R13: ffffffff81160f30 R14: 000000000000000f R15: ffff888078c1a540
[ 9.735558] FS: 0000000000000000(0000) GS:ffff888078c00000(0000) knlGS:0000000000000000
[ 9.735559] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.735560] CR2: 0000000000000000 CR3: 0000000077610001 CR4: 00000000003706b0
[ 9.735561] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 9.735562] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 9.735563] Call Trace:
[ 9.735564] <IRQ>
[ 9.735566] tick_sched_timer+0x27/0x80
[ 9.735568] __hrtimer_run_queues+0x10b/0x360
[ 9.735572] hrtimer_interrupt+0x100/0x210
[ 9.735574] __sysvec_apic_timer_interrupt+0x5d/0x160
[ 9.735578] asm_call_irq_on_stack+0xf/0x20
[ 9.735581] </IRQ>
[ 9.735582] sysvec_apic_timer_interrupt+0x73/0x80
[ 9.735584] asm_sysvec_apic_timer_interrupt+0x12/0x20
[ 9.735586] RIP: 0010:mwait_idle+0x6d/0x90
[ 9.735588] Code: 0f 01 c8 48 8b 10 83 e2 08 75 21 48 8b 00 a9 00 00 08 00 75 17 e9 07 00 00 00 0f 00 2d 3e 8a 5f 00 31 c0 48 89 c1 fb 0f 01 c9 <eb> 01 fb 65 48 8b 04 25 80 6d 01 00 3e 80 60 02 df c3 0f ae f0 0f
[ 9.735590] RSP: 0000:ffffffff82603eb8 EFLAGS: 00000246
[ 9.735591] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 9.735592] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffffffff82618940
[ 9.735592] RBP: ffffffff82858140 R08: 0000000000000000 R09: 0000000000026b40
[ 9.735593] R10: 0000ecd26673cfca R11: 00000000fa83b2da R12: 0000000000000000
[ 9.735594] R13: 0000000000000000 R14: ffffffffffffffff R15: ffffffff82618940
[ 9.735595] default_idle_call+0x3b/0x140
[ 9.735598] do_idle+0x23b/0x2f0
[ 9.735600] cpu_startup_entry+0x19/0x20
[ 9.735601] start_kernel+0x558/0x57b
[ 9.735605] secondary_startup_64_no_verify+0xc2/0xcb
[ 9.735609] ---[ end trace 0000000000000001 ]---

First, log in to the target's serial console and capture its output with script(1) or another logging tool.

Then run the following command on the target to trigger a controlled kernel crash (sysrq "c"), which makes the system boot into the kdump kernel:

echo c | sudo tee /proc/sysrq-trigger
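Once the kdump kernel has finished, the captured console log can be checked for the warning. The Python sketch below is one way to do that; the regular expression, the function name find_warnings, and the log file name used in the usage note are illustrative assumptions, not part of this report or of StarlingX.

```python
import re
from typing import Iterable, List

# Illustrative pattern for the first line of a kernel WARNING splat, e.g.
# "[ 9.735537] WARNING: CPU: 0 PID: 0 at kernel/time/tick-sched.c:139 ..."
# The group captures the "file:line" location that raised the warning.
WARNING_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+:\d+)")


def find_warnings(lines: Iterable[str]) -> List[str]:
    """Return the source location of every kernel WARNING found in lines."""
    locations = []
    for line in lines:
        # search() rather than match(): output from other programs (such as
        # systemd status messages) may be interleaved in front of the kernel
        # message on the same console line.
        match = WARNING_RE.search(line)
        if match:
            locations.append(match.group(1))
    return locations
```

For example, `with open("console.log", errors="replace") as log: print(find_warnings(log))` would be expected to report `kernel/time/tick-sched.c:139` for the log above, and nothing once the fix is applied.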

Expected Behavior

The kdump kernel should not report warnings when dumping the crashed kernel's vmcore.

Actual Behavior

The warnings shown in the console output above are emitted by the kdump kernel.

Reproducibility

100% reproducible

System Configuration

Branch/Pull Time/Commit

Unknown

Last Pass

Unknown; I can confirm with a 3.10 kernel as well if needed.

Timestamp/Logs

Please see above.

Alarms

None.

Test Activity

Developer Testing

Workaround

None.

Jiping Ma (jma11)
Changed in starlingx:
assignee: nobody → Jiping Ma (jma11)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to integ (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/integ/+/865627

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to integ (master)

Reviewed: https://review.opendev.org/c/starlingx/integ/+/865627
Committed: https://opendev.org/starlingx/integ/commit/aafd8aba4893579c9b3d361afdd0a7b50b7f4406
Submitter: "Zuul (22348)"
Branch: master

commit aafd8aba4893579c9b3d361afdd0a7b50b7f4406
Author: Jiping Ma <email address hidden>
Date: Thu Nov 24 21:42:20 2022 -0500

    CentOS: kdump: remove unnecessary bootargs

    The 5.10.74 preempt-rt kernel reports the following warning when
    dumping vmcore files due to the use of kernel command line arguments
    such as nohz_full=, isolcpus=, rcu_nocbs= with the kexec/kdump kernel.

    [ 1.568059] WARNING: CPU: 0 PID: 0 at kernel/time/tick-sched.c:139
    tick_sched_do_timer+0x5e/0x70
    [ 1.568064] Modules linked in:
    [ 1.568066] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G I
    5.10.74-200.1648.tis.rt.el7.x86_64 #1
    [ 1.568068] Hardware name: Dell Inc. PowerEdge R740/0WRPXK, BIOS
    2.10.2 02/24/2021
    [ 1.568068] RIP: 0010:tick_sched_do_timer+0x5e/0x70
    [ 1.568071] Code: 01 00 75 26 89 15 26 74 6f 01 48 8b 05 1b 87 d5 01

    Commit 1655ee30e6 ("sched/isolation: really align nohz_full with
    rcu_nocbs"), which is included in the 5.10.112 kernel, fixed the
    warning. Therefore, the warning cannot be reproduced with 5.10.112
    and later kernel versions.

    Removing the irqaffinity, isolcpus, nohz_full, rcu_nocbs, and
    kthread_cpus arguments from the kdump kernel's command line also
    fixes the issue.

    Testing:
    - An ISO image can be built successfully.
    - There are no warnings after the fix with 5.10.74 kernel.

    Closes-Bug: 1997932

    Signed-off-by: M. Vefa Bicakci <email address hidden>
    Signed-off-by: Jiping Ma <email address hidden>
    Reported-by: M. Vefa Bicakci <email address hidden>
    Change-Id: I7d1dbd864fdfe2533197084d7274ef6ab70892db
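The commit message does not show how the arguments are removed, so the following Python sketch only illustrates the idea under that caveat: it drops the irqaffinity, isolcpus, nohz_full, rcu_nocbs, and kthread_cpus arguments (with or without a value) from a command-line string such as the contents of /proc/cmdline. The function name strip_isolation_args is hypothetical and is not the StarlingX implementation.

```python
# Arguments that the commit above drops from the kdump kernel's command line.
ISOLATION_ARGS = frozenset(
    {"irqaffinity", "isolcpus", "nohz_full", "rcu_nocbs", "kthread_cpus"}
)


def strip_isolation_args(cmdline: str) -> str:
    """Return cmdline without any `name` or `name=value` token whose name
    is listed in ISOLATION_ARGS.

    Simplification: tokens are split on whitespace, so quoted values that
    contain spaces are not handled.
    """
    kept = []
    for token in cmdline.split():
        # The parameter name is everything before the first '=' (or the
        # whole token for flag-style arguments such as "ro").
        name = token.split("=", 1)[0]
        if name not in ISOLATION_ARGS:
            kept.append(token)
    return " ".join(kept)


print(strip_isolation_args(
    "root=/dev/sda2 ro isolcpus=1-3 nohz_full=1-3 rcu_nocbs=1-3 "
    "irqaffinity=0 kthread_cpus=0 console=ttyS0,115200"
))
# prints: root=/dev/sda2 ro console=ttyS0,115200
```

Filtering on the parameter name rather than on whole tokens means the CPU lists configured for the crashed kernel do not need to be known in advance.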

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low
tags: added: stx.8.0 stx.distro.other