focal: 5.15.0-91 crashes on boot as Xen PV guest

Bug #2045248 reported by James Dingwall
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux-meta (Ubuntu) | New | Undecided | Unassigned | |

Bug Description

We have a custom build of the kernel based on the Ubuntu-hwe-5.15-5.15.0-91.101_20.04.1 tag. It includes a small number of patches, but nothing in the area of the early boot code. Xen is based on the upstream 4.15.5 stable branch with all patches up to and including XSA-444. In approximately 1% of PV guest boots we get the following crash, which looks like it involves the entry_64.S code. We have seen this on several hardware models, all with Intel processors; we have no AMD-based systems to compare against. The problem was also observed with the 5.15.0-85 tag.

I have looked through the mainline kernel branch for arch/x86/entry changes, but based on the commit messages I can't obviously connect this problem to anything there. I don't have the knowledge to understand the code itself, though, or to judge whether any of those changes is actually relevant.

```
[ 0.303715] Spectre V1 : Mitigation: usercopy/swapgs barriers and __user pointer sanitization
[ 0.303727] Spectre V2 : Mitigation: Enhanced IBRS
[ 0.303733] Spectre V2 : Spectre v2 / SpectreRSB mitigation: Filling RSB on context switch
[ 0.303740] Spectre V2 : Spectre v2 / PBRSB-eIBRS: Retire a single CALL on VMEXIT
[ 0.303746] RETBleed: Mitigation: Enhanced IBRS
[ 0.303752] Spectre V2 : mitigation: Enabling conditional Indirect Branch Prediction Barrier
[ 0.303760] Speculative Store Bypass: Mitigation: Speculative Store Bypass disabled via prctl and seccomp
[ 0.303771] MMIO Stale Data: Mitigation: Clear CPU buffers
[ 0.303777] GDS: Unknown: Dependent on hypervisor status
[ 0.303827] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[ 0.303835] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[ 0.303840] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[ 0.303846] x86/fpu: Supporting XSAVE feature 0x020: 'AVX-512 opmask'
[ 0.303851] x86/fpu: Supporting XSAVE feature 0x040: 'AVX-512 Hi256'
[ 0.303857] x86/fpu: Supporting XSAVE feature 0x080: 'AVX-512 ZMM_Hi256'
[ 0.303865] x86/fpu: xstate_offset[2]: 576, xstate_sizes[2]: 256
[ 0.303871] x86/fpu: xstate_offset[5]: 1088, xstate_sizes[5]: 64
[ 0.303877] x86/fpu: xstate_offset[6]: 1152, xstate_sizes[6]: 512
[ 0.303882] x86/fpu: xstate_offset[7]: 1664, xstate_sizes[7]: 1024
[ 0.303888] x86/fpu: Enabled xstate features 0xe7, context size is 2688 bytes, using 'standard' format.
[ 0.327588] segment-related general protection fault: e030 [#1] SMP NOPTI
[ 0.327604] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.15.0-91-generic #101~20.04.1custom1
[ 0.327614] RIP: e030:native_irq_return_iret+0x0/0x2
[ 0.327627] Code: 5b 41 5b 41 5a 41 59 41 58 58 59 5a 5e 5f 48 83 c4 08 eb 0f 0f 1f 00 90 66 66 2e 0f 1f 84 00 00 00 00 00 f6 44 24 20 04 75 02 <48> cf 57 0f 01 f8 eb 12 0f 20 df 90 90 90 90 90 48 81 e7 ff e7 ff
[ 0.327640] RSP: e02b:ffffffff82e03bc8 EFLAGS: 00010046
[ 0.327647] RAX: 0000000000000000 RBX: ffffffff82e03c30 RCX: ffffffff81e01101
[ 0.327653] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001f
[ 0.327660] RBP: ffffffff82e03bf8 R08: ffffffff81e011ef R09: 0000000000000005
[ 0.327666] R10: 0000000000000006 R11: e8ae0feb75ccff49 R12: ffffffff81e011ef
[ 0.327672] R13: 0000000000000006 R14: ffffffff81e011f1 R15: 0000000000000002
[ 0.327684] FS: 0000000000000000(0000) GS:ffff888015a00000(0000) knlGS:0000000000000000
[ 0.327691] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.327696] CR2: 0000000000000000 CR3: 0000000002e10000 CR4: 0000000000050660
[ 0.327705] Call Trace:
[ 0.327709] <TASK>
[ 0.327713] ? show_trace_log_lvl+0x1d6/0x2ea
[ 0.327723] ? show_trace_log_lvl+0x1d6/0x2ea
[ 0.327729] ? insn_decode+0xec/0x100
[ 0.327738] ? show_regs.part.0+0x23/0x29
[ 0.327743] ? __die_body.cold+0x8/0xd
[ 0.327748] ? die_addr+0x3e/0x60
[ 0.327756] ? exc_general_protection+0x1c1/0x350
[ 0.327766] ? asm_exc_general_protection+0x27/0x30
[ 0.327772] ? restore_regs_and_return_to_kernel+0x1d/0x2c
[ 0.327778] ? restore_regs_and_return_to_kernel+0x1b/0x2c
[ 0.327784] ? restore_regs_and_return_to_kernel+0x1b/0x2c
[ 0.327789] ? asm_sysvec_xen_hvm_callback+0x11/0x20
[ 0.327796] ? native_iret+0x7/0x7
[ 0.327801] ? insn_get_displacement+0x4d/0x110
[ 0.327807] insn_decode+0xec/0x100
[ 0.327813] optimize_nops+0x68/0x150
[ 0.327819] ? restore_regs_and_return_to_kernel+0x1d/0x2c
[ 0.327825] ? restore_regs_and_return_to_kernel+0x2c/0x2c
[ 0.327830] ? restore_regs_and_return_to_kernel+0x20/0x2c
[ 0.327837] apply_alternatives+0x181/0x3a0
[ 0.327843] ? restore_regs_and_return_to_kernel+0x1b/0x2c
[ 0.327848] ? fb_is_primary_device+0x25/0x73
[ 0.327855] ? restore_regs_and_return_to_kernel+0x1b/0x2c
[ 0.327861] ? apply_alternatives+0x8/0x3a0
[ 0.327867] ? fb_is_primary_device+0x6e/0x73
[ 0.327872] ? apply_returns+0xfc/0x180
[ 0.327878] ? fb_is_primary_device+0x6e/0x73
[ 0.327883] ? sanitize_boot_params.constprop.0+0xa/0xef
[ 0.327889] ? fb_is_primary_device+0x73/0x73
[ 0.327895] alternative_instructions+0xa9/0x173
[ 0.327904] arch_cpu_finalize_init+0x2c/0x51
[ 0.327909] start_kernel+0x425/0x4ce
[ 0.327916] x86_64_start_reservations+0x24/0x2a
[ 0.327922] xen_start_kernel+0x41e/0x429
[ 0.327928] startup_xen+0x3e/0x3e
[ 0.327934] </TASK>
[ 0.327937] Modules linked in:
[ 0.327943] ---[ end trace c275641b4f1eba81 ]---
[ 0.327948] RIP: e030:native_irq_return_iret+0x0/0x2
[ 0.327954] Code: 5b 41 5b 41 5a 41 59 41 58 58 59 5a 5e 5f 48 83 c4 08 eb 0f 0f 1f 00 90 66 66 2e 0f 1f 84 00 00 00 00 00 f6 44 24 20 04 75 02 <48> cf 57 0f 01 f8 eb 12 0f 20 df 90 90 90 90 90 48 81 e7 ff e7 ff
[ 0.327967] RSP: e02b:ffffffff82e03bc8 EFLAGS: 00010046
[ 0.327972] RAX: 0000000000000000 RBX: ffffffff82e03c30 RCX: ffffffff81e01101
[ 0.327978] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001f
[ 0.327984] RBP: ffffffff82e03bf8 R08: ffffffff81e011ef R09: 0000000000000005
[ 0.327990] R10: 0000000000000006 R11: e8ae0feb75ccff49 R12: ffffffff81e011ef
[ 0.327996] R13: 0000000000000006 R14: ffffffff81e011f1 R15: 0000000000000002
[ 0.328006] FS: 0000000000000000(0000) GS:ffff888015a00000(0000) knlGS:0000000000000000
[ 0.328012] CS: 10000e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 0.328018] CR2: 0000000000000000 CR3: 0000000002e10000 CR4: 0000000000050660
[ 0.328027] Kernel panic - not syncing: Attempted to kill the idle task!
```
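In the oops above, the `Code:` line marks the byte at the faulting RIP with angle brackets. The minimal sketch below splits that line at the marker and matches the next few bytes against a tiny hand-written opcode table. The table and the helper names `split_at_rip`, `decode_prefix` and `KNOWN` are illustrative only and are not a real disassembler; the kernel's own `scripts/decodecode` does a full disassembly. It shows that the faulting instruction is `48 cf` (`iretq`), which matches `RIP: e030:native_irq_return_iret+0x0/0x2`, a 2-byte symbol.

```python
# "Code:" line copied verbatim from the oops. The byte in <angle brackets>
# is the first byte of the instruction at the faulting RIP.
CODE_LINE = (
    "5b 41 5b 41 5a 41 59 41 58 58 59 5a 5e 5f 48 83 c4 08 eb 0f 0f 1f 00 90 "
    "66 66 2e 0f 1f 84 00 00 00 00 00 f6 44 24 20 04 75 02 <48> cf 57 0f 01 f8 "
    "eb 12 0f 20 df 90 90 90 90 90 48 81 e7 ff e7 ff"
)

# Hypothetical lookup table covering only the opcodes needed here.
KNOWN = {
    (0x48, 0xCF): "iretq",          # REX.W prefix + IRET
    (0x57,): "push %rdi",
    (0x0F, 0x01, 0xF8): "swapgs",
}


def split_at_rip(code_line):
    """Return (bytes_before_rip, bytes_from_rip) parsed from an oops Code: line."""
    tokens = code_line.split()
    idx = next(i for i, t in enumerate(tokens) if t.startswith("<"))
    before = [int(t, 16) for t in tokens[:idx]]
    after = [int(t.strip("<>"), 16) for t in tokens[idx:]]
    return before, after


def decode_prefix(data, count):
    """Greedily match up to `count` instructions from KNOWN at the start of data."""
    out, pos = [], 0
    while len(out) < count:
        for pattern, name in KNOWN.items():
            if tuple(data[pos:pos + len(pattern)]) == pattern:
                out.append(name)
                pos += len(pattern)
                break
        else:
            break  # unknown opcode: stop instead of guessing
    return out


if __name__ == "__main__":
    before, after = split_at_rip(CODE_LINE)
    print(f"{len(before)} bytes before RIP")
    print(decode_prefix(after, 3))
```

Run as-is, it reports 42 bytes before RIP and decodes `['iretq', 'push %rdi', 'swapgs']`. In other words, the #GP is raised by the `iretq` at the end of the return-to-kernel path, and the bytes after it belong to the following code in entry_64.S.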

```
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal

# uname -a
Linux hostname 5.15.0-91-generic #101~20.04.1custom1 SMP Thu Nov 23 12:37:35 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

# cat /proc/version_signature
Ubuntu 5.15.0-91.101~20.04.1custom1-generic 5.15.131

# xl info
host : hostname
release : 5.15.0-91-generic
version : #101~20.04.1custom1 SMP Thu Nov 23 12:37:35 UTC 2023
machine : x86_64
nr_cpus : 80
max_cpu_id : 79
nr_nodes : 2
cores_per_socket : 20
threads_per_core : 2
cpu_mhz : 2294.609
hw_caps : bfebfbff:77fef3ff:2c100800:00000121:0000000f:f3bfbfff:00405f4e:00000100
virt_caps : pv hvm hvm_directio pv_directio hap shadow iommu_hap_pt_share vmtrace
total_memory : 130523
free_memory : 79395
sharing_freed_memory : 0
sharing_used_memory : 0
outstanding_claims : 0
free_cpus : 0
xen_major : 4
xen_minor : 15
xen_extra : .5
xen_version : 4.15.5
xen_caps : xen-3.0-x86_64 hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64
xen_scheduler : credit2
xen_pagesize : 4096
platform_params : virt_start=0xffff800000000000
xen_changeset : Mon Nov 20 09:36:08 2023 +0000 git:0196200b35-dirty
xen_commandline : placeholder console=vga,com2 com2=115200,8n1 dom0_max_vcpus=4-8 dom0_mem=min:6144,max:65536m iommu=on,required,intpost,verbose,debug x2apic=off sched=credit2 flask=enforcing gnttab_max_frames=128 xpti=off smt=on cpufreq=xen:performance spec-ctrl=gds-mit=0
cc_compiler : gcc (Ubuntu 10.3.0-1ubuntu1~20.04) 10.3.0
cc_compile_by :
cc_compile_domain :
cc_compile_date : Mon Nov 20 09:37:08 UTC 2023
build_id : 986e88b638105b0dfc4ecf5c9bbb9743a61b2677
xend_config_format : 4
```

James Dingwall (a-james-launchpad) wrote:

Based on a response to my xen-devel post, I have cherry-picked the following commits into our 5.15 kernel build, and since then we have not encountered this problem:

```
6cf3e4c0d29102c74aca1ce0c1710be9d02e440e # x86/entry: Cleanup PARAVIRT
1462eb381b4c27576a3e818bc9f918765d327fdf # x86/xen: Rework the xen_{cpu,irq,mmu}_ops arrays
8b87d8cec1b31ea710568ae49ba5f5146318da0d # x86/entry,xen: Early rewrite of restore_regs_and_return_to_kernel()
bbf92368b0b1fe472d489e62d3340d7897e9c697 # x86/text-patching: Make text_gen_insn() play nice with ANNOTATE_NOENDBR
ba27d1a80871eb8dbeddf34ec7d396c149cbb8d7 # x86/ibt,paravirt: Use text_gen_insn() for paravirt_patch()
```
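Because the crash only hit roughly 1% of PV guest boots, a run of clean boots is evidence for the fix rather than proof of it. As a rough sketch, assume each boot fails independently with the reported 1% probability. The number of consecutive clean boots needed before "still broken" falls below a chosen probability is then the smallest n with (1 - p)^n <= 1 - confidence. The function name `boots_needed` is hypothetical.

```python
import math


def boots_needed(p_fail: float, confidence: float) -> int:
    """Smallest n such that (1 - p_fail) ** n <= 1 - confidence.

    Assumes every boot fails independently with probability p_fail, so
    n clean boots in a row would occur with probability (1 - p_fail) ** n
    if the bug were still present.
    """
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_fail))


if __name__ == "__main__":
    # ~1% per-boot failure rate, taken from the original report.
    for conf in (0.90, 0.95, 0.99):
        print(f"{conf:.2f}: {boots_needed(0.01, conf)} clean boots")
```

Under these assumptions, about 299 consecutive clean boots are needed before a still-present 1% bug would have had less than a 5% chance of going unnoticed.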
