Oops on boot: invalid opcode: 0000 [#1] SMP NOPTI

Bug #1942215 reported by Paolo Pisati
This bug affects 2 people
Affects          Status        Importance  Assigned to  Milestone
Linux            In Progress   Wishlist
linux (Ubuntu)   Fix Released  Undecided   Unassigned
  Impish         Won't Fix     Undecided   Unassigned

Bug Description

With the latest Impish kernel (5.13.0-15.15, from the ckt/bootstrap PPA), booting vought produces this:

...
[ 11.502916] invalid opcode: 0000 [#1] SMP NOPTI
[ 11.504249] CPU: 95 PID: 1472 Comm: systemd-udevd Not tainted 5.13.0-15-generic #15-Ubuntu
[ 11.505734] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0D.01.0395.022720191340 02/27/2019
[ 11.507260] RIP: 0010:acpi_ds_exec_end_op+0x187/0x774
[ 11.508771] Code: 77 28 48 8b 04 c5 00 9b ea 91 48 89 df ff d0 0f 1f 00 41 89 c4 e9 8f 00 00 00 0f b6 43 0d 8d 50 ff 48 63 d2 48 83 fa 09 76 02 <0f> 0b 83 c0 6c 0f b7 7b 0a 48 89 da 48 98 48 8d 34 c3 e8 c0 3c 01
[ 11.511898] RSP: 0018:ffffaaeca1a776e0 EFLAGS: 00010286
[ 11.513428] RAX: 0000000000000000 RBX: ffff8f08a7573800 RCX: 0000000000000040
[ 11.514972] RDX: ffffffffffffffff RSI: ffffffff91ea9980 RDI: 00000000000002cb
[ 11.516100] RBP: ffffaaeca1a77710 R08: 0000000000000000 R09: ffff8f08a8c84af0
[ 11.517479] R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000000
[ 11.518985] R13: ffff8f08a8c84af0 R14: 0000000000000000 R15: 0000000000000000
[ 11.520425] FS: 00007f7fb403ed00(0000) GS:ffff8f348d5c0000(0000) knlGS:0000000000000000
[ 11.521931] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.523424] CR2: 00007f7fb38d1918 CR3: 0000000129b6a002 CR4: 00000000007706e0
[ 11.524924] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 11.526221] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 11.527636] PKRU: 55555554
[ 11.528820] Call Trace:
[ 11.529807] acpi_ps_parse_loop+0x587/0x660
[ 11.531198] acpi_ps_parse_aml+0x1af/0x552
[ 11.532595] acpi_ps_execute_method+0x208/0x2ca
[ 11.533972] acpi_ns_evaluate+0x34e/0x4f0
[ 11.535361] acpi_evaluate_object+0x18e/0x3b4
[ 11.536736] acpi_evaluate_dsm+0xb3/0x120
[ 11.537943] ? acpi_evaluate_dsm+0xb3/0x120
[ 11.539214] nfit_intel_shutdown_status+0xed/0x1b0 [nfit]
[ 11.540603] acpi_nfit_add_dimm+0x3cb/0x670 [nfit]
[ 11.541990] acpi_nfit_register_dimms+0x141/0x460 [nfit]
[ 11.543377] acpi_nfit_init+0x54f/0x620 [nfit]
[ 11.544755] acpi_nfit_add+0x192/0x1f0 [nfit]
[ 11.546116] acpi_device_probe+0x49/0x170
[ 11.547431] really_probe+0x245/0x4c0
[ 11.548749] driver_probe_device+0xf0/0x160
[ 11.550064] device_driver_attach+0xab/0xb0
[ 11.551387] __driver_attach+0xb2/0x140
[ 11.552692] ? device_driver_attach+0xb0/0xb0
[ 11.554001] bus_for_each_dev+0x7e/0xc0
[ 11.555326] driver_attach+0x1e/0x20
[ 11.556630] bus_add_driver+0x135/0x1f0
[ 11.557917] driver_register+0x95/0xf0
[ 11.559226] acpi_bus_register_driver+0x39/0x50
[ 11.560139] nfit_init+0x168/0x1000 [nfit]
[ 11.561230] ? 0xffffffffc0649000
[ 11.562442] do_one_initcall+0x46/0x1d0
[ 11.563701] ? kmem_cache_alloc_trace+0x11c/0x240
[ 11.564846] do_init_module+0x62/0x290
[ 11.565768] load_module+0xaa6/0xb40
[ 11.566811] __do_sys_finit_module+0xc2/0x120
[ 11.567825] __x64_sys_finit_module+0x18/0x20
[ 11.568747] do_syscall_64+0x61/0xb0
[ 11.569694] ? syscall_exit_to_user_mode+0x27/0x50
[ 11.570680] ? __x64_sys_mmap+0x33/0x40
[ 11.571606] ? do_syscall_64+0x6e/0xb0
[ 11.572442] ? asm_exc_page_fault+0x8/0x30
[ 11.573395] entry_SYSCALL_64_after_hwframe+0x44/0xae
[ 11.574392] RIP: 0033:0x7f7fb45d670d
[ 11.575373] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d f3 66 0f 00 f7 d8 64 89 01 48
[ 11.577496] RSP: 002b:00007ffe815a56d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 11.578573] RAX: ffffffffffffffda RBX: 00005624b212e410 RCX: 00007f7fb45d670d
[ 11.579646] RDX: 0000000000000000 RSI: 00007f7fb47683fe RDI: 0000000000000006
[ 11.580712] RBP: 0000000000020000 R08: 0000000000000000 R09: 0000000000000000
[ 11.581774] R10: 0000000000000006 R11: 0000000000000246 R12: 00007f7fb47683fe
[ 11.582847] R13: 00005624b2090bf0 R14: 00005624b208f940 R15: 00005624b2096cd0
[ 11.583907] Modules linked in: nfit(+) mac_hid sch_fq_codel msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c raid1 raid0 multipath linear ast drm_vram_helper i2c_algo_bit drm_ttm_helper ttm crct10dif_pclmul drm_kms_helper crc32_pclmul syscopyarea sysfillrect ghash_clmulni_intel sysimgblt aesni_intel fb_sys_fops cec crypto_simd rc_core i40e cryptd drm i2c_i801 ahci xhci_pci lpc_ich i2c_smbus xhci_pci_renesas libahci wmi
[ 11.589096] ---[ end trace c51e80930ce46555 ]---
...

and rebooting then fails to restart the board.

Paolo Pisati (p-pisati) wrote :

Full dmesg with proper indentation is available here:

https://paste.ubuntu.com/p/qbhf4fTMdQ/

FWTS --dump tarball is here:

https://people.canonical.com/~ppisati/vough-acpi.tgz

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1942215

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: impish
Paolo Pisati (p-pisati) wrote :
Colin Ian King (colin-king) wrote :

Original source: ./drivers/acpi/acpica/dswexec.c, in acpi_ds_exec_end_op():

                if (ACPI_SUCCESS(status)) {
                        /*
                         * Dispatch the request to the appropriate interpreter handler
                         * routine. There is one routine per opcode "type" based upon the
                         * number of opcode arguments and return type.
                         */
                        status =
                            acpi_gbl_op_type_dispatch[op_type] (walk_state);
                } else {

Disassembled, with the source interleaved:

                if (!(walk_state->op_info->flags & AML_NO_OPERAND_RESOLVE)) {
 683: 48 8b 83 10 04 00 00 mov 0x410(%rbx),%rax
 68a: f6 40 11 40 testb $0x40,0x11(%rax)
 68e: 74 21 je 6b1 <acpi_ds_exec_end_op+0x177>
                         * Dispatch the request to the appropriate interpreter handler
                         * routine. There is one routine per opcode "type" based upon the
                         * number of opcode arguments and return type.
                         */
                        status =
                            acpi_gbl_op_type_dispatch[op_type] (walk_state);
 690: 44 89 f0 mov %r14d,%eax
 693: 41 80 fe 0b cmp $0xb,%r14b
 697: 77 28 ja 6c1 <acpi_ds_exec_end_op+0x187>
 699: 48 8b 04 c5 00 00 00 mov 0x0(,%rax,8),%rax
 6a0: 00
 6a1: 48 89 df mov %rbx,%rdi
 6a4: e8 00 00 00 00 call 6a9 <acpi_ds_exec_end_op+0x16f>
 6a9: 41 89 c4 mov %eax,%r12d
 6ac: e9 8f 00 00 00 jmp 740 <acpi_ds_exec_end_op+0x206>
                                                            [walk_state->
 6b1: 0f b6 43 0d movzbl 0xd(%rbx),%eax
                                                             num_operands - 1]),
 6b5: 8d 50 ff lea -0x1(%rax),%edx
                                                            [walk_state->
 6b8: 48 63 d2 movslq %edx,%rdx
 6bb: 48 83 fa 09 cmp $0x9,%rdx
 6bf: 76 02 jbe 6c3 <acpi_ds_exec_end_op+0x189>
 6c1: 0f 0b ud2

^^ the crash is on the 0f 0b (ud2) instruction

According to https://mudongliang.github.io/x86/html/file_module_x86_id_318.html, ud2 does the following:

"Generates an invalid opcode. This instruction is provided for software testing to explicitly generate an invalid opcode. The opcode for this instruction is reserved for this purpose.

Other than raising the invalid opcode exception, this instruction is the same as the NOP instruction."

Colin Ian King (colin-king) wrote :

I suspect that op_type is out of range in the dispatcher call status = acpi_gbl_op_type_dispatch[op_type](walk_state).

In the upstream bug report, Colin Ian King (colin-king) wrote :

Source: source/components/dispatcher/dswexec.c
Function: AcpiDsExecEndOp

The following call can be problematic if OpType is out of range:

        if (ACPI_SUCCESS (Status))
        {
            /*
             * Dispatch the request to the appropriate interpreter handler
             * routine. There is one routine per opcode "type" based upon the
             * number of opcode arguments and return type.
             */
            Status = AcpiGbl_OpTypeDispatch[OpType] (WalkState);
        }

In the Linux kernel, an invalid OpType has been observed to trigger a trap: the out-of-range OpType was caught at run time. Newer versions of gcc can generate a trap (a ud2 opcode) on an out-of-range dispatch call, causing kernel oopses such as:

[ 11.507260] RIP: 0010:acpi_ds_exec_end_op+0x187/0x774
[ 11.508771] Code: 77 28 48 8b 04 c5 00 9b ea 91 48 89 df ff d0 0f 1f 00 41 89 c4 e9 8f 00 00 00 0f b6 43 0d 8d 50 ff 48 63 d2 48 83 fa 09 76 02 <0f> 0b 83 c0 6c 0f b7 7b 0a 48 89 da 48 98 48 8d 34 c3 e8 c0 3c 01
[ 11.511898] RSP: 0018:ffffaaeca1a776e0 EFLAGS: 00010286
[ 11.513428] RAX: 0000000000000000 RBX: ffff8f08a7573800 RCX: 0000000000000040
[ 11.514972] RDX: ffffffffffffffff RSI: ffffffff91ea9980 RDI: 00000000000002cb
[ 11.516100] RBP: ffffaaeca1a77710 R08: 0000000000000000 R09: ffff8f08a8c84af0
[ 11.517479] R10: 0000000000000000 R11: 0000000000000003 R12: 0000000000000000
[ 11.518985] R13: ffff8f08a8c84af0 R14: 0000000000000000 R15: 0000000000000000
[ 11.520425] FS: 00007f7fb403ed00(0000) GS:ffff8f348d5c0000(0000) knlGS:0000000000000000
[ 11.521931] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 11.523424] CR2: 00007f7fb38d1918 CR3: 0000000129b6a002 CR4: 00000000007706e0
[ 11.524924] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 11.526221] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 11.527636] PKRU: 55555554
[ 11.528820] Call Trace:
[ 11.529807] acpi_ps_parse_loop+0x587/0x660
[ 11.531198] acpi_ps_parse_aml+0x1af/0x552
[ 11.532595] acpi_ps_execute_method+0x208/0x2ca
[ 11.533972] acpi_ns_evaluate+0x34e/0x4f0
[ 11.535361] acpi_evaluate_object+0x18e/0x3b4
[ 11.536736] acpi_evaluate_dsm+0xb3/0x120
[ 11.537943] ? acpi_evaluate_dsm+0xb3/0x120
[ 11.539214] nfit_intel_shutdown_status+0xed/0x1b0 [nfit]
[ 11.540603] acpi_nfit_add_dimm+0x3cb/0x670 [nfit]
[ 11.541990] acpi_nfit_register_dimms+0x141/0x460 [nfit]
[ 11.543377] acpi_nfit_init+0x54f/0x620 [nfit]
[ 11.544755] acpi_nfit_add+0x192/0x1f0 [nfit]
[ 11.546116] acpi_device_probe+0x49/0x170

I strongly suggest adding an out-of-bounds sanity check on OpType before calling the dispatcher.

Colin Ian King (colin-king) wrote :

Potential workaround/fix attached

tags: added: patch
Paolo Pisati (p-pisati) wrote :

a478d9baea4e UBUNTU: [Config] Enable CONFIG_UBSAN_BOUNDS

is the commit that triggers the above "invalid opcode" on boot (and makes it impossible to reboot the box).

Andrey Melnikov (temnota-am) wrote :

Not sufficient.
There is another BUG() hidden at line 398:

status = acpi_ex_resolve_operands(walk_state->opcode, &(walk_state->operands[walk_state->num_operands - 1]), walk_state);

in the indexing of the `walk_state->operands[]` array.

After adding the same kind of guard for walk_state->operands[]:
if (walk_state->num_operands - 1 >= ARRAY_SIZE(walk_state->operands)) {
   ACPI_ERROR((AE_INFO, "Too many operands 0x%X for op_type 0x%X", walk_state->num_operands - 1, op_type));
   status = AE_AML_BAD_OPCODE;
   goto cleanup;
}

I got this in dmesg:

-- cut--

[ 1.121664] acpi ABCD0000:00: ACPI dock station (docks/bays count: 1)
[ 1.125182] ACPI: PM: Power Resource [PX06]
[ 1.125182] ACPI Error: Too many operands 0xFFFFFFFF for op_type 0x0 (20210604/dswexec-397)
[ 1.125182] No Local Variables are initialized for Method [RREG]
[ 1.125311] Initialized Arguments for Method [RREG]: (3 arguments defined for method invocation)
[ 1.125450] Arg0: 000000002d6b3afd <Obj> Integer 00000000FE028000
[ 1.125588] Arg1: 0000000078d25d8c <Obj> Integer 0000000000000001
[ 1.125591] Arg2: 000000000bca9f52 <Obj> Integer 0000000000000000
[ 1.125591] ACPI Error: Aborting method \_SB.PCI0.GEXP.RREG due to previous error (AE_AML_BAD_OPCODE) (20210604/psparse-529)
[ 1.125591] ACPI Error: Aborting method \_SB.PCI0.GEXP.CSER due to previous error (AE_AML_BAD_OPCODE) (20210604/psparse-529)
[ 1.125591] ACPI Error: Aborting method \_SB.PCI0.GEXP.GEPS due to previous error (AE_AML_BAD_OPCODE) (20210604/psparse-529)
[ 1.125591] ACPI Error: Aborting method \_SB.PCI0.XHC.RHUB.HS06.PX06._STA due to previous error (AE_AML_BAD_OPCODE) (20210604/psparse-529)
[ 1.125591] ACPI Error: Too many operands 0xFFFFFFFF for op_type 0x0 (20210604/dswexec-397)
[ 1.125591] No Local Variables are initialized for Method [RREG]
[ 1.125591] Initialized Arguments for Method [RREG]: (3 arguments defined for method invocation)
[ 1.125591] Arg0: 000000006c708c99 <Obj> Integer 00000000FE028000
[ 1.125703] Arg1: 0000000078d25d8c <Obj> Integer 0000000000000001
[ 1.125838] Arg2: 00000000d8c7f611 <Obj> Integer 0000000000000000
[ 1.126062] ACPI Error: Aborting method \_SB.PCI0.GEXP.RREG due to previous error (AE_AML_BAD_OPCODE) (20210604/psparse-529)
[ 1.126213] ACPI Error: Aborting method \_SB.PCI0.GEXP.CSER due to previous error (AE_AML_BAD_OPCODE) (20210604/psparse-529)
[ 1.126366] ACPI Error: Aborting method \_SB.PCI0.GEXP.GEPS due to previous error (AE_AML_BAD_OPCODE) (20210604/psparse-529)
[ 1.126517] ACPI Error: Aborting method \_SB.PCI0.XHC.RHUB.HS06.PX06._STA due to previous error (AE_AML_BAD_OPCODE) (20210604/psparse-529)

-- cut--

This is a classic underflow.

Paolo Pisati (p-pisati) wrote :

Thanks Andrey, have you already upstreamed this patch?

Paolo Pisati (p-pisati) wrote :
In the upstream bug report, Colin Ian King (colin-king) wrote :

There is a similar array underflow error in the code here:

        if (!(WalkState->OpInfo->Flags & AML_NO_OPERAND_RESOLVE))
        {
            /* Resolve all operands */

            Status = AcpiExResolveOperands (WalkState->Opcode,
                &(WalkState->Operands [WalkState->NumOperands -1]),
                WalkState);
        }

In one specific case WalkState->NumOperands - 1 is -1, causing an oops.

See comment #8 in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1942215. A check along the following lines is required:

if (walk_state->num_operands - 1 >= ARRAY_SIZE(walk_state->operands)) {
   ACPI_ERROR((AE_INFO, "Too many operands 0x%X for op_type 0x%X", walk_state->num_operands - 1, op_type));
   status = AE_AML_BAD_OPCODE;
   goto cleanup;
}

Perhaps a walk_state->num_operands < 1 check is also required, as the above fix only handles 0 - 1 by way of its wraparound to 0xffffffff, which it then reports as too many operands.

Colin Ian King (colin-king) wrote :

It may be worth rewriting the check as follows to avoid relying on the underflow wraparound:

if (walk_state->num_operands < 1 || walk_state->num_operands >= ARRAY_SIZE(walk_state->operands) + 1) {
   ACPI_ERROR((AE_INFO, "Illegal number of operands 0x%X for op_type 0x%X", walk_state->num_operands - 1, op_type));
   status = AE_AML_BAD_OPCODE;
   goto cleanup;
}

Note: if a fix is sent upstream, it will be NACKed because of the ARRAY_SIZE() macro: the ACPICA core does not provide this macro, and this code is derived from the ACPICA code base. I've added notes to the upstream ACPICA bug report so that it gets fixed in line with their coding standards.

In the upstream bug report, Robert Moore (robert-moore) wrote :

Colin,
I think there are many places where an invalid opcode will cause problems - it is simply such a serious problem that it is nearly impossible to catch all of the possible problems created.

In the upstream bug report, Robert Moore (robert-moore) wrote :

In any case, does the DSDT posted in comment #4 show the problem?

In the upstream bug report, Colin Ian King (colin-king) wrote :

I realize that broken AML will exist, but I do think range checking of invalid data that would otherwise cause out-of-bounds array accesses is useful. I suspect we will see more of these errors when distros enable run-time sanitizer checks such as CONFIG_UBSAN in Linux.

It's also not impossible to do. In a previous role I worked on input bit checking for an mpeg2 decoder, so that no out-of-range, corrupted or invalid data would cause out-of-range lookups, with minimal performance overhead. It took me months, but we fixed a whole set of bugs and made the code more resilient to bad input.

Yes, the DSDT tripped the issue in question.

Paolo Pisati (p-pisati) wrote :

An Impish 5.13 test kernel is available here:

https://people.canonical.com/~ppisati/lp1942215/

linux-image-unsigned-5.13.0-16-generic_5.13.0-16.16~lp1942215_amd64.deb
linux-modules-5.13.0-16-generic_5.13.0-16.16~lp1942215_amd64.deb
linux-modules-extra-5.13.0-16-generic_5.13.0-16.16~lp1942215_amd64.deb

Changed in linux:
importance: Unknown → Wishlist
status: Unknown → In Progress
Andrey Melnikov (temnota-am) wrote :

> Thanks Andrey, have you already upstreamed this patch?

No. I transformed it into:
- if (!(walk_state->op_info->flags & AML_NO_OPERAND_RESOLVE)) {
+ if (!(walk_state->op_info->flags & AML_NO_OPERAND_RESOLVE) && walk_state->num_operands) {

and successfully ran the kernel.
If I return an error here, ACPI leaves the hardware in an unusable state.

In the upstream bug report, Andrey Melnikov (temnota-am) wrote :

Created attachment 1158
broken dsdt from notebook

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-17.17

---------------
linux (5.15.0-17.17) jammy; urgency=medium

  * jammy/linux: 5.15.0-17.17 -proposed tracker (LP: #1957809)

 -- Andrea Righi <email address hidden> Thu, 13 Jan 2022 17:11:21 +0100

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-oracle-5.15/5.15.0-1006.8~20.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Brian Murray (brian-murray) wrote :

Ubuntu 21.10 (Impish Indri) has reached end of life, so this bug will not be fixed for that specific release.

Changed in linux (Ubuntu Impish):
status: Incomplete → Won't Fix