simd-pack-256 usage SIGILL's on platforms without AVX2

Bug #1928097 reported by Marco Heisig
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
SBCL
Undecided
Unassigned

Bug Description

Calling one of the functions %make-simd-pack-256-ub64, %simd-pack-256-singles, %simd-pack-256-doubles, and %simd-pack-256-ub64s on an x86-64 machine that supports AVX but not AVX2 crashes the Lisp image with a SIGILL. The reason is that these functions use VOPs that expand into AVX2 instructions, even on platforms without AVX2 support.

It is tempting to dismiss this bug with 'if you don't have AVX2, don't use SIMD', but people that use my SIMD library on such machines keep complaining about this. I attached a patch that replaces the problematic AVX2 instructions with equivalent AVX instructions (But take the patch with a grain of salt, my assembler skills are weak).

== sbcl --version && uname -a ==

SBCL 2.1.4.75-695b2fd88-WIP (also on SBCL 2.1.2)
Linux marcoheisig-desktop 5.8.0-50-generic #56~20.04.1-Ubuntu SMP Mon Apr 12 21:46:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

== *features* ==
(ALEXANDRIA::SEQUENCE-EMPTYP :SWANK :QUICKLISP :ASDF3.3 :ASDF3.2 :ASDF3.1
 :ASDF3 :ASDF2 :ASDF :OS-UNIX :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :X86-64
 :GENCGC :64-BIT :ANSI-CL :COMMON-LISP :ELF :IEEE-FLOATING-POINT :LINUX
 :LITTLE-ENDIAN :PACKAGE-LOCAL-NICKNAMES :SB-CORE-COMPRESSION :SB-LDB
 :SB-PACKAGE-LOCKS :SB-THREAD :SB-UNICODE :SBCL :UNIX)

== lscpu ==

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 21
Model: 1
Model name: AMD FX(tm)-8150 Eight-Core Processor
Stepping: 2
Frequency boost: enabled
CPU MHz: 1410.490
CPU max MHz: 3600,0000
CPU min MHz: 1400,0000
BogoMIPS: 7223.89
Virtualization: AMD-V
L1d cache: 64 KiB
L1i cache: 256 KiB
L2 cache: 8 MiB
L3 cache: 8 MiB
NUMA node0 CPU(s): 0-7
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full AMD retpoline, IBPB conditional, STIBP disabled, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmx
                                 ext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq mo
                                 nitor ssse3 cx16 sse4_1 sse4_2 popcnt aes xsave avx lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse
                                  3dnowprefetch osvw ibs xop skinit wdt fma4 nodeid_msr topoext perfctr_core perfctr_nb cpb hw_pstate ssbd ibpb vmmc
                                 all arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold

Revision history for this message
Marco Heisig (marco-heisig-h) wrote :
Revision history for this message
Stas Boukarev (stassats) wrote :

Intel doesn't recommend mixing integer and float operations. How many machines are there that don't have AVX2 and yet people want to use AVX instructions on them?

Revision history for this message
Marco Heisig (marco-heisig-h) wrote :

> Intel doesn't recommend mixing integer and float operations.

I agree, using vinsertf128 in these places is not pretty. But GCC and Clang also use vinsertf128 for destructuring integer packs on platforms without AVX2. Maybe we could conditionally switch between vinserti128/vinsertf128 depending on whether AVX2 is available or not? Something like

(inst #.(if +avx2-supported+ 'vinserti128 'vinsertf128) dst dst tmp 1)

> How many machines are there that don't have AVX2 and yet people want to use AVX instructions on them?

I get the occasional bug report from early adopters of sb-simd, and one of my development machines is also affected.

My main concern is that sb-ext exports functions that are known to SIGILL on certain platforms. An alternative fix would be to disable 256 bit packs on those platforms. But since making this work is just a five line patch, and since there are good use-cases for 256 bit packs even with just AVX, I'd love to see this fixed.

Revision history for this message
Marco Heisig (marco-heisig-h) wrote :

I just fixed a bug in my previous patch, regarding the handling of single-float packs (vpermilpd instead of vpermilps). The new patch is attached.

I also tried to conditionally switch between AVX2 and AVX instructions, depending on what's available, but I haven't figured out how to detect the presence/absence of AVX2 during build. I tried (plusp (sb-alien:extern-alien "avx_supported" sb-alien:int)), but it seems sb-alien isn't available at the time simd-pack-256.lisp is loaded. Any better ideas?

I'd really like to see this fixed upstream. Please tell me what I should address before my patch can be merged.

Revision history for this message
Douglas Katzman (dougk) wrote :

Marco, We have a thing called *backend-subfeatures* which seems like it may be a fit for this.
The sparc/arith file is full of it. (Intentional double entendre)

A limitation of using subfeatures is that self-compile of SBCL should generally assume lack of any subfeature so that it emits code for any machine; and any alien variables (such as avx2_supported) are unavailable in self-compile.

Would you say that your intended user would compile SBCL on a machine that does/doesn't have avx2, and that the result of that compilation dictates whether subsequently compiled code can assume avx2; OR is it that you intend to compile SBCL on a least-common-denominator machine which can produce code for either?

My guess is that subfeatures will be fine even if it means that a few builtin functions are slightly suboptimal for some machines. Short of JIT-compilation to rebuild bits of SBCL at startup, we can't really produce a self-contained SBCL (that doesn't need to be rebuilt by running any post-build steps) that is at the same time maximally portable and maximally efficient on machines that have more CPU features available. But I can imagine a conflicting opinion.

Revision history for this message
Marco Heisig (marco-heisig-h) wrote :

This problem runs even deeper than just having AVX2 instructions in code that may be executed on a non-AVX2 machine. Some of the testers of my sb-simd library wanted very fine-grained control over the instruction sets being used for each part of their code.

For example, the author of a video game might want to write three different versions of a performance-critical function: One using just SSE2 instructions, one using AVX instructions, and one using both AVX and AVX2 instructions. Even though the development machine might support all these instructions, the machine where the generated executable is deployed to might not. So ideally, the target machine will select the best available version at run time.

The solution we use for sb-simd is that we provide a separate package for each instruction set. Each such package provides, among other things, its own implementation of the functions for creating a SIMD pack, and for accessing the values of a SIMD pack. So, for example, the function for unpacking a 256 bit pack of integers in the sb-simd-avx package uses VEXTRACTF128, and the equivalent function in the sb-simd-avx2 package uses VEXTRACTI128. Since sb-simd now has its own functions for creating and manipulating SIMD packs, we don't have to call any of the 'sb-ext:%make-simd-pack-...' functions or 'sb-ext:%simd-pack-...s' anymore. So as far as we are concerned, this bug is fixed (or rather, we creatively avoided it).

I was also thinking about merging some of the mechanics for picking instruction sets back into SBCL, but it is getting somewhat unwieldy. So I will probably leave SBCL as it is and put all this into the sb-simd (https://github.com/marcoheisig/sb-simd) contrib.

Thank you for your insightful replies, and feel free to close this issue.

Revision history for this message
Marco Heisig (marco-heisig-h) wrote :

Oh, sorry, please don't close this issue yet. There is one problem left that I can't just solve by not calling SBCL's SIMD constructors and unpackers. When loading an AVX2 immediate whose contents are all 1, SBCL emits a VPCMPEQD instruction. And that is only available on AVX2.

So somehow I need a mechanism to tell SBCL whether the generated code is intended to be used on a target machine with AVX2 instructions or not. Or, more generally, a mechanism to tell SBCL which instructions sets it should use for a particular piece of code. Would it be possible to include that information into the compiler policy? Something like (locally (declare (avoid-instruction-set :avx2)) ...) or (locally (declare (use-instruction-sets :avx)) ...)? Or do you have any better ideas how I can get that information into the move-funs and vops?

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers