...and compiling:
gcc -O2 -msse2 -shared -o libhimd.so himd.c
(notice the lack of -fPIC) and objdump'ing:
objdump -d libhimd.so
gives:
00000200 <himd_sqrt>:
 200:   55                      push   %ebp
 201:   89 e5                   mov    %esp,%ebp
 203:   57                      push   %edi
 204:   8d 45 e4                lea    -0x1c(%ebp),%eax
 207:   56                      push   %esi
 208:   53                      push   %ebx
 209:   83 ec 2c                sub    $0x2c,%esp
 20c:   8b 75 0c                mov    0xc(%ebp),%esi
 20f:   8b 5d 08                mov    0x8(%ebp),%ebx
 212:   89 f7                   mov    %esi,%edi
 214:   c7 44 24 08 10 00 00    movl   $0x10,0x8(%esp)
 21b:   00
 21c:   c7 44 24 04 10 00 00    movl   $0x10,0x4(%esp)
 223:   00
 224:   89 04 24                mov    %eax,(%esp)
 227:   e8 fc ff ff ff          call   228 <himd_sqrt+0x28>
 22c:   31 d2                   xor    %edx,%edx
 22e:   85 c0                   test   %eax,%eax
 230:   8d 46 ff                lea    -0x1(%esi),%eax
 233:   0f 44 55 e4             cmove  -0x1c(%ebp),%edx
 237:   83 e7 01                and    $0x1,%edi
 23a:   0f 45 f0                cmovne %eax,%esi
 23d:   85 f6                   test   %esi,%esi
 23f:   7e 34                   jle    275 <himd_sqrt+0x75>
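As a cross-check on the target objdump prints for the call at 227, here is a minimal sketch of how a 5-byte `call rel32` target is computed: the address of the next instruction plus the signed 32-bit displacement. The helper name call_rel32_target is hypothetical and only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: compute the target of a 5-byte x86 `call rel32`
   (opcode 0xE8) located at insn_addr. Returns 0 if the opcode is not 0xE8. */
uint32_t call_rel32_target(uint32_t insn_addr, const unsigned char insn[5])
{
    uint32_t rel;

    if (insn[0] != 0xE8)
        return 0;

    /* The displacement is stored little-endian after the opcode byte. */
    rel = (uint32_t)insn[1]
        | ((uint32_t)insn[2] << 8)
        | ((uint32_t)insn[3] << 16)
        | ((uint32_t)insn[4] << 24);

    /* The displacement is relative to the end of the instruction;
       unsigned wraparound gives the right answer for negative values. */
    return insn_addr + 5 + rel;
}
```

Feeding it address 0x227 and the bytes e8 fc ff ff ff gives 0x228, matching the listing: a displacement of -4 from 0x22c.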
Notice that the call at 227 is *not* calling posix_memalign. As disassembled, the displacement bytes fc ff ff ff encode -4, so the target is 228, which is in the middle of the call instruction itself. So the stack at the point of the call looks like:
sp: %ebp-28
sp+4: 16
sp+8: 16
and %eax is a pointer into the stack. After the call, you have:
sp: <some return address>
sp+4: %ebp-28
sp+8: 16
sp+12: 16
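and %eax (what would be the return value from posix_memalign if the call had actually happened) is non-zero. The test at 22e and the cmove at 233 load the pointer stored at -0x1c(%ebp) into %edx only when %eax is zero, so here %edx keeps the 0 from the xor at 22c. So _mm_malloc returns 0, which gets assigned to xa, which leads to the load from xa faulting.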
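For illustration, the wrapper logic that the test/cmove sequence implements can be sketched in C roughly as follows. This is an assumption-laden sketch, not the actual header code: sketch_mm_malloc, memalign_fn, and bogus_memalign are hypothetical names, and the allocator is passed in as a function pointer only so that the failure path can be exercised without a broken library.

```c
#include <stdlib.h>

/* Same shape as posix_memalign: returns 0 on success, non-zero on failure. */
typedef int (*memalign_fn)(void **memptr, size_t alignment, size_t size);

/* Hypothetical sketch of the wrapper: hand back the pointer only when the
   allocator reports success (0), otherwise return NULL. */
void *sketch_mm_malloc(memalign_fn alloc, size_t size, size_t align)
{
    void *ptr;

    if (alloc(&ptr, align, size) == 0)
        return ptr;
    return NULL;
}

/* Stand-in for the misdirected call: it never allocates and leaves a
   non-zero value where the return code is expected. */
int bogus_memalign(void **memptr, size_t alignment, size_t size)
{
    (void)memptr;
    (void)alignment;
    (void)size;
    return 1;
}
```

Calling sketch_mm_malloc(bogus_memalign, 16, 16) returns NULL, which is the value that ends up in xa above; passing the real posix_memalign instead would exercise the success path.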
What's not clear to me is how this ever worked in the first place. Probably something to do with compiling without optimization. I'd be willing to bet that if you compiled the shared library with -O2 (and not -fPIC), things would fall over C-side and Lisp-side.
I don't think this counts as a bug we get to fix, not even the stack alignment changes Nikodemus suggested (which might just paper over C-side problems).