sbcl: malloc.c:2372: sysmalloc: Assertion

Bug #1639410 reported by Ala'a
Affects: SBCL
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

I have some code that uses the CFFI library to interface with the CUDA library. When I run the code more than once (specifically, after the second run and before the third), it fails with the following error:

sbcl: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.
fatal error encountered in SBCL pid 11489(tid 140737353881408):
SIGABRT received.

I expect to be able to run the code repeatedly without error. As a cross-check, I ran the same code several times on Clozure CL version 1.11 without any issue.

Test code is at the end.

Linux distribution: Linux Mint 17.1 Rebecca

: uname -a
Linux rock 3.13.0-37-generic #64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

: sbcl --version
SBCL 1.3.11.14-e9f232b

CL-USER(3): *features*
(CFFI-FEATURES:FLAT-NAMESPACE CFFI-FEATURES:X86-64 CFFI-FEATURES:UNIX :CFFI
 CFFI-SYS::FLAT-NAMESPACE :QUICKLISP :SB-BSD-SOCKETS-ADDRINFO
 :ASDF-PACKAGE-SYSTEM :ASDF3.1 :ASDF3 :ASDF2 :ASDF :OS-UNIX
 :NON-BASE-CHARS-EXIST-P :ASDF-UNICODE :64-BIT :64-BIT-REGISTERS
 :ALIEN-CALLBACKS :ANSI-CL :ASH-RIGHT-VOPS :C-STACK-IS-CONTROL-STACK
 :COMMON-LISP :COMPACT-INSTANCE-HEADER :COMPARE-AND-SWAP-VOPS
 :COMPLEX-FLOAT-VOPS :CYCLE-COUNTER :ELF :FLOAT-EQL-VOPS
 :FP-AND-PC-STANDARD-SAVE :GENCGC :IEEE-FLOATING-POINT :IMMOBILE-SPACE
 :INLINE-CONSTANTS :INTEGER-EQL-VOP :LARGEFILE :LINKAGE-TABLE :LINUX
 :LITTLE-ENDIAN :MEMORY-BARRIER-VOPS :MULTIPLY-HIGH-VOPS :OS-PROVIDES-BLKSIZE-T
 :OS-PROVIDES-DLADDR :OS-PROVIDES-DLOPEN :OS-PROVIDES-GETPROTOBY-R
 :OS-PROVIDES-POLL :OS-PROVIDES-PUTWC :OS-PROVIDES-SUSECONDS-T
 :PACKAGE-LOCAL-NICKNAMES :PRECISE-ARG-COUNT-ERROR :RAW-INSTANCE-INIT-VOPS
 :RAW-SIGNED-WORD :SB-DOC :SB-EVAL :SB-FUTEX :SB-LDB :SB-PACKAGE-LOCKS
 :SB-SIMD-PACK :SB-SOURCE-LOCATIONS :SB-TEST :SB-THREAD :SB-UNICODE :SBCL
 :STACK-ALLOCATABLE-CLOSURES :STACK-ALLOCATABLE-FIXED-OBJECTS
 :STACK-ALLOCATABLE-LISTS :STACK-ALLOCATABLE-VECTORS
 :STACK-GROWS-DOWNWARD-NOT-UPWARD :SYMBOL-INFO-VOPS :UNBIND-N-VOP :UNIX
 :UNWIND-TO-FRAME-AND-CALL-VOP :X86-64)

Test code:
------------------------------------------------

(ql:quickload :cffi)

(ql:quickload :cffi-grovel)

(use-package :cffi)

(cffi:load-foreign-library "libcuda.so")

(cffi:load-foreign-library "libcudart.so")

;; Including CFFI-GROVEL auto generated code
;; (modified to include shortest snippet producing the issue)
(cffi:defcstruct (device-properties :size 600)
  (totalglobalmem :unsigned-int :offset 256))
(cl:defconstant size-of-device-properties (cffi:foreign-type-size '(:struct device-properties)))
;; End of CFFI-GROVEL auto generated code

(defcfun (%cuda-get-device-properties "cudaGetDeviceProperties") :int
  (properties :pointer)
  (device-id :int))

(defun device-properties (&optional (device-id 0))
  (with-foreign-object (p '(:struct device-properties))
    (%cuda-get-device-properties p device-id)
    (with-foreign-slots ((totalglobalmem)
                         p (:struct device-properties))
      (list
       (list 'totalglobalmem totalglobalmem)))))

(loop for i from 1 upto 10
   do (print i)
   do (print (device-properties)))

Sample Run
------------------------------------------------
: sbcl --load cuda-query-sample-shorter.lisp
This is SBCL 1.3.11.14-e9f232b, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
To load "cffi":
  Load 1 ASDF system:
    cffi
; Loading "cffi"
.
To load "cffi-grovel":
  Load 1 ASDF system:
    cffi-grovel
; Loading "cffi-grovel"

1
((TOTALGLOBALMEM 2081161216))
2
((TOTALGLOBALMEM 2081161216))
sbcl: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.
fatal error encountered in SBCL pid 12714(tid 140737353881408):
SIGABRT received.

Welcome to LDB, a low-level debugger for the Lisp runtime environment.

Douglas Katzman (dougk) wrote :

Could you provide a backtrace from ldb? (The command is 'backtrace'.)

Ala'a (amalawi) wrote :

Strangely enough, I tried that, but it did not do anything! Typing anything does nothing.

Ala'a (amalawi) wrote :

Also, here is the last part of the output, this time with strace prepended to the command line. Notice that pressing Return twice after 'backtrace' did not do anything, and nothing further was caught by strace:

write(2, "sbcl: malloc.c:2372: sysmalloc: "..., 428sbcl: malloc.c:2372: sysmalloc: Assertion `(old_top == (((mbinptr) (((char *) &((av)->bins[((1) - 1) * 2])) - __builtin_offsetof (struct malloc_chunk, fd)))) && old_size == 0) || ((unsigned long) (old_size) >= (unsigned long)((((__builtin_offsetof (struct malloc_chunk, fd_nextsize))+((2 *(sizeof(size_t))) - 1)) & ~((2 *(sizeof(size_t))) - 1))) && ((old_top)->size & 0x1) && ((unsigned long) old_end & pagemask) == 0)' failed.
) = 428
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
tgkill(1734, 1734, SIGABRT) = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=1734, si_uid=1000} ---
rt_sigprocmask(SIG_BLOCK, NULL, [HUP INT QUIT USR2 PIPE ALRM TERM CHLD TSTP URG XCPU XFSZ VTALRM PROF WINCH IO], 8) = 0
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT USR2 PIPE ALRM TERM CHLD TSTP URG XCPU XFSZ VTALRM PROF WINCH IO], NULL, 8) = 0
write(2, "fatal error encountered", 23fatal error encountered) = 23
write(2, " in SBCL pid 1734", 17 in SBCL pid 1734) = 17
write(2, "(tid 140737353881408)", 21(tid 140737353881408)) = 21
write(2, ":\n", 2:
) = 2
write(2, "SIGABRT received.\n", 18SIGABRT received.
) = 18
write(2, "\n", 1
) = 1
write(2, "\n", 1
) = 1
write(1, "Welcome to LDB, a low-level debu"..., 71Welcome to LDB, a low-level debugger for the Lisp runtime environment.
) = 71
futex(0x7ffff76ab760, FUTEX_WAIT_PRIVATE, 2, NULLbacktrace

backtrace

Stas Boukarev (stassats) wrote :

600 is clearly not the right size and it's writing past the heap.
sizeof(struct cudaDeviceProp) says 648 here.
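
To make the size mismatch above concrete, here is a minimal C sketch, not the actual CUDA code path. `fake_fill` is a hypothetical stand-in for a library call that writes as many bytes as the library's own struct definition requires; only the sizes 600 and 648 come from this report. The guard region exists so the simulation itself stays in bounds. A real out-of-bounds write has no such margin, and it does not have to fail at the moment it happens: the damage may only be noticed later, when the allocator next inspects the affected memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_BYTE 0xA5

/* Hypothetical stand-in for a library call: it fills `written` bytes,
   no matter how many bytes the caller thought the struct needed. */
void fake_fill(void *dst, size_t written)
{
    memset(dst, 0, written);
}

/* Allocate `declared` bytes plus a guard region, let the stand-in write
   `written` bytes, and count the guard bytes that changed.
   Returns (size_t)-1 if the allocation fails. */
size_t count_clobbered(size_t declared, size_t written, size_t guard)
{
    unsigned char *buf;
    size_t i, clobbered = 0;

    /* Keep the simulation itself inside the allocated block. */
    assert(written <= declared + guard);

    buf = malloc(declared + guard);
    if (buf == NULL)
        return (size_t)-1;

    memset(buf, GUARD_BYTE, declared + guard);
    fake_fill(buf, written);

    /* Every guard byte that no longer holds GUARD_BYTE was overwritten. */
    for (i = declared; i < declared + guard; i++)
        if (buf[i] != GUARD_BYTE)
            clobbered++;

    free(buf);
    return clobbered;
}
```

With the sizes from this report, `count_clobbered(600, 648, 64)` counts 48 overwritten bytes past the declared size, while `count_clobbered(648, 648, 64)` counts none.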

Changed in sbcl:
status: New → Invalid
Ala'a (amalawi) wrote :

Which version of CUDA?

Ala'a (amalawi) wrote :

It is 600 here. CUDA 8.0 (driver/runtime).

Stas Boukarev (stassats) wrote :

But did you try setting it to 648?

Ala'a (amalawi) wrote :

Thanks. This solved it.

Any hint on how to check why g++ and nvcc were reporting 600? Is it a version issue, or an issue with this box? sizeof(struct cudaDeviceProp) is what CFFI-GROVEL used to generate the 600.

Also, any hint on why it ran the first time if it was writing past the heap? Is it cumulative?

Thanks again

Stas Boukarev (stassats) wrote :

Maybe your gcc is compiling in 32-bit mode?

Ala'a (amalawi) wrote :

The outputs from nvcc and g++ are both ELF 64-bit LSB executables, x86-64 (as per 'file').

Ala'a (amalawi) wrote :

Forgot to add that both were compiled with -m64.

Stas Boukarev (stassats) wrote :

Then it must be using an old header file.
Look at the output of:
find /usr/include -name cuda_runtime_api.h

Ala'a (amalawi) wrote :

Confirmed. There are two headers: the one under /usr/include/ is the older one (5.5), and the NVIDIA installation is in /usr/local/cuda-8.0/include.
The latter gives 648.
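
The header mismatch confirmed above can be pictured with a small C sketch. The two struct types are hypothetical opaque stand-ins, not the real `cudaDeviceProp` layouts; only their sizes (600 from the older header, 648 from the CUDA 8.0 header) come from this thread. The helper names `shortfall` and `require_buffer` are likewise made up for illustration. The idea is to compare the size a binding was generated with against the size the library expects before handing over a buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: same struct name in two header versions,
   different total sizes. Byte arrays avoid any padding questions. */
struct props_old_header { unsigned char bytes[600]; };
struct props_new_header { unsigned char bytes[648]; };

/* Bytes by which `required` exceeds `allocated`, or 0 if the buffer
   is already large enough. */
size_t shortfall(size_t allocated, size_t required)
{
    return required > allocated ? required - allocated : 0;
}

/* Stop early with an assertion message instead of letting a too-small
   buffer reach the library. */
void require_buffer(size_t allocated, size_t required)
{
    assert(shortfall(allocated, required) == 0 &&
           "buffer is smaller than the library's struct");
}
```

Here `shortfall(sizeof(struct props_old_header), sizeof(struct props_new_header))` evaluates to 48, the number of bytes missing from a buffer sized with the older header, and `require_buffer(648, 648)` passes.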

Thanks for the help, and sorry for the noise.
