segmentation fault after (room)

Bug #353861 reported by Karol Skocik
Affects: SBCL
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

I connected to our production server using SLIME, typed (room) at the REPL, and got this back:

TACTIX> (room)
Dynamic space usage is:   84,827,952 bytes.
Read-only space usage is:      3,488 bytes.
Static space usage is:         2,256 bytes.
Control stack usage is:        5,576 bytes.
Binding stack usage is:          472 bytes.
Control and binding stack usage is for the current thread only.
Garbage collection is currently enabled.

At that point SBCL crashed with a segmentation fault. Unfortunately I can't provide a test case; it's a huge system and I have no idea where, why, or what crashes.

Before this happened, I noticed a big performance degradation on the server: average event delays were 15 ms. Such delays are expected when about 20 games are running on the server at the same time, but only one was running at that point.

I also suspect there might be a memory leak (either in our application or in SBCL), since heap exhaustion errors happen fairly regularly after several hundred games have finished. This instance had previously been under similar heavy load, but was almost idle when the crash happened.

tuser@tmachine32 ~ $ sbcl --version
SBCL 1.0.26-r1-gentoo

That's the version we use, but on the production server the application runs as a Lisp executable image.

TACTIX> *features*
(:LTK :SPLIT-SEQUENCE :SBCL-USES-SB-ROTATE-BYTE :SB-BSD-SOCKETS-ADDRINFO :ASDF :SB-THREAD :ANSI-CL :COMMON-LISP :SBCL :SB-DOC
 :SB-PACKAGE-LOCKS :SB-UNICODE :SB-EVAL :SB-SOURCE-LOCATIONS :IEEE-FLOATING-POINT :X86 :UNIX :ELF :LINUX :LARGEFILE :GENCGC
 :STACK-GROWS-DOWNWARD-NOT-UPWARD :C-STACK-IS-CONTROL-STACK :COMPARE-AND-SWAP-VOPS :UNWIND-TO-FRAME-AND-CALL-VOP :RAW-INSTANCE-INIT-VOPS
 :STACK-ALLOCATABLE-CLOSURES :ALIEN-CALLBACKS :CYCLE-COUNTER :LINKAGE-TABLE :OS-PROVIDES-DLOPEN :OS-PROVIDES-PUTWC :OS-PROVIDES-SUSECONDS-T)

tuser@tmachine32 ~ $ uname -a
Linux tmachine32 2.6.18-xenU-ec2-v1.0 #2 SMP Tue Feb 19 10:51:53 EST 2008 i686 Dual-Core AMD Opteron(tm) Processor 2218 HE AuthenticAMD GNU/Linux

Please note that this is an EC2 instance: although uname reports a dual-core CPU, we actually have just one core at our disposal. It also runs in 32-bit mode (a limitation of the small instance type), even though the Opteron is a 64-bit processor.

Please let me know if there is anything I can do to gather more useful information to help fix this.

Thanks,
  Karol Skocik

Karol Skocik (karol-skocik) wrote:

I have a test case that reproduces it quite reliably:

;; this is iota from alexandria
(defun iota (n &key (start 0) (step 1))
  (declare (type (integer 0) n) (number start step))
  (loop repeat n
     for i = (+ start (- step step)) then (+ i step)
     collect i))

(defun test-iota (n)
  ;; Start N threads, each of which allocates a fresh 100-element
  ;; array every 10 ms, forever.
  (flet ((thread-fn ()
           (loop (sleep 0.01)
              (make-array 100 :initial-contents (iota 100)))))
    (loop :repeat n
       :collect (sb-thread:make-thread #'thread-fn))))

First, run (test-iota 10), then at the REPL:

(loop :repeat 100 :do (room))

Interestingly, it does not crash if ":initial-contents (iota 100)" is removed from the MAKE-ARRAY call in thread-fn.

Karol

Nikodemus Siivola (nikodemus) wrote:

I haven't been able to reproduce this so far using the reduced test-case. So:

1. Please post a full transcript of a session in which you reproduce this, including the error message etc., with sbcl run using --no-userinit and --no-sysinit.

2. Also, if you can, please reproduce this using the upstream version of SBCL, instead of the Gentoo version.

(Note that 1.0.28.51 changes the way MAKE-ARRAY is compiled, so in versions after 1.0.28.50 the bug may well be masked when using your reduced test.)

Karol Skocik (karol-skocik) wrote: Re: [Bug 353861] Re: segmentation fault after (room)

Hi,
  when we upgraded to 1.0.27, the bug went away and hasn't occurred
since then. Sorry for the false alarm.

Thanks,
  Karol Skocik

On Tue, May 26, 2009 at 10:02 AM, Nikodemus Siivola
<email address hidden> wrote:
> I haven't been able to reproduce this so far using the reduced test-
> case. So:
>
> 1. Please post full transcript of a session where you reproduce this,
> including the error message, etc, and running sbcl with --no-userinit
> and --no-sysinit.
>
> 2. Also, if you can, please reproduce this using the upstream version of
> SBCL, instead of the Gentoo version.
>
> (Note that 1.0.28.51 changes the way MAKE-ARRAY is compiled, so the bug
> may well be masked using your reduced test in post 1.0.28.50 versions.)
>
> --
> segmentation fault after (room)
> https://bugs.launchpad.net/bugs/353861
> You received this bug notification because you are a direct subscriber
> of the bug.
>
> Status in Steel Bank Common Lisp: New

Nikodemus Siivola (nikodemus) wrote:

Either fixed since 1.0.26, or masked by unrelated changes.

Changed in sbcl:
status: New → Invalid