long line with byte 0 cause SBCL to crash into LDB

Bug #1202303 reported by Rademaker
This bug affects 1 person
Affects   Status    Importance   Assigned to   Milestone
SBCL      Invalid   Undecided    Unassigned

Bug Description

Consider the attached files. The zero.log file contains:

$ hexdump -C zero.log
00000000 41 42 00 00 00 43 44 45 0a 46 0a 47 0a 48 0a 49 |AB...CDE.F.G.H.I|
00000010 0a 4a 0a |.J.|
00000013

That is, after the letter B I put three 00 bytes (hexadecimal). The file zero300M.log contains:

$ hexdump -C zero300M.log
00000000 41 42 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |AB..............|
00000010 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
12c00000 00 00 43 44 45 0a 46 0a 47 0a 48 0a 49 0a 4a 0a |..CDE.F.G.H.I.J.|
12c00010

That is, the same content but with far more 00 bytes than three (about 300 MB of them, going by the offsets above). The first file is read correctly, but the second one causes a crash:

> (with-open-file (in "zero.log")
    (loop for line = (read-line in nil)
          while line
          do (format t "~a~%" line)))

Heap exhausted during garbage collection: 1488 bytes available, 2064 requested.
 Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age
   0: 0 0 0 0 0 0 0 0 0 0 0 10737418 0 0 0.0000
   1: 19385 22791 0 0 93 10219 0 0 0 322310352 15593264 10737418 0 0 1.0002
   2: 29967 29962 0 0 469 13849 0 0 37 448042192 21130032 2000000 432 0 0.0000
   3: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
   4: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
   5: 0 0 0 0 0 0 0 0 0 0 0 2000000 0 0 0.0000
   6: 0 0 0 0 1192 173 0 0 0 44728320 0 2000000 1075 0 0.0000
   Total bytes allocated = 1026850528
   Dynamic-space-size bytes = 1073741824
GC control variables:
   *GC-INHIBIT* = true
   *GC-PENDING* = in progress
fatal error encountered in SBCL pid 72033:
Heap exhausted, game over.

Allegro Common Lisp (lisp) is able to read both files without any problem.

More info:

$ sbcl --version
SBCL 1.1.9

$ uname -a
Darwin urca.br.ibm.com 12.4.0 Darwin Kernel Version 12.4.0: Wed May 1 17:57:12 PDT 2013; root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64

> *features*
(:SWANK :QUICKLISP :ASDF2 :ASDF :ASDF-UNICODE :ALIEN-CALLBACKS :ANSI-CL
 :ASH-RIGHT-VOPS :BSD :C-STACK-IS-CONTROL-STACK :COMMON-LISP
 :COMPARE-AND-SWAP-VOPS :COMPLEX-FLOAT-VOPS :CYCLE-COUNTER :DARWIN
 :DARWIN9-OR-BETTER :FLOAT-EQL-VOPS :GENCGC :IEEE-FLOATING-POINT
 :INLINE-CONSTANTS :INODE64 :LINKAGE-TABLE :LITTLE-ENDIAN
 :MACH-EXCEPTION-HANDLER :MACH-O :MEMORY-BARRIER-VOPS :MULTIPLY-HIGH-VOPS
 :OS-PROVIDES-BLKSIZE-T :OS-PROVIDES-DLADDR :OS-PROVIDES-DLOPEN
 :OS-PROVIDES-PUTWC :OS-PROVIDES-SUSECONDS-T :PACKAGE-LOCAL-NICKNAMES
 :RAW-INSTANCE-INIT-VOPS :SB-DOC :SB-EVAL :SB-LDB :SB-PACKAGE-LOCKS
 :SB-SIMD-PACK :SB-SOURCE-LOCATIONS :SB-TEST :SB-UNICODE :SBCL
 :STACK-ALLOCATABLE-CLOSURES :STACK-ALLOCATABLE-FIXED-OBJECTS
 :STACK-ALLOCATABLE-LISTS :STACK-ALLOCATABLE-VECTORS
 :STACK-GROWS-DOWNWARD-NOT-UPWARD :UD2-BREAKPOINTS :UNIX
 :UNWIND-TO-FRAME-AND-CALL-VOP :X86-64)

Stas Boukarev (stassats) wrote :

You're trying to create a very long line with READ-LINE. The crash isn't caused by the 0 bytes but by the lack of a newline.
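
To illustrate the point about the missing newline, here is a minimal sketch in portable Common Lisp of a line reader that refuses to build a line beyond a caller-chosen limit. The name read-line-bounded and its limit parameter are hypothetical; they are not part of SBCL or the standard.

```lisp
;; Hypothetical helper: behaves like (read-line stream nil) but signals an
;; error instead of accumulating more than LIMIT characters for one line.
(defun read-line-bounded (stream limit)
  "Return the next line from STREAM, or NIL at end of file.
Signal an error if the line would exceed LIMIT characters."
  (let ((first-char (read-char stream nil nil)))
    (when first-char
      (with-output-to-string (out)
        (loop for ch = first-char then (read-char stream nil nil)
              for count from 0
              ;; stop at end of file or at the newline, which is dropped
              until (or (null ch) (char= ch #\Newline))
              do (when (>= count limit)
                   (error "Line exceeds ~D characters" limit))
                 (write-char ch out))))))
```

With a reader like this, a file with no newline in its first 300 MB would produce an ordinary Lisp error that can be handled, rather than exhausting the heap.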

Changed in sbcl:
status: New → Incomplete
status: Incomplete → Invalid
Rademaker (arademaker) wrote :

Stas, but is there a hard limit on the size of a line?

Changed in sbcl:
status: Invalid → New
status: New → Opinion
Rademaker (arademaker) wrote :

In other words: OK, the problem is not the zero byte, but why does SBCL crash when read-line reads a long line?

Paul Khuong (pvk) wrote :

It ran out of space (which is what the first line of error output says "Heap exhausted during garbage collection: 1488 bytes available, 2064 requested").

300 million characters at 4 bytes per character (we support the full Unicode range) = 1.2 GB. Just representing the final string needs roughly 200 MB more space than the 1 GB heap.
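
The arithmetic can be checked directly. This is only a sketch: the line length is taken from the hexdump offsets in the report (the first newline sits at offset 0x12c00005), the 4 bytes per character is the figure quoted above, and the function names are made up for illustration.

```lisp
;; Figures taken from the report; names are illustrative only.
(defparameter *first-line-length* #x12c00005)  ; characters before the first newline
(defparameter *heap-size* 1073741824)          ; "Dynamic-space-size bytes" in the GC report

(defun string-payload-bytes (length &optional (bytes-per-character 4))
  "Approximate payload size in bytes of a string of LENGTH characters."
  (* length bytes-per-character))

(defun heap-shortfall (length heap-size)
  "Bytes by which the string payload alone exceeds HEAP-SIZE."
  (- (string-payload-bytes length) heap-size))

;; (string-payload-bytes *first-line-length*)      => 1258291220
;; (heap-shortfall *first-line-length* *heap-size*) => 184549396
```

So the finished string alone is about 185 MB larger than the whole heap, before counting anything else that lives in it.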

You can increase the heap size with --dynamic-space-size. You might also want to consider working with octets or base-chars and :ascii or :iso-8859-1 external formats.
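
As a rough sketch of the octet approach, the file can be opened with an (unsigned-byte 8) element type and scanned through a fixed-size buffer, so that no line is ever turned into a string. The function name scan-lines-as-octets and the 64 KB buffer size are assumptions for illustration; only standard Common Lisp calls are used.

```lisp
;; Hypothetical helper: counts lines and finds the longest one without
;; ever building a string, using a fixed-size octet buffer.
(defun scan-lines-as-octets (path &key (buffer-size 65536))
  "Return two values: the number of lines in PATH and the length in
octets of the longest line (newline octets are not counted)."
  (with-open-file (in path :element-type '(unsigned-byte 8))
    (let ((buffer (make-array buffer-size :element-type '(unsigned-byte 8)))
          (line-count 0)
          (current 0)
          (longest 0))
      (loop for end = (read-sequence buffer in)
            while (plusp end)
            do (loop for i from 0 below end
                     do (if (= (aref buffer i) 10) ; 10 is the newline octet
                            (setf line-count (1+ line-count)
                                  longest (max longest current)
                                  current 0)
                            (incf current))))
      ;; Account for a final line that has no trailing newline.
      (when (plusp current)
        (incf line-count)
        (setf longest (max longest current)))
      (values line-count longest))))
```

For the small zero.log shown in the report, (scan-lines-as-octets "zero.log") returns 6 and 8. Memory use is bounded by the buffer size regardless of how long a line is.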

Changed in sbcl:
status: Opinion → Invalid
Christophe Rhodes (csr21-cantab) wrote : Re: [Bug 1202303] Re: long line with byte 0 cause SBCL to crash into LDB

Rademaker <email address hidden> writes:

> Stas, but is there a hard limit on the size of a line?

Characters in sbcl take four bytes in strings unless you work hard to
avoid it. Your 300MB-sized file with no newlines is therefore going to
try to construct a 1.2GB-sized string, which is probably larger than
your heap size.

> ** Changed in: sbcl
> Status: New => Opinion

It may be your opinion that this is a bug, and in the general scheme of
things it might be -- but you'd be on stronger ground arguing that if
you could reproduce it in an environment where the heap size is actually
large enough for your data.

Christophe
