Comment 4 for bug 2020119

Richard M Kreuter (kreuter) wrote :

Hi Mahmood,

tl;dr the simplest general solution here is to increase your dynamic-space-size; for this particular data set, you can also use a more space-efficient string representation.

It's not clear what exact steps you're taking; however, I've encountered heap exhaustion events similar to what you describe, and the ones I've seen weren't memory leaks but were explainable with some understanding of how SBCL works.

So let's use the following as a deterministic reproduction:

--
$ sbcl --dynamic-space-size 1024 --noinform --no-userinit --no-sysinit
* (defun create-test-file (file size-in-bytes bytes-per-line)
    (with-open-file (f file :direction :output :external-format :ascii
                            :if-exists :supersede :if-does-not-exist :create)
      (loop with line = (make-string (1- bytes-per-line) :initial-element #\a
                                     :element-type 'base-char)
            repeat (ceiling size-in-bytes bytes-per-line)
            do (write-line line f))
      (finish-output f)
      (file-length f)))
CREATE-TEST-FILE
* (defun get-file (file)
    (with-open-file (f file)
      (loop for line = (read-line f nil)
            while line
            collect line)))
GET-FILE
* (create-test-file "/tmp/foo.txt" (* 105 1024 1024) 80)
110100480
* (setq *print-length* 1)
1
* (get-file "/tmp/foo.txt")
("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"
 ...)
* (+ (length *) (reduce #'+ * :key #'length))
110100480 ;; Ok, every byte is accounted for.
* (get-file "/tmp/foo.txt")
Heap exhausted during garbage collection: 176 bytes available, 336 requested.
        Immobile Object Counts
--

However, it's also possible to demonstrate (in an artificial way) that GET-FILE /can/ run repeatedly within 1GB of dynamic space:

--
$ sbcl --dynamic-space-size 1024 --noinform --no-userinit --no-sysinit
;; Evaluate the DEFUN for GET-FILE, then
* (sb-ext:gc :full t)
NIL
* (time (progn (get-file "/tmp/foo.txt") nil))
Evaluation took:
  0.628 seconds of real time
  0.628505 seconds of total run time (0.429971 user, 0.198534 system)
  [ Run times consist of 0.369 seconds GC time, and 0.260 seconds non-GC time. ]
  100.16% CPU
  1,629,334,080 processor cycles
  525,666,560 bytes consed

NIL
* (time (loop repeat 100 do (sb-ext:gc :full t) (get-file "/tmp/foo.txt")))
Evaluation took:
  77.371 seconds of real time
  77.278278 seconds of total run time (50.084011 user, 27.194267 system)
  [ Run times consist of 47.966 seconds GC time, and 29.313 seconds non-GC time. ]
  99.88% CPU
  200,545,085,462 processor cycles
  52,567,797,248 bytes consed

NIL
--

So the fact that artificially forcing garbage collection lets GET-FILE run 100 times suggests this isn't a memory leak (at least not a fast one).
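If you want to check the allocation numbers without TIME, SB-EXT:GET-BYTES-CONSED returns a running total of bytes allocated since startup, so a before/after difference gives the consing of a single call. A sketch, assuming GET-FILE and /tmp/foo.txt from the transcripts above (BYTES-CONSED-BY-GET-FILE is a name I'm making up here):

```lisp
;; Rough per-call allocation measurement using SBCL's running cons counter.
;; Assumes GET-FILE and /tmp/foo.txt from the transcripts above.
(defun bytes-consed-by-get-file (file)
  (let ((before (sb-ext:get-bytes-consed)))
    (get-file file)
    ;; Total bytes allocated during the call (not a high-water mark).
    (- (sb-ext:get-bytes-consed) before)))
```

On the synthesized file this should report something on the order of the ~526MB that TIME showed above.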

What's actually going on here? First, SBCL's STRING type uses 4 bytes of storage per character (IOW, it uses UTF-32 encoding). So 105MiB of pure ASCII text will require around 420MiB of dynamic space (more or less -- the lines READ-LINE produces don't contain the #\Linefeed character, but each string has some overhead and alignment requirements). And then the list structure GET-FILE builds up will take up another ~22MiB for cons cells for the synthesized file (16 bytes per cons times 1376256 lines). So each call to GET-FILE needs to cons at least 442MiB for the object it returns.

There's some intermediate consing, too: e.g., READ-LINE sometimes needs to allocate intermediate strings when there are no linefeeds remaining in a buffer; I'm not sure whether that's happening in the example above, but if it is, I estimate that such intermediate strings would account for an additional 34MiB consed. Anyhow, the TIME output above shows each run actually consed ~526MiB (total, not a high-water mark). This is "close enough", in my view, to 442MiB or 476MiB; a more precise accounting is beyond my reach right now.
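That accounting can be reproduced with a little arithmetic. A sketch (string headers and alignment padding are ignored, so the totals come out slightly under the figures quoted above):

```lisp
;; Back-of-the-envelope accounting for one GET-FILE call on the test file.
(let* ((file-bytes (* 105 1024 1024))    ; 110100480 bytes of ASCII
       (lines (/ file-bytes 80))         ; 1376256 lines of 80 bytes each
       (char-bytes (* file-bytes 4))     ; 4 bytes/char in a full STRING
       (cons-bytes (* lines 16)))        ; 16 bytes per cons cell
  (format t "~D lines, ~DMiB of characters, ~DMiB of conses, ~DMiB total~%"
          lines
          (floor char-bytes (* 1024 1024))
          (floor cons-bytes (* 1024 1024))
          (floor (+ char-bytes cons-bytes) (* 1024 1024))))
;; prints: 1376256 lines, 420MiB of characters, 21MiB of conses, 441MiB total
```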

Why, then, did the first transcript exhaust the heap? One obvious culprit is that the standard CL read-eval-print loop remembers the last few sets of return values in the special variables *, **, and *** (the most recent primary values) and /, //, and /// (the corresponding lists of all values). So those variables are likely candidates for "holding on to" data that you might not care about anymore. (Of course you might be holding onto the result of your GET-FILE call in some variables of your own, too.)
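The history variables are ordinary special variables, so anything they reference stays live across garbage collections until the history rotates it out or you clear it. A sketch, using a big list as a stand-in for a GET-FILE result:

```lisp
;; The REPL history variables are ordinary special variables; whatever they
;; reference cannot be reclaimed by the garbage collector.
(setq * (make-list 1000000))   ; stand-in for a large printed result
(sb-ext:gc :full t)            ; the list survives: * still references it
(setq * nil ** nil *** nil     ; drop the primary-value history...
      / nil // nil /// nil)    ; ...and the values-list history
(sb-ext:gc :full t)            ; now the big list is unreachable and reclaimed
```

At an interactive REPL you can get the same effect by simply evaluating a few small forms (e.g. NIL three times) to push the big result out of the history before forcing a GC.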

Back to your original problem: IIUC, the mnist_train.csv file is entirely ASCII; this means that every character in it is of type BASE-CHAR in SBCL, so you could modify your GET-FILE as follows:

--
(defun get-file (file)
  (with-open-file (f file)
    (loop for line = (read-line f nil)
          while line
          collect (coerce line 'base-string))))
--

The BASE-STRING representation uses 1 byte per character *but can only store ASCII characters*. By inspection, I can run this modified GET-FILE on the mnist_train.csv file repeatedly at the REPL without exhausting the heap.
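If you aren't certain a file is pure ASCII, that COERCE will signal a TYPE-ERROR on the first line containing a non-base character. A defensive variant (MAYBE-BASE-STRING is a hypothetical helper name, not part of SBCL) can fall back to the full string representation line by line:

```lisp
(defun maybe-base-string (line)
  ;; Use the 1-byte-per-character BASE-STRING representation when every
  ;; character fits, and keep the 4-bytes-per-character STRING otherwise.
  (if (every (lambda (c) (typep c 'base-char)) line)
      (coerce line 'base-string)
      line))
```

In GET-FILE you'd then collect (maybe-base-string line) instead of coercing unconditionally, paying the 4x storage cost only for the lines that actually need it.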

All the same, I'd recommend increasing your dynamic space size. It's simple and avoids some of the juggling necessary to deal with varying string element types.