Steel Bank Common Lisp

Errors when parsing output from external command

Reported by Dominic Pearson on 2011-09-03
This bug affects 1 person
Affects: SBCL
Importance: Undecided
Assigned to: Unassigned

Bug Description

The following function, when run against the attached PDF, causes SBCL to signal a decoding error followed by a second, unexpected error.

--

(defun barf ()
  (with-output-to-string (stream)
    (run-program "/usr/bin/pdfinfo"
                 '("barf.pdf")
                 :output stream)))

--

This of course expects that `pdfinfo' is installed.

% pdfinfo --help
pdfinfo version 0.16.7
Copyright 2005-2011 The Poppler Developers - http://poppler.freedesktop.org
Copyright 1996-2004 Glyph & Cog, LLC

When `pdfinfo' is given a particular PDF file, the following is observed:

--

selfoss% sbcl --load barf.lisp
This is SBCL 1.0.51.0.debian, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
* (barf)

debugger invoked on a SB-IMPL::INVALID-UTF8-STARTER-BYTE in thread #<THREAD
                                                                     "initial thread" RUNNING

                                                                     {1002928F31}>:
  Illegal :UTF-8 character starting at byte position 193.

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [USE-VALUE ] Supply a replacement string designator.
  1: [REMOVE-FD-HANDLER] Remove #<SB-IMPL::HANDLER INPUT on descriptor 6: #<CLOSURE (LAMBDA
                                                                                        (SB-IMPL::FD)) {1003526EC9}>>
  2: [ABORT ] Exit debugger, returning to top level.

(SB-IMPL::DECODING-ERROR
 #(84 105 116 108 101 58 32 32 32 32 32 32 ...)
 193
 194
 :UTF-8
 SB-IMPL::INVALID-UTF8-STARTER-BYTE
 193)
0]

debugger invoked on a SIMPLE-ERROR in thread #<THREAD "initial thread" RUNNING
                                                {1002928F31}>:
  non-empty buffer when EOF reached while reading from child: #(84 105 116 108
                                                                101 58 32 32 32
                                                                32 32 32 ...)

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [REMOVE-FD-HANDLER] Remove #<SB-IMPL::HANDLER INPUT on descriptor 6: #<CLOSURE (LAMBDA
                                                                                        (SB-IMPL::FD)) {1003526EC9}>>
  1: Remove #<SB-IMPL::HANDLER INPUT on descriptor 6: #<CLOSURE (LAMBDA
                                                                                        (SB-IMPL::FD)) {1003526EC9}>>
  2: [ABORT ] Exit debugger, returning to top level.

((LAMBDA (SB-IMPL::FD)) #<unavailable argument>)
0[2]

--

This has been found to occur in the following Debian packaged versions of SBCL:

sbcl 1.0.51.0-1
sbcl 1.0.50.0-1
sbcl 1.0.40.0-2

Normally I would write this off as my own mistake, since I come primarily from a Scheme background and am still sinking my teeth into CL. However, the function works fine on CCL, which makes me think this may be an implementation bug in how SBCL handles unusual characters output by external commands.

I have not yet tried this on 32-bit systems, but I have the following uname -a output:

Linux selfoss 3.0.0-1-amd64 #1 SMP Sun Jul 24 02:24:44 UTC 2011 x86_64 GNU/Linux

Dominic Pearson (dsp-5) wrote :

 status invalid
 done

Dominic Pearson <email address hidden> writes:

> ** Attachment added: "offending PDF file"
> https://bugs.launchpad.net/bugs/840190/+attachment/2347685/+files/barf.pdf

Indeed, the output of pdfinfo includes some non-UTF-8 content. In order
to defend against that, use a non-UTF-8 external format (whichever is
appropriate to your application; ISO-8859-1 is a safe "let everything
through" default).

The possibility of error is inherent in the conversion from bytes,
output from a program, into characters; unless the domain of the
conversion contains all possible byte sequences, there will be the
possibility of decoding errors. I'd be sympathetic to the concept of
allowing run-program to produce an octet stream, but with the API as it
is the answer is to use a unibyte encoding and deal with reconverting
the results later if necessary.
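As a rough illustration of the "unibyte now, reconvert later" approach, here is a minimal sketch assuming SBCL's sb-ext:octets-to-string and sb-ext:string-to-octets. The helper names octets and latin-1->utf-8 are made up for this example and are not part of SBCL.

```lisp
;; Assumes SBCL: sb-ext:octets-to-string and sb-ext:string-to-octets.
(defun octets (&rest bytes)
  "Hypothetical helper: build an octet vector from BYTES."
  (make-array (length bytes) :element-type '(unsigned-byte 8)
                             :initial-contents bytes))

(defun latin-1->utf-8 (string)
  "Reinterpret STRING, which was decoded as Latin-1, as UTF-8 text.
Signals a decoding error if the underlying bytes are not valid UTF-8."
  (sb-ext:octets-to-string
   (sb-ext:string-to-octets string :external-format :latin-1)
   :external-format :utf-8))

;; Latin-1 accepts any byte, including #xFE and #xFF from the report.
(defparameter *raw*
  (sb-ext:octets-to-string (octets 84 105 116 108 101 58 #xFE #xFF)
                           :external-format :latin-1))

;; Bytes #xC3 #xA9 are the UTF-8 encoding of "é"; read as Latin-1 they
;; come out as two characters, which the helper turns back into one.
(defparameter *recovered*
  (latin-1->utf-8
   (sb-ext:octets-to-string (octets #xC3 #xA9) :external-format :latin-1)))
```

Because Latin-1 maps each byte to exactly one character, encoding the string back to Latin-1 recovers the original bytes, so nothing is lost by decoding that way first.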

Christophe

Changed in sbcl:
status: New → Invalid

Paul Khuong (pvk) wrote :

pkhuong@2delilah:/tmp$ pdfinfo barf.pdf | file -
/dev/stdin: ISO-8859 English text

The :external-format argument to run-program lets you override the default (which is based on the current locale settings).
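A minimal sketch of passing the external format explicitly, assuming an SBCL whose run-program accepts :external-format as described above. The name capture-output is hypothetical; :latin-1 is used as the default here only because every byte maps to a character, so decoding cannot fail.

```lisp
;; Hypothetical helper (not part of SBCL): run PROGRAM with ARGS and return
;; its standard output as a string, decoded with EXTERNAL-FORMAT.
(defun capture-output (program args &key (external-format :latin-1))
  (with-output-to-string (stream)
    (sb-ext:run-program program args
                        :output stream
                        :external-format external-format)))

;; The call from the report would then look like:
;; (capture-output "/usr/bin/pdfinfo" '("barf.pdf"))
```

Pick whichever external format matches what the external program really emits; Latin-1 is just the "let everything through" choice.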

Paul Khuong (pvk) wrote :

FWIW, the headers look slightly corrupted: the creation date is reported as "þÿ" (bytes #xFE #xFF). If you actually expect the response to be UTF-8 with some corruption, you can instead exploit the USE-VALUE restart to provide replacement characters.
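The restart-based approach could be sketched as follows, shown on a plain octet vector rather than on run-program itself. This assumes SBCL signals a subtype of sb-int:character-decoding-error and offers USE-VALUE, as in the backtrace in the report; decode-utf-8-leniently is a made-up name.

```lisp
;; Assumption: SBCL signals a subtype of sb-int:character-decoding-error
;; and offers the USE-VALUE restart, as in the backtrace in the report.
(defun decode-utf-8-leniently (octets &optional (replacement "?"))
  "Hypothetical helper: decode OCTETS as UTF-8, substituting REPLACEMENT
for each undecodable sequence instead of entering the debugger."
  (handler-bind ((sb-int:character-decoding-error
                   (lambda (condition)
                     ;; USE-VALUE returns NIL if no such restart is active,
                     ;; in which case the error propagates as usual.
                     (use-value replacement condition))))
    (sb-ext:octets-to-string octets :external-format :utf-8)))

;; Example input: "ab", the invalid byte #xFE, then "c".
;; (decode-utf-8-leniently
;;  (make-array 4 :element-type '(unsigned-byte 8)
;;                :initial-contents '(97 98 #xFE 99)))
```

The same handler-bind form can be wrapped around a run-program call, since the backtrace shows the USE-VALUE restart being offered there too.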

 status new
 done

I think I closed too hastily.

Dominic Pearson <email address hidden> writes:

> debugger invoked on a SB-IMPL::INVALID-UTF8-STARTER-BYTE in thread #<THREAD
> "initial thread" RUNNING
>
> {1002928F31}>:
> Illegal :UTF-8 character starting at byte position 193.

This error is expected, given the invalid UTF-8 content. (Decoding UTF-8
is defined as a non-forgiving process, in order to avoid security
problems.)

> debugger invoked on a SIMPLE-ERROR in thread #<THREAD "initial thread" RUNNING
> {1002928F31}>:
> non-empty buffer when EOF reached while reading from child: #(84 105 116 108
> 101 58 32 32 32
> 32 32 32 ...)
>
> Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

This error is more worrying.

Christophe

Changed in sbcl:
status: Invalid → New

Dominic Pearson (dsp-5) wrote :

commit 516fe4b0f2272e154575e8024b0b12cbf27c827c
Author: Max Mikhanosha <email address hidden>
Date: Sat, 3 Sep 2011 18:38:26 +0000 (14:38 -0400)
Committer: Christophe Rhodes <email address hidden>
Commit date: Tue, 6 Sep 2011 13:53:26 +0000 (14:53 +0100)

Fix (run-program) to cleanup fd handlers

Signed-off-by: Christophe Rhodes <email address hidden>

Changed in sbcl:
status: New → Fix Committed
Changed in sbcl:
status: Fix Committed → Fix Released