init: "script" stanza leaks file descriptor to child process

Bug #619269 reported by Petr Lautrbach on 2010-08-17
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
upstart
Medium
Scott James Remnant
upstart (Ubuntu)
Undecided
Unassigned

Bug Description

Upstart sends job scripts to shell via pipe represented by /dev/fd/<fd> (/proc/self/fd/) file in proc filesystem, e.g.:

/bin/sh -e /dev/fd/9

This fd stays open while child shell process is running. There are some applications like pvdisplay (and other lvm2 tools) which complaint about it.

How to reproduce:

Boot to runlevel 1
Run pvdisplay:

# pvdisplay
File descriptor 9 (pipe:[9737]) leaked on pvdisplay invocation. Parent PID 845: /bin/bash
  --- Physical volume ---
...

Possible solution would be send first closing pipe fd command to shell:

+ close_fd = NIH_MUST (nih_sprintf (NULL, "<&%d- ;", fds[0]));
+ NIH_ZERO (nih_io_write (io, close_fd, strlen (close_fd)));
+
   NIH_ZERO (nih_io_write (io, script, strlen (script)));
   nih_io_shutdown (io);

Good catch, I'm pretty sure I had that <&3-; in there at one point, I must have lost it as I can't see it anywhere in the bzr history

Changed in upstart:
importance: Undecided → Medium
status: New → Triaged
summary: - "script" stanza leaks file descriptor to child process
+ init: "script" stanza leaks file descriptor to child process

Can the shell continue to read the script if the file descriptor the script is on is closed?

Because I'm not convinced this works.

Closing the file descriptor like this may result in the shell failing
to process the entire script, and I haven't seen documented evidence
to the contrary

On Sat, Feb 5, 2011 at 10:45 PM, AmenophisIII <email address hidden> wrote:
> hm seems it is still not in scott's bazaar repo:
> http://bazaar.launchpad.net/~canonical-scott/upstart/trunk/view/head:/init/job_process.c#L317
> why?
>
> redhat fixed it exactly as suggested above
> (https://bugzilla.redhat.com/attachment.cgi?id=455799&action=edit)
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart .
> https://bugs.launchpad.net/bugs/619269
>
> Title:
>  init: "script" stanza leaks file descriptor to child process
>
> Status in Upstart:
>  Triaged
>
> Bug description:
>  Upstart sends job scripts to shell via pipe represented by
>  /dev/fd/<fd> (/proc/self/fd/) file in proc filesystem, e.g.:
>
>  /bin/sh -e /dev/fd/9
>
>  This fd stays open while child shell process is running. There are
>  some applications like pvdisplay (and other lvm2 tools) which
>  complaint about it.
>
>  How to reproduce:
>
>  Boot to runlevel 1
>  Run pvdisplay:
>
>  # pvdisplay
>  File descriptor 9 (pipe:[9737]) leaked on pvdisplay invocation. Parent PID 845: /bin/bash
>    --- Physical volume ---
>  ...
>
>  Possible solution would be send first closing pipe fd command to
>  shell:
>
>  +  close_fd = NIH_MUST (nih_sprintf (NULL, "<&%d- ;", fds[0]));
>  +  NIH_ZERO (nih_io_write (io, close_fd, strlen (close_fd)));
>  +
>     NIH_ZERO (nih_io_write (io, script, strlen (script)));
>     nih_io_shutdown (io);
>
>
>

Scott James Remnant (scott) wrote :

Setting back to Confirmed, because it's not clear at all to me that you *can* close this file descriptor - it's the file descriptor the shell is receiving the script on!

Changed in upstart:
status: Triaged → Confirmed
Johan Kiviniemi (ion) wrote :

Judging from…

ghc -e 'putStrLn $ "exec 9<&-; " ++ replicate 100000 (head "\n") ++ "ls -l /proc/self/fd;"' | strace sh -e /proc/self/fd/9 9<&0 0</dev/null

…at least dash, zsh and bash read the entire script into memory before executing it. All of them read the script, closed fd 9 and ran the ls command successfully.

Scott James Remnant (scott) wrote :

Interesting, I remember seeing issues even on files where editing the
script while it's running results in strange shell behaviour because
it reads as it goes.

On Fri, Mar 11, 2011 at 11:38 PM, Johan Kiviniemi
<email address hidden> wrote:
> Judging from…
>
> ghc -e 'putStrLn $ "exec 9<&-; " ++ replicate 100000 (head "\n") ++ "ls
> -l /proc/self/fd;"' | strace sh -e /proc/self/fd/9 9<&0 0</dev/null
>
> …at least dash, zsh and bash read the entire script into memory before
> executing it. All of them read the script, closed fd 9 and ran the ls
> command successfully.
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart .
> https://bugs.launchpad.net/bugs/619269
>
> Title:
>  init: "script" stanza leaks file descriptor to child process
>
> Status in Upstart:
>  Confirmed
>
> Bug description:
>  Upstart sends job scripts to shell via pipe represented by
>  /dev/fd/<fd> (/proc/self/fd/) file in proc filesystem, e.g.:
>
>  /bin/sh -e /dev/fd/9
>
>  This fd stays open while child shell process is running. There are
>  some applications like pvdisplay (and other lvm2 tools) which
>  complaint about it.
>
>  How to reproduce:
>
>  Boot to runlevel 1
>  Run pvdisplay:
>
>  # pvdisplay
>  File descriptor 9 (pipe:[9737]) leaked on pvdisplay invocation. Parent PID 845: /bin/bash
>    --- Physical volume ---
>  ...
>
>  Possible solution would be send first closing pipe fd command to
>  shell:
>
>  +  close_fd = NIH_MUST (nih_sprintf (NULL, "<&%d- ;", fds[0]));
>  +  NIH_ZERO (nih_io_write (io, close_fd, strlen (close_fd)));
>  +
>     NIH_ZERO (nih_io_write (io, script, strlen (script)));
>     nih_io_shutdown (io);
>

Johan Kiviniemi (ion) wrote :

When writing the script to a file and running sh filename 9<&0, the shells (dash, zsh, bash) seem to evaluate the code as they read the file. It seems there’s a difference in behavior based on whether the script’s being read from a pipe or from a file.

Scott James Remnant (scott) wrote :

Exactly ... and the way the shell is invoked with /dev/fd it should
look like a file to the shell, rather than a pipe, no?

How are they testing for this, is it documented anywhere?

Scott

On Sat, Mar 12, 2011 at 3:59 PM, Johan Kiviniemi
<email address hidden> wrote:
> When writing the script to a file and running sh filename 9<&0, the
> shells (dash, zsh, bash) seem to evaluate the code as they read the
> file. It seems there’s a difference in behavior based on whether the
> script’s being read from a pipe or from a file.
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart .
> https://bugs.launchpad.net/bugs/619269
>
> Title:
>  init: "script" stanza leaks file descriptor to child process
>
> Status in Upstart:
>  Confirmed
>
> Bug description:
>  Upstart sends job scripts to shell via pipe represented by
>  /dev/fd/<fd> (/proc/self/fd/) file in proc filesystem, e.g.:
>
>  /bin/sh -e /dev/fd/9
>
>  This fd stays open while child shell process is running. There are
>  some applications like pvdisplay (and other lvm2 tools) which
>  complaint about it.
>
>  How to reproduce:
>
>  Boot to runlevel 1
>  Run pvdisplay:
>
>  # pvdisplay
>  File descriptor 9 (pipe:[9737]) leaked on pvdisplay invocation. Parent PID 845: /bin/bash
>    --- Physical volume ---
>  ...
>
>  Possible solution would be send first closing pipe fd command to
>  shell:
>
>  +  close_fd = NIH_MUST (nih_sprintf (NULL, "<&%d- ;", fds[0]));
>  +  NIH_ZERO (nih_io_write (io, close_fd, strlen (close_fd)));
>  +
>     NIH_ZERO (nih_io_write (io, script, strlen (script)));
>     nih_io_shutdown (io);
>

Johan Kiviniemi (ion) wrote :

Having an Upstart job run stat -L /proc/self/fd/N where N is the fd over which Upstart provides the script, one gets:

  File: `/proc/self/fd/13'
  Size: 0 Blocks: 0 IO Block: 4096 fifo
Device: 8h/8d Inode: 556606 Links: 1
Access: (0600/prw-------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2011-03-16 20:29:08.939557392 +0200
Modify: 2011-03-16 20:29:08.939557392 +0200
Change: 2011-03-16 20:29:08.939557392 +0200

It definitely looks like a pipe.

Scott James Remnant (scott) wrote :

Ah, so do you reckon it fstat()s the file descriptor of the opened
shell script to determine whether to read it all? Because I can't find
any code like that in dash

On Wed, Mar 16, 2011 at 11:34 AM, Johan Kiviniemi
<email address hidden> wrote:
> Having an Upstart job run stat -L /proc/self/fd/N where N is the fd over
> which Upstart provides the script, one gets:
>
>  File: `/proc/self/fd/13'
>  Size: 0               Blocks: 0          IO Block: 4096   fifo
> Device: 8h/8d   Inode: 556606      Links: 1
> Access: (0600/prw-------)  Uid: (    0/    root)   Gid: (    0/    root)
> Access: 2011-03-16 20:29:08.939557392 +0200
> Modify: 2011-03-16 20:29:08.939557392 +0200
> Change: 2011-03-16 20:29:08.939557392 +0200
>
> It definitely looks like a pipe.
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart .
> https://bugs.launchpad.net/bugs/619269
>
> Title:
>  init: "script" stanza leaks file descriptor to child process
>
> Status in Upstart:
>  Confirmed
>
> Bug description:
>  Upstart sends job scripts to shell via pipe represented by
>  /dev/fd/<fd> (/proc/self/fd/) file in proc filesystem, e.g.:
>
>  /bin/sh -e /dev/fd/9
>
>  This fd stays open while child shell process is running. There are
>  some applications like pvdisplay (and other lvm2 tools) which
>  complaint about it.
>
>  How to reproduce:
>
>  Boot to runlevel 1
>  Run pvdisplay:
>
>  # pvdisplay
>  File descriptor 9 (pipe:[9737]) leaked on pvdisplay invocation. Parent PID 845: /bin/bash
>    --- Physical volume ---
>  ...
>
>  Possible solution would be send first closing pipe fd command to
>  shell:
>
>  +  close_fd = NIH_MUST (nih_sprintf (NULL, "<&%d- ;", fds[0]));
>  +  NIH_ZERO (nih_io_write (io, close_fd, strlen (close_fd)));
>  +
>     NIH_ZERO (nih_io_write (io, script, strlen (script)));
>     nih_io_shutdown (io);
>

Johan Kiviniemi (ion) wrote :

I must apologize: I managed to completely misread the strace output due to being too tired to think.

What *really* seems to be happening:

• The input file descriptor is duplicated and the original fd is closed.
• The duplicated fd is marked FD_CLOEXEC (dash, bash) or closed just before exec in children (zsh).
• The script *is* evaluated as it’s being read.
• The original half of the input pipe can be closed freely by the script as the shell is reading from a duplicate of the fd that won’t be seen by any children.

dash reading from a file:

open("/tmp/foo.sh", O_RDONLY) = 3
fcntl64(3, F_DUPFD, 10) = 10
close(3) = 0
fcntl64(10, F_SETFD, FD_CLOEXEC) = 0

read(10, "echo foo\n", 8192) = 9
write(1, "foo\n", 4) = 4

dash reading from a pipe:

open("/proc/self/fd/9", O_RDONLY) = 3
fcntl64(3, F_DUPFD, 10) = 10
close(3) = 0
fcntl64(10, F_SETFD, FD_CLOEXEC) = 0

read(10, "exec 9<&-; ls -l /proc/self/fd\n", 8192) = 31
fcntl64(9, F_DUPFD, 10) = 11
close(9) = 0
fcntl64(11, F_SETFD, FD_CLOEXEC) = 0
close(11) = 0

lr-x------ 1 ion ion 64 2011-03-16 22:26 0 -> /dev/null
lrwx------ 1 ion ion 64 2011-03-16 22:26 1 -> /dev/pts/5
lrwx------ 1 ion ion 64 2011-03-16 22:26 2 -> /dev/pts/5
lr-x------ 1 ion ion 64 2011-03-16 22:26 3 -> /proc/13746/fd

bash reading from a pipe:

open("/proc/self/fd/9", O_RDONLY|O_LARGEFILE) = 3

dup2(3, 255) = 255
close(3) = 0
fcntl64(255, F_SETFD, FD_CLOEXEC) = 0

read(255, "e", 1) = 1
read(255, "x", 1) = 1
read(255, "e", 1) = 1
read(255, "c", 1) = 1
read(255, " ", 1) = 1

fcntl64(9, F_DUPFD, 10) = 10

fcntl64(10, F_SETFD, FD_CLOEXEC) = 0
close(9) = 0
close(10) = 0

lr-x------ 1 ion ion 64 2011-03-16 22:26 0 -> /dev/null
lrwx------ 1 ion ion 64 2011-03-16 22:26 1 -> /dev/pts/5
lrwx------ 1 ion ion 64 2011-03-16 22:26 2 -> /dev/pts/5
lr-x------ 1 ion ion 64 2011-03-16 22:26 3 -> /proc/13789/fd

zsh reading from a pipe:

open("/proc/self/fd/9", O_RDONLY|O_NOCTTY|O_LARGEFILE) = 3
fcntl64(3, F_DUPFD, 10) = 11
close(3) = 0

read(11, "exec 9<&-; ls -l /proc/self/fd\n", 4096) = 31

fcntl64(9, F_DUPFD, 10) = 12
close(9) = 0

close(12) = 0

[pid 14022] close(11) = 0

[pid 14022] execve("/bin/ls", ["ls", "-l", "/proc/self/fd"], [/* 53 vars */]) = 0

lr-x------ 1 ion ion 64 2011-03-16 22:33 0 -> /dev/null
lrwx------ 1 ion ion 64 2011-03-16 22:33 1 -> /dev/pts/5
lrwx------ 1 ion ion 64 2011-03-16 22:33 2 -> /dev/pts/5
lr-x------ 1 ion ion 64 2011-03-16 22:33 3 -> /proc/14022/fd

Scott James Remnant (scott) wrote :

Argh, I'm being a complete numpty. I had forgotten that there are
always two file descriptors here.

The shell obviously gets 0, 1 & 2 as standard, and we're telling the
shell to open /proc/self/fd/3 - which equates to an fd 3 that we're
leaving open for itself. So the shell is exec()d with:

  0 <- stdin
  1 -> stdout
  2 -> stderr
  3 <- upstart

The shell still has to open() /proc/self/fd/3, which gives it a new
file descriptor (10). This points to the same underlying "file" and is
equivalent to as if it had just dup()d 3. So now:

  0 <- stdin
  1 -> stdout
  2 -> stderr
  3 <- upstart
 10 <- upstart (reading from, FD_CLOEXEC)

3 and 10 connect to the same underlying file, so it doesn't matter if
the shell closes 3 - the very fact it's read the "exec 3<&-" line
means that it must have opened the file and be reading from it on fd
10. So closing 3 is just cleaning up after a dup().

So if the first line of all passed in scripts was "exec 3<&-", after
that line, the shell would have:

  0 <- stdin
  1 -> stdout
  2 -> stderr
 10 <- upstart (reading from, FD_CLOEXEC)

It still has the pipe to upstart open, it's just on a different fd and
not visible to sub-processes -- if you'd done ls /proc/$$/fd you would
still see it ;-)

Changed in upstart:
status: Confirmed → Triaged
Changed in upstart:
status: Triaged → Fix Committed
assignee: nobody → Scott James Remnant (scott)
milestone: none → 1.1
Changed in upstart:
status: Fix Committed → Fix Released
Changed in upstart (Ubuntu):
status: New → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers