Comment 4 for bug 1558236

Revision history for this message
Stéphane Gourichon (stephane-gourichon-lpad) wrote :

## Status

Bug still occurs every few days, when I use Ubuntu 16.04, not when using others.

## Trying to get more precise info

### Location in source code

Following https://wiki.ubuntu.com/DebuggingProgramCrash I could get an interactive gdb prompt showing that it's blocked in the while loop below:

term-utils/script.c lines 278-287:

static void wait_for_empty_fd(int fd)
{
 struct pollfd fds[] = {
            { .fd = fd, .events = POLLIN }
 };

 while (poll(fds, 1, 10) == 1) {
  DBG(IO, ul_debug(" data to read"));
 }
}

### Activity at debugger level

Here's the gdb log of the loop when pressing return key at each prompt repeats "step" gdb command:

wait_for_empty_fd (fd=<optimized out>) at term-utils/script.c:284
284 while (poll(fds, 1, 10) == 1) {
(gdb)
285 DBG(IO, ul_debug(" data to read"));
(gdb)
284 while (poll(fds, 1, 10) == 1) {
(gdb)
poll (__timeout=10, __nfds=1, __fds=0x7fffd0eff840) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
46 return __poll_alias (__fds, __nfds, __timeout);
(gdb)
poll () at ../sysdeps/unix/syscall-template.S:84
84 ../sysdeps/unix/syscall-template.S: Aucun fichier ou dossier de ce type.
(gdb)

(gdb) bt
#0 poll (__timeout=10, __nfds=1, __fds=0x7fffd0eff840) at /usr/include/x86_64-linux-gnu/bits/poll2.h:46
#1 wait_for_empty_fd (fd=<optimized out>) at term-utils/script.c:284
#2 write_to_shell (ctl=ctl@entry=0x7fffd0f019c0, buf=buf@entry=0x7fffd0eff890 "sometext"..., bufsz=bufsz@entry=1) at term-utils/script.c:303
#3 0x0000000000402e6d in handle_io (ctl=ctl@entry=0x7fffd0f019c0, fd=0, eof=eof@entry=0x7fffd0f01904) at term-utils/script.c:346
#4 0x00000000004023fb in do_io (ctl=0x7fffd0f019c0) at term-utils/script.c:475
#5 main (argc=<optimized out>, argv=<optimized out>) at term-utils/script.c:781

(gdb) print __fds
$1 = (struct pollfd *) 0x7fffd0eff840
(gdb) print *__fds
$2 = {fd = 4, events = 1, revents = 1}

lrwx------ 1 stephane stephane 64 Apr 5 16:38 0 -> /dev/pts/6
lrwx------ 1 stephane stephane 64 Apr 5 16:38 1 -> /dev/pts/6
lrwx------ 1 stephane stephane 64 Apr 5 16:38 2 -> /dev/pts/6
lrwx------ 1 stephane stephane 64 Apr 5 16:38 3 -> /dev/ptmx
lrwx------ 1 stephane stephane 64 Apr 5 16:38 4 -> /dev/pts/21
lrwx------ 1 stephane stephane 64 Apr 5 16:38 5 -> anon_inode:[signalfd]
l-wx------ 1 stephane stephane 64 Apr 5 16:38 6 -> /tmp/myscriptfile.txt
l-wx------ 1 stephane stephane 64 Apr 5 16:38 7 -> /tmp/myscriptfile.timings

## Summary

* "script" polls slave pts at high speed consuming 100% CPU, assuming this won't last. But sometimes it does last forever.
* Polling code was introduced in commit below:

commit 54c6611d6f7b73609a5331f4d0bcf63c4af6429e
Author: Karel Zak <email address hidden>
Date: Wed Jun 17 15:48:50 2015 +0200

    script: fix EOF problems

    * remove STDIN from poll() if:

      * STDIN already closed -- poll returns POLLHUP, for example:

                echo "date" | script

      * detect EOF as returned by read(), for example:

                script -c "echo Hello" < /dev/null

    * don't write to master when there is still something to read from
      slave (it means shell is not initialized yet or busy)

    Signed-off-by: Karel Zak <email address hidden>

## Now what ?

What should be changed?

It script to blame or is is the shell? Additional info: exact same operation (shell, etc) have been performed for years, no problem. Even on this distro (Ubuntu 16.04) it usually works. It *always* works on Ubuntu <=15.10, Debian 8.3, Fedora 23.

How to debug further?