* "script" polls slave pts at high speed consuming 100% CPU, assuming this won't last. But sometimes it does last forever.
* Polling code was introduced in commit below:
commit 54c6611d6f7b73609a5331f4d0bcf63c4af6429e
Author: Karel Zak <email address hidden>
Date: Wed Jun 17 15:48:50 2015 +0200
script: fix EOF problems
* remove STDIN from poll() if:
* STDIN already closed -- poll returns POLLHUP, for example:
echo "date" | script
* detect EOF as returned by read(), for example:
script -c "echo Hello" < /dev/null
* don't write to master when there is still something to read from
slave (it means shell is not initialized yet or busy)
Signed-off-by: Karel Zak <email address hidden>
## Now what ?
What should be changed?
It script to blame or is is the shell? Additional info: exact same operation (shell, etc) have been performed for years, no problem. Even on this distro (Ubuntu 16.04) it usually works. It *always* works on Ubuntu <=15.10, Debian 8.3, Fedora 23.
## Status
Bug still occurs every few days, when I use Ubuntu 16.04, not when using others.
## Trying to get more precise info
### Location in source code
Following https:/ /wiki.ubuntu. com/DebuggingPr ogramCrash I could get an interactive gdb prompt showing that it's blocked in the while loop below:
term-utils/script.c lines 278-287:
static void wait_for_ empty_fd( int fd)
{
struct pollfd fds[] = {
{ .fd = fd, .events = POLLIN }
};
while (poll(fds, 1, 10) == 1) {
DBG(IO, ul_debug(" data to read"));
}
}
### Activity at debugger level
Here's the gdb log of the loop when pressing return key at each prompt repeats "step" gdb command:
wait_for_empty_fd (fd=<optimized out>) at term-utils/ script. c:284 ff840) at /usr/include/ x86_64- linux-gnu/ bits/poll2. h:46 unix/syscall- template. S:84 unix/syscall- template. S: Aucun fichier ou dossier de ce type.
284 while (poll(fds, 1, 10) == 1) {
(gdb)
285 DBG(IO, ul_debug(" data to read"));
(gdb)
284 while (poll(fds, 1, 10) == 1) {
(gdb)
poll (__timeout=10, __nfds=1, __fds=0x7fffd0e
46 return __poll_alias (__fds, __nfds, __timeout);
(gdb)
poll () at ../sysdeps/
84 ../sysdeps/
(gdb)
(gdb) bt ff840) at /usr/include/ x86_64- linux-gnu/ bits/poll2. h:46 script. c:284 entry=0x7fffd0f 019c0, buf=buf@ entry=0x7fffd0e ff890 "sometext"..., bufsz=bufsz@ entry=1) at term-utils/ script. c:303 entry=0x7fffd0f 019c0, fd=0, eof=eof@ entry=0x7fffd0f 01904) at term-utils/ script. c:346 19c0) at term-utils/ script. c:475 script. c:781
#0 poll (__timeout=10, __nfds=1, __fds=0x7fffd0e
#1 wait_for_empty_fd (fd=<optimized out>) at term-utils/
#2 write_to_shell (ctl=ctl@
#3 0x0000000000402e6d in handle_io (ctl=ctl@
#4 0x00000000004023fb in do_io (ctl=0x7fffd0f0
#5 main (argc=<optimized out>, argv=<optimized out>) at term-utils/
(gdb) print __fds
$1 = (struct pollfd *) 0x7fffd0eff840
(gdb) print *__fds
$2 = {fd = 4, events = 1, revents = 1}
lrwx------ 1 stephane stephane 64 Apr 5 16:38 0 -> /dev/pts/6 [signalfd] le.txt le.timings
lrwx------ 1 stephane stephane 64 Apr 5 16:38 1 -> /dev/pts/6
lrwx------ 1 stephane stephane 64 Apr 5 16:38 2 -> /dev/pts/6
lrwx------ 1 stephane stephane 64 Apr 5 16:38 3 -> /dev/ptmx
lrwx------ 1 stephane stephane 64 Apr 5 16:38 4 -> /dev/pts/21
lrwx------ 1 stephane stephane 64 Apr 5 16:38 5 -> anon_inode:
l-wx------ 1 stephane stephane 64 Apr 5 16:38 6 -> /tmp/myscriptfi
l-wx------ 1 stephane stephane 64 Apr 5 16:38 7 -> /tmp/myscriptfi
## Summary
* "script" polls slave pts at high speed consuming 100% CPU, assuming this won't last. But sometimes it does last forever.
* Polling code was introduced in commit below:
commit 54c6611d6f7b736 09a5331f4d0bcf6 3c4af6429e
Author: Karel Zak <email address hidden>
Date: Wed Jun 17 15:48:50 2015 +0200
script: fix EOF problems
* remove STDIN from poll() if:
* STDIN already closed -- poll returns POLLHUP, for example:
* detect EOF as returned by read(), for example:
* don't write to master when there is still something to read from
slave (it means shell is not initialized yet or busy)
Signed-off-by: Karel Zak <email address hidden>
## Now what ?
What should be changed?
It script to blame or is is the shell? Additional info: exact same operation (shell, etc) have been performed for years, no problem. Even on this distro (Ubuntu 16.04) it usually works. It *always* works on Ubuntu <=15.10, Debian 8.3, Fedora 23.
How to debug further?