Comment 4 for bug 1791758

Revision history for this message
Guilherme G. Piccoli (gpiccoli) wrote :

Kernel core dump analysis:

crash> set
PID: 23697
COMMAND: "kworker/u82:0"
TASK: 88370bcfaa80 [THREAD_INFO: 883708104000]
CPU: 33
STATE: TASK_RUNNING (PANIC)

crash> bt
PID: 23697 TASK: 88370bcfaa80 CPU: 33 COMMAND: "kworker/u82:0"
[...]
#3 [883708107b78] __bad_area_nosemaphore at 8106a889
#4 [883708107bc0] bad_area_nosemaphore at 8106a9a3
#5 [883708107bd0] __do_page_fault at 8106aff2
#6 [883708107c30] trace_do_page_fault at 8106b3f7
#7 [883708107c60] do_async_page_fault at 81063e19
#8 [883708107c70] async_page_fault at 81822958
#9 [883708107cf8] n_tty_receive_buf_common at 814e676a
#10 [883708107dc8] n_tty_receive_buf2 at 814e71f4
#11 [883708107dd8] flush_to_ldisc at 814e9be5
#12 [883708107e20] process_one_work at 8109b0b6
#13 [883708107e68] worker_thread at 8109ba9a
[...]

crash> bt -f
[...]
#9 [883708107cf8] n_tty_receive_buf_common at 814e676a
[...]
883708107d80: 881bc78e4c20 0000000000000000
883708107d90: 0000000000000014 881bc78e4c00
883708107da0: 881bc78e7800 881cba173d80
883708107db0: 883706cae828 883706cae808
883708107dc0: 883708107dd0 814e71f4
#10 [883708107dc8] n_tty_receive_buf2 at 814e71f4
[...]

From the stack frame, we can infer that "struct tty_struct" is at
0x881bc78e7800 :

crash> tty_struct -x 881bc78e7800 | grep name
name = "pts3\000...

Also, from the stack frame we see a value 0x14 there, which represents
the count value in the function:

static int n_tty_receive_buf2(struct tty_struct *tty, const unsigned char *cp,
char *fp, int count)
{
return n_tty_receive_buf_common(tty, cp, fp, count, 1);
}

Since 0x14 mean 20 in decimal, I'd expect a 20 characters string,
which turns to be true ( char *cp is at 881bc78e4c20):

crash> rd -a 881bc78e4c20
881bc78e4c20: source /root/openrc

Something is issuing the command "source /root/openrc" to PTS/3.

Checking the "files" command, we get:

crash> foreach files -R dev/pts/3
PID: 2288 TASK: 883786e2ea40 CPU: 29 COMMAND: "sshd"
ROOT: / CWD: /
FD FILE DENTRY INODE TYPE PATH
9 8839b4ce9a00 881a0ba4da40 8838f71fcf88 CHR /dev/pts/3

And checking ssh processes:

crash> ps|grep ssh
2236 7180 18 8836f9dd2a80 IN 0.0 149480 8636 sshd
2275 2274 37 883706e01540 IN 0.0 37836 4940 ssh
2288 2236 29 883786e2ea40 UN 0.0 149480 1372 sshd
7180 1 17 881cb99b6a40 IN 0.0 57204 5240 sshd
14319 7180 2 8836dfd91540 IN 0.0 149480 8460 sshd

All except 2288 are scheduled after a select() syscall.
PID 2288 looks interesting:

crash> bt 2288
PID: 2288 TASK: 883786e2ea40 CPU: 29 COMMAND: "sshd"
[...]
#4 [88373e0bfb48] down_write at 8181e42d
#5 [88373e0bfb60] tty_unthrottle at 814e75be
#6 [88373e0bfb80] n_tty_open at 814e4fb9
#7 [88373e0bfba0] tty_ldisc_open at 814e8bd5
#8 [88373e0bfbc0] tty_ldisc_reinit at 814e9112
#9 [88373e0bfbf0] tty_reopen at 814df50a
#10 [88373e0bfc08] tty_open at 814e318e
#11 [88373e0bfc70] chrdev_open at 8120d5c4
#12 [88373e0bfcb0] do_dentry_open at 81206a6a
#13 [88373e0bfcf8] vfs_open at 81207b35
#14 [88373e0bfd20] path_openat at 81215fed
[...]

This is the one that seems to be racing with the flush_work from CPU 33
that led to the crash.
Since we have the tty_reopen() in the call trace, it's clear it's re-opening
an existing tty after it was closed, but the flush work was running and
the crash happened.