inaddy@ocfs2defrag:~$ while true; do sudo cat /proc/1452/stack; done > bleh.txt
inaddy@ocfs2defrag:~$ cat bleh.txt | sort -u
[<0>] 0x0
Stack does not help me at all (not being updated in execution path from the user<-> kernel context switch). Still, process is consuming100% of user time:
It spends almost no time in waiting for CPU AND absolutely no time in scheduling (as its the only task currently really running), so its something in userland indeed.... debugging it:
(gdb)
#0 0x0000ffffbf618b10 in _getopt_internal_r
(argc=3, argv=0xfffffffff868, optstring=0xaaaaaaaacfc0 "gvclh",
longopts=0x0, longind=0x0, long_only=0, d=d@entry=0xffffbf6cad88 <getopt_data>,
posixly_correct=<optimized out>)
at getopt.c:508
#1 0x0000ffffbf618f0c in _getopt_internal
(argc=<optimized out>, argv=<optimized out>, optstring=<optimized out>,
longopts=<optimized out>, longind=<optimized out>, long_only=<optimized out>,
posixly_correct=<optimized out>)
at getopt.c:711
#2 0x0000aaaaaaaab43c in parse_opt
(_mode_flag=<synthetic pointer>, argv=0xfffffffff868, argc=3) at main.c:585
#3 main (argc=3, argv=0xfffffffff868) at main.c:653
after some time:
(gdb) bt
#0 0x0000ffffbf618950 in _getopt_internal_r
(argc=3, argv=0xfffffffff868, optstring=0xaaaaaaaacfc0 "gvclh",
longopts=0x0, longind=0x0, long_only=0, d=d@entry=0xffffbf6cad88 <getopt_data>,
posixly_correct=<optimized out>)
at getopt.c:500
#1 0x0000ffffbf618f0c in _getopt_internal
(argc=<optimized out>, argv=<optimized out>, optstring=<optimized out>,
longopts=<optimized out>, longind=<optimized out>, long_only=<optimized out>,
posixly_correct=<optimized out>)
at getopt.c:711
#2 0x0000aaaaaaaab43c in parse_opt
(_mode_flag=<synthetic pointer>, argv=0xfffffffff868, argc=3) at main.c:585
#3 main (argc=3, argv=0xfffffffff868) at main.c:653
after some more time:
(gdb) bt
#0 0x0000ffffbf618cec in _getopt_internal_r
(argc=3, argv=0xfffffffff868, optstring=0xaaaaaaaacfc0 "gvclh",
longopts=0x0, longind=0x0, long_only=0, d=d@entry=0xffffbf6cad88 <getopt_data>,
posixly_correct=<optimized out>)
at getopt.c:512
#1 0x0000ffffbf618f0c in _getopt_internal
(argc=<optimized out>, argv=<optimized out>, optstring=<optimized out>,
longopts=<optimized out>, longind=<optimized out>, long_only=<optimized out>,
posixly_correct=<optimized out>)
at getopt.c:711
#2 0x0000aaaaaaaab43c in parse_opt
(_mode_flag=<synthetic pointer>, argv=0xfffffffff868, argc=3) at main.c:585
#3 main (argc=3, argv=0xfffffffff868) at main.c:653
Checking glibc getopt.c code (around lines 500 <-> 512) I can see we are stuck in this loop:
if (d->__nextchar == NULL || *d->__nextchar == '\0')
{
/* Advance to the next ARGV-element. */
...
}
And I'll try to discover why tomorrow (and report upstream).
inaddy@ ocfs2defrag: ~$ sudo cat /proc/1452/stack
[<0>] 0x0
inaddy@ ocfs2defrag: ~$ while true; do sudo cat /proc/1452/stack; done > bleh.txt ocfs2defrag: ~$ cat bleh.txt | sort -u
inaddy@
[<0>] 0x0
Stack does not help me at all (not being updated in execution path from the user<-> kernel context switch). Still, process is consuming100% of user time:
%Cpu3 :100.0 us, 0.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1452 root 20 0 1936 428 360 R 100.0 0.0 6:27.67 defragfs.ocfs2
Initial check on time spend scheduling to discard kernel issue like I suspected before:
inaddy@ ocfs2defrag: ~$ sudo perf sched timehist -p 1452
time cpu task name wait time sch delay run time
[tid/ pid] (msec) (msec) (msec) ------- ------- ------- -- --------- --------- --------- ocfs2[1452] 0.000 0.000 0.000 ocfs2[1452] 0.029 0.000 567.930 ocfs2[1452] 0.018 0.000 1447.901 ocfs2[1452] 0.027 0.000 1567.908 ocfs2[1452] 0.030 0.000 415.921 ocfs2[1452] 0.027 0.000 567.936 ocfs2[1452] 0.019 0.000 743.943 ocfs2[1452] 0.025 0.000 703.934 ocfs2[1452] 0.026 0.000 671.940 ocfs2[1452] 0.028 0.000 199.957 ocfs2[1452] 0.017 0.000 1143.929 ocfs2[1452] 0.023 0.000 535.945 ocfs2[1452] 0.016 0.000 1447.941 ocfs2[1452] 0.027 0.000 2015.871 ocfs2[1452] 0.036 0.000 535.970 ocfs2[1452] 0.020 0.000 999.899 ocfs2[1452] 0.032 0.000 479.942 ocfs2[1452] 0.029 0.000 1279.924 ocfs2[1452] 0.028 0.000 703.938 ocfs2[1452] 0.027 0.000 535.946
--------------- ------ -------
111.555973 [0003] defragfs.
112.123934 [0003] defragfs.
113.571854 [0003] defragfs.
115.139790 [0003] defragfs.
115.555743 [0003] defragfs.
116.123707 [0003] defragfs.
116.867670 [0003] defragfs.
117.571630 [0003] defragfs.
118.243597 [0003] defragfs.
118.443583 [0003] defragfs.
119.587530 [0003] defragfs.
120.123500 [0003] defragfs.
121.571458 [0003] defragfs.
123.587358 [0003] defragfs.
124.123365 [0003] defragfs.
125.123284 [0003] defragfs.
125.603259 [0003] defragfs.
126.883212 [0003] defragfs.
127.587179 [0003] defragfs.
128.123153 [0003] defragfs.
It spends almost no time in waiting for CPU AND absolutely no time in scheduling (as its the only task currently really running), so its something in userland indeed.... debugging it:
(gdb)
#0 0x0000ffffbf618b10 in _getopt_internal_r f868, optstring= 0xaaaaaaaacfc0 "gvclh", 0xffffbf6cad88 <getopt_data>, correct= <optimized out>)
(argc=3, argv=0xffffffff
longopts=0x0, longind=0x0, long_only=0, d=d@entry=
posixly_
at getopt.c:508
#1 0x0000ffffbf618f0c in _getopt_internal <optimized out>, argv=<optimized out>, optstring= <optimized out>, <optimized out>, longind=<optimized out>, long_only= <optimized out>, correct= <optimized out>)
(argc=
longopts=
posixly_
at getopt.c:711
#2 0x0000aaaaaaaab43c in parse_opt flag=<synthetic pointer>, argv=0xffffffff f868, argc=3) at main.c:585
(_mode_
#3 main (argc=3, argv=0xffffffff f868) at main.c:653
after some time:
(gdb) bt
#0 0x0000ffffbf618950 in _getopt_internal_r f868, optstring= 0xaaaaaaaacfc0 "gvclh", 0xffffbf6cad88 <getopt_data>, correct= <optimized out>)
(argc=3, argv=0xffffffff
longopts=0x0, longind=0x0, long_only=0, d=d@entry=
posixly_
at getopt.c:500
#1 0x0000ffffbf618f0c in _getopt_internal <optimized out>, argv=<optimized out>, optstring= <optimized out>, <optimized out>, longind=<optimized out>, long_only= <optimized out>, correct= <optimized out>)
(argc=
longopts=
posixly_
at getopt.c:711
#2 0x0000aaaaaaaab43c in parse_opt flag=<synthetic pointer>, argv=0xffffffff f868, argc=3) at main.c:585
(_mode_
#3 main (argc=3, argv=0xffffffff f868) at main.c:653
after some more time:
(gdb) bt
#0 0x0000ffffbf618cec in _getopt_internal_r f868, optstring= 0xaaaaaaaacfc0 "gvclh", 0xffffbf6cad88 <getopt_data>, correct= <optimized out>)
(argc=3, argv=0xffffffff
longopts=0x0, longind=0x0, long_only=0, d=d@entry=
posixly_
at getopt.c:512
#1 0x0000ffffbf618f0c in _getopt_internal <optimized out>, argv=<optimized out>, optstring= <optimized out>, <optimized out>, longind=<optimized out>, long_only= <optimized out>, correct= <optimized out>)
(argc=
longopts=
posixly_
at getopt.c:711
#2 0x0000aaaaaaaab43c in parse_opt flag=<synthetic pointer>, argv=0xffffffff f868, argc=3) at main.c:585
(_mode_
#3 main (argc=3, argv=0xffffffff f868) at main.c:653
Checking glibc getopt.c code (around lines 500 <-> 512) I can see we are stuck in this loop:
if (d->__nextchar == NULL || *d->__nextchar == '\0')
{
/* Advance to the next ARGV-element. */
...
}
And I'll try to discover why tomorrow (and report upstream).