ubuntu-realtime

Comment 31 for bug 1998536

Joseph Salisbury (jsalisbury) wrote on 2023-03-22:

I stepped back from the bisect/debugging and looked at the higher-level stats. The stress-ng test is started with one process for each core, and there are 96 of them. I looked at top[3] during a hang, and many of the stress-ng processes are in the running state ('R'). However, a sysrq-w[2] also shows many stress-ng processes in uninterruptible sleep ('D'). What also sticks out to me is that all the stress-ng processes are running as root with a priority of 20. Looking back at one of the call traces[1], I see jbd2 stuck in an uninterruptible state:
...
[ 4461.908213] task:journal-offline state:D stack: 0 pid:17541 ppid: 1 flags:0x00000226
...

The jbd2 kernel thread is also running with a priority of 20[4]. When the hang happens, jbd2 is also stuck in an uninterruptible state (as is systemd-journal):
...
1521 root      20   0       0      0      0 D   0.0   0.0   4:10.48 jbd2/sda2-8
1593 root      19  -1   64692  15832  14512 D   0.0   0.1   0:01.54 systemd-journal
...

I am pinning all of the stress-ng threads to cores 1-95 and the kernel threads to a housekeeping cpu, 0.

Output from cmdline:
"BOOT_IMAGE=/boot/vmlinuz-5.15.0-1033-realtime root=UUID=3583d8c4-d539-439f-9d50-4341675268cc ro console=tty0 console=ttyS0,115200 skew_tick=1 isolcpus=managed_irq,domain,1-95 intel_pstate=disable nosoftlockup tsc=nowatchdog crashkernel=0M-2G:128M,2G-6G:256M,6G-8G:512M,8G-:768M"

However, even with this pinning, stress-ng ends up running on CPU 0, per the ps output[4]. This appears to be causing a deadlock between jbd2 and the stress-ng processes, since they share the same priority/niceness.
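
As a cross-check on the boot parameters, here is a minimal sketch that extracts the isolated CPU set from a kernel command line. The function names are made up for illustration, and the sample string is an excerpt of the cmdline quoted above. It assumes a plain list of numbers and ranges (strided ranges are not handled).

```python
from __future__ import annotations

# Excerpt of the cmdline quoted in this comment
SAMPLE_CMDLINE = (
    "ro console=tty0 console=ttyS0,115200 skew_tick=1 "
    "isolcpus=managed_irq,domain,1-95 intel_pstate=disable nosoftlockup"
)


def parse_cpu_list(spec: str) -> set[int]:
    """Expand a CPU list such as "1-3,8" into {1, 2, 3, 8}."""
    cpus: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def isolated_cpus(cmdline: str) -> set[int]:
    """Return the CPUs named by isolcpus=, or an empty set if it is absent."""
    for token in cmdline.split():
        if token.startswith("isolcpus="):
            value = token[len("isolcpus="):]
            # Flag words (managed_irq, domain) come before the CPU list;
            # keep only the fields that start with a digit.
            fields = [f for f in value.split(",") if f[:1].isdigit()]
            return parse_cpu_list(",".join(fields))
    return set()


isolated = isolated_cpus(SAMPLE_CMDLINE)
print(f"{len(isolated)} isolated CPUs; CPU 0 isolated: {0 in isolated}")
```

On a live system the same function could be fed the contents of /proc/cmdline instead of the sample string.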

To confirm this idea, I started test-storage / stress-ng so they had a lower priority than jbd2. I used the following:
sudo nice -10 test-storage

This causes jbd2 to continue to run with a priority of 20, but all the stress-ng threads are run with a priority of 30:

PSR     TID     PID COMMAND         %CPU PRI  NI
  0    1517    1517 jbd2/sda2-8      5.0  20   0
  0  125875  125875 stress-ng       15.5  30  10
  0  125882  125882 stress-ng        4.4  30  10
  0  125925  125925 stress-ng        4.4  30  10
...
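
The rows above can be checked mechanically. This is only a sketch: the exact ps invocation is not given in this comment, so the parser simply splits on whitespace (which would break on command names containing spaces), and the helper names are hypothetical. It assumes PRI is the kernel-style priority, which is 20 plus the nice value for normal tasks. Note that `nice -10` is the legacy spelling of `nice -n 10`, which matches the NI column showing 10.

```python
# The ps rows quoted above, whitespace-normalized
PS_OUTPUT = """\
PSR TID PID COMMAND %CPU PRI NI
0 1517 1517 jbd2/sda2-8 5.0 20 0
0 125875 125875 stress-ng 15.5 30 10
0 125882 125882 stress-ng 4.4 30 10
0 125925 125925 stress-ng 4.4 30 10
"""


def parse_ps(text):
    """Turn whitespace-separated ps output into a list of dicts keyed by header."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def threads_on_cpu(rows, cpu, command):
    """Thread IDs of `command` that were last seen on processor `cpu`."""
    return [
        int(r["TID"])
        for r in rows
        if r["COMMAND"] == command and int(r["PSR"]) == cpu
    ]


def priority_matches_nice(row):
    """Assumption: PRI is 20 + NI for normal (non-real-time) tasks."""
    return int(row["PRI"]) == 20 + int(row["NI"])


rows = parse_ps(PS_OUTPUT)
print("stress-ng threads on CPU 0:", threads_on_cpu(rows, 0, "stress-ng"))
print("PRI == 20 + NI for all rows:", all(priority_matches_nice(r) for r in rows))
```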

By adding 'nice -10', the test completes without hanging. It appears the system hung while waiting for I/O to complete, which would never happen because the jbd2 thread cannot preempt stress-ng, resulting in a deadlock.

Michael, could you also try running with the following command to confirm the results:
sudo nice -10 test-storage

If this resolves the bug, there are several options:
1. Run the cert suite with a nice value for real-time tests.
2. Change the tests so they do not run as root.
3. Tune the real-time system so stress-ng threads are pinned to isolated cores and kernel threads are on a housekeeping-only core.

I'm going to investigate option 3. I am assigning cores 1-95 as the isolated cores, so stress-ng should not run on core 0, but it is. I'm going to figure out why this is happening.
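
One thing that may be worth checking for option 3 is the affinity mask of each stress-ng thread. The kernel documentation for isolcpus notes that tasks are moved onto or off isolated CPUs through the CPU affinity syscalls or cpusets, so a thread whose mask includes CPU 0 can still run there. The sketch below is Linux-only and uses hypothetical helper names. A thread can exit between the listing and the query, which is not handled here.

```python
import os


def cpus_outside(allowed, isolated):
    """CPUs a task may run on that are not part of the isolated set."""
    return set(allowed) - set(isolated)


def thread_affinities(pid):
    """Map each thread ID of `pid` to its allowed-CPU set."""
    tids = [int(t) for t in os.listdir(f"/proc/{pid}/task")]
    return {tid: os.sched_getaffinity(tid) for tid in tids}


def pin_threads(pid, cpus):
    """Explicitly restrict every thread of `pid` to `cpus`."""
    for tid in thread_affinities(pid):
        os.sched_setaffinity(tid, cpus)


# Example: inspect this process against the isolated range used in this bug
isolated = set(range(1, 96))
for tid, allowed in thread_affinities(os.getpid()).items():
    print(tid, "may run outside the isolated set on:",
          sorted(cpus_outside(allowed, isolated)))
```

Calling `pin_threads(pid, isolated)` on a stress-ng PID would be the explicit counterpart to running it under `taskset`. It needs permission to change the target's affinity, and the CPU set must contain at least one CPU that exists on the machine.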

[0] https://launchpadlibrarian.net/653810449/locking_issue.txt
[1] https://launchpadlibrarian.net/653810490/call_trace.txt
[2] https://launchpadlibrarian.net/655372944/sysrq-w.txt
[3] https://launchpadlibrarian.net/655374168/top-during-hang.txt
[4] https://launchpadlibrarian.net/655380123/ps-test-running.txt