Bug #1861359 “swap storms kills interactive use” (linux package, Ubuntu)

Revision history for this message

Seth Arnold (seth-arnold) wrote on 2020-01-29:

#1

Dependencies.txt (2.0 KiB, text/plain; charset="utf-8")
ProcCpuinfoMinimal.txt (1.3 KiB, text/plain; charset="utf-8")

Seth Arnold (seth-arnold) wrote on 2020-01-29:

#2

vmstat 1 output (9.9 KiB, text/plain)

Seth Arnold (seth-arnold) wrote on 2020-01-29:

#3

a small bit of top output after an event (5.4 KiB, text/plain)

Steve Beattie (sbeattie) on 2020-01-30

affects:

linux-signed-5.4 (Ubuntu) → linux (Ubuntu)

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote on 2020-01-30: Missing required logs.

#4

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1861359

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete

Seth Arnold (seth-arnold) wrote on 2020-01-30: AlsaDevices.txt

#5

AlsaDevices.txt (929 bytes, text/plain)

apport information

tags:	added: apport-collected
description:	updated

Seth Arnold (seth-arnold) wrote on 2020-01-30: AplayDevices.txt

#6

AplayDevices.txt (790 bytes, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: AlsaDevices.txt

#7

AlsaDevices.txt (929 bytes, text/plain)

apport information

description:

updated

Seth Arnold (seth-arnold) wrote on 2020-01-30: AplayDevices.txt

#8

AplayDevices.txt (790 bytes, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: ArecordDevices.txt

#9

ArecordDevices.txt (278 bytes, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: CRDA.txt

#10

CRDA.txt (468 bytes, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: Card0.Amixer.values.txt

#11

Card0.Amixer.values.txt (2.4 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: Card0.Codecs.codec.0.txt

#12

Card0.Codecs.codec.0.txt (9.9 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: Card0.Codecs.codec.2.txt

#13

Card0.Codecs.codec.2.txt (3.1 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: Card1.Amixer.values.txt

#14

Card1.Amixer.values.txt (745 bytes, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: CurrentDmesg.txt

#15

CurrentDmesg.txt (122.5 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: Lspci.txt

#16

Lspci.txt (15.5 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: Lsusb.txt

#17

Lsusb.txt (1.3 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: Lsusb-t.txt

#18

Lsusb-t.txt (2.3 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: Lsusb-v.txt

#19

Lsusb-v.txt (102.7 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: PciMultimedia.txt

#20

PciMultimedia.txt (656 bytes, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: ProcCpuinfo.txt

#21

ProcCpuinfo.txt (10.3 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: ProcCpuinfoMinimal.txt

#22

ProcCpuinfoMinimal.txt (1.3 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: ProcInterrupts.txt

#23

ProcInterrupts.txt (6.4 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: ProcModules.txt

#24

ProcModules.txt (8.2 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: PulseList.txt

#25

PulseList.txt (25.4 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: RfKill.txt

#26

RfKill.txt (183 bytes, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: UdevDb.txt

#27

UdevDb.txt (276.0 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) wrote on 2020-01-30: WifiSyslog.txt

#28

WifiSyslog.txt (187.2 KiB, text/plain)

apport information

Seth Arnold (seth-arnold) on 2020-01-30

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

Jamie Strandboge (jdstrand) wrote on 2020-01-31:

#29

Seth and I talked about this and I marked this as affecting me. If it helps, I saw this on eoan, and focal doesn't make a difference (which might suggest the change happened between disco and eoan).

Jamie Strandboge (jdstrand) wrote on 2020-01-31:

#30

FYI, I decided to do this:

$ sudo swapoff -a && sudo swapon -a
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi       5.9Gi       4.8Gi       2.0Gi       4.8Gi       7.2Gi
Swap:          15Gi       348Mi        15Gi

Even though I am nowhere near needing to swap, after the swapoff/swapon, swap use immediately climbed back up to 348M.

Oddly, I did it again after flushing the page cache, and the swap stayed at zero:

$ sudo sync ; sudo sh -c 'echo 1 > /proc/sys/vm/drop_caches'
$ sudo swapoff -a && sudo swapon -a
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi       5.6Gi       6.8Gi       2.6Gi       3.1Gi       7.0Gi
Swap:          15Gi          0B        15Gi

I don't know whether flushing the page cache is related to swap staying empty the second time, or whether that is a coincidence.
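
For anyone repeating this experiment, swap use can also be read from /proc/meminfo instead of from the rounded `free -h` figures. The following is a minimal Python sketch, assuming the usual `SwapTotal` and `SwapFree` fields reported in kB; the function name `swap_used_kib` is illustrative and not part of this report.

```python
def swap_used_kib(meminfo_text):
    """Return SwapTotal - SwapFree (in KiB) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]
```

Passing it the contents of /proc/meminfo before and after the `swapoff -a && swapon -a` cycle would show the same climb with kB resolution.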

Jamie Strandboge (jdstrand) wrote on 2020-01-31:

#31

I forgot to mention, I also have nvme.

Colin Ian King (colin-king) on 2020-01-31

Changed in linux (Ubuntu):
assignee:	nobody → Colin Ian King (colin-king)
importance:	Undecided → High

Seth Arnold (seth-arnold) wrote on 2020-01-31:

#32

vmstat 1 output (17.8 KiB, text/plain)

I had a vmstat 1 running; the entire time the system was swapping out, X was unusable.

Thanks

Andrea Righi (arighi) on 2020-02-03

Changed in linux (Ubuntu):
assignee:	Colin Ian King (colin-king) → Andrea Righi (arighi)

Andrea Righi (arighi) wrote on 2020-02-03:

#33

Hello Seth, thanks for reporting the problem. I was wondering if this could be related to the memory cgroup controller.

As a simple test could you try to reboot the system adding cgroup_disable=memory to the kernel boot parameters?

That way, if the problem goes away, at least we know it's related to the memory cgroup and we can investigate further (it could be a misconfiguration enforced by systemd, or a recent change in the kernel that introduced this different behavior).

Seth Arnold (seth-arnold) wrote on 2020-02-07:

#34

This kernel command line parameter didn't appear to help:

Moments after loading a new URL into firefox:

1  0            0       699176        17804      3699820    0    0     0     0 2289  863   2   1  98   0   0
 0  0            0       702860        17804      3699780    0    0     0  2288  492 1817   0   1  99   0   0
 0  0            0       702860        17804      3699780    0    0     0     0  232  525   0   0 100   0   0
 0  0            0       703112        17804      3699748    0    0     0     0  220  564   0   0 100   0   0
 2  0            0       707792        17804      3699748    0    0     0     0  384  734   1   0  99   0   0
 0  0            0       712468        17804      3699996    0    0     0     0 1005  854   2   0  97   0   0
 0  0            0       712500        17804      3699988    0    0     0  2308  497 1908   0   0  99   0   0
 0  0            0       712752        17804      3699748    0    0     0     0  240  567   0   0 100   0   0
 0  0          256       632540        17804      3763996    0  232     4   476 1106 2138   2   1  96   0   0
 1  0         4352       649568        17804      3761060    0 2980     0  8132 5030 17862   1   7  92   0   0
 1  0         7168       644588        17804      3758040    0 3032     0  9104 5304 20299   0   7  93   0   0
 1  0        10240       654444        17804      3755152    0 2892     0 12864 5367 21353   0   8  92   1   0
 1  0        13312       651200        17804      3752216    0 2932     0  8804 5073 18847   0   5  95   0   0
 1  0        16128       660636        17804      3749200    0 3016     0  9048 5183 19639   1   6  94   0   0
 0  0        18944       652444        17804      3746180    0 3020     0  9060 5221 19613   0   5  95   0   0
 2  0        22016       645228        17804      3743168    0 3012     0  9032 5219 20287   0   7  93   0   0
 0  0        25600       674648        17804      3740152    0 3016     0 11148 5290 20518   0   7  93   0   0
procs -----------------------memory---------------------- ---swap-- -----io---- -system-- --------cpu--------
 r  b         swpd         free         buff        cache   si   so    bi    bo   in   cs  us  sy  id  wa  st
 1  0        28160       667968        17804      3737076    0 3076     0  9228 5336 19913   0   7  93   0   0
 0  0        31488       663008        17804      3734048    0 3028     0  9084 5377 19518   0   6  94   0   0
 2  0        34560       655620        17804      3731020    0 3028     0  9080 5356 20395   0   7  93   0   0
 0  0        37632       649356        17804      3727996    0 3024     0  9076 5258 19858   0   6  94   0   0
 1  0        40192       677924        17804      3724948    0 3064     0 11388 5503 22313   0   8  92   0   0
 0  0        43264       670968        17804      3721944    0 3004     0  9016 5211 19591   0   7  93   0   0
 2  0        46336       665016        17804      3718996    0 2948     0  8844 5167 18978   0   7  93   0   0
 2  0        49664       658068        17804      3715896    0 3100     0  9300 5304 20485   0   7  93   0   0
 1  0        52480       650784        17804      3712944    0 2984     0  8952 5150 19377   0   5  95   0   0
 1  0        55296       677788        17804      3710012    0 2936     0 10892 5300 20757   0   6  93   0   0
 1  0        58624       673908        17804      3707076    0 2932     0  6984 4462 16264   0   4  96   0   0
 0  0        58880       670460        17804      3709260    0 1128     0  2260 1968 6407   1   2  97   0   0
 0  0        58880       670552        17804      3709188    0    0     0     0  259  487   0   0 100   0   0
 0  0        58880       736572        17804      3643148    0    0     0     0  255  499   0   0 100   0   0
 0  0        58880       744792        17804      3642492    0    0     0  2124  535 2045   0   1  99   0   0
 0  0        58880       746520        17804      3642492    0    0     0     0  224  539   0   0 100   0   0
 0  0        58880       746236        17804      3642492    0    0     0     0  289  775   0   0 100   0   0
 0  0        58880       746032        17804      3642752    0    0     0     0  411  831   1   0  99   0   0
 0  0        58880       747872        17804      3642828    0    0     0   136  253  599   0   0 100   0   0
 0  0        58880       747916        17804      3642492    0    0     0  2360  510 1911   0   1  99   0   0
 0  0        58880       738084        17804      3652720    0    0     0     0  441 1131   0   0  99   0   0
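
The pattern in this capture (a run of consecutive seconds with `so` near 3000 while `si` stays at 0) can be picked out of a long `vmstat 1` log mechanically. Here is a rough Python sketch, assuming the default 17-column vmstat layout shown above; the function name, the threshold and the minimum run length are arbitrary choices for illustration.

```python
def swap_out_bursts(vmstat_text, threshold=1000, min_len=3):
    """Find runs of consecutive samples whose 'so' column exceeds threshold.

    Returns a list of (first_sample_index, run_length) tuples.
    """
    so_values = []
    for line in vmstat_text.splitlines():
        cols = line.split()
        # Data rows have 17 numeric columns; header rows are skipped.
        if len(cols) != 17 or not all(c.isdigit() for c in cols):
            continue
        so_values.append(int(cols[7]))  # r b swpd free buff cache si so ...

    bursts = []
    start = None
    # The -inf sentinel closes a run that lasts until the final sample.
    for i, so in enumerate(so_values + [float("-inf")]):
        if so > threshold and start is None:
            start = i
        elif so <= threshold and start is not None:
            if i - start >= min_len:
                bursts.append((start, i - start))
            start = None
    return bursts
```

With `vmstat 1`, the indices are sample numbers, so a run length is roughly the duration of the swap-out burst in seconds.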

$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu Focal Fossa (development branch)
Release:	20.04
Codename:	focal
$ cat /proc/cmdline 
BOOT_IMAGE=/BOOT/ubuntu@/vmlinuz-5.4.0-12-generic root=ZFS=rpool/ROOT/ubuntu ro root=ZFS=rpool/ROOT/ubuntu quiet splash acpi_osi=! "acpi_osi=Windows 2015" cgroup_disable=memory vt.handoff=1
$ uname -a
Linux millbarge 5.4.0-12-generic #15-Ubuntu SMP Tue Jan 21 15:12:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Thanks

Seth Arnold (seth-arnold) wrote on 2020-02-07:

#35

$ systemctl show '*.slice' | grep -e '^Slice' -e '^ControlGroup' -e "^DefaultMem" -e "^Memory"
Slice=system.slice
ControlGroup=/system.slice/system-systemd\x2dfsck.slice
MemoryCurrent=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
Slice=system.slice
ControlGroup=/system.slice/system-getty.slice
MemoryCurrent=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
Slice=-.slice
ControlGroup=/user.slice
MemoryCurrent=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
ControlGroup=/
MemoryCurrent=14869471232
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
Slice=user.slice
ControlGroup=/user.slice/user-1000.slice
MemoryCurrent=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
Slice=system.slice
ControlGroup=/system.slice/system-ubxtool.slice
MemoryCurrent=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
Slice=system.slice
ControlGroup=/system.slice/system-systemd\x2dcryptsetup.slice
MemoryCurrent=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
Slice=system.slice
ControlGroup=/system.slice/system-systemd\x2dbacklight.slice
MemoryCurrent=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
Slice=-.slice
ControlGroup=/system.slice
MemoryCurrent=[not set]
MemoryAccounting=yes
DefaultMemoryLow=0
DefaultMemoryMin=0
MemoryMin=0
MemoryLow=0
MemoryHigh=infinity
MemoryMax=infinity
MemorySwapMax=infinity
MemoryLimit=infinity
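
Reading this output by eye is tedious; the relevant observation is that every limit is `infinity`. The Python sketch below scans `systemctl show` output for any memory limit that is actually set. It assumes `ControlGroup=` appears before the `Memory*` keys of each unit, as it does in the output above, and the function name is made up for this example.

```python
LIMIT_KEYS = ("MemoryHigh", "MemoryMax", "MemorySwapMax", "MemoryLimit")


def finite_memory_limits(show_output):
    """Return (cgroup, key, value) for every limit that is not 'infinity'."""
    found = []
    cgroup = None
    for line in show_output.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        if key == "ControlGroup":
            cgroup = value  # limits that follow belong to this cgroup
        elif key in LIMIT_KEYS and value != "infinity":
            found.append((cgroup, key, value))
    return found
```

An empty result for the output above would agree with the reading that no slice has a memory cap configured.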

Andrea Righi (arighi) wrote on 2020-02-10:

#36

OK, so we know that it's not related to the memory cgroup subsystem.

Another possible cause of this unexpected swapping activity is the memory compaction code triggering direct memory reclaim, which forces pages to be swapped out.

What do you have in /proc/sys/vm/swappiness? Could you try to set that to 0 (kernel prefers to drop file-backed pages instead of swapping out anonymous pages) and see if the swap out activity is still happening?

Moreover, we can do some more advanced tracing of the direct memory reclaim code, for example:
$ sudo kprobe-perf -s 'p:shrink_node'

This may tell us whether and how the direct memory reclaim code was triggered (the command should print some stack traces when you notice the swap-out activity).
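
Independent of tracing, the kernel's cumulative counters in /proc/vmstat give a coarse view of the same activity. This is a hedged Python sketch that diffs two snapshots of that file: `pswpout` counts pages swapped out, while `pgscan_direct` and `compact_stall` are assumed to be present on this kernel (missing counters are treated as 0). The helper names are illustrative.

```python
WATCHED = ("pswpout", "pgscan_direct", "compact_stall")


def parse_vmstat(text):
    """Parse 'name value' lines from /proc/vmstat-style text into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def counter_deltas(before, after, names=WATCHED):
    """Return how much each watched counter grew between two snapshots."""
    b, a = parse_vmstat(before), parse_vmstat(after)
    return {name: a.get(name, 0) - b.get(name, 0) for name in names}
```

Taking one snapshot before loading a page and one after the stall would show whether direct reclaim or compaction counters moved together with `pswpout`.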

Seth Arnold (seth-arnold) wrote on 2020-02-11: Re: [Bug 1861359] Re: swap storms kills interactive use

#37

On Mon, Feb 10, 2020 at 07:53:20AM -0000, Andrea Righi wrote:
> OK, so we know that it's not related to the memory cgroup subsystem.

But this is a good instinct. It does seem to happen when eg firefox or git
is in heavy memory use, not the system as a whole.

> Another reason of such unexpected swapping activity could be due to
> memory compaction code that is triggering some direct memory reclaim and
> forcing to swap out pages.
>
> What do you have in /proc/sys/vm/swappiness? Could you try to set that
> to 0 (kernel prefers to drop file-backed pages instead of swapping out
> anonymous pages) and see if the swap out activity is still happening?

$ cat /proc/sys/vm/swappiness
60

I'll try 0.

> Moreover, we can do some more advanced tracing of the direct memory reclaim code, for example:
> $ sudo kprobe-perf -s 'p:shrink_node'
>

This doesn't work:

# kprobe-perf -s 'p:shrink_node'
ERROR: func shrink_node not in /sys/kernel/debug/tracing/available_filter_functions.
Either it doesn't exist, or, it might be unsafe to kprobe. Exiting. Use -F to override.
# ls -l /sys/kernel/debug/tracing/available_filter_functions
ls: cannot access '/sys/kernel/debug/tracing/available_filter_functions': No such file or directory
# find /sys/kernel/debug/tracing
/sys/kernel/debug/tracing
/sys/kernel/debug/tracing/instances
/sys/kernel/debug/tracing/trace_stat
/sys/kernel/debug/tracing/per_cpu
/sys/kernel/debug/tracing/per_cpu/cpu7
/sys/kernel/debug/tracing/per_cpu/cpu6
/sys/kernel/debug/tracing/per_cpu/cpu5
/sys/kernel/debug/tracing/per_cpu/cpu4
/sys/kernel/debug/tracing/per_cpu/cpu3
/sys/kernel/debug/tracing/per_cpu/cpu2
/sys/kernel/debug/tracing/per_cpu/cpu1
/sys/kernel/debug/tracing/per_cpu/cpu0
/sys/kernel/debug/tracing/options
#

Thanks

Seth Arnold (seth-arnold) wrote on 2020-02-11:

#38

> What do you have in /proc/sys/vm/swappiness? Could you try to set that
> to 0 (kernel prefers to drop file-backed pages instead of swapping out
> anonymous pages) and see if the swap out activity is still happening?

Unfortunately, this did not solve the problem.

Setting swappiness to 0 and re-enabling my swap nearly immediately caused
the whole computer to lag whenever firefox had focus -- and swap use
jumped to 380M, despite having roughly five gigs free space.

Turning swap back off again helped immensely.

Thanks

Andrea Righi (arighi) wrote on 2020-02-12:

#39

Weird that kprobe-perf isn't working... I've just tried it on a freshly installed 20.04 instance and:

ubuntu@ubuntu:~$ uname -a
Linux ubuntu 5.4.0-12-generic #15-Ubuntu SMP Tue Jan 21 15:12:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@ubuntu:~$ sudo ls -l /sys/kernel/debug/tracing/available_filter_functions
-r--r--r-- 1 root root 0 Feb 12 11:32 /sys/kernel/debug/tracing/available_filter_functions
ubuntu@ubuntu:~$ sudo kprobe-perf -s 'p:shrink_node'
Tracing kprobe shrink_node. Ctrl-C to end.
^C
Ending tracing...
ubuntu@ubuntu:~$

Are we using a different kernel?

What are your kernel .config options about ftrace (grep FTRACE /boot/config-`uname -r`)?

Seth Arnold (seth-arnold) wrote on 2020-02-13:

#40

$ uname -a
Linux millbarge 5.4.0-12-generic #15-Ubuntu SMP Tue Jan 21 15:12:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ grep FTRACE /boot/config-`uname -r`
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_STM_SOURCE_FTRACE=m
# CONFIG_PSTORE_FTRACE is not set
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_FTRACE=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_DYNAMIC_FTRACE_WITH_REGS=y
CONFIG_FTRACE_MCOUNT_RECORD=y
# CONFIG_FTRACE_STARTUP_TEST is not set

$ sudo ls -l /sys/kernel/debug/tracing/available_filter_functions
ls: cannot access '/sys/kernel/debug/tracing/available_filter_functions': No such file or directory
$ sudo kprobe-perf -s 'p:shrink_node'
ERROR: func shrink_node not in /sys/kernel/debug/tracing/available_filter_functions.
Either it doesn't exist, or, it might be unsafe to kprobe. Exiting. Use -F to override.

Are you using secure boot?

$ dmesg | grep -i secure
[ 0.000000] secureboot: Secure boot enabled
[ 0.000000] Kernel is locked down from EFI Secure Boot mode; see man kernel_lockdown.7
[ 0.019867] secureboot: Secure boot enabled
[ 0.965284] integrity: Loaded X.509 cert 'millbarge Secure Boot Module Signature key: effb550c01bc71fba674d2a0b19930ccde197df8'
[ 72.131681] Bluetooth: hci0: Secure boot is enabled

Thanks

Andrea Righi (arighi) wrote on 2020-02-17:

#41

Ah! You're right, that's the reason! When the kernel is locked down ftrace is explicitly disabled. To confirm that, you should have 0 in /proc/sys/kernel/ftrace_enabled.

Can you try to set it back to 1 and see if kprobe-perf works after that? Otherwise I'll figure out an alternative way to trace down the swapping issue. Thanks.
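
A quick way to check the lockdown state without searching dmesg is the securityfs file /sys/kernel/security/lockdown, which on kernels with the upstream lockdown LSM lists the available modes with the active one in square brackets. Whether this particular Ubuntu kernel exposes that file is an assumption here, and the function name is illustrative. A minimal Python sketch of the parsing:

```python
import re


def active_lockdown_mode(text):
    """Return the bracketed (active) mode, e.g. 'integrity', or None."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None
```

Anything other than `none` would be consistent with the tracefs files being hidden or write-protected.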

Seth Arnold (seth-arnold) wrote on 2020-02-19:

#42

$ uname -a
Linux millbarge 5.4.0-12-generic #15-Ubuntu SMP Tue Jan 21 15:12:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ cat /proc/sys/kernel/ftrace_enabled
1
$ sudo kprobe-perf -s 'p:shrink_node'
[sudo] password for sarnold:
ERROR: func shrink_node not in /sys/kernel/debug/tracing/available_filter_functions.
Either it doesn't exist, or, it might be unsafe to kprobe. Exiting. Use -F to override.
$ sudo kprobe-perf -F -s 'p:shrink_node'
Tracing kprobe shrink_node. Ctrl-C to end.
/usr/sbin/kprobe-perf: line 227: current_tracer: Permission denied
/usr/sbin/kprobe-perf: line 228: kprobe_events: Permission denied
ERROR: adding kprobe "p:shrink_node shrink_node".
Last 2 dmesg entries (might contain reason):
[1039094.850875] usb 1-4.4.1.3: Manufacturer: u-blox AG - www.u-blox.com
[1039094.860068] cdc_acm 1-4.4.1.3:1.0: ttyACM1: USB ACM device
Exiting.

Does the /proc/sys/kernel/ftrace_enabled output need a new bug report?

Thanks

Andrea Righi (arighi) wrote on 2020-02-20:

#43

I guess we can't use ftrace and secure boot at the same time then... would it be possible to disable secure boot / kernel lockdown on your side and run a test using that kprobe-perf command?

If it's not possible or too complicated we'll find an alternative way, maybe I can create a custom kernel and add some specific code to trace shrink_node().

Seth Arnold (seth-arnold) wrote on 2020-02-29:

#44

This web page may be a good reproducer candidate:

https://platform.leolabs.space/visualizations/conjunction?type=conjunction&reportId=2004981040

Loading it in firefox would make my computer unresponsive for over a minute. (Be careful with firefox reloading it when re-opening firefox.)

Loading it in chromium-browser made my computer unresponsive for 45 seconds.

Thanks

Andrea Righi (arighi) wrote on 2020-02-29:

#45

Many thanks for the reproducer Seth! I've been able to reproduce the swapping issue on my laptop! Now I can investigate more on my side. I'll keep you posted!

Andrea Righi (arighi) wrote on 2020-02-29:

#46

Seth, can you try to see if you can reproduce the problem with the latest unstable kernel (5.4.0-17.21)? https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/unstable

I can't reproduce the problem with it. I have not verified yet, but I suspect it might be related to this commit:

d92ff805127f mm: move_pages: report the number of non-attempted pages

Thanks!

Steve Beattie (sbeattie) wrote on 2020-02-29:

#47

On Sat, Feb 29, 2020 at 10:06:42AM -0000, Andrea Righi wrote:
> Seth, can you try to see if you can reproduce the problem with the
> latest unstable kernel (5.4.0-17.21)?
> https://launchpad.net/~canonical-kernel-team/+archive/ubuntu/unstable
>
> I can't reproduce the problem with it. I have not verified yet, but I
> suspect it might be related to this commit:
>
> d92ff805127f mm: move_pages: report the number of non-attempted pages

I've installed this kernel, but haven't rebooted into it. I noticed
while installing it, I got a zillion messages:

depmod: ERROR: ../libkmod/libkmod.c:515 lookup_builtin_file() could not open builtin file '/lib/modules/5.4.0-17-generic/modules.builtin.bin'

Both linux-modules-5.4.0-17-generic and
linux-modules-extra-5.4.0-17-generic were installed, so I'm not sure
what's gone wrong here.

--
Steve Beattie
<email address hidden>
http://NxNW.org/~steve/

Andrea Righi (arighi) wrote on 2020-03-02:

#48

The depmod error messages have been fixed in initramfs-tools (see https://bugs.launchpad.net/bugs/1863261). It doesn't actually prevent the kernel from booting, so you can safely reboot.

Seth Arnold (seth-arnold) wrote on 2020-03-03:

#49

Andrea, this new kernel looks promising.

Linux millbarge 5.4.0-17-generic #21-Ubuntu SMP Fri Feb 28 16:18:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Unfortunately I didn't check the reproducer before rebooting: the error message I get with it now suggests that it might not work any more.

However, this visualization https://platform.leolabs.space/visualizations/leo sure feels like it would have the heft to reproduce the problem. It worked fine for me in both firefox and chromium-browser:

sarnold     4987 14.1  3.0 3042112 495860 ?      Sl   03:42   0:46 /usr/lib/firefox/firefox
sarnold     5070  7.3  1.8 2651760 293108 ?      Sl   03:42   0:23 /usr/lib/firefox/firefox -contentproc -childID 1
sarnold     5175  2.5  0.9 2559560 155888 ?      Sl   03:42   0:08 /usr/lib/firefox/firefox -contentproc -childID 2
sarnold     5736  5.6  2.9 4144456 476876 ?      Sl   03:42   0:17 /usr/lib/firefox/firefox -contentproc -childID 4
sarnold     9668  0.3  0.4 2382156 71616 ?       Sl   03:46   0:00 /usr/lib/firefox/firefox -contentproc -childID 6

sarnold    10394  6.7  1.1 3036576 182812 ?      Sl   03:48   0:04 /snap/chromium/1040/usr/lib/chrom
sarnold    10672  0.0  0.2 407644 47472 ?        S    03:48   0:00 /snap/chromium/1040/usr/lib/chrom
sarnold    10674  0.0  0.0 407644 12932 ?        S    03:48   0:00 /snap/chromium/1040/usr/lib/chrom
sarnold    10690 17.9  2.3 1269704 372880 ?      Sl   03:48   0:10 /snap/chromium/1040/usr/lib/chrom
sarnold    10694  0.8  0.4 1014184 77752 ?       Sl   03:48   0:00 /snap/chromium/1040/usr/lib/chrom
sarnold    10735 60.2  3.2 6219072 521464 ?      Sl   03:48   0:36 /snap/chromium/1040/usr/lib/chrom
sarnold    10780  0.0  0.3 5498108 51232 ?       Sl   03:48   0:00 /snap/chromium/1040/usr/lib/chrom

I also did several git fast-forward merges on various Linux git trees; these have not been reliable reproducers, but they also completed without trouble.

An uptime of 30 minutes is perhaps too short to declare success but this is certainly promising.

Thanks

Seth Arnold (seth-arnold) wrote on 2020-03-04:

#50

Andrea, unfortunately this updated kernel hasn't fixed the problem:

01:02:48 up 21:21, 9 users, load average: 1.45, 0.98, 0.58

Linux millbarge 5.4.0-17-generic #21-Ubuntu SMP Fri Feb 28 16:18:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

I was able to reproduce the swap growth and hangs with Firefox, doing a few google image searches.

Thanks

Andrea Righi (arighi) wrote on 2020-03-04:

#51

@seth-arnold OK, I'll run these tests on my side as well and see if I can reproduce the problem. If you find a specific web page that triggers the problem easily, let me know. Thanks!

Seth Arnold (seth-arnold) wrote on 2020-03-09:

#52

There is an interesting (to me, anyway) change of behaviour with the -17 kernel: while earlier kernels would appear to be locked solid for 30-60 seconds before the screen could update, -17 allows screen updates every six seconds or so.

I have an always-running mosh session to a remote host running irssi. This session has a clock that updates every second. When this problem strikes I can glance at this clock and by watching the time jumps, gauge how long the hangs are.

When the -17 kernel hangs during these events, this clock updates roughly every six seconds:

eg on seconds 36, 42, 48, 54, 00, 06, 13, 19, 25, 31, 37, 43, 50, 56..
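
To put numbers on those jumps, here is a small Python sketch (not part of the original measurements) that turns the seconds-of-minute readings listed above into gap lengths. It assumes each reading came less than a minute after the previous one, so a wrap past :59 can be handled with modulo 60; the function name is purely illustrative.

```python
def gaps(seconds):
    """Return the elapsed seconds between consecutive clock readings.

    Readings are seconds-of-minute values. A wrap past :59 is handled with
    modulo 60, which assumes no single gap lasted a full minute or longer.
    """
    return [(b - a) % 60 for a, b in zip(seconds, seconds[1:])]


# The readings quoted in the comment, with "00" and "06" written as integers.
observed = [36, 42, 48, 54, 0, 6, 13, 19, 25, 31, 37, 43, 50, 56]
steps = gaps(observed)

print(steps)                    # individual gap lengths in seconds
print(sum(steps) / len(steps))  # mean gap in seconds
```

For these readings every gap is six or seven seconds, which matches the "roughly every six seconds" description.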

The system is by no means usable during this time, but it is interesting that enough userspace ran for mosh to accept packets and redraw the urxvt terminal, and xorg to update the screen.

My most recent hang happened while firefox loaded https://www.google.com/maps/place/20%C2%B054'35.7%22S+55%C2%B035'05.0%22E/@-20.9099167,55.5826253,2281m/data=!3m2!1e3!4b1!4m5!3m4!1s0x0:0x0!8m2!3d-20.90991!4d55.58471 but it might be the case that this is just what pushed my browser over the edge, after loading several pages of google docs content and a PDF.

Thanks

Revision history for this message

Jamie Strandboge (jdstrand) wrote on 2020-03-10:

#53

Using 5.4.0-17.21-generic, my laptop has 16G of ram. If I launch 3 vms (xenial desktop (768M), bionic desktop (1.5G) and focal desktop (2.6G)) then load this page: https://people.canonical.com/~ubuntu-security/oval/com.ubuntu.xenial.cve.oval.xml, at some point while the page is loading, the desktop interactivity degrades. top tells me I am swapping 2.5G. (I have other tabs open in firefox of course). Eventually the system settles down and interactivity is decent again.

top - 07:59:36 up 4 days, 21:27,  1 user,  load average: 2.37, 2.69, 2.43
Tasks: 456 total,   2 running, 454 sleeping,   0 stopped,   0 zombie
%Cpu(s): 10.8 us,  8.1 sy,  0.0 ni, 79.1 id,  0.0 wa,  0.0 hi,  2.0 si,  0.0 st
MiB Mem :  15859.5 total,   3358.4 free,  10540.5 used,   1960.5 buff/cache
MiB Swap:  16232.0 total,  13749.7 free,   2482.2 used.   4008.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 200286 jamie     20   0 9007480   4.2g 159064 S   0.0  27.3   6:36.74 Web Content   
 410962 libvirt+  20   0 3428276 572112    956 S   0.0   3.5   0:16.71 qemu-system-+ 
 199841 jamie     20   0 3704896 502596 138168 S   2.7   3.1  34:35.49 firefox-bin   
   6292 jamie     20   0 1173892 481776   2760 S   0.0   3.0   0:33.38 gnome-softwa+ 
 270490 jamie     20   0 2855220 453944  52804 S   0.0   2.8   5:28.90 chrome        
   4798 jamie     20   0 4703264 386268 102868 S  10.7   2.4 123:08.13 gnome-shell   
 200005 jamie     20   0 3168928 383332  82760 S   1.3   2.4   8:57.08 Web Content   
 411248 libvirt+  20   0 5171760 376240   1152 S  14.7   2.3   1:01.78 qemu-system-+ 
 200257 jamie     20   0 3072596 346168  36604 S   4.0   2.1  18:06.12 Web Content   
 411125 libvirt+  20   0 3944844 296928   2248 R   2.7   1.8   0:42.79 qemu-system-+ 
 200235 jamie     20   0 3129932 293856 135852 S   0.0   1.8   8:12.52 Web Content   
 200198 jamie     20   0 2980312 268940  40484 S   1.3   1.7  13:50.98 Web Content   
   5396 jamie     20   0 1032416 250628      0 S   0.0   1.5  11:31.16 multipass.gui 
 200112 jamie     20   0 3001112 246340  42248 S   0.0   1.5   3:49.98 WebExtensions 
 185759 jamie     20   0 2350136 159272   3984 S   0.0   1.0   2:57.77 rhythmbox     
   4458 jamie     20   0 1161904 158884 124960 S  12.0   1.0 172:18.63 Xorg
...

Revision history for this message

Seth Arnold (seth-arnold) wrote on 2020-03-25:

#54

BTW, this is still happening in:

Linux millbarge 5.4.0-20-generic #24-Ubuntu SMP Mon Mar 23 20:55:46 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

I've seen it with firefox in trello, with firefox in launchpad (typing this comment), and while doing two sequential wgets of http://releases.ubuntu.com/18.04/ubuntu-18.04.4-desktop-amd64.iso (a 1.8 GB file).

Thanks

Revision history for this message

Seth Arnold (seth-arnold) wrote on 2020-03-26:

#55

I'm adding the champagne tag to this bug to bring it to a potential wider audience; I think we may need to take more drastic steps like disabling swap on upgrades, not offering swap in our installers, etc., to try to have a better experience.

Thanks

tags:

added: champagne

Revision history for this message

Sultan Alsawaf (kerneltoast) wrote on 2020-03-27:

#56

This problem is caused by an upstream memory management feature called watermark boosting. Normally, when a memory allocation fails and falls back to the page allocator, the page allocator will wake up kswapd to free up pages in order to make the memory allocation succeed. kswapd tries to free memory until it reaches a minimum amount of memory for each memory zone called the high watermark.

What watermark boosting does is try to preemptively fire up kswapd to free memory when there hasn't been an allocation failure. It does this by increasing kswapd's high watermark goal and then firing up kswapd. The reason why this causes freezes is because, with the increased high watermark goal, kswapd will steal memory from processes that need it in order to make forward progress. These processes will, in turn, try to allocate memory again, which will cause kswapd to steal necessary pages from those processes again, in a positive feedback loop known as page thrashing. When page thrashing occurs, your system is essentially livelocked until the necessary forward progress can be made to stop processes from trying to continuously allocate memory and trigger kswapd to steal it back.

This problem already occurs with kswapd *without* watermark boosting, but it's usually only encountered on machines with a small amount of memory and/or a slow CPU. Watermark boosting just makes the existing problem worse enough to notice on higher spec'd machines.

To fix the issue in this bug, watermark boosting can be disabled with the following:
# echo 0 > /proc/sys/vm/watermark_boost_factor

There's really no harm in doing so, because watermark boosting is an inherently broken feature...
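
For anyone who wants to check the setting before or after running that echo, a minimal Python sketch follows. The helper names and the path argument are illustrative only; the one real interface it relies on is the /proc/sys/vm/watermark_boost_factor file named above. Writing to the real file needs root, and the change does not persist across reboots.

```python
from pathlib import Path

# The sysctl file named in the comment above.
BOOST_SYSCTL = Path("/proc/sys/vm/watermark_boost_factor")


def boosting_enabled(path=BOOST_SYSCTL):
    """Return True if the sysctl file holds a non-zero boost factor."""
    return int(Path(path).read_text().strip()) != 0


def disable_boosting(path=BOOST_SYSCTL):
    """Write 0 to the sysctl file (needs root for the real /proc path)."""
    Path(path).write_text("0\n")


if __name__ == "__main__":
    # Only report; older kernels do not have this file at all.
    if BOOST_SYSCTL.exists():
        print("watermark boosting enabled:", boosting_enabled())
```

The path parameter exists so the helpers can be tried against an ordinary file first, without touching the running kernel.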

Andrea Righi (arighi) on 2020-03-27

description:

updated

Revision history for this message

Sultan Alsawaf (kerneltoast) wrote on 2020-03-27:

#57

FYI, this bug has nothing to do with the use of swap. It just happens that the slow writeback incurred by using a swap device backed by non-volatile memory makes kswapd's bouts of page thrashing last longer, enough to the point where there's a visual freeze.

Revision history for this message

Seth Arnold (seth-arnold) wrote on 2020-03-30:

#58

Sultan, thanks for the advice.

I set this watermark boost factor to zero as you suggested, and then decided to try a stupid simple benchmark of my storage -- my swap is a zfs dataset on nvme. zfs means it'll go slower than raw nvme block access:

$ dd if=ubuntu-18.04.4-desktop-amd64.iso of=foo bs=$((1024*1024)) conv=fsync
2028+1 records in
2028+1 records out
2126544896 bytes (2.1 GB, 2.0 GiB) copied, 6.91399 s, 308 MB/s

And this is a sequential workload with large blocks rather than random with pages, but it gives the flavour of the speed well enough.
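
As a sanity check on the dd summary above, this short Python sketch redoes the arithmetic from the numbers dd printed. It only shows how "2028+1 records" and "308 MB/s" follow from the byte count, the 1 MiB block size and the elapsed time; nothing here was measured separately.

```python
BLOCK = 1024 * 1024        # bs=$((1024*1024)), i.e. 1 MiB
total_bytes = 2126544896   # byte count from the dd summary line
elapsed_s = 6.91399        # elapsed time from the dd summary line

# dd reports "full+partial" records: whole blocks plus one short final block.
full, partial = divmod(total_bytes, BLOCK)
print(f"{full}+{1 if partial else 0} records, final record of {partial} bytes")

# dd's MB/s figure uses decimal megabytes (10**6 bytes).
print(f"{total_bytes / elapsed_s / 1e6:.0f} MB/s")
```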

Anyway, my system hung again when re-focusing firefox even after changing the watermark boost factor to zero. It was unusable for ten or twenty seconds. I have loads of free memory, it shouldn't be in swap:

$ free -h
              total        used        free      shared  buff/cache   available
Mem:           15Gi        11Gi       1.6Gi       643Mi       2.2Gi       2.8Gi
Swap:         2.0Gi        23Mi       2.0Gi

Thanks

Revision history for this message

Sultan Alsawaf (kerneltoast) wrote on 2020-03-31:

#59

As a note: https://platform.leolabs.space/visualizations/leo is not a valid reproducer for this bug, since the lags it causes are from overloading the GPU, not from stressing memory.

Revision history for this message

Seth Arnold (seth-arnold) wrote on 2020-04-02:

#60

Sultan put together a kernel with some debugging for me:

[101616.889859] __alloc_pages_nodemask: stall of 3683ms for order-0, mask: 0x100dca
[101616.889863] Call Trace:
[101616.889880] __alloc_pages_nodemask+0x34f/0x3b0
[101616.889887] alloc_pages_vma+0x7f/0x200
[101616.889893] do_anonymous_page+0x118/0x650
[101616.889898] __handle_mm_fault+0x760/0x7a0
[101616.889903] handle_mm_fault+0xca/0x200
[101616.889911] do_user_addr_fault+0x1f9/0x450
[101616.889917] __do_page_fault+0x58/0x90
[101616.889925] ? syscall_slow_exit_work+0x123/0x150
[101616.889930] do_page_fault+0x2c/0xe0
[101616.889938] page_fault+0x34/0x40
[101616.889943] RIP: 0033:0x7f0e38f80190
[101616.889949] Code: 78 04 41 bf 01 00 00 00 44 39 ef 0f 87 4f 01 00 00 41 83 c5 01 48 ba 00 00 00 00 00 80 fa ff 44 89 ee 48 8d 0c f9 48 c1 e6 03 <48> 89 11 48 83 c1 08 48 8b 7b 18 48 01 f7 48 39 f9 75 ed 44 89 68
[101616.889951] RSP: 002b:00007ffd1b151780 EFLAGS: 00010202
[101616.889955] RAX: 00007f0e27619000 RBX: 000006b3e615a810 RCX: 00007f0e27634000
[101616.889958] RDX: fffa800000000000 RSI: 000000000001aff8 RDI: 00000000000035fe
[101616.889960] RBP: 0000000000006bfd R08: 00007ffd1b151820 R09: 00000000292d95be
[101616.889962] R10: 0000000000000011 R11: 00007f0e2632d000 R12: 00007f0e3082e000
[101616.889964] R13: 00000000000035ff R14: 0000000000000001 R15: 0000000000000001

The trigger in this case was firefox loading https://code.launchpad.net/~ahasenack/qa-regression-testing/+git/qa-regression-testing/+merge/381582

Recent calls to journalctl > /tmp/foo had led to a minute-long hang (though the mouse did make ~10ms motion every six or seven seconds), swap growth from 1MB to 62M, but no stacktraces.

Thanks

Stefan Bader (smb) on 2020-04-03

Changed in linux (Ubuntu Focal):
status:	Confirmed → Fix Committed

Revision history for this message

Mason Loring Bliss (y-mason) wrote on 2020-04-03:

#61

Reporter hasn't confirmed that it's corrected yet... "Fix committed" seems premature.

Changed in linux (Ubuntu Focal):
status:	Fix Committed → Confirmed

Revision history for this message

Seth Arnold (seth-arnold) wrote on 2020-04-03:

#62

Stefan, while recent kernels seem happier than previous kernels (I think -14 era was terrible), I don't think this problem is fixed yet:

sarnold@millbarge:/tmp$ uname -a
Linux millbarge 5.4.0-21-generic #25-Ubuntu SMP Sat Mar 28 13:10:28 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
sarnold@millbarge:/tmp$ uptime
 17:18:02 up 14:59,  8 users,  load average: 0.36, 0.27, 0.16
sarnold@millbarge:/tmp$ time journalctl > foo

real	0m47.135s
user	0m36.886s
sys	0m10.180s
sarnold@millbarge:/tmp$ ls -lh /tmp/foo
-rw-rw-r-- 1 sarnold sarnold 841M Apr  3 17:15 /tmp/foo
sarnold@millbarge:/tmp$ wc -l /tmp/foo
4045530 /tmp/foo

During that journalctl run, my mouse could not move for twenty seconds. I left an iovisor tool running to log high-latency __alloc_pages_nodemask() calls (identified by Sultan as a likely cause):

# funcslower-bpfcc __alloc_pages_nodemask
Tracing function calls slower than 1 ms... Ctrl+C to quit.
COMM           PID    LAT(ms)             RVAL FUNC
dp_sync_taskq  716       7.72 fffff2d109115040 __alloc_pages_nodemask 
systemd-udevd  44671     1.59 fffff2d10575b800 __alloc_pages_nodemask 
git            46218     4.27 fffff2d10500edc0 __alloc_pages_nodemask 
z_wr_iss       683       4.90 fffff2d101b4cf40 __alloc_pages_nodemask 
systemd-udevd  51359     4.27 fffff2d10ec12800 __alloc_pages_nodemask 
systemd-udevd  56585     2.48 fffff2d10d3b0500 __alloc_pages_nodemask 
systemd-udevd  56577     2.55 fffff2d106f73380 __alloc_pages_nodemask 
systemd-udevd  72552     2.09 fffff2d105249680 __alloc_pages_nodemask 
systemd-udevd  76964     1.53 fffff2d1017fd440 __alloc_pages_nodemask 
systemd-udevd  80900     3.94 fffff2d102675200 __alloc_pages_nodemask 
systemd-udevd  88669     2.09 fffff2d1015fb5c0 __alloc_pages_nodemask 
kthreadd       2         4.41 fffff2d10d6f96c0 __alloc_pages_nodemask 
kthreadd       2         4.65 fffff2d10d38c3c0 __alloc_pages_nodemask 
kthreadd       2         3.97 fffff2d10d3cc780 __alloc_pages_nodemask 
updatedb.mloca 106194    2.44 fffff2d10e1c2680 __alloc_pages_nodemask 
kthreadd       2         1.20 fffff2d10d36ff40 __alloc_pages_nodemask 
kthreadd       2         4.13 fffff2d10d495940 __alloc_pages_nodemask 
kthreadd       2         6.34 fffff2d10d317180 __alloc_pages_nodemask 
kthreadd       2         4.76 fffff2d110e85000 __alloc_pages_nodemask 
systemd-udevd  113822    1.51 fffff2d10f5c8e00 __alloc_pages_nodemask 
systemd-udevd  113820    1.91 fffff2d10e611800 __alloc_pages_nodemask 
systemd-udevd  122017    3.13 fffff2d10e539700 __alloc_pages_nodemask 
systemd-udevd  141102    2.43 fffff2d104487200 __alloc_pages_nodemask 
systemd-udevd  144565    4.82 fffff2d106b84040 __alloc_pages_nodemask 
systemd-udevd  1156      1.28 fffff2d102325c00 __alloc_pages_nodemask 
systemd-udevd  152486    3.07 fffff2d101516500 __alloc_pages_nodemask 
systemd-udevd  152485    4.94 fffff2d1015cbbc0 __alloc_pages_nodemask 
systemd-udevd  172807    1.33 fffff2d1015f4700 __alloc_pages_nodemask 
systemd-udevd  185503    4.46 fffff2d10d321240 __alloc_pages_nodemask 
systemd-udevd  189244    5.12 fffff2d10e7f0b00 __alloc_pages_nodemask 
systemd-udevd  205973    5.36 fffff2d10d4171c0 __alloc_pages_nodemask 
systemd-udevd  210133    5.19 fffff2d10ffac340 __alloc_pages_nodemask 
systemd-udevd  209621    2.95 fffff2d10d377fc0 __alloc_pages_nodemask 
systemd-udevd  218768    2.24 fffff2d10526e500 __alloc_pages_nodemask 
systemd-udevd  234261    1.42 fffff2d101814680 __alloc_pages_nodemask 
systemd-udevd  235468    3.82 fffff2d10d2afa40 __alloc_pages_nodemask 
systemd-udevd  236826    4.94 fffff2d10d0f8440 __alloc_pages_nodemask 
systemd-udevd  247115    1.13 fffff2d1023ee180 __alloc_pages_nodemask 
systemd-udevd  247573    1.46 fffff2d1015a4040 __alloc_pages_nodemask 
systemd-udevd  271288    5.38 fffff2d10d1b3800 __alloc_pages_nodemask 
dav1d-tile     283460    2.05 fffff2d10cdb0b80 __alloc_pages_nodemask 
dav1d-tile     283460    2.80 fffff2d10cdb8840 __alloc_pages_nodemask 
dav1d-tile     283460    2.83 fffff2d10cdb9200 __alloc_pages_nodemask 
RemVidParent   283460    2.82 fffff2d10cdba200 __alloc_pages_nodemask 
dav1d-tile     283460    4.11 fffff2d10cdbcf00 __alloc_pages_nodemask 
journalctl     286270    1.71 fffff2d10e2668c0 __alloc_pages_nodemask 
journalctl     286270    1.09 fffff2d10ccfac00 __alloc_pages_nodemask 
journalctl     286270    1.11 fffff2d10cce3200 __alloc_pages_nodemask 
journalctl     286270    2.96 fffff2d10cce7380 __alloc_pages_nodemask 
x-terminal-emu 24851     1.14 fffff2d1086a2a00 __alloc_pages_nodemask 
journalctl     286270    1.44 fffff2d10ccc41c0 __alloc_pages_nodemask 
kthreadd       2         1.25 fffff2d110c71000 __alloc_pages_nodemask 
journalctl     286270    1.05 fffff2d10ccb1200 __alloc_pages_nodemask 
journalctl     286270    1.65 fffff2d10ccbd700 __alloc_pages_nodemask 
journalctl     286270    3.08 fffff2d102229000 __alloc_pages_nodemask 
journalctl     286270    1.88 fffff2d10cbfd0c0 __alloc_pages_nodemask 
journalctl     286270    1.54 fffff2d109088100 __alloc_pages_nodemask 
journalctl     286270    1.04 fffff2d105db1800 __alloc_pages_nodemask 
kthreadd       2         2.57 fffff2d1092603c0 __alloc_pages_nodemask 
journalctl     286270    3.59 fffff2d109358780 __alloc_pages_nodemask 
journalctl     286270    1.21 fffff2d10ed83980 __alloc_pages_nodemask 
journalctl     286270    3.17 fffff2d10cb6be40 __alloc_pages_nodemask 
journalctl     286270    1.65 fffff2d10cb6de00 __alloc_pages_nodemask 
journalctl     286270    1.00 fffff2d10cb6fdc0 __alloc_pages_nodemask 
journalctl     286270    3.97 fffff2d10cb46700 __alloc_pages_nodemask 
journalctl     286270    1.08 fffff2d10fef7700 __alloc_pages_nodemask 
journalctl     286270    1.11 fffff2d1106ab840 __alloc_pages_nodemask 
journalctl     286270    1.16 fffff2d10cb204c0 __alloc_pages_nodemask 
journalctl     286270    1.22 fffff2d110d71640 __alloc_pages_nodemask 
journalctl     286270    1.10 fffff2d110e4afc0 __alloc_pages_nodemask 
journalctl     286270    1.03 fffff2d10d912580 __alloc_pages_nodemask 
journalctl     286270    1.06 fffff2d109de5a00 __alloc_pages_nodemask 
journalctl     286270    1.05 fffff2d10cad6f40 __alloc_pages_nodemask 
journalctl     286270    1.52 fffff2d10ca206c0 __alloc_pages_nodemask 
journalctl     286270    1.17 fffff2d10927f300 __alloc_pages_nodemask 
journalctl     286270    1.18 fffff2d10ca1a400 __alloc_pages_nodemask 
kthreadd       2         1.77 fffff2d11008d400 __alloc_pages_nodemask 
journalctl     286270    2.98 fffff2d108f41e40 __alloc_pages_nodemask 
journalctl     286270    1.09 fffff2d10c9fe440 __alloc_pages_nodemask 
journalctl     286270    1.53 fffff2d1090532c0 __alloc_pages_nodemask 
journalctl     286270    1.45 fffff2d10c9d4e40 __alloc_pages_nodemask 
kthreadd       2         1.72 fffff2d10fc46040 __alloc_pages_nodemask 
z_wr_iss_h     684       1.85 fffff2d10a384cc0 __alloc_pages_nodemask 
journalctl     286270    1.29 fffff2d10c9c0480 __alloc_pages_nodemask 
kthreadd       2         1.35 fffff2d109ffd8c0 __alloc_pages_nodemask 
systemd-udevd  288492    2.55 fffff2d10e30a5c0 __alloc_pages_nodemask
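
A listing this long is easier to read when grouped by command. The following Python sketch is one possible way to summarise it, assuming the five-column COMM / PID / LAT(ms) / RVAL / FUNC layout shown above and command names without spaces. The function name and the sample rows (copied from the listing) are for illustration only.

```python
from collections import defaultdict


def summarise(lines):
    """Group funcslower rows by COMM.

    Returns {comm: (count, max_latency_ms, total_latency_ms)}.
    Rows that do not look like "COMM PID LAT RVAL FUNC" are skipped, which
    also drops the column header and the "Tracing ..." banner.
    """
    stats = defaultdict(lambda: [0, 0.0, 0.0])
    for line in lines:
        fields = line.split()
        if len(fields) != 5 or not fields[1].isdigit():
            continue
        try:
            lat_ms = float(fields[2])
        except ValueError:
            continue
        entry = stats[fields[0]]
        entry[0] += 1
        entry[1] = max(entry[1], lat_ms)
        entry[2] += lat_ms
    return {comm: tuple(values) for comm, values in stats.items()}


# A few rows copied from the output above.
sample = """\
COMM           PID    LAT(ms)             RVAL FUNC
dp_sync_taskq  716       7.72 fffff2d109115040 __alloc_pages_nodemask
systemd-udevd  44671     1.59 fffff2d10575b800 __alloc_pages_nodemask
systemd-udevd  51359     4.27 fffff2d10ec12800 __alloc_pages_nodemask
journalctl     286270    1.71 fffff2d10e2668c0 __alloc_pages_nodemask
journalctl     286270    2.96 fffff2d10cce7380 __alloc_pages_nodemask
""".splitlines()

for comm, (count, worst, total) in sorted(summarise(sample).items()):
    print(f"{comm:15} n={count:3d} max={worst:5.2f} ms total={total:6.2f} ms")
```

Run over the full listing, this kind of grouping shows at a glance which commands hit slow allocations most often; the largest single latency in the rows above is 7.72 ms.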

This feels better but not yet fixed.

Thanks

Revision history for this message

Andrea Righi (arighi) wrote on 2020-04-07:

#63

TL;DR @seth-arnold, as a test can you try to set the following options?

$ echo $((32 * 1024 * 1024)) | sudo tee /proc/sys/vm/dirty_bytes
$ echo $((32 * 1024 * 1024)) | sudo tee /proc/sys/vm/dirty_background_bytes

Repeat the test and see if the system is still unresponsive.

Details below.

This is what I think is happening in this last scenario: interactive performance is killed when a large I/O writer is running.

The large I/O writer generates a lot of dirty pages, and nothing forces those pages to be synced to the backing store until the dirty_ratio (=20%) / dirty_background_ratio (=10%) thresholds are hit. Those thresholds can be quite high with the default settings on systems with a lot of RAM.

For example in a system with 16GB of free/reclaimable memory, the amount of dirty memory that is allowed before a writer is actively forced to flush those pages to the backing store is: 16GB * 20 / 100 = 3.2GB. Flusher threads are started when the amount of dirty pages is 16GB * 10 / 100 = 1.6GB of dirty memory.
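
The same arithmetic as a small Python sketch, for trying other memory sizes. This is a simplification: it takes the amount of free/reclaimable memory as a single input and applies the two percentages, using decimal gigabytes as in the example above. The function name and defaults are illustrative.

```python
GB = 10**9  # decimal gigabytes, matching the 16GB example


def dirty_thresholds(available_bytes, dirty_ratio=20, dirty_background_ratio=10):
    """Return (background, hard) dirty-page limits in bytes.

    background: level at which the flusher threads start writing back.
    hard: level at which a writer is forced to flush pages itself.
    """
    background = available_bytes * dirty_background_ratio // 100
    hard = available_bytes * dirty_ratio // 100
    return background, hard


background, hard = dirty_thresholds(16 * GB)
print(f"flushers start at {background / GB:.1f} GB")
print(f"writers are throttled at {hard / GB:.1f} GB")

# The byte value from the TL;DR commands, for comparison with the hard limit.
proposed = 32 * 1024 * 1024
print(f"proposed limit: {proposed} bytes, about {hard / proposed:.0f}x smaller")
```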

So, if the writer doesn't stop, it will consume all the free pages in the system and at that point we are going to have a lot of dirty pages. Then the kernel needs to decide what to do to free up some pages.

Reclaimable memory is the first choice: cached clean pages that already have a copy on the corresponding backing store are easy to reclaim, because they just need to be dropped from the page cache (no I/O involved). Dirty pages are more expensive to reclaim, because they need to be flushed to the backing store before freeing up the page. Same with anonymous memory that needs to be flushed to the swap device, before being able to re-use the page.

So when the system starts to reclaim some pages, we see some swap activity and we also see some I/O due to the flushing of the dirty pages. I think the system becomes sluggish because there are too many dirty pages: the kernel spends too much time selecting the right pages to reclaim, and interactive performance is killed.

This looks like a bug/regression in the kernel and I think we should definitely investigate more and track down the cause of the problem. In the meantime, as a test to prove this theory, I think we could try to reduce the amount of allowed dirty pages in the system by tuning the dirty thresholds, vm.dirty_bytes and vm.dirty_background_bytes (using the *_bytes tunables for finer-grained control over those thresholds), and see if there are some benefits in the specific scenario reported by Seth.

Revision history for this message

Seth Arnold (seth-arnold) wrote on 2020-04-08:

#64

after-limiting-dirty-bytes (61.1 KiB, text/plain)

Thanks Andrea, I don't think that helped. I'll attach a file with vmstat 1 output and funclatency output, along with a few notes on the testing.

Thanks

Revision history for this message

Seth Arnold (seth-arnold) wrote on 2020-04-08:

#65

I should point out that the periodic bursts of writes every five seconds in my vmstat 1 output are due to zfs's flushing mechanism; by default it flushes dirty pages every five seconds.

Revision history for this message

Andrea Righi (arighi) wrote on 2020-04-15:

#66

Hi Seth, sorry for my late response.

I did more tests this morning on my laptop tracing the callers of __alloc_pages_nodemask() and I noticed that pretty much all the time it is called by the i915 shrinker. So I tried to disable it and I have to say that on my laptop (at least) the system is always very responsive now even when the system is swapping out pages and I generate a lot of I/O.

I've uploaded this test kernel here (5.4.0-24.28+lp1861359v1):
https://kernel.ubuntu.com/~arighi/LP-1861359/

Can you do one more test with it and verify whether it also improves things in your case?

It's probably not a valid solution yet, but if it works it would tell us that the problem is exactly there. Thanks!

Revision history for this message

Andrea Righi (arighi) wrote on 2020-04-16:

#67

I've uploaded another test kernel (5.4.0-24.28+lp1861359v2):
https://kernel.ubuntu.com/~arighi/LP-1861359/

In this one, instead of completely disabling the i915 shrinker, I'm only preventing the i915 caches from being swapped out when the system is short on memory.

I'm testing this new one on my laptop right now and it seems to be a better compromise, since it still allows some i915 memory to be freed, but it also avoids completely killing interactive performance.

Revision history for this message

Launchpad Janitor (janitor) wrote on 2020-04-16:

#68

This bug was fixed in the package linux - 5.4.0-24.28

---------------
linux (5.4.0-24.28) focal; urgency=medium

* focal/linux: 5.4.0-24.28 -proposed tracker (LP: #1871939)

* getitimer returns it_value=0 erroneously (LP: #1349028)
    - [Config] CONTEXT_TRACKING_FORCE policy should be unset

* 12d1:1038 Dual-Role OTG device on non-HNP port - unable to enumerate USB
    device on port 1 (LP: #1047527)
    - [Config] USB_OTG_FSM policy not needed

* Add DCPD backlight support for HP CML system (LP: #1871589)
    - SAUCE: drm/i915: Force DPCD backlight mode for HP CML 2020 system

* Backlight brightness cannot be adjusted using keys (LP: #1860303)
    - SAUCE drm/i915: Force DPCD backlight mode for HP Spectre x360 Convertible
      13t-aw100

* CVE-2020-11494
    - slcan: Don't transmit uninitialized stack data in padding

* Ubuntu Kernel Support for OpenPOWER NV Secure & Trusted Boot (LP: #1866909)
    - powerpc: Detect the secure boot mode of the system
    - powerpc/ima: Add support to initialize ima policy rules
    - powerpc: Detect the trusted boot state of the system
    - powerpc/ima: Define trusted boot policy
    - ima: Make process_buffer_measurement() generic
    - certs: Add wrapper function to check blacklisted binary hash
    - ima: Check against blacklisted hashes for files with modsig
    - powerpc/ima: Update ima arch policy to check for blacklist
    - powerpc/ima: Indicate kernel modules appended signatures are enforced
    - powerpc/powernv: Add OPAL API interface to access secure variable
    - powerpc: expose secure variables to userspace via sysfs
    - x86/efi: move common keyring handler functions to new file
    - powerpc: Load firmware trusted keys/hashes into kernel keyring
    - x86/efi: remove unused variables

* [roce-0227]sync mainline kernel 5.6rc3 roce patchset into ubuntu HWE kernel
    branch (LP: #1864950)
    - RDMA/hns: Cleanups of magic numbers
    - RDMA/hns: Optimize eqe buffer allocation flow
    - RDMA/hns: Add the workqueue framework for flush cqe handler
    - RDMA/hns: Delayed flush cqe process with workqueue
    - RDMA/hns: fix spelling mistake: "attatch" -> "attach"
    - RDMA/hns: Initialize all fields of doorbells to zero
    - RDMA/hns: Treat revision HIP08_A as a special case
    - RDMA/hns: Use flush framework for the case in aeq
    - RDMA/hns: Stop doorbell update while qp state error
    - RDMA/hns: Optimize qp destroy flow
    - RDMA/hns: Optimize qp context create and destroy flow
    - RDMA/hns: Optimize qp number assign flow
    - RDMA/hns: Optimize qp buffer allocation flow
    - RDMA/hns: Optimize qp param setup flow
    - RDMA/hns: Optimize kernel qp wrid allocation flow
    - RDMA/hns: Optimize qp doorbell allocation flow
    - RDMA/hns: Check if depth of qp is 0 before configure

* [hns3-0316]sync mainline kernel 5.6rc4 hns3 patchset into ubuntu HWE kernel
    branch (LP: #1867586)
    - net: hns3: modify an unsuitable print when setting unknown duplex to fibre
    - net: hns3: add enabled TC numbers and DWRR weight info in debugfs
    - net: hns3: add support for dump MAC ID and loopback status in debugfs
    - net: hns3: add missing help info for QS shaper in debugfs
    - net: hns3: fix some mixed type assignment
    - net: hns3: rename macro HCLGE_MAX_NCL_CONFIG_LENGTH
    - net: hns3: remove an unnecessary resetting check in
      hclge_handle_hw_ras_error()
    - net: hns3: delete some reduandant code
    - net: hns3: add a check before PF inform VF to reset
    - net: hns3: print out status register when VF receives unknown source
      interrupt
    - net: hns3: print out command code when dump fails in debugfs
    - net: hns3: synchronize some print relating to reset issue
    - net: hns3: delete unnecessary logs after kzalloc fails

* [SRU][F/U/OEM-5.6] UBUNTU: SAUCE: Fix amdgpu hang during acpi event
    (LP: #1871316)
    - SAUCE: drm/amdgpu: Fix oops when pp_funcs is unset in ACPI event

* alsa: make the dmic detection align to the mainline kernel-5.6
    (LP: #1871284)
    - ALSA: hda: add Intel DSP configuration / probe code
    - ALSA: hda: fix intel DSP config
    - ALSA: hda: Allow non-Intel device probe gracefully
    - ALSA: hda: More constifications
    - ALSA: hda: Rename back to dmic_detect option
    - [Config] SND_INTEL_DSP_CONFIG=m

* add_key05 from ubuntu_ltp_syscalls failed (LP: #1869644)
    - KEYS: reaching the keys quotas correctly

* Fix authentication fail on Realtek WiFi 8723de (LP: #1871300)
    - SAUCE: rtw88: No retry and report for auth and assoc
    - SAUCE: rtw88: fix rate for a while after being connected
    - SAUCE: rtw88: Move driver IQK to set channel before association for 11N chip

* Add Mute LED support for an HP laptop (LP: #1871090)
    - ALSA: hda/realtek: Enable mute LED on an HP system

* dscr_sysfs_test / futex_bench / tm-unavailable in powerpc from
    ubuntu_kernel_selftests timeout on PowerPC nodes with B-5.3 (LP: #1864642)
    - Revert "UBUNTU: SAUCE: selftests/powerpc -- Disable timeout for benchmark
      and tm tests"
    - selftests/powerpc: Turn off timeout setting for benchmarks, dscr, signal, tm

* Update 20.0.4 NVMe Core, NVMe FC Transport and nvme-cli for Broadcom Emulex
    lpfc driver 12.6.0.x dependencies (LP: #1856340)
    - nvme-fc: Sync nvme-fc header to FC-NVME-2
    - nvme-fc and nvmet-fc: sync with FC-NVME-2 header changes
    - nvme-fc: Set new cmd set indicator in nvme-fc cmnd iu
    - nvme-fc: clarify error messages
    - nvme-fc: ensure association_id is cleared regardless of a Disconnect LS
    - nvme: resync include/linux/nvme.h with nvmecli
    - nvme: Fix parsing of ANA log page

* Update Broadcom Emulex lpfc driver in 5.4 to 12.6.0.x from 5.5
    (LP: #1855303)
    - scsi: lpfc: Fix pt2pt discovery on SLI3 HBAs
    - scsi: lpfc: Fix premature re-enabling of interrupts in lpfc_sli_host_down
    - scsi: lpfc: Fix miss of register read failure check
    - scsi: lpfc: Fix NVME io abort failures causing hangs
    - scsi: lpfc: Fix device recovery errors after PLOGI failures
    - scsi: lpfc: Fix GPF on scsi command completion
    - scsi: lpfc: Fix NVMe ABTS in response to receiving an ABTS
    - scsi: lpfc: Fix coverity errors on NULL pointer checks
    - scsi: lpfc: Fix host hang at boot or slow boot
    - scsi: lpfc: Update async event logging
    - scsi: lpfc: Complete removal of FCoE T10 PI support on SLI-4 adapters
    - scsi: lpfc: cleanup: remove unused fcp_txcmlpq_cnt
    - scsi: lpfc: Update lpfc version to 12.4.0.1
    - scsi: lpfc: Make function lpfc_defer_pt2pt_acc static
    - scsi: lpfc: fix lpfc_nvmet_mrq to be bound by hdw queue count
    - scsi: lpfc: Fix reporting of read-only fw error errors
    - scsi: lpfc: Fix lockdep errors in sli_ringtx_put
    - scsi: lpfc: fix coverity error of dereference after null check
    - scsi: lpfc: Slight fast-path performance optimizations
    - scsi: lpfc: Remove lock contention target write path
    - scsi: lpfc: Revise interrupt coalescing for missing scenarios
    - scsi: lpfc: Make FW logging dynamically configurable
    - scsi: lpfc: Add log macros to allow print by serverity or verbosity setting
    - scsi: lpfc: Add FA-WWN Async Event reporting
    - scsi: lpfc: Add FC-AL support to lpe32000 models
    - scsi: lpfc: Add additional discovery log messages
    - scsi: lpfc: Update lpfc version to 12.6.0.0
    - scsi: lpfc: lpfc_attr: Fix Use plain integer as NULL pointer
    - scsi: lpfc: lpfc_nvmet: Fix Use plain integer as NULL pointer
    - scsi: lpfc: fix build error of lpfc_debugfs.c for vfree/vmalloc
    - scsi: lpfc: fix spelling error in MAGIC_NUMER_xxx
    - scsi: lpfc: Fix NULL check before mempool_destroy is not needed
    - scsi: lpfc: Make lpfc_debugfs_ras_log_data static
    - scsi: lpfc: Fix configuration of BB credit recovery in service parameters
    - scsi: lpfc: Fix kernel crash at lpfc_nvme_info_show during remote port
      bounce
    - scsi: lpfc: Fix dynamic fw log enablement check
    - scsi: lpfc: Sync with FC-NVMe-2 SLER change to require Conf with SLER
    - scsi: lpfc: Clarify FAWNN error message
    - scsi: lpfc: Add registration for CPU Offline/Online events
    - scsi: lpfc: Change default IRQ model on AMD architectures
    - scsi: lpfc: Add enablement of multiple adapter dumps
    - scsi: lpfc: Update lpfc version to 12.6.0.1
    - scsi: lpfc: Fix a kernel warning triggered by lpfc_sli4_enable_intr()
    - scsi: lpfc: Fix lpfc_cpumask_of_node_init()
    - scsi: lpfc: fix inlining of lpfc_sli4_cleanup_poll_list()
    - scsi: lpfc: Initialize cpu_map for not present cpus
    - scsi: lpfc: revise nvme max queues to be hdwq count
    - scsi: lpfc: Update lpfc version to 12.6.0.2
    - scsi: lpfc: size cpu map by last cpu id set
    - scsi: lpfc: Fix incomplete NVME discovery when target
    - scsi: lpfc: Fix missing check for CSF in Write Object Mbox Rsp
    - scsi: lpfc: Fix Fabric hostname registration if system hostname changes
    - scsi: lpfc: Fix ras_log via debugfs
    - scsi: lpfc: Fix disablement of FC-AL on lpe35000 models
    - scsi: lpfc: Fix unmap of dpp bars affecting next driver load
    - scsi: lpfc: Fix MDS Latency Diagnostics Err-drop rates
    - scsi: lpfc: Fix improper flag check for IO type
    - scsi: lpfc: Update lpfc version to 12.6.0.3
    - scsi: lpfc: Fix RQ buffer leakage when no IOCBs available
    - scsi: lpfc: Fix lpfc_io_buf resource leak in lpfc_get_scsi_buf_s4 error path
    - scsi: lpfc: Fix broken Credit Recovery after driver load
    - scsi: lpfc: Fix registration of ELS type support in fdmi
    - scsi: lpfc: Fix release of hwq to clear the eq relationship
    - scsi: lpfc: Fix compiler warning on frame size
    - scsi: lpfc: Fix coverity errors in fmdi attribute handling
    - scsi: lpfc: Remove handler for obsolete ELS - Read Port Status (RPS)
    - scsi: lpfc: Clean up hba max_lun_queue_depth checks
    - scsi: lpfc: Update lpfc version to 12.6.0.4
    - scsi: lpfc: Copyright updates for 12.6.0.4 patches
    - scsi: fc: Update Descriptor definition and add RDF and Link Integrity FPINs
    - scsi: lpfc: add RDF registration and Link Integrity FPIN logging

* lockdown on power (LP: #1855668) // Ubuntu Kernel Support for OpenPOWER NV
    Secure & Trusted Boot (LP: #1866909)
    - [Config] Enable configs for OpenPOWER NV Secure & Trusted Boot

* lockdown on power (LP: #1855668)
    - SAUCE: (lockdown) powerpc: lock down kernel in secure boot mode

* Focal update: v5.4.30 upstream stable release (LP: #1870571)
    - mac80211: Check port authorization in the ieee80211_tx_dequeue() case
    - mac80211: fix authentication with iwlwifi/mvm
    - serial: sprd: Fix a dereference warning
    - vt: selection, introduce vc_is_sel
    - vt: ioctl, switch VT_IS_IN_USE and VT_BUSY to inlines
    - vt: switch vt_dont_switch to bool
    - vt: vt_ioctl: remove unnecessary console allocation checks
    - vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use virtual console
    - vt: vt_ioctl: fix use-after-free in vt_in_use()
    - platform/x86: pmc_atom: Add Lex 2I385SW to critclk_systems DMI table
    - bpf: Explicitly memset the bpf_attr structure
    - bpf: Explicitly memset some bpf info structures declared on the stack
    - gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 CHT + AXP288 model
    - net: ks8851-ml: Fix IO operations, again
    - clk: imx: Align imx sc clock msg structs to 4
    - clk: imx: Align imx sc clock parent msg structs to 4
    - clk: ti: am43xx: Fix clock parent for RTC clock
    - libceph: fix alloc_msg_with_page_vector() memory leaks
    - arm64: alternative: fix build with clang integrated assembler
    - perf map: Fix off by one in strncpy() size argument
    - ARM: dts: oxnas: Fix clear-mask property
    - ARM: bcm2835-rpi-zero-w: Add missing pinctrl name
    - ARM: dts: imx6: phycore-som: fix arm and soc minimum voltage
    - ARM: dts: N900: fix onenand timings
    - ARM: dts: sun8i: r40: Move AHCI device node based on address order
    - arm64: dts: ls1043a-rdb: correct RGMII delay mode to rgmii-id
    - arm64: dts: ls1046ardb: set RGMII interfaces to RGMII_ID mode
    - Linux 5.4.30

* Miscellaneous Ubuntu changes
    - [Config] CONFIG_RT_GROUP_SCHED=y
    - SAUCE: powerpc/ima: require IMA module signatures only if MODULE_SIG is not
      enabled
    - SAUCE: Update aufs to 5.4.3 20200302
    - SAUCE: drm/amdgpu: Remove missing firmware files from modinfo
    - SAUCE: drm/i915: Fix ref->mutex deadlock in i915_active_wait()
    - SAUCE: drm/i915: Synchronize active and retire callbacks
    - SAUCE: apparmor: add a valid state flags check
    - SAUCE: aapparmor: add consistency check between state and dfa diff encode
      flags
    - SAUCE: aapparmor: remove useless aafs_create_symlink
    - SAUCE: aapparmor: fail unpack if profile mode is unknown
    - SAUCE: apparmor: ensure that dfa state tables have entries
    - SAUCE: apparmor: fix potential label refcnt leak in aa_change_profile
    - SAUCE: security/apparmor/label.c: Clean code by removing redundant
      instructions
    - [Config] Remove PCIEASPM_DEBUG from annotations
    - [Config] Remove HEADER_TEST from annotations
    - SAUCE: selftests/seccomp: allow clock_nanosleep instead of nanosleep
    - [Debian] Allow building linux-libc-dev from linux-riscv
    - [Packaging] Remove riscv64 packaging from master kernel
    - [Config] Remove CONFIG_SND_HDA_INTEL_DETECT_DMIC from annotations

* Miscellaneous upstream changes
    - net/bpfilter: remove superfluous testing message
    - apparmor: increase left match history buffer size

linux (5.4.0-23.27) focal; urgency=medium

* Miscellaneous Ubuntu changes
    - [Packaging] Enable riscv64 build

linux (5.4.0-22.26) focal; urgency=medium

* focal/linux: 5.4.0-22.26 -proposed tracker (LP: #1870502)

* Packaging resync (LP: #1786013)
    - [Packaging] update variants
    - [Packaging] update helper scripts
    - update dkms package versions

* [SFC-0316]sync mainline kernel 5.7rc1 SFC patchset into ubuntu HWE kernel
    branch (LP: #1867588)
    - spi: Allow SPI controller override device buswidth
    - spi: HiSilicon v3xx: Properly set CMD_CONFIG for Dual/Quad modes
    - spi: HiSilicon v3xx: Use DMI quirk to set controller buswidth override bits

* [hns3-0316]sync mainline kernel 5.6rc4 hns3 patchset into ubuntu HWE kernel
    branch (LP: #1867586)
    - net: hns3: fix VF VLAN table entries inconsistent issue
    - net: hns3: fix RMW issue for VLAN filter switch
    - net: hns3: clear port base VLAN when unload PF

* [sas-0316]sync mainline kernel 5.6rc1 roce patchset into ubuntu HWE kernel
    branch (LP: #1867587)
    - scsi: hisi_sas: use threaded irq to process CQ interrupts
    - scsi: hisi_sas: replace spin_lock_irqsave/spin_unlock_restore with
      spin_lock/spin_unlock
    - scsi: hisi_sas: Replace magic number when handle channel interrupt
    - scsi: hisi_sas: Modify the file permissions of trigger_dump to write only
    - scsi: hisi_sas: Add prints for v3 hw interrupt converge and automatic
      affinity
    - scsi: hisi_sas: Rename hisi_sas_cq.pci_irq_mask

* Revert "nvme_fc: add module to ops template to allow module references"
    (LP: #1869947)
    - SAUCE: Revert "nvme_fc: add module to ops template to allow module
      references"

* suspend only works once on ThinkPad X1 Carbon gen 7 (LP: #1865570)
    - Revert "UBUNTU: SAUCE: e1000e: Disable s0ix flow for X1 Carbon 7th"
    - SAUCE: e1000e: bump up timeout to wait when ME un-configure ULP mode

* Focal update: v5.4.29 upstream stable release (LP: #1870142)
    - mmc: core: Allow host controllers to require R1B for CMD6
    - mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for erase/trim/discard
    - mmc: core: Respect MMC_CAP_NEED_RSP_BUSY for eMMC sleep command
    - mmc: sdhci-omap: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY
    - mmc: sdhci-tegra: Fix busy detection by enabling MMC_CAP_NEED_RSP_BUSY
    - ACPI: PM: s2idle: Rework ACPI events synchronization
    - cxgb4: fix throughput drop during Tx backpressure
    - cxgb4: fix Txq restart check during backpressure
    - geneve: move debug check after netdev unregister
    - hsr: fix general protection fault in hsr_addr_is_self()
    - ipv4: fix a RCU-list lock in inet_dump_fib()
    - macsec: restrict to ethernet devices
    - mlxsw: pci: Only issue reset when system is ready
    - mlxsw: spectrum_mr: Fix list iteration in error path
    - net/bpfilter: fix dprintf usage for /dev/kmsg
    - net: cbs: Fix software cbs to consider packet sending time
    - net: dsa: Fix duplicate frames flooded by learning
    - net: dsa: mt7530: Change the LINK bit to reflect the link status
    - net: dsa: tag_8021q: replace dsa_8021q_remove_header with __skb_vlan_pop
    - net: ena: Add PCI shutdown handler to allow safe kexec
    - net: mvneta: Fix the case where the last poll did not process all rx
    - net/packet: tpacket_rcv: avoid a producer race condition
    - net: phy: dp83867: w/a for fld detect threshold bootstrapping issue
    - net: phy: mdio-bcm-unimac: Fix clock handling
    - net: phy: mdio-mux-bcm-iproc: check clk_prepare_enable() return value
    - net: qmi_wwan: add support for ASKEY WWHC050
    - net/sched: act_ct: Fix leak of ct zone template on replace
    - net_sched: cls_route: remove the right filter from hashtable
    - net_sched: hold rtnl lock in tcindex_partial_destroy_work()
    - net_sched: keep alloc_hash updated after hash allocation
    - net: stmmac: dwmac-rk: fix error path in rk_gmac_probe
    - NFC: fdp: Fix a signedness bug in fdp_nci_send_patch()
    - r8169: re-enable MSI on RTL8168c
    - slcan: not call free_netdev before rtnl_unlock in slcan_open
    - tcp: also NULL skb->dev when copy was needed
    - tcp: ensure skb->dev is NULL before leaving TCP stack
    - tcp: repair: fix TCP_QUEUE_SEQ implementation
    - vxlan: check return value of gro_cells_init()
    - bnxt_en: Fix Priority Bytes and Packets counters in ethtool -S.
    - bnxt_en: fix memory leaks in bnxt_dcbnl_ieee_getets()
    - bnxt_en: Return error if bnxt_alloc_ctx_mem() fails.
    - bnxt_en: Free context memory after disabling PCI in probe error path.
    - bnxt_en: Reset rings if ring reservation fails during open()
    - net: ip_gre: Separate ERSPAN newlink / changelink callbacks
    - net: ip_gre: Accept IFLA_INFO_DATA-less configuration
    - hsr: use rcu_read_lock() in hsr_get_node_{list/status}()
    - hsr: add restart routine into hsr_get_node_list()
    - hsr: set .netnsok flag
    - net/mlx5: DR, Fix postsend actions write length
    - net/mlx5e: Enhance ICOSQ WQE info fields
    - net/mlx5e: Fix missing reset of SW metadata in Striding RQ reset
    - net/mlx5e: Fix ICOSQ recovery flow with Striding RQ
    - net/mlx5e: Do not recover from a non-fatal syndrome
    - cgroup-v1: cgroup_pidlist_next should update position index
    - nfs: add minor version to nfs_server_key for fscache
    - cpupower: avoid multiple definition with gcc -fno-common
    - drivers/of/of_mdio.c:fix of_mdiobus_register()
    - cgroup1: don't call release_agent when it is ""
    - [Config] updateconfigs for DPAA_ERRATUM_A050385
    - dt-bindings: net: FMan erratum A050385
    - arm64: dts: ls1043a: FMan erratum A050385
    - fsl/fman: detect FMan erratum A050385
    - drm/amd/display: update soc bb for nv14
    - drm/amdgpu: correct ROM_INDEX/DATA offset for VEGA20
    - drm/exynos: Fix cleanup of IOMMU related objects
    - iommu/vt-d: Silence RCU-list debugging warnings
    - s390/qeth: don't reset default_out_queue
    - s390/qeth: handle error when backing RX buffer
    - scsi: ipr: Fix softlockup when rescanning devices in petitboot
    - mac80211: Do not send mesh HWMP PREQ if HWMP is disabled
    - dpaa_eth: Remove unnecessary boolean expression in dpaa_get_headroom
    - sxgbe: Fix off by one in samsung driver strncpy size arg
    - net: hns3: fix "tc qdisc del" failed issue
    - iommu/vt-d: Fix debugfs register reads
    - iommu/vt-d: Populate debugfs if IOMMUs are detected
    - iwlwifi: mvm: fix non-ACPI function
    - i2c: hix5hd2: add missed clk_disable_unprepare in remove
    - Input: raydium_i2c_ts - fix error codes in raydium_i2c_boot_trigger()
    - Input: fix stale timestamp on key autorepeat events
    - Input: synaptics - enable RMI on HP Envy 13-ad105ng
    - Input: avoid BIT() macro usage in the serio.h UAPI header
    - IB/rdmavt: Free kernel completion queue when done
    - RDMA/core: Fix missing error check on dev_set_name()
    - gpiolib: Fix irq_disable() semantics
    - RDMA/nl: Do not permit empty devices names during RDMA_NLDEV_CMD_NEWLINK/SET
    - RDMA/mad: Do not crash if the rdma device does not have a umad interface
    - ceph: check POOL_FLAG_FULL/NEARFULL in addition to OSDMAP_FULL/NEARFULL
    - ceph: fix memory leak in ceph_cleanup_snapid_map()
    - ARM: dts: dra7: Add bus_dma_limit for L3 bus
    - ARM: dts: omap5: Add bus_dma_limit for L3 bus
    - x86/ioremap: Fix CONFIG_EFI=n build
    - perf probe: Fix to delete multiple probe event
    - perf probe: Do not depend on dwfl_module_addrsym()
    - rtlwifi: rtl8188ee: Fix regression due to commit d1d1a96bdb44
    - tools: Let O= makes handle a relative path with -C option
    - scripts/dtc: Remove redundant YYLOC global declaration
    - scsi: sd: Fix optimal I/O size for devices that change reported values
    - nl80211: fix NL80211_ATTR_CHANNEL_WIDTH attribute type
    - mac80211: drop data frames without key on encrypted links
    - mac80211: mark station unauthorized before key removal
    - mm/swapfile.c: move inode_lock out of claim_swapfile
    - drivers/base/memory.c: indicate all memory blocks as removable
    - mm/sparse: fix kernel crash with pfn_section_valid check
    - mm: fork: fix kernel_stack memcg stats for various stack implementations
    - gpiolib: acpi: Correct comment for HP x2 10 honor_wakeup quirk
    - gpiolib: acpi: Rework honor_wakeup option into an ignore_wake option
    - gpiolib: acpi: Add quirk to ignore EC wakeups on HP x2 10 BYT + AXP288 model
    - bpf: Fix cgroup ref leak in cgroup_bpf_inherit on out-of-memory
    - RDMA/core: Ensure security pkey modify is not lost
    - afs: Fix handling of an abort from a service handler
    - genirq: Fix reference leaks on irq affinity notifiers
    - xfrm: handle NETDEV_UNREGISTER for xfrm device
    - vti[6]: fix packet tx through bpf_redirect() in XinY cases
    - RDMA/mlx5: Fix the number of hwcounters of a dynamic counter
    - RDMA/mlx5: Fix access to wrong pointer while performing flush due to error
    - RDMA/mlx5: Block delay drop to unprivileged users
    - xfrm: fix uctx len check in verify_sec_ctx_len
    - xfrm: add the missing verify_sec_ctx_len check in xfrm_add_acquire
    - xfrm: policy: Fix doulbe free in xfrm_policy_timer
    - afs: Fix client call Rx-phase signal handling
    - afs: Fix some tracing details
    - afs: Fix unpinned address list during probing
    - ieee80211: fix HE SPR size calculation
    - mac80211: set IEEE80211_TX_CTRL_PORT_CTRL_PROTO for nl80211 TX
    - netfilter: flowtable: reload ip{v6}h in nf_flow_tuple_ip{v6}
    - netfilter: nft_fwd_netdev: validate family and chain type
    - netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
    - i2c: nvidia-gpu: Handle timeout correctly in gpu_i2c_check_status()
    - bpf, x32: Fix bug with JMP32 JSET BPF_X checking upper bits
    - bpf: Initialize storage pointers to NULL to prevent freeing garbage pointer
    - bpf/btf: Fix BTF verification of enum members in struct/union
    - bpf, sockmap: Remove bucket->lock from sock_{hash|map}_free
    - ARM: dts: sun8i-a83t-tbs-a711: Fix USB OTG mode detection
    - vti6: Fix memory leak of skb if input policy check fails
    - r8169: fix PHY driver check on platforms w/o module softdeps
    - clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources
    - USB: serial: option: add support for ASKEY WWHC050
    - USB: serial: option: add BroadMobi BM806U
    - USB: serial: option: add Wistron Neweb D19Q1
    - USB: cdc-acm: restore capability check order
    - USB: serial: io_edgeport: fix slab-out-of-bounds read in
      edge_interrupt_callback
    - usb: musb: fix crash with highmen PIO and usbmon
    - media: flexcop-usb: fix endpoint sanity check
    - media: usbtv: fix control-message timeouts
    - staging: kpc2000: prevent underflow in cpld_reconfigure()
    - staging: rtl8188eu: Add ASUS USB-N10 Nano B1 to device table
    - staging: wlan-ng: fix ODEBUG bug in prism2sta_disconnect_usb
    - staging: wlan-ng: fix use-after-free Read in hfa384x_usbin_callback
    - ahci: Add Intel Comet Lake H RAID PCI ID
    - libfs: fix infoleak in simple_attr_read()
    - media: ov519: add missing endpoint sanity checks
    - media: dib0700: fix rc endpoint lookup
    - media: stv06xx: add missing descriptor sanity checks
    - media: xirlink_cit: add missing descriptor sanity checks
    - media: v4l2-core: fix a use-after-free bug of sd->devnode
    - update wireguard dkms package version
    - [Config] updateconfigs for NET_REDIRECT
    - net: Fix CONFIG_NET_CLS_ACT=n and CONFIG_NFT_FWD_NETDEV={y, m} build
    - Linux 5.4.29

* Restore kernel control of PCIe DPC via option (LP: #1869423)
    - PCI/DPC: Add "pcie_ports=dpc-native" to allow DPC without AER control

* swap storms kills interactive use (LP: #1861359)
    - SAUCE: mm/page_alloc.c: disable memory reclaim watermark boosting by default

* sysfs: incorrect network device permissions on network namespace change
    (LP: #1865359)
    - sysfs: add sysfs_file_change_owner()
    - sysfs: add sysfs_link_change_owner()
    - sysfs: add sysfs_group{s}_change_owner()
    - sysfs: add sysfs_change_owner()
    - device: add device_change_owner()
    - drivers/base/power: add dpm_sysfs_change_owner()
    - net-sysfs: add netdev_change_owner()
    - net-sysfs: add queue_change_owner()
    - net: fix sysfs permssions when device changes network namespace
    - sysfs: fix static inline declaration of sysfs_groups_change_owner()

* Kernel Oops - general protection fault: 0000 [#1] SMP PTI after
    disconnecting thunderbolt docking station (LP: #1864754)
    - SAUCE: ptp: free ptp clock properly

* [Selftests] Apply various fixes and improvements (LP: #1870543)
    - SAUCE: selftests: net: ip_defrag: limit packet to 1000 fragments
    - SAUCE: kselftest/runner: avoid using timeout if timeout is disabled
    - SAUCE: selftests/seccomp -- Disable timeout for seccomp tests

* Focal update: v5.4.28 upstream stable release (LP: #1869061)
    - locks: fix a potential use-after-free problem when wakeup a waiter
    - locks: reinstate locks_delete_block optimization
    - spi: spi-omap2-mcspi: Support probe deferral for DMA channels
    - drm/mediatek: Find the cursor plane instead of hard coding it
    - phy: ti: gmii-sel: fix set of copy-paste errors
    - phy: ti: gmii-sel: do not fail in case of gmii
    - ARM: dts: dra7-l4: mark timer13-16 as pwm capable
    - spi: qup: call spi_qup_pm_resume_runtime before suspending
    - powerpc: Include .BTF section
    - cifs: fix potential mismatch of UNC paths
    - cifs: add missing mount option to /proc/mounts
    - ARM: dts: dra7: Add "dma-ranges" property to PCIe RC DT nodes
    - spi: pxa2xx: Add CS control clock quirk
    - spi/zynqmp: remove entry that causes a cs glitch
    - drm/exynos: dsi: propagate error value and silence meaningless warning
    - drm/exynos: dsi: fix workaround for the legacy clock name
    - drm/exynos: hdmi: don't leak enable HDMI_EN regulator if probe fails
    - drivers/perf: fsl_imx8_ddr: Correct the CLEAR bit definition
    - drivers/perf: arm_pmu_acpi: Fix incorrect checking of gicc pointer
    - altera-stapl: altera_get_note: prevent write beyond end of 'key'
    - dm bio record: save/restore bi_end_io and bi_integrity
    - dm integrity: use dm_bio_record and dm_bio_restore
    - riscv: avoid the PIC offset of static percpu data in module beyond 2G limits
    - ASoC: stm32: sai: manage rebind issue
    - spi: spi_register_controller(): free bus id on error paths
    - riscv: Force flat memory model with no-mmu
    - riscv: Fix range looking for kernel image memblock
    - drm/amdgpu: clean wptr on wb when gpu recovery
    - drm/amd/display: Clear link settings on MST disable connector
    - drm/amd/display: fix dcc swath size calculations on dcn1
    - xenbus: req->body should be updated before req->state
    - xenbus: req->err should be updated before req->state
    - block, bfq: fix overwrite of bfq_group pointer in bfq_find_set_group()
    - parse-maintainers: Mark as executable
    - binderfs: use refcount for binder control devices too
    - Revert "drm/fbdev: Fallback to non tiled mode if all tiles not present"
    - usb: quirks: add NO_LPM quirk for RTL8153 based ethernet adapters
    - USB: serial: option: add ME910G1 ECM composition 0x110b
    - usb: host: xhci-plat: add a shutdown
    - USB: serial: pl2303: add device-id for HP LD381
    - usb: xhci: apply XHCI_SUSPEND_DELAY to AMD XHCI controller 1022:145c
    - usb: typec: ucsi: displayport: Fix NULL pointer dereference
    - usb: typec: ucsi: displayport: Fix a potential race during registration
    - USB: cdc-acm: fix close_delay and closing_wait units in TIOCSSERIAL
    - USB: cdc-acm: fix rounding error in TIOCSSERIAL
    - ALSA: line6: Fix endless MIDI read loop
    - ALSA: hda/realtek - Enable headset mic of Acer X2660G with ALC662
    - ALSA: hda/realtek - Enable the headset of Acer N50-600 with ALC662
    - ALSA: seq: virmidi: Fix running status after receiving sysex
    - ALSA: seq: oss: Fix running status after receiving sysex
    - ALSA: pcm: oss: Avoid plugin buffer overflow
    - ALSA: pcm: oss: Remove WARNING from snd_pcm_plug_alloc() checks
    - tty: fix compat TIOCGSERIAL leaking uninitialized memory
    - tty: fix compat TIOCGSERIAL checking wrong function ptr
    - iio: chemical: sps30: fix missing triggered buffer dependency
    - iio: st_sensors: remap SMO8840 to LIS2DH12
    - iio: trigger: stm32-timer: disable master mode when stopping
    - iio: accel: adxl372: Set iio_chan BE
    - iio: magnetometer: ak8974: Fix negative raw values in sysfs
    - iio: adc: stm32-dfsdm: fix sleep in atomic context
    - iio: adc: at91-sama5d2_adc: fix differential channels in triggered mode
    - iio: light: vcnl4000: update sampling periods for vcnl4200
    - iio: light: vcnl4000: update sampling periods for vcnl4040
    - mmc: rtsx_pci: Fix support for speed-modes that relies on tuning
    - mmc: sdhci-of-at91: fix cd-gpios for SAMA5D2
    - mmc: sdhci-cadence: set SDHCI_QUIRK2_PRESET_VALUE_BROKEN for UniPhier
    - CIFS: fiemap: do not return EINVAL if get nothing
    - kbuild: Disable -Wpointer-to-enum-cast
    - staging: rtl8188eu: Add device id for MERCUSYS MW150US v2
    - staging: greybus: loopback_test: fix poll-mask build breakage
    - staging/speakup: fix get_word non-space look-ahead
    - intel_th: msu: Fix the unexpected state warning
    - intel_th: Fix user-visible error codes
    - intel_th: pci: Add Elkhart Lake CPU support
    - modpost: move the namespace field in Module.symvers last
    - rtc: max8907: add missing select REGMAP_IRQ
    - arm64: compat: Fix syscall number of compat_clock_getres
    - xhci: Do not open code __print_symbolic() in xhci trace events
    - btrfs: fix log context list corruption after rename whiteout error
    - drm/amd/amdgpu: Fix GPR read from debugfs (v2)
    - drm/lease: fix WARNING in idr_destroy
    - stm class: sys-t: Fix the use of time_after()
    - memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
    - mm, memcg: fix corruption on 64-bit divisor in memory.high throttling
    - mm, memcg: throttle allocators based on ancestral memory.high
    - mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case
    - mm: do not allow MADV_PAGEOUT for CoW pages
    - epoll: fix possible lost wakeup on epoll_ctl() path
    - mm: slub: be more careful about the double cmpxchg of freelist
    - mm, slub: prevent kmalloc_node crashes and memory leaks
    - page-flags: fix a crash at SetPageError(THP_SWAP)
    - x86/mm: split vmalloc_sync_all()
    - futex: Fix inode life-time issue
    - futex: Unbreak futex hashing
    - arm64: smp: fix smp_send_stop() behaviour
    - arm64: smp: fix crash_smp_send_stop() behaviour
    - nvmet-tcp: set MSG_MORE only if we actually have more to send
    - drm/bridge: dw-hdmi: fix AVI frame colorimetry
    - staging: greybus: loopback_test: fix potential path truncation
    - staging: greybus: loopback_test: fix potential path truncations
    - Linux 5.4.28

* Pop sound from build-in speaker during cold boot and resume from S3
    (LP: #1866357) // Focal update: v5.4.28 upstream stable release
    (LP: #1869061)
    - ALSA: hda/realtek: Fix pop noise on ALC225

* Focal update: v5.4.28 upstream stable release (LP: #1869061)
    - perf/x86/amd: Add support for Large Increment per Cycle Events
    - EDAC/amd64: Add family ops for Family 19h Models 00h-0Fh
    - x86/MCE/AMD, EDAC/mce_amd: Add new Load Store unit McaType
    - EDAC/mce_amd: Always load on SMCA systems
    - x86/amd_nb: Add Family 19h PCI IDs
    - EDAC/amd64: Drop some family checks for newer systems

* Update mpt3sas Driver to 33.100.00.00 for Ubuntu 20.04 (LP: #1863574)
    - scsi: mpt3sas: Register trace buffer based on NVDATA settings
    - scsi: mpt3sas: Display message before releasing diag buffer
    - scsi: mpt3sas: Free diag buffer without any status check
    - scsi: mpt3sas: Maintain owner of buffer through UniqueID
    - scsi: mpt3sas: clear release bit when buffer reregistered
    - scsi: mpt3sas: Reuse diag buffer allocated at load time
    - scsi: mpt3sas: Add app owned flag support for diag buffer
    - scsi: mpt3sas: Fail release cmnd if diag buffer is released
    - scsi: mpt3sas: Use Component img header to get Package ver
    - scsi: mpt3sas: Fix module parameter max_msix_vectors
    - scsi: mpt3sas: Bump mpt3sas driver version to 32.100.00.00
    - scsi: mpt3sas: Clean up some indenting
    - scsi: mpt3sas: change allocation option
    - scsi: mpt3sas: Update MPI Headers to v02.00.57
    - scsi: mpt3sas: Add support for NVMe shutdown
    - scsi: mpt3sas: renamed _base_after_reset_handler function
    - scsi: mpt3sas: Add support IOCs new state named COREDUMP
    - scsi: mpt3sas: Handle CoreDump state from watchdog thread
    - scsi: mpt3sas: print in which path firmware fault occurred
    - scsi: mpt3sas: Optimize mpt3sas driver logging
    - scsi: mpt3sas: Print function name in which cmd timed out
    - scsi: mpt3sas: Remove usage of device_busy counter
    - scsi: mpt3sas: Update drive version to 33.100.00.00

* Ubuntu 20.04: megaraid_sas driver update to version 07.713.01.00-rc1
    (LP: #1863581)
    - scsi: megaraid_sas: Unique names for MSI-X vectors
    - scsi: megaraid_sas: remove unused variables 'debugBlk','fusion'
    - compat_ioctl: use correct compat_ptr() translation in drivers
    - scsi: megaraid_sas: Make poll_aen_lock static
    - scsi: megaraid_sas: Reset adapter if FW is not in READY state after device
      resume
    - scsi: megaraid_sas: Set no_write_same only for Virtual Disk
    - scsi: megaraid_sas: Update optimal queue depth for SAS and NVMe devices
    - scsi: megaraid_sas: Do not kill host bus adapter, if adapter is already dead
    - scsi: megaraid_sas: Do not kill HBA if JBOD Seqence map or RAID map is
      disabled
    - scsi: megaraid_sas: Do not set HBA Operational if FW is not in operational
      state
    - scsi: megaraid_sas: Re-Define enum DCMD_RETURN_STATUS
    - scsi: megaraid_sas: Limit the number of retries for the IOCTLs causing
      firmware fault
    - scsi: megaraid_sas: Use Block layer API to check SCSI device in-flight IO
      requests
    - scsi: megaraid_sas: Update driver version to 07.713.01.00-rc1
    - scsi: megaraid_sas: fixup MSIx interrupt setup during resume

-- Seth Forshee <seth.forshee@canonical.com>  Thu, 09 Apr 2020 16:49:54 -0500

Changed in linux (Ubuntu Focal):
status:	Confirmed → Fix Released

Jamie Strandboge (jdstrand) wrote on 2020-04-16:

#69

This entry:

* swap storms kills interactive use (LP: #1861359)
- SAUCE: mm/page_alloc.c: disable memory reclaim watermark boosting by default

closed this bug, but per latest comments, that isn't sufficient to address the issue. Putting back to Confirmed.
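
For anyone wanting to check whether a given kernel has watermark boosting turned off, the following is a minimal sketch, not part of the fix itself. It assumes the tunable is exposed as the `vm.watermark_boost_factor` sysctl under `/proc/sys/vm/`, where a value of 0 disables the feature; the helper names are hypothetical.

```python
from pathlib import Path

# Assumed location of the sysctl; absent on kernels without watermark boosting.
BOOST_SYSCTL = Path("/proc/sys/vm/watermark_boost_factor")


def read_boost_factor(path=BOOST_SYSCTL):
    """Return the boost factor as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def boosting_disabled(path=BOOST_SYSCTL):
    """True if the factor is 0, False if non-zero, None if unknown."""
    value = read_boost_factor(path)
    return None if value is None else value == 0


if __name__ == "__main__":
    print("watermark_boost_factor:", read_boost_factor())
    print("boosting disabled:", boosting_disabled())
```

As the comment above notes, a value of 0 here did not turn out to be sufficient on its own, so this only confirms that the first workaround is in effect.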

Changed in linux (Ubuntu Focal):
status:	Fix Released → Confirmed

Seth Arnold (seth-arnold) wrote on 2020-04-17:

#70

Andrea, I've been running the v1 kernel for a day or so now:

[ 0.000000] Linux version 5.4.0-24-generic (arighi@sita) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu1)) #28+lp1861359v1 SMP Wed Apr 15 14:49:33 UTC 2020 (Ubuntu 5.4.0-24.28+lp1861359v1-generic 5.4.30)

$ uptime
02:21:45 up 1 day, 9 min, 14 users, load average: 0.42, 0.25, 0.23

and have not seen any hangs or stalls of any sort in the ~ten hours that I've been using it.

Normally these problems start to appear about a day into use, so it's perhaps still early to declare success, but this is looking promising.

Even loading https://platform.leolabs.space/visualization in Firefox causes only momentary glitches in mouse movement, somewhere in the 50ms range. (I know humans are pretty poor at judging intervals this short, but it sure feels like a lot less than half a second.)

Thanks
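
A rough way to put numbers on whether a stall coincides with swap traffic, rather than relying on feel, is to sample the swap counters once per second. This is only a sketch: it assumes the kernel exposes `pswpin` and `pswpout` in `/proc/vmstat`, and the function names are made up for illustration.

```python
import time


def parse_vmstat(text):
    """Parse 'name value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before, after):
    """Pages swapped in and out between two samples."""
    return {
        key: after.get(key, 0) - before.get(key, 0)
        for key in ("pswpin", "pswpout")
    }


def sample(path="/proc/vmstat"):
    """Read and parse the current counters (path is an assumption)."""
    with open(path) as f:
        return parse_vmstat(f.read())


if __name__ == "__main__":
    previous = sample()
    while True:
        time.sleep(1)
        current = sample()
        print(swap_delta(previous, current))
        previous = current
```

Large `pswpout` deltas lining up with a mouse stall would point at a swap storm; near-zero deltas during a stall would suggest looking elsewhere.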

Andrea Righi (arighi) wrote on 2020-04-17:

#71

Seth, thanks for the update!

JFYI, I've also just uploaded a v3 kernel (5.4.0-24.28+lp1861359v3) that I'm currently testing on my laptop with positive results. This change is even smaller than the previous one (v2), because it simply disables direct swap-out in the i915 shrinker (I915_SHRINK_WRITEBACK). It's really a one-line change, so the regression potential is minimal.

If I don't find any problems, I would consider including this last fix. It would be great if you could also test this one (so please ignore the v2 kernel). Thanks!

Andrea Righi (arighi) on 2020-04-17

description:	updated

Launchpad Janitor (janitor) wrote on 2020-04-21:

#72

This bug was fixed in the package linux - 5.4.0-26.30

---------------
linux (5.4.0-26.30) focal; urgency=medium

* focal/linux: 5.4.0-26.30 -proposed tracker (LP: #1873882)

* Packaging resync (LP: #1786013)
    - update dkms package versions

* swap storms kills interactive use (LP: #1861359)
    - SAUCE: drm/i915: prevent direct writeback from the shrinker

* 5.4.0-24.28 does not seem to apply rtprio, whereas -21 does. (LP: #1873315)
    - [Config] lowlatency: turn off RT_GROUP_SCHED

* [RTL810xE] No ethernet connection (LP: #1871182)
    - net: phy: realtek: fix handling of RTL8105e-integrated PHY

-- Andrea Righi <email address hidden> Mon, 20 Apr 2020 18:33:49 +0200

Changed in linux (Ubuntu Focal):
status:	Confirmed → Fix Released

Seth Arnold (seth-arnold) wrote on 2020-08-14:

#73

I was reminded of this bug earlier today -- Andrea, Sultan, thanks so much for fixing my issues. I've been happily running along for months now. :) Thanks!

Jeff Lane  (bladernr) on 2020-08-19

tags:	added: ubuntu-certified

Ubuntu
linux package

swap storms kills interactive use

Bug Description

CVE References

Other bug subscribers

Bug attachments

Remote bug watches

Affects		Status	Importance	Assigned to	Milestone
	linux (Ubuntu)	Fix Released	High	Andrea Righi
	Focal	Fix Released	High	Andrea Righi
