Activity log for bug #1898786

Date Who What changed Old value New value Message
2020-10-06 20:47:33 Benjamin Allot bug added bug
2020-10-06 20:48:53 Benjamin Allot bug added subscriber The Canonical Sysadmins
2020-10-06 21:00:11 Ubuntu Kernel Bot linux (Ubuntu): status New Confirmed
2020-10-06 22:23:39 Haw Loeung bug added subscriber Haw Loeung
2020-10-06 23:41:02 Haw Loeung description Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [2]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command: | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly [0]: https://pastebin.canonical.com/p/wYYKwHdRXk/ [1]: https://pastebin.canonical.com/p/n2Tw57QyBC/ [2]: https://pastebin.canonical.com/p/3QqFTfdHhX/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices: total 0 crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions: linux-restricted-modules-4.15.0-118-generic N/A linux-backports-modules-4.15.0-118-generic N/A linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [0]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command:  | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly [0]: https://pastebin.canonical.com/p/wYYKwHdRXk/ [1]: https://pastebin.canonical.com/p/n2Tw57QyBC/ [2]: https://pastebin.canonical.com/p/3QqFTfdHhX/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices:  total 0  crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq  crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions:  linux-restricted-modules-4.15.0-118-generic N/A  linux-backports-modules-4.15.0-118-generic N/A  linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP
2020-10-07 00:57:52 William Grant bug added subscriber William Grant
2020-10-07 07:17:59 Juerg Haefliger nominated for series Ubuntu Bionic
2020-10-07 07:17:59 Juerg Haefliger bug task added linux (Ubuntu Bionic)
2020-10-07 07:32:42 Colin Watson bug added subscriber Colin Watson
2020-10-07 17:10:30 Mauricio Faria de Oliveira bug added subscriber Mauricio Faria de Oliveira
2020-10-08 09:44:05 Benjamin Allot description Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [0]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command:  | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly [0]: https://pastebin.canonical.com/p/wYYKwHdRXk/ [1]: https://pastebin.canonical.com/p/n2Tw57QyBC/ [2]: https://pastebin.canonical.com/p/3QqFTfdHhX/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices:  total 0  crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq  crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions:  linux-restricted-modules-4.15.0-118-generic N/A  linux-backports-modules-4.15.0-118-generic N/A  linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [0]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command:  | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/ [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/ [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices:  total 0  crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq  crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions:  linux-restricted-modules-4.15.0-118-generic N/A  linux-backports-modules-4.15.0-118-generic N/A  linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP
2020-10-08 09:46:07 Benjamin Allot description Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [0]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command:  | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/ [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/ [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices:  total 0  crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq  crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions:  linux-restricted-modules-4.15.0-118-generic N/A  linux-backports-modules-4.15.0-118-generic N/A  linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [0]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command:  | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly Here is a trace of the bcache event related lock in the kernel obtained with some bpfcc-tools [4]. klockstat-bpfcc -c bch_ -i 5 -s 3 The trace has been run in parallel with the following command line echo "Shrinker disabled: $(date)"; sleep 60; echo "Enabling shrinker: $(date)"; echo 0 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled ; sleep 60; echo "Disabling shrinker: $(date)"; echo 1 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled; sleep 60; echo "End of test: $(date)" [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/ [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/ [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 [4]: https://pastebin.ubuntu.com/p/qhyqP35fCw/ ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices:  total 0  crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq  crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions:  linux-restricted-modules-4.15.0-118-generic N/A  linux-backports-modules-4.15.0-118-generic N/A  linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP
2020-10-08 14:46:39 Benjamin Allot description Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [0]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command:  | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly Here is a trace of the bcache event related lock in the kernel obtained with some bpfcc-tools [4]. klockstat-bpfcc -c bch_ -i 5 -s 3 The trace has been run in parallel with the following command line echo "Shrinker disabled: $(date)"; sleep 60; echo "Enabling shrinker: $(date)"; echo 0 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled ; sleep 60; echo "Disabling shrinker: $(date)"; echo 1 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled; sleep 60; echo "End of test: $(date)" [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/ [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/ [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 [4]: https://pastebin.ubuntu.com/p/qhyqP35fCw/ ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices:  total 0  crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq  crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions:  linux-restricted-modules-4.15.0-118-generic N/A  linux-backports-modules-4.15.0-118-generic N/A  linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [0]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command:  | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly Here is a trace of the bcache event related lock in the kernel obtained with some bpfcc-tools [4]. klockstat-bpfcc -c bch_ -i 5 -s 3 The trace has been run in parallel with the following command line echo "Shrinker disabled: $(date)"; sleep 60; echo "Enabling shrinker: $(date)"; echo 0 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled ; sleep 60; echo "Disabling shrinker: $(date)"; echo 1 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled; sleep 60; echo "End of test: $(date)" Trying to dig more, we reduced by 20 GB the memory allocated to a VM on the server. * The bcache btree size fluctuation seems "normal" [5] * I noticed that, when the shrinker was enabled,a lot of time was spent in the locks during "bch_btree_insert_node". I decided to check if one of the function called during bch_btree_insert_node was taking longer than usual when the shrinker was enabled. I finally found the "funclatency" tool and tried do have the same approach I had with the klockstat [7]. However, that was inconclusive. I could see there that the bch_btree_insert_node was barely called during the whole duration of the test. Which made me think it's amount of time spent in lock is more due to another process acquiring the lock. I'm going to try to have another go with some perf/klockstat/funclatency focused on bch_mca_scan and the function called there. Also, here are some memory related metrics [8]. [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/ [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/ [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 [4]: https://pastebin.ubuntu.com/p/qhyqP35fCw/ [5]: https://pastebin.ubuntu.com/p/McjxxqTVjn/ [6]: https://pastebin.ubuntu.com/p/KmrnW4Ng8F/ [7]: https://pastebin.ubuntu.com/p/fSX4c7tTFV/ [8]: https://pastebin.ubuntu.com/p/CZgXkgKhmJ/ ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices:  total 0  crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq  crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions:  linux-restricted-modules-4.15.0-118-generic N/A  linux-backports-modules-4.15.0-118-generic N/A  linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP
2020-10-08 21:03:03 Benjamin Allot description Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [0]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command:  | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly Here is a trace of the bcache event related lock in the kernel obtained with some bpfcc-tools [4]. klockstat-bpfcc -c bch_ -i 5 -s 3 The trace has been run in parallel with the following command line echo "Shrinker disabled: $(date)"; sleep 60; echo "Enabling shrinker: $(date)"; echo 0 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled ; sleep 60; echo "Disabling shrinker: $(date)"; echo 1 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled; sleep 60; echo "End of test: $(date)" Trying to dig more, we reduced by 20 GB the memory allocated to a VM on the server. * The bcache btree size fluctuation seems "normal" [5] * I noticed that, when the shrinker was enabled,a lot of time was spent in the locks during "bch_btree_insert_node". I decided to check if one of the function called during bch_btree_insert_node was taking longer than usual when the shrinker was enabled. I finally found the "funclatency" tool and tried do have the same approach I had with the klockstat [7]. However, that was inconclusive. I could see there that the bch_btree_insert_node was barely called during the whole duration of the test. Which made me think it's amount of time spent in lock is more due to another process acquiring the lock. I'm going to try to have another go with some perf/klockstat/funclatency focused on bch_mca_scan and the function called there. Also, here are some memory related metrics [8]. [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/ [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/ [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 [4]: https://pastebin.ubuntu.com/p/qhyqP35fCw/ [5]: https://pastebin.ubuntu.com/p/McjxxqTVjn/ [6]: https://pastebin.ubuntu.com/p/KmrnW4Ng8F/ [7]: https://pastebin.ubuntu.com/p/fSX4c7tTFV/ [8]: https://pastebin.ubuntu.com/p/CZgXkgKhmJ/ ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices:  total 0  crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq  crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions:  linux-restricted-modules-4.15.0-118-generic N/A  linux-backports-modules-4.15.0-118-generic N/A  linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [0]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command:  | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly Here is a trace of the bcache event related lock in the kernel obtained with some bpfcc-tools [4]. klockstat-bpfcc -c bch_ -i 5 -s 3 The trace has been run in parallel with the following command line echo "Shrinker disabled: $(date)"; sleep 60; echo "Enabling shrinker: $(date)"; echo 0 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled ; sleep 60; echo "Disabling shrinker: $(date)"; echo 1 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled; sleep 60; echo "End of test: $(date)" Trying to dig more, we reduced by 20 GB the memory allocated to a VM on the server. * The bcache btree size fluctuation seems "normal" [5] * I noticed that, when the shrinker was enabled,a lot of time was spent in the locks during "bch_btree_insert_node". I decided to check if one of the function called during bch_btree_insert_node was taking longer than usual when the shrinker was enabled. I finally found the "funclatency" tool and tried do have the same approach I had with the klockstat [7]. However, that was inconclusive. I could see there that the bch_btree_insert_node was barely called during the whole duration of the test. Which made me think it's amount of time spent in lock is more due to another process acquiring the lock. I'm going to try to have another go with some perf/klockstat/funclatency focused on bch_mca_scan and the function called there. Also, here are some memory related metrics [8]. Now another perf stacktrace with the command used [9]. Strangely this one doesn't show any bch_mca_scan at all. I enabled the shrinker again, hoping to get more traces, but apparently the timeframe was not right. Not enough load to trigger the cliff resulting in a 1sec IOwait plateau. Which is interesting because that means that without the maximal workload, the kernel can cope with the shrinker. [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/ [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/ [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 [4]: https://pastebin.ubuntu.com/p/qhyqP35fCw/ [5]: https://pastebin.ubuntu.com/p/McjxxqTVjn/ [6]: https://pastebin.ubuntu.com/p/KmrnW4Ng8F/ [7]: https://pastebin.ubuntu.com/p/fSX4c7tTFV/ [8]: https://pastebin.ubuntu.com/p/CZgXkgKhmJ/ [9]: https://pastebin.ubuntu.com/p/DzKCP8NGdf/ ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices:  total 0  crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq  crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions:  linux-restricted-modules-4.15.0-118-generic N/A  linux-backports-modules-4.15.0-118-generic N/A  linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP
2020-10-08 23:22:20 Benjamin Allot description Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [0]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command:  | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly Here is a trace of the bcache event related lock in the kernel obtained with some bpfcc-tools [4]. klockstat-bpfcc -c bch_ -i 5 -s 3 The trace has been run in parallel with the following command line echo "Shrinker disabled: $(date)"; sleep 60; echo "Enabling shrinker: $(date)"; echo 0 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled ; sleep 60; echo "Disabling shrinker: $(date)"; echo 1 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled; sleep 60; echo "End of test: $(date)" Trying to dig more, we reduced by 20 GB the memory allocated to a VM on the server. * The bcache btree size fluctuation seems "normal" [5] * I noticed that, when the shrinker was enabled,a lot of time was spent in the locks during "bch_btree_insert_node". I decided to check if one of the function called during bch_btree_insert_node was taking longer than usual when the shrinker was enabled. I finally found the "funclatency" tool and tried do have the same approach I had with the klockstat [7]. However, that was inconclusive. I could see there that the bch_btree_insert_node was barely called during the whole duration of the test. Which made me think it's amount of time spent in lock is more due to another process acquiring the lock. I'm going to try to have another go with some perf/klockstat/funclatency focused on bch_mca_scan and the function called there. Also, here are some memory related metrics [8]. Now another perf stacktrace with the command used [9]. Strangely this one doesn't show any bch_mca_scan at all. I enabled the shrinker again, hoping to get more traces, but apparently the timeframe was not right. Not enough load to trigger the cliff resulting in a 1sec IOwait plateau. Which is interesting because that means that without the maximal workload, the kernel can cope with the shrinker. [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/ [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/ [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 [4]: https://pastebin.ubuntu.com/p/qhyqP35fCw/ [5]: https://pastebin.ubuntu.com/p/McjxxqTVjn/ [6]: https://pastebin.ubuntu.com/p/KmrnW4Ng8F/ [7]: https://pastebin.ubuntu.com/p/fSX4c7tTFV/ [8]: https://pastebin.ubuntu.com/p/CZgXkgKhmJ/ [9]: https://pastebin.ubuntu.com/p/DzKCP8NGdf/ ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices:  total 0  crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq  crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions:  linux-restricted-modules-4.15.0-118-generic N/A  linux-backports-modules-4.15.0-118-generic N/A  linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [0]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command:  | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly Here is a trace of the bcache event related lock in the kernel obtained with some bpfcc-tools [4]. klockstat-bpfcc -c bch_ -i 5 -s 3 The trace has been run in parallel with the following command line echo "Shrinker disabled: $(date)"; sleep 60; echo "Enabling shrinker: $(date)"; echo 0 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled ; sleep 60; echo "Disabling shrinker: $(date)"; echo 1 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled; sleep 60; echo "End of test: $(date)" Trying to dig more, we reduced by 20 GB the memory allocated to a VM on the server. * The bcache btree size fluctuation seems "normal" [5] * I noticed that, when the shrinker was enabled,a lot of time was spent in the locks during "bch_btree_insert_node". [6] I decided to check if one of the function called during bch_btree_insert_node was taking longer than usual when the shrinker was enabled. I finally found the "funclatency" tool and tried do have the same approach I had with the klockstat [7]. However, that was inconclusive. I could see there that the bch_btree_insert_node was barely called during the whole duration of the test. Which made me think it's amount of time spent in lock is more due to another process acquiring the lock. I'm going to try to have another go with some perf/klockstat/funclatency focused on bch_mca_scan and the function called there. Also, here are some memory related metrics [8]. Now another perf stacktrace with the command used [9]. Strangely this one doesn't show any bch_mca_scan at all. I enabled the shrinker again, hoping to get more traces, but apparently the timeframe was not right. Not enough load to trigger the cliff resulting in a 1sec IOwait plateau. Which is interesting because that means that without the maximal workload, the kernel can cope with the shrinker. [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/ [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/ [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 [4]: https://pastebin.ubuntu.com/p/qhyqP35fCw/ [5]: https://pastebin.ubuntu.com/p/McjxxqTVjn/ [6]: https://pastebin.ubuntu.com/p/KmrnW4Ng8F/ [7]: https://pastebin.ubuntu.com/p/fSX4c7tTFV/ [8]: https://pastebin.ubuntu.com/p/CZgXkgKhmJ/ [9]: https://pastebin.ubuntu.com/p/DzKCP8NGdf/ ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices:  total 0  crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq  crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions:  linux-restricted-modules-4.15.0-118-generic N/A  linux-backports-modules-4.15.0-118-generic N/A  linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP
2020-10-09 04:34:18 Matthew Ruffell bug added subscriber Matthew Ruffell
2020-10-09 13:56:15 Guilherme G. Piccoli bug added subscriber Guilherme G. Piccoli
2020-10-20 22:17:21 Matthew Ruffell summary Issue with bcache bch_mca_scan causing huge IO wait bcache: Issues with large IO wait in bch_mca_scan() when shrinker is enabled
2020-10-20 22:17:29 Matthew Ruffell nominated for series Ubuntu Focal
2020-10-20 22:17:29 Matthew Ruffell bug task added linux (Ubuntu Focal)
2020-10-20 22:17:36 Matthew Ruffell linux (Ubuntu): status Confirmed Fix Released
2020-10-20 22:17:41 Matthew Ruffell linux (Ubuntu Bionic): status New In Progress
2020-10-20 22:17:45 Matthew Ruffell linux (Ubuntu Focal): status New In Progress
2020-10-20 22:17:49 Matthew Ruffell linux (Ubuntu Bionic): importance Undecided Medium
2020-10-20 22:17:51 Matthew Ruffell linux (Ubuntu Focal): importance Undecided Medium
2020-10-20 22:17:55 Matthew Ruffell linux (Ubuntu Bionic): assignee Matthew Ruffell (mruffell)
2020-10-20 22:17:58 Matthew Ruffell linux (Ubuntu Focal): assignee Matthew Ruffell (mruffell)
2020-10-20 22:18:30 Matthew Ruffell description Hello, In short, we faced an issue with a huge IO wait on a bionic Ubuntu 4.15.0-118.119-generic kernel. This is the full list of process and the kernel function they were stuck in [0]. The main issue can probably be summarized by this perf reports * first identify that the cpu are stuck in idle because of something[1] * second, see what kernel function seems to stuck the process kswapd0 and kswapd1 [2]. We could see that this seems to be the mutex_lock in the bch_mca_scan function [3]. After running the command:  | sudo bash -c "echo 1 > /sys/fs/bcache/f1a1e8cb-3e6b-40ea-852e-583c48d0c2b8/internal/btree_shrinker_disabled" The server started to respond normally and the IO wait dropped significantly Here is a trace of the bcache event related lock in the kernel obtained with some bpfcc-tools [4]. klockstat-bpfcc -c bch_ -i 5 -s 3 The trace has been run in parallel with the following command line echo "Shrinker disabled: $(date)"; sleep 60; echo "Enabling shrinker: $(date)"; echo 0 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled ; sleep 60; echo "Disabling shrinker: $(date)"; echo 1 | sudo tee /sys/block/bcache0/bcache/cache/internal/btree_shrinker_disabled; sleep 60; echo "End of test: $(date)" Trying to dig more, we reduced by 20 GB the memory allocated to a VM on the server. * The bcache btree size fluctuation seems "normal" [5] * I noticed that, when the shrinker was enabled,a lot of time was spent in the locks during "bch_btree_insert_node". [6] I decided to check if one of the function called during bch_btree_insert_node was taking longer than usual when the shrinker was enabled. I finally found the "funclatency" tool and tried do have the same approach I had with the klockstat [7]. However, that was inconclusive. I could see there that the bch_btree_insert_node was barely called during the whole duration of the test. Which made me think it's amount of time spent in lock is more due to another process acquiring the lock. I'm going to try to have another go with some perf/klockstat/funclatency focused on bch_mca_scan and the function called there. Also, here are some memory related metrics [8]. Now another perf stacktrace with the command used [9]. Strangely this one doesn't show any bch_mca_scan at all. I enabled the shrinker again, hoping to get more traces, but apparently the timeframe was not right. Not enough load to trigger the cliff resulting in a 1sec IOwait plateau. Which is interesting because that means that without the maximal workload, the kernel can cope with the shrinker. [0]: https://pastebin.ubuntu.com/p/QYXPdsMCWC/ [1]: https://pastebin.ubuntu.com/p/BFsvF7H54r/ [2]: https://pastebin.ubuntu.com/p/35qdsHYHf5/ [3]: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/bionic/tree/drivers/md/bcache/btree.c?h=Ubuntu-4.15.0-118.119#n674 [4]: https://pastebin.ubuntu.com/p/qhyqP35fCw/ [5]: https://pastebin.ubuntu.com/p/McjxxqTVjn/ [6]: https://pastebin.ubuntu.com/p/KmrnW4Ng8F/ [7]: https://pastebin.ubuntu.com/p/fSX4c7tTFV/ [8]: https://pastebin.ubuntu.com/p/CZgXkgKhmJ/ [9]: https://pastebin.ubuntu.com/p/DzKCP8NGdf/ ==================== $ cat /proc/version_signature Ubuntu 4.15.0-118.119-generic 4.15.18 ProblemType: Bug DistroRelease: Ubuntu 18.04 Package: linux-image-4.15.0-118-generic 4.15.0-118.119 ProcVersionSignature: User Name 4.15.0-118.119-generic 4.15.18 Uname: Linux 4.15.0-118-generic x86_64 AlsaDevices:  total 0  crw-rw---- 1 root audio 116, 1 Sep 29 10:04 seq  crw-rw---- 1 root audio 116, 33 Sep 29 10:04 timer AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay' ApportVersion: 2.20.9-0ubuntu7.16 Architecture: amd64 ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord' AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1: Date: Tue Oct 6 20:36:18 2020 IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig': 'iwconfig' MachineType: HP ProLiant DL380 G7 PciMultimedia: ProcFB: 0 radeondrmfb ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-118-generic root=UUID=c6ad1629-a506-4043-a339-6d57f0708d12 ro console=ttyS1,115200 nosplash RelatedPackageVersions:  linux-restricted-modules-4.15.0-118-generic N/A  linux-backports-modules-4.15.0-118-generic N/A  linux-firmware 1.173.18 RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill' SourcePackage: linux UpgradeStatus: Upgraded to bionic on 2019-09-27 (375 days ago) dmi.bios.date: 05/05/2011 dmi.bios.vendor: HP dmi.bios.version: P67 dmi.chassis.type: 23 dmi.chassis.vendor: HP dmi.modalias: dmi:bvnHP:bvrP67:bd05/05/2011:svnHP:pnProLiantDL380G7:pvr:cvnHP:ct23:cvr: dmi.product.family: ProLiant dmi.product.name: ProLiant DL380 G7 dmi.sys.vendor: HP BugLink: https://bugs.launchpad.net/bugs/1898786 [Impact] Systems that utilise bcache can experience extremely high IO wait times when under constant IO pressure. The IO wait times seem to stay at a consistent 1 second, and never drop as long as the bcache shrinker is enabled. If you disable the shrinker, then IO wait drops significantly to normal levels. We did some perf analysis, and it seems we spend a huge amount of time in bch_mca_scan(), likely waiting for the mutex "&c->bucket_lock". Looking at the recent commits in Bionic, we found the following commit merged in upstream 5.1-rc1 and backported to 4.15.0-87-generic through upstream stable: commit 9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b Author: Coly Li <colyli@suse.de> Date: Wed Nov 13 16:03:24 2019 +0800 Subject: bcache: at least try to shrink 1 node in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b It mentions in the description that: > If sc->nr_to_scan is smaller than c->btree_pages, after the above > calculation, variable 'nr' will be 0 and nothing will be shrunk. It is > frequeently observed that only 1 or 2 is set to sc->nr_to_scan and make > nr to be zero. Then bch_mca_scan() will do nothing more then acquiring > and releasing mutex c->bucket_lock. This seems to be what is going on here, but the above commit only addresses when nr is 0. From what I can see, the problems we are experiencing are when nr is 1 or 2, and again, we just waste time in bch_mca_scan() waiting on c->bucket_lock, only to release c->bucket_lock since the shrinker loop never executes since there is no work to do. [Fix] The following commits fix the problem, and all landed in 5.6-rc1: commit 125d98edd11464c8e0ec9eaaba7d682d0f832686 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:40 2020 +0800 Subject: bcache: remove member accessed from struct btree Link: https://github.com/torvalds/linux/commit/125d98edd11464c8e0ec9eaaba7d682d0f832686 commit d5c9c470b01177e4d90cdbf178b8c7f37f5b8795 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:41 2020 +0800 Subject: bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/d5c9c470b01177e4d90cdbf178b8c7f37f5b8795 commit e3de04469a49ee09c89e80bf821508df458ccee6 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:42 2020 +0800 Subject: bcache: reap from tail of c->btree_cache in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/e3de04469a49ee09c89e80bf821508df458ccee6 The first commit is a dependency to the second and third commits. The first commit is a dependency of the other two. The first commit removes a "recently accessed" marker, used to indicate if a particular cache has been used recently, and if it has been, not consider it for cache eviction. The commit mentions that under heavy IO, all caches will end up being recently accessed, and nothing will ever be shrunk. The second commit changes a previous design decision of skipping the first 3 caches to shrink, since it is a common case to call bch_mca_scan() with nr being 1, or 2, just like 0 was common in the very first commit I mentioned. This time, if we land on 1 or 2, the loop exits and nothing happens, and we waste time waiting on locks, just like the very first commit I mentioned. The fix is to try shrink caches from the tail of the list, and not the beginning. The third commit fixes a minor issue where we don't want to re-arrange the linked list c->btree_cache, which is what the second commit ended up doing, and instead, just shrink the cache at the end of the linked list, and not change the order. All commits are clean cherry picks to Bionic and Focal. [Testcase] This is kind of hard to test, since the problem shows up in production environments that are under constant IO pressure, with many different items entering and leaving the cache. The Launchpad git server is currently suffering this issue, and has been sitting at a constant IO wait of 1 second / slightly over 1 second which was causing slow response times, which was leading to build failures when git clones ended up timing out. We installed a test kernel, which is available in the following PPA: https://launchpad.net/~mruffell/+archive/ubuntu/sf294907-test Once the test kernel had been installed, IO wait times with the shrinker enabled dropped to normal levels, and the git server became responsive again. We have been monitoring the performance of the git server and watching IO wait times in grafana over the past week, and everything is looking good, and indicate that these patches solve the issue. [Regression Potential] If a regression were to occur, it would only affect users of bcache who have the shrinker enabled, which I believe is the default setting in bcache deployments. In that case, users could disable the shrinker as a workaround until a fix is available. You can disable the shrinker with: "sudo bash -c "echo 1 > /sys/fs/bcache/<uuid>/internal/btree_shrinker_disabled"" The patches remove the "recently accessed" marker, which changes the behaviour slightly when it comes to deciding what to evict from the cache. Some workloads that aren't under IO pressure may see a difference in what items are evicted from the cache, which may slightly change bcache's performance profile. Since the patches prune the cache from the tail of the cache linked list, we will likely end up pruning more nodes than previously, since it was common to land on the first 3 caches, which were hard coded to be ignored and not pruned, which may also slightly change the bcache's performance profile.
2020-10-20 22:18:57 Matthew Ruffell description BugLink: https://bugs.launchpad.net/bugs/1898786 [Impact] Systems that utilise bcache can experience extremely high IO wait times when under constant IO pressure. The IO wait times seem to stay at a consistent 1 second, and never drop as long as the bcache shrinker is enabled. If you disable the shrinker, then IO wait drops significantly to normal levels. We did some perf analysis, and it seems we spend a huge amount of time in bch_mca_scan(), likely waiting for the mutex "&c->bucket_lock". Looking at the recent commits in Bionic, we found the following commit merged in upstream 5.1-rc1 and backported to 4.15.0-87-generic through upstream stable: commit 9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b Author: Coly Li <colyli@suse.de> Date: Wed Nov 13 16:03:24 2019 +0800 Subject: bcache: at least try to shrink 1 node in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b It mentions in the description that: > If sc->nr_to_scan is smaller than c->btree_pages, after the above > calculation, variable 'nr' will be 0 and nothing will be shrunk. It is > frequeently observed that only 1 or 2 is set to sc->nr_to_scan and make > nr to be zero. Then bch_mca_scan() will do nothing more then acquiring > and releasing mutex c->bucket_lock. This seems to be what is going on here, but the above commit only addresses when nr is 0. From what I can see, the problems we are experiencing are when nr is 1 or 2, and again, we just waste time in bch_mca_scan() waiting on c->bucket_lock, only to release c->bucket_lock since the shrinker loop never executes since there is no work to do. [Fix] The following commits fix the problem, and all landed in 5.6-rc1: commit 125d98edd11464c8e0ec9eaaba7d682d0f832686 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:40 2020 +0800 Subject: bcache: remove member accessed from struct btree Link: https://github.com/torvalds/linux/commit/125d98edd11464c8e0ec9eaaba7d682d0f832686 commit d5c9c470b01177e4d90cdbf178b8c7f37f5b8795 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:41 2020 +0800 Subject: bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/d5c9c470b01177e4d90cdbf178b8c7f37f5b8795 commit e3de04469a49ee09c89e80bf821508df458ccee6 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:42 2020 +0800 Subject: bcache: reap from tail of c->btree_cache in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/e3de04469a49ee09c89e80bf821508df458ccee6 The first commit is a dependency to the second and third commits. The first commit is a dependency of the other two. The first commit removes a "recently accessed" marker, used to indicate if a particular cache has been used recently, and if it has been, not consider it for cache eviction. The commit mentions that under heavy IO, all caches will end up being recently accessed, and nothing will ever be shrunk. The second commit changes a previous design decision of skipping the first 3 caches to shrink, since it is a common case to call bch_mca_scan() with nr being 1, or 2, just like 0 was common in the very first commit I mentioned. This time, if we land on 1 or 2, the loop exits and nothing happens, and we waste time waiting on locks, just like the very first commit I mentioned. The fix is to try shrink caches from the tail of the list, and not the beginning. The third commit fixes a minor issue where we don't want to re-arrange the linked list c->btree_cache, which is what the second commit ended up doing, and instead, just shrink the cache at the end of the linked list, and not change the order. All commits are clean cherry picks to Bionic and Focal. [Testcase] This is kind of hard to test, since the problem shows up in production environments that are under constant IO pressure, with many different items entering and leaving the cache. The Launchpad git server is currently suffering this issue, and has been sitting at a constant IO wait of 1 second / slightly over 1 second which was causing slow response times, which was leading to build failures when git clones ended up timing out. We installed a test kernel, which is available in the following PPA: https://launchpad.net/~mruffell/+archive/ubuntu/sf294907-test Once the test kernel had been installed, IO wait times with the shrinker enabled dropped to normal levels, and the git server became responsive again. We have been monitoring the performance of the git server and watching IO wait times in grafana over the past week, and everything is looking good, and indicate that these patches solve the issue. [Regression Potential] If a regression were to occur, it would only affect users of bcache who have the shrinker enabled, which I believe is the default setting in bcache deployments. In that case, users could disable the shrinker as a workaround until a fix is available. You can disable the shrinker with: "sudo bash -c "echo 1 > /sys/fs/bcache/<uuid>/internal/btree_shrinker_disabled"" The patches remove the "recently accessed" marker, which changes the behaviour slightly when it comes to deciding what to evict from the cache. Some workloads that aren't under IO pressure may see a difference in what items are evicted from the cache, which may slightly change bcache's performance profile. Since the patches prune the cache from the tail of the cache linked list, we will likely end up pruning more nodes than previously, since it was common to land on the first 3 caches, which were hard coded to be ignored and not pruned, which may also slightly change the bcache's performance profile. BugLink: https://bugs.launchpad.net/bugs/1898786 [Impact] Systems that utilise bcache can experience extremely high IO wait times when under constant IO pressure. The IO wait times seem to stay at a consistent 1 second, and never drop as long as the bcache shrinker is enabled. If you disable the shrinker, then IO wait drops significantly to normal levels. We did some perf analysis, and it seems we spend a huge amount of time in bch_mca_scan(), likely waiting for the mutex "&c->bucket_lock". Looking at the recent commits in Bionic, we found the following commit merged in upstream 5.1-rc1 and backported to 4.15.0-87-generic through upstream stable: commit 9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b Author: Coly Li <colyli@suse.de> Date: Wed Nov 13 16:03:24 2019 +0800 Subject: bcache: at least try to shrink 1 node in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b It mentions in the description that: > If sc->nr_to_scan is smaller than c->btree_pages, after the above > calculation, variable 'nr' will be 0 and nothing will be shrunk. It is > frequeently observed that only 1 or 2 is set to sc->nr_to_scan and make > nr to be zero. Then bch_mca_scan() will do nothing more then acquiring > and releasing mutex c->bucket_lock. This seems to be what is going on here, but the above commit only addresses when nr is 0. From what I can see, the problems we are experiencing are when nr is 1 or 2, and again, we just waste time in bch_mca_scan() waiting on c->bucket_lock, only to release c->bucket_lock since the shrinker loop never executes since there is no work to do. [Fix] The following commits fix the problem, and all landed in 5.6-rc1: commit 125d98edd11464c8e0ec9eaaba7d682d0f832686 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:40 2020 +0800 Subject: bcache: remove member accessed from struct btree Link: https://github.com/torvalds/linux/commit/125d98edd11464c8e0ec9eaaba7d682d0f832686 commit d5c9c470b01177e4d90cdbf178b8c7f37f5b8795 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:41 2020 +0800 Subject: bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/d5c9c470b01177e4d90cdbf178b8c7f37f5b8795 commit e3de04469a49ee09c89e80bf821508df458ccee6 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:42 2020 +0800 Subject: bcache: reap from tail of c->btree_cache in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/e3de04469a49ee09c89e80bf821508df458ccee6 The first commit is a dependency of the other two. The first commit removes a "recently accessed" marker, used to indicate if a particular cache has been used recently, and if it has been, not consider it for cache eviction. The commit mentions that under heavy IO, all caches will end up being recently accessed, and nothing will ever be shrunk. The second commit changes a previous design decision of skipping the first 3 caches to shrink, since it is a common case to call bch_mca_scan() with nr being 1, or 2, just like 0 was common in the very first commit I mentioned. This time, if we land on 1 or 2, the loop exits and nothing happens, and we waste time waiting on locks, just like the very first commit I mentioned. The fix is to try shrink caches from the tail of the list, and not the beginning. The third commit fixes a minor issue where we don't want to re-arrange the linked list c->btree_cache, which is what the second commit ended up doing, and instead, just shrink the cache at the end of the linked list, and not change the order. All commits are clean cherry picks to Bionic and Focal. [Testcase] This is kind of hard to test, since the problem shows up in production environments that are under constant IO pressure, with many different items entering and leaving the cache. The Launchpad git server is currently suffering this issue, and has been sitting at a constant IO wait of 1 second / slightly over 1 second which was causing slow response times, which was leading to build failures when git clones ended up timing out. We installed a test kernel, which is available in the following PPA: https://launchpad.net/~mruffell/+archive/ubuntu/sf294907-test Once the test kernel had been installed, IO wait times with the shrinker enabled dropped to normal levels, and the git server became responsive again. We have been monitoring the performance of the git server and watching IO wait times in grafana over the past week, and everything is looking good, and indicate that these patches solve the issue. [Regression Potential] If a regression were to occur, it would only affect users of bcache who have the shrinker enabled, which I believe is the default setting in bcache deployments. In that case, users could disable the shrinker as a workaround until a fix is available. You can disable the shrinker with: "sudo bash -c "echo 1 > /sys/fs/bcache/<uuid>/internal/btree_shrinker_disabled"" The patches remove the "recently accessed" marker, which changes the behaviour slightly when it comes to deciding what to evict from the cache. Some workloads that aren't under IO pressure may see a difference in what items are evicted from the cache, which may slightly change bcache's performance profile. Since the patches prune the cache from the tail of the cache linked list, we will likely end up pruning more nodes than previously, since it was common to land on the first 3 caches, which were hard coded to be ignored and not pruned, which may also slightly change the bcache's performance profile.
2020-10-20 22:20:29 Matthew Ruffell tags amd64 apport-bug bionic amd64 apport-bug bionic focal
2020-10-20 22:20:36 Matthew Ruffell tags amd64 apport-bug bionic focal amd64 apport-bug bionic focal sts
2020-10-21 01:28:06 Matthew Ruffell description BugLink: https://bugs.launchpad.net/bugs/1898786 [Impact] Systems that utilise bcache can experience extremely high IO wait times when under constant IO pressure. The IO wait times seem to stay at a consistent 1 second, and never drop as long as the bcache shrinker is enabled. If you disable the shrinker, then IO wait drops significantly to normal levels. We did some perf analysis, and it seems we spend a huge amount of time in bch_mca_scan(), likely waiting for the mutex "&c->bucket_lock". Looking at the recent commits in Bionic, we found the following commit merged in upstream 5.1-rc1 and backported to 4.15.0-87-generic through upstream stable: commit 9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b Author: Coly Li <colyli@suse.de> Date: Wed Nov 13 16:03:24 2019 +0800 Subject: bcache: at least try to shrink 1 node in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b It mentions in the description that: > If sc->nr_to_scan is smaller than c->btree_pages, after the above > calculation, variable 'nr' will be 0 and nothing will be shrunk. It is > frequeently observed that only 1 or 2 is set to sc->nr_to_scan and make > nr to be zero. Then bch_mca_scan() will do nothing more then acquiring > and releasing mutex c->bucket_lock. This seems to be what is going on here, but the above commit only addresses when nr is 0. From what I can see, the problems we are experiencing are when nr is 1 or 2, and again, we just waste time in bch_mca_scan() waiting on c->bucket_lock, only to release c->bucket_lock since the shrinker loop never executes since there is no work to do. [Fix] The following commits fix the problem, and all landed in 5.6-rc1: commit 125d98edd11464c8e0ec9eaaba7d682d0f832686 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:40 2020 +0800 Subject: bcache: remove member accessed from struct btree Link: https://github.com/torvalds/linux/commit/125d98edd11464c8e0ec9eaaba7d682d0f832686 commit d5c9c470b01177e4d90cdbf178b8c7f37f5b8795 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:41 2020 +0800 Subject: bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/d5c9c470b01177e4d90cdbf178b8c7f37f5b8795 commit e3de04469a49ee09c89e80bf821508df458ccee6 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:42 2020 +0800 Subject: bcache: reap from tail of c->btree_cache in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/e3de04469a49ee09c89e80bf821508df458ccee6 The first commit is a dependency of the other two. The first commit removes a "recently accessed" marker, used to indicate if a particular cache has been used recently, and if it has been, not consider it for cache eviction. The commit mentions that under heavy IO, all caches will end up being recently accessed, and nothing will ever be shrunk. The second commit changes a previous design decision of skipping the first 3 caches to shrink, since it is a common case to call bch_mca_scan() with nr being 1, or 2, just like 0 was common in the very first commit I mentioned. This time, if we land on 1 or 2, the loop exits and nothing happens, and we waste time waiting on locks, just like the very first commit I mentioned. The fix is to try shrink caches from the tail of the list, and not the beginning. The third commit fixes a minor issue where we don't want to re-arrange the linked list c->btree_cache, which is what the second commit ended up doing, and instead, just shrink the cache at the end of the linked list, and not change the order. All commits are clean cherry picks to Bionic and Focal. [Testcase] This is kind of hard to test, since the problem shows up in production environments that are under constant IO pressure, with many different items entering and leaving the cache. The Launchpad git server is currently suffering this issue, and has been sitting at a constant IO wait of 1 second / slightly over 1 second which was causing slow response times, which was leading to build failures when git clones ended up timing out. We installed a test kernel, which is available in the following PPA: https://launchpad.net/~mruffell/+archive/ubuntu/sf294907-test Once the test kernel had been installed, IO wait times with the shrinker enabled dropped to normal levels, and the git server became responsive again. We have been monitoring the performance of the git server and watching IO wait times in grafana over the past week, and everything is looking good, and indicate that these patches solve the issue. [Regression Potential] If a regression were to occur, it would only affect users of bcache who have the shrinker enabled, which I believe is the default setting in bcache deployments. In that case, users could disable the shrinker as a workaround until a fix is available. You can disable the shrinker with: "sudo bash -c "echo 1 > /sys/fs/bcache/<uuid>/internal/btree_shrinker_disabled"" The patches remove the "recently accessed" marker, which changes the behaviour slightly when it comes to deciding what to evict from the cache. Some workloads that aren't under IO pressure may see a difference in what items are evicted from the cache, which may slightly change bcache's performance profile. Since the patches prune the cache from the tail of the cache linked list, we will likely end up pruning more nodes than previously, since it was common to land on the first 3 caches, which were hard coded to be ignored and not pruned, which may also slightly change the bcache's performance profile. BugLink: https://bugs.launchpad.net/bugs/1898786 [Impact] Systems that utilise bcache can experience extremely high IO wait times when under constant IO pressure. The IO wait times seem to stay at a consistent 1 second, and never drop as long as the bcache shrinker is enabled. If you disable the shrinker, then IO wait drops significantly to normal levels. We did some perf analysis, and it seems we spend a huge amount of time in bch_mca_scan(), likely waiting for the mutex "&c->bucket_lock". Looking at the recent commits in Bionic, we found the following commit merged in upstream 5.1-rc1 and backported to 4.15.0-87-generic through upstream stable: commit 9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b Author: Coly Li <colyli@suse.de> Date: Wed Nov 13 16:03:24 2019 +0800 Subject: bcache: at least try to shrink 1 node in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/9fcc34b1a6dd4b8e5337e2b6ef45e428897eca6b It mentions in the description that: > If sc->nr_to_scan is smaller than c->btree_pages, after the above > calculation, variable 'nr' will be 0 and nothing will be shrunk. It is > frequeently observed that only 1 or 2 is set to sc->nr_to_scan and make > nr to be zero. Then bch_mca_scan() will do nothing more then acquiring > and releasing mutex c->bucket_lock. This seems to be what is going on here, but the above commit only addresses when nr is 0. From what I can see, the problems we are experiencing are when nr is 1 or 2, and again, we just waste time in bch_mca_scan() waiting on c->bucket_lock, only to release c->bucket_lock since the shrinker loop never executes since there is no work to do. [Fix] The following commits fix the problem, and all landed in 5.6-rc1: commit 125d98edd11464c8e0ec9eaaba7d682d0f832686 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:40 2020 +0800 Subject: bcache: remove member accessed from struct btree Link: https://github.com/torvalds/linux/commit/125d98edd11464c8e0ec9eaaba7d682d0f832686 commit d5c9c470b01177e4d90cdbf178b8c7f37f5b8795 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:41 2020 +0800 Subject: bcache: reap c->btree_cache_freeable from the tail in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/d5c9c470b01177e4d90cdbf178b8c7f37f5b8795 commit e3de04469a49ee09c89e80bf821508df458ccee6 Author: Coly Li <colyli@suse.de> Date: Fri Jan 24 01:01:42 2020 +0800 Subject: bcache: reap from tail of c->btree_cache in bch_mca_scan() Link: https://github.com/torvalds/linux/commit/e3de04469a49ee09c89e80bf821508df458ccee6 The first commit is a dependency of the other two. The first commit removes a "recently accessed" marker, used to indicate if a particular cache has been used recently, and if it has been, not consider it for cache eviction. The commit mentions that under heavy IO, all caches will end up being recently accessed, and nothing will ever be shrunk. The second commit changes a previous design decision of skipping the first 3 caches to shrink, since it is a common case to call bch_mca_scan() with nr being 1, or 2, just like 0 was common in the very first commit I mentioned. This time, if we land on 1 or 2, the loop exits and nothing happens, and we waste time waiting on locks, just like the very first commit I mentioned. The fix is to try shrink caches from the tail of the list, and not the beginning. The third commit fixes a minor issue where we don't want to re-arrange the linked list c->btree_cache, which is what the second commit ended up doing, and instead, just shrink the cache at the end of the linked list, and not change the order. One minor backport / context adjustment was required in the first commit for Bionic, and the rest are all clean cherry picks to Bionic and Focal. [Testcase] This is kind of hard to test, since the problem shows up in production environments that are under constant IO pressure, with many different items entering and leaving the cache. The Launchpad git server is currently suffering this issue, and has been sitting at a constant IO wait of 1 second / slightly over 1 second which was causing slow response times, which was leading to build failures when git clones ended up timing out. We installed a test kernel, which is available in the following PPA: https://launchpad.net/~mruffell/+archive/ubuntu/sf294907-test Once the test kernel had been installed, IO wait times with the shrinker enabled dropped to normal levels, and the git server became responsive again. We have been monitoring the performance of the git server and watching IO wait times in grafana over the past week, and everything is looking good, and indicate that these patches solve the issue. [Regression Potential] If a regression were to occur, it would only affect users of bcache who have the shrinker enabled, which I believe is the default setting in bcache deployments. In that case, users could disable the shrinker as a workaround until a fix is available. You can disable the shrinker with: "sudo bash -c "echo 1 > /sys/fs/bcache/<uuid>/internal/btree_shrinker_disabled"" The patches remove the "recently accessed" marker, which changes the behaviour slightly when it comes to deciding what to evict from the cache. Some workloads that aren't under IO pressure may see a difference in what items are evicted from the cache, which may slightly change bcache's performance profile. Since the patches prune the cache from the tail of the cache linked list, we will likely end up pruning more nodes than previously, since it was common to land on the first 3 caches, which were hard coded to be ignored and not pruned, which may also slightly change the bcache's performance profile.
2020-10-22 18:24:41 Ian May linux (Ubuntu Focal): status In Progress Fix Committed
2020-10-22 19:06:29 Ian May linux (Ubuntu Bionic): status In Progress Fix Committed
2020-11-17 10:03:19 Ubuntu Kernel Bot tags amd64 apport-bug bionic focal sts amd64 apport-bug bionic focal sts verification-needed-bionic
2020-11-17 10:05:19 Ubuntu Kernel Bot tags amd64 apport-bug bionic focal sts verification-needed-bionic amd64 apport-bug bionic focal sts verification-needed-bionic verification-needed-focal
2020-11-26 02:24:59 Matthew Ruffell tags amd64 apport-bug bionic focal sts verification-needed-bionic verification-needed-focal amd64 apport-bug bionic focal sts verification-done-focal verification-needed-bionic
2020-11-26 03:21:01 Matthew Ruffell tags amd64 apport-bug bionic focal sts verification-done-focal verification-needed-bionic amd64 apport-bug bionic focal sts verification-done-bionic verification-done-focal
2020-11-30 15:46:09 Launchpad Janitor linux (Ubuntu Focal): status Fix Committed Fix Released
2020-11-30 15:46:09 Launchpad Janitor cve linked 2020-14351
2020-11-30 15:46:09 Launchpad Janitor cve linked 2020-4788
2020-12-02 05:53:32 Launchpad Janitor linux (Ubuntu Bionic): status Fix Committed Fix Released