Activity log for bug #1929612

Date Who What changed Old value New value Message
2021-05-25 20:41:11 Abram Wiebe bug added bug
2021-05-25 20:42:37 Abram Wiebe description

Old value: identical to the new value below, except that ==Case Study== opened with: "I was trying to figure out why Ubuntu was killing my program, even though system real memory pressure is high."

New value:

==Case Study==
I was trying to figure out why Ubuntu was killing my program, even though system real memory pressure is not that high, though virtual memory seemed abnormally high compared to other platforms (which, as it turns out, simply do opportunistic page reclaim). It turned out to be a quirk of the lazy allocator: if there is not enough real memory, the kernel is trigger-happy with the OOM killer instead of first trying to solve the problem by reclaiming memory. I understand that the memory pressure watermark exists so that heavy scans aren't running all the time, but if the system encounters a potential OOM, there should be a policy setting that tries a heavy scan before going straight to OOM killing.

==Summary==
The kernel leaves memory mapped into a program's address space as part of lazy allocation, and only frees pages when a request both pushes total free memory (real + swap) below `vm.min_free_kbytes` and does not exceed the free memory available. Because of this, any long-lived process accumulates a huge pool of over-committed virtual memory. If the amount of available memory is above the watermark, and any program in the system then makes an allocation request that exceeds total free and available memory, the OOM killer is launched to kill programs.

==The ideal case==
Introduce a `vm.overcommit_memory` policy that attempts to reclaim and relocate memory before treating the system as OOM. If reclaim/relocation takes longer than a timeout, or if after compacting there is still not enough memory (or a quick preflight sum shows there could not feasibly be), then treat the system as OOM, instead of immediately killing the most memory-hungry program.

==The workaround==
Setting
```
sysctl -w vm.overcommit_memory=1   # always grant memory
sysctl -w vm.min_free_kbytes=$A_LARGE_NUMBER_SAY_A_GIG
```
allows the system to recover by sidestepping the issue, but the Ubuntu default for `vm.min_free_kbytes` is too low when a memory-hungry program and something that wants periodic large allocations are running at the same time. In particular, I was running a background scientific job and tried to watch a YouTube video in Firefox. Either the database server wanted a large allocation for a transaction or Firefox wanted a large allocation for a new window or video buffer, but one of those allocations was too large and prompted the OOM killer to fire, even though by all accounts the amount of real memory in use was small, and neither task was actually using anywhere close to all of the memory mapped to it, because it had freed that memory and had simply been running for a while.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.8.0-53-generic 5.8.0-53.60~20.04.1
ProcVersionSignature: Ubuntu 5.8.0-53.60~20.04.1-generic 5.8.18
Uname: Linux 5.8.0-53-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.11-0ubuntu27.17
Architecture: amd64
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Tue May 25 13:12:15 2021
InstallationDate: Installed on 2021-03-12 (74 days ago)
InstallationMedia: Ubuntu 20.04.2.0 LTS "Focal Fossa" - Release amd64 (20210209.1)
SourcePackage: linux-signed-hwe-5.8
UpgradeStatus: No upgrade log present (probably fresh install)
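The lazy-allocation behaviour described in ==Summary== is easy to observe directly. The following is a minimal C sketch (illustrative only, not part of the original report; it assumes 64-bit Linux with /proc mounted) that asks for 4 GiB of address space but writes to only 64 MiB of it. VmSize jumps by the full request while VmRSS grows only by the pages actually touched, which is exactly the over-committed virtual memory pool the report describes.

```
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Print the virtual size and resident size of this process. */
static void show(const char *label)
{
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");
    if (!f)
        return;
    printf("-- %s --\n", label);
    while (fgets(line, sizeof line, f))
        if (!strncmp(line, "VmSize:", 7) || !strncmp(line, "VmRSS:", 6))
            fputs(line, stdout);
    fclose(f);
}

int main(void)
{
    size_t total   = 4UL << 30;   /* reserve 4 GiB of address space     */
    size_t touched = 64UL << 20;  /* but write to only the first 64 MiB */

    show("before allocation");

    char *p = malloc(total);
    if (!p) {
        perror("malloc");
        return 1;
    }
    show("after malloc, nothing touched");  /* VmSize up ~4 GiB, VmRSS barely moves */

    memset(p, 1, touched);                  /* fault in 64 MiB of real pages        */
    show("after touching 64 MiB");          /* VmRSS now up by ~64 MiB              */

    free(p);
    return 0;
}
```

Built with a plain `gcc` invocation, the allocation is granted immediately under the default overcommit policy; physical pages are only committed for the slice that is actually written.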
2021-05-26 13:06:55 Terry Rudd bug added subscriber Terry Rudd
2022-12-14 19:00:54 Abram Wiebe description

Old value: unchanged since the 2021-05-25 edit above.

New value: the same description, with the following appended:

-----------------------------
Update 2022-12-14
The bug seems to have been caused by a second OOM-killer daemon supplied by systemd (systemd-oomd), which kills long-running programs too aggressively. Disabling it avoids the problem:
```
sudo systemctl disable systemd-oomd.service
sudo systemctl mask systemd-oomd
```
https://askubuntu.com/questions/1404888/how-do-i-disable-the-systemd-oom-process-killer-in-ubuntu-22-04
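For anyone chasing the updated diagnosis: systemd-oomd makes its kill decisions from the kernel's pressure-stall information (PSI) and swap usage rather than from free-memory counters, which is why it can fire while plenty of RAM still looks free. A minimal C sketch (illustrative only; assumes a kernel with PSI enabled and /proc mounted) that prints the memory-pressure figures systemd-oomd watches:

```
#include <stdio.h>

int main(void)
{
    char line[256];
    FILE *f = fopen("/proc/pressure/memory", "r");

    if (!f) {
        /* PSI not enabled on this kernel, or /proc not mounted */
        perror("/proc/pressure/memory");
        return 1;
    }
    /* Two lines are reported: "some" (at least one task stalled on memory)
     * and "full" (all non-idle tasks stalled). systemd-oomd acts on these
     * pressure averages, not on free-memory watermarks. */
    while (fgets(line, sizeof line, f))
        fputs(line, stdout);
    fclose(f);
    return 0;
}
```

When a process disappears unexpectedly, `journalctl -u systemd-oomd` shows whether systemd-oomd, rather than the kernel OOM killer, terminated it.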