apparent memory usage regression - not getting freed?

Bug #1844962 reported by Joe Barnett
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

On eoan with a 5.3 kernel, I have noticed several incidents of the system becoming sluggish or unresponsive, apparently caused by low available memory. top reports 13 of 16G as "used", but summing the per-process memory that top lists (sorted by memory usage) gives something closer to 3-4G "used".

I am attaching two top screenshots: one shows 3.3G used with the system behaving well, the other shows 14G used with the system lagging. In both cases approximately the same programs are running and taking up approximately the same amount of resident memory, so I am not sure where the extra memory usage is coming from.
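
The comparison described above (system-wide "used" memory versus the sum of per-process resident memory) can be approximated from /proc. The following is a minimal Python sketch, not part of the original report. The function names are hypothetical, and it approximates "used" as MemTotal minus MemAvailable, which may not match the figure top prints. Summing RSS counts shared pages more than once, so the gap it reports is, if anything, an underestimate.

```python
import glob
import re


def parse_meminfo(text):
    """Return {field: KiB} parsed from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"(\S+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def read_rss_kib():
    """Collect VmRSS (KiB) for every process visible under /proc."""
    values = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as handle:
                for line in handle:
                    if line.startswith("VmRSS:"):
                        values.append(int(line.split()[1]))
                        break
        except OSError:
            # The process exited between listing and reading; skip it.
            continue
    return values


def unaccounted_kib(meminfo, rss_kib_per_process):
    """System-wide used memory minus the sum of per-process RSS, in KiB."""
    used = meminfo["MemTotal"] - meminfo["MemAvailable"]
    return used - sum(rss_kib_per_process)
```

On a live system this could be called as `unaccounted_kib(parse_meminfo(open("/proc/meminfo").read()), read_rss_kib())`. A gap of many gigabytes would be consistent with memory held outside ordinary process accounting, for example by the kernel or a driver.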

ProblemType: Bug
DistroRelease: Ubuntu 19.10
Package: linux-image-5.3.0-10-generic 5.3.0-10.11
ProcVersionSignature: Ubuntu 5.3.0-10.11-generic 5.3.0-rc8
Uname: Linux 5.3.0-10-generic x86_64
ApportVersion: 2.20.11-0ubuntu7
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: jbarnett 5046 F.... pulseaudio
CurrentDesktop: GNOME
Date: Sun Sep 22 21:26:02 2019
InstallationDate: Installed on 2019-08-17 (37 days ago)
InstallationMedia: Ubuntu 19.04 "Disco Dingo" - Alpha amd64 (20190305.1)
Lsusb:
 Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 001 Device 003: ID 0489:e0a2 Foxconn / Hon Hai
 Bus 001 Device 004: ID 27c6:5395 HTMicroelectronics Goodix Fingerprint Device
 Bus 001 Device 002: ID 0bda:58f4 Realtek Semiconductor Corp. Integrated_Webcam_HD
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. XPS 15 9575
ProcFB: 0 i915drmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.3.0-10-generic root=UUID=83cade14-7628-437d-8517-36ad82f00d20 ro quiet splash usbcore.dyndbg=+p vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-5.3.0-10-generic N/A
 linux-backports-modules-5.3.0-10-generic N/A
 linux-firmware 1.182
SourcePackage: linux
UpgradeStatus: Upgraded to eoan on 2019-09-18 (4 days ago)
dmi.bios.date: 10/10/2018
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.2.0
dmi.board.name: 0N338G
dmi.board.vendor: Dell Inc.
dmi.board.version: A00
dmi.chassis.type: 10
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.2.0:bd10/10/2018:svnDellInc.:pnXPS159575:pvr:rvnDellInc.:rn0N338G:rvrA00:cvnDellInc.:ct10:cvr:
dmi.product.family: XPS
dmi.product.name: XPS 15 9575
dmi.product.sku: 080D
dmi.sys.vendor: Dell Inc.

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joe Barnett (thejoe) wrote :

This seems to be triggered by the Dolphin emulator. It does not happen under the disco 5.0.0-25 kernel on an otherwise eoan system.

Kai-Heng Feng (kaihengfeng) wrote :

Can you please perform a kernel bisection?

Joe Barnett (thejoe) wrote :

899fbde1464639e3d12eaffdad8481a59b367fcb is the first bad commit
commit 899fbde1464639e3d12eaffdad8481a59b367fcb
Author: Philip Yang <email address hidden>
Date: Thu Dec 13 15:35:28 2018 -0500

    drm/amdgpu: replace get_user_pages with HMM mirror helpers

    Use HMM helper function hmm_vma_fault() to get physical pages backing
    userptr and start CPU page table update track of those pages. Then use
    hmm_vma_range_done() to check if those pages are updated before
    amdgpu_cs_submit for gfx or before user queues are resumed for kfd.

    If userptr pages are updated, for gfx, amdgpu_cs_ioctl will restart
    from scratch, for kfd, restore worker is rescheduled to retry.

    HMM simplify the CPU page table concurrent update check, so remove
    guptasklock, mmu_invalidations, last_set_pages fields from
    amdgpu_ttm_tt struct.

    HMM does not pin the page (increase page ref count), so remove related
    operations like release_pages(), put_page(), mark_page_dirty().

    Signed-off-by: Philip Yang <email address hidden>
    Reviewed-by: Felix Kuehling <email address hidden>
    Reviewed-by: Christian König <email address hidden>
    Signed-off-by: Alex Deucher <email address hidden>

:040000 040000 0c9f0e2e82e5e4d2d3a4c0daea22eb911244b771 fdcdc7c80f5383486962edf4561e205b55bd8c21 M drivers

$ git bisect log
# bad: [f74c2bb98776e2de508f4d607cd519873065118e] Linux 5.3-rc8
# good: [1c163f4c7b3f621efff9b28a47abb36f7378d783] Linux 5.0
git bisect start 'v5.3-rc8' 'v5.0'
# good: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm
git bisect good a2d635decbfa9c1e4ae15cb05b68b2559f7f827c
# good: [a2d635decbfa9c1e4ae15cb05b68b2559f7f827c] Merge tag 'drm-next-2019-05-09' of git://anongit.freedesktop.org/drm/drm
git bisect good a2d635decbfa9c1e4ae15cb05b68b2559f7f827c
# good: [8f6ccf6159aed1f04c6d179f61f6fb2691261e84] Merge tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
git bisect good 8f6ccf6159aed1f04c6d179f61f6fb2691261e84
# good: [8f6ccf6159aed1f04c6d179f61f6fb2691261e84] Merge tag 'clone3-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
git bisect good 8f6ccf6159aed1f04c6d179f61f6fb2691261e84
# bad: [be8454afc50f43016ca8b6130d9673bdd0bd56ec] Merge tag 'drm-next-2019-07-16' of git://anongit.freedesktop.org/drm/drm
git bisect bad be8454afc50f43016ca8b6130d9673bdd0bd56ec
# bad: [be8454afc50f43016ca8b6130d9673bdd0bd56ec] Merge tag 'drm-next-2019-07-16' of git://anongit.freedesktop.org/drm/drm
git bisect bad be8454afc50f43016ca8b6130d9673bdd0bd56ec
# good: [d72619706abc4aa7e540ea882dae883cee7cc3b3] Merge tag 'tty-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
git bisect good d72619706abc4aa7e540ea882dae883cee7cc3b3
# bad: [83145f110eb2ada9d54fcbcf416c02de126381c1] drm/amdgpu: don't invalidate caches in RELEASE_MEM, only do the writeback
git bisect bad 83145f110eb2ada9d54fcbcf416c02de126381c1
# bad: [b239c01727459ba08c44b79e6225d3c58723f282] drm/amdgpu: add mcbp driver parameter
git bisect bad b239c01727459ba08c44b79e6225d3c58723f282
# good: [e1dc68a4b149d47536cd001d0d0abad...
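
A log like the one above can be condensed into its sequence of verdicts for easier reading. The following is a minimal Python sketch under stated assumptions: the function name `parse_bisect_log` is hypothetical, it only looks at `git bisect good <sha>` and `git bisect bad <sha>` lines, and it drops an entry that repeats the one immediately before it.

```python
def parse_bisect_log(text):
    """Return the ordered (verdict, commit) steps from `git bisect log` text.

    Comment lines and the `git bisect start` line are ignored, and an
    entry identical to the previous one is recorded only once.
    """
    steps = []
    for line in text.splitlines():
        parts = line.split()
        is_verdict = (
            len(parts) == 4
            and parts[:2] == ["git", "bisect"]
            and parts[2] in ("good", "bad")
        )
        if not is_verdict:
            continue
        step = (parts[2], parts[3])
        if not steps or steps[-1] != step:
            steps.append(step)
    return steps
```

The last `bad` entry in the returned list is the newest commit known to show the problem at that point in the bisection.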


Joe Barnett (thejoe) wrote :

(Also, this is still an issue with 5.3.0-13.)

Kai-Heng Feng (kaihengfeng) wrote :

Since your email address isn't disclosed, please raise the issue on the amdgpu mailing list, <email address hidden>.

Joe Barnett (thejoe)
tags: added: patch-accepted-upstream
Joe Barnett (thejoe) wrote :

Confirmed fixed in 5.3.0-24.26.

Changed in linux (Ubuntu):
status: Confirmed → Fix Committed
Joe Barnett (thejoe)
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released