Justify default vm.vfs_cache_pressure: 1 value or increase

Bug #1770171 reported by Bryan Quigley
This bug affects 3 people
Affects: Ceph OSD Charm
Status: Fix Released
Importance: Medium
Assigned to: Unassigned
Milestone: 18.08

Bug Description

We've had multiple reports of Ceph OSDs being swapped out due to excessive slab usage, likely caused by vm.vfs_cache_pressure being set to 1.

The default value of 1 was set in [1], but it wasn't justified at the time. I think it should either be justified or, more likely, increased by some amount.

[1] https://github.com/openstack/charm-ceph-osd/commit/79c6c286498e8235e5d17c17e1a1d63bb0e21259

Revision history for this message
James Page (james-page) wrote :

Well, I guess the rationale is now lost in history. If we want to change this, let's justify why here, and then we can tweak the defaults with a helpful comment.

Revision history for this message
James Page (james-page) wrote :

FWIW, I'd prefer to see OpenStack/Ceph deployments on MAAS without swap enabled... it's a source of unhelpful performance issues, as illustrated by this bug report.

Changed in charm-ceph-osd:
status: New → Incomplete
importance: Undecided → Medium
Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

Setting vfs_cache_pressure to 1 for all cases is likely to cause excessive memory usage in the dentry and inode caches for most workloads. For most uses, the default value of 100 is reasonable.

The vfs_cache_pressure value specifies the percentage of objects in each of the "dentry" and "inode_entry" slab caches used by filesystems that will be viewed as "freeable" by the slab shrinking logic. Some other variables also adjust the actual number of objects that the kernel will try to free, but for the freeable quantity, a vfs_cache_pressure of 100 will attempt to free 100 times as many objects in a cache as a setting of 1. Similarly, a vfs_cache_pressure of 200 will attempt to free twice as many as a setting of 100.
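
To make that proportionality concrete, here is a minimal arithmetic sketch (illustrative only, not the kernel's actual shrinker accounting; the object count is made up):

    def freeable(cached_objects, vfs_cache_pressure):
        # Objects the slab shrinker would treat as candidates for freeing.
        return cached_objects * vfs_cache_pressure // 100

    total = 1_000_000  # e.g. dentries currently cached on a busy OSD host
    for pressure in (1, 100, 200):
        print(f"vfs_cache_pressure={pressure:3d}: "
              f"{freeable(total, pressure):>9,} of {total:,} treated as freeable")
    # 1   ->    10,000  (1% per scan; the caches shrink ~100x more slowly)
    # 100 -> 1,000,000  (the default baseline)
    # 200 -> 2,000,000  (values above 100 over-weight the caches)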

This only comes into play when the kernel has entered reclaim, i.e., it is trying to free cached objects in order to make space to satisfy an allocation that would otherwise fail (or an allocation has already failed or watermarks have been reached and this is occurring asynchronously). By setting vfs_cache_pressure to 1, the kernel will disproportionately reclaim pages from the page cache instead of from the dentry/inode caches, and those will grow with almost no bound (if vfs_cache_pressure is 0, they will literally grow without bound until memory is exhausted).

If the system as a whole has a low cache hit ratio on the objects in the dentry and inode caches, they will simply consume memory that is kept idle, and force out page cache pages (file data, block data and anonymous pages). Eventually, the system will resort to swapping of pages and if all else fails to killing processes to free memory. With very low vfs_cache_pressure values, it is more likely that processes will be killed to free memory before dentry / inode cache objects are released.
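
A quick way to check whether a node has drifted into this state is to compare slab usage against the page cache and swap. A minimal sketch, assuming a standard Linux /proc layout on the OSD host:

    def meminfo():
        # All values in /proc/meminfo are reported in kB.
        info = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                info[key] = int(rest.split()[0])
        return info

    m = meminfo()
    print(f"Slab:       {m['Slab'] // 1024} MiB "
          f"(reclaimable {m['SReclaimable'] // 1024} MiB)")
    print(f"Page cache: {m['Cached'] // 1024} MiB")
    print(f"Swap used:  {(m['SwapTotal'] - m['SwapFree']) // 1024} MiB")
    for knob in ("vfs_cache_pressure", "swappiness"):
        with open(f"/proc/sys/vm/{knob}") as f:
            print(f"vm.{knob} = {f.read().strip()}")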

I'll note here that the charm commit in question also sets vm.swappiness to 1, which is unlikely to be ideal as a default, either. This will heavily favor (ratio 1:199) releasing file backed pages over writing anonymous pages to swap ("swapping" a file backed page just frees the page, as it can be re-read from its backing file). So, this would, e.g., favor keeping almost all process anonymous pages (stack, heap, etc), even for idle processes, in memory over keeping file backed pages in the page cache. This one is not as actively harmful for most cases as vfs_cache_pressure = 1, though. As the rationale for this change is not recorded in the commit log, either, it's unclear what the goal was without actual data.
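
For reference, a rough sketch of where that 1:199 ratio comes from; the kernel's actual get_scan_count() logic also weighs recent reclaim statistics, so this only captures the static bias:

    def reclaim_bias(swappiness):
        anon = swappiness         # willingness to swap out anonymous pages
        file = 200 - swappiness   # willingness to drop file-backed pages
        return anon, file

    for s in (1, 60, 100):
        anon, file = reclaim_bias(s)
        print(f"vm.swappiness={s:3d}: anon:file = {anon}:{file}")
    # 1   -> 1:199   (the charm's old setting; almost never swap anon pages)
    # 60  -> 60:140  (the kernel default)
    # 100 -> 100:100 (treat both equally)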

Changed in charm-ceph-osd:
status: Incomplete → New
Revision history for this message
Bryan Quigley (bryanquigley) wrote :

@james-page
I propose we drop setting either vm.swappiness or vfs_cache_pressure so they go to the system defaults. Thoughts?

Revision history for this message
Jay Vosburgh (jvosburgh) wrote :

Bryan, presumably you mean both and not either.

Revision history for this message
Bryan Quigley (bryanquigley) wrote :

Oops, yup. I meant drop configuring both vm.swappiness and vfs_cache_pressure so they go to the defaults.
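
One operational note: depending on how the charm applies its sysctl settings, dropping them from the config may not reset values already in effect on deployed units, so an operator may need to push the kernel defaults (vfs_cache_pressure=100, swappiness=60) back by hand or reboot. A hypothetical helper, run as root on an OSD node:

    # Hypothetical helper: write the stock kernel defaults back via /proc/sys.
    # Must run as root; adjust the values if a deployment needs something else.
    KERNEL_DEFAULTS = {
        "vm.vfs_cache_pressure": "100",
        "vm.swappiness": "60",
    }

    for key, value in KERNEL_DEFAULTS.items():
        path = "/proc/sys/" + key.replace(".", "/")
        with open(path, "w") as f:
            f.write(value)
        print(f"{key} set to {value}")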

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (master)

Reviewed: https://review.openstack.org/570582
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=3527bf4ae1723a10f49774fef646aaa5b9fc0c45
Submitter: Zuul
Branch: master

commit 3527bf4ae1723a10f49774fef646aaa5b9fc0c45
Author: Bryan Quigley <email address hidden>
Date: Fri May 25 10:07:13 2018 -0400

    Removes vm.swappiness and vfs_cache_pressure

    They were both set at 1 in the same commit without justification
    and both can be bad things to set that low. This commit will
    just let the kernel defaults come through.

    Details on how bad it is to set these to 1, courtesy of Jay Vosburgh.

    vfs_cache_pressure
    Setting vfs_cache_pressure to 1 for all cases is likely to cause
    excessive memory usage in the dentry and inode caches for most
    workloads. For most uses, the default value of 100 is reasonable.

    The vfs_cache_pressure value specifies the percentage of objects
    in each of the "dentry" and "inode_entry" slab caches used by
    filesystems that will be viewed as "freeable" by the slab shrinking
    logic. Some other variables also adjust the actual number of objects
    that the kernel will try to free, but for the freeable quantity,
    a vfs_cache_pressure of 100 will attempt to free 100 times as many
    objects in a cache as a setting of 1. Similarly, a vfs_cache_pressure
    of 200 will attempt to free twice as many as a setting of 100.

    This only comes into play when the kernel has entered reclaim,
    i.e., it is trying to free cached objects in order to make space to
    satisfy an allocation that would otherwise fail (or an allocation
    has already failed or watermarks have been reached and this is
    occurring asynchronously). By setting vfs_cache_pressure to 1,
    the kernel will disproportionately reclaim pages from the page
    cache instead of from the dentry/inode caches, and those will
    grow with almost no bound (if vfs_cache_pressure is 0, they will
    literally grow without bound until memory is exhausted).

    If the system as a whole has a low cache hit ratio on the objects
    in the dentry and inode caches, they will simply consume memory
    that is kept idle, and force out page cache pages (file data,
    block data and anonymous pages). Eventually, the system will resort
    to swapping of pages and if all else fails to killing processes to
    free memory. With very low vfs_cache_pressure values, it is more
    likely that processes will be killed to free memory before
    dentry / inode cache objects are released.

    We have had several customers alleviate problems by setting
    these values back to the defaults - or having to make them
    higher to clean things up after being at 1 for so long.

    vm.swappiness
    Setting this to 1 will heavily favor (ratio 1:199) releasing file
    backed pages over writing anonymous pages to swap ("swapping" a
    file backed page just frees the page, as it can be re-read from
    its backing file). So, this would, e.g., favor keeping almost all
    process anonymous pages (stack, heap, etc), even for idle processes,
    in memory over keeping file backed pages in the page cache. [...]


Changed in charm-ceph-osd:
status: New → Fix Committed
David Ames (thedac)
Changed in charm-ceph-osd:
milestone: none → 18.08
James Page (james-page)
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released