healthcheck_curl() causes massive dentry cache growth

Bug #1805656 reported by Nagasai Vinaykumar Kapalavai
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Nagasai Vinaykumar Kapalavai

Bug Description

Description of problem:
Because NSS_SDB_USE_CACHE=no is not set before curl is called in the container health checks, the dentry cache on controller nodes grows continually. On controllers with large amounts of RAM, this can lead to soft lockups once memory pressure forces the kernel to reclaim the extraneous cache entries.

See RHBZ 1044666 for background.
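
One rough way to watch the growth described above without systemtap is to sample the kernel's dentry counters. The sketch below assumes a Linux host where /proc/sys/fs/dentry-state is readable; its first two fields are nr_dentry and nr_unused. The helper name dentry_counts is hypothetical and is not part of tripleo-common.

```shell
# Print the total and unused dentry counts from a dentry-state file.
# The first two fields of that file are nr_dentry and nr_unused.
dentry_counts() {
  # $1: file to read; defaults to the live kernel counters.
  awk '{ printf "nr_dentry=%s nr_unused=%s\n", $1, $2 }' \
    "${1:-/proc/sys/fs/dentry-state}"
}
```

Running dentry_counts, waiting a minute, and running it again gives a coarse view of how quickly the cache is growing on a controller.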

Version-Release number of selected component (if applicable):
13.0

How reproducible:
Run the following systemtap script on a containerized OSP13 controller:

probe kernel.function("d_alloc").return { log(reverse_path_walk($return)) }

Observe many repeated calls referencing lib/docker/overlay2/<some_id>/diff/etc/pki/nssdb/.<some_number>_dOeSnotExist_.db

Steps to Reproduce:
1. Deploy an OSP13 environment with containerized control plane
2. Run the systemtap script above for 1 minute
3. Observe many calls for nonexistent NSS DB files
4. Run the following to add NSS_SDB_USE_CACHE=no to the healthcheck function:

 docker ps -q | xargs -I {} docker exec -u root {} sed -i '/^healthcheck_curl/a \ \ export NSS_SDB_USE_CACHE=no' /usr/share/openstack-tripleo-common/healthcheck/common.sh

5. Re-run the systemtap script
6. Observe a large reduction (90%+) in the number of d_alloc calls logged over 1 minute
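
The sed command in step 4 appends an export line directly after the line that starts with healthcheck_curl, so the variable is set before curl runs. A minimal sketch of the resulting shape follows. The function name healthcheck_curl_sketch and the curl options are assumptions for illustration only, not the contents of common.sh.

```shell
# Hypothetical stand-in for the patched healthcheck function.
# The curl options are illustrative assumptions, not the project's flags.
healthcheck_curl_sketch() {
  # Set before curl runs so that NSS skips the lookups for
  # nonexistent DB files that show up in the systemtap output.
  export NSS_SDB_USE_CACHE=no
  curl --silent --fail --max-time 10 "$@" || return 1
}
```

Because the export happens inside the function, every health check that goes through it picks up the setting without changes to the individual check scripts.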

Actual results:
[root@ctl01 ~]# stap -o test1.out -T 60 dentry.stap
[root@ctl01 ~]# wc -l test1.out
186526 test1.out
[root@ctl01 ~]# grep dOeSnotExist test1.out | wc -l
158649

Expected results:
[root@ctl01 ~]# stap -o test2.out -T 60 dentry.stap
[root@ctl01 ~]# wc -l test2.out
17544 test2.out
[root@ctl01 ~]# grep dOeSnotExist test2.out | wc -l
0
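
As a quick check of the "90%+" figure, the percentages can be computed from the line counts above. This is a small sketch; the helper percent_of is hypothetical, and the numbers are copied from the two runs in this report.

```shell
# Percentage of $1 relative to $2, to one decimal place.
percent_of() {
  awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

before=186526      # lines in test1.out
after=17544        # lines in test2.out
nss_before=158649  # dOeSnotExist lines in test1.out

echo "NSS lookups as share of test1.out: $(percent_of "$nss_before" "$before")%"
echo "Reduction in logged d_alloc calls: $(percent_of "$((before - after))" "$before")%"
```

With these counts the NSS lookups account for about 85.1% of the first run, and the total drops by about 90.6% after the change.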

Changed in tripleo:
assignee: nobody → Nagasai Vinaykumar Kapalavai (vinaykns2)
Changed in tripleo:
status: New → Triaged
importance: Undecided → High
milestone: none → stein-2
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.openstack.org/620649

Changed in tripleo:
status: Triaged → In Progress
tags: added: queens-backport-potential rocky-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/620649
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=a719804ef29b098fabb91a0b5d98ea09b5f526e9
Submitter: Zuul
Branch: master

commit a719804ef29b098fabb91a0b5d98ea09b5f526e9
Author: Nagasai Vinaykumar Kapalavai <email address hidden>
Date: Wed Nov 28 16:21:33 2018 +0000

    Stops growth of massive dentry cache growth

    Change-Id: I0cb84edcd9e46ab1e9b311298d6ce9d2a2b21766
    Closes-Bug: #1805656

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 10.2.0

This issue was fixed in the openstack/tripleo-common 10.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/636660

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/638051

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/rocky)

Reviewed: https://review.openstack.org/638051
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=1e63f067d1e099fb023f11f5cfdde775efdcbb5d
Submitter: Zuul
Branch: stable/rocky

commit 1e63f067d1e099fb023f11f5cfdde775efdcbb5d
Author: Nagasai Vinaykumar Kapalavai <email address hidden>
Date: Wed Nov 28 16:21:33 2018 +0000

    Stops growth of massive dentry cache growth

    Change-Id: I0cb84edcd9e46ab1e9b311298d6ce9d2a2b21766
    Closes-Bug: #1805656
    (cherry picked from commit a719804ef29b098fabb91a0b5d98ea09b5f526e9)

tags: added: in-stable-rocky
tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/queens)

Reviewed: https://review.openstack.org/636660
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=5c398a07a61c29df61e46bcefd4b195d4417f8ec
Submitter: Zuul
Branch: stable/queens

commit 5c398a07a61c29df61e46bcefd4b195d4417f8ec
Author: Nagasai Vinaykumar Kapalavai <email address hidden>
Date: Wed Nov 28 16:21:33 2018 +0000

    Stops growth of massive dentry cache growth

    Change-Id: I0cb84edcd9e46ab1e9b311298d6ce9d2a2b21766
    Closes-Bug: #1805656
    (cherry picked from commit a719804ef29b098fabb91a0b5d98ea09b5f526e9)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 9.6.0

This issue was fixed in the openstack/tripleo-common 9.6.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 8.7.0

This issue was fixed in the openstack/tripleo-common 8.7.0 release.
