Unable to diagnose where docker filesystem usage is going

Bug #1977750 reported by Jim Gauld
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      Jim Gauld

Bug Description

Brief Description
-----------------
The collect logs do not show where docker file-system usage is going.

The existing collect data includes overall filesystem usage, various crictl commands, and docker images commands, but it is not detailed enough to reveal remnants left over from an initial install. This prevents simple debugging of lab and field issues when usage of /var/lib/docker exceeds the file-system alarm threshold of 80%.

Request: add the following two commands to the collect file containerization_images.info:
docker system df
du -h --max-depth 1 /var/lib/docker
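
A minimal sketch of how the two commands might be appended to the info file, mirroring the delimiter layout seen in the sample output in this report. The function name `run_and_log`, the `LOGFILE` variable, and its /tmp default are assumptions for illustration; they are not the actual collect helpers.

```shell
# Hypothetical helper: not the actual collect implementation.
LOGFILE="${LOGFILE:-/tmp/containerization_images.info}"

# Append a dated delimiter naming the command, then the command's stdout
# and stderr, to the given log file. The middle field of the header line
# is left empty, as in the sample output in this report.
run_and_log() {
    local logfile="$1"
    shift
    local line="--------------------------------------------------------------------"
    {
        echo "${line}"
        echo "$(date) : : $*"
        echo "${line}"
        "$@" 2>&1
        echo
    } >> "${logfile}"
}

# Usage on a host with docker installed (typically requires root):
#   run_and_log "${LOGFILE}" docker system df
#   run_and_log "${LOGFILE}" du -h --max-depth 1 /var/lib/docker
```

Capturing stderr alongside stdout means a failing command (for example, docker not running) still leaves a trace in the file rather than an empty section.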

If an admin manually runs docker operations to pull images or similar, the usage shows up in the historical docker filesystem under /var/lib/docker/overlay2 and /var/lib/docker/x instead of /var/lib/docker/io.containerd.* . This has been shown to consume many GB of storage that is never reclaimed unless an admin manually removes images via "docker rmi", "crictl rmi", "docker system prune --force", "crictl rmi --prune", etc.

e.g.,
--------------------------------------------------------------------
Fri Jun 3 21:13:09 UTC 2022 : : docker system df
--------------------------------------------------------------------
TYPE            TOTAL   ACTIVE  SIZE      RECLAIMABLE
Images          1       0       688.5MB   688.5MB (100%)
Containers      0       0       0B        0B
Local Volumes   0       0       0B        0B
Build Cache     0       0       0B        0B

--------------------------------------------------------------------
Fri Jun 3 21:13:09 UTC 2022 : : du -h --max-depth 1 /var/lib/docker
--------------------------------------------------------------------
0 /var/lib/docker/containers
0 /var/lib/docker/plugins
719M /var/lib/docker/overlay2
2.9M /var/lib/docker/image
24K /var/lib/docker/volumes
0 /var/lib/docker/trust
32K /var/lib/docker/network
0 /var/lib/docker/swarm
16K /var/lib/docker/builder
56K /var/lib/docker/buildkit
0 /var/lib/docker/tmpmounts
1.7G /var/lib/docker/io.containerd.content.v1.content
0 /var/lib/docker/io.containerd.snapshotter.v1.native
5.0G /var/lib/docker/io.containerd.snapshotter.v1.overlayfs
3.9M /var/lib/docker/io.containerd.metadata.v1.bolt
0 /var/lib/docker/io.containerd.runtime.v1.linux
12K /var/lib/docker/io.containerd.runtime.v2.task
664K /var/lib/docker/io.containerd.grpc.v1.cri
0 /var/lib/docker/tmp
0 /var/lib/docker/runtimes
7.4G /var/lib/docker
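
The first-level breakdown above can be split into the CRI directories under io.containerd.* versus everything else, which makes docker-side remnants easier to spot. A rough sketch, assuming du is run with -k rather than -h so that sizes are plain KiB counts awk can sum; the function name `summarize_docker_du` is hypothetical.

```shell
# Hypothetical helper: reads `du -k --max-depth 1 /var/lib/docker` output on
# stdin and prints KiB totals for containerd (CRI) vs. all other directories.
summarize_docker_du() {
    awk '
        # The line for the top directory itself is the grand total.
        $2 == "/var/lib/docker"                      { total = $1; next }
        # CRI content lives under io.containerd.* directories.
        $2 ~ /^\/var\/lib\/docker\/io\.containerd\./ { cri += $1; next }
        # Everything else is docker-managed (overlay2, image, volumes, ...).
                                                     { other += $1 }
        END {
            printf "containerd_kib=%d\n", cri
            printf "other_kib=%d\n", other
            printf "total_kib=%d\n", total
        }
    '
}

# Usage on a live host (typically requires root):
#   du -k --max-depth 1 /var/lib/docker | summarize_docker_du
```

A large `other_kib` value relative to `containerd_kib` would point at the kind of manual docker usage described in this report.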

Severity
--------
Major: Field issues involving docker-related file-system alarms cannot be debugged.

Steps to Reproduce
------------------
Gather a 'collect'.

Expected Behavior
------------------
The collect shows detailed first-level usage under /var/lib/docker.

Actual Behavior
----------------
The collect contains only overall filesystem usage and image listings; there is no first-level breakdown of /var/lib/docker.

Reproducibility
---------------
100 percent reproducible.
Docker remnants are always present because the initial install of n3000-opae is done via docker.
Occasionally admins install other images as well.

System Configuration
--------------------
AIO-DX. Applicable to all configs.

Branch/Pull Time/Commit
-----------------------
BUILD_DATE="2022-06-01 13:08:06 -0400"

Last Pass
---------
No. Day one issue.

Timestamp/Logs
--------------
Collect /var/extra/containerization_images.info has:
docker image ls -a
crictl images
ctr -n k8s.io images list

Collect /var/extra/filesystem.info has:
df -h -H -T --local -t ext2 -t ext3 -t ext4 -t xfs --total
Filesystem                       Type  Size  Used  Avail  Use%  Mounted on
/dev/mapper/cgts--vg-docker--lv  xfs   33G   8.0G  25G    25%   /var/lib/docker
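
For context, the Use% column above is the figure that would be compared against the 80% alarm threshold mentioned in this report. A small sketch of such a check, assuming GNU df with --output=pcent is available; the function name `fs_over_threshold` and its default threshold argument are illustrative only and do not represent the platform's alarm logic.

```shell
# Hypothetical helper: succeed (exit 0) when a usage percentage such as "25%"
# is strictly above the given threshold (default 80).
fs_over_threshold() {
    local pcent="${1%\%}"        # strip a trailing % sign if present
    local threshold="${2:-80}"
    [ "${pcent}" -gt "${threshold}" ]
}

# Usage on a live host, assuming GNU coreutils df:
#   used="$(df --output=pcent /var/lib/docker | tail -n 1 | tr -d ' ')"
#   if fs_over_threshold "${used}"; then
#       echo "/var/lib/docker is at ${used}"
#   fi
```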

Test Activity
-------------
Feature Testing, Evaluation.

Jim Gauld (jgauld)
Changed in starlingx:
assignee: nobody → Jim Gauld (jgauld)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to utilities (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/utilities/+/844844

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to utilities (master)

Reviewed: https://review.opendev.org/c/starlingx/utilities/+/844844
Committed: https://opendev.org/starlingx/utilities/commit/617d58053b30ce74696bca33bcbea93c5b3a5f36
Submitter: "Zuul (22348)"
Branch: master

commit 617d58053b30ce74696bca33bcbea93c5b3a5f36
Author: Jim Gauld <email address hidden>
Date: Fri Jun 3 14:40:50 2022 -0400

    Add docker disk usage to collect containerization

    This adds 'docker system df' and 'du -h --max-depth 1 /var/lib/docker'
    commands to collect containerization_images.info.

    This gives high level of usage and first-level breakdown. This is
    useful in diagnosing file-system usage of /var/lib/docker if there
    are remnants from initial install, or if an admin does manual docker
    operations which consumes many extraneous GB.

    We expect CRI stuff under: /var/lib/docker/io.containerd.* .

    The containerization_images.info has the following new sections:

    --------------------------------------------------------------------
    Fri Jun 3 21:13:09 UTC 2022 : : docker system df
    --------------------------------------------------------------------
    TYPE TOTAL ACTIVE SIZE RECLAIMABLE
    Images 1 0 688.5MB 688.5MB (100%)
    Containers 0 0 0B 0B
    Local Volumes 0 0 0B 0B
    Build Cache 0 0 0B 0B

    --------------------------------------------------------------------
    Fri Jun 3 21:13:09 UTC 2022 : : du -h --max-depth 1 /var/lib/docker
    --------------------------------------------------------------------
    0 /var/lib/docker/containers
    0 /var/lib/docker/plugins
    719M /var/lib/docker/overlay2
    2.9M /var/lib/docker/image
    24K /var/lib/docker/volumes
    0 /var/lib/docker/trust
    32K /var/lib/docker/network
    0 /var/lib/docker/swarm
    16K /var/lib/docker/builder
    56K /var/lib/docker/buildkit
    0 /var/lib/docker/tmpmounts
    1.7G /var/lib/docker/io.containerd.content.v1.content
    0 /var/lib/docker/io.containerd.snapshotter.v1.native
    5.0G /var/lib/docker/io.containerd.snapshotter.v1.overlayfs
    3.9M /var/lib/docker/io.containerd.metadata.v1.bolt
    0 /var/lib/docker/io.containerd.runtime.v1.linux
    12K /var/lib/docker/io.containerd.runtime.v2.task
    664K /var/lib/docker/io.containerd.grpc.v1.cri
    0 /var/lib/docker/tmp
    0 /var/lib/docker/runtimes
    7.4G /var/lib/docker

    It is likely that an admin requires manual cleanup using commands like:
    docker rmi x
    docker system prune --force

    An admin may also cleanup the CRI but that is already governed by kubelet
    image garbage collection file-system usage thresholds.
    crictl rmi x
    crictl rmi --prune

    TESTING:
    AIO-SX: Gather collect and inspect containerization_images.info

    Closes-Bug: 1977750

    Signed-off-by: Jim Gauld <email address hidden>
    Change-Id: I468d5ebd18ad72385d74be7bee614fbc2cbb1e99

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.7.0 stx.tools
Changed in starlingx:
importance: Undecided → Medium