Zun

Stats report differences.

Bug #1989792 reported by tomas cribb
This bug affects 2 people
Affects: Zun
Status: Fix Released
Importance: Undecided
Assigned to: Durga Malleswari Varanasi

Bug Description

There is a difference in the memory value between "openstack appcontainer stats <id>" and "docker stats <id>". On every host I have tried, Zun reports higher memory usage than Docker.

openstack appcontainer stats f4574038-07b3-472d-9c1a-b05043bb5769
+----------------+----------------------+
| Field | Value |
+----------------+----------------------+
| CONTAINER | truecommand01 |
| CPU % | 0.014663233183911304 |
| MEM USAGE(MiB) | 250.3046875 |
| MEM LIMIT(MiB) | 512.0 |
| MEM % | 48.88763427734375 |
| BLOCK I/O(B) | 125993472/8680960 |
| NET I/O(B) | 844127705/27613922 |
+----------------+----------------------+

CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
6c077faea1f0 zun-f4574038-07b3-472d-9c1a-b05043bb5769 1.71% 130.5MiB / 512MiB 25.48% 849MB / 27.8MB 126MB / 8.74MB 216

In this case, Zun reports 250 MiB (~48%) while Docker reports 130.5 MiB (~25%).

Changed in zun:
assignee: nobody → Durga Malleswari Varanasi (durga1)
Revision history for this message
Durga Malleswari Varanasi (durga1) wrote :

Hi Tomas,

I have gone through the bug and was able to reproduce it.
Note, though, that the difference in the stats is smaller here than in your output.
Please find my output below:
openstack@zed:~$ docker stats 01547ac35f1f
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
01547ac35f1f zun-5af4d461-b216-4615-8da3-b5a5750b25a0 0.07% 1.188MiB / 15.63GiB 0.01% 299MB / 270MB 1.25MB / 65.5kB 4
^C
openstack@zed:~$ openstack appcontainer stats container
+----------------+----------------------+
| Field | Value |
+----------------+----------------------+
| CONTAINER | container |
| CPU % | 0.009308632892915485 |
| MEM USAGE(MiB) | 1.37890625 |
| MEM LIMIT(MiB) | 16000.5234375 |
| MEM % | 0.008617882129832668 |
| BLOCK I/O(B) | 0/0 |
| NET I/O(B) | 298766635/269694133 |
+----------------+----------------------+

Could you mention the versions of OpenStack and Zun you are using?

Regards,
Malleswari

Revision history for this message
tomas cribb (tomascribb) wrote : Re: [Bug 1989792] Re: Stats report diferencies.

Hi Durga,
Thank you for your reply. On the controller I'm running Zun 3.6.1, and my
OpenStack version is Stein.
On a compute node I'm running Ubuntu 20.04; I installed the components manually
from git.
If you need any other info, just let me know.
Also, if I can help fix this, I would be happy to.

Best regards.

On Mon, Nov 21, 2022 at 9:40, Durga Malleswari Varanasi (<email address hidden>) wrote:


Revision history for this message
tomas cribb (tomascribb) wrote :

Hi Durga, I found the problem.
The difference is in how the Docker CLI calculates memory usage: it subtracts the cache. This is documented here: https://docs.docker.com/engine/reference/commandline/stats/#:~:text=On%20Linux%2C%20the%20Docker%20CLI,use%20the%20data%20as%20needed.
The Docker API, on the other hand, reports total memory (usage + cache) and a separate field (total_inactive_file) carrying the cache value.
This is the API output:
"memory_stats": {
    "usage": 359535108096,
    "max_usage": 429496983552,
    "stats": {
      "active_anon": 139756539904,
      "active_file": 77813866496,
      "cache": 343939633152,
      "dirty": 70516736,
      "hierarchical_memory_limit": 549755813888,
      "hierarchical_memsw_limit": 644245094400,
      "inactive_anon": 8037867520,
      "inactive_file": 126423769088,
      "mapped_file": 96543813632,
      "pgfault": 340022432683,
      "pgmajfault": 1052181,
      "pgpgin": 89765729104,
      "pgpgout": 89679783198,
      "rss": 8026935296,
      "rss_huge": 0,
      "total_active_anon": 139756539904,
      "total_active_file": 77813866496,
      "total_cache": 343939616768,
      "total_dirty": 70504448,
      "total_inactive_anon": 8037560320,
      "total_inactive_file": 126423752704,
      "total_mapped_file": 96543813632,
      "total_pgfault": 340022432078,
      "total_pgmajfault": 1052181,
      "total_pgpgin": 89765729024,
      "total_pgpgout": 89679783198,
      "total_rss": 8026632192,
      "total_rss_huge": 0,
      "total_unevictable": 0,
      "total_writeback": 0,
      "unevictable": 0,
      "writeback": 8192
    },
    "failcnt": 112842674,
    "limit": 549755813888

Perhaps the memory calculation in driver.py -> def stats(self, context, container) could be changed to:

mem_usage = (res['memory_stats']['usage'] - res['memory_stats']['stats']['total_inactive_file']) / 1024 / 1024
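Plugging the numbers from the API output above into the proposed subtraction gives a quick sanity check (an illustration only, not the actual driver code):

```python
# Values copied from the memory_stats payload shown above.
usage = 359535108096                # total memory reported by the API (bytes)
total_inactive_file = 126423752704  # cache portion the Docker CLI subtracts

# Cache-adjusted usage in MiB, the way the docker CLI computes it.
mem_usage_mib = (usage - total_inactive_file) / 1024 / 1024
print(mem_usage_mib)  # 222312.3125 MiB (~217 GiB) against a 512 GiB limit
```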

Regards,
Tomas

Revision history for this message
Durga Malleswari Varanasi (durga1) wrote :

Hi Tomas,

Thanks for the analysis.
Yes, I have gone through the documentation. However, when I call the Docker stats REST API (GET /v1.21/containers/<container_name>/stats), I see this response:
    "memory_stats": {
        "usage": 1007616,
        "stats": {
            "active_anon": 4096,
            "active_file": 8192,
            "anon": 86016,
            "anon_thp": 0,
            "file": 651264,
            "file_dirty": 0,
            "file_mapped": 466944,
            "file_writeback": 0,
            "inactive_anon": 81920,
            "inactive_file": 643072,
            "kernel_stack": 16384,
            "pgactivate": 2,
            "pgdeactivate": 0,
            "pgfault": 1183,
            "pglazyfree": 0,
            "pglazyfreed": 0,
            "pgmajfault": 9,
            "pgrefill": 0,
            "pgscan": 0,
            "pgsteal": 0,
            "shmem": 0,
            "slab": 189848,
            "slab_reclaimable": 102576,
            "slab_unreclaimable": 87272,
            "sock": 0,
            "thp_collapse_alloc": 0,
            "thp_fault_alloc": 0,
            "unevictable": 0,
            "workingset_activate": 0,
            "workingset_nodereclaim": 0,
            "workingset_refault": 0
        },
        "limit": 16777801728

I would like to add that we are not getting the "total_inactive_file" field.
There is another field, "inactive_file", which reduces the delta in the stats, though the values still do not match exactly.
Could you specify the API call you used to fetch the output you mentioned?

Regards,
Malleswari

Revision history for this message
tomas cribb (tomascribb) wrote :

Hi Durga,
Thank you for your reply.

I'm using "curl -s --unix-socket /var/run/docker.sock 'http://localhost/containers/c42e2d65acea/stats?stream=false&one-shot=true' | jq .memory_stats"

My host is Ubuntu 20.04 and this is my Docker version:

Client:
 Version: 20.10.7
 API version: 1.41
 Go version: go1.13.8
 Git commit: 20.10.7-0ubuntu1~20.04.2
 Built: Fri Oct 1 14:07:06 2021
 OS/Arch: linux/amd64
 Context: default
 Experimental: true

Server:
 Engine:
  Version: 20.10.7
  API version: 1.41 (minimum version 1.12)
  Go version: go1.13.8
  Git commit: 20.10.7-0ubuntu1~20.04.2
  Built: Fri Oct 1 03:27:17 2021
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.5.2-0ubuntu1~20.04.3
  GitCommit:
 runc:
  Version: 1.0.0~rc95-0ubuntu1~20.04.2
  GitCommit:
 docker-init:
  Version: 0.19.0
  GitCommit:

Regards,
Tomás.

Revision history for this message
Durga Malleswari Varanasi (durga1) wrote :

Hi Tomas,

My Docker host config is as below:
openstack@zed:~/devstack$ docker version
Client: Docker Engine - Community
 Version: 20.10.21
 API version: 1.41
 Go version: go1.18.7
 Git commit: baeda1f
 Built: Tue Oct 25 18:01:58 2022
 OS/Arch: linux/amd64
 Context: default
 Experimental: true

Server: Docker Engine - Community
 Engine:
  Version: 20.10.21
  API version: 1.41 (minimum version 1.12)
  Go version: go1.18.7
  Git commit: 3056208
  Built: Tue Oct 25 17:59:49 2022
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.6.9
  GitCommit: 1c90a442489720eec95342e1789ee8a5e1b9536f
 runc:
  Version: 1.1.4
  GitCommit: v1.1.4-0-g5fd4c4d
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0

and the output for the REST call is as below:
openstack@zed:~/devstack$ curl -s --unix-socket /var/run/docker.sock 'http://localhost/containers/01547ac35f1f/stats?stream=false&one-shot=true' | jq .memory_stats
{
  "usage": 1007616,
  "stats": {
    "active_anon": 4096,
    "active_file": 8192,
    "anon": 86016,
    "anon_thp": 0,
    "file": 651264,
    "file_dirty": 0,
    "file_mapped": 466944,
    "file_writeback": 0,
    "inactive_anon": 81920,
    "inactive_file": 643072,
    "kernel_stack": 16384,
    "pgactivate": 2,
    "pgdeactivate": 0,
    "pgfault": 1183,
    "pglazyfree": 0,
    "pglazyfreed": 0,
    "pgmajfault": 9,
    "pgrefill": 0,
    "pgscan": 0,
    "pgsteal": 0,
    "shmem": 0,
    "slab": 189848,
    "slab_reclaimable": 102576,
    "slab_unreclaimable": 87272,
    "sock": 0,
    "thp_collapse_alloc": 0,
    "thp_fault_alloc": 0,
    "unevictable": 0,
    "workingset_activate": 0,
    "workingset_nodereclaim": 0,
    "workingset_refault": 0
  },
  "limit": 16777801728
}

Would it be okay if we go ahead with the inactive_file param?

Kindly comment.

Best Regards,
malleswari

Changed in zun:
status: New → In Progress
Revision history for this message
tomas cribb (tomascribb) wrote :

Hi Durga,
I think the difference comes from the cgroup version we are using. I'm on cgroup v1, which I believe is the default because I installed Docker following the Docker documentation (https://docs.docker.com/engine/install/ubuntu/).
From my point of view, the best solution would be to detect which cgroup version the host is running,
something like the official docker repo does (https://github.com/docker/cli/blob/20.10/cli/command/container/stats_helpers.go#L239):

func calculateMemUsageUnixNoCache(mem types.MemoryStats) float64 {
	// cgroup v1
	if v, isCgroup1 := mem.Stats["total_inactive_file"]; isCgroup1 && v < mem.Usage {
		return float64(mem.Usage - v)
	}
	// cgroup v2
	if v := mem.Stats["inactive_file"]; v < mem.Usage {
		return float64(mem.Usage - v)
	}
	return float64(mem.Usage)
}
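For Zun, the same fallback chain could be expressed in Python along these lines (a sketch operating on the dict returned by docker.stats(), not a drop-in patch; the function name is made up):

```python
def calc_mem_usage_no_cache(memory_stats):
    """Memory usage minus page cache, mirroring the Docker CLI logic.

    memory_stats is the 'memory_stats' dict from the Docker stats API.
    Returns the adjusted usage in bytes.
    """
    usage = memory_stats['usage']
    stats = memory_stats.get('stats', {})
    # cgroup v1 hosts expose 'total_inactive_file'
    v = stats.get('total_inactive_file')
    if v is not None and v < usage:
        return float(usage - v)
    # cgroup v2 hosts expose 'inactive_file'
    v = stats.get('inactive_file', 0)
    if v < usage:
        return float(usage - v)
    # Neither field usable: fall back to the raw usage value.
    return float(usage)
```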

What do you think?

Best regards.
Tomás.

Revision history for this message
Durga Malleswari Varanasi (durga1) wrote :

Hi Tomas,

Thanks for the reply.
I have also followed the same installation guide on one of my other hosts, and I see that cgroup v2 is enabled there as well; I am not sure where this configuration is picked up from.
I would rather check which param exists and move forward based on that.

Something like this:
check whichever of "total_inactive_file" or "inactive_file" is present and perform the corresponding calculation.

Kindly let me know your thoughts.

Regards,
Malleswari

Revision history for this message
tomas cribb (tomascribb) wrote :

Hi Durga,
I tested the code in zun/container/docker/driver.py in my environment and saw that the call
"res = docker.stats(container.container_id, decode=False, stream=False)"
returns a dict with both elements:
res['memory_stats']['stats']['inactive_file']
res['memory_stats']['stats']['total_inactive_file']
with the same value.
I don't know if we get the same behavior with cgroup v2, but a general solution could be something like this:

*******************
def stats(self, context, container):
        with docker_utils.docker_client() as docker:
            res = docker.stats(container.container_id, decode=False,
                               stream=False)

            if 'total_inactive_file' in res['memory_stats']['stats']:
                mem_usage = (res['memory_stats']['usage'] -
                             res['memory_stats']['stats']['total_inactive_file']) / 1024 / 1024
            elif 'inactive_file' in res['memory_stats']['stats']:
                mem_usage = (res['memory_stats']['usage'] -
                             res['memory_stats']['stats']['inactive_file']) / 1024 / 1024
            else:
                # Fall back to the raw usage if neither field is present.
                mem_usage = res['memory_stats']['usage'] / 1024 / 1024

            cpu_usage = res['cpu_stats']['cpu_usage']['total_usage']
            system_cpu_usage = res['cpu_stats']['system_cpu_usage']
            cpu_percent = float(cpu_usage) / float(system_cpu_usage) * 100

            mem_limit = res['memory_stats']['limit'] / 1024 / 1024
            mem_percent = float(mem_usage) / float(mem_limit) * 100
*******************
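As a quick check, the cgroup v2 branch of that logic can be exercised against the memory_stats payload shown earlier in this thread (values copied from that response; an illustration, not the driver itself):

```python
# memory_stats as returned by the Docker API on the cgroup v2 host above.
res = {'memory_stats': {
    'usage': 1007616,
    'stats': {'inactive_file': 643072},
    'limit': 16777801728,
}}

stats = res['memory_stats']['stats']
if 'total_inactive_file' in stats:   # cgroup v1
    mem_usage = (res['memory_stats']['usage'] - stats['total_inactive_file']) / 1024 / 1024
elif 'inactive_file' in stats:       # cgroup v2
    mem_usage = (res['memory_stats']['usage'] - stats['inactive_file']) / 1024 / 1024
else:                                # neither key present
    mem_usage = res['memory_stats']['usage'] / 1024 / 1024

mem_limit = res['memory_stats']['limit'] / 1024 / 1024
mem_percent = mem_usage / mem_limit * 100
print(mem_usage, mem_limit)  # 0.34765625 MiB used of a 16000.55859375 MiB limit
```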

What do you think?

Revision history for this message
Durga Malleswari Varanasi (durga1) wrote :

Yes Tomas,

My intent is the same, and my patch is quite similar to this.
Regarding the fields in the Docker response: for cgroup v2 I am getting only the one field.
Hence I proposed the same approach in my previous comment.
I will be pushing the patch shortly.

Revision history for this message
tomas cribb (tomascribb) wrote :

Thank you!!

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to zun (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/zun/+/880764

Changed in zun:
status: In Progress → Fix Committed
Revision history for this message
Durga Malleswari Varanasi (durga1) wrote :

Hi Tomas,

The change has been pushed to opendev.org for review. Please find the same.

Regards,
Malleswari

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to zun (master)

Reviewed: https://review.opendev.org/c/openstack/zun/+/880764
Committed: https://opendev.org/openstack/zun/commit/4fa358474ee337f27bfaf8b98e886cc8d10ada50
Submitter: "Zuul (22348)"
Branch: master

commit 4fa358474ee337f27bfaf8b98e886cc8d10ada50
Author: Malleswari Varanasi <email address hidden>
Date: Tue Apr 18 08:43:46 2023 -0700

    Stats report diferencies

    There is a difference between "openstack appcontainer stats <id>"
    and "docker stats <id>" in memory value.
    Cache Usage should be removed from the memory usage while calculating
    which is present in docker SDK.
    This patch contains the Cgroup V1 and V2 handling for Memory stats

    Closes-Bug: #1989792
    Change-Id: I4f1d9b738ee5de176b6e0ef69593363f2977f07a

Changed in zun:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/zun 12.0.0.0rc1

This issue was fixed in the openstack/zun 12.0.0.0rc1 release candidate.
