cinder-backup uses surprisingly high memory

Bug #1908805 reported by norman shen
This bug affects 2 people
Affects: Cinder
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

In one of our production environments, I saw cinder-backup using around 4 GB of RSS memory, which is surprising because the cinder-backup daemon was fairly idle at the time I checked the metrics.

```console
# ps -eo pid,comm,rss --sort -rss
    PID COMMAND RSS
  53650 mysqld 5784744
1683253 beam.smp 4097472
  30189 cinder-backup 3058148
```

The cinder-backup version is Rocky, deployed in a Kubernetes cluster; there are 3 cinder-backup services deployed.

tags: added: backup-service
Changed in cinder:
status: New → Triaged
importance: Undecided → Medium
Revision history for this message
norman shen (jshen28) wrote :

Any advice on how to find out what is actually using that memory? Our current workaround is still to restart the cinder-backup service.

Revision history for this message
Rajat Dhasmana (whoami-rajat) wrote :

Hi norman,

Can you tell us which backend is used for backups and which backend for volumes?

Revision history for this message
Gorka Eguileor (gorka) wrote :

The cause of the high RSS even when idle is a memory high-water-mark issue caused by glibc creating a malloc arena per native thread.
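
A quick way to sanity-check this on a running process (just a rough heuristic, not part of the fix) is to look at which mappings actually hold the resident memory:

```console
# Rough heuristic: list the mappings of the idle cinder-backup process sorted
# by resident size. Many similarly sized anonymous regions accounting for most
# of the RSS are typically retained malloc arenas. <cinder-backup-pid> is a
# placeholder for the actual PID (30189 in the report above).
$ pmap -x <cinder-backup-pid> | sort -n -k3 | tail -n 15
```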

I'm proposing a devstack patch to fix it https://review.opendev.org/c/openstack/devstack/+/845805 and a tripleo one as well https://review.opendev.org/c/openstack/tripleo-common/+/845807

You can try it in your deployment yourself by running the cinder-backup service with those environment variables.
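
For example, something along these lines; the exact variables and values are in the reviews above, I'm just assuming MALLOC_ARENA_MAX here since it's the long-standing glibc knob for capping arenas:

```console
# Sketch only: run cinder-backup with the number of glibc malloc arenas capped.
$ MALLOC_ARENA_MAX=1 cinder-backup --config-file /etc/cinder/cinder.conf

# In a Kubernetes deployment the same variable can be set in the container
# spec instead (hypothetical fragment):
#   env:
#     - name: MALLOC_ARENA_MAX
#       value: "1"
```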

Revision history for this message
norman shen (jshen28) wrote :

Thank you very much for the input. I will try it ASAP. Also, could I have some references explaining what these settings mean? Are those tunables available for all glibc versions, and what would be the potential trade-offs of using them? Thank you again; looking forward to your reply.

Revision history for this message
norman shen (jshen28) wrote :

Hi Rajat,

Sorry for the delay. We are using the Swift backend to talk to a Ceph RGW cluster, and we have customized the code.

best,
Norman

Revision history for this message
Gorka Eguileor (gorka) wrote :

Descriptions of the tunable parameters can be found in the glibc manual: https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html
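
For example, on a glibc new enough to support the tunables framework described there, the arena cap can be expressed like this (the value 1 is only illustrative; older glibc versions only honour the MALLOC_ARENA_MAX variable):

```console
# Tunables syntax from the linked manual page (requires a fairly recent glibc):
$ GLIBC_TUNABLES=glibc.malloc.arena_max=1 cinder-backup --config-file /etc/cinder/cinder.conf
```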

I don't know the exact versions where each configuration parameter was added and fully supported, but I see that the M_ARENA_MAX parameter was added back in 2009 (https://sourceware.org/git/?p=glibc.git;a=commit;h=425ce2edb9d11cc1ff650fac16dfbc450241896a), so probably in glibc version 2.10.

In my opinion there should be no trade-off for Python programs. The independent arenas are meant to reduce contention (due to locking) in multi-threaded applications that do a considerable amount of memory allocation and freeing, but in Python we have the GIL, so we are already paying the locking penalty and won't benefit from that performance improvement.
