CC13: Potential memory leak issue with 5.0.1 kmod vRouter

Bug #1804399 reported by Steven Sciriha
This bug affects 1 person
Affects: Juniper Openstack
Status: New
Importance: Undecided
Assigned to: Sivakumar Ganapathy
Milestone: (not set)

Bug Description

Issue Description:

We are investigating an issue on our kernel compute nodes where running workloads (including virtualised workloads, all userspace processes and the kmod vRouter) slowly consume more and more host memory. The kernel then accounts this memory to the buff/cache pools, leaving little to no memory in the "free" pool.

# free -m
              total        used        free      shared  buff/cache   available
Mem:         257084       35239         699           3      221145         207
Swap:             0           0           0
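
To put the figures above in proportion, here is a minimal Python sketch that parses the "Mem:" row of the `free -m` output and prints each column as a share of total memory. The helper names (`parse_free_mem_line`, `share_of_total`) are hypothetical; the numbers are copied from the output above.

```python
# Values copied from the "Mem:" row of the `free -m` output above (MiB).
FREE_MEM_LINE = "Mem: 257084 35239 699 3 221145 207"
COLUMNS = ["total", "used", "free", "shared", "buff/cache", "available"]


def parse_free_mem_line(line):
    """Map the numeric fields of a `free -m` 'Mem:' row to column names."""
    label, *values = line.split()
    if label != "Mem:" or len(values) != len(COLUMNS):
        raise ValueError(f"unexpected free output: {line!r}")
    return dict(zip(COLUMNS, map(int, values)))


def share_of_total(mem, column):
    """Return one column as a percentage of total memory."""
    return 100.0 * mem[column] / mem["total"]


mem = parse_free_mem_line(FREE_MEM_LINE)
for column in ("used", "free", "buff/cache", "available"):
    print(f"{column:>10}: {mem[column]:>7} MiB ({share_of_total(mem, column):.1f}%)")
```

The point of interest is the gap between the large buff/cache figure and the very small "available" figure: page cache is normally reclaimable, so the two would usually not diverge this far.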

This is to be expected under normal operation, but issues arise when the virtualised workloads require more memory: libvirt attempts to allocate more host memory, finds none available, the kernel OOM killer is triggered, and the VM then crashes.

The expectation is that the kernel should release memory from the buff/cache pool to libvirt, but this does not seem to be the case.

libvirt log:

Nov 16 04:53:16 overcloud63m-comp-3 kernel: [691744] 0 691744 876 40 6 0 0 sh
Nov 16 04:53:16 overcloud63m-comp-3 kernel: Out of memory: Kill process 258272 (qemu-kvm) score 128 or sacrifice child
Nov 16 04:53:16 overcloud63m-comp-3 kernel: Killed process 258272 (qemu-kvm) total-vm:34876400kB, anon-rss:33825876kB, file-rss:596kB, shmem-rss:20kB
Nov 16 04:53:16 overcloud63m-comp-3 journal: 2018-11-16 09:53:16.368+0000: 240077: warning : qemuGetProcessInfo:1460 : cannot parse process status data
Nov 16 04:53:16 overcloud63m-comp-3 journal: 2018-11-16 09:53:16.368+0000: 240077: error : virProcessGetAffinity:506 : cannot get CPU affinity of process 258311: No such process
Nov 16 04:53:16 overcloud63m-comp-3 journal: 2018-11-16 09:53:16.380+0000: 240079: warning : qemuGetProcessInfo:1460 : cannot parse process status data
Nov 16 04:53:16 overcloud63m-comp-3 journal: 2018-11-16 09:53:16.380+0000: 240079: error : virProcessGetAffinity:506 : cannot get CPU affinity of process 258311: No such process
Nov 16 04:53:17 overcloud63m-comp-3 journal: 2018-11-16 09:53:17.568+0000: 240031: error : qemuMonitorIORead:610 : Unable to read from monitor: Connection reset by peer
Nov 16 04:53:17 overcloud63m-comp-3 kvm: 0 guests now active
Nov 16 04:53:17 overcloud63m-comp-3 systemd-machined: Machine qemu-1-instance-00000149 terminated.
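
When collecting these events across several compute nodes, it can help to extract the victim process and its resident memory from the "Killed process" line. The sketch below is a rough illustration only: the regular expression is fitted to the line format shown in this log (other kernel versions may format it differently), and the function name `parse_oom_kill` is hypothetical.

```python
import re

# Line copied from the log above.
LOG_LINE = (
    "Nov 16 04:53:16 overcloud63m-comp-3 kernel: Killed process 258272 (qemu-kvm) "
    "total-vm:34876400kB, anon-rss:33825876kB, file-rss:596kB, shmem-rss:20kB"
)

# Assumption: the "Killed process" line has exactly the layout seen in this log.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm_kb>\d+)kB, anon-rss:(?P<anon_rss_kb>\d+)kB"
)


def parse_oom_kill(line):
    """Return pid, command name and memory figures (kB), or None if no match."""
    match = KILLED_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    return {
        "pid": int(record["pid"]),
        "comm": record["comm"],
        "total_vm_kb": int(record["total_vm_kb"]),
        "anon_rss_kb": int(record["anon_rss_kb"]),
    }


kill = parse_oom_kill(LOG_LINE)
print(f"{kill['comm']} (pid {kill['pid']}): anon-rss {kill['anon_rss_kb'] / 1024**2:.1f} GiB")
```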

Red Hat is also investigating this issue (link below) and has found that this memory is being marked as unreclaimable by the kernel. The likely cause is a SLAB memory leak, potentially in the vRouter kernel module, but this is as yet unconfirmed.

https://access.redhat.com/support/cases/#/case/02255073
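
One way to watch for the unreclaimable slab growth described above is to sample the `Slab`, `SReclaimable` and `SUnreclaim` fields of `/proc/meminfo` over time. The following Python sketch is a rough aid under that assumption, not a confirmed diagnosis procedure; the function names are hypothetical.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}.

    Most values are in kB; a few fields (e.g. HugePages_Total) are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def slab_summary(fields):
    """Return {field: (kB, percent of MemTotal)} for the slab counters."""
    total = fields["MemTotal"]
    return {
        key: (fields[key], 100.0 * fields[key] / total)
        for key in ("Slab", "SReclaimable", "SUnreclaim")
    }


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live meminfo file (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    for key, (kb, pct) in slab_summary(read_meminfo()).items():
        print(f"{key:>12}: {kb:>12} kB ({pct:.1f}% of MemTotal)")
```

A steadily rising `SUnreclaim` value across samples would be consistent with the slab-leak theory, but it does not by itself identify which kernel component owns the memory; `/proc/slabinfo` or `slabtop` can show which slab caches are growing.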

Tags: cc13
summary: - Potential memory leak issue with 5.0.1 kmod vRouter
+ CC13: Potential memory leak issue with 5.0.1 kmod vRouter
information type: Proprietary → Public
tom murray (tmurray-a)
Changed in juniperopenstack:
assignee: nobody → Sivakumar Ganapathy (hotlava51)