Brief Description
-----------------
OOM in xfs_filemap_fault on compute-0.
This is an order-zero OOM (not a cgroup OOM).
The kernel can't give order-zero memory back.
Severity
--------
Major
Steps to Reproduce
------------------
1. Installed stx but did not apply the stx-openstack application (Note: labels existed on worker nodes).
2. Performed system application-apply (versioned tarfile for stx-openstack).
system application-apply was reported as successfully completed.
3. Attempted system application-update to centos-stable-latest.
2019-06-12T18:34:57.000 controller-0 -sh: info HISTORY: PID=1670384 UID=1875 system application-update stx-openstack-1.0-14-centos-stable-latest.tgz
Expected Behaviour
------------------
The application-update to centos-stable-latest completes successfully and no OOM occurs on compute-0.
Actual Behaviour
----------------
Order-zero OOM on compute-0 (Note: it is not a cgroup OOM).
The kernel can't give order-zero memory back.
See the attached memtop and top output.
In step 3, the application-update fails, and the recovery to the previous version fails as well.
...
| stx-openstack | 1.0-14-centos-stable-versioned | armada-manifest | stx-openstack.yaml | apply-failed | application update from version 1.0-14-centos-stable-versioned to version 1.0-14 centos-stable-latest aborted. application recover to version 1.0-14-centos-stable-versioned aborted. please check logs for detail. |
The following pods on compute-0 are in CrashLoopBackOff:
kubectl get pods -n openstack
neutron-ovs-agent-compute-0-9c041f23-hsx4x 0/1 CrashLoopBackOff 14 61m
neutron-ovs-agent-compute-1-eae26dba-rcd2n 1/1 Running 0 60m
..
nova-compute-compute-0-75ea0372-787m7 1/2 CrashLoopBackOff 13 61m
nova-compute-compute-1-eae26dba-2zk4k 2/2 Running 0 61m
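To pick the failing pods out of a longer listing, something like the following sketch may help. It assumes the usual five-column `kubectl get pods` layout (NAME, READY, STATUS, RESTARTS, AGE); the function name `unhealthy_pods` is hypothetical and not part of any tool.

```python
def unhealthy_pods(text):
    """Return (name, status, restarts) for pods that are not fully ready.

    Assumes whitespace-separated rows: NAME READY STATUS RESTARTS AGE.
    Rows that do not match (header, '..' elisions) are skipped.
    """
    bad = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or "/" not in fields[1]:
            continue  # header, elision marker, or unrelated line
        name, ready, status, restarts = fields[:4]
        if status == "Completed":
            continue  # finished jobs legitimately report 0/1
        have, want = ready.split("/")
        if status != "Running" or have != want:
            bad.append((name, status, int(restarts)))
    return bad


listing = """\
neutron-ovs-agent-compute-0-9c041f23-hsx4x 0/1 CrashLoopBackOff 14 61m
neutron-ovs-agent-compute-1-eae26dba-rcd2n 1/1 Running 0 60m
..
nova-compute-compute-0-75ea0372-787m7 1/2 CrashLoopBackOff 13 61m
nova-compute-compute-1-eae26dba-2zk4k 2/2 Running 0 61m
"""

for name, status, restarts in unhealthy_pods(listing):
    print(name, status, restarts)
```

On the listing above this reports only the two compute-0 pods; the compute-1 pods are fully ready and are left out.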
Reproducibility
---------------
TBD
System Configuration
--------------------
standard system
Branch/Pull Time/Commit
-----------------------
BUILD_ID="20190612T013000Z"
Timestamp/Logs
--------------
See kern.log from compute-0 (also attached).
2019-06-12T18:48:59.180 compute-0 kernel: warning [12644.237895] python invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=1000
2019-06-12T18:48:59.180 compute-0 kernel: info [12644.237901] python cpuset=80af688d281bc86ecf6cc924f480c1d292b507e64b737cdb31f287f1ba16a6f8 mems_allowed=0
2019-06-12T18:48:59.180 compute-0 kernel: warning [12644.237905] CPU: 0 PID: 316722 Comm: python Kdump: loaded Tainted: G O ------------ T 3.10.0-957.12.2.el7.1.tis.x86_64 #1
2019-06-12T18:48:59.180 compute-0 kernel: warning [12644.237907] Hardware name: Intel Corporation S2600WT2R/S2600WT2R, BIOS SE5C610.86B.01.01.0022.062820171903 06/28/2017
2019-06-12T18:48:59.180 compute-0 kernel: warning [12644.237909] Call Trace:
2019-06-12T18:48:59.180 compute-0 kernel: warning [12644.237921] [<ffffffff8a608991>] dump_stack+0x19/0x1b
2019-06-12T18:48:59.180 compute-0 kernel: warning [12644.237928] [<ffffffff8a603dce>] dump_header+0x8e/0x23f
2019-06-12T18:48:59.180 compute-0 kernel: warning [12644.237934] [<ffffffff8a617516>] ? retint_kernel+0x26/0x30
2019-06-12T18:48:59.180 compute-0 kernel: warning [12644.237941] [<ffffffff89f886de>] oom_kill_process+0x24e/0x3d0
2019-06-12T18:48:59.180 compute-0 kernel: warning [12644.237944] [<ffffffff89f2df9b>] ? rcu_read_unlock_special+0x1ab/0x1b0
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.237947] [<ffffffff89f88f43>] out_of_memory+0x4d3/0x510
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.237952] [<ffffffff89f8f2b5>] __alloc_pages_nodemask+0xa85/0xb80
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.237972] [<ffffffff89fd9ad8>] alloc_pages_current+0x98/0x110
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.237975] [<ffffffff89f84447>] __page_cache_alloc+0x97/0xb0
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.237978] [<ffffffff89f87258>] filemap_fault+0x278/0x460
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238014] [<ffffffffc059215e>] __xfs_filemap_fault+0x7e/0x200 [xfs]
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238029] [<ffffffffc059238c>] xfs_filemap_fault+0x2c/0x30 [xfs]
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238047] [<ffffffff89fb18fa>] __do_fault.isra.70+0x8a/0x100
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238050] [<ffffffff89fb1f1c>] do_read_fault.isra.72+0x4c/0x1b0
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238053] [<ffffffff89fb8547>] handle_mm_fault+0x557/0xc30
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238059] [<ffffffff89e5fd43>] __do_page_fault+0x1e3/0x440
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238061] [<ffffffff89e60015>] do_page_fault+0x35/0x90
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238064] [<ffffffff8a617648>] page_fault+0x28/0x30
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238066] Mem-Info:
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238074] active_anon:390090 inactive_anon:2108 isolated_anon:0
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238074] active_file:29532 inactive_file:42331 isolated_file:0
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238074] unevictable:1356 dirty:39 writeback:0 unstable:0
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238074] slab_reclaimable:37926 slab_unreclaimable:115835
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238074] mapped:19639 shmem:3472 pagetables:9901 bounce:0
2019-06-12T18:48:59.181 compute-0 kernel: warning [12644.238074] free:759204 free_pcp:273 free_cma:0
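For anyone triaging the excerpt above, here is a rough sketch that decodes the reported gfp_mask and converts the Mem-Info page counters to MiB. It rests on two assumptions: the flag bit values are the ones from the 3.10-era include/linux/gfp.h (they should be checked against the exact kernel source for 3.10.0-957.12.2.el7.1.tis), and the base page size is 4 KiB. The helper names `decode_gfp` and `meminfo_mib` are hypothetical.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB base pages (x86_64 default)

# Assumption: bit values as in the 3.10-era include/linux/gfp.h.
# Verify against the running kernel's source before relying on them.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
    0x100: "__GFP_COLD",
    0x200: "__GFP_NOWARN",
    0x400: "__GFP_REPEAT",
    0x800: "__GFP_NOFAIL",
    0x1000: "__GFP_NORETRY",
    0x2000: "__GFP_MEMALLOC",
    0x4000: "__GFP_COMP",
    0x8000: "__GFP_ZERO",
    0x10000: "__GFP_NOMEMALLOC",
    0x20000: "__GFP_HARDWALL",
}


def decode_gfp(mask):
    """Return the names of the known flag bits set in mask, lowest bit first."""
    return [name for bit, name in sorted(GFP_BITS.items()) if mask & bit]


def meminfo_mib(text):
    """Map every lowercase 'name:pages' counter found in text to MiB."""
    return {name: int(pages) * PAGE_KIB / 1024
            for name, pages in re.findall(r"\b([a-z_]+):(\d+)", text)}


print(decode_gfp(0x201da))

counters = meminfo_mib(
    "active_anon:390090 slab_unreclaimable:115835 free:759204 free_pcp:273"
)
for name, mib in counters.items():
    print(f"{name}: {mib:.1f} MiB")
```

Under those assumptions, 0x201da corresponds to a movable, highmem-capable user allocation with __GFP_COLD, which fits the __page_cache_alloc frame in the call trace, and free:759204 works out to roughly 2.9 GiB. The Mem-Info counters shown here are system-wide, so the per-node and per-zone lines in the attached kern.log would still be needed to see what was free on node 0 (mems_allowed=0).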
@Wendy, can you please check whether you're seeing the same issue as https://bugs.launchpad.net/starlingx/+bug/1827258?