Activity log for bug #1750038

Date Who What changed Old value New value Message
2018-02-16 20:22:11 Dragan S. bug added bug
2018-02-16 20:23:09 Dragan S. linux (Ubuntu): milestone xenial-updates
2018-02-16 20:23:30 Dragan S. linux (Ubuntu): assignee Dragan S. (dragan-s)
2018-02-16 20:30:06 Ubuntu Kernel Bot linux (Ubuntu): status New Incomplete
2018-02-20 16:39:41 Joseph Salisbury tags kernel-da-key
2018-05-08 04:04:35 Daniel Axtens description A user running Ubuntu Xenial reports processes hanging in D state waiting for disk I/O. Occasionally one of the applications gets into "D" state on NFS reads/sync and close system calls. Based on the kernel backtraces, it seems to be stuck in a kmalloc allocation during cleanup of dirty NFS pages. All the subsequent operations on the NFS mounts are stuck and a reboot is required to rectify the situation.
[Test scenario]
1) Applications running in Docker environment
2) Applications have cgroup limits --cpu-shares --memory -shm-limit
3) python and C++ based applications (torch and caffe)
4) Applications read big lmdb files and write results to NFS shares
5) use NFS v3, hard, and fscache is enabled
6) now swap space is configured
This causes all other I/O activity on that mount to hang. We are running into this issue more frequently and have identified a few applications causing this problem. As updated in the description, the problem seems to happen when exercising the stack try_to_free_mem_cgroup_pages+0xba/0x1a0. We see this with docker containers with the cgroup option --memory <USER_SPECIFIED_MEM>. Whenever there is a deadlock, we see that the hung process has reached the maximum cgroup limit multiple times and typically cleans up dirty data and caches to bring the usage under the limit. This reclaim path happens many times, and finally we probably hit a race and get into a deadlock.
== SRU Justification ==
[Impact] Occasionally an application gets stuck in "D" state on NFS reads/sync and close system calls. All the subsequent operations on the NFS mounts are stuck and a reboot is required to rectify the situation.
[Fix] Use GFP_NOIO in some allocations in writeback to avoid a deadlock. This is upstream in: ae97aa524ef4 ("NFS: Use GFP_NOIO for two allocations in writeback")
[Testcase] See Test scenario in previous description. A test kernel with this patch was tested heavily (>100hrs of test suite) without issue.
[Regression Potential] This changes memory allocation in NFS to use a different policy. This could potentially affect NFS. However, the patch is already in Artful and Bionic without issue. The patch does not apply to Trusty.
== Previous Description ==
A user running Ubuntu Xenial reports processes hanging in D state waiting for disk I/O. Occasionally one of the applications gets into "D" state on NFS reads/sync and close system calls. Based on the kernel backtraces, it seems to be stuck in a kmalloc allocation during cleanup of dirty NFS pages. All the subsequent operations on the NFS mounts are stuck and a reboot is required to rectify the situation.
[Test scenario]
1) Applications running in Docker environment
2) Applications have cgroup limits --cpu-shares --memory -shm-limit
3) python and C++ based applications (torch and caffe)
4) Applications read big lmdb files and write results to NFS shares
5) use NFS v3, hard, and fscache is enabled
6) now swap space is configured
This causes all other I/O activity on that mount to hang. We are running into this issue more frequently and have identified a few applications causing this problem. As updated in the description, the problem seems to happen when exercising the stack try_to_free_mem_cgroup_pages+0xba/0x1a0. We see this with docker containers with the cgroup option --memory <USER_SPECIFIED_MEM>. Whenever there is a deadlock, we see that the hung process has reached the maximum cgroup limit multiple times and typically cleans up dirty data and caches to bring the usage under the limit. This reclaim path happens many times, and finally we probably hit a race and get into a deadlock.
2018-05-08 04:04:40 Daniel Axtens linux (Ubuntu): assignee Dragan S. (dragan-s) Daniel Axtens (daxtens)
2018-05-08 04:05:03 Daniel Axtens bug added subscriber Daniel Axtens
2018-05-11 13:48:16 Kleber Sacilotto de Souza nominated for series Ubuntu Xenial
2018-05-11 13:48:16 Kleber Sacilotto de Souza bug task added linux (Ubuntu Xenial)
2018-05-11 13:48:25 Kleber Sacilotto de Souza linux (Ubuntu Xenial): status New In Progress
2018-05-11 13:48:40 Kleber Sacilotto de Souza linux (Ubuntu): status Incomplete Fix Released
2018-05-15 10:41:34 Kleber Sacilotto de Souza linux (Ubuntu Xenial): status In Progress Fix Committed
2018-05-28 14:03:47 Brad Figg tags kernel-da-key kernel-da-key verification-needed-xenial
2018-06-01 18:22:02 David Coronel tags kernel-da-key verification-needed-xenial kernel-da-key verification-done-xenial
2018-06-11 15:09:13 Launchpad Janitor linux (Ubuntu Xenial): status Fix Committed Fix Released
2018-06-11 15:09:13 Launchpad Janitor cve linked 2017-5715
2018-06-11 15:09:13 Launchpad Janitor cve linked 2017-5753
2018-06-11 15:09:13 Launchpad Janitor cve linked 2018-3639
2018-06-11 15:09:13 Launchpad Janitor cve linked 2018-8087
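
The symptom in the description (applications stuck in "D" state) can be checked from userspace by reading the state field of /proc/<pid>/stat. The following is a minimal sketch, not part of the original report: the function names parse_stat_line and find_d_state are hypothetical, and it only lists processes in uninterruptible sleep without confirming that they are blocked on NFS.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ')' in the line.
    """
    left, _, right = line.rpartition(")")
    pid_str, _, comm = left.partition("(")
    state = right.split()[0]
    return int(pid_str), comm, state


def find_d_state(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    hung = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat_line(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or entry is unreadable
        if state == "D":
            hung.append((pid, comm))
    return hung


if __name__ == "__main__":
    for pid, comm in find_d_state():
        print(pid, comm)
```

A process that shows up here repeatedly over several minutes is a candidate for the hang described above; its kernel stack (for example from /proc/<pid>/stack, which requires root) would still need to be inspected to see whether it sits in the memory cgroup reclaim path.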