linux 4.4.0-59.80 ADT test failure with linux 4.4.0-59.80

Bug #1654971 reported by Andy Whitcroft
Affects: linux (Ubuntu)
Status: Fix Released
Importance: High
Assigned to: Colin Ian King
Andy Whitcroft (apw) wrote :

This one failed with a hang in the kernel regression test suite, in whichever test runs after the sockfd test:

  01:55:06 DEBUG| [stdout] sock PASSED
  01:55:16 DEBUG| [stdout] sockfd PASSED
  [hang]

Looking back at previous runs, we also see it print "Killed" in the same place:

  06:30:16 DEBUG| [stdout] sock PASSED
  06:30:26 DEBUG| [stdout] sockfd PASSED
  Killed

Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Confirmed
assignee: nobody → Colin Ian King (colin-king)
tags: added: kernel-adt-failure
Colin Ian King (colin-king) wrote :

The following stable commit fixes this hang:

http://git.kernel.org/cgit/linux/kernel/git/stable/stable-queue.git/diff/releases/4.4.41/ftrace-x86_32-set-ftrace_stub-to-weak-to-prevent-gcc-from-using-short-jumps-to-it.patch?id=4a1e0d3c5df26e1689c6147c078af10a367969d9

Since these fixes will be landing in our stable trees very soon from upstream stable, let's wait for them to land rather than SRU them.

Colin Ian King (colin-king) wrote :

OK, there's one more bug that's causing ADT to hang. I've now managed to reproduce another hang; it seems specific to running the stress-ng sockpair stressor with a large number of CPUs and > 2GB of memory.

Changed in linux (Ubuntu):
status: Confirmed → In Progress
Colin Ian King (colin-king) wrote :

So the root issue behind this was that stress-ng was calling prctl(PR_SET_DUMPABLE), which causes the OOM adjustments not to be set. This in turn meant that running the sockpair stressor as root led to non-OOMable processes when memory got tight, and we end up locked in kernel space looking for free pages to create a socket pair.

This is a kind of DoS; however, it is running as root on a low-memory i386 system, so I'm currently fixing the stress-ng OOM-adjustments issue so that at least the kernel can OOM the stressor. I've also added OOM detection in the stressor and auto-respawning to keep the pressure up when a process gets killed.

See commits:
http://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=4e098f94e6366bf096506d8c78fd9f96a0a59950
http://kernel.ubuntu.com/git/cking/stress-ng.git/commit/?id=47eb86ed8e13bd9171672f7b5b98773452e4fd16
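
For readers who want a feel for the approach described above, here is a rough sketch in Python. It is not the stress-ng code (stress-ng is written in C, and the commits above are the authoritative version). The function names make_oomable and run_with_respawn are hypothetical, and the sketch assumes that a child terminated by SIGKILL can be treated as an OOM kill, which is only a heuristic. It shows two ideas: raising the child's oom_score_adj so the kernel is allowed to pick it as a victim, and respawning the child when it gets killed.

```python
import os
import signal


def make_oomable(path="/proc/self/oom_score_adj", score=1000):
    """Mark the calling process as a preferred OOM victim.

    Writes `score` (1000 is the maximum) to oom_score_adj.
    Returns True on success, False if the file cannot be written.
    """
    try:
        with open(path, "w") as f:
            f.write(str(score))
        return True
    except OSError:
        return False


def run_with_respawn(worker, max_respawns=3):
    """Run worker(attempt) in a child process, respawning on SIGKILL.

    Assumption: death by SIGKILL is treated as an OOM kill.
    Returns the number of respawns that were performed.
    """
    respawns = 0
    while True:
        pid = os.fork()
        if pid == 0:
            # Child: make ourselves killable by the OOM killer, then work.
            make_oomable()
            try:
                worker(respawns)
            finally:
                os._exit(0)
        # Parent: wait for the child and inspect how it ended.
        _, status = os.waitpid(pid, 0)
        killed = os.WIFSIGNALED(status) and os.WTERMSIG(status) == signal.SIGKILL
        if not killed or respawns >= max_respawns:
            return respawns
        respawns += 1
```

A worker that allocates memory or creates socket pairs in a loop would stand in for the stressor here; the parent keeps the pressure up by starting a fresh child each time one is killed, up to max_respawns.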

Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released