Comment 96 for bug 1573062

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2016-07-17 21:43 EDT-------
I tried the kernel at http://people.canonical.com/~kamal/lp1573062/lp1573062.1/ and it worked fine for me.

------- Comment From <email address hidden> 2016-07-19 01:04 EDT-------
Looks like I got a failure with the run on http://people.canonical.com/~kamal/lp1573062/lp1573062.1/

But with my diff applied to the 4.4.0 source from apt-source, I can always get the following command to succeed:

timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0

I've run it three times with my diff (all succeeded) and twice with the kernel at ~kamal (one failure, one success). I have not tried the longer 7-hour run.
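For anyone repeating these runs, a small wrapper can tally the results instead of counting by hand. This is only a sketch: run_trials is a hypothetical helper name, not part of stress-ng, and it only detects runs that return a non-zero exit status (including being killed by timeout). It will not report anything if the whole machine hangs.

```shell
#!/bin/sh
# run_trials is a hypothetical helper, not part of stress-ng or timeout.
# Usage: run_trials COUNT COMMAND [ARGS...]
# Runs COMMAND COUNT times and prints how many runs exited with status 0.
run_trials() {
    count=$1
    shift
    pass=0
    fail=0
    i=1
    while [ "$i" -le "$count" ]; do
        if "$@"; then
            pass=$((pass + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "runs=$count pass=$pass fail=$fail"
    # Return success only if every run succeeded.
    [ "$fail" -eq 0 ]
}

# Example (assumes end_time and runtime are already set, as in the command above):
# run_trials 3 timeout -s 9 "$end_time" stress-ng --aggressive --verify --timeout "$runtime" --brk 0
```

The function returns non-zero if any run failed, so it can be chained in a larger script.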

------- Comment From <email address hidden> 2016-07-19 01:37 EDT-------
In the kern.log posted, it looks like the problem has moved to:

rwsem_wake+0xcc/0x110
up_write+0x78/0x90
unlink_anon_vmas+0x15c/0x2c0

A number of threads are stuck in rwsem_wake, spinning on sem->wait_lock. Many of them are exiting stress-ng-mmapf processes spinning on this lock. I'll double-check this. Can we get a build with lockdep enabled? I am currently unable to reproduce the issue on my machine with the diff applied.
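As a rough guide to what "lockdep enabled" would mean for the build, the fragment below lists the kernel config options usually involved. It is a sketch under the assumption that the options are added on top of the existing config; the exact set used for any test build is not stated in this bug.

```
# Assumed additions to the existing kernel .config (not taken from this bug)
CONFIG_PROVE_LOCKING=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_DEBUG_SPINLOCK=y
```

CONFIG_PROVE_LOCKING pulls in the lock dependency validator itself. Note that these options add overhead and change timing, so a race like this one may become harder or easier to hit.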

------- Comment From <email address hidden> 2016-07-19 19:51 EDT-------
I am cloning the sources to debug further

------- Comment From <email address hidden> 2016-07-19 23:52 EDT-------
I cloned the kernel from https://git.launchpad.net/~kamalmostafa/ubuntu/+source/linux/+git/xenial/log/?h=lp1573062 and built it with the machine's config from /boot/config. I also verified that the diff matches my changes.

I ran

timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0

twice

Both times the test did the right thing. Could someone verify whether:

(a) The smaller subset works fine?
(b) The larger test fails? If so, can we get a run with lockdep enabled?

I was only testing the command line above, and I could see a difference with those patches.

------- Comment From <email address hidden> 2016-07-21 20:14 EDT-------
No, the diff matches; sorry for the confusion. Here is what I said:

"I also verified the diff matches my changes"

In summary, here is what I did:

1. Cloned the sources
2. Built locally on my machine
3. Ran stress-ng with the recommended parameters
4. The test succeeded and I got the console back

I did four runs and got the console back each time.

However, with the provided binaries, step 3 (stress-ng) failed for me once in two runs.

------- Comment From <email address hidden> 2016-07-25 08:08 EDT-------
Strange, I am able to reproduce the issue with the provided binaries, but not when I build the kernel myself. I am not doing a deb build, just a make -j64 with the config from /boot for 4.4.0-28. The problem could be at my end, but I am a little concerned.

I also noticed that the run succeeds if I interact with the system while it is in progress, for example by frequently checking whether the console is active (pressing Enter and control-o-h). I am going to see if I can get a repro again and debug further.

------- Comment From <email address hidden> 2016-07-25 09:09 EDT-------
In the meantime, any updates on the bisect? I was hoping we could do both things (root cause analysis and bisect) in parallel.

Thanks,
Balbir

------- Comment From <email address hidden> 2016-07-25 23:37 EDT-------
I've been working off the assumption that the bug was fixed in mainline :)

I tried a few runs, including 4.5 (4.5.0-040500-generic_4.5.0-040500.201605161244), and it worked for me as well (comment #25). I presume I should stick to comment #92 and assume that the bug is still present in mainline.

------- Comment From <email address hidden> 2016-07-31 21:17 EDT-------
Does this succeed on your system? Could you please try three runs?

timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0