------- Comment From <email address hidden> 2016-07-17 21:43 EDT-------
I tried the kernel at http://people.canonical.com/~kamal/lp1573062/lp1573062.1/ and it worked fine for me
------- Comment From <email address hidden> 2016-07-19 01:04 EDT-------
Looks like I got a failure with the run on http://people.canonical.com/~kamal/lp1573062/lp1573062.1/
But with my diff + 4.4.0 source from apt-source, I can always get the following command to succeed.
timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0
I've tried three times with my diff (all success) and twice with the kernel @ ~kamal (one failure and one success). I've not tried the longer 7 hour run
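For anyone repeating these runs, a tally like the one above can be collected with a small wrapper. This is only a sketch: run_repeated is a hypothetical helper, not something used in this bug, and it only counts exit statuses, so it cannot report a run in which the machine itself locks up.

```shell
#!/bin/sh
# run_repeated COUNT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND COUNT times and prints how many runs
# exited with status 0 (pass) and how many did not (fail).
run_repeated() {
    runs=$1
    shift
    pass=0
    fail=0
    i=1
    while [ "$i" -le "$runs" ]; do
        if "$@"; then
            pass=$((pass + 1))
        else
            fail=$((fail + 1))
        fi
        i=$((i + 1))
    done
    echo "runs=$runs pass=$pass fail=$fail"
}

# Example with the command from this comment. end_time and runtime are
# placeholders that the caller has to set first:
# run_repeated 3 timeout -s 9 "$end_time" stress-ng --aggressive --verify --timeout "$runtime" --brk 0
```

A non-zero status here covers both a stress-ng verification failure and the case where timeout had to kill the run.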
------- Comment From <email address hidden> 2016-07-19 01:37 EDT-------
In the kern.log posted, it looks like the problem has moved to
rwsem_wake+0xcc/0x110
up_write+0x78/0x90
unlink_anon_vmas+0x15c/0x2c0
A bunch of threads are stuck on rwsem_wake -- spinning on the sem->wait_lock. I can see a whole bunch of exiting stress-ng-mmapf stuck on this lock, spinning. I'll double check this. Can we get a build with lockdep enabled? I am unable to reproduce this issue at my end with the diff applied on my machine at the moment
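As a rough sketch of what a lockdep-enabled build would need, the kernel tree's scripts/config tool can switch the relevant options on in an existing .config. The enable_lockdep wrapper and the source path below are assumptions for illustration, and the option list is a common lock-debugging set rather than anything agreed in this bug.

```shell
#!/bin/sh
# enable_lockdep SRC_DIR
# Hypothetical helper: enables lock debugging options in SRC_DIR/.config
# using the scripts/config tool shipped in the kernel source tree.
# PROVE_LOCKING is the option that turns on the lockdep validator.
enable_lockdep() {
    src=$1
    "$src/scripts/config" --file "$src/.config" \
        --enable DEBUG_KERNEL \
        --enable PROVE_LOCKING \
        --enable DEBUG_LOCK_ALLOC \
        --enable DEBUG_SPINLOCK
}

# Example (the path is a placeholder); olddefconfig then resolves any
# dependent options before the build:
# enable_lockdep /path/to/linux && make -C /path/to/linux olddefconfig
```

Lock debugging changes timing noticeably, so a race like this one may become harder or easier to hit with it enabled.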
------- Comment From <email address hidden> 2016-07-19 19:51 EDT-------
I am cloning the sources to debug further
------- Comment From <email address hidden> 2016-07-19 23:52 EDT-------
I cloned the kernel from https://git.launchpad.net/~kamalmostafa/ubuntu/+source/linux/+git/xenial/log/?h=lp1573062 and built with the machine config specified from /boot/config. I also verified the diff matches my changes.
I ran
timeout -s 9 $end_time stress-ng --aggressive --verify --timeout $runtime --brk 0
twice
Both times, the test did the right thing. Could someone verify if
(a) The smaller subset works fine?
(b) The larger test fails? If so, can we get a run with lockdep enabled?
I was just testing for the command line above and I could see a difference with those patches.
------- Comment From <email address hidden> 2016-07-21 20:14 EDT-------
No, the diff matches, sorry for the confusion, but here is what I said
"I also verified the diff matches my changes"
In summary, here is what I did
1. cloned the sources
2. built locally on my machine
3. Ran stress-ng with recommended parameters
4. The test succeeded, got back the console
Did four runs and I got back the console each time
However with the provided binaries
Step 3 (stress-ng) failed for me once in two runs
------- Comment From <email address hidden> 2016-07-25 08:08 EDT-------
Strange, I am able to reproduce the issue with the provided binaries, but not when I build it. I am not doing a deb build, but just a make -j64 with the config from /boot for 4.4.0-28. The problem could be at my end, but I am a little concerned.
I also noticed that the test succeeds if I interact with the system during runs, frequently checking whether the console is active (enters and control-o-h). I am going to see if I can get a repro again and debug further.
------- Comment From <email address hidden> 2016-07-25 09:09 EDT-------
In the meantime, any updates on the bisect? I was hoping we could do both things (RCA and bisect) in parallel
Thanks,
Balbir
------- Comment From <email address hidden> 2016-07-25 23:37 EDT-------
I've been working off the assumption that the bug was fixed in mainline :)
I tried a few runs, including 4.5 (4.5.0-040500-generic_4.5.0-040500.201605161244) and it worked for me as well (comment #25). I presume I should stick to comment #92 and assume that the bug is still present in mainline