Activity log for bug #1042172

Date Who What changed Old value New value Message
2012-08-27 10:30:56 Muharem Hrnjadovic bug added bug
2012-08-27 10:31:02 Muharem Hrnjadovic openquake: status New Confirmed
2012-08-27 10:31:05 Muharem Hrnjadovic openquake: importance Undecided Medium
2012-08-27 10:31:22 Muharem Hrnjadovic tags devop mfcluster
2012-08-27 10:33:49 Muharem Hrnjadovic description

Old value:

gemmicro01 became unresponsive a week ago. Today Urs was finally granted access to the data center and we could take a look at the machine. We hooked up a console and a keyboard but the machine was hung up and we could not log into it.

We also took a quick look at gemmicro01 and noticed an error on the console: It read approx. as follows:

gemmicro02 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)

The same error was found in the /var/log/syslog of other machines e.g.

bigstar04.log:Aug 27 09:04:54 bigstar04 kernel: [2986792.080036] INFO: rcu_bh detected stall on CPU 35 (t=0 jiffies)
gemmicro01.log:Aug 17 08:16:39 gemmicro01 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)
gemmicro01.log:Aug 17 18:06:45 gemmicro01 kernel: [1582229.668026] INFO: rcu_bh detected stall on CPU 46 (t=0 jiffies)

Whether this is what caused gemmicro01 to lock up remains to be determined.

There is a possibility that these kinds of errors were fixed in the 3.4 kernel (see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1003081)

Also please see: https://lkml.org/lkml/2012/3/27/169

New value:

gemmicro01 became unresponsive a week ago. Today Urs was finally granted access to the data center and we could take a look at the machine. We hooked up a console and a keyboard but the machine was hung up and we could not log into it.

We also took a quick look at gemmicro01 and noticed an error on the console: It read approx. as follows:

    gemmicro02 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)

The same error was found in the /var/log/syslog of other machines e.g.

    bigstar04.log:Aug 27 09:04:54 bigstar04 kernel: [2986792.080036] INFO: rcu_bh detected stall on CPU 35 (t=0 jiffies)
    gemmicro01.log:Aug 17 08:16:39 gemmicro01 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)
    gemmicro01.log:Aug 17 18:06:45 gemmicro01 kernel: [1582229.668026] INFO: rcu_bh detected stall on CPU 46 (t=0 jiffies)

Whether this is what caused gemmicro01 to lock up remains to be determined.

There is a possibility that these kinds of errors were fixed in the 3.4 kernel (see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1003081)

Also please see: https://lkml.org/lkml/2012/3/27/169

And: http://www.kernel.org/doc/Documentation/RCU/stallwarn.txt
2012-09-06 09:50:42 Muharem Hrnjadovic summary Old value: gemmicro01 lock-up New value: machine lock-up, upgrade kernel on MF cluster
2012-09-06 09:51:02 Muharem Hrnjadovic openquake: assignee Muharem Hrnjadovic (al-maisan)
2012-09-06 09:51:05 Muharem Hrnjadovic openquake: importance Medium High
2012-09-06 09:51:08 Muharem Hrnjadovic openquake: milestone 0.8.3
2013-03-11 14:03:31 Lars Butler openquake: status Confirmed Fix Committed
2013-03-11 14:03:34 Lars Butler openquake: status Fix Committed Fix Released