gemmicro01 became unresponsive a week ago. Today Urs was finally granted access to the data center and we could take a look at the machine. We hooked up a console and a keyboard, but the machine was hung and we could not log in.
We also took a quick look at gemmicro01's console and noticed an error that read approximately as follows:
gemmicro02 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)
The same error was found in the /var/log/syslog files of several machines, e.g.:
bigstar04.log:Aug 27 09:04:54 bigstar04 kernel: [2986792.080036] INFO: rcu_bh detected stall on CPU 35 (t=0 jiffies)
gemmicro01.log:Aug 17 08:16:39 gemmicro01 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)
gemmicro01.log:Aug 17 18:06:45 gemmicro01 kernel: [1582229.668026] INFO: rcu_bh detected stall on CPU 46 (t=0 jiffies)
Whether this is what caused gemmicro01 to lock up remains to be determined.
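To see how widespread these stalls are, the syslogs can be scanned for the message and grouped by host. The following is a minimal Python sketch, assuming plain-text (not gzipped) syslog files in the standard `Mon DD HH:MM:SS host kernel: [uptime] ...` layout shown above, optionally prefixed with `file.log:` as `grep` prints when searching several files. The script name `rcu_stalls.py` and its functions are hypothetical. It also converts the bracketed kernel timestamp (seconds since boot) into days of uptime; for example, [1546823.628018] is about 17.9 days.

```python
import re
import sys
from collections import defaultdict

# Matches lines such as (the "name.log:" prefix is optional, as added by grep):
#   gemmicro01.log:Aug 17 08:16:39 gemmicro01 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)
STALL_RE = re.compile(
    r"^(?:(?P<file>[^:\s]+):)?"                      # optional grep file prefix
    r"(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+"    # syslog date and time
    r"(?P<host>\S+)\s+kernel:\s+"
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+"                # kernel timestamp: seconds since boot
    r"INFO:\s+(?P<flavor>rcu_\w+)\s+detected stall on CPU\s+(?P<cpu>\d+)"
)


def parse_stall(line):
    """Return a dict describing one RCU stall message, or None if the line is not one."""
    m = STALL_RE.search(line)
    if m is None:
        return None
    return {
        "host": m.group("host"),
        "stamp": m.group("stamp"),
        "cpu": int(m.group("cpu")),
        "flavor": m.group("flavor"),
        # Convert seconds since boot into days of uptime.
        "uptime_days": float(m.group("uptime")) / 86400,
    }


def summarize(lines):
    """Group parsed stall reports by host name, preserving input order."""
    by_host = defaultdict(list)
    for line in lines:
        rec = parse_stall(line)
        if rec is not None:
            by_host[rec["host"]].append(rec)
    return dict(by_host)


def main(paths):
    lines = []
    for path in paths:
        with open(path, errors="replace") as fh:
            lines.extend(fh)
    for host, recs in sorted(summarize(lines).items()):
        cpus = sorted({r["cpu"] for r in recs})
        print(f"{host}: {len(recs)} stall(s) on CPU(s) {cpus}")
        for r in recs:
            print(f"  {r['stamp']}  {r['flavor']}  CPU {r['cpu']}  uptime {r['uptime_days']:.1f} d")


if __name__ == "__main__":
    main(sys.argv[1:])
```

For example, `python rcu_stalls.py bigstar04.log gemmicro01.log` would list each host's stall count, the CPUs involved and the uptime at each stall. Note that the console line above, which has no syslog date, is not matched.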
These errors may have been fixed in the 3.4 kernel (see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1003081).
See also: https://lkml.org/lkml/2012/3/27/169
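Since the 3.4 fix is only a possibility, one first step could be to check which machines still run an older kernel. Below is a minimal Python sketch, assuming the release string begins with `major.minor` as Ubuntu kernel releases do (a format like `3.2.0-29-generic`). The 3.4 threshold comes from the bug report referenced above and is not a confirmed fix; the function names are hypothetical.

```python
import platform
import re

# Hypothetical threshold: kernel 3.4 *may* contain the fix for these RCU stalls.
SUSPECTED_FIX = (3, 4)


def kernel_version(release):
    """Extract (major, minor) from a release string such as '3.2.0-29-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def predates_suspected_fix(release, fix=SUSPECTED_FIX):
    """True if the kernel is older than the version suspected to contain the fix."""
    # Compare integer tuples, so that 3.13 counts as newer than 3.4.
    return kernel_version(release) < fix


if __name__ == "__main__":
    release = platform.release()  # on Linux this is the same string as `uname -r`
    try:
        older = predates_suspected_fix(release)
    except ValueError as exc:
        print(exc)
    else:
        status = "older than" if older else "at or newer than"
        print(f"{platform.node()}: kernel {release} is {status} "
              f"{SUSPECTED_FIX[0]}.{SUSPECTED_FIX[1]}")
```

Comparing `(major, minor)` tuples instead of raw strings avoids the trap where `"3.13" < "3.4"` holds lexicographically.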