2012-08-27 10:33:49 |
Muharem Hrnjadovic |
description |
gemmicro01 became unresponsive a week ago. Today Urs was finally granted access to the data center and we could take a look at the machine. We hooked up a console and a keyboard, but the machine was hung and we could not log in.
We also took a quick look at gemmicro01 and noticed an error on the console. It read approximately as follows:
gemmicro02 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)
The same error was found in the /var/log/syslog of other machines, e.g.:
bigstar04.log:Aug 27 09:04:54 bigstar04 kernel: [2986792.080036] INFO: rcu_bh detected stall on CPU 35 (t=0 jiffies)
gemmicro01.log:Aug 17 08:16:39 gemmicro01 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)
gemmicro01.log:Aug 17 18:06:45 gemmicro01 kernel: [1582229.668026] INFO: rcu_bh detected stall on CPU 46 (t=0 jiffies)
Whether this is what caused gemmicro01 to lock up remains to be determined.
These kinds of errors may have been fixed in the 3.4 kernel (see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1003081).
Also please see: https://lkml.org/lkml/2012/3/27/169 |
gemmicro01 became unresponsive a week ago. Today Urs was finally granted access to the data center and we could take a look at the machine. We hooked up a console and a keyboard, but the machine was hung and we could not log in.
We also took a quick look at gemmicro01 and noticed an error on the console. It read approximately as follows:
gemmicro02 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)
The same error was found in the /var/log/syslog of other machines, e.g.:
bigstar04.log:Aug 27 09:04:54 bigstar04 kernel: [2986792.080036] INFO: rcu_bh detected stall on CPU 35 (t=0 jiffies)
gemmicro01.log:Aug 17 08:16:39 gemmicro01 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)
gemmicro01.log:Aug 17 18:06:45 gemmicro01 kernel: [1582229.668026] INFO: rcu_bh detected stall on CPU 46 (t=0 jiffies)
Whether this is what caused gemmicro01 to lock up remains to be determined.
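To see how widespread the stall messages are, the collected syslog copies could be scanned with a small script along these lines. This is only a sketch: the regular expression assumes the exact line format quoted above, and the names STALL_RE, find_stalls and SAMPLE are made up for illustration.

```python
import re
from collections import Counter

# Matches lines like:
#   "<host> kernel: [<uptime>] INFO: rcu_bh detected stall on CPU <n> ..."
# Assumption: the format is the one seen in the excerpts quoted above.
STALL_RE = re.compile(
    r"(?P<host>\S+) kernel: \[\s*(?P<uptime>\d+\.\d+)\] "
    r"INFO: (?P<flavor>\w+) detected stall on CPU (?P<cpu>\d+)"
)


def find_stalls(lines):
    """Return (host, uptime_seconds, rcu_flavor, cpu) for each stall line."""
    stalls = []
    for line in lines:
        m = STALL_RE.search(line)
        if m:
            stalls.append(
                (m["host"], float(m["uptime"]), m["flavor"], int(m["cpu"]))
            )
    return stalls


# The three log lines quoted in this report.
SAMPLE = [
    "bigstar04.log:Aug 27 09:04:54 bigstar04 kernel: [2986792.080036] "
    "INFO: rcu_bh detected stall on CPU 35 (t=0 jiffies)",
    "gemmicro01.log:Aug 17 08:16:39 gemmicro01 kernel: [1546823.628018] "
    "INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)",
    "gemmicro01.log:Aug 17 18:06:45 gemmicro01 kernel: [1582229.668026] "
    "INFO: rcu_bh detected stall on CPU 46 (t=0 jiffies)",
]

if __name__ == "__main__":
    stalls = find_stalls(SAMPLE)
    for host, uptime, flavor, cpu in stalls:
        print(f"{host}: {flavor} stall on CPU {cpu} at uptime {uptime:.0f}s")
    # Count stall messages per host.
    print(Counter(host for host, *_ in stalls))
```

Feeding it the full log files instead of SAMPLE would give a per-machine count of stall messages, which might help decide which hosts to look at first.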
These kinds of errors may have been fixed in the 3.4 kernel (see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1003081).
Also please see: https://lkml.org/lkml/2012/3/27/169
And: http://www.kernel.org/doc/Documentation/RCU/stallwarn.txt |
|