Comment 0 for bug 1042172

Muharem Hrnjadovic (al-maisan) wrote : gemmicro01 lock-up

gemmicro01 became unresponsive a week ago. Today Urs was finally granted access to the data center and we could take a look at the machine. We hooked up a console and a keyboard, but the machine was hung and we could not log in.

We also took a quick look at gemmicro02 and noticed an error on its console. It read approximately as follows:

    gemmicro02 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)

The same error was found in /var/log/syslog on several machines, e.g.:

    bigstar04.log:Aug 27 09:04:54 bigstar04 kernel: [2986792.080036] INFO: rcu_bh detected stall on CPU 35 (t=0 jiffies)
    gemmicro01.log:Aug 17 08:16:39 gemmicro01 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)
    gemmicro01.log:Aug 17 18:06:45 gemmicro01 kernel: [1582229.668026] INFO: rcu_bh detected stall on CPU 46 (t=0 jiffies)
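
To check the remaining machines for the same message, something along the lines of the following sketch could be run over the collected syslog files. It is only an illustration: the names STALL_RE and find_stalls are made up here, and the pattern assumes the line layout shown in the excerpts above (optional "file:" prefix as printed by grep, syslog timestamp, host name, kernel uptime in brackets). Lines in other layouts, such as console output without a timestamp, are not matched.

```python
import re

# Assumed layout, taken from the excerpts above:
#   [file:]Mon DD HH:MM:SS host kernel: [uptime] INFO: rcu_* detected stall on CPU N
STALL_RE = re.compile(
    r"^(?:(?P<file>[^:\s]+):)?"                      # optional grep file prefix
    r"(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"  # syslog timestamp
    r"(?P<host>\S+)\s+kernel:\s+"                    # host name
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+"                # seconds since boot
    r"INFO: (?P<flavor>rcu\w*) detected stall on CPU (?P<cpu>\d+)"
)


def find_stalls(lines):
    """Return one dict per line that matches the stall pattern."""
    hits = []
    for line in lines:
        match = STALL_RE.match(line.strip())
        if match:
            hits.append({
                "host": match.group("host"),
                "stamp": match.group("stamp"),
                "uptime": float(match.group("uptime")),
                "flavor": match.group("flavor"),
                "cpu": int(match.group("cpu")),
            })
    return hits


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python find_stalls.py bigstar04.log gemmicro01.log
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for hit in find_stalls(handle):
                days_up = hit["uptime"] / 86400.0
                print("%s %s CPU %d (up %.1f days)"
                      % (hit["host"], hit["stamp"], hit["cpu"], days_up))
```

Printing the uptime next to each hit makes it easier to see whether the stalls only show up after the machines have been running for a long time.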

Whether this is what caused gemmicro01 to lock up remains to be determined.

These kinds of errors may have been fixed in the 3.4 kernel (see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1003081 ).

Also please see: https://lkml.org/lkml/2012/3/27/169