machine lock-up, upgrade kernel on MF cluster
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenQuake (deprecated) | Fix Released | High | Muharem Hrnjadovic | | |
Bug Description
gemmicro01 became unresponsive a week ago. Today Urs was finally granted access to the data center and we could take a look at the machine. We hooked up a console and a keyboard, but the machine was hung and we could not log in.
We also took a quick look at gemmicro02 and noticed an error on the console. It read approximately as follows:
gemmicro02 kernel: [1546823.628018] INFO: rcu_bh detected stall on CPU 6 (t=0 jiffies)
The same error was found in the /var/log/syslog of other machines, e.g.
bigstar04.
gemmicro01.
Whether this is what caused gemmicro01 to lock up remains to be determined.
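To check which machines logged this message, the syslog files can be scanned for it. The following is a minimal sketch, not a tested tool: the pattern is modelled only on the single console line quoted above, the function name `find_rcu_stalls` is made up for illustration, and other kernel versions may word the message differently.

```python
import re

# Pattern modelled on the one console line quoted in this report.
# Assumption: other kernels may phrase the stall message differently.
STALL_RE = re.compile(
    r"(?P<host>\S+) kernel: \[\s*(?P<uptime>\d+\.\d+)\] "
    r"INFO: (?P<flavor>rcu\w*) detected stall on CPU (?P<cpu>\d+)"
)


def find_rcu_stalls(lines):
    """Return one dict per line that looks like an RCU stall report."""
    hits = []
    for line in lines:
        match = STALL_RE.search(line)
        if match is None:
            continue
        hits.append(
            {
                "host": match.group("host"),
                "uptime": float(match.group("uptime")),
                "flavor": match.group("flavor"),
                "cpu": int(match.group("cpu")),
            }
        )
    return hits
```

On a cluster machine it could be called as `find_rcu_stalls(open("/var/log/syslog", errors="replace"))`; reading that file usually requires suitable permissions.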
There is a possibility that these kinds of errors were fixed in the 3.4 kernel (see https:/
Also please see: https:/
And: http://
Changed in openquake: | |
status: | New → Confirmed |
importance: | Undecided → Medium |
tags: | added: devop mfcluster |
description: | updated |
Changed in openquake: | |
status: | Confirmed → Fix Committed |
status: | Fix Committed → Fix Released |
The kernel will be upgraded on all MF cluster machines in order to prevent this issue from occurring.
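After the upgrade, each machine's running kernel can be compared against a minimum version. This is a rough sketch under assumptions: the 3.4 threshold comes only from the possibility mentioned in the description and is not a confirmed fix version, and `kernel_at_least` is a hypothetical helper name.

```python
import re


def kernel_at_least(release, minimum=(3, 4)):
    """Return True if a release string like '3.2.0-23-generic' is >= minimum.

    Only the leading major.minor numbers are compared. The default of
    (3, 4) is an assumption taken from this report, not a verified fix.
    """
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return (int(match.group(1)), int(match.group(2))) >= minimum
```

On a given machine the release string can be obtained with `platform.release()` from the standard library, or from `uname -r`.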