iTCO_wdt: Unexpected close, not stopping watchdog

Bug #1464073 reported by johnnyliao
This bug affects 1 person
Affects: stress-ng (Ubuntu)
Status: Fix Released
Importance: Medium
Assigned to: Colin Ian King

Bug Description

When I run the stress-ng test, the system automatically reboots after 60 s. dmesg shows one error: “iTCO_wdt: Unexpected close, not stopping watchdog”.
I have already disabled IPMI, but the problem persists. According to the system BIOS, the BIOS does not fully control iTCO_wdt; the OS does. Other stress programs do not show this problem on the same hardware, environment, and operation.

I suspect stress-ng sets the TCO_EN bit and forces the system to reboot after a fixed time (always 60 s). Any suggestions or further advice would be highly appreciated.
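
For context on the dmesg line: under the Linux watchdog device API, a driver that supports "magic close" keeps the timer running unless the character 'V' is written just before the device is closed, and iTCO_wdt logs "Unexpected close, not stopping watchdog" when that write is missing. The sketch below illustrates a clean close under that assumption. The function names are hypothetical, and the thread does not confirm which stress-ng code path opened the device.

```python
import os

MAGIC_CLOSE = b"V"  # magic character defined by the Linux watchdog API


def close_watchdog_cleanly(fd: int) -> None:
    """Write the magic character, then close, so the driver may stop the timer."""
    os.write(fd, MAGIC_CLOSE)
    os.close(fd)


def open_and_disarm(path: str) -> None:
    """Open a watchdog-style device node and close it with the magic character.

    Opening a real watchdog device starts the countdown. Do not point this at
    /dev/watchdog unless a reboot is acceptable; a kernel built with the
    "nowayout" option will not stop the timer even after the magic close.
    """
    fd = os.open(path, os.O_WRONLY)
    close_watchdog_cleanly(fd)
```

A program that opens the device and exits without this step leaves the timer armed, which would match a reboot after a fixed interval.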

Colin Ian King (colin-king) wrote :

Is there any specific stress test that triggers this issue?

johnnyliao (johnnyliao) wrote :

Hi Colin,
I only ran stress-ng. Once it is running, the system reboots after a few cycles.
I can easily reproduce the problem on most Haswell platform boards.
If I disable "iTCO_wdt" or rmmod the iTCO_wdt driver from the kernel, the problem goes away.

Since TCO is controlled by the OS, not the BIOS, our engineering team has no idea how this happens.
The reboot only happens while running the stress-ng test program. I verified other system stress tests and burn-in utilities; none of them have this problem.
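
Since removing the iTCO_wdt module makes the reboot go away, it can help to confirm whether the module is loaded before a test run. This is a minimal sketch that assumes the usual /proc/modules layout, where the first whitespace-separated field of each line is the module name. The helper names are hypothetical.

```python
def module_loaded(name: str, modules_text: str) -> bool:
    """Return True if `name` is the first field of any /proc/modules-style line."""
    for line in modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            return True
    return False


def itco_wdt_loaded(proc_modules: str = "/proc/modules") -> bool:
    """Check the running kernel's module list for iTCO_wdt."""
    with open(proc_modules) as f:
        return module_loaded("iTCO_wdt", f.read())
```

Matching on the whole first field avoids a false positive from a related module such as iTCO_vendor_support.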

Colin Ian King (colin-king) wrote :

My guess is that the "stress-ng --sysfs" test is causing the problem. Can you run:

stress-ng --sysfs 1

and see if that triggers it?

Changed in stress-ng (Ubuntu):
importance: Undecided → Medium
assignee: nobody → Colin Ian King (colin-king)
status: New → Incomplete
johnnyliao (johnnyliao) wrote :

Hi Colin,
Yes. With the parameter "--sysfs 1", iTCO_wdt is not triggered and the server no longer reboots itself. I appreciate the support. One question about "--sysfs 1": what does it mean? And why does the problem occur if I do not add "--sysfs"?

Colin Ian King (colin-king) wrote :

OK, are you running this as a normal user or as root?

johnnyliao (johnnyliao) wrote :

Hi Colin,
I was running the command as root.

Colin Ian King (colin-king) wrote :

...and my guess is that it does not occur when running it with normal user privileges. Do you mind just checking for that too? Thanks

johnnyliao (johnnyliao) wrote :

Yes. I created a test account and did not get a server reboot in 30 minutes of operation. With the root account, it reboots within 2 to 3 minutes. However, I saw lots of "permission denied" errors since I did not grant full permissions to the test account.
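
The root versus normal-user difference can be illustrated with a small probe that walks a directory tree and counts which files the current user can open. This is only a sketch of the general idea: the thread does not say how stress-ng accesses these files or what the fix changed, and `probe_readable` is a hypothetical helper.

```python
import os


def probe_readable(root: str) -> dict:
    """Walk `root`, try to read one byte from each file, and tally the outcomes."""
    counts = {"read": 0, "denied": 0, "other_error": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    f.read(1)
                counts["read"] += 1
            except PermissionError:
                # A normal user is refused here; root usually is not.
                counts["denied"] += 1
            except OSError:
                # Some kernel-backed files fail for other reasons (EIO, EINVAL, ...).
                counts["other_error"] += 1
    return counts
```

Run on an ordinary directory, this is harmless. On a tree such as /sys it should be used with care as root, since the report above shows that root access on this hardware can reach files with side effects, and some files there may block on read.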

Colin Ian King (colin-king) wrote :

Thanks, I'm uploading a fix right now; it should land in stress-ng 0.04.06 in a day or so.

Colin Ian King (colin-king) wrote :
Changed in stress-ng (Ubuntu):
status: Incomplete → Fix Committed
johnnyliao (johnnyliao) wrote :

Thanks for the help, Colin. You have really been a big help here. :)

Changed in stress-ng (Ubuntu):
status: Fix Committed → Fix Released