Unable to reboot ARM64 node with 5.15 realtime

Bug #1946486 reported by Po-Hsu Lin
This bug affects 1 person
Affects              Status        Importance  Assigned to       Milestone
ubuntu-kernel-tests  New           Undecided   Unassigned
linux (Ubuntu)       Fix Released  Medium      Joseph Salisbury
Hirsute              Invalid       Medium      Joseph Salisbury

Bug Description

It looks like this issue affects some ARM64 bare-metal nodes:
  * helo-kernel: 2 failures out of 2 attempts
  * kuzzle: 11 failures out of 13 attempts (as far as I can recall, 2 of those were deployments that failed early, before the system was deployed at all, so we can probably count it as 9 failures)

The boot test fails because the system is unable to reboot with the 5.11.0-27-realtime kernel. Here are the steps (integrated with the deployment script):

1. Deploy this node with Hirsute
2. Install the 5.11.0-27-realtime kernel and set the system to boot it by using boot-kernel-simple in ckct. Reboot.
3. The system boots with 5.11.0-27-realtime. Reboot again to make sure it's OK.

It seems the system gets stuck at the last reboot attempt in step 3.

Tags: hirsute
Po-Hsu Lin (cypressyew) wrote :

I tried to monitor this on node kuzzle. When it reboots in step 3, the console gets disconnected (unless that is caused by someone else trying to connect to the console), and I can't get anything from it.

[ OK ] Reached target Reboot.
SOL session closed by BMC

Reconnecting to the console does not print any further info.

Po-Hsu Lin (cypressyew) wrote :

On the console of helo-kernel, the system hangs in step 3 with:
[ OK ] Finished Reboot.
[ OK ] Reached target Reboot.
(no further update from this point)
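
The hang pattern above (console output ends at the reboot target and nothing follows) could be checked automatically against a captured console log. This is only a minimal sketch: the function name and the idea of scanning a saved log are assumptions for illustration and are not part of the existing test scripts. It does not cover the kuzzle case, where the last line is the BMC closing the SOL session.

```python
import re

# Strip ANSI colour codes that systemd may emit around "[ OK ]".
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
# Marker taken from the console output quoted in this bug.
REBOOT_MARKER = "Reached target Reboot."


def hung_after_reboot_target(console_text: str) -> bool:
    """Return True if the last non-empty console line is the reboot-target marker.

    Hypothetical helper: it only inspects captured text and does not
    talk to the BMC or the node itself.
    """
    lines = [ANSI_RE.sub("", line).strip() for line in console_text.splitlines()]
    lines = [line for line in lines if line]
    return bool(lines) and lines[-1].endswith(REBOOT_MARKER)


if __name__ == "__main__":
    sample = "[ OK ] Finished Reboot.\n[ OK ] Reached target Reboot.\n"
    print(hung_after_reboot_target(sample))
```

A caller would capture the console for some timeout after issuing the reboot and treat a True result as a suspected hang. A healthy reboot should produce further firmware or kernel output after the marker.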

Changed in linux (Ubuntu Hirsute):
status: New → Incomplete
status: Incomplete → Confirmed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1946486

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: hirsute
Sean Feole (sfeole) wrote : Re: Unable to reboot ARM64 node with 5.11.0-27-realtime

After a bit of wrestling with the arm64 Ampere system, I was able to reproduce the issue outlined in this bug.

The console log is attached. The system initially boots into the 5.11.0-27-realtime kernel. However, when a "reboot" command is issued, the host appears to sit idle and will not warm reset.

On the console of helo-kernel, the system hangs in step 3 with:
[ OK ] Finished Reboot.
[ OK ] Reached target Reboot.
(no further update from this point)

No other console output appears on the host.

Sean Feole (sfeole) wrote :
Kleber Sacilotto de Souza (kleber-souza) wrote :

This issue has been happening for a long time and it's not a regression.

The boot failure can be found with helo-kernel at least since SRU cycle 2021.05.31. As the tests passed on other arm64 nodes, I believe this issue had been ignored, but now that more arm64 nodes are being used for the tests, the issue has become more apparent.

Changed in linux (Ubuntu Hirsute):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu):
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Hirsute):
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Undecided → Medium
Joseph Salisbury (jsalisbury) wrote :

The 5.11 kernel will not support realtime, so I am updating the bug to reflect Focal and 5.15.

Changed in linux (Ubuntu):
status: Incomplete → In Progress
Changed in linux (Ubuntu Hirsute):
status: Confirmed → Invalid
summary: - Unable to reboot ARM64 node with 5.11.0-27-realtime
+ Unable to reboot ARM64 node with 5.15 realtime
Changed in linux (Ubuntu):
status: In Progress → Fix Released