Raspberry Pi 3B hangs - dev_pm_opp_set_rate: failed to find current OPP, Failed to get throttled, Failed to change plib frequency; mmc timeout waiting for hardware interrupt

Bug #1889637 reported by stellarpower
This bug affects 2 people
Affects: linux (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

### uname -a (64-bit ARM, official image):
`Linux ubuntu 5.4.0-1015-raspi #15-Ubuntu SMP Fri Jul 10 05:34:24 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux`
### LSB release (Ubuntu *Server*, focal):
Description: Ubuntu 20.04.1 LTS
### Interesting packages installed
- zfs-dkms (with initramfs support) @ 0.8.3-1ubuntu12.2
  * spl-dkms @ 0.8.3-1ubuntu12.2
- dphys-swapfile
### Hardware model:
Raspberry Pi 3 Model B
- 32 GiB SD card with root partition
  * had a swap partition; now unused
  * migrated to dphys-swapfile
- Attached 32 GiB USB stick as zpool for storage (not root FS)
- Current PSU reportedly outputs 2.4A supply for the Pi
  * Still have occasional undervolt warnings (formally requires 2.5A)
  * Lightning indicator not present however
- Connected over wireless networking

## Issue
- At some point under significant computational load, the machine appears to freeze.
  * I usually log in headless via ssh, so from the outside the machine is simply frozen and I need to pull the power cable
- Connecting an HDMI monitor shows the following messages, in a different order each time:

```terminal
cpu cpu0: dev_pm_opp_set_rate: failed to find current OPP for freq 9,223,372,036,854,775,698 ({illegible on my photograph, presumably -110})
hwmon hwmon1: Failed to get throttled (-110)
raspberrypi-clk firmware clocks: Failed to change plib frequency: -110
mmc0: timeout waiting for hardware interrupt
# mmc0 would be the root partition

### ... typically later on in the output

rcu: INFO: rcu_sched detected stalls on CPU/tasks
rcu: $1-...0: (1 GPs behind) idle=.../1/0x40000{more 0s...}02 softirq=66377/66378 {or 26106/26107} fqs={this value varies}
INFO: task kworker/{...} blocked for more than 120 seconds
   TAINTED: P WC OE 5.4.0-1015-raspi #15-ubuntu
watchdog: BUG: soft lockup - CPU #3 stuck for 22s!

```

The OPP frequency above looks to me like it may be the cause of the issue. I added the commas to the output myself, but the value appears to be rubbish. [This mailing list thread](https://lkml.org/lkml/2020/7/24/683), which I found whilst searching for terms from the messages, appears to back up my belief that we should be seeing a sensible CPU frequency here, expressed as an integer in Hz. The value above would be 9.2 EHz if Hertz is the base unit, and higher still if it is kHz, MHz or GHz. My best guess is that this value has come up somewhere as garbage and the system understandably fails to scale the clock speed, with the resulting crashes presumably following from that.
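
The arithmetic behind that suspicion can be checked directly. The printed value is exactly 2^63 − 110, and 110 is the Linux errno `ETIMEDOUT`, the same code that appears as `-110` in the other messages. The sketch below only verifies that arithmetic; how an error code would end up in the rate is not established here, and the names `REPORTED_FREQ`, `ETIMEDOUT_LINUX` and `offset_from_2_63` are introduced purely for illustration.

```python
# Value printed by dev_pm_opp_set_rate in the report, with the commas removed.
REPORTED_FREQ = 9_223_372_036_854_775_698

# ETIMEDOUT on Linux. Hard-coded on purpose: errno numbers differ between
# platforms, so errno.ETIMEDOUT is not 110 everywhere.
ETIMEDOUT_LINUX = 110

# Nominal maximum CPU clock of a Raspberry Pi 3 Model B, in Hz.
PI3B_MAX_HZ = 1_200_000_000


def offset_from_2_63(value: int) -> int:
    """Return how far `value` lies below 2**63."""
    return 2**63 - value


if __name__ == "__main__":
    offset = offset_from_2_63(REPORTED_FREQ)
    print(f"2**63 - reported value = {offset}")
    print(f"equals ETIMEDOUT (110): {offset == ETIMEDOUT_LINUX}")
    print(f"reported / Pi 3B max clock = {REPORTED_FREQ / PI3B_MAX_HZ:.3g}")
```

If the match is not a coincidence, the "frequency" would be a timed-out firmware request leaking into the rate rather than random garbage, which fits the other `-110` failures logged at the same time.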

Beyond this point there is no kernel panic, but the machine is locked up from the outside: it does not respond to the USB keyboard's NumLock and is invisible on the network, while more and more errors are gradually printed to the console on the HDMI display, the most notable being that the SD card is not responding.

Just before encountering this issue I had added a swap partition to the SD card, as I had none by default and the system seemed to hang when it was presumably sending bad_allocs to userland processes after failing to allocate memory. As the SD card was mentioned in the errors, I tried a variety of power supplies (I was getting several undervolt warnings) and eventually removed the swap partition in favour of a swapfile managed by `dphys-swapfile`, knowing that the way the Pi accesses the SD card is somewhat different from a typical machine. Neither change has resolved the issue, which is further evidence that the frequency scaling may well be the primary problem and the rest is simply the carnage that ensues.

## Steps to Reproduce
- Seems to happen sporadically when the machine is under stress, within 5-25 minutes
- Currently I am trying to set up a rootless docker compose file
  * Attempting to pull the images eventually leads to the issue
  * The images are being downloaded to the zpool on the USB stick and *not* the SD card
- The system seems to hang initially waiting on the SD card to respond to an IRQ; however, I believe the CPU scaling message points to the root cause
- None of the important messages appear in the `syslog`; I need an external HDMI monitor to see the kernel ring buffer output on screen

## Links

- [Related AskUbuntu question](https://askubuntu.com/questions/1241412/ubuntu-20-04-lts-hangs-with-error-hwmon1-failed-to-get-throttled-110)
- [Potentially related bug - the frequency issue seems to be the same, however the specific cause and a workaround are different](https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/1875148)

## Extra
- Attaching /proc/cpuinfo
- Please let me know if any more diagnostics are required; I would use hardinfo or inxi, but both want to install large parts of X, which I don't want to do

stellarpower (stellarpower) wrote :
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1889637/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → linux (Ubuntu)
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1889637

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: focal
stellarpower (stellarpower) wrote :
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Dave Jones (waveform) wrote :

I'm afraid if you're getting undervolt warnings that's almost certainly the cause of any serious issues like kernel hangs. The lightning bolt not being visible simply means that the undervolt condition is not sustained, but nevertheless any such warnings indicate that your power supply is dropping below 4.7V (usually under load). Before investigating this further, we'd need to eliminate that as a potential cause.
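
One way to check whether under-voltage has occurred since boot is the firmware's throttle bitmask, which `vcgencmd get_throttled` prints in the form `throttled=0x...`. The following is a minimal sketch of a decoder for that value. It assumes the bit layout described in the Raspberry Pi documentation (bits 0-3 for the current state, bits 16-19 for "has occurred since boot") and that `vcgencmd` is available on the image; the names `THROTTLE_FLAGS` and `decode_throttled` are hypothetical.

```python
from typing import List

# Bit positions in the get_throttled mask (assumed from the Raspberry Pi docs).
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output: str) -> List[str]:
    """Decode a line such as 'throttled=0x50005' into readable flags."""
    _, sep, hex_value = output.strip().partition("=")
    if not sep or not hex_value:
        raise ValueError(f"unexpected get_throttled output: {output!r}")
    mask = int(hex_value, 16)  # int() accepts the '0x' prefix with base 16
    return [label for bit, label in THROTTLE_FLAGS.items() if mask & (1 << bit)]


if __name__ == "__main__":
    # Example input only; on the Pi, paste the real output of
    # `vcgencmd get_throttled` here instead.
    for flag in decode_throttled("throttled=0x50005"):
        print(flag)
```

A result containing "under-voltage has occurred" while "under-voltage detected" is absent would match the situation described here: brief dips under load that are not sustained long enough to show the lightning bolt.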

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired