Prevent thermal shutdown during boot process

Bug #1906168 reported by Kai-Heng Feng
This bug affects 1 person
Affects                     Status         Importance  Assigned to  Milestone
HWE Next                    Fix Released   Undecided   Unassigned
linux (Ubuntu)              Fix Released   Critical    Unassigned
  Focal                     Won't Fix      Undecided   Unassigned
  Groovy                    Fix Released   Critical    Unassigned
  Hirsute                   Fix Released   Critical    Unassigned
linux-oem-5.10 (Ubuntu)     Invalid        Undecided   Unassigned
  Focal                     Fix Released   Critical    Unassigned
linux-oem-5.6 (Ubuntu)      Invalid        Undecided   Unassigned
  Focal                     Fix Released   Critical    Unassigned

Bug Description

[Impact]
Unexpected thermal shutdown at boot on Intel-based mobile workstations.

[Fix]
Since these thermal devices are not in an ACPI ThermalZone, the OS shouldn't
shut down the system.

These critical temperatures are for userspace to handle, so let the kernel
know it shouldn't handle them.
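
The idea can be illustrated with a small userspace model. This is only a sketch in Python, not kernel code: the names Zone, default_critical, userspace_critical and handle_temperature are hypothetical, and the structure is an assumption loosely based on the later changelog entry "thermal/core: Add critical and hot ops". A zone without its own critical handler falls back to a shutdown, while a zone that leaves critical temperatures to userspace only logs the event.

```python
class Zone:
    """Hypothetical stand-in for a thermal zone (not a kernel structure)."""

    def __init__(self, name, critical_temp_c, on_critical=None):
        self.name = name
        self.critical_temp_c = critical_temp_c
        # Optional per-zone handler; None means "use the default action".
        self.on_critical = on_critical


def default_critical(zone, log):
    """Default action in this model: request a system shutdown."""
    log.append(f"{zone.name}: critical temperature reached, shutting down")
    return "shutdown"


def userspace_critical(zone, log):
    """Handler for zones whose critical trips are left to userspace."""
    log.append(f"{zone.name}: critical temperature reached, left to userspace")
    return "ignored"


def handle_temperature(zone, temp_c, log):
    """Dispatch a temperature reading to the zone's critical handler."""
    if temp_c < zone.critical_temp_c:
        return "ok"
    handler = zone.on_critical or default_critical
    return handler(zone, log)


log = []
acpitz = Zone("acpitz", critical_temp_c=100)
int340x = Zone("int340x", critical_temp_c=100, on_critical=userspace_critical)
print(handle_temperature(acpitz, 105, log))
print(handle_temperature(int340x, 105, log))
```

In this model the acpitz zone still shuts the system down, which matches the protection described under [Where problems could occur].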

SRU for stable kernels will be sent after the fix is in upstream.

[Test]
Use a reboot stress test as a reproducer. There is a 5% chance of seeing a
surprising shutdown at boot.

With the fix applied, the thermal shutdown is no longer reproducible.
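
As a rough guide to how many reboots a verification run needs, the sketch below assumes each boot fails independently with the 5% probability quoted above. The independence assumption and the function names are mine, not part of the report. It computes the chance of seeing at least one shutdown in n reboots, and the smallest n for a given confidence.

```python
import math


def chance_of_failure(per_boot_rate, reboots):
    """Probability of at least one shutdown in `reboots` independent boots."""
    return 1.0 - (1.0 - per_boot_rate) ** reboots


def reboots_needed(per_boot_rate, confidence):
    """Smallest number of clean reboots that makes an unfixed kernel
    unlikely at the given confidence level."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - per_boot_rate))


print(reboots_needed(0.05, 0.95))
print(round(chance_of_failure(0.05, 59), 3))
```

Under these assumptions, a few dozen clean reboots are needed before "no longer reproducible" is a meaningful result; a handful of reboots would pass an unfixed kernel most of the time.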

[Where problems could occur]
For ACPI-based platforms, we still have "acpitz" to protect systems from
overheating. If these acpitz sensors don't work, the system could face a
real overheating issue.
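
One way to check that a machine still has an acpitz zone is to read the thermal sysfs entries. The following is a minimal sketch, assuming the usual layout under /sys/class/thermal (thermal_zone*/type and thermal_zone*/temp, with temp in millidegrees Celsius); the function names are illustrative only.

```python
from pathlib import Path


def list_thermal_zones(root="/sys/class/thermal"):
    """Return (zone name, type, temperature in Celsius or None) per zone."""
    zones = []
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
        except OSError:
            continue  # zone vanished or is unreadable; skip it
        try:
            # sysfs reports the temperature in millidegrees Celsius.
            temp_c = int((zone / "temp").read_text().strip()) / 1000.0
        except (OSError, ValueError):
            temp_c = None  # some zones refuse reads or report nothing
        zones.append((zone.name, zone_type, temp_c))
    return zones


def has_acpitz(zones):
    """True if at least one zone is of type 'acpitz'."""
    return any(zone_type == "acpitz" for _, zone_type, _ in zones)
```

Calling has_acpitz(list_thermal_zones()) on the affected hardware should return True; if it does not, the fix would leave that system without kernel-side overheating protection.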

affects: linux (Ubuntu) → linux-oem-5.6 (Ubuntu)
Changed in linux-oem-5.6 (Ubuntu):
status: New → Invalid
Changed in linux-oem-5.6 (Ubuntu Focal):
status: New → Confirmed
importance: Undecided → Critical
tags: added: oem-priority originate-from-1905514 stella
no longer affects: linux-oem-5.6 (Ubuntu Groovy)
no longer affects: linux-oem-5.6 (Ubuntu Hirsute)
Changed in linux (Ubuntu Focal):
status: New → Won't Fix
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1906168

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Timo Aaltonen (tjaalton)
Changed in linux-oem-5.6 (Ubuntu Focal):
status: Confirmed → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
tags: added: verification-done-focal
removed: verification-needed-focal
Changed in linux-oem-5.10 (Ubuntu):
status: New → Invalid
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.6 - 5.6.0-1036.39

---------------
linux-oem-5.6 (5.6.0-1036.39) focal; urgency=medium

  * focal/linux-oem-5.6: 5.6.0-1036.39 -proposed tracker (LP: #1906396)

  * Prevent thermal shutdown during boot process (LP: #1906168)
    - SAUCE: thermal: core: Add indication for userspace usage
    - SAUCE: thermal: int340x: Indicate userspace usage
    - SAUCE: thermal: intel: intel_pch_thermal: Indicate userspace usage

  * alsa/hda: The sound output is abnormal when the balance is on center after
    switch from the audio speaker to headset on a Dell AIO (LP: #1905808)
    - ALSA: hda/realtek - Fixed Dell AIO wrong sound tone

  * [SRU][OEM-5.6] UBUNTU: SAUCE: Fix brightness control on BOE 2270 panel
    (LP: #1904991)
    - SAUCE: drm/i915: Force DPCD backlight mode for BOE 2270 panel

  * Use ACPI S5 for reboot (LP: #1904225)
    - PM: ACPI: reboot: Use S5 for reboot

 -- Timo Aaltonen <email address hidden> Wed, 02 Dec 2020 09:55:46 +0200

Changed in linux-oem-5.6 (Ubuntu Focal):
status: Fix Committed → Fix Released
Timo Aaltonen (tjaalton) wrote :

what's the status for distro kernel / oem-5.10?

Kai-Heng Feng (kaihengfeng) wrote :

The proper fix just hit the thermal/testing branch; I'll backport the patches once they are in linux-next.

Changed in linux (Ubuntu Groovy):
status: New → Confirmed
Changed in linux (Ubuntu Hirsute):
status: Incomplete → Confirmed
Changed in linux-oem-5.10 (Ubuntu Focal):
status: New → Confirmed
Changed in linux (Ubuntu Groovy):
importance: Undecided → Critical
Changed in linux (Ubuntu Hirsute):
importance: Undecided → Critical
Changed in linux-oem-5.10 (Ubuntu Focal):
importance: Undecided → Critical
Timo Aaltonen (tjaalton)
Changed in linux-oem-5.10 (Ubuntu Focal):
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.10 - 5.10.0-1013.14

---------------
linux-oem-5.10 (5.10.0-1013.14) focal; urgency=medium

  * focal/linux-oem-5.10: 5.10.0-1013.14 -proposed tracker (LP: #1914024)

  * Fix no video output when boot up system with type-c port (LP: #1914020)
    - drm/i915/tgl: Fix typo during output setup

 -- Timo Aaltonen <email address hidden> Mon, 01 Feb 2021 12:47:03 +0200

Changed in linux-oem-5.10 (Ubuntu Focal):
status: Fix Committed → Fix Released
Changed in linux (Ubuntu Groovy):
status: Confirmed → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.10.0-14.15

---------------
linux (5.10.0-14.15) hirsute; urgency=medium

  * hirsute/linux: 5.10.0-14.15 -proposed tracker (LP: #1913724)

  * Restore palm ejection on multi-input devices (LP: #1913520)
    - HID: multitouch: Apply MT_QUIRK_CONFIDENCE quirk for multi-input devices

  * intel-hid is not loaded on new Intel platform (LP: #1907160)
    - platform/x86: intel-hid: add Rocket Lake ACPI device ID

  * Hirsute update: v5.10.11 upstream stable release (LP: #1913430)
    - scsi: target: tcmu: Fix use-after-free of se_cmd->priv
    - mtd: rawnand: gpmi: fix dst bit offset when extracting raw payload
    - mtd: rawnand: nandsim: Fix the logic when selecting Hamming soft ECC engine
    - i2c: tegra: Wait for config load atomically while in ISR
    - i2c: bpmp-tegra: Ignore unknown I2C_M flags
    - platform/x86: ideapad-laptop: Disable touchpad_switch for ELAN0634
    - ALSA: seq: oss: Fix missing error check in snd_seq_oss_synth_make_info()
    - ALSA: hda/realtek - Limit int mic boost on Acer Aspire E5-575T
    - ALSA: hda/via: Add minimum mute flag
    - crypto: xor - Fix divide error in do_xor_speed()
    - dm crypt: fix copy and paste bug in crypt_alloc_req_aead
    - ACPI: scan: Make acpi_bus_get_device() clear return pointer on error
    - btrfs: don't get an EINTR during drop_snapshot for reloc
    - btrfs: do not double free backref nodes on error
    - btrfs: fix lockdep splat in btrfs_recover_relocation
    - btrfs: don't clear ret in btrfs_start_dirty_block_groups
    - btrfs: send: fix invalid clone operations when cloning from the same file
      and root
    - fs: fix lazytime expiration handling in __writeback_single_inode()
    - pinctrl: ingenic: Fix JZ4760 support
    - mmc: core: don't initialize block size from ext_csd if not present
    - mmc: sdhci-of-dwcmshc: fix rpmb access
    - mmc: sdhci-xenon: fix 1.8v regulator stabilization
    - mmc: sdhci-brcmstb: Fix mmc timeout errors on S5 suspend
    - dm: avoid filesystem lookup in dm_get_dev_t()
    - dm integrity: fix a crash if "recalculate" used without "internal_hash"
    - dm integrity: conditionally disable "recalculate" feature
    - drm/atomic: put state on error path
    - drm/syncobj: Fix use-after-free
    - drm/amdgpu: remove gpu info firmware of green sardine
    - drm/amd/display: DCN2X Find Secondary Pipe properly in MPO + ODM Case
    - drm/i915/gt: Prevent use of engine->wa_ctx after error
    - drm/i915: Check for rq->hwsp validity after acquiring RCU lock
    - ASoC: Intel: haswell: Add missing pm_ops
    - ASoC: rt711: mutex between calibration and power state changes
    - SUNRPC: Handle TCP socket sends with kernel_sendpage() again
    - HID: sony: select CONFIG_CRC32
    - dm integrity: select CRYPTO_SKCIPHER
    - x86/hyperv: Fix kexec panic/hang issues
    - scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
    - scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback
    - scsi: qedi: Correct max length of CHAP secret
    - scsi: scsi_debug: Fix memleak in scsi_debug_init()
    - scsi: sd: Suppress spurious errors when WRITE SAME is being disabled
    - riscv: ...

Changed in linux (Ubuntu Hirsute):
status: Confirmed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-groovy
tags: added: verification-done-groovy
removed: verification-needed-groovy
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-45.51

---------------
linux (5.8.0-45.51) groovy; urgency=medium

  * groovy/linux: 5.8.0-45.51 -proposed tracker (LP: #1916143)

  * Please trust Canonical Livepatch Service kmod signing key (LP: #1898716)
    - [Config] enable CONFIG_MODVERSIONS=y
    - [Packaging] build canonical-certs.pem from branch/arch certs
    - [Config] add Canonical Livepatch Service key to SYSTEM_TRUSTED_KEYS
    - [Config] add ubuntu-drivers key to SYSTEM_TRUSTED_KEYS
    - [Config] Allow ASM_MODVERSIONS and MODULE_REL_CRCS

  * CVE-2021-20194
    - bpf, cgroup: Fix optlen WARN_ON_ONCE toctou
    - bpf, cgroup: Fix problematic bounds check

  * Missing device id for Intel TGL-H ISH [8086:43fc] in intel-ish-hid driver
    (LP: #1914543)
    - HID: intel-ish-hid: ipc: Add Tiger Lake H PCI device ID

  * Prevent thermal shutdown during boot process (LP: #1906168)
    - thermal/core: Emit a warning if the thermal zone is updated without ops
    - thermal/core: Add critical and hot ops
    - thermal/drivers/acpi: Use hot and critical ops
    - thermal/drivers/rcar: Remove notification usage
    - thermal: int340x: Fix unexpected shutdown at critical temperature
    - thermal: intel: pch: Fix unexpected shutdown at critical temperature

  * geneve overlay network on vlan interface broken with offload enabled
    (LP: #1914447)
    - net/mlx5e: Fix SWP offsets when vlan inserted by driver

  * Groovy update: upstream stable patchset 2021-02-11 (LP: #1915473)
    - net: cdc_ncm: correct overhead in delayed_ndp_size
    - net: hns3: fix the number of queues actually used by ARQ
    - net: hns3: fix a phy loopback fail issue
    - net: stmmac: dwmac-sun8i: Balance internal PHY resource references
    - net: stmmac: dwmac-sun8i: Balance internal PHY power
    - net: vlan: avoid leaks on register_vlan_dev() failures
    - net/sonic: Fix some resource leaks in error handling paths
    - net: ipv6: fib: flush exceptions when purging route
    - tools: selftests: add test for changing routes with PTMU exceptions
    - net: fix pmtu check in nopmtudisc mode
    - net: ip: always refragment ip defragmented packets
    - octeontx2-af: fix memory leak of lmac and lmac->name
    - nexthop: Fix off-by-one error in error path
    - nexthop: Unlink nexthop group entry in error path
    - s390/qeth: fix L2 header access in qeth_l3_osa_features_check()
    - net: dsa: lantiq_gswip: Exclude RMII from modes that report 1 GbE
    - net/mlx5: Use port_num 1 instead of 0 when delete a RoCE address
    - net/mlx5e: ethtool, Fix restriction of autoneg with 56G
    - chtls: Fix hardware tid leak
    - chtls: Remove invalid set_tcb call
    - chtls: Fix panic when route to peer not configured
    - chtls: Replace skb_dequeue with skb_peek
    - chtls: Added a check to avoid NULL pointer dereference
    - chtls: Fix chtls resources release sequence
    - HID: wacom: Fix memory leakage caused by kfifo_alloc
    - ARM: OMAP2+: omap_device: fix idling of devices during probe
    - i2c: sprd: use a specific timeout to avoid system hang up issue
    - dmaengine: dw-edma: Fix use after free in dw_edma_alloc_chunk()
    - can: tcan4x5x: fix bittiming const...

Changed in linux (Ubuntu Groovy):
status: Fix Committed → Fix Released
Timo Aaltonen (tjaalton)
Changed in hwe-next:
status: New → Fix Released