Comment 33 for bug 1910562

munbi (gabriele) wrote :

Hi Sergio. First of all, thank you (and everyone else involved) for taking the time to investigate this issue.

> Could you please try to run the command several times, specifying the different chip

After reading the man page, I could not find out how to specify a particular chip with the 'sensors' command.
But I'm sure the problem happens when sensors tries to read the temperature from the AMD discrete graphics card, because when the console output of sensors reaches this section during the scan:

amdgpu-pci-0100
Adapter: PCI adapter
[ --> HERE IT HANGS FOR 2 SECS <-- ]
vddgfx: 1.05 V
edge: +42.0°C (crit = +94.0°C, hyst = -273.1°C)
power1: 7.11 W (cap = 35.00 W)

it hangs for a couple of seconds and two things happen:
1. the fans start spinning
2. I see DRM-related messages, including errors, appearing in the dmesg log:

[ 41.397707] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 42.609649] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* dce110_edp_wait_for_hpd_ready: wait timed out!
[ 43.213733] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* dce110_edp_wait_for_hpd_ready: wait timed out!
[ 43.242390] [drm] UVD and UVD ENC initialized successfully.
[ 43.352370] [drm] VCE initialized successfully.
[ 43.358578] amdgpu 0000:01:00.0: [drm] Cannot find any crtc or sizes
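
To check whether this burst of messages lines up with the roughly 2-second stall, the timestamps can be compared with a small script. This is only a sketch: the function names (parse_dmesg, burst_duration) are made up for illustration, and the sample is simply the excerpt pasted above.

```python
import re

# Matches lines such as "[ 41.397707] [drm] ..." and captures timestamp and message.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")

SAMPLE = """\
[ 41.397707] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 42.609649] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* dce110_edp_wait_for_hpd_ready: wait timed out!
[ 43.213733] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* dce110_edp_wait_for_hpd_ready: wait timed out!
[ 43.242390] [drm] UVD and UVD ENC initialized successfully.
[ 43.352370] [drm] VCE initialized successfully.
[ 43.358578] amdgpu 0000:01:00.0: [drm] Cannot find any crtc or sizes
"""


def parse_dmesg(text):
    """Return a list of (timestamp_seconds, message) for every timestamped line."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    return entries


def burst_duration(entries):
    """Seconds between the earliest and the latest message (0.0 if empty)."""
    if not entries:
        return 0.0
    times = [t for t, _ in entries]
    return max(times) - min(times)


if __name__ == "__main__":
    entries = parse_dmesg(SAMPLE)
    print(f"{len(entries)} messages spanning {burst_duration(entries):.3f} s")
```

For the excerpt above this reports 6 messages spanning 1.961 s, which is in line with the hang of about 2 seconds seen in the sensors output.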

Could it be that scanning amdgpu-pci-0100 causes the AMD card to reset, and that this is what starts the fans?
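
One possible way to test this, sketched below, is to read a single sensor file directly from sysfs and record the card's runtime power-management status before and after the read. Several things here are assumptions rather than confirmed facts: the PCI address is taken from the dmesg lines above, temp1_input is assumed to be the file behind the "edge" reading, and timed_probe is a hypothetical helper name.

```python
import glob
import time

# Assumption: PCI address of the discrete card, taken from the dmesg output.
DEVICE = "/sys/bus/pci/devices/0000:01:00.0"


def read_text(path):
    """Return the stripped contents of a small sysfs-style text file."""
    with open(path) as f:
        return f.read().strip()


def timed_probe(status_path, sensor_path):
    """Read one sensor file and report the power status before and after."""
    before = read_text(status_path)
    start = time.monotonic()
    try:
        value = read_text(sensor_path)
    except OSError as exc:
        # A failed read is recorded instead of aborting the probe.
        value = f"read failed: {exc.strerror}"
    elapsed = time.monotonic() - start
    after = read_text(status_path)
    return {"before": before, "after": after, "seconds": elapsed, "value": value}


if __name__ == "__main__":
    # Assumption: temp1_input is the temperature shown as "edge" by sensors.
    candidates = glob.glob(DEVICE + "/hwmon/hwmon*/temp1_input")
    if not candidates:
        print("no hwmon temperature file found at the assumed path")
    else:
        print(timed_probe(DEVICE + "/power/runtime_status", candidates[0]))
```

If the status changes from "suspended" to "active" across a read that takes around 2 seconds, that would be consistent with the read waking the card up; if it stays the same, the explanation probably lies elsewhere.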

> you were able to reproduce this problem not only in Ubuntu, but also in other distributions. Is that correct ?

I was able to reproduce it on Ubuntu 20.04 with all available 5.8 kernels and with a 5.10.0-14 kernel installed from the Hirsute beta repositories.

However, today I ran some tests again using my stable 20.04 installation and also the just-released final Hirsute 21.04 from a live USB. Below are the results of running the 'sensors' command in a terminal, ordered by Ubuntu version and kernel:

1. 20.04, 5.4.0-59-generic #65-Ubuntu SMP
- everything ok
- no fans spinning
- no dmesg warnings
- sensors does not hang during the scan:
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx: N/A
edge: N/A (crit = +94.0°C, hyst = -273.1°C)
power1: N/A (cap = 35.00 W)

2. 20.04, 5.8.0-50-generic #56~20.04.1-Ubuntu SMP
- problem present
- fans spinning
- sensors hangs during output when reaching the AMD section:
amdgpu-pci-0100
Adapter: PCI adapter
[ 2 SECS ]
vddgfx: 1.05 V
edge: +38.0°C (crit = +94.0°C, hyst = -273.1°C)
power1: 7.12 W (cap = 35.00 W)
- dmesg warnings appearing when fans start:
[ 41.397707] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 42.609649] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* dce110_edp_wait_for_hpd_ready: wait timed out!
[ 43.213733] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* dce110_edp_wait_for_hpd_ready: wait timed out!
[ 43.242390] [drm] UVD and UVD ENC initialized successfully.
[ 43.352370] [drm] VCE initialized successfully.
[ 43.358578] amdgpu 0000:01:00.0: [drm] Cannot find any crtc or sizes

3. USB/Live 21.04, 5.11.0-16-generic #17-Ubuntu SMP
- problem present
- fans spinning
- sensors hangs during output when reaching the AMD section:
amdgpu-pci-0100
Adapter: PCI adapter
[ 2 SECS ]
vddgfx: 1.05 V
edge: +38.0°C (crit = +94.0°C, hyst = -273.1°C)
power1: 7.12 W (cap = 35.00 W)
- dmesg warnings appearing when fans start (slightly different from 5.8.0-50):
[ 301.476682] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[ 301.681312] [drm] UVD and UVD ENC initialized successfully.
[ 301.791290] [drm] VCE initialized successfully.
[ 301.797151] amdgpu 0000:01:00.0: [drm] Cannot find any crtc or sizes
[ 302.099911] drm_dp_i2c_do_msg: 14 callbacks suppressed
[ 302.196470] [drm:lspcon_init [i915]] *ERROR* Failed to probe lspcon
[ 302.196534] [drm:lspcon_resume [i915]] *ERROR* LSPCON init failed on port D
[ 303.115752] [drm:lspcon_init [i915]] *ERROR* Failed to probe lspcon
[ 303.115829] [drm:lspcon_resume [i915]] *ERROR* LSPCON init failed on port D
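
To make the difference between the 5.8 and 5.11 logs easier to see, the *ERROR* lines can be counted per kernel module. The snippet below is just an illustrative sketch (errors_by_module is a made-up name) and uses only the error lines quoted above as input.

```python
import re
from collections import Counter

# Matches "[drm:<function> [<module>]] *ERROR*" and captures function and module.
ERROR_RE = re.compile(r"\[drm:(\w+) \[(\w+)\]\] \*ERROR\*")

LOG_5_8 = """\
[ 42.609649] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* dce110_edp_wait_for_hpd_ready: wait timed out!
[ 43.213733] [drm:dce110_edp_wait_for_hpd_ready [amdgpu]] *ERROR* dce110_edp_wait_for_hpd_ready: wait timed out!
"""

LOG_5_11 = """\
[ 302.196470] [drm:lspcon_init [i915]] *ERROR* Failed to probe lspcon
[ 302.196534] [drm:lspcon_resume [i915]] *ERROR* LSPCON init failed on port D
[ 303.115752] [drm:lspcon_init [i915]] *ERROR* Failed to probe lspcon
[ 303.115829] [drm:lspcon_resume [i915]] *ERROR* LSPCON init failed on port D
"""


def errors_by_module(text):
    """Count *ERROR* lines per kernel module named in the drm prefix."""
    counts = Counter()
    for line in text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(2)] += 1
    return dict(counts)


if __name__ == "__main__":
    print("5.8.0-50 :", errors_by_module(LOG_5_8))
    print("5.11.0-16:", errors_by_module(LOG_5_11))
```

With these excerpts, the 5.8 log shows 2 errors from amdgpu, while the 5.11 log shows 4 errors from i915 and none from amdgpu.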

So the only kernel without this problem is the old 5.4, but I suspect that with that kernel no real sensor measurement takes place, because there are no voltage, temperature or power readings, only N/A (maybe it was not yet implemented and this is only a stub/dummy output).
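
The "only N/A" case can also be detected automatically when comparing kernels. The following is a rough sketch that parses the plain-text sensors output shown above; parse_sensors and chips_without_readings are hypothetical helper names, and the parser only handles the simple layout seen in this report.

```python
SAMPLE_5_4 = """\
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx: N/A
edge: N/A (crit = +94.0°C, hyst = -273.1°C)
power1: N/A (cap = 35.00 W)
"""

SAMPLE_5_8 = """\
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx: 1.05 V
edge: +38.0°C (crit = +94.0°C, hyst = -273.1°C)
power1: 7.12 W (cap = 35.00 W)
"""


def parse_sensors(text):
    """Map each chip name to a dict of {label: reading as printed}."""
    chips = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            current = None  # a blank line ends the current chip section
            continue
        if line[0].isspace():
            continue  # indented continuation lines (limits only) are ignored
        if ":" not in line:
            current = line.strip()  # a line without a colon is a chip name
            chips[current] = {}
            continue
        if current is None:
            continue
        label, _, rest = line.partition(":")
        if label.strip() == "Adapter":
            continue
        # Keep only the reading itself, dropping limits such as "(crit = ...)".
        chips[current][label.strip()] = rest.split("(")[0].strip()
    return chips


def chips_without_readings(chips):
    """Names of chips whose readings are all N/A."""
    return [name for name, values in chips.items()
            if values and all(v == "N/A" for v in values.values())]


if __name__ == "__main__":
    print("5.4:", chips_without_readings(parse_sensors(SAMPLE_5_4)))
    print("5.8:", chips_without_readings(parse_sensors(SAMPLE_5_8)))
```

For the two samples, amdgpu-pci-0100 is flagged on 5.4 and not on 5.8. This only shows that no values were returned; it does not say why the 5.4 kernel returns none.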

> If it is, do you think you could test a pristine Linux kernel (from https://kernel.org)?

I'll try as soon as possible and report back the results.

In the meantime, thanks again, and please feel free to ask further questions or request more tests.

Regards, Gabriele.