Very slow boot, wake from sleep and shutdown with 5.4 kernel

Bug #1874104 reported by Michael Fox
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Confirmed
Undecided
Unassigned

Bug Description

I upgraded a Ubuntu 18.04 installation to 20.04 beta on my late 2015 5k 27" iMac. The new 5.4 kernel was installed and has been updated a few times. The former LTS kernel (4.15) was retained. When I start up the computer with the new kernel, the process is very slow and it takes about 2 1/2 minutes to get to a working desktop. Doing the same startup with the old 4.15 kernel (in the 20.04 beta) takes approximately 25 seconds to a working desktop. Similarly, changing display settings, waking from sleep and shutting down the computer all take several minutes with the 5.4 kernel, vs 10-30 seconds with the 4.15 kernel.

To provide more information, I tried starting up this computer with the 5.3 kernel from a 19.10 live USB, and from a 5.0 kernel in the Peppermint Linux live USB that is based on Ubuntu. The 5.3 kernel is as slow as the 5.4 kernel, whereas the 5.0 kernel works similarly to the 4.15 LTS kernel.

I suspect that the problem relates to the Radeon m9 390x video card in my iMac, but I have no idea what might have changed in the newer kernels to affect their compatibility with this iMac.

ProblemType: Bug
DistroRelease: Ubuntu 20.04
Package: linux-image-5.4.0-25-generic 5.4.0-25.29
ProcVersionSignature: Ubuntu 5.4.0-25.29-generic 5.4.30
Uname: Linux 5.4.0-25-generic x86_64
ApportVersion: 2.20.11-0ubuntu27
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC2: mfox 2497 F.... pulseaudio
 /dev/snd/controlC0: mfox 2497 F.... pulseaudio
 /dev/snd/controlC1: mfox 2497 F.... pulseaudio
CasperMD5CheckResult: skip
CurrentDesktop: ubuntu:GNOME
Date: Tue Apr 21 10:55:18 2020
HibernationDevice: # RESUME=UUID=5e4ca488-e781-4ee7-aeb3-8adc968cd22a
InstallationDate: Installed on 2018-05-09 (712 days ago)
InstallationMedia: Ubuntu 18.04 LTS "Bionic Beaver" - Release amd64 (20180426)
MachineType: Apple Inc. iMac17,1
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_CA.UTF-8
 SHELL=/bin/bash
ProcFB: 0 amdgpudrmfb
ProcKernelCmdLine: \boot\vmlinuz-5.4.0-25-generic ro root=UUID=0559983a-7175-4ab1-af46-7a117ee78ead initrd=boot\initrd.img-5.4.0-25-generic
RelatedPackageVersions:
 linux-restricted-modules-5.4.0-25-generic N/A
 linux-backports-modules-5.4.0-25-generic N/A
 linux-firmware 1.187
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 01/31/2020
dmi.bios.vendor: Apple Inc.
dmi.bios.version: 176.0.0.0.0
dmi.board.name: Mac-B809C3757DA9BB8D
dmi.board.vendor: Apple Inc.
dmi.board.version: iMac17,1
dmi.chassis.type: 9
dmi.chassis.vendor: Apple Inc.
dmi.chassis.version: Mac-B809C3757DA9BB8D
dmi.modalias: dmi:bvnAppleInc.:bvr176.0.0.0.0:bd01/31/2020:svnAppleInc.:pniMac17,1:pvr1.0:rvnAppleInc.:rnMac-B809C3757DA9BB8D:rvriMac17,1:cvnAppleInc.:ct9:cvrMac-B809C3757DA9BB8D:
dmi.product.family: iMac17,1
dmi.product.name: iMac17,1
dmi.product.version: 1.0
dmi.sys.vendor: Apple Inc.

Revision history for this message
Michael Fox (mfox-trentu) wrote :
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Jouni Mettala (jouni-mettala) wrote :

dmesg shows watchdog: BUG: soft lockup - CPU#0 stuck for 22s! and useful looking trace.

Could you test newest upstream kernel (5.7-rc2) from
https://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D

Install instructions are available at https://wiki.ubuntu.com/Kernel/MainlineBuilds

Revision history for this message
Michael Fox (mfox-trentu) wrote :

Thanks for the quick reply. I installed and tested 5.7-rc2 as per instructions. The new kernel booted up to the login screen, but as slowly as 5.4.0-25. (Over 2 minutes to the login screen.) When I logged in with my password, all that happened was that I was returned to the login screen. I tried logging in three times with the same result. I don't know if this is related, but there were only four 5.7-rc2 modules that applied to my non-low latency install. Normally there is also a signed module, but there was not one present in the mainline ppa for this kernel.

Would you like an additional bug report for this boot? If so, how do I do this and get it included in the existing bug report?

tags: added: kernel-bug-exists-upstream
Revision history for this message
Jouni Mettala (jouni-mettala) wrote :

Could you attach dmesg from upstream kernel (5.7-rc2) to this bug?
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1874104/+addcomment
Previous boot is (Computer) /var/log/dmesg.0 It shows version number.

Login bug is likely separate issue. It could be reported as new.

Revision history for this message
Michael Fox (mfox-trentu) wrote :

I hope this is what you were looking for. In your message, you noted that the login bug is likely a separate issue and it could be reported as new. I thought that this is what I already did. If not, should I report it again? And with what attachments? Thanks.

Revision history for this message
Jouni Mettala (jouni-mettala) wrote :

Virtualbox taints kernel. Could you reproduce this bug with untainted kernel? Either to uninstall virtualbox or blacklist vboxdrv kernel module. I guess live USB doesn't use virtualbox but am not sure.

So far last known good version is 5.0. Known bad ones are 5.3 5.4 and 5.7-rc2.

We like to find either good one or untainted bad with a trace.

Is v5.1 good one?
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.1/
Does it login? Is boot fast? Do you get trace if it's bad?

Next step depends on results. For example if v5.1 is good you can try v5.2
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.2/

If you can't login with untainted 5.7-rc2 you can report it to bugzilla.kernel.org with dmesg and lspci attached. They also like to know good version that was able to login.

Revision history for this message
Michael Fox (mfox-trentu) wrote :

I uninstalled Virtualbox and tested kernels 5.1 and 5.2. Both showed fast boots similar to 4.15. I then tested 5.3 by downloading it from the ppa/mailine. This is where the slow boot started. I have attached two dmesg's and a systemd analysis of the critical chain and blame for kernel 5.3. I can't seem to send more than one file at a time, so I will send 3 separate messages.

You can contrast this against the equivalent for kernel 5.2, which I will reboot with and send results in the next set of messages.

Revision history for this message
Michael Fox (mfox-trentu) wrote :

Next, a dmesg for that boot. Not sure which it is so I'll send two.

Revision history for this message
Michael Fox (mfox-trentu) wrote :

Next, dmesg.0

Revision history for this message
Michael Fox (mfox-trentu) wrote :

This time with the last fast kernel, 5.2. The bootup was slightly slower than with 4.15, but I think this might relate to snap module loading. I should also point out that the difference between kernels 4.15, 5.1 and 5.2 vs the kernels after 5.3 is more than what is indicated by the SystemD analysis. With the newer kernels there is an additional delay of about a minute from the point where I log in until I get a working desktop.

Attached is the SystemD analysis for kernel 5.2

Revision history for this message
Michael Fox (mfox-trentu) wrote :

Next is the first of two dmesg logs for the 5.2 kernel.

Revision history for this message
Michael Fox (mfox-trentu) wrote :

Finally, a second dmesg for the 5.2 kernel boot.(I'm not sure whether dmesg or dmesg.0 is the right one.

Revision history for this message
Jouni Mettala (jouni-mettala) wrote :

Good job. Now v5.2 is known good one. v5.3 is known bad. There are also
other kernels tested including latest upstream one.

It is possible to bisect bug to a single commit
https://wiki.ubuntu.com/Kernel/KernelBisection
Bisecting is good way to find developer to look at your bug.

Before that you can test 5.3-rc1 Most new code is introduced in rc1.
Otherwise middle point is good to test.
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.3-rc1/

You can see Linux version of your /var/log/dmesg by first line of the
file. For example less /var/log/dmesg

Revision history for this message
Michael Fox (mfox-trentu) wrote :

I did the bisection and found that the bad kernels started at 5.3-rc1. The last 5.2 kernel, 5.2.21, was good. I have attached the SystemD analysis here of the boots of both kernels. You can see from it that the problem in rc1 is related to the gpu-manager service.

I will attach the dmesg log of the rc1 boot in the next message.

Revision history for this message
Michael Fox (mfox-trentu) wrote :

Here is the dmesg log for the bad boot of 5.3-rc1.

Revision history for this message
Michael Fox (mfox-trentu) wrote :

I should add that I tested kernels 5.2.15, 5.2.18 and 5.2.20 as well. All were OK, so the good-bad line is clearly after 5.2.21.

Revision history for this message
Jouni Mettala (jouni-mettala) wrote :
Download full text (3.8 KiB)

This is trace from 5.3-rc1. I am not sure if it's right one.

[ 123.722504] kernel: ------------[ cut here ]------------
[ 123.722541] kernel: WARNING: CPU: 1 PID: 153 at net/wireless/core.c:868 wiphy_register.cold+0xe4/0x2e5 [cfg80211]
[ 123.722541] kernel: Modules linked in: bnep intel_rapl_msr mei_hdcp nls_iso8859_1 intel_rapl_common x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass snd_hda_codec_cirrus snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi crct10dif_pclmul snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm ghash_clmulni_intel snd_seq_midi snd_seq_midi_event amdgpu(+) applesmc input_polldev snd_rawmidi snd_seq amd_iommu_v2 aesni_intel snd_seq_device snd_timer gpu_sched aes_x86_64 crypto_simd uvcvideo videobuf2_vmalloc videobuf2_memops cryptd btusb ttm glue_helper videobuf2_v4l2 intel_cstate btrtl intel_rapl_perf videobuf2_common btbcm btintel brcmfmac brcmutil bluetooth snd ecdh_generic input_leds videodev mc ecc cfg80211 drm_kms_helper i2c_algo_bit fb_sys_fops syscopyarea sysfillrect soundcore sysimgblt mei_me mei acpi_als kfifo_buf industrialio apple_bl mac_hid sch_fq_codel iptable_filter ip6table_filter ip6_tables br_netfilter bridge stp llc arp_tables parport_pc ppdev lp parport
[ 123.722559] kernel: drm ip_tables x_tables autofs4 hid_apple hid_generic usbhid hid sdhci_pci crc32_pclmul cqhci i2c_i801 thunderbolt sdhci tg3 video ahci libahci
[ 123.722564] kernel: CPU: 1 PID: 153 Comm: kworker/1:1 Not tainted 5.3.0-050300rc1-generic #201907212232
[ 123.722565] kernel: Hardware name: Apple Inc. iMac17,1/Mac-B809C3757DA9BB8D, BIOS 176.0.0.0.0 01/31/2020
[ 123.722568] kernel: Workqueue: events request_firmware_work_func
[ 123.722580] kernel: RIP: 0010:wiphy_register.cold+0xe4/0x2e5 [cfg80211]
[ 123.722581] kernel: Code: ff e9 7a 58 fa ff 48 c7 c7 48 df 79 c0 e8 0e e3 56 f4 0f 0b b8 ea ff ff ff e9 62 58 fa ff 48 c7 c7 48 df 79 c0 e8 f6 e2 56 f4 <0f> 0b b8 ea ff ff ff e9 4a 58 fa ff 48 c7 c7 48 df 79 c0 e8 de e2
[ 123.722582] kernel: RSP: 0000:ffffb82140233c40 EFLAGS: 00010246
[ 123.722582] kernel: RAX: 0000000000000024 RBX: 000000000000070e RCX: 0000000000000000
[ 123.722583] kernel: RDX: 0000000000000000 RSI: ffff9d249ea97448 RDI: ffff9d249ea97448
[ 123.722583] kernel: RBP: ffffb82140233cc0 R08: ffff9d249ea97448 R09: 0000000000000004
[ 123.722584] kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff9d2476cdc300
[ 123.722584] kernel: R13: ffff9d2476cdc300 R14: ffff9d247eab18c0 R15: ffff9d247eab0000
[ 123.722585] kernel: FS: 0000000000000000(0000) GS:ffff9d249ea80000(0000) knlGS:0000000000000000
[ 123.722585] kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 123.722586] kernel: CR2: 0000556254966c28 CR3: 000000043b84e004 CR4: 00000000003606e0
[ 123.722586] kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 123.722587] kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 123.722587] kernel: Call Trace:
[ 123.722595] kernel: ? handle_channel_custom.isra.0+0x8d/0x160 [cfg80211]
[ 123.722603] kernel: ? wiphy_apply_custom_regulatory+0x73/0xc0 [cfg80211]
[ 123.722609] kernel: brcmf_cfg80211_attach+0x1d8/0x630...

Read more...

Revision history for this message
Jouni Mettala (jouni-mettala) wrote :

This bug is introduced between v5.2 and v5.3-rc1. Further bisecting
means compiling kernel. Note that stable versions like 5.2.21 are
generally not bisected.

Could you report login bug as new one? Just like you reported this one. Perhaps via ubuntu-bug linux. It could be reported with working kernel. Then you can add dmesg later. It would help to know if it is possible to login using v5.6
https://kernel.ubuntu.com/~kernel-ppa/mainline/v5.6/

Revision history for this message
Michael Fox (mfox-trentu) wrote :

It is possible to login with kernel 5.6. The bootup is slow, but it does boot and gets past the login screen. I have attached the dmesg from that boot, and will send the systemd analysis in the next message.

Revision history for this message
Michael Fox (mfox-trentu) wrote :

Attached is the SystemD analysis for the 5.6 boot. It looks pretty similar to that of all the kernels I have tried that are 5.3 or newer.

Revision history for this message
Michael Fox (mfox-trentu) wrote :

I will report the login bug as a new one. As far as bisection past identifying the line of progression to identify the last kernel that works and the first one that doesn't, I looked at the further steps one would have to take and I don't think that my technical capability is up to it. I have never compiled a kernel before, and those instructions suggest that it's more complicated than just compilation.

Is there anything further I can do that is less technical to help diagnose and fix this bug in the 20.04 LTS kernel?

Revision history for this message
Michael Fox (mfox-trentu) wrote :

I forgot to send you the proper dmesg for the boot of kernel 5.3rc-1. It is attached. This kernel was not only slow, but it didn't pick up my wireless card. This was the only kernel I tried that did this.

Revision history for this message
Jouni Mettala (jouni-mettala) wrote :

So far these lines look most useful. I am not sure if they are enough for developer.

kernel: amdgpu: [powerplay]
                        failed to send message 145 ret is 0

There is usually about a week between upstream rc kernels. Between stable and next -rc1 there is two weeks. If you can test some of these it would be great. Especially 5.8-rc1 when it is ready.

Revision history for this message
Michael Fox (mfox-trentu) wrote :

I made a separate report of the 5.7-rc2 kernel bug to ubuntu-bug linux as you requested. I will also try to monitor the upstream rc kernels and run tests on them as you have requested.

I agree that the slow boot bug is likely connected to the AMD video card in my iMac, and thus to the amdgpu driver. I say that because I also have a 2011 iMac with a much older AMD Radeon video card (6770M), and it has no trouble with the 5.4 kernel in Ubuntu 20.04, nor with any of the kernels that came with 18.04, 18.10, 19.04 or 19.10. If the amdgpu powerplay failing to send message is the problem, is there an easy fix for this?

I did look at the changelog for the 5.3-rc1 kernel (the first with the slow boot). There were a lot of changes related to amd and amdgpu by at least 25 people, but I have no idea how to figure out which, if any, of these changes is causing the slow boot.

Revision history for this message
Michael Fox (mfox-trentu) wrote :

5.7-rc3 boots on my iMac. It gets past the login screen (unlike rc2), but it still has the slow boot problem.

To post a comment you must log in.