ASPEED server console output extremely slow after upgrade to 18.04

Bug #1808183 reported by latimerio on 2018-12-12
This bug affects 4 people
Affects         Importance  Assigned to
linux (Ubuntu)  Medium      Unassigned
Bionic          Medium      Unassigned
Cosmic          Medium      Unassigned

Bug Description

===SRU Justification===
[Impact]
Server console is extremely slow on ast, after the upgrade from 16.04 to
18.04.

[Fix]
Remove firmware/bootloader configured framebuffer device before probing.

[Test]
User confirmed the patch fixes the issue.

[Regression Potential]
Low. The fix is in the stable tree, is limited to one specific device, and
other DRM drivers have already done this for a while.

===Original Bug Report===
After the upgrade from 16.04 to 18.04 my server console is extremely slow.
It looks similar to the youtube clips mentioned in the problem described here: https://ubuntuforums.org/showthread.php?t=2399941
My server is attached to a KVM switch with VGA cable to a 1920x1080 LCD.
It runs on an ASUS P9D-X board.
Before the upgrade everything was fine.
Now it is so slow that it sometimes even hangs during boot and drops into an emergency console, where all I can do is press Ctrl-Alt-Delete, as described in bug 1802469.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: xorg (not installed)
ProcVersionSignature: Ubuntu 4.15.0-42.45-generic 4.15.18
Uname: Linux 4.15.0-42-generic x86_64
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
Date: Wed Dec 12 17:07:48 2018
SourcePackage: xorg
Symptom: display
UpgradeStatus: Upgraded to bionic on 2018-10-11 (62 days ago)
---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Dec 15 07:22 seq
 crw-rw---- 1 root audio 116, 33 Dec 15 07:22 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.5
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
DistroRelease: Ubuntu 18.04
HibernationDevice: RESUME=UUID=99dd75dc-dc37-4970-be5a-79d9244a3bcc
IwConfig:
 eth0 no wireless extensions.

 eth1 no wireless extensions.

 lo no wireless extensions.
MachineType: ASUSTeK COMPUTER INC. P9D-X Series
Package: linux (not installed)
PciMultimedia:

ProcFB: 0 astdrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.15.0-42-generic root=UUID=0fa1f208-3a7b-44ca-9f85-e7349a3b1844 ro nosplash
ProcVersionSignature: Ubuntu 4.15.0-42.45-generic 4.15.18
RelatedPackageVersions:
 linux-restricted-modules-4.15.0-42-generic N/A
 linux-backports-modules-4.15.0-42-generic N/A
 linux-firmware 1.173.2
RfKill: Error: [Errno 2] No such file or directory: 'rfkill': 'rfkill'
Tags: bionic
Uname: Linux 4.15.0-42-generic x86_64
UnreportableReason: This report is about a package that is not installed.
UpgradeStatus: Upgraded to bionic on 2018-10-11 (64 days ago)
UserGroups:

_MarkForUpload: False
dmi.bios.date: 10/13/2014
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 0901
dmi.board.asset.tag: To be filled by O.E.M.
dmi.board.name: P9D-X Series
dmi.board.vendor: ASUSTeK COMPUTER INC.
dmi.board.version: Rev 1.xx
dmi.chassis.asset.tag: To Be Filled By O.E.M.
dmi.chassis.type: 17
dmi.chassis.vendor: To Be Filled By O.E.M.
dmi.chassis.version: To Be Filled By O.E.M.
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr0901:bd10/13/2014:svnASUSTeKCOMPUTERINC.:pnP9D-XSeries:pvrRev1.xx:rvnASUSTeKCOMPUTERINC.:rnP9D-XSeries:rvrRev1.xx:cvnToBeFilledByO.E.M.:ct17:cvrToBeFilledByO.E.M.:
dmi.product.family: To be filled by O.E.M.
dmi.product.name: P9D-X Series
dmi.product.version: Rev 1.xx
dmi.sys.vendor: ASUSTeK COMPUTER INC.

Daniel van Vugt (vanvugt) wrote :

Please:

1. Clarify what is meant to be visible on your server console. Are you using Gnome Shell, just text mode, or something else?

2. Run 'lspci -k > lspcik.txt' on the machine and send us the 'lspcik.txt' file

3. Install package 'mesa-utils' and then run 'glxinfo > glxinfo.txt' on the machine and send us the file 'glxinfo.txt'.

4. Run 'dmesg > dmesg.txt' on the machine and send us the 'dmesg.txt' file.

5. Run 'journalctl > journal.txt' on the machine and send us the file 'journal.txt'.
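
The requests above can be gathered in one pass. This is only a minimal sketch and not part of the original comment: the helper names `collect` and `collect_all` and the optional output-directory argument are assumptions, while the commands and file names are the ones listed above. A missing tool (for example `glxinfo` on a machine without 'mesa-utils') is recorded in the output file instead of aborting the run.

```shell
#!/bin/sh
# collect OUTFILE CMD [ARGS...]: run CMD and store its output in OUTFILE.
# If CMD is not installed, note that in OUTFILE instead of failing.
collect() {
    out=$1
    shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$out" 2>&1
    else
        printf 'skipped: %s is not installed\n' "$1" >"$out"
    fi
}

# collect_all [DIR]: write the four requested files into DIR
# (hypothetical helper; DIR defaults to the current directory).
collect_all() {
    dir=${1:-.}
    collect "$dir/lspcik.txt"  lspci -k
    collect "$dir/glxinfo.txt" glxinfo
    collect "$dir/dmesg.txt"   dmesg
    collect "$dir/journal.txt" journalctl
}
```

Calling `collect_all .` (as root, so that `dmesg` and `journalctl` can read everything) would leave the four files in the current directory.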

affects: xorg (Ubuntu) → ubuntu
Changed in ubuntu:
status: New → Incomplete
latimerio (fomember) wrote :
  • X.7z (52.1 KiB, application/x-7z-compressed)

Please find attached the desired files.
I am talking about text console only.
There is no X.org installed and thus no glxinfo, so this bug might be in the wrong category although it is display/graphics related.
The problem is the same as can be seen here: https://www.youtube.com/watch?v=4VEis5jtsJk.
I am obviously not the only one with the problem as the thread https://ubuntuforums.org/showthread.php?t=2399941 shows.
Bootup messages are as slow as typewriter consoles from the 1980s.
The dmesg part is still relatively fast but the Ubuntu boot up messages are really slow and take more than 100s.
They are in fact so slow that the server sometimes runs into emergency shell due to timeouts.
As said it only happened after upgrade to 18.04 whereas 16.04 was without problems for the last 2 years.

Daniel van Vugt (vanvugt) wrote :

I see now, thanks.

This appears to be a kernel regression relating to 'ast' performance:

04:00.0 VGA compatible controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 21)
        Subsystem: ASUSTeK Computer Inc. ASPEED Graphics Family
        Kernel driver in use: ast
        Kernel modules: ast

summary: - server console output extremely slow after upgrade to 18.04
+ ASPEED server console output extremely slow after upgrade to 18.04
affects: ubuntu → linux (Ubuntu)
tags: added: regression-release
Changed in linux (Ubuntu):
status: Incomplete → New

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1808183

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Daniel van Vugt (vanvugt) wrote :

Can you please also run 'lsmod' and send us the output?

latimerio (fomember) wrote :

My server doesn't have a graphical UI and the apport-collect did not work.
The output of lsmod is attached.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Daniel van Vugt (vanvugt) wrote :

You should be able to open the link required by apport-collect by copying and pasting it to another machine.

latimerio (fomember) wrote : CRDA.txt

apport information

tags: added: apport-collected
description: updated

latimerio (fomember) wrote :

I finally managed to work my way through the text based apport-collect and was able to send the data.

Kai-Heng Feng (kaihengfeng) wrote :

Seems like this commit fixes the issue:
commit 5478ad10e7850ce3d8b7056db05ddfa3c9ddad9a
Author: Thomas Zimmermann <email address hidden>
Date: Thu Nov 15 11:42:16 2018 +0100

    drm/ast: Remove existing framebuffers before loading driver

    If vesafb attaches to the AST device, it configures the framebuffer memory
    for uncached access by default. When ast.ko later tries to attach itself to
    the device, it wants to use write-combining on the framebuffer memory, but
    vesafb's existing configuration for uncached access takes precedence. This
    results in reduced performance.

    Removing the framebuffer's configuration before loading the AST driver fixes
    the problem. Other DRM drivers already contain equivalent code.

    Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1112963
    Signed-off-by: Thomas Zimmermann <email address hidden>
    Cc: <email address hidden>
    Tested-by: Y.C. Chen <email address hidden>
    Reviewed-by: Jean Delvare <email address hidden>
    Tested-by: Jean Delvare <email address hidden>
    Signed-off-by: Dave Airlie <email address hidden>

I built a kernel [1] with the commit, please test it.

[1] https://people.canonical.com/~khfeng/lp1808183/
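
One way to see the caching mode the commit message describes is to look up the framebuffer address in the kernel's PAT memtype list. The following is only a sketch under stated assumptions: the helper name `memtype_at` is hypothetical, the list is assumed to live at /sys/kernel/debug/x86/pat_memtype_list (debugfs mounted, root access, x86 PAT enabled), and the line format "TYPE @ 0xSTART-0xEND" is assumed to match kernels of this era; other kernel versions may print the list differently.

```shell
#!/bin/sh
# memtype_at ADDR [FILE]: print the memory type of every range in FILE
# that contains ADDR. FILE defaults to the debugfs PAT list (assumed path).
# Expected line format (an assumption): "write-combining @ 0xSTART-0xEND".
memtype_at() {
    addr=$1
    list=${2:-/sys/kernel/debug/x86/pat_memtype_list}
    while read -r type at range; do
        [ "$at" = "@" ] || continue       # skip the header and odd lines
        start=${range%-*}
        end=${range#*-}
        # Ranges are treated as half-open: start <= addr < end.
        if [ $(($addr)) -ge $(($start)) ] && [ $(($addr)) -lt $(($end)) ]; then
            printf '%s\n' "$type"
        fi
    done <"$list"
}
```

The address to pass in would be the framebuffer memory region reported for the device, for example from `lspci -v -s 04:00.0` on the reporter's machine. Going by the commit message, an uncached type on that range would point to the problem and a write-combining type to the fixed behaviour, but this has not been verified on the affected hardware.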

latimerio (fomember) wrote :

I have installed the kernel and rebooted the server 3 times.
All reboots were as fast as the previous ubuntu 16.04.
So I think your fix has solved the problem.
Thanks a lot.

I did get a warning during the update of the initramfs though
   W: Possible missing firmware /lib/firmware/ast_dp501_fw.bin for module ast
But it may be that this has been already there before and I just didn't see it.

The whole issue leaves two open questions for me:
1. Has there been a change in the VESA/AST mechanism from 16.04 to 18.04 ?
2. Or is there another explanation for the different behavior after the upgrade to 18.04?

Daniel van Vugt (vanvugt) wrote :

> Has there been a change in the VESA/AST mechanism from 16.04 to 18.04 ?

I think the answer is "yes": there must have been a kernel change that caused the regression. It's not yet clear where the regression occurred between 16.04 and 18.04 (there are at least two years of work between them), but at least we seem to have a fix for it.

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
description: updated
Changed in linux (Ubuntu Bionic):
status: New → In Progress
Changed in linux (Ubuntu Cosmic):
status: New → In Progress
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Released
Changed in linux (Ubuntu Cosmic):
status: In Progress → Fix Released
Daniel van Vugt (vanvugt) wrote :

^^^
It's not documented but I think that means to say the bug is fixed in:

18.04: https://launchpad.net/ubuntu/+source/linux/4.15.0-43.46
18.10: https://launchpad.net/ubuntu/+source/linux/4.18.0-13.14

Jay (jayram1989) wrote :

May I know when we can expect the official release of the fix for this issue?

Daniel van Vugt (vanvugt) wrote :

The fix seems to have been released for both 18.04 and 18.10 already, a couple of weeks ago.

I assume the fix was already in 19.04...?

Daniel van Vugt (vanvugt) wrote :

Assuming comment #23 is correct...

Mat Nightingale (kellogg76) wrote :

I've just done a clean install of 18.04 and have the same issue. Running 18.04 from a boot USB stick works flawlessly, but after installing to the hard drive the login screen is very laggy, and once I enter my password the screen resolution is all messed up and unreadable.

If I reboot, hold down Shift to get the GRUB menu, choose Advanced and then recovery mode, and then choose resume from the recovery menu without altering anything, the login screen works perfectly.

So I assume the fix hasn't been rolled out yet, as I've run the updater and there's nothing that needs updating on my system.

Daniel van Vugt (vanvugt) wrote :

Mat,

This bug is about text console output only. Not the login screen or graphical sessions.

Please log your own new bug by running:

   ubuntu-bug mutter

Kai-Heng Feng (kaihengfeng) wrote :

Seems like 4.15.0-43.46 is a CVE respin so the patch is not included. Please wait for the next kernel release.

Stefan Bader (smb) on 2019-01-14
Changed in linux (Ubuntu Cosmic):
status: Fix Released → Fix Committed
importance: Undecided → Medium
Changed in linux (Ubuntu Bionic):
importance: Undecided → Medium
status: Fix Released → Fix Committed
Jay (jayram1989) wrote :

Hello Team,

May I know when can we expect the official release for this patch?

I tried the 4.15.0-43.46 version on the machine, but the issue still persists.

Thanks
Jayaram

Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-cosmic' to 'verification-done-cosmic'. If the problem still exists, change the tag 'verification-needed-cosmic' to 'verification-failed-cosmic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-cosmic
tags: added: verification-needed-bionic
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Denis Dowling (ddowlingau) wrote :

Encountered the same issue today. Updated from 16.04 to 18.04 on a SuperMicro server with ASPEED AST 2400 VGA controller. The boot console scrolling was very slow after the update and X11 performance was unusable. Enabled the "proposed" apt feed and updated to 4.15.0-44-generic kernel. Can confirm that console performance is back to normal again.
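
For readers who want to repeat this test, the apt source line for the -proposed pocket can be generated as sketched below. This is an illustration, not an official procedure: the helper name `proposed_line`, the file name proposed.list and the kernel package name in the comments are assumptions, and the wiki page linked in the verification request above remains the authoritative description.

```shell
#!/bin/sh
# proposed_line CODENAME: print an apt source line for the -proposed pocket
# of the given Ubuntu release (for example "bionic" or "cosmic").
proposed_line() {
    printf 'deb http://archive.ubuntu.com/ubuntu/ %s-proposed restricted main multiverse universe\n' "$1"
}

# Example use (run as root; nothing below is executed by this sketch):
#   proposed_line bionic > /etc/apt/sources.list.d/proposed.list
#   apt update
#   apt install linux-image-4.15.0-44-generic   # assumed package name
```

Enabling -proposed without further restrictions can offer many other pre-release packages on the next upgrade, so it may be worth disabling the source again once the kernel is installed.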

latimerio (fomember) wrote :

In the last couple of days my system has been booting as slowly again as before the fix.
I am currently on linux-image-unsigned-4.15.0-43-generic.
So something must have changed along the way that brought the bug back.

latimerio (fomember) wrote :

The system boot is so slow that two times it got stuck in emergency mode saying that some configured disk drives were not found.
Strangely when I use a Ctrl-Alt-Del to reboot from emergency mode, the system boots fast.
I used UKTools to upgrade to kernel 4.15.18-041518-generic but the problem is still there.

Kai-Heng Feng (kaihengfeng) wrote :

The -44-generic kernel is in -proposed.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-44.47

---------------
linux (4.15.0-44.47) bionic; urgency=medium

  * linux: 4.15.0-44.47 -proposed tracker (LP: #1811419)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
    - blk-wbt: pass in enum wbt_flags to get_rq_wait()
    - blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
    - blk-wbt: move disable check into get_limit()
    - blk-wbt: use wq_has_sleeper() for wq active check
    - blk-wbt: fix has-sleeper queueing check
    - blk-wbt: abstract out end IO completion handler
    - blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
    - mmc: sdhci: Disable 1.8v modes (HS200/HS400/UHS) if controller can't support
      1.8v
    - mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
    - mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
    - mmc: rtsx_usb: Use MMC_CAP2_NO_SDIO
    - mmc: rtsx_usb: Enable MMC_CAP_ERASE to allow erase/discard/trim requests
    - mmc: rtsx_usb_sdmmc: Re-work runtime PM support
    - mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
    - memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
    - misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection
    - memstick: Prevent memstick host from getting runtime suspended during card
      detection
    - memstick: rtsx_usb_ms: Use ms_dev() helper
    - memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
    - iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
    - iommu/arm-smmu-v3: Implement flush_iotlb_all hook
    - iommu/dma: Add support for non-strict mode
    - iommu: Add "iommu.strict" command line option
    - iommu/io-pgtable-arm: Add support for non-strict mode
    - iommu/arm-smmu-v3: Add support for non-strict mode
    - iommu/io-pgtable-arm-v7s: Add support for non-strict mode
    - iommu/arm-smmu: Support non-strict mode

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
    - pinctrl: cannonlake: Fix community ordering for H variant
    - pinctrl: cannonlake: Fix HOSTSW_OWN register offset of H variant

  * Add Cavium ThunderX2 SoC UNCORE PMU driver (LP: #1811200)
    - perf: Export perf_event_update_userpage
    - Documentation: perf: Add documentation for ThunderX2 PMU uncore driver
    - drivers/perf: Add Cavium ThunderX2 SoC UNCORE PMU driver
    - [Config] New config CONFIG_THUNDERX2_PMU=m

  * Update hisilicon SoC-specific drivers (LP: #1810457)
    - SAUCE: Revert "net: hns3: Updates RX packet info fetch in case of multi BD"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: separate roce from nic when
      resetting"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: Use roce handle when calling roce
      callback function"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: Add calling roce callback
      function when link status change"
    - Revert "UBUNTU: SAUCE: {topost} net: hns3: optimize the process of notifying
      roce client"
    - Revert "UBUNTU: S...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
latimerio (fomember) wrote :

OK, I installed the 4.15.0-44-generic kernel and my system is back to normal.

I still have three questions:
 1. What made the boot become slow again when there was the kernel fix from linux-image-unsigned-4.15.0-43-generic still in place?

 2. Why does the system boot fast when I do a Ctrl-Alt-Del in a stuck emergency shell?

 3. When will the fix find its way in the normal kernel release so that kernel updates will not break the system again?

Daniel van Vugt (vanvugt) wrote :

It appears the fix for 18.04 was released into 'updates' 15 hours ago. See comment #38 and:
https://launchpad.net/ubuntu/+source/linux/4.15.0-44.47
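
Whether a running Ubuntu kernel is at or past the fixed versions named in this report (4.15.0-44.47 for 18.04, 4.18.0-14.15 for 18.10) can be checked roughly as follows. This is a sketch under assumptions: the helper names `version_ge` and `has_ast_fix` are hypothetical, only the ABI part of `uname -r` is compared, GNU `sort -V` is required, and kernels outside the two Ubuntu series (mainline builds, for instance) are reported as unknown rather than guessed at.

```shell
#!/bin/sh
# version_ge A B: succeed when A sorts at or after B (needs GNU `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# has_ast_fix RELEASE: RELEASE is `uname -r` output, e.g. 4.15.0-42-generic.
# Returns 0 if at or past the fixed ABI, 1 if older, 2 if the series is unknown.
has_ast_fix() {
    abi=$(printf '%s\n' "$1" | sed 's/-[a-z].*$//')   # drop "-generic" etc.
    case $abi in
        4.15.0-*) version_ge "$abi" 4.15.0-44 ;;   # bionic: 4.15.0-44.47
        4.18.0-*) version_ge "$abi" 4.18.0-14 ;;   # cosmic: 4.18.0-14.15
        *) return 2 ;;
    esac
}
```

A typical call would be `has_ast_fix "$(uname -r)"`, followed by a check of the exit status.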

tags: added: verification-done-bionic
removed: verification-needed-bionic
Daniel van Vugt (vanvugt) wrote :

Have we established the bug was never in 19.04?

Kai-Heng Feng (kaihengfeng) wrote :

Disco kernel receives this fix via stable update.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.18.0-14.15

---------------
linux (4.18.0-14.15) cosmic; urgency=medium

  * linux: 4.18.0-14.15 -proposed tracker (LP: #1811406)

  * CPU hard lockup with rigorous writes to NVMe drive (LP: #1810998)
    - blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait
    - blk-wbt: move disable check into get_limit()
    - blk-wbt: use wq_has_sleeper() for wq active check
    - blk-wbt: fix has-sleeper queueing check
    - blk-wbt: abstract out end IO completion handler
    - blk-wbt: improve waking of tasks

  * To reduce the Realtek USB cardreader power consumption (LP: #1811337)
    - mmc: core: Introduce MMC_CAP_SYNC_RUNTIME_PM
    - mmc: rtsx_usb_sdmmc: Don't runtime resume the device while changing led
    - mmc: rtsx_usb_sdmmc: Re-work runtime PM support
    - mmc: rtsx_usb_sdmmc: Re-work card detection/removal support
    - memstick: rtsx_usb_ms: Add missing pm_runtime_disable() in probe function
    - misc: rtsx_usb: Use USB remote wakeup signaling for card insertion detection
    - memstick: Prevent memstick host from getting runtime suspended during card
      detection
    - memstick: rtsx_usb_ms: Use ms_dev() helper
    - memstick: rtsx_usb_ms: Support runtime power management

  * Support non-strict iommu mode on arm64 (LP: #1806488)
    - iommu/io-pgtable-arm: Fix race handling in split_blk_unmap()
    - iommu/arm-smmu-v3: Implement flush_iotlb_all hook
    - iommu/dma: Add support for non-strict mode
    - iommu: Add "iommu.strict" command line option
    - iommu/io-pgtable-arm: Add support for non-strict mode
    - iommu/arm-smmu-v3: Add support for non-strict mode
    - iommu/io-pgtable-arm-v7s: Add support for non-strict mode
    - iommu/arm-smmu: Support non-strict mode

  * [Regression] crashkernel fails on HiSilicon D05 (LP: #1806766)
    - efi: honour memory reservations passed via a linux specific config table
    - efi/arm: libstub: add a root memreserve config table
    - efi: add API to reserve memory persistently across kexec reboot
    - irqchip/gic-v3-its: Change initialization ordering for LPIs
    - irqchip/gic-v3-its: Simplify LPI_PENDBASE_SZ usage
    - irqchip/gic-v3-its: Split property table clearing from allocation
    - irqchip/gic-v3-its: Move pending table allocation to init time
    - irqchip/gic-v3-its: Keep track of property table's PA and VA
    - irqchip/gic-v3-its: Allow use of pre-programmed LPI tables
    - irqchip/gic-v3-its: Use pre-programmed redistributor tables with kdump
      kernels
    - irqchip/gic-v3-its: Check that all RDs have the same property table
    - irqchip/gic-v3-its: Register LPI tables with EFI config table
    - irqchip/gic-v3-its: Allow use of LPI tables in reserved memory
    - arm64: memblock: don't permit memblock resizing until linear mapping is up
    - efi/arm: Defer persistent reservations until after paging_init()
    - efi: Permit calling efi_mem_reserve_persistent() from atomic context
    - efi: Prevent GICv3 WARN() by mapping the memreserve table before first use

  * ELAN900C:00 04F3:2844 touchscreen doesn't work (LP: #1811335)
    - pinctrl: cannonlake: Fix community ordering for H variant
    - pinctrl: c...

Changed in linux (Ubuntu Cosmic):
status: Fix Committed → Fix Released
latimerio (fomember) wrote :

I cannot confirm that 4.15.18-041518-generic is stable.
It did work for a while, though.
For the past few days my boot has been slow again, and yesterday and today the system again got stuck in emergency mode because the boot was too slow.
Thus I think there is more to it than just the kernel.
The only explanation I have is that there is some dependency between the kernel and other packages that get updated when I run apt dist-upgrade.
I am also puzzled that my system boots very fast when I press Ctrl-Alt-Delete while it is stuck in emergency mode.
So there is a difference between a warm start and a cold boot.
I suppose that some hardware initialization is done too late during a cold boot but makes the following warm start fast.
