Cannot probe sata disk on sata controller behind VMD: ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)

Bug #1894778 reported by You-Sheng Yang on 2020-09-08
This bug affects 3 people
Affects                   Importance   Assigned to
HWE Next                  Undecided    Unassigned
linux (Ubuntu)            High         You-Sheng Yang
  Focal                   High         You-Sheng Yang
  Groovy                  High         You-Sheng Yang
linux-oem-5.10 (Ubuntu)   Undecided    Unassigned
  Focal                   High         You-Sheng Yang
  Groovy                  Undecided    Unassigned
linux-oem-5.6 (Ubuntu)    Undecided    Unassigned
  Focal                   High         You-Sheng Yang
  Groovy                  Undecided    Unassigned

Bug Description

[SRU Justification]

[Impact]

When booting certain platforms with the boot disk attached to a SATA
bus behind an Intel VMD controller, disk probing may fail, leaving the
following error messages in dmesg:

  [ 6.163286] ata1.00: qc timeout (cmd 0xec)
  [ 6.165630] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)

[Fix]

Upstream commit f6b7bb847ca8 ("PCI: vmd: Offset Client VMD MSI-X
vectors"), currently in the vanilla kernel tree for v5.11.

[Test Case]

Check dmesg and lsblk to confirm that the disk is probed.

To check the PCI MSI address, inspect the lspci output:

  $ lspci -vvnn
  ....
      Capabilities: [80] MSI: Enable+ Count=1/1 maskable- 64bit-
          Address: fee00000 Data: 0000

When probing fails, the address is fee00000. With a patched kernel it is fee01000:

  $ lspci -vvnn
  ....
      Capabilities: [80] MSI: Enable+ Count=1/1 maskable- 64bit-
          Address: fee01000 Data: 0000
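
The check above can be scripted. The following is only a sketch: the function names (msi_addresses, classify_msi_address, identify_failed) are hypothetical helpers, and the "patched-like"/"unpatched-like" labels rest solely on the two addresses quoted in this report, so other platforms may show different values.

```shell
#!/bin/sh
# Print the MSI "Address:" value that follows each "MSI:" capability line
# in `lspci -vv` style output read from stdin (MSI-X entries are ignored).
msi_addresses() {
    awk '
        /Capabilities: \[[0-9a-f]+\] MSI:/ { want = 1; next }
        want && /Address:/ { print $2 }
        { want = 0 }
    '
}

# Classify one address using only the two values quoted in this report:
# fee00000 was seen when probing failed, fee01000 with a patched kernel.
classify_msi_address() {
    case "$1" in
        fee00000) echo "unpatched-like" ;;
        fee01000) echo "patched-like" ;;
        *)        echo "unknown" ;;
    esac
}

# Succeed if the kernel log read from stdin contains the IDENTIFY failure
# reported in this bug.
identify_failed() {
    grep -qF 'failed to IDENTIFY (I/O error, err_mask=0x4)'
}
```

A possible use, assuming the SATA controller sits at 10000:e0:17.0 as in the original description: `sudo lspci -vvnn -s 10000:e0:17.0 | msi_addresses` prints the address to pass to classify_msi_address, and `dmesg | identify_failed && echo "IDENTIFY failure present"` checks the log.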

[Where problems could occur]

An unpatched kernel cannot probe SATA controllers moved behind VMD
when VMD/RAID mode is enabled in the BIOS, leaving the disks attached
to them completely unusable. With this change the kernel can probe
them, but it may also hit issues that only occur in such a
configuration. However, the worst case is having to move the SATA
disks off the VMD bus, which is already necessary without this fix,
so the risk should be justified.

========== Previous SRU ==========

[SRU Justification]

[Impact]

When booting certain platforms with the boot disk attached to a SATA
bus behind an Intel VMD controller, disk probing may fail, leaving the
following error messages in dmesg:

  [ 6.163286] ata1.00: qc timeout (cmd 0xec)
  [ 6.165630] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)

[Fix]

Proposed kernel patch https://patchwork.kernel.org/patch/11758345/

[Test Case]

Check dmesg and lsblk to confirm that the disk is probed.

To check the PCI MSI address, inspect the lspci output:

  $ lspci -vvnn
  ....
      Capabilities: [80] MSI: Enable+ Count=1/1 maskable- 64bit-
          Address: fee00000 Data: 0000

When probing fails, the address is fee00000. With a patched kernel it is fee01000:

  $ lspci -vvnn
  ....
      Capabilities: [80] MSI: Enable+ Count=1/1 maskable- 64bit-
          Address: fee01000 Data: 0000

[Regression Potential]
Low. On existing NVMe-based platforms this patch makes no effective
change for NVMe devices, because they remain in the fast-interrupt
list.

========== Original Bug Description ==========

When booting with the root filesystem on SATA disks under Intel VMD mode, the following errors are printed in dmesg, no disk is found, and the system cannot boot from it:

  [ 6.163286] ata1.00: qc timeout (cmd 0xec)
  [ 6.165630] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
  [ 6.483649] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
  [ 16.659284] ata1.00: qc timeout (cmd 0xec)
  [ 16.661717] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
  [ 16.663161] ata1: limiting SATA link speed to 1.5 Gbps
  [ 16.983890] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
  [ 48.147294] ata1.00: qc timeout (cmd 0xec)
  [ 48.149737] ata1.00: failed to IDENTIFY (I/O error, err_mask=0x4)
  [ 48.467889] ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)

  $ lspci
  ...
  10000:e0:17.0 SATA controller: Intel Corporation Device a0d3 (rev 20)
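
To see whether a machine has a SATA controller in the same position, the lspci listing can be filtered by PCI domain. This is a sketch with a hypothetical helper name (vmd_sata_controllers). It assumes that devices behind VMD appear in a domain of 10000 (hex) or higher, as in the output above, so it simply looks for a domain field of five or more hex digits.

```shell
#!/bin/sh
# Read `lspci -D` style lines ("domain:bus:dev.fn Class: Name") from stdin
# and print the SATA controllers whose domain has five or more hex digits.
# Assumption: such domains (10000 and up) are the ones created for VMD.
vmd_sata_controllers() {
    awk '
        /SATA controller/ {
            n = split($1, part, ":")
            # Three colon-separated fields means the domain is present.
            if (n == 3 && length(part[1]) >= 5) print
        }
    '
}
```

For example, `lspci -D | vmd_sata_controllers` prints nothing on a machine without such a controller. A controller that lspci reports under a different class name, such as a RAID controller, is not matched by this filter.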


You-Sheng Yang (vicamo) on 2020-09-08
tags: added: oem-priority originate-from-1891445 somerville
tags: added: originate-from-1892806
Changed in linux-oem-5.6 (Ubuntu Groovy):
status: New → Confirmed
Changed in linux-oem-5.6 (Ubuntu Focal):
status: New → Confirmed
Changed in linux-oem-5.6 (Ubuntu Groovy):
status: Confirmed → Invalid
Changed in linux (Ubuntu Groovy):
status: New → Confirmed
Changed in linux (Ubuntu Focal):
status: New → Confirmed
importance: Undecided → High
Changed in linux (Ubuntu Groovy):
importance: Undecided → High
Changed in linux-oem-5.6 (Ubuntu Focal):
importance: Undecided → High
You-Sheng Yang (vicamo) wrote :
Changed in linux-oem-5.6 (Ubuntu Focal):
status: Confirmed → In Progress
assignee: nobody → You-Sheng Yang (vicamo)
Changed in linux (Ubuntu Groovy):
status: Confirmed → In Progress
Changed in linux (Ubuntu Focal):
status: Confirmed → In Progress
assignee: nobody → You-Sheng Yang (vicamo)
Changed in linux (Ubuntu Groovy):
assignee: nobody → You-Sheng Yang (vicamo)
You-Sheng Yang (vicamo) wrote :
description: updated
You-Sheng Yang (vicamo) wrote :

Also nominated for Focal so that the Focal generic kernel can recognize such platforms during installation.

Timo Aaltonen (tjaalton) on 2020-09-08
Changed in linux-oem-5.6 (Ubuntu Focal):
status: In Progress → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
You-Sheng Yang (vicamo) wrote :

Verified linux-oem-5.6 version 5.6.0-1028.28 from focal-proposed.

tags: added: verification-done-focal
removed: verification-needed-focal
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.6 - 5.6.0-1028.28

---------------
linux-oem-5.6 (5.6.0-1028.28) focal; urgency=medium

  * focal/linux-oem-5.6: 5.6.0-1028.28 -proposed tracker (LP: #1894630)

  * Cannot probe sata disk on sata controller behind VMD: ata1.00: failed to
    IDENTIFY (I/O error, err_mask=0x4) (LP: #1894778)
    - SAUCE: PCI: vmd: Add AHCI to fast interrupt list

  * SRU: Fix system hang when stress S3 on radeon with TTM (LP: #1893609)
    - mei: bus: don't clean driver pointer

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts
    - update dkms package versions

  * Introduce the new NVIDIA 450-server and the 450 UDA series (LP: #1887674)
    - [packaging] add signed modules for the 450 nvidia driver

  * CVE-2020-12888
    - vfio/type1: Support faulting PFNMAP vmas
    - vfio-pci: Fault mmaps to enable vma tracking
    - vfio-pci: Invalidate mmaps and block MMIO access on disabled memory

  * Missing id 8086:a0bc for VMD quirk PCI_DEV_FLAGS_ENABLE_ASPM (LP: #1893194)
    - SAUCE: PCI/ASPM: VMD: add ASPM quirk for 8086:a0bc

  * The DP/HDMI audio via USB-C to DP dongle or Dell Zeus adapter can't work
    after suspend (LP: #1893290)
    - ALSA: hda/hdmi: always check pin power status in i915 pin fixup

  * Comet Lake PCH-H RAID not support on Ubuntu20.04 (LP: #1892288)
    - ahci: Add Intel Comet Lake PCH-H PCI ID

  * device doesn't boot with kernel older than v5.7.7 on a usb key: hang at
    efi_tpm_eventlog_init (LP: #1892827)
    - efi/tpm: Verify event log header before parsing

 -- Timo Aaltonen <email address hidden> Tue, 08 Sep 2020 11:40:14 +0300

Changed in linux-oem-5.6 (Ubuntu Focal):
status: Fix Committed → Fix Released
You-Sheng Yang (vicamo) wrote :

This patch was not accepted upstream; see https://www.spinics.net/lists/linux-pci/msg98817.html. We need to check whether there is another fix.

You-Sheng Yang (vicamo) wrote :

5.8.0-34-generic and 5.10.0-9-generic are still affected, and so is 5.10-1008-oem.

Changed in linux-oem-5.10 (Ubuntu Focal):
status: New → Confirmed
Changed in linux-oem-5.10 (Ubuntu Groovy):
status: New → Invalid
Changed in linux-oem-5.10 (Ubuntu):
status: New → Invalid
Rex Tsai (chihchun) wrote :

It seems the V3 patch[1] is merged[2]. Is that patch enough to address the problem we have?

[1] [v3] PCI: vmd: Offset Client VMD MSI-X vectors - Patchwork - https://patchwork.kernel.org<email address hidden>/
[2] PCI: vmd: Offset Client VMD MSI-X vectors · torvalds/linux@f6b7bb8 - https://github.com/torvalds/linux/commit/f6b7bb847ca821a8aaa1b6da10ee65311e6f15bf

You-Sheng Yang (vicamo) wrote :

@Rex, yes, and I've updated my PPA with kernels rebuilt with that fix cherry-picked. Tested oem-5.10 and Groovy 5.8 so far. Looks promising and will proceed to SRU.

Changed in linux-oem-5.10 (Ubuntu Focal):
assignee: nobody → You-Sheng Yang (vicamo)
importance: Undecided → High
status: Confirmed → In Progress
You-Sheng Yang (vicamo) wrote :

Marking 5.4-generic as WONTFIX because it is to be transitioned to v5.8-generic.

Changed in linux (Ubuntu Focal):
status: In Progress → Won't Fix
Timo Aaltonen (tjaalton) on 2021-01-11
Changed in linux-oem-5.10 (Ubuntu Focal):
status: In Progress → Fix Committed
Timo Aaltonen (tjaalton) on 2021-01-12
Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-oem-5.10 - 5.10.0-1011.12

---------------
linux-oem-5.10 (5.10.0-1011.12) focal; urgency=medium

  * focal/linux-oem-5.10: 5.10.0-1011.12 -proposed tracker (LP: #1913030)

  * Support CML-S CPU + TGP PCH (LP: #1909457)
    - Revert "UBUNTU: SAUCE: drm/i915/rkl: new rkl ddc map for different PCH"
    - drm/i915/dg1: gmbus pin mapping
    - drm/i915/dg1: Don't program PHY_MISC for PHY-C and PHY-D
    - drm/i915/dg1: add hpd interrupt handling
    - drm/i915/display/ehl: Limit eDP to HBR2
    - drm/i915/jsl: Split EHL/JSL platform info and PCI ids
    - drm/i915: Add PORT_TCn aliases to enum port
    - drm/i915: s/PORT_TC/TC_PORT_/
    - drm/i915/rkl: new rkl ddc map for different PCH
    - SAUCE: drm/i915/gen9_bc : Add TGP PCH support

  * backlight parsing for VBT 234+ (LP: #1912157)
    - drm/i915/vbt: Fix backlight parsing for VBT 234+
    - drm/i915/vbt: Update the version and expected size of
      BDB_GENERAL_DEFINITIONS map
    - drm/i915/vbt: Add VRR VBT toggle

  * HD Audio Device PCI ID for the Intel Cometlake-R platform (LP: #1912427)
    - SAUCE: ALSA: hda: Add Cometlake-R PCI ID

 -- Timo Aaltonen <email address hidden> Mon, 25 Jan 2021 11:43:18 +0200

Changed in linux-oem-5.10 (Ubuntu Focal):
status: Fix Committed → Fix Released

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-groovy
fabianbur (fabianbur) wrote :

I have tested it on my ASUS Laptop X513EQ with the kernel 5.8.0-42 and now it detects the SATA hard drive without problem.

Changed in linux (Ubuntu Groovy):
status: Fix Committed → Fix Released
fabianbur (fabianbur) on 2021-01-30
tags: added: verification-done-groovy
removed: verification-needed-groovy
Aksahat (baka-1) wrote :

Hello, I recently installed Ubuntu on my new laptop, which has a `1TB HDD + 256GB SSD`, and I am facing the same problem. I would like to fix it using this patch or some other method. Please help me apply this patch and check whether it works for me. P.S. I am using Ubuntu 20.04.

fabianbur (fabianbur) wrote :

Aksahat: The kernel 5.8.0-42 is not yet in proposed, on Ubuntu 20.04. For now only on Ubuntu 20.10.

The alternative, until it is included in Ubuntu 20.04.2 is this:

sudo apt install linux-oem-20.04

This installs the OEM 5.6 kernel, which has the fix.

Then you have to uninstall the hwe kernel:

sudo apt remove --purge --install-recommends linux-generic-hwe-20.04

To list installed kernels:

sudo dpkg --list | grep linux-image
and:
sudo dpkg --list | grep linux-headers

Clear all 5.8 kernels with:

sudo apt-get --purge remove linux-xxx

Either that, or wait for it to be fixed in Ubuntu 20.04 with kernel 5.8.

Aksahat (baka-1) wrote :

Thank you, fabianbur, for replying. I have a few questions:

Could something go wrong while doing this?
How long before a patched kernel is released for Ubuntu 20.04?
How can I be sure that my error is the same one and will be resolved by the patch?

Aksahat (baka-1) wrote :

https://pastebin.ubuntu.com/p/4pyrWCYryT/

fabianbur, here are the results of those commands. Might switching my kernel to hwe-20.04 from the boot menu help?

uname -sr returns: Linux 5.8.0-41-generic

fabianbur (fabianbur) wrote :

Aksahat:

Could something go wrong while doing this?

No, it couldn't be worse than what you already have. The OEM kernel is tested and resolves this particular bug and others.

How long before a patched kernel is released for Ubuntu 20.04?

That question is for the Ubuntu development team. I would estimate between two and four weeks. Maybe less, but no more than that.

How can I be sure that my error is the same one and will be resolved by the patch?

If you run sudo apt install linux-oem-20.04, reboot, select the OEM kernel in GRUB, and the error is gone, then it is the same bug.

If you don't want to do that, check the details of the error in this thread and compare them with yours.

The command output you shared says nothing more than which kernel version you have.

Aksahat (baka-1) wrote :

Thank you, fabianbur, for your help :) Very helpful.

Aksahat (baka-1) wrote :

`linux-image-5.8.0-43-generic` has been released and I have updated to it. The patch still doesn't work there, whereas it works on linux-oem-20.04.

fabianbur (fabianbur) wrote :

I must report that this bug has returned to Groovy with linux-image-5.8.0-43-generic

It was tested in Groovy from proposed with linux-image-5.8.0-42-generic and verified as indicated here, within 5 days of publication.

What happened?
Does anyone know?

I myself marked it as Fix Released in groovy

And now?

You-Sheng Yang (vicamo) wrote :

@Fabianbur, 5.8.0-42 was superseded by an emergency security release, 5.8.0-43, and the fix re-landed in 5.8.0-44, which is not yet publicly available. So I think you don't need to do anything here, but it would still be much appreciated if you could help verify 5.8.0-44 again.
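
The version history described in this comment can be turned into a quick check of the running kernel. This is a sketch only: groovy_kernel_has_vmd_fix is a hypothetical helper, it covers just the 5.8 generic kernels discussed in this thread, and treating ABI numbers above 44 as fixed is an assumption rather than something stated here.

```shell
#!/bin/sh
# Given a `uname -r` string such as "5.8.0-43-generic", print "yes", "no"
# or "unknown" according to this thread: -42 carried the fix, -43 (the
# security release) did not, and -44 carries it again.
# Assumption: ABI numbers above 44 keep the fix.
groovy_kernel_has_vmd_fix() {
    case "$1" in
        5.8.0-*-generic) ;;
        *) echo "unknown"; return 0 ;;
    esac
    abi=${1#5.8.0-}      # "43-generic"
    abi=${abi%%-*}       # "43"
    case "$abi" in
        ''|*[!0-9]*) echo "unknown"; return 0 ;;
    esac
    if [ "$abi" -eq 42 ] || [ "$abi" -ge 44 ]; then
        echo "yes"
    else
        echo "no"
    fi
}
```

It can be called as `groovy_kernel_has_vmd_fix "$(uname -r)"`. OEM kernels and other series fall through to "unknown", because this comment says nothing about them.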

Aksahat (baka-1) wrote :

Can someone tell me when 5.8.0-44 will be released?

fabianbur (fabianbur) wrote :

@vicamo Thanks for your explanation. I can test 5.8.0-44 as soon as it is in proposed.

regards

fabianbur (fabianbur) wrote :

@vicamo At this time the kernel is available in proposed.

I can confirm that the kernel 5.8.0-44, in proposed, solves the problem in Groovy.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.10.0-14.15

---------------
linux (5.10.0-14.15) hirsute; urgency=medium

  * hirsute/linux: 5.10.0-14.15 -proposed tracker (LP: #1913724)

  * Restore palm ejection on multi-input devices (LP: #1913520)
    - HID: multitouch: Apply MT_QUIRK_CONFIDENCE quirk for multi-input devices

  * intel-hid is not loaded on new Intel platform (LP: #1907160)
    - platform/x86: intel-hid: add Rocket Lake ACPI device ID

  * Hirsute update: v5.10.11 upstream stable release (LP: #1913430)
    - scsi: target: tcmu: Fix use-after-free of se_cmd->priv
    - mtd: rawnand: gpmi: fix dst bit offset when extracting raw payload
    - mtd: rawnand: nandsim: Fix the logic when selecting Hamming soft ECC engine
    - i2c: tegra: Wait for config load atomically while in ISR
    - i2c: bpmp-tegra: Ignore unknown I2C_M flags
    - platform/x86: ideapad-laptop: Disable touchpad_switch for ELAN0634
    - ALSA: seq: oss: Fix missing error check in snd_seq_oss_synth_make_info()
    - ALSA: hda/realtek - Limit int mic boost on Acer Aspire E5-575T
    - ALSA: hda/via: Add minimum mute flag
    - crypto: xor - Fix divide error in do_xor_speed()
    - dm crypt: fix copy and paste bug in crypt_alloc_req_aead
    - ACPI: scan: Make acpi_bus_get_device() clear return pointer on error
    - btrfs: don't get an EINTR during drop_snapshot for reloc
    - btrfs: do not double free backref nodes on error
    - btrfs: fix lockdep splat in btrfs_recover_relocation
    - btrfs: don't clear ret in btrfs_start_dirty_block_groups
    - btrfs: send: fix invalid clone operations when cloning from the same file
      and root
    - fs: fix lazytime expiration handling in __writeback_single_inode()
    - pinctrl: ingenic: Fix JZ4760 support
    - mmc: core: don't initialize block size from ext_csd if not present
    - mmc: sdhci-of-dwcmshc: fix rpmb access
    - mmc: sdhci-xenon: fix 1.8v regulator stabilization
    - mmc: sdhci-brcmstb: Fix mmc timeout errors on S5 suspend
    - dm: avoid filesystem lookup in dm_get_dev_t()
    - dm integrity: fix a crash if "recalculate" used without "internal_hash"
    - dm integrity: conditionally disable "recalculate" feature
    - drm/atomic: put state on error path
    - drm/syncobj: Fix use-after-free
    - drm/amdgpu: remove gpu info firmware of green sardine
    - drm/amd/display: DCN2X Find Secondary Pipe properly in MPO + ODM Case
    - drm/i915/gt: Prevent use of engine->wa_ctx after error
    - drm/i915: Check for rq->hwsp validity after acquiring RCU lock
    - ASoC: Intel: haswell: Add missing pm_ops
    - ASoC: rt711: mutex between calibration and power state changes
    - SUNRPC: Handle TCP socket sends with kernel_sendpage() again
    - HID: sony: select CONFIG_CRC32
    - dm integrity: select CRYPTO_SKCIPHER
    - x86/hyperv: Fix kexec panic/hang issues
    - scsi: ufs: Relax the condition of UFSHCI_QUIRK_SKIP_MANUAL_WB_FLUSH_CTRL
    - scsi: ufs: Correct the LUN used in eh_device_reset_handler() callback
    - scsi: qedi: Correct max length of CHAP secret
    - scsi: scsi_debug: Fix memleak in scsi_debug_init()
    - scsi: sd: Suppress spurious errors when WRITE SAME is being disabled
    - riscv: ...

Changed in linux (Ubuntu):
status: Fix Committed → Fix Released