hpsa: DMAR invalid read

Bug #1813651 reported by Guilherme G. Piccoli
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | Fix Released | Medium | Guilherme G. Piccoli |
linux (Ubuntu Disco) | Fix Released | Medium | Guilherme G. Piccoli |

Bug Description

Recently, on kernel 4.20-rc1 and newer, we observed the following spontaneous DMAR fault with the hpsa driver when intel_iommu is enabled:

[ 5173.952022] DMAR: DRHD: handling fault status reg 2
[ 5174.190649] DMAR: [DMA Read] Request device [03:00.0] fault addr eefdd000 [fault reason 06] PTE Read access is not set
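When grepping dmesg across long reproduction runs, it can help to pull the request type, device, fault address and reason out of fault lines like the one above. The following is a minimal Python sketch; the regular expression and the helper name `parse_dmar_fault` are hypothetical, fitted only to the sample line in this report, and other kernel versions may word the message differently.

```python
import re

# Pattern fitted to the sample "[DMA Read] Request device ..." line in this report;
# it is an assumption, not a complete description of the DMAR log format.
FAULT_RE = re.compile(
    r"DMAR: \[(?P<kind>DMA (?:Read|Write))\] Request device \[(?P<bdf>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\] "
    r"fault addr (?P<addr>[0-9a-f]+) \[fault reason (?P<reason>[0-9a-f]+)\] (?P<text>.*)"
)


def parse_dmar_fault(line: str):
    """Return the fields of a DMAR fault line as a dict, or None if the line is not one."""
    m = FAULT_RE.search(line)  # search() skips the leading dmesg timestamp
    if m is None:
        return None
    return {
        "kind": m["kind"],              # e.g. "DMA Read"
        "device": m["bdf"],             # PCI bus:device.function, e.g. "03:00.0"
        "addr": int(m["addr"], 16),     # faulting IO virtual address (printed in hex)
        "reason": m["reason"],          # reason code kept as printed, e.g. "06"
        "text": m["text"],              # human-readable reason
    }


line = ("[ 5174.190649] DMAR: [DMA Read] Request device [03:00.0] "
        "fault addr eefdd000 [fault reason 06] PTE Read access is not set")
print(parse_dmar_fault(line))
```

Lines such as the preceding "DRHD: handling fault status reg" message do not match and yield None, so the helper can be applied to every dmesg line.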

One commit in this range touches DMA in hpsa: "scsi: hpsa: switch to generic DMA API".
We tested 4.20-rc1 with this commit reverted and the issue still reproduces (the trigger is a kernel build). We cannot reproduce it on 4.19.

Investigation is ongoing.

Tags: seg
Guilherme G. Piccoli (gpiccoli) wrote :

Correction: it also happens on 4.19. I'm continuing the bisect.

Terry Rudd (terrykrudd) wrote :

Guilherme, is there going to be further work on this bug?

Guilherme G. Piccoli (gpiccoli) wrote :

Hi Terry, I'm planning to post an update this week.
It seems the problem is related to physical block size adjustments
that came with the patch "eb53a3ea3e00 scsi: hpsa: limit transfer length to 1MB, not 512kB".

With this patch reverted, I no longer see the issue, but I'm still investigating why the patch triggers it...

Thanks,

Guilherme

Guilherme G. Piccoli (gpiccoli) wrote :

I've narrowed the problem down to devices in HBA mode; if the device is in RAID mode (whether it actually uses a RAID level or has only one disk), the issue does not reproduce. I'll attach the output of the HP RAID utility (accessed from the BIOS) for both cases.
I'm continuing the investigation.

Thanks,

Guilherme

tags: removed: sts
tags: added: seg
Guilherme G. Piccoli (gpiccoli) wrote :

The problem was fixed by commit 625d7d351887 ("scsi: hpsa: correct ioaccel2 chaining"), which is present in all Ubuntu releases. In short, the controller was accessing a region that had not been DMA-mapped, and this only affected adapters in non-RAID (HBA) mode because they use a different I/O path.

Since it is fixed, no further work or debugging is needed here.
Cheers,

Guilherme

Changed in linux (Ubuntu Disco):
status: Confirmed → Fix Released
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Guilherme G. Piccoli (gpiccoli) wrote :

For the current Ubuntu releases, the kernels with this fix are listed below.

* Xenial (16.04)
regular kernel: 4.4.0-159
HWE kernel: 4.15.0-60

* Bionic (18.04)
regular kernel: 4.15.0-60
HWE-Disco kernel: 5.0.0-31

* Disco (19.04)
regular kernel: 5.0.0-31

* Eoan (19.10)
regular kernel: all versions (the upstream fix came in v5.2; Eoan is based on v5.3)
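To check whether a given machine's kernel already carries the fix, the list above can be turned into a small lookup. This is a minimal Python sketch under two stated assumptions: the versions listed are the first fixed ABI numbers for each series and flavour, and any later ABI number in the same series also includes the fix. The table keys and the helpers `parse_release` and `has_hpsa_fix` are hypothetical names chosen for illustration.

```python
import re

# First fixed kernel per (series, flavour), copied from the list above.
# Assumption: later ABI numbers within the same series also carry the fix.
FIXED = {
    ("xenial", "regular"): (4, 4, 0, 159),
    ("xenial", "hwe"): (4, 15, 0, 60),
    ("bionic", "regular"): (4, 15, 0, 60),
    ("bionic", "hwe-disco"): (5, 0, 0, 31),
    ("disco", "regular"): (5, 0, 0, 31),
}


def parse_release(release: str) -> tuple:
    """Turn a `uname -r` string such as '4.15.0-60-generic' into (4, 15, 0, 60)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(g) for g in m.groups())


def has_hpsa_fix(series: str, flavour: str, release: str) -> bool:
    """True if `release` is at or above the first fixed kernel for that series/flavour."""
    if series == "eoan":
        # Eoan kernels are based on v5.3, after the upstream fix landed in v5.2.
        return True
    return parse_release(release) >= FIXED[(series, flavour)]


print(has_hpsa_fix("bionic", "regular", "4.15.0-58-generic"))
```

Tuple comparison orders the fields from left to right, so the ABI number only decides the result when the base version (for example 4.15.0) is the same.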
