amd iommu dma_map_sg failed

Bug #2020166 reported by zhangerdong
This bug affects 1 person
Affects                      Status  Importance  Assigned to  Milestone
Canonical Livepatch On-Prem  New     Undecided   Unassigned

Bug Description

I ran into a dma_map_sg failure and would like some help.

    My server info:

       root@X86-U-1-60:~/zhanged/ixdriver/kmd# lsb_release -a
       No LSB modules are available.
       Distributor ID: Ubuntu
       Description: Ubuntu 18.04.5 LTS
       Release: 18.04
       Codename: bionic

        root@X86-U-1-60:~/zhanged/ixdriver/kmd# free -g
               total used free shared buff/cache available
        Mem: 251 3 245 0 2 246
        Swap: 9 0 9

        root@X86-U-1-60:~/zhanged/ixdriver/kmd# cat /proc/version

       Linux version 4.15.0-112-generic (buildd@lcy01-amd64-027) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #113-Ubuntu SMP Thu Jul 9 23:41:39 UTC 2020

       root@X86-U-1-60:~/zhanged/ixdriver/kmd# cat /proc/cpuinfo

        processor : 0
        vendor_id : AuthenticAMD
        cpu family : 25
        model : 1
        model name : AMD EPYC 7543 32-Core Processor
        stepping : 1
        microcode : 0xa001144
        cpu MHz : 1492.407
        cache size : 512 KB
        physical id : 0
        siblings : 32
        core id : 0
        cpu cores : 32
        apicid : 0
        initial apicid : 0
        fpu : yes
        fpu_exception : yes
        cpuid level : 16

    My example code:

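        size = 1073741824;  /* 1 GiB here; sizes up to 9 GB are used */
        page_num = size >> PAGE_SHIFT;
        /* vm_mmap() returns the mapped user address; check it with IS_ERR_VALUE() */
        user_addr = vm_mmap(NULL, 0, size,
                            PROT_READ | PROT_WRITE | PROT_EXEC,
                            MAP_ANONYMOUS | MAP_SHARED, 0);
        /* on 4.15 the third argument is an int "write" flag; 5.2+ takes gup_flags */
        ret = get_user_pages_fast(user_addr, page_num, FOLL_WRITE | FOLL_SPLIT, pages);
        sg_alloc_table_from_pages(&xfer->sgt, (struct page **)pages, page_num,
                                  0, size, GFP_KERNEL);
        /* dma_map_sg() returns the number of mapped entries, 0 on failure */
        nents = dma_map_sg(&pdev->dev, xfer->sgt.sgl, sg_nents(xfer->sgt.sgl), dir);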

When I run my code, dmesg reports this error:

    [ 3408.305212] 0000:01:00.0: IOMMU mapping error in map_sg (io-pages: 2442288)

Finally, I found that the AMD IOMMU code in Linux 4.15 has a bug, and I modified it as follows:

    diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
    index 97baf88d9505..189ecb206471 100644
    --- a/drivers/iommu/amd_iommu.c
    +++ b/drivers/iommu/amd_iommu.c
    @@ -2453,7 +2453,7 @@ static int sg_num_pages(struct device *dev,
     {
             unsigned long mask, boundary_size;
             struct scatterlist *s;
    -        int i, npages = 0;
    +        unsigned long i, npages = 0;
     
             mask = dma_get_seg_boundary(dev);
             boundary_size = mask + 1 ? ALIGN(mask + 1, PAGE_SIZE) >> PAGE_SHIFT :
    @@ -2505,7 +2505,7 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
     
             /* Map all sg entries */
             for_each_sg(sglist, s, nelems, i) {
    -                int j, pages = iommu_num_pages(sg_phys(s), s->length, PAGE_SIZE);
    +                unsigned long j, pages = iommu_num_pages(sg_phys(s), s->length, PAGE_SIZE);
     
                     for (j = 0; j < pages; ++j) {
                             unsigned long bus_addr, phys_addr;

With this change, my case runs successfully.

Why does this error happen on my server? I think the reason is that my server has up to 240 GB of free memory, so the buffer can be backed by large physically contiguous runs and some entries in xfer->sgt.sgl are very large (the whole table covers several GB). The variable `npages` in sg_num_pages is an int, so `npages << PAGE_SHIFT` overflows for large sizes.

What I have tried:
1. Updating the kernel to 5.8: my case runs successfully, but I cannot change the kernel version arbitrarily.
2. Changing the vm_mmap and get_user_pages_fast parameters: every attempt failed.
3. Disabling the IOMMU: my case succeeds, but I cannot disable the IOMMU arbitrarily.

Is there a livepatch that fixes this bug? Thanks very much.

summary: iommu → amd iommu dma_map_sg failed
description: updated
information type: Public → Public Security
Kian Parvin (kian-parvin) wrote:

Hi,

This is an interesting problem. Unfortunately, Livepatch only addresses high and critical CVEs (Common Vulnerabilities and Exposures), and I'm not sure whether your problem qualifies as a CVE. It looks simply like an immature driver implementation (as you mentioned, it's fixed in a newer kernel).

You can check if Livepatch has addressed any specific issues using this page - https://ubuntu.com/security/notices (a brief explanation of what security notices are can be found here - https://ubuntu.com/security/livepatch/docs/livepatch/explanation/notices)

However, I think you might benefit from Ubuntu's HWE kernels, which let you run a newer kernel on an older release; details are here: https://wiki.ubuntu.com/Kernel/LTSEnablementStack
A brief word of caution: HWE is on by default for Desktop installs of Ubuntu, so that newer kernels bring better driver support, but it is not enabled by default for server installs. It can, however, be enabled on servers as well.

Alternatively, you could look into building your own kernel module and loading it on the server; unfortunately, that is outside the scope of Livepatch.

Hope that helps!

This report contains Public Security information  
Everyone can see this security related information.
