Comment 5 for bug 1386490

On Tue, Dec 2, 2014 at 2:49 AM, dann frazier <email address hidden> wrote:
> On Thu, Nov 27, 2014 at 8:17 PM, Ming Lei <email address hidden> wrote:
>> From the ARM64 maintainer's viewpoint:
>>
>> http://marc.info/?l=linux-arm-kernel&m=141708838404470&w=2
>>
>> Either way, I don't think it's a problem for the kernel. We just need to
>> change the default DMA ops to coherent when booting with ACPI (using
>> non-coherent ops for a coherent device is not safe as the CPU can
>> corrupt cache lines written by the device).
>>
>> So I suggest reverting c7a4a7658d689f6 for utopic, since utopic ships
>> APM's non-upstreamed PCI implementation and APM's ARM64 SoC is a
>> coherent architecture.
>>
>> Dann, what do you think about it?
>
> Ming,
> Thanks for looking into this.
> I'm not sure I follow the relevancy of a couple things:
>
> - The upstream discussion seems to have gotten rerouted to solving
> the problem for ACPI, but this issue is regarding device-tree (as was
> your patch), since that is what Ubuntu/m400 uses. I'm having trouble
> seeing how the ACPI solution helps us.

First, I understand that upstream prefers ACPI for arm64 servers.

For a DT-based solution, my patch or a similar fix is needed for this
issue, but the upstream community thinks that handling DMA coherency
(along with irq, dma mask, iommu, ...) should first be moved into the
kernel's drivers/pci. That work needs cooperation between the arm64
and PCI communities, so it might not be merged soon.

From the discussion, Red Hat also ships a similar patch in their
internal tree.

>
> - I'm confident I reproduced this problem with all upstream bits
> after PCI was merged. Therefore my assumption was that this issue is
> independent of PCI stack choice. Do you believe that the solution for
> the upstream PCI stack is different than the solution for the APM PCI
> stack?

The root cause is very clear: PCI devices can't inherit the DMA
coherent attribute on ARM64, with either the upstream or the
APM PCI stack.
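
To illustrate the root cause, here is a minimal Python sketch. It is
not kernel code: the Device class and both function names are made up
for illustration, and the dma_coherent flag only stands in for a
coherency marking set by firmware on a node. It contrasts consulting
only the device's own node (the behaviour described in this bug) with
walking up the parent chain so a PCI device inherits the marking from
its host bridge.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Device:
    """Hypothetical stand-in for a device node; not a kernel structure."""
    name: str
    parent: Optional["Device"] = None
    dma_coherent: bool = False  # models a firmware-provided coherency marking


def pick_dma_ops_no_inherit(dev: Device) -> str:
    # Only the device's own marking is consulted, so a PCI device behind a
    # coherent host bridge still ends up with non-coherent ops.
    return "coherent" if dev.dma_coherent else "noncoherent"


def pick_dma_ops_inherit(dev: Device) -> str:
    # Walk from the device up to the root; the first ancestor marked
    # coherent makes the device coherent as well.
    node: Optional[Device] = dev
    while node is not None:
        if node.dma_coherent:
            return "coherent"
        node = node.parent
    return "noncoherent"


if __name__ == "__main__":
    pcie = Device("xgene-pcie", dma_coherent=True)
    nic = Device("mlx4_core 0000:01:00.0", parent=pcie)
    print(pick_dma_ops_no_inherit(nic))
    print(pick_dma_ops_inherit(nic))
```

With the m400 layout from the bug description (coherent xgene-pcie,
Mellanox card behind it), the first function models what happens today
and the second models what a fix would need to achieve.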

The patch I posted doesn't apply to the APM PCI stack, since its
implementation is very different from upstream, so reverting
c7a4a7658d is easier for utopic.

>
> In general I'm OK with a revert of c7a4a7658d689f6 for utopic, as long
> as we have a plan for an upstream answer for vivid and beyond (>=
> 3.19). I don't think the kernel team would support a plan that
> involves carrying this revert indefinitely (if that's what you're
> suggesting, I'm not sure it is).

It depends on upstream, as I described above. :-)

Thanks,

>
> -dann
>
>> Thanks,
>>
>> ** Changed in: linux (Ubuntu Utopic)
>> Assignee: (unassigned) => Ming Lei (tom-leiming)
>>
>> --
>> You received this bug notification because you are subscribed to the bug
>> report.
>> https://bugs.launchpad.net/bugs/1386490
>>
>> Title:
>> HP ProLiant m400 nic doesn't work after trusty
>>
>> To manage notifications about this bug go to:
>> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1386490/+subscriptions
>
> Status in linux package in Ubuntu:
> Confirmed
> Status in linux source package in Utopic:
> Confirmed
> Status in linux source package in Vivid:
> Confirmed
>
> Bug description:
> Starting in 3.15, arm64 began defaulting to non-coherent dma_ops:
>
> commit c7a4a7658d689f664050c45493d79adf053f226e
> Author: Ritesh Harjani <email address hidden>
> Date: Wed Apr 23 06:29:46 2014 +0100
>
> arm64: Make default dma_ops to be noncoherent
>
> Firmware (dtb in the case of the m400) is responsible for telling the
> kernel when a device requires coherent dma_ops. However, as of utopic,
> this property is not being inherited by downstream devices.
> Specifically, the xgene-pcie device is marked as coherent, but the
> devices behind it (mellanox card) still get initialized with non-
> coherent ops.
>
> This results in the mlx4 driver bailing out with the following messages:
> [ 18.703635] mlx4_core 0000:01:00.0: command 0x23 timed out (go bit not cleared)
> [ 18.710911] mlx4_core 0000:01:00.0: Failed to initialize queue pair table, aborting
>
>
> There's an upstream discussion on the topic here:
> http://www.spinics.net/lists/arm-kernel/msg362320.html