Comment 2 for bug 1878520

dann frazier (dannf) wrote :

Note that the CMA messages do not necessarily mean something is wrong. The logic in the kernel is to try CMA for DMA memory first, then fall back to normal memory if that fails. It's annoying that the CMA allocator emits errors in this case, but we should be able to ignore them. If we have a reliable reproducer, we could try booting with cma=64M (or more if necessary) to see if we still get RCU stalls when enough CMA is available to handle all DMA requests.
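If we do try a cma= boot, it would help to confirm how much CMA the kernel actually reserved and how much is free at the time of a stall. A rough sketch, assuming the kernel is built with CONFIG_CMA so that /proc/meminfo exposes the CmaTotal and CmaFree fields; the helper name parse_cma_meminfo is just something made up for illustration:

```python
import re


def parse_cma_meminfo(meminfo_text):
    """Return the CmaTotal/CmaFree values (in kB) found in meminfo_text.

    Hypothetical helper: fields that are absent (e.g. kernel built
    without CONFIG_CMA) are simply left out of the result.
    """
    values = {}
    for line in meminfo_text.splitlines():
        match = re.match(r"^(CmaTotal|CmaFree):\s+(\d+)\s*kB", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


if __name__ == "__main__":
    # Read the live values on the machine under test.
    with open("/proc/meminfo") as f:
        print(parse_cma_meminfo(f.read()))
```

With cma=64M on the command line, CmaTotal should come out at or near 65536 kB (the kernel may round the reservation); a CmaFree close to zero around the time of the allocation errors would at least be consistent with the pool being exhausted.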

But, while such tests might get us some interesting (or not) data points, they're unlikely to get us closer to a fix. I think it would be more worthwhile to see if we can debug the stall itself. We can start by rebuilding the kernel with CONFIG_RCU_TRACE as described here to see if we get more info:
https://www.kernel.org/doc/Documentation/RCU/stallwarn.txt
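Before waiting on a reproduction, it is worth double-checking that the booted kernel really has the option enabled. A minimal sketch, assuming the usual Ubuntu layout where the build config is installed as /boot/config-<release>; the function name config_option_state is hypothetical:

```python
import os


def config_option_state(config_text, option):
    """Return the value of a kernel config option ('y', 'm', ...),
    or None if it is absent or marked '# ... is not set'."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return None
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


if __name__ == "__main__":
    # Assumption: config file path follows the Ubuntu convention.
    path = f"/boot/config-{os.uname().release}"
    with open(path) as f:
        print(config_option_state(f.read(), "CONFIG_RCU_TRACE"))
```

On a kernel rebuilt as described above this should print y; None would mean the running kernel is not the rebuilt one.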

Since d05-1 appears to be up and idle, I went ahead and installed such a kernel and booted into it in hopes the issue reproduces over the weekend.