Hello, I am using amdgpu and MATE on Debian, and I have the very same issue. The only way to fix it was to downgrade the kernel to 5.3.0. I already wrote to the amdgpu devs but got no response. My message back then was:
Hi, on my netbook (Debian bullseye, AMD A4, Sea Islands), after
updating the kernel to version 5.4.6 and logging into the MATE desktop, my
screen looked like this (see attached picture).
If you look at the image, do you have any idea what is happening
there? To me it looks like the framebuffer data is misinterpreted at
some stage. The picture looks fine on the lightdm login screen; it is
only corrupted once I start a user session. When I then switch to a
console (Ctrl+Alt+F1), the screen looks correct for a second before it
switches. When I switch back (Ctrl+Alt+F7), it looks correct for a second
and then turns corrupted again. I looked into these
commits a bit but don't have any idea. Maybe mate/marco is doing
something it shouldn't, but then many more people would have problems
now.
I took a screenshot, but in it everything looked fine, which is why I
took a photo instead. I assume the problem was introduced by one of these
commits, since these were the only amdgpu changes between a working and
a non-working kernel:
commit 9375fa3799293da82490f0f1fa1f1e7fabae2745
Author: changzhu <email address hidden>
Date: Tue Dec 10 22:00:59 2019 +0800
drm/amdgpu: add invalidate semaphore limit for SRIOV and picasso in gmc9
commit 90f6452ca58d436de4f69b423ecd75a109aa9766 upstream.
It may fail to load guest driver in round 2 or cause Xstart problem
when using invalidate semaphore for SRIOV or picasso. So it needs avoid
using invalidate semaphore for SRIOV and picasso.
Signed-off-by: changzhu <email address hidden>
Reviewed-by: Christian König <email address hidden>
Reviewed-by: Huang Rui <email address hidden>
Signed-off-by: Alex Deucher <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
commit bf8ae461a23577a9884f993b31b31f15dd7d6c0a
Author: changzhu <email address hidden>
Date: Tue Dec 10 10:23:09 2019 +0800
drm/amdgpu: avoid using invalidate semaphore for picasso
commit 413fc385a594ea6eb08843be33939057ddfdae76 upstream.
It may cause timeout waiting for sem acquire in VM flush when using
invalidate semaphore for picasso. So it needs to avoid using invalidate
semaphore for picasso.
Signed-off-by: changzhu <email address hidden>
Reviewed-by: Huang Rui <email address hidden>
Signed-off-by: Alex Deucher <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
commit f45858245286fa901e8abf36776df7e99d9a9581
Author: Xiaojie Yuan <email address hidden>
Date: Wed Nov 20 14:02:22 2019 +0800
drm/amdgpu/gfx10: re-init clear state buffer after gpu reset
commit 210b3b3c7563df391bd81d49c51af303b928de4a upstream.
This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x.
clear state buffer (resides in vram) is corrupted after 1st baco reset,
upon gfxoff exit, CPF gets garbage header in CSIB and hangs.
Signed-off-by: Xiaojie Yuan <email address hidden>
Reviewed-by: Hawking Zhang <email address hidden>
Signed-off-by: Alex Deucher <email address hidden>
Cc: <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
commit eebab68448a6bbb9b899216b6e889057f6f4498d
Author: Xiaojie Yuan <email address hidden>
Date: Thu Nov 14 16:56:08 2019 +0800
drm/amdgpu/gfx10: explicitly wait for cp idle after halt/unhalt
commit 1e902a6d32d73e4a6b3bc9d7cd43d4ee2b242dea upstream.
50us is not enough to wait for cp ready after gpu reset on some navi asics.
Signed-off-by: Xiaojie Yuan <email address hidden>
Suggested-by: Jack Xiao <email address hidden>
Acked-by: Alex Deucher <email address hidden>
Signed-off-by: Alex Deucher <email address hidden>
Cc: <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
commit 69e0a0d5bcc4dd677ea460b172a1bec4127c650d
Author: changzhu <email address hidden>
Date: Tue Nov 19 11:13:29 2019 +0800
drm/amdgpu: invalidate mmhub semaphore workaround in gmc9/gmc10
commit f920d1bb9c4e77efb08c41d70b6d442f46fd8902 upstream.
It may lose gpuvm invalidate acknowledge state across power-gating off
cycle. To avoid this issue in gmc9/gmc10 invalidation, add semaphore acquire
before invalidation and semaphore release after invalidation.
After adding semaphore acquire before invalidation, the semaphore
register become read-only if another process try to acquire semaphore.
Then it will not be able to release this semaphore. Then it may cause
deadlock problem. If this deadlock problem happens, it needs a semaphore
firmware fix.
Signed-off-by: changzhu <email address hidden>
Acked-by: Huang Rui <email address hidden>
Signed-off-by: Alex Deucher <email address hidden>
Cc: <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
commit b23e536fc4d58308e3e1ae0569327995995ee378
Author: changzhu <email address hidden>
Date: Tue Nov 19 10:18:39 2019 +0800
drm/amdgpu: initialize vm_inv_eng0_sem for gfxhub and mmhub
commit 6c2c8972374ac5c35078d36d7559f64c368f7b33 upstream.
SW must acquire/release one of the vm_invalidate_eng*_sem around the
invalidation req/ack. Through this way, it can avoid losing invalidate
acknowledge state across power-gating off cycle.
To use vm_invalidate_eng*_sem, it needs to initialize
vm_invalidate_eng*_sem firstly.
Signed-off-by: changzhu <email address hidden>
Reviewed-by: Christian König <email address hidden>
Signed-off-by: Alex Deucher <email address hidden>
Cc: <email address hidden>
Signed-off-by: Greg Kroah-Hartman <email address hidden>
----------
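In case someone wants to reproduce the commit list above, here is a rough sketch of how such a list can be pulled from a kernel git checkout. It only wraps the standard `git log <good>..<bad> -- <path>` invocation. The function names, the checkout directory and the "good" tag in the usage note are assumptions for illustration; only 5.4.6 as the broken version comes from my report.

```python
import subprocess


def amdgpu_log_command(good_tag, bad_tag, path="drivers/gpu/drm/amd/amdgpu"):
    """Build a git command listing commits reachable from bad_tag but not
    from good_tag that touch the given path (hypothetical helper)."""
    return ["git", "log", "--oneline", f"{good_tag}..{bad_tag}", "--", path]


def list_amdgpu_commits(repo_dir, good_tag, bad_tag):
    """Run the command inside a kernel checkout and return one summary
    line per commit. Raises CalledProcessError if git fails."""
    result = subprocess.run(
        amdgpu_log_command(good_tag, bad_tag),
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

With a stable-tree checkout in a directory such as `linux-stable` (an assumed name), `list_amdgpu_commits("linux-stable", "v5.4.5", "v5.4.6")` would print the candidates, where `v5.4.5` stands in for whichever tag last worked for you. The same two tags could then be handed to `git bisect` to narrow it down further.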
Maybe you have an idea and want to share it with me, apart from that
huge thanks for your work!
L3P3