re: [drm:drm_lock_take] *ERROR* 3 holds heavyweight lock
I have attached some heavily formatted log output, which shows drm debug messages at the time of the hang (the messages come from the kernel log, obtained using "modprobe drm debug=1"). You can see that after the cce_idle ioctl call, context 3 tries to lock again without unlocking first. I recompiled the Mesa DRI driver with the DEBUG_LOCKING flag set in r128_lock.h, and found out that this double locking behavior occurs in the depth buffer calls in r128_span.c. Basically
/* These functions require locking */
+/*
#undef HW_LOCK
#undef HW_UNLOCK
#define HW_LOCK() LOCK_HARDWARE(R128_CONTEXT(ctx));
#define HW_UNLOCK() UNLOCK_HARDWARE(R128_CONTEXT(ctx));
+*/
/* 16-bit depth buffer functions
*/
fixed the problem for me. Apparently, the lock is already taken before these functions get called (AFAICR, r128SpanRenderStart() does the job, but I'm not sure).
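A minimal standalone sketch of the double locking described above. This is not driver code: lock_take(), lock_free(), render_start() and depth_span() are made-up names, and the lock is just an integer holding the owning context id. It assumes the caller already holds the lock when the depth span function runs, which is what the debug output suggests.

```c
#include <stdio.h>

/* Hypothetical lock state: 0 means free, otherwise the owning context id. */
static int holder = 0;
static int double_locks = 0;

/* Take the lock; complain if the same context already holds it. */
static int lock_take(int context)
{
    if (holder == context) {
        double_locks++;
        fprintf(stderr, "context %d already holds the lock\n", context);
        return 0;
    }
    if (holder != 0)
        return 0;               /* held by someone else */
    holder = context;
    return 1;
}

/* Release the lock if this context holds it. */
static void lock_free(int context)
{
    if (holder == context)
        holder = 0;
}

/* Stand-in for the render start hook: takes the lock for the caller. */
static void render_start(int context)
{
    lock_take(context);
}

/* Stand-in for a depth span function. With own_lock set it behaves like
 * the code before the change: it locks and unlocks by itself. */
static void depth_span(int context, int own_lock)
{
    if (own_lock)
        lock_take(context);
    /* ... the depth buffer access would go here ... */
    if (own_lock)
        lock_free(context);
}
```

Calling render_start(3) and then depth_span(3, 1) triggers the "already holds" path, and the inner unlock also drops the lock the caller still expects to hold. With depth_span(3, 0) the lock is left alone.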
re: *ERROR* r128_cce_idle called without lock held
drmP.h has a LOCK_TEST_WITH_RETURN macro, which prints this error message and returns -EINVAL when the lock isn't held. Because of this, the infinite loop in r128WaitForIdleLocked exits with ret == -EINVAL, so the error message that actually gets printed is quite misleading in this case. The real error is that r128WaitForIdleLocked got called without the lock held.
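The control flow can be sketched roughly like this. It is a simplified standalone model, not the actual kernel or Mesa code: fake_cce_idle() and wait_for_idle_locked() are hypothetical stand-ins, the retry limit is arbitrary, and the message text is made up. The point is only that any negative return value ends up on the same error path, whether the hardware was really busy or the lock check failed.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the lock state the kernel checks. */
static int lock_held = 0;

/* Stand-in for the idle ioctl: like a lock test with early return, it
 * gives up with -EINVAL before doing any work if the lock is not held. */
static int fake_cce_idle(void)
{
    if (!lock_held)
        return -EINVAL;
    return 0;                   /* pretend the engine is idle */
}

/* Stand-in for the wait loop: retry only while the engine is busy. */
static int wait_for_idle_locked(void)
{
    int ret;
    int tries = 0;

    do {
        ret = fake_cce_idle();
    } while (ret == -EBUSY && ++tries < 32);

    /* Every failure lands here, so a generic message hides the cause. */
    if (ret < 0)
        fprintf(stderr, "idle wait failed (ret = %d)\n", ret);
    return ret;
}
```

With lock_held left at 0, wait_for_idle_locked() returns -EINVAL on the first iteration without ever seeing -EBUSY, so nothing actually timed out.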
I modified DEBUG_LOCK in r128_lock.h to print all lock operations to the standard error output, and modified the code to call DEBUG_LOCK on unlocks too. r128WaitForIdleLocked was also replaced with a macro to show where it's called from. It turned out that sometimes r128SpanRenderFinish() is called without calling r128SpanRenderStart() first -- so no lock is held during the call to r128WaitForIdleLocked. I think this may be a problem with the software fallback. Armagetron exhibits this behavior, while gl-117 calls RenderStart and RenderFinish in pairs, and its lock operations are paired as well.
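A minimal sketch of this kind of instrumentation, assuming nothing about the real macros in r128_lock.h: TRACE_LOCK, WAIT_FOR_IDLE_LOCKED and the span_render_* functions below are made-up names, and the lock is a plain counter. It shows the usual trick of wrapping a function in a macro so that __FILE__ and __LINE__ expand at the call site.

```c
#include <stdio.h>

/* Hypothetical lock bookkeeping for the sketch. */
static int lock_depth = 0;
static int unlocked_calls = 0;

/* Print every lock operation together with the place it came from. */
#define TRACE_LOCK(what) \
    fprintf(stderr, "%s at %s:%d\n", (what), __FILE__, __LINE__)

/* The real work, with the call site passed in explicitly. */
static void wait_for_idle_impl(const char *file, int line)
{
    if (lock_depth == 0) {
        unlocked_calls++;
        fprintf(stderr, "%s:%d: wait for idle called without lock\n",
                file, line);
    }
}

/* The macro wrapper records where the call was made. */
#define WAIT_FOR_IDLE_LOCKED() wait_for_idle_impl(__FILE__, __LINE__)

/* Stand-in for the render start hook: takes the lock. */
static void span_render_start(void)
{
    TRACE_LOCK("lock");
    lock_depth++;
}

/* Stand-in for the render finish hook: waits for idle, then unlocks. */
static void span_render_finish(void)
{
    WAIT_FOR_IDLE_LOCKED();
    if (lock_depth > 0) {
        lock_depth--;
        TRACE_LOCK("unlock");
    }
}
```

A paired span_render_start() / span_render_finish() stays quiet apart from the lock trace, while a lone span_render_finish() reports the file and line of the unlocked call.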
Using kernel 2.6.22 with drm and mesa from the freedesktop git.