Comment 51 for bug 1553328

Ilia Mirkin (imirkin) wrote:

(In reply to Jim Blandy from comment #21)
> I have no idea what this code is supposed to be doing, but here's what I can
> infer:
>
> The call to pushbuf_kref crashes because the `struct nouveau_bo *bo`
> argument is NULL, and cli_push_get tries to use it. The caller of
> pushbuf_kref, pushbuf_validate, is iterating over a list of brefs; the
> current bref's bo field is NULL.
>
> This bref was created by nouveau_bufctx_refn, which was passed a NULL `bo`
> argument. Its caller is nvc0_add_resident, which was passed a `struct
> nv04_resource *` whose `bo` field is NULL.
>
> This nv04_resource was created by a call to nouveau_buffer_create in which
> buffer->domain is never set to anything other than 0. Looking at
> nouveau_buffer_allocate, it seems like a domain of zero is a legitimate
> value; the last branch of the if-else chain asserts that this is the case.
> Since that path doesn't set buf->bo, it seems it's legitimate for buf->bo to
> be NULL.
>
> pushbuf_kref seems adamant that bo should be non-NULL; both cli_push_get and
> cli_kref_get require it. At this point I'm lost: should nouveau_bufctx_refn
> never be passed a NULL bo? Should such a bufref never make it onto the list
> that pushbuf_validate sees? I'm not sure.
>
> Here's the stack trace at the call to nouveau_bufctx_refn:
>
> #0 nouveau_bufctx_refn (bctx=0x555555b3eea0, bin=bin@entry=1, bo=0x0,
> flags=256) at bufctx.c:126
> #1 0x00007ffff0c33154 in nvc0_add_resident (flags=256, res=0x555555bf4800,
> bin=1, bufctx=<optimized out>) at nvc0/nvc0_winsys.h:29
> #2 nvc0_validate_vertex_buffers_shared (nvc0=0x555555b3cf30) at
> nvc0/nvc0_vbo.c:407

Whoa, great analysis! That makes a *ton* more sense than my theory, which was that the GPU hung and we ran out of GEM handles while making pushbufs.

So this is one of those idiotic bo-less resources. Ugh. Will check if your repro makes it happen for me too.
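
For illustration only, here is a rough sketch of the failure mode from the quoted analysis, using made-up stand-in types rather than the real libdrm/Mesa structs. Everything prefixed `fake_` is hypothetical. Skipping bo-less resources before they reach the ref list is just one possible guard, not necessarily the right fix for the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real nouveau_bo / nv04_resource layouts. */
struct fake_bo { int handle; };
struct fake_resource { struct fake_bo *bo; unsigned domain; };

#define FAKE_MAX_REFS 8
struct fake_bufctx { struct fake_bo *refs[FAKE_MAX_REFS]; int count; };

/* Plays the role of nouveau_bufctx_refn: records whatever bo it is given,
 * including NULL, which is how a bad ref could end up on the list. */
static void fake_refn(struct fake_bufctx *ctx, struct fake_bo *bo)
{
    assert(ctx->count < FAKE_MAX_REFS);
    ctx->refs[ctx->count++] = bo;
}

/* Guarded variant of the add_resident step: a resource with no backing bo
 * (domain 0 in the analysis above) is skipped. Returns 1 if a ref was added. */
static int fake_add_resident(struct fake_bufctx *ctx,
                             const struct fake_resource *res)
{
    if (res == NULL || res->bo == NULL)
        return 0;
    fake_refn(ctx, res->bo);
    return 1;
}

/* Plays the role of the validate loop: dereferences every bo on the list,
 * so a NULL entry here would crash, as in the reported backtrace. */
static int fake_sum_handles(const struct fake_bufctx *ctx)
{
    int sum = 0;
    for (int i = 0; i < ctx->count; i++)
        sum += ctx->refs[i]->handle;
    return sum;
}
```

With the guard in place, a resource whose `bo` is NULL never reaches the list, so the loop that dereferences each entry stays safe. Whether the real code should skip such resources or allocate a bo for them first is left open here.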