Unity8 using vmwgfx_dri.so crashed in mir::graphics::nested::detail::DisplayBuffer::make_current() -> eglMakeCurrent() -> ... -> dri2_image_get_buffers() [platform_mir.c:138]

Bug #1560498 reported by errors.ubuntu.com bug bridge on 2016-03-22
This bug affects 3 people
Affects                  Status    Importance  Assigned to  Milestone
Canonical System Image             Undecided   Unassigned
Mir                      Triaged   Medium      Unassigned
mesa (Ubuntu)                      High        Unassigned
mir (Ubuntu)                       Medium      Unassigned
unity8 (Ubuntu)                    High        Unassigned

Bug Description

The Ubuntu Error Tracker has been receiving reports about a problem regarding unity8. This problem was most recently seen with version 8.11+16.04.20160310.4-0ubuntu1. The problem page at https://errors.ubuntu.com/problem/7e3f860c1afbbc114bf73f9d7a2966209a25093d contains more details.

Changed in unity8 (Ubuntu):
status: New → Invalid
summary: - /usr/bin/unity8:11:dri2_image_get_buffers:dri_image_drawable_get_buffers:dri2_allocate_textures:dri_st_framebuffer_validate:st_framebuffer_validate
+ Unity8 crashed in
+ mir::graphics::nested::detail::DisplayBuffer::make_current() ->
+ eglMakeCurrent() -> ... -> dri2_image_get_buffers() [platform_mir.c:138]
Changed in mir:
importance: Undecided → High
Changed in mir (Ubuntu):
importance: Undecided → High
Changed in mesa (Ubuntu):
importance: Undecided → High
Changed in unity8 (Ubuntu):
importance: Undecided → High

+static int
+dri2_image_get_buffers(__DRIdrawable *driDrawable,
+                       unsigned int format,
+                       uint32_t *stamp,
+                       void *loaderPrivate,
+                       uint32_t buffer_mask,
+                       struct __DRIimageList *buffers)
+{
+   struct dri2_egl_surface *dri2_surf = loaderPrivate;
+
+   if (buffer_mask & __DRI_IMAGE_BUFFER_BACK) {
+      if (!dri2_surf->back)
+         return 0;
+
+      buffers->back = ((struct gbm_dri_bo *)dri2_surf->back->bo)->image; <---- HERE (?!)
+      buffers->image_mask = __DRI_IMAGE_BUFFER_BACK;
+
+      return 1;
+   }
+
+   return 0;
+}
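
For illustration only, here is a minimal sketch of how the dereference marked above could be guarded. It assumes the crash comes from the back buffer's `bo` pointer being NULL. Every type and name prefixed `sketch_` is a hypothetical stand-in for the corresponding Mesa structure, reduced to the fields used here. It is not the actual Mesa code or a proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Mesa types; only the fields used here. */
struct sketch_image { int id; };
struct sketch_bo { struct sketch_image *image; };
struct sketch_buffer { struct sketch_bo *bo; };
struct sketch_surface { struct sketch_buffer *back; };
struct sketch_image_list {
   uint32_t image_mask;
   struct sketch_image *back;
};

/* Stand-in for the back-buffer bit of the loader's buffer mask. */
#define SKETCH_IMAGE_BUFFER_BACK (1u << 0)

/* Returns 1 and fills `buffers` on success, 0 if no usable back buffer. */
static int
sketch_get_buffers(struct sketch_surface *surf, uint32_t buffer_mask,
                   struct sketch_image_list *buffers)
{
   buffers->image_mask = 0;
   buffers->back = NULL;

   if (!(buffer_mask & SKETCH_IMAGE_BUFFER_BACK))
      return 0;

   /* Check every pointer on the path before dereferencing it,
    * including the bo that the original code assumes is valid. */
   if (!surf || !surf->back || !surf->back->bo)
      return 0;

   buffers->back = surf->back->bo->image;
   buffers->image_mask = SKETCH_IMAGE_BUFFER_BACK;
   return 1;
}
```

A guard like this would only turn the crash into a reported failure; it would not address whatever leaves the `bo` unset in the first place.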

tags: added: egl-platform-mir
Daniel van Vugt (vanvugt) wrote :

Digging into at least the latest 5 incidents, the crash is always from vmwgfx_dri.so

So VMware's graphics driver is a problem. It should probably work better than this already so I'm reluctant to make it a duplicate of bug 1118903.

summary: - Unity8 crashed in
+ Unity8 using vmwgfx_dri.so crashed in
mir::graphics::nested::detail::DisplayBuffer::make_current() ->
eglMakeCurrent() -> ... -> dri2_image_get_buffers() [platform_mir.c:138]
Changed in mesa (Ubuntu):
status: New → Confirmed
Changed in mir (Ubuntu):
status: New → Invalid
Changed in mir:
status: New → Invalid
Daniel van Vugt (vanvugt) wrote :

Sorry, vmwgfx_dri.so actually belongs to Mesa.

binary package: libgl1-mesa-dri
source package: mesa

Changed in mir:
status: Invalid → New
Changed in mir (Ubuntu):
status: Invalid → New
Daniel van Vugt (vanvugt) wrote :

Note that a solution to bug 1118903 would allow us to avoid the offending vmwgfx_dri.so here.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mir (Ubuntu):
status: New → Confirmed
Pete Woods (pete-woods) wrote :

Guys, is it worth looping in Thomas Hellstrom <email address hidden> and/or Jakob Bornecrantz <email address hidden> on this bug? IIRC they were interested in getting the VMware driver working with Mir a while ago.

Daniel van Vugt (vanvugt) wrote :

Yes it's worth getting them involved, although I should point out my own more recent attempts to get Mir working properly under VMware were unsuccessful for different reasons (never hit this bug). Either way, VMware support needs a refresh...

tags: added: vm
Thomas Hellström (thellstrom) wrote :

OK, so I tried to debug this on 17.04 with the Mir demos.

What happens is that while the Mir server seems to run fine, when the EGL Mir clients import a surface from the Mir server (using fds / prime) they typecast the XRGB surface from the Mir server to an ARGB surface. The svga gallium driver doesn't like that and returns an error.

That error is never caught in the mir platform EGL layer and when the corresponding "bo" is dereferenced, the mir platform EGL layer instead dereferences NULL, which is the error code...

So I'd say this is a combination of two Mir errors: One illegal typecast and one failure to check
for errors.

As a side note, it would be possible for the svga driver to implement a workaround and not error in this case. But while real hardware may be more forgiving here, the surface that the Mir client thinks is an ARGB surface will still be an XRGB surface, and any operation involving the alpha channel will yield unexpected results. So IMHO this needs to be fixed in the Mir EGL layer:

Offending code: (platform_mir.c)

static struct gbm_bo *create_gbm_bo_from_buffer(struct gbm_device* gbm_dev,
                                                MirBufferPackage *package)
{
   struct gbm_import_fd_data data;

   data.fd = package->fd[0];
   data.width = package->width;
   data.height = package->height;
   data.format = GBM_FORMAT_ARGB8888; /* TODO: Use mir surface format */ <= HERE!
   data.stride = package->stride;

   return gbm_bo_import(gbm_dev, GBM_BO_IMPORT_FD, &data, GBM_BO_USE_RENDERING);
}

/Thomas
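
As a rough sketch of the TODO in the code above, the import format could be derived from the surface's pixel format instead of being hard-coded. This assumes the caller can obtain that format, which the quoted function does not show. The `sketch_` enum, macros and function are hypothetical stand-ins, not Mir or GBM API. The two format codes are built with the little-endian fourcc packing that DRM/GBM format codes use.

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian fourcc packing, as used by DRM/GBM format codes. */
#define SKETCH_FOURCC(a, b, c, d) \
   ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
    ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Local stand-ins for the ARGB8888 and XRGB8888 format codes. */
#define SKETCH_FORMAT_ARGB8888 SKETCH_FOURCC('A', 'R', '2', '4')
#define SKETCH_FORMAT_XRGB8888 SKETCH_FOURCC('X', 'R', '2', '4')

/* Hypothetical stand-in for the surface pixel format enum. */
enum sketch_pixel_format {
   sketch_pixel_format_invalid,
   sketch_pixel_format_argb_8888,
   sketch_pixel_format_xrgb_8888
};

/* Map the surface format to an import format code.
 * Returns 0 for anything unknown so the caller can refuse the
 * import instead of guessing a format. */
static uint32_t
sketch_import_format_for(enum sketch_pixel_format format)
{
   switch (format) {
   case sketch_pixel_format_argb_8888:
      return SKETCH_FORMAT_ARGB8888;
   case sketch_pixel_format_xrgb_8888:
      return SKETCH_FORMAT_XRGB8888;
   default:
      return 0;
   }
}
```

In this sketch a caller would treat a zero return the same way as a failed import, and would also check the imported buffer for NULL before using it.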

Daniel van Vugt (vanvugt) wrote :

Thanks for your efforts Thomas.

I recall the cast from ARGB to XRGB is a workaround I did for Mesa bug 1480755.

Changed in mir:
status: New → Triaged
Changed in mesa (Ubuntu):
status: Confirmed → Triaged
Changed in mir (Ubuntu):
status: Confirmed → Triaged
Daniel van Vugt (vanvugt) wrote :

Dropped severity of the Mir tasks to medium since it's only eglapp.c that's relevant there.

Changed in mir:
importance: High → Medium
Changed in mir (Ubuntu):
importance: High → Medium
Daniel van Vugt (vanvugt) wrote :

Although possibly also real_kms_display_configuration.cpp in Mir is wrong.

kevin gunn (kgunn72) wrote :

can we please roll back the "importance" to high?

i think it meets the criteria
High: A bug which fulfills any of the following criteria:
Has a severe impact on a small portion of Ubuntu users (estimated)

kevin gunn (kgunn72) wrote :

nevertheless thanks for the progress on this one. lots of folks requesting.

Daniel van Vugt (vanvugt) wrote :

I chose Medium on the assumption that the only Mir code affected here was a couple of examples (and that the real problem is Mesa bug 1480755). However the issue affecting Unity8 might be the same mistake from the examples also made in real_kms_display_configuration.cpp and elsewhere. So yeah that would qualify as high then.

Pete Woods (pete-woods) wrote :

I *think* it looks like there's something similar going on inside QtMir (http://bazaar.launchpad.net/~mir-team/qtmir/trunk/view/head:/src/platforms/mirserver/screen.cpp#L60):

enum QImage::Format qImageFormatFromMirPixelFormat(MirPixelFormat mirPixelFormat) {
    switch (mirPixelFormat) {
    case mir_pixel_format_abgr_8888:
        if (isLittleEndian()) {
            // 0xRR,0xGG,0xBB,0xAA
            return QImage::Format_RGBA8888;
        } else {
            // 0xAA,0xBB,0xGG,0xRR
            qFatal("[mirserver QPA] "
                   "Qt doesn't support mir_pixel_format_abgr_8888 in a big endian architecture");
        }
        break;
    case mir_pixel_format_xbgr_8888:
        if (isLittleEndian()) {
            // 0xRR,0xGG,0xBB,0xXX
            return QImage::Format_RGBX8888;
        } else {
            // 0xXX,0xBB,0xGG,0xRR
            qFatal("[mirserver QPA] "
                   "Qt doesn't support mir_pixel_format_xbgr_8888 in a big endian architecture");
        }
        break;
    case mir_pixel_format_argb_8888:
        // 0xAARRGGBB
        return QImage::Format_ARGB32;
        break;
    case mir_pixel_format_xrgb_8888:
        // 0xffRRGGBB
        return QImage::Format_RGB32;
        break;
    case mir_pixel_format_bgr_888:
        qFatal("[mirserver QPA] Qt doesn't support mir_pixel_format_bgr_888");
        break;
    default:
        qFatal("[mirserver QPA] Unknown mir pixel format");
        break;
    }
    return QImage::Format_Invalid;
}

Pete Woods (pete-woods) wrote :

Or if not (I'm no graphics programmer), that's at least the comparable snippet, I think?

Daniel van Vugt (vanvugt) wrote :

No it doesn't seem to be the same issue. Thanks for looking though -- I recall we copied the hack into another project but I haven't yet found where that was.

Although removing the original hack from the Mir examples is trivial, I suspect that's not actually the problem here because mir-demos should not crash Unity8 (?). The actual problem I suspect is our KMS/nested platform code which also hard codes pixel formats and those will often be wrong (xrgb when it's really argb). That part is probably non-trivial to fix.

Pete Woods (pete-woods) on 2017-03-09
tags: added: unity8-desktop
Changed in canonical-devices-system-image:
status: New → Confirmed
Timo Aaltonen (tjaalton) wrote :

Mir EGL platform is gone

Changed in mesa (Ubuntu):
status: Triaged → Won't Fix