Mesa

i965: Failed to submit batchbuffer: Bad address

Bug #1794033 reported by themusicgod1 on 2018-09-24

This bug affects 14 people

Affects	Status	Importance	Assigned to
Mesa	New	Unknown	auto-gitlab.freedesktop.org-mesa-mesa-- #2134
mesa (Ubuntu)	Triaged	Low	Unassigned
xserver-xorg-video-intel (Fedora)	Unknown	Medium	freedesktop-bugs #104778

Bug Description

cpu: Intel® Core™ i5-7500 CPU @ 3.40GHz × 4
gpu: Intel® HD Graphics 630 (Kaby Lake GT2)
00:00.0 Host bridge: Intel Corporation Intel Kaby Lake Host Bridge (rev 05)
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 630 (rev 04)
00:08.0 System peripheral: Intel Corporation Skylake Gaussian Mixture Model
xserver-xorg-video-intel:
  Installed: 2:2.99.917+git20171229-1
  Candidate: 2:2.99.917+git20171229-1
ubuntu: 18.04 LTS bionic
gnome: 3.28.2
gdm3:
  Installed: 3.28.2-0ubuntu1.4
  Candidate: 3.28.2-0ubuntu1.4
os type: 64 bit
kernel: Linux eva1 4.15.0-30-generic #32-Ubuntu SMP Thu Jul 26 17:42:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
mesa-va-drivers:
  Installed: 18.0.5-0ubuntu0~18.04.1
  Candidate: 18.0.5-0ubuntu0~18.04.1

from syslog:

Sep 24 04:08:16 eva /usr/lib/gdm3/gdm-x-session[1170]: i965: Failed to submit batchbuffer: Bad address
Sep 24 04:08:21 eva gnome-terminal-[2597]: gnome-terminal-server: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 24 04:08:22 eva [2541]: update-notifier: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Sep 24 04:08:22 eva at-spi-bus-launcher[1274]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
(followed by pages upon pages of applications within GNOME failing)

Nothing in kern.log around that time.

A single GNOME crash that took down all applications running within it. Not yet reproduced. gdm successfully started a new session afterwards. A Pale Moon session was definitely open on some website or other at the time.
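
For readers comparing the variants of this message that appear in later comments: the text after the last colon is the C library's description of an errno value. On Linux, "Bad address" is EFAULT, "Cannot allocate memory" is ENOMEM, and the "IO error 11" logged by the X clients is EAGAIN. A minimal Python sketch of that mapping, assuming a Linux system (`describe_errno` is a hypothetical helper written for illustration, not part of Mesa or any tool mentioned here):

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME (number): message' for an errno value.

    Hypothetical helper for illustration only. The message text comes
    from the C library, so it depends on the platform and locale.
    """
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


if __name__ == "__main__":
    # EFAULT is what i965 reports as "Bad address" on Linux; EAGAIN is the
    # "error 11" that X clients log once the server connection is gone.
    for code in (errno.EFAULT, errno.EAGAIN, errno.ENOMEM):
        print(describe_errno(code))
```

This only decodes the messages; it says nothing about why the kernel rejected the batchbuffer.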

Revision history for this message

In freedesktop.org Bugzilla #104778, Mark-a-janes (mark-a-janes) wrote on 2018-01-24:

Mesa CI reports a low error rate (2/700k), however the number of intermittent failures is consistently nonzero. This is worse than our historical results.

The rarity of the failures makes it difficult to pinpoint the regression, however there are several repeating errors:

i965: Failed to submit batchbuffer: Bad address
piglit.spec.!opengl 1_1.copypixels-draw-sync ivb
piglit.spec.!opengl 1_3.gl-1_3-texture-env snb

intel_batchbuffer.c:937: submit_batch: Assertion `entry->handle == batch->batch.bo->gem_handle' failed.
piglit.spec.!opengl 1_3.gl-1_3-texture-env.bdwm64
piglit.shaders.glsl-fs-raytrace-bug27060 skl

HSW tessellation failures

deqp-gles31': corrupted double-linked list: 0x0000561e9518be50 ***
dEQP-GLES31.functional.debug.error_filters.case_0.bdwm64

In freedesktop.org Bugzilla #104778, Kenxeth (kenxeth) wrote on 2018-01-31:

Do we know what kernel version is running on the machines with failures? We do slightly different things on v4.13 and later. Wondering if it's only happening on machines with older kernels, or newer ones, or both.

In freedesktop.org Bugzilla #104778, Mark-a-janes (mark-a-janes) wrote on 2018-01-31:

Unfortunately I saw this recently on 4.14 and 4.11

http://otc-mesa-ci.jf.intel.com/job/Leeroy/1934476/ - ivbgt2-01 4.14
http://otc-mesa-ci.jf.intel.com/job/Leeroy/1934454/ - sklgt2-04 4.11

In freedesktop.org Bugzilla #104778, Lionel-g-landwerlin (lionel-g-landwerlin) wrote on 2018-03-23:

Running piglit.shaders.glsl-fs-raytrace-bug27060, I found this valgrind warning : https://patchwork.freedesktop.org/patch/212413/

In freedesktop.org Bugzilla #104778, Lionel-g-landwerlin (lionel-g-landwerlin) wrote on 2018-03-23:

The Broadwell failure is interesting as it's clearly a memory corruption issue.
Running the dEQP-GLES31.functional.debug.* tests under valgrind, I can see a few errors from the CTS suite :

Test case 'dEQP-GLES31.functional.debug.negative_coverage.callbacks.state.get_nuniformfv'..

==12081== Use of uninitialised value of size 8
==12081==    at 0x59B505E: ??? (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==12081==    by 0x59B55A8: std::ostreambuf_iterator<char, std::char_traits<char> > std::num_put<char, std::ostreambuf_iterator<char, std::char_traits<char> > >::_M_insert_int<long>(std::ostreambuf_iterator<char, std::char_traits<char> >, std::ios_base&, char, long) const (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==12081==    by 0x59C1178: std::ostream& std::ostream::_M_insert<long>(long) (in /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.25)
==12081==    by 0x71CAF1: std::ostream& tcu::Format::operator<< <int const*>(std::ostream&, tcu::Format::Array<int const*> const&) (in /home/djdeath/src/mesa-src/VK-GL-CTS/build-es31/modules/gles31/deqp-gles31)
==12081==    by 0xD9F922: std::ostream& tcu::Format::operator<< <int>(std::ostream&, tcu::Format::ArrayPointer<int> const&) (in /home/djdeath/src/mesa-src/VK-GL-CTS/build-es31/modules/gles31/deqp-gles31)
==12081==    by 0xEE0160: tcu::MessageBuilder& tcu::MessageBuilder::operator<< <tcu::Format::ArrayPointer<int> >(tcu::Format::ArrayPointer<int> const&) (in /home/djdeath/src/mesa-src/VK-GL-CTS/build-es31/modules/gles31/deqp-gles31)
==12081==    by 0xE9E45F: glu::CallLogWrapper::glGetIntegerv(unsigned int, int*) (in /home/djdeath/src/mesa-src/VK-GL-CTS/build-es31/modules/gles31/deqp-gles31)
==12081==    by 0xA611D2: deqp::gles31::Functional::NegativeTestShared::get_nuniformfv(deqp::gles31::Functional::NegativeTestShared::NegativeTestContext&) (in /home/djdeath/src/mesa-src/VK-GL-CTS/build-es31/modules/gles31/deqp-gles31)
==12081==    by 0x7230DF: deqp::gles31::Functional::(anonymous namespace)::TestFunctionWrapper::call(deqp::gles31::Functional::(anonymous namespace)::DebugMessageTestContext&) const (in /home/djdeath/src/mesa-src/VK-GL-CTS/build-es31/modules/gles31/deqp-gles31)
==12081==    by 0x725DA1: deqp::gles31::Functional::(anonymous namespace)::CallbackErrorCase::iterate() (in /home/djdeath/src/mesa-src/VK-GL-CTS/build-es31/modules/gles31/deqp-gles31)
==12081==    by 0x6DCAD3: deqp::gles31::TestCaseWrapper::iterate(tcu::TestCase*) (in /home/djdeath/src/mesa-src/VK-GL-CTS/build-es31/modules/gles31/deqp-gles31)
==12081==    by 0xF9E157: tcu::TestSessionExecutor::iterateTestCase(tcu::TestCase*) (in /home/djdeath/src/mesa-src/VK-GL-CTS/build-es31/modules/gles31/deqp-gles31)

I'm not sure whether that's related, might be worth fixing though (trying to write some patches).

In freedesktop.org Bugzilla #104778, Samuel-sieb (samuel-sieb) wrote on 2018-04-15:

I just had my Gnome desktop crash and the only info in the log was:
i965: Failed to submit batchbuffer: Bad address

This is on Fedora 27, kernel 4.15.9, mesa 17.3.6.

In freedesktop.org Bugzilla #104778, Mark-a-janes (mark-a-janes) wrote on 2018-04-15:

It's clear to me that this bug is not simply "CI ghosts". We have a bug in Mesa which is hard to trigger, and we hit it very occasionally with the exhaustive CI infrastructure.

What we need is ideas on how to narrow down the failure. Perhaps one of the branches that performs additional memory verification could help? I got nothing out of valgrind.

I'm eager to get suggestions on what to do next.

In freedesktop.org Bugzilla #104778, Greatquux (greatquux) wrote on 2018-05-03:

#10

I've also encountered some desktop crashes lately with

May 03 14:15:15 ossy /usr/lib/gdm3/gdm-x-session[5995]: i965: Failed to submit batchbuffer: Cannot allocate memory

but it's intermittent and yeah this sounds like a tough problem to solve.
Ubuntu 18.04; GNOME 3.28.1; Kernel 4.15.0-20-lowlatency; Intel HD Graphics 630 with modesetting on Xorg 1.19.6 (but might try the old intel driver)

In freedesktop.org Bugzilla #104778, Mark-a-janes (mark-a-janes) wrote on 2018-05-23:

#11

*** Bug 106621 has been marked as a duplicate of this bug. ***

Veril (veril459) wrote on 2018-10-03:

I just had this exact problem. I was watching a Facebook video in Chrome when everything crashed.

Oct 3 02:02:26 ******* anacron[1861]: Anacron 2.3 started on 2018-10-03
Oct 3 02:02:26 ******* anacron[1861]: Normal exit (0 jobs run)
Oct 3 02:05:01 ******* CRON[2715]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Oct 3 02:15:01 ******* CRON[5542]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Oct 3 02:15:22 ******* wpa_supplicant[1097]: wlp5s0: WPA: Group rekeying completed with ******* [GTK=TKIP]
Oct 3 02:17:01 ******* CRON[6114]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)

Oct 3 02:20:58 ******* /usr/lib/gdm3/gdm-x-session[1208]: i965: Failed to submit batchbuffer: Bad address

Oct 3 02:20:58 ******* org.gnome.Nautilus[1267]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"
Oct 3 02:20:58 ******* org.gnome.Nautilus[1267]: after 5587 requests (5587 known processed) with 0 events remaining.
Oct 3 02:20:58 ******* deluge.desktop[14891]: deluge: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Oct 3 02:20:58 ******* [6122]: nomacs: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.

themusicgod1 (themusicgod1) on 2018-10-03

tags:	added: bionic

Timo Aaltonen (tjaalton) on 2018-10-04

affects:	xserver-xorg-video-intel (Ubuntu) → mesa (Ubuntu)

themusicgod1 (themusicgod1) wrote on 2018-10-25:

also happened with

Linux: 4.15.0-34-generic #37-Ubuntu SMP Mon Aug 27 15:21:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

In freedesktop.org Bugzilla #104778, Mark-a-janes (mark-a-janes) wrote on 2018-10-29:

#12

One of the tests that seems to reproduce this more often than others:

dEQP-GLES31.functional.debug.negative_coverage.get_error.vertex_array.draw_arrays_instanced_incomplete_primitive

produces on stderr:
corrupted size vs. prev_size
or
corrupted double-linked list

Seen on bxt, bdw, bsw, ivb

In freedesktop.org Bugzilla #104778, Martin-peres-n (martin-peres-n) wrote on 2018-11-01:

#13

We hit this bug twice in a week, and then nothing since then (5 months and 1 week). I wonder if newer kernels fixed this issue. What is the most up to date kernel that has shown this issue?

In freedesktop.org Bugzilla #104778, Mark-a-janes (mark-a-janes) wrote on 2018-11-01:

#14

4.18. If you have a suggestion for what to run, I'll update.

In freedesktop.org Bugzilla #104778, Martin-peres-n (martin-peres-n) wrote on 2018-11-01:

#15

(In reply to Mark Janes from comment #11)
> 4.18. If you have a suggestion for what to run, I'll update.

Our CI last saw it on Linux: 4.17.0-rc6. So I guess we are just lucky...

Bug Watch Updater (bug-watch-updater) on 2018-11-07

Changed in xserver-xorg-video-intel (Fedora):
importance:	Unknown → Medium
status:	Unknown → Incomplete

Launchpad Janitor (janitor) wrote on 2018-12-17:

#16

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in mesa (Ubuntu):
status:	New → Confirmed

mike@papersolve.com (mike-papersolve) wrote on 2018-12-17:

#17

Just hit me, kernel 4.19.8-liquorix-amd64, Ubuntu 18.10. I was running with modesetting, but decided to change to intel driver (freshly compiled from git) to see if I run into it again (it seems pretty rare though so will probably be tough to hit). Has anyone definitely hit it on xf86-video-intel or only modesetting?

In freedesktop.org Bugzilla #104778, Lakshminarayana-vudum (lakshminarayana-vudum) wrote on 2019-02-14:

#18

This issue was last seen on our CI system 8 months, 3 weeks (4968 runs) ago.
Can we close this issue?

In freedesktop.org Bugzilla #104778, Mark-a-janes (mark-a-janes) wrote on 2019-02-14:

#19

The problematic tests have been disabled in Mesa CI since June 2018. If you think this is fixed, then I can re-enable them.

Mesa CI updated its kernels to 4.19 recently, but otherwise there has been no change that would affect this bug.

In freedesktop.org Bugzilla #104778, Mark-a-janes (mark-a-janes) wrote on 2019-02-16:

#20

Mesa CI reproduces these test failures immediately:

https://mesa-ci.01.org/mesa_master/builds/15252/group/63a9f0ea7bb98050796b649e85481845

Builds have fairly recent kernels:

Linux otc-gfxtest-sklgt2-01 4.19.0-1-amd64 #1 SMP Debian 4.19.12-1 (2018-12-22) x86_64 GNU/Linux

Bug Watch Updater (bug-watch-updater) on 2019-02-16

Changed in xserver-xorg-video-intel (Fedora):
status:	Incomplete → Confirmed

In freedesktop.org Bugzilla #104778, Martin-peres-n (martin-peres-n) wrote on 2019-03-08:

#21

(In reply to Mark Janes from comment #15)
> Mesa CI reproduce these test failures immediately:
>
> https://mesa-ci.01.org/mesa_master/builds/15252/group/
> 63a9f0ea7bb98050796b649e85481845
>
> Builds have fairly recent kernels:
>
> Linux otc-gfxtest-sklgt2-01 4.19.0-1-amd64 #1 SMP Debian 4.19.12-1
> (2018-12-22) x86_64 GNU/Linux

Thanks for the info!

I'll treat this as a mesa bug and since we are using your blacklist, we should be safe to just ignore it from our side. I'll close our kernel issue.

Thanks to everyone involved!

In freedesktop.org Bugzilla #104778, Cibuglog (cibuglog) wrote on 2019-03-08:

#22

The CI Bug Log issue associated to this bug has been archived.

New failures matching the above filters will not be associated to this bug anymore.

In freedesktop.org Bugzilla #104778, Oss-linuxpf (oss-linuxpf) wrote on 2019-03-12:

#23

I saw this once.

[Environment]
CPU: SkyLake(core i5 6500TE)
Distribution: debian(customised)
Kernel: 4.14.98
Mesa: 18.3.3
libdrm: 2.4.89

The message on stdout from the drawing module was:
----
i965: Failed to submit batchbuffer: Bad address
----

and the backtrace was as follows:
----
:
:
#5 0x00007f4496240b35 in exit () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x00007f44864d1a5d in submit_batch (out_fence_fd=0x0, in_fence_fd=<optimized out>, brw=0x47ee030) at intel_batchbuffer.c:838
#7 _intel_batchbuffer_flush_fence (line=<optimized out>, file=<optimized out>, out_fence_fd=0x0, in_fence_fd=<optimized out>, brw=0x47ee030) at intel_batchbuffer.c:891
#8 _intel_batchbuffer_flush_fence (brw=0x47ee030, in_fence_fd=<optimized out>, out_fence_fd=0x0, file=<optimized out>, line=<optimized out>) at intel_batchbuffer.c:852
#9 0x00007f44864a558a in brw_draw_single_prim (stream=<optimized out>, xfb_obj=0x0, prim_id=0, prim=0x7ffff9aa77d0, ctx=0x47ee030, indirect=<optimized out>) at brw_draw.c:898
#10 brw_draw_prims (ctx=0x47ee030, prims=<optimized out>, nr_prims=1, ib=<optimized out>, index_bounds_valid=<optimized out>, min_index=0, max_index=3, gl_xfb_obj=0x0, stream=0, indirect=0x0) at brw_draw.c:1107
#11 0x00007f448608063c in _mesa_draw_arrays (drawID=0, baseInstance=0, numInstances=1, count=4, start=0, mode=6, ctx=0x47ee030) at main/draw.c:408
#12 _mesa_draw_arrays (ctx=0x47ee030, mode=6, start=0, count=4, numInstances=1, baseInstance=0, drawID=0) at main/draw.c:385
#13 0x00007f4486081344 in _mesa_exec_DrawArrays (mode=6, start=0, count=4) at main/draw.c:565
:
:
----

In freedesktop.org Bugzilla #104778, Oss-linuxpf (oss-linuxpf) wrote on 2019-03-12:

#24

(In reply to Yoshinori Gento from comment #18)
> I saw this once.
This occurred in our product.

In freedesktop.org Bugzilla #104778, Mark-a-janes (mark-a-janes) wrote on 2019-03-12:

#25

Yoshinori: Mesa i965 team is seeking a way to reproduce this bug, so we can analyze and fix it.

How often does this occur in your product? If it is reproducible, then perhaps we can use an apitrace to investigate the root cause.

In freedesktop.org Bugzilla #104778, Oss-linuxpf (oss-linuxpf) wrote on 2019-03-13:

#26

(In reply to Mark Janes from comment #20)
> Yoshinori: Mesa i965 team is seeking a way to reproduce this bug, so we can
> analyze and fix it.
>
> How often does this occur in your product? If it is reproducible, then
> perhaps we can use an apitrace to investigate the root cause.

Over about one month of operation on four machines,
I saw this problem only once.

So I don't know how to reproduce it.
When it happened, I was running the 'cp' command in an xterm to copy some files. (I don't think that matters.)

I will keep operating the machines to learn how often it occurs.

In freedesktop.org Bugzilla #104778, Mark-a-janes (mark-a-janes) wrote on 2019-03-13:

#27

Hmm...`cp` in xterm is a pretty clear indicator that this issue is random and not triggered by a specific workload.

Lionel suggested that it would be good to have a feedback from the kernel about what didn't pass validation.

There is a kernel option to generate debug traces for that but you have to recompile your kernel with that option. Lionel, can you provide some details?

It would be a good data point to see if a much older kernel produces this error (eg 4.9, 4.4). I can't deploy those kernels in Mesa i965 CI because they lack features needed to run our Vulkan test suites.

In freedesktop.org Bugzilla #104778, Lionel-g-landwerlin (lionel-g-landwerlin) wrote on 2019-03-14:

#28

(In reply to Mark Janes from comment #22)
> Hmm...`cp` in xterm is a pretty clear indicator that this issue is random
> and not triggered by a specific workload.
>
> Lionel suggested that it would be good to have a feedback from the kernel
> about what didn't pass validation.
>
> There is a kernel option to generate debug traces for that but you have to
> recompile your kernel with that option. Lionel, can you provide some
> details?
>
> It would be a good data point to see if a much older kernel produces this
> error (eg 4.9, 4.4). I can't deploy those kernels in Mesa i965 CI because
> they lack features needed to run our Vulkan test suites.

With the kernel compiled with CONFIG_DRM_I915_DEBUG_GEM and the following command issued as root :

echo 15 > /sys/module/drm/parameters/debug

You should be able to get some traces about why the execbuffer failed.

Unfortunately that generates a lot of traces...
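
Because the extra logging is so verbose, it can help to enable it only for the duration of a reproduction attempt. The following Python sketch is one way to do that, assuming root privileges and the sysfs path quoted above. The `drm_debug` context manager is a hypothetical helper, not part of any Mesa or kernel tooling; it writes the new value and restores the previous one on exit.

```python
import contextlib
from pathlib import Path

# Path taken from the comment above; writing to it requires root.
DRM_DEBUG = Path("/sys/module/drm/parameters/debug")


@contextlib.contextmanager
def drm_debug(level: int = 15, path: Path = DRM_DEBUG):
    """Temporarily set the drm debug value, restoring the old one on exit.

    Hypothetical helper: `path` is a parameter so the logic can be tried
    against an ordinary file without touching sysfs.
    """
    previous = path.read_text().strip()
    path.write_text(f"{level}\n")
    try:
        yield previous
    finally:
        # Restore the original value even if the body raised.
        path.write_text(f"{previous}\n")


# Example use (as root), left as a comment so nothing runs on import:
#   with drm_debug(15):
#       ...  # reproduce the failure, then collect the kernel log
```

The traces themselves still end up in the kernel log, so the log has to be saved before it rotates.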

In freedesktop.org Bugzilla #104778, Greatquux (greatquux) wrote on 2019-03-14:

#29

I haven't encountered this issue at all since moving away from modesetting and back to the intel DDX driver. So whatever extra exercises GLAMOR was doing may be triggering the bug. I'm sure that doesn't help actually fix it but it might at least help people experiencing it to have a more stable desktop.

In freedesktop.org Bugzilla #104778, Oss-linuxpf (oss-linuxpf) wrote on 2019-03-15:

#30

I have seen this problem three times since yesterday.
All of them occurred during file sync over LAN with rsync.
I think this problem might be related to load from disk I/O or network I/O.
Unfortunately, I have not yet recompiled the kernel with CONFIG_DRM_I915_DEBUG_GEM.
I will try that next week.

In freedesktop.org Bugzilla #104778, Oss-linuxpf (oss-linuxpf) wrote on 2019-03-25:

#31

Created attachment 143768
Debug trace.

I got debug traces. Please see the attached file.
The PID to check is 2602.
After that, this process exited with "i965: Failed to submit batchbuffer: Bad address".
At that time I was repeatedly copying and deleting files with rsync.

Note: This occurred on a Core i3-6100E. The software versions are the same as above.

In freedesktop.org Bugzilla #104778, Oss-linuxpf (oss-linuxpf) wrote on 2019-05-07:

#32

I got how to reproduce.

When cached memory grows large from reading many files, free RAM runs out.
In this situation (caches are released and allocated repeatedly), the drawing process hits this problem.

Could memory contention cause this problem?
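
The condition described here can be approximated by reading a large number of files while watching the free and cached figures in /proc/meminfo. The sketch below only illustrates that idea in Python, under the assumption that streaming files through the page cache is enough to recreate the memory pressure. The function names and the directory argument are hypothetical, and nothing here is confirmed to trigger the bug.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are reported in kB; a few (such as the HugePages
    counters) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[name.strip()] = int(fields[0])
    return info


def read_tree(root: str, chunk: int = 1 << 20) -> int:
    """Read every regular file under root and return the bytes read.

    Reading pulls file data into the page cache, which is the assumed
    trigger for the condition described above.
    """
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # skip FIFOs, sockets, and broken symlinks
            try:
                with open(path, "rb") as fh:
                    while data := fh.read(chunk):
                        total += len(data)
            except OSError:
                continue  # unreadable files are skipped
    return total
```

A run would compare `parse_meminfo(open("/proc/meminfo").read())` before and after calling `read_tree` on a directory larger than RAM, while a GL client is drawing.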

In freedesktop.org Bugzilla #104778, Denys-kostin (denys-kostin) wrote on 2019-08-08:

#33

Hello Yoshinori Gento

>I got how to reproduce.
Does this mean that you could provide an apitrace or some kind of reproducer? It would be really helpful.

In freedesktop.org Bugzilla #104778, Oss-linuxpf (oss-linuxpf) wrote on 2019-08-29:

#34

(In reply to Denis from comment #28)
> Hello Yoshinori Gento
>
> >I got how to reproduce.
> Does this mean that you could provide an apitrace or some kind of reproducer?
> It would be really helpful.

Hello Denis

I have not produced an apitrace or a reproducer.
I updated the kernel to 4.19.57.
Since then the problem occurs less often, but it still occurs.

Bug Watch Updater (bug-watch-updater) on 2019-09-02

Changed in xserver-xorg-video-intel (Fedora):
status:	Confirmed → Incomplete

In freedesktop.org Bugzilla #104778, Gitlab-migration (gitlab-migration) wrote on 2019-09-25:

#35

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1680.

Bug Watch Updater (bug-watch-updater) on 2019-09-26

Changed in xserver-xorg-video-intel (Fedora):
status:	Incomplete → Unknown

Bug Watch Updater (bug-watch-updater) on 2019-11-10

Changed in mesa:
status:	Unknown → New

Bug Watch Updater (bug-watch-updater) on 2019-11-28

Changed in mesa:
status:	New → Fix Released

themusicgod1 (themusicgod1) wrote on 2019-11-28:

#36

TFA above suggests that the change is related to 4.19, but the above shows that 4.15 was also affected. Does this matter?

mlissner (mlissner-michaeljaylissner) wrote on 2020-01-01:

#37

I'm encountering this about once/day on my Lenovo X1 carbon laptop with kernel 5.3.0-24-generic.

I'm a developer, but I don't know much beyond that background about how to fix this. I just know that I'm kind of quickly losing my mind, and that it wasn't happening when I first started using the laptop.

Are there any solutions that I can try? I can run commands but if somebody says to install a particular kernel, that's a half-day affair that'll 50/50 fail as I research things. I'm happy to help though if there's any way I can, because I can't live with a laptop that completely crashes every 10 hours or so.

Anything I can do?

Lars Falk-Petersen (julenissen) wrote on 2020-01-02:

#38

Just got this on 19.10 on a thinkpad t470s.

$ uname -a
Linux hostname 5.3.0-24-generic #26-Ubuntu SMP Thu Nov 14 01:33:18 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I run ZFS. My first thought was OOM, but can't see any mentions of it.

...
jan. 02 09:51:06 hostname gnome-shell[27297]: Usage of object.actor is deprecated for ArgosButton
                                                get@resource:///org/gnome/shell/ui/environment.js:249:29
                                                _processOutput@/<email address hidden>
                                                wrapper@resource:///org/gnome/gjs/modules/_legacy.js:82:22
                                                _update/<@/<email address hidden>.
                                                spawnWithCallback/<@/<email address hidden>
                                                readStream/<@/<email address hidden>
jan. 02 09:51:30 hostname /usr/lib/gdm3/gdm-x-session[27136]: i965: Failed to submit batchbuffer: Bad address
jan. 02 09:51:30 hostname seahorse[8246]: seahorse: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
jan. 02 09:51:30 hostname firefox[986]: firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
jan. 02 09:51:30 hostname gnome-shell[27297]: Gdk-Message: 09:51:30.655: /usr/lib/firefox/firefox: Fatal IO error 11 (Resource te
jan. 02 09:51:30 hostname at-spi-bus-launcher[27266]: X connection to :0 broken (explicit kill or server shutdown).
jan. 02 09:51:30 hostname pulseaudio[26437]: X connection to :0 broken (explicit kill or server shutdown).
...

Carl-Frederik Hallberg (tfiskgul) wrote on 2020-03-06:

#39

Also encountered this:

$ uname -a
Linux kaltop 5.3.0-40-generic #32-Ubuntu SMP Fri Jan 31 20:24:34 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 19.10
Release: 19.10
Codename: eoan

mar 06 12:44:09 kaltop /usr/lib/gdm3/gdm-x-session[2575]: i965: Failed to submit batchbuffer: Bad address
mar 06 12:44:09 kaltop gnome-shell[2782]: X IO Error
mar 06 12:44:09 kaltop pulseaudio[2557]: X connection to :0 broken (explicit kill or server shutdown).
mar 06 12:44:09 kaltop WebKitWebProces[18049]: WebKitWebProcess: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
mar 06 12:44:09 kaltop gnome-shell[2782]: Gdk-Message: 12:44:09.213: /usr/lib/firefox/firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.

Pablo A. Garcia (pgrodriguez) wrote on 2020-05-08:

#40

Also facing this issue persistently, always when working with little memory available (as #32).

My system: Dell XPS 13 9370 - Ubuntu 18.04.4 LTS

$ uname -a
Linux charon 5.3.0-51-generic #44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ journalctl --since=-2w | grep -A1 'Failed to submit batchbuffer'
abr 27 17:17:09 charon org.gnome.Shell.desktop[3359]: i965: Failed to submit batchbuffer: Dirección incorrecta
abr 27 17:17:09 charon kernel: mce: CPU3: Core temperature above threshold, cpu clock throttled (total events = 2304)
--
abr 27 18:23:14 charon /usr/lib/gdm3/gdm-x-session[3239]: i965: Failed to submit batchbuffer: Bad address
abr 27 18:23:14 charon org.gnome.Shell.desktop[3359]: [9820:9820:0427/182314.955599:ERROR:chrome_browser_main_extra_parts_x11.cc(62)] X IO error received (X server probably went away)
--
abr 27 21:59:47 charon org.gnome.Shell.desktop[24418]: i965: Failed to submit batchbuffer: Dirección incorrecta
abr 27 21:59:48 charon org.gnome.Shell.desktop[24418]: [20630:20630:0427/215948.650070:ERROR:sandbox_linux.cc(374)] InitializeSandbox() called with multiple threads in process gpu-process.
--
abr 28 11:32:50 charon /usr/lib/gdm3/gdm-x-session[24313]: i965: Failed to submit batchbuffer: Bad address
abr 28 11:32:50 charon dropbox[24681]: dropbox: Fatal IO error 11 (Recurso no disponible temporalmente) on X server :0.
--
may 06 11:10:23 charon /usr/lib/gdm3/gdm-x-session[6485]: i965: Failed to submit batchbuffer: Bad address
may 06 11:10:23 charon org.gnome.Shell.desktop[6634]: [8456:8456:0506/111023.129635:ERROR:gl_surface_presentation_helper.cc(259)] GetVSyncParametersIfAvailable() failed for 1 times!
--
may 08 11:07:53 charon /usr/lib/gdm3/gdm-x-session[6255]: i965: Failed to submit batchbuffer: Bad address
may 08 11:07:53 charon dropbox[6661]: dropbox: Fatal IO error 11 (Recurso no disponible temporalmente) on X server :0.
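
Note that the text after the final colon is the localized errno description (Spanish "Dirección incorrecta" alongside "Bad address" in the output above), so searching for the fixed prefix, as the grep does, is more reliable than searching for the full English message. A small Python sketch of the same idea follows; `count_failures` is a hypothetical helper that tallies the reason strings from journal or syslog text.

```python
import re
from collections import Counter

# Fixed part of the message, which appears untranslated in the logs above.
PATTERN = re.compile(r"i965: Failed to submit batchbuffer: (?P<reason>.+)$")


def count_failures(lines):
    """Count batchbuffer submission failures by their errno text."""
    reasons = Counter()
    for line in lines:
        match = PATTERN.search(line.rstrip("\n"))
        if match:
            reasons[match.group("reason").strip()] += 1
    return reasons
```

The function accepts any iterable of lines, for example an open log file or the captured output of journalctl.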

Pablo A. Garcia (pgrodriguez) wrote on 2020-05-08:

#41

The previous issue (https://gitlab.freedesktop.org/mesa/mesa/-/issues/1680) was closed as a duplicate of this one.

see https://gitlab.freedesktop.org/mesa/mesa/-/issues/1680#note_328635.

Changed in mesa:
status:	Fix Released → Unknown

Sebastien Bacher (seb128) on 2020-05-11

Changed in mesa (Ubuntu):
importance:	Undecided → Low
status:	Confirmed → Triaged

Bug Watch Updater (bug-watch-updater) on 2022-05-20

Changed in mesa:
status:	Unknown → New

Remote bug watches

freedesktop-bugs #104778 [RESOLVED MOVED]
auto-gitlab.freedesktop.org-mesa-mesa #1680 [closed Needs Information bugzilla i965]
auto-gitlab.freedesktop.org-mesa-mesa-- #1680 [closed Needs Information bugzilla i965]
auto-gitlab.freedesktop.org-mesa-mesa-- #2134 [opened Needs Information i965 iris]