[i965]xorg on intel freezes on startup when using a kernel with CONFIG_HIGHMEM64G and AccelMethod is set to UXA (UXA bug)

Bug #322356 reported by nikos
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
xf86-video-intel | Fix Released | Critical | |
xserver-xorg-video-intel (Ubuntu) | Invalid | High | Unassigned |

Bug Description

Binary package hint: xserver-xorg-video-intel

On Ubuntu Jaunty Alpha 3 with all the latest updates (28.1.2009), Xorg freezes on startup, but only when the system is running a kernel with the option CONFIG_HIGHMEM64G enabled.
This was observed on a ThinkPad X300 with 4GB of memory and a GM965/GL960 graphics controller.

I first hit this problem when I tried the linux-server kernel. The system boots fine, but when I start X the graphics freeze completely. I can still reboot the machine with Ctrl-Alt-Delete, although nothing is shown on the screen until it resets.
The same system with the linux-generic kernel works fine.

I recompiled the kernel shipped with Jaunty, using the linux-generic config file and changing only the memory support to 64GB (CONFIG_HIGHMEM64G=y). The result was exactly the same as with the linux-server kernel: the graphics freeze, but the computer can be rebooted.
UPDATE: After some further tests, the problem exists only when Option "AccelMethod" "UXA" is used. With AccelMethod set to EXA, the X server also works with CONFIG_HIGHMEM64G kernels.
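
To tell which of two kernels has the option from this report enabled, the check can be scripted. The sketch below is only an illustration: the function name and the two sample config excerpts are hypothetical, and it assumes the usual kernel .config format, where an enabled option appears as `NAME=y` and a disabled one as `# NAME is not set`.

```python
def kernel_option_enabled(config_text: str, option: str) -> bool:
    """Return True if `option` is built in (=y) in the given kernel .config text."""
    for line in config_text.splitlines():
        # Disabled options look like "# CONFIG_FOO is not set" and never match.
        if line.strip() == f"{option}=y":
            return True
    return False


# Hypothetical excerpts standing in for real config files.
generic_config = "CONFIG_HIGHMEM4G=y\n# CONFIG_HIGHMEM64G is not set\n"
server_config = "# CONFIG_HIGHMEM4G is not set\nCONFIG_HIGHMEM64G=y\n"

print(kernel_option_enabled(generic_config, "CONFIG_HIGHMEM64G"))
print(kernel_option_enabled(server_config, "CONFIG_HIGHMEM64G"))
```

On Ubuntu the config of an installed kernel is normally found under /boot as config-<kernel release>; reading that file and passing its contents to the function should give the answer for a real system.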

packages: xserver-xorg-video-intel 2:2.6.1-1ubuntu1
kernel: 2.6.28-5-server

[lspci]
00:00.0 Host bridge [0600]: Intel Corporation Mobile PM965/GM965/GL960 Memory Controller Hub [8086:2a00] (rev 0c)
 Subsystem: Lenovo Device [17aa:20b3]
00:02.0 VGA compatible controller [0300]: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller [8086:2a02] (rev 0c)
 Subsystem: Lenovo Device [17aa:20b5]

In , Pierre-pirsoft (pierre-pirsoft) wrote :

Created an attachment (id=19626)
kernel .config

Using nopat on the kernel commandline causes the Xserver to "only" hang with movable cursor instead of kernel panics. For non-CONFIG_HIGHMEM64G kernels, the Xserver works with and without nopat. This is a Lenovo T61 with 965GM integrated graphics. Adding kernel .config and Xorg.log from the nopat case.

In , Pierre-pirsoft (pierre-pirsoft) wrote :

Created an attachment (id=19627)
Xorg.log with nopat

In , Pierre-pirsoft (pierre-pirsoft) wrote :

The actual problem is this: the i915 drm driver feeds pages allocated using kcalloc into the agp subsystem. On systems with more than 3GB of RAM, these pages may have physical addresses beyond the 4GB boundary, which makes them unreachable for the (current?) agp implementation. On their way into the agp subsystem, the upper bits are chopped off, and if the GPU writes anything to the resulting address, it is probably overwriting kernel memory. If I find out how to allocate memory in the low 4GB, a patch will follow.
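
The truncation described here can be modelled in a few lines. This is not kernel code, only a sketch of the arithmetic: it assumes the address passes through a 32-bit unsigned type, and the sample address and function names are made up for illustration.

```python
MASK_32 = 0xFFFFFFFF   # what a 32-bit unsigned type can hold
FOUR_GIB = 1 << 32


def truncate_to_32_bits(phys_addr: int) -> int:
    """Keep only the low 32 bits, as storing into a 32-bit type would."""
    return phys_addr & MASK_32


def fits_below_4gib(phys_addr: int) -> bool:
    """True if the address survives a 32-bit round trip unchanged."""
    return phys_addr < FOUR_GIB


page = 0x1_2345_6000  # hypothetical page just above the 4 GiB boundary
print(hex(page), "->", hex(truncate_to_32_bits(page)))
print("reachable without truncation:", fits_below_4gib(page))
```

The truncated value is still a valid-looking address in the low 4GB, which is why a write through it can land in unrelated memory instead of failing outright.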

In , Pierre-pirsoft (pierre-pirsoft) wrote :

Sorry, the above was incorrect. The bad physical addresses are from the mapping of some inode in i915_gem.c:1087.

In , Pierre-pirsoft (pierre-pirsoft) wrote :

Created an attachment (id=19734)
Guard against highmem pages before putting them into the agp subsystem

This attachment adds a simple check against highmem pages, so memory corruption can no longer happen by this route. This may be a problem on 64-bit kernels, if the "unsigned long" used as the address data type in the agp subsystem is 64 bits wide. On the other hand, I suspect that the agp subsystem doesn't handle addresses above 4GB well.

In , Pierre-pirsoft (pierre-pirsoft) wrote :

Created an attachment (id=19735)
Guard against highmem pages fetched from shmem file

The attached patch is purely diagnostic, to bail out early when we get highmem pages.

In , Pierre-pirsoft (pierre-pirsoft) wrote :

Created an attachment (id=19736)
Make the gem shmem file only allocate in GFP_DMA32

Attached patch finally fixes the problem in this bug, making the shmem subsystem return only pages from GFP_DMA32, instead of GFP_HIGHMEM.

In , Eric Anholt (eric-anholt) wrote :

AGP really just needs to get fixed for >32-bit addresses. It shouldn't be too hard of a job.

In , Pierre-pirsoft (pierre-pirsoft) wrote :

Created an attachment (id=19786)
Modify agp subsystem to handle dma_addr_t physical addresses

This one lets my session start up until the OpenGL compositing manager starts. I suspect DRM_IOCTL_AGP_ALLOC needs to be extended to pass physical addresses to userspace as 64-bit values. I can imagine that the __u32 is too small for 64-bit kernels, too (given a sufficient amount of memory). Another place I left off was passing physical addresses to intelfb, which then passes them to mtrr, which again uses unsigned longs for physical addresses.

In , Pierre-pirsoft (pierre-pirsoft) wrote :

Created an attachment (id=19821)
Modify agp subsystem to handle dma_addr_t physical addresses

Remove accidental drm API change. Physical pages are not used in userspace for i9xx after all.

In , Pierre-pirsoft (pierre-pirsoft) wrote :

Created an attachment (id=19824)
Use offset of agp area for gtt variant of i915_gem_gtt_pwrite

This one corrects 264c96fe844237c3a5af92a7ee1f2bea4836ad4d.

Using this patch and the AGP changes in attachment 19821, I can get my KDE4 session to start up completely. I tested on top of the old for-review branch, forcing the slow path in i915_gem_gtt_pwrite, so I ran into a few issues ;-).

In , Pierre-pirsoft (pierre-pirsoft) wrote :

Created an attachment (id=19827)
Modify agp subsystem to handle dma_addr_t physical addresses

This one should be the last revision of this patch. Removed the remaining api change in agpgart.h, included changes to the other agp backends, changing the prototype of the mask_memory functions. This one is tested and works on top of 57742578dc476ef5d1a06b08f61da0aae32185f4.

In , Eric Anholt (eric-anholt) wrote :

Applied 19827 to for-airlied and drm-intel-next. Was reviewed by Arjan as well.

In , Pierre-pirsoft (pierre-pirsoft) wrote :

The patch does not work correctly, quoting Dave Airlie:

> So we have calls to set_memory_array_uc that used to take unsigned
> long *, they now take dma_addr_t *... this would be an issue.

This will break all users of agp_generic_alloc_pages and agp_generic_destroy_pages on CONFIG_HIGHMEM64G systems.

So, for me, the obvious solution would be to iterate over the array, calling set_memory_uc on each entry. Since the code does not check for errors from set_memory_array_uc anyway, this should work the same. The same goes for set_memory_array_wb. Patch will follow.
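
The type mismatch quoted above can be illustrated outside the kernel. The sketch below is a model, not the actual code path: it assumes that dma_addr_t is 64 bits wide and unsigned long is 32 bits wide on a CONFIG_HIGHMEM64G kernel, assumes little-endian layout as on x86, and uses made-up addresses and a hypothetical function name. It shows what a callee sees if it is handed an array of 64-bit entries but reads it as an array of 32-bit entries.

```python
import struct


def view_as_u32_array(addresses, count):
    """Return the first `count` entries a callee would read if it treated
    an array of 64-bit values as an array of 32-bit values."""
    # Lay the values out in memory as little-endian 64-bit integers.
    raw = struct.pack(f"<{len(addresses)}Q", *addresses)
    # Re-read the same bytes as little-endian 32-bit integers.
    words = struct.unpack(f"<{len(raw) // 4}I", raw)
    return list(words[:count])


# Hypothetical addresses; the second one lies above 4 GiB.
intended = [0x23456000, 0x1_2345_7000]
seen = view_as_u32_array(intended, len(intended))

print([hex(a) for a in intended])
print([hex(a) for a in seen])
```

In this model the second entry read by the callee is the upper half of the first address rather than the second address. Handing the entries over one at a time avoids the layout mismatch, which appears to be the point of the per-entry loop proposed above.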

In , Pierre-pirsoft (pierre-pirsoft) wrote :

Created an attachment (id=21075)
Use set_memory_{uc,wb} instead of set_memory_array_{uc,wb} for CONFIG_HIGHMEM64G

This one also fixes two unrelated warnings. I should have looked for warnings in an earlier iteration.

In , Michael Fu (michael-fu-intel) wrote :

*** Bug 19003 has been marked as a duplicate of this bug. ***

In , pgzh (peter-ganzhorn) wrote :

Is there a patch that is safe to use yet?
Kernel 2.6.28 is released and I can't get my X4500HD (G45) to work with it - X just does not want to start and leaves me with the following:

(EE) intel(0): Failed to pin front buffer: Cannot allocate memory

Fatal server error:
Couldn't bind memory for BO front buffer

I had a look at my kernel config but can't find CONFIG_HIGHMEM* anywhere, I recall something like this config option isn't available on x86-64 kernels - right?

If you need someone to test some patches for the kernel - I volunteer, you just have to tell me what patches I have to apply and in what order.
The testing system I have is the following:
Asus P5Q-EM Board, G45 chipset
8 GB of RAM (it's obvious I'd like to have 64GB highmem support)

Ubuntu 8.10 for 64-Bit CPUs running a self-compiled kernel (2.6.27.10 for now)
xf86-video-intel 2.5.1
libdrm 2.4.1

I tried to run kernel 2.6.28 with xf86-video-intel 2.5.99.1 and libdrm 2.4.3 ending up with the mentioned allocation error.
So if you have a patch I can test with 2.6.28, please tell me how to use it and I'll gladly tell you if it works for me ;)

In , Gordon Jin (gordon-jin) wrote :

(In reply to comment #17)
> Is there a patch that is safe to use yet?
> Kernel 2.6.28 is released and I can't get my X4500HD (G45) to work with it - X
> just does not want to start and leaves me with the following:
>
> (EE) intel(0): Failed to pin front buffer: Cannot allocate memory

This seems a separate issue, i.e. bug#19179. Are you using big "Virtual" value in xorg.conf?

In , pgzh (peter-ganzhorn) wrote :

I am not using the "virtual" value at all - my screen size is 1920x1200 if that matters.

The problem only occurs with 2.6.28. Before 2.6.28-rc4, X simply freezes on startup; any kernel >= 2.6.28-rc4 gives me the allocation error mentioned above.

Since there was a patch "drm/i915: GEM on PAE has problems - disable it for now." in -rc4, I think my problem is somehow related to it (I no longer had the freezes after it, but X still did not start).
Are you sure my problem is unrelated to this bug?

In , Gordon Jin (gordon-jin) wrote :

(In reply to comment #19)
> Are you sure my problem is unrelated to this bug?
Unsure. I take back my comment#18, as you're not using "Virtual".
I'll let Eric comment.
Eric, is this bug a duplicate of bug#18082?

In , pgzh (peter-ganzhorn) wrote :

Here's a bit more information, gathered with 2.6.28 (Vanilla), libdrm 2.4.1 and xf86-video-intel 2.5.1:

cat /var/log/Xorg.0.log | grep -e '(WW)' -e '(EE)'
(WW) intel(0): libpciaccess reported 0 rom size, guessing 64kB
(WW) intel(0): Allocation error, framebuffer compression disabled
(EE) intel(0): Failed to pin front buffer: Cannot allocate memory

And here's some (I guess serious) errors in dmesg of 2.6.28, produced by the attempt to start X:

mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
resource map sanity check conflict: 0xd0000000 0xdfffffff 0xd0000000 0xd7feffff vesafb
------------[ cut here ]------------
WARNING: at arch/x86/mm/ioremap.c:226 __ioremap_caller+0x339/0x380()
Modules linked in: w83627ehf hwmon_vid
Pid: 3057, comm: Xorg Not tainted 2.6.28-pgzh #1
Call Trace:
 [<ffffffff8025b294>] warn_on_slowpath+0x64/0xa0
 [<ffffffff80261228>] iomem_map_sanity_check+0x98/0xc0
 [<ffffffff80247ad9>] __ioremap_caller+0x339/0x380
 [<ffffffff80491bdf>] i915_gem_entervt_ioctl+0x2cf/0x5a0
 [<ffffffff80491bdf>] i915_gem_entervt_ioctl+0x2cf/0x5a0
 [<ffffffff80491910>] i915_gem_entervt_ioctl+0x0/0x5a0
 [<ffffffff80480bf2>] drm_ioctl+0x112/0x340
 [<ffffffff802cc335>] vfs_ioctl+0x85/0xb0
 [<ffffffff802cc3dc>] do_vfs_ioctl+0x7c/0x460
 [<ffffffff802cc809>] sys_ioctl+0x49/0x80
 [<ffffffff8022a36b>] system_call_fastpath+0x16/0x1b
---[ end trace 7b9ce6d857e6ff4d ]---
[drm:i915_gem_object_bind_to_gtt] *ERROR* GTT full, but LRU list empty
[drm:i915_gem_object_pin] *ERROR* Failure to bind: -12<4>Clocksource tsc unstable (delta = -143772081 ns)
iounmap: bad address ffffc20011780000
Pid: 3057, comm: Xorg Tainted: G W 2.6.28-pgzh #1
Call Trace:
 [<ffffffff804911d9>] i915_gem_leavevt_ioctl+0x39/0x50
 [<ffffffff80480bf2>] drm_ioctl+0x112/0x340
 [<ffffffff802cc335>] vfs_ioctl+0x85/0xb0
 [<ffffffff802cc3dc>] do_vfs_ioctl+0x7c/0x460
 [<ffffffff802748d0>] hrtimer_wakeup+0x0/0x30
 [<ffffffff80673f9e>] do_nanosleep+0x7e/0xd0
 [<ffffffff802cc809>] sys_ioctl+0x49/0x80
 [<ffffffff80275487>] sys_nanosleep+0x77/0x80
 [<ffffffff8022a36b>] system_call_fastpath+0x16/0x1b
Xorg[3057]: segfault at 0 ip 00007f8c04da265f sp 00007fff10a12f10 error 6 in intel_drv.so[7f8c04d5a000+69000]

Without vesafb (just tried it because of the vesafb-related error):
mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
[drm:i915_gem_object_bind_to_gtt] *ERROR* GTT full, but LRU list empty
[drm:i915_gem_object_pin] *ERROR* Failure to bind: -12<3>iounmap: bad address ffffc20011280000
Pid: 3037, comm: Xorg Not tainted 2.6.28-pgzh #3
Call Trace:
 [<ffffffff8048d4b9>] i915_gem_leavevt_ioctl+0x39/0x50
 [<ffffffff8047ced2>] drm_ioctl+0x112/0x340
 [<ffffffff802545ea>] set_next_entity+0x3a/0x80
 [<ffffffff802cc335>] vfs_ioctl+0x85/0xb0
 [<ffffffff802cc3dc>] do_vfs_ioctl+0x7c/0x460
 [<ffffffff802cc809>] sys_ioctl+0x49/0x80
 [<ffffffff8022a36b>] system_call_fastpath+0x16/0x1b
Xorg[3037]: segfault at 0 ip 00007f49f10e765f sp 00007ffffcd57f50 error 6 in intel_drv.so[7f49f109f000+69000]

Please tell me if this is some different bug and if I should file a new bug report in that case.

In , Eric Anholt (eric-anholt) wrote :

Peter: If you ever have doubt about whether you've got the same bug, just open a new report. Your information doesn't seem to be related to this bug at all.

In , Eric Anholt (eric-anholt) wrote :

*** Bug 18082 has been marked as a duplicate of this bug. ***

In , Eric Anholt (eric-anholt) wrote :

GEM should now be disabled with PAE, which at least fixes the corruption. We need to get a version of these patches that airlied will accept.

In , Mh-familie-heinz (mh-familie-heinz) wrote :

I just updated to 2.6.29-rc1 with CONFIG_HIGHMEM64G=y, without applying any additional patches, and it didn't crash. But I took a closer look and saw that those patches are not part of -rc1.

Has this bug been fixed some other way?

In , Peng-li (peng-li) wrote :

Please look at another bug: http://bugs.freedesktop.org/show_bug.cgi?id=19415
I got a totally different result from this one.

nikos (frangakis)
description: updated
Geir Ove Myhr (gomyhr) wrote : Re: xorg on intel freezes on startup when using a kernel with CONFIG_HIGHMEM64G and AccelMethod is set to UXA

Thank you for reporting this bug and helping to make Ubuntu better. Could you please upload the output of `lspci -vvnn` and the log file /var/log/Xorg.0.log (both with UXA and EXA)? If you use any other options in /etc/X11/xorg.conf, please upload that file as well.

Changed in xserver-xorg-video-intel:
status: New → Incomplete
nikos (frangakis) wrote :

Hello, here are the requested files

nikos (frangakis) wrote :
nikos (frangakis) wrote :
nikos (frangakis) wrote :
Geir Ove Myhr (gomyhr)
description: updated
Changed in xserver-xorg-video-intel:
status: Incomplete → Confirmed
Sean McNamara (smcnam) wrote : Re: [i965] xorg on intel freezes on startup when using a kernel with CONFIG_HIGHMEM64G and AccelMethod is set to UXA

This is a known issue. If you have >= 4GB RAM, you _currently_ have two options to get a working desktop:

1. Use a regular x86 kernel, a la the -generic flavor. This has no PAE, so you're limited to about 3GB of mappable system memory.
2. Use an x86_64 kernel.

The linux-server kernel is currently a big "no-no" for Jaunty if you have Intel graphics. Check the upstream bug report for status.

Changed in xserver-xorg-video-intel:
status: Unknown → In Progress
In , Eric Anholt (eric-anholt) wrote :

*** Bug 19739 has been marked as a duplicate of this bug. ***

In , Sven (sven-koehler) wrote :

(In reply to comment #24)
> GEM should now be disabled with PAE, which at least fixes the corruption. We
> need to get a version of these patches that airlied will accept.

You should at least print a warning or a note to the logs saying that PAE is not supported.

Will GEM work with PAE at some future point? No NX bit protection without PAE.

Bryce Harrington (bryce)
Changed in xserver-xorg-video-intel:
importance: Undecided → High
status: Confirmed → Triaged
Bryce Harrington (bryce)
summary: - [i965] [UXA] xorg on intel freezes on startup when using a kernel with
- CONFIG_HIGHMEM64G and AccelMethod is set to UXA
+ [i965]xorg on intel freezes on startup when using a kernel with
+ CONFIG_HIGHMEM64G and AccelMethod is set to UXA (UXA bug)
Bryce Harrington (bryce) wrote :

We have a PPA with some new tools for debugging X freezes:

  https://launchpad.net/~ubuntu-x-swat/+archive/x-freeze-test

I know how irritating X freezes can be. They're also typically
quite hard to debug, but the information provided by these new
tools should help upstream figure them out.

You can help by doing the following:

 A. Install the PPA packages on Jaunty and boot kernel 2.6.30-rc2
 B. Reproduce your freeze
 C. ssh into the machine and run the steps to collect the info
 D. Attach the tarball of the results to this bug report

With this information, we'll be able to forward your bug upstream.

(For more information on triaging X freeze bugs, see
 https://wiki.ubuntu.com/X/Troubleshooting/Freeze )

Changed in xserver-xorg-video-intel (Ubuntu):
status: Triaged → New
status: New → Incomplete
Bryce Harrington (bryce) wrote :

Since there's been no response, I'm closing this as expired.

I notice that you are using version 2.6.1 of the driver, whereas we are shipping 2.6.3, which has a number of fixes especially for UXA. Please re-test against the Jaunty release and open a new bug report if this is still an issue.

Changed in xserver-xorg-video-intel (Ubuntu):
status: Incomplete → Invalid
In , Jesse Barnes (jbarnes-virtuousgeek) wrote :

Adjusting severity: crashes & hangs should be marked critical.

In , Pierre-pirsoft (pierre-pirsoft) wrote :

Fixed by commits
07613ba2f464f59949266f4337b75b91eb610795: agp: switch AGP to use page array instead of unsigned long array
95934f939c46ea2b37f3c91a4f8c82e003727761: drm/i915: enable GEM on PAE.
0b7af262aba912f52bc6ef76f1bc0960b01b8502: agp/intel: Make intel_i965_mask_memory use dma_addr_t for physical addresses

Changed in xserver-xorg-video-intel:
status: In Progress → Fix Released
Changed in xserver-xorg-video-intel:
importance: Unknown → Critical
Changed in xserver-xorg-video-intel:
importance: Critical → Unknown
Changed in xserver-xorg-video-intel:
importance: Unknown → Critical