Ubuntu

Kernel High Memory Support 64GB [PAE] incompatible with intel video UXA

Reported by Marius Gedminas on 2009-04-06
This bug affects 2 people
Affects            Status         Importance   Assigned to      Milestone
Linux              Fix Released   Undecided    Unassigned
xf86-video-intel   Fix Released   Critical
linux (Ubuntu)     Fix Released   High         Andy Whitcroft

Bug Description

X hangs on startup, giving a black screen, when the kernel is built with CONFIG_HIGHMEM64G, which is the case for every -server kernel compiled by Ubuntu. The only workaround is to use a -generic kernel, which doesn't see anything above 3 GB of RAM.

This bug is important because support for EXA will be dropped and we will all be using UXA, which means that everyone who uses -server kernels or has the 64GB memory option selected will be unable to use it.

[Original Bug Report]
In Jaunty with Intel GM965 video, adding Option "AccelMethod" "UXA" to /etc/X11/xorg.conf results in a system freeze with a black screen when X starts, if you use the -server kernel (CapsLock doesn't work, but the magic SysRq does). Specifically, I was using linux-image-2.6.28-11-server version 2.6.28-11.40. The -generic kernel works (more or less; this is not the place to complain about other UXA bugs).

Why is this important (i.e. why would anyone run the -server kernel on a desktop)? This is why:

If a user installs ubuntu-xen-desktop (which is a package for running Xen "on desktops", so a natural thing to install on a desktop), apt pulls in linux-image-server as a dependency. GRUB then decides that the -server kernel is preferred to the regular -generic and boots into it (which is weird, but okay, maybe Xen needs that?).

I had to uninstall the -server kernel (and ubuntu-xen-desktop) as a workaround.

ProblemType: Bug
Architecture: i386
DistroRelease: Ubuntu 9.04
MachineType: LENOVO 646655G
Package: linux-image-2.6.28-11-generic 2.6.28-11.40
ProcCmdLine: root=UUID=34a7bfc5-59dc-4d74-a131-45b6ae4663b1 ro quiet splash vga=872
ProcEnviron:
 LC_CTYPE=lt_LT.UTF-8
 PATH=(custom, user)
 LANG=lt_LT.UTF-8
 SHELL=/bin/bash
ProcVersionSignature: Ubuntu 2.6.28-11.40-generic
SourcePackage: linux

Created an attachment (id=19626)
kernel .config

Using nopat on the kernel command line causes the X server to "only" hang with a movable cursor instead of causing kernel panics. For non-CONFIG_HIGHMEM64G kernels, the X server works with and without nopat. This is a Lenovo T61 with 965GM integrated graphics. Adding kernel .config and Xorg.log from the nopat case.

Created an attachment (id=19627)
Xorg.log with nopat

The actual problem is this: the i915 drm driver feeds pages allocated using kcalloc into the agp subsystem. On systems with more than 3G of RAM, these pages may have physical addresses beyond the 4GB boundary, making them unreachable for the (current?) agp implementation. On their way into the agp subsystem, the extra address bits are chopped off, and if the GPU writes anything to that space, it is probably overwriting kernel memory. If I find out how to allocate memory in the low 4GB, a patch will follow.

Sorry, the above was incorrect. The bad physical addresses are from the mapping of some inode in i915_gem.c:1087.
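
To illustrate the truncation described above, here is a minimal userspace sketch in C. It is not the kernel code: `agp_addr32_t`, `store_in_32bit_slot` and `survives_32bit_slot` are hypothetical names, and a `uint32_t` stands in for the 32-bit `unsigned long` of an i386 kernel. It only shows how a physical address above the 4GB boundary loses its upper bits when squeezed into a 32-bit field.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit "unsigned long" address slot
 * (an assumption for illustration, not a real kernel type). */
typedef uint32_t agp_addr32_t;

/* Store a 64-bit physical address in a 32-bit slot.
 * The upper 32 bits are silently dropped by the conversion. */
static agp_addr32_t store_in_32bit_slot(uint64_t phys)
{
    return (agp_addr32_t)phys;
}

/* Returns 1 if the address is unchanged after the round trip, 0 otherwise. */
static int survives_32bit_slot(uint64_t phys)
{
    return (uint64_t)store_in_32bit_slot(phys) == phys;
}
```

For example, `store_in_32bit_slot(0x100001000ULL)` yields `0x1000`, so a page just above 4GB would be aliased onto an unrelated page in low memory.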

Created an attachment (id=19734)
Guard against highmem pages before putting them into the agp subsystem

This attachment adds a simple check against highmem pages, making memory corruption by this route impossible. This may be a problem on 64-bit kernels, if the "unsigned long" used as the address data type in the agp subsystem is 64 bits wide. On the other hand, I suspect that the agp subsystem doesn't handle addresses above 4G well.
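
The idea of such a guard can be sketched in userspace C as follows. This is not the attached patch. `FOUR_GB`, `first_page_above_4g` and `bind_pages_checked` are hypothetical names, and plain `uint64_t` values stand in for page physical addresses. The sketch refuses the whole batch if any address would not fit in 32 bits, instead of letting it be truncated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FOUR_GB (1ULL << 32)

/* Return the index of the first address at or above 4GB,
 * or -1 if every address fits in 32 bits. */
static long first_page_above_4g(const uint64_t *phys, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (phys[i] >= FOUR_GB)
            return (long)i;
    }
    return -1;
}

/* Hypothetical bind step: return 0 when all pages are reachable,
 * -1 (bail out early) when at least one lies above 4GB. */
static int bind_pages_checked(const uint64_t *phys, size_t count)
{
    return first_page_above_4g(phys, count) < 0 ? 0 : -1;
}
```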

Created an attachment (id=19735)
Guard against highmem pages fetched from shmem file

The attached patch is purely diagnostic, to bail out early when we get highmem pages.

Created an attachment (id=19736)
Make the gem shmem file only allocate in GFP_DMA32

Attached patch finally fixes the problem in this bug, making the shmem subsystem return only pages from GFP_DMA32, instead of GFP_HIGHMEM.

AGP really just needs to get fixed for >32-bit addresses. It shouldn't be too hard of a job.

Created an attachment (id=19786)
Modify agp subsystem to handle dma_addr_t physical addresses

This one lets my session start up until the opengl compositing manager starts. I suspect DRM_IOCTL_AGP_ALLOC needs to be extended to pass 64-bit physical addresses to userspace. I can imagine that the __u32 is too small for 64-bit kernels, too (given a sufficient amount of memory). Another place I left off was passing physical addresses to intelfb, which then passes them to mtrr, which again uses unsigned longs for physical addresses.

Created an attachment (id=19821)
Modify agp subsystem to handle dma_addr_t physical addresses

Remove accidental drm api change. physical pages are not used in userspace for i9xx after all.

Created an attachment (id=19824)
Use offset of agp area for gtt variant of i915_gem_gtt_pwrite

This one corrects 264c96fe844237c3a5af92a7ee1f2bea4836ad4d.

Using this patch and the AGP changes in attachment 19821, I can get my kde4 session to start up completely. I tested on top of the old for-review branch, forcing the slow path in i915_gem_gtt_pwrite, so I ran into a few issues ;-).

Created an attachment (id=19827)
Modify agp subsystem to handle dma_addr_t physical addresses

This one should be the last revision of this patch. Removed the remaining api change in agpgart.h, included changes to the other agp backends, changing the prototype of the mask_memory functions. This one is tested and works on top of 57742578dc476ef5d1a06b08f61da0aae32185f4.

Applied 19827 to for-airlied and drm-intel-next. Was reviewed by Arjan as well.

The patch does not work correctly, quoting Dave Airlie:

> So we have calls to set_memory_array_uc that used to take unsigned
> long *, they now take dma_addr_t *... this would be an issue.

This will break all users of agp_generic_alloc_pages and agp_generic_destroy_pages on CONFIG_HIGHMEM64G systems.

Soo.. For me, the obvious solution would be to iterate over the array, calling set_memory_uc. Since the code obviously does not check for errors from set_memory_array_uc, this should work the same. Similar for set_memory_array_wb. Patch will follow.
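
As a rough illustration of that approach, the following userspace sketch walks an address array and calls a per-page function for each entry. It is not the patch itself. `stub_set_memory_uc` is a hypothetical stub that only mimics the shape of the kernel's `set_memory_uc(addr, numpages)`, and `set_pages_uc_one_by_one` and `uc_calls` are made-up names. Unlike the code discussed above, the sketch does report the first error it sees.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the stub was invoked (for demonstration only). */
static int uc_calls;

/* Hypothetical userspace stub; it does not change any caching
 * attributes and always reports success. */
static int stub_set_memory_uc(unsigned long addr, int numpages)
{
    (void)addr;
    (void)numpages;
    uc_calls++;
    return 0;
}

/* Call the per-page function once for every array entry instead of
 * handing the whole array to an array variant. Returns the first
 * error encountered, or 0 if every call succeeded. */
static int set_pages_uc_one_by_one(const unsigned long *addrs, size_t count)
{
    int ret = 0;

    for (size_t i = 0; i < count; i++) {
        int err = stub_set_memory_uc(addrs[i], 1);
        if (err && !ret)
            ret = err;
    }
    return ret;
}
```

The same loop shape would apply to a write-back counterpart of the stub.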

Created an attachment (id=21075)
Use set_memory_{uc,wb} instead of set_memory_array_{uc,wb} for CONFIG_HIGHMEM64G

This one also fixes two unrelated warnings. I should have looked for warnings in an earlier iteration.

*** Bug 19003 has been marked as a duplicate of this bug. ***

Is there a patch that is safe to use yet?
Kernel 2.6.28 is released and I can't get my X4500HD (G45) to work with it - X just does not want to start and leaves me with the following:

(EE) intel(0): Failed to pin front buffer: Cannot allocate memory

Fatal server error:
Couldn't bind memory for BO front buffer

I had a look at my kernel config but can't find CONFIG_HIGHMEM* anywhere. As I recall, this config option isn't available on x86-64 kernels, right?

If you need someone to test some patches for the kernel - I volunteer, you just have to tell me what patches I have to apply and in what order.
The testing system I have is the following:
Asus P5Q-EM Board, G45 chipset
8 GB of RAM (it's obvious I'd like to have 64GB highmem support)

Ubuntu 8.10 for 64-Bit CPUs running a self-compiled kernel (2.6.27.10 for now)
xf86-video-intel 2.5.1
libdrm 2.4.1

I tried to run kernel 2.6.28 with xf86-video-intel 2.5.99.1 and libdrm 2.4.3 ending up with the mentioned allocation error.
So if you have a patch I can test with 2.6.28, please tell me how to use it and I'll gladly tell you if it works for me ;)

(In reply to comment #17)
> Is there a patch that is safe to use yet?
> Kernel 2.6.28 is released and I can't get my X4500HD (G45) to work with it - X
> just does not want to start and leaves me with the following:
>
> (EE) intel(0): Failed to pin front buffer: Cannot allocate memory

This seems to be a separate issue, i.e. bug #19179. Are you using a big "Virtual" value in xorg.conf?

I am not using the "virtual" value at all - my screen size is 1920x1200 if that matters.

The problem only occurs with 2.6.28. Before 2.6.28-rc4, X simply freezes on startup. Any kernel >= 2.6.28-rc4 gives me the mentioned allocation error.

Since there was a patch "drm/i915: GEM on PAE has problems - disable it for now." in -rc4, I think my problem is somehow related to it (because I didn't have the freezes after it, but X still did not start).
Are you sure my problem is unrelated to this bug?

(In reply to comment #19)
> Are you sure my problem is unrelated to this bug?
Unsure. I take back my comment #18, as you're not using "Virtual".
I'll let Eric comment.
Eric, is this bug a dup of bug #18082?

Here's a bit more information, gathered with 2.6.28 (Vanilla), libdrm 2.4.1 and xf86-video-intel 2.5.1:

cat /var/log/Xorg.0.log | grep -e '(WW)' -e '(EE)'
(WW) intel(0): libpciaccess reported 0 rom size, guessing 64kB
(WW) intel(0): Allocation error, framebuffer compression disabled
(EE) intel(0): Failed to pin front buffer: Cannot allocate memory

And here's some (I guess serious) errors in dmesg of 2.6.28, produced by the attempt to start X:

mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
resource map sanity check conflict: 0xd0000000 0xdfffffff 0xd0000000 0xd7feffff vesafb
------------[ cut here ]------------
WARNING: at arch/x86/mm/ioremap.c:226 __ioremap_caller+0x339/0x380()
Modules linked in: w83627ehf hwmon_vid
Pid: 3057, comm: Xorg Not tainted 2.6.28-pgzh #1
Call Trace:
 [<ffffffff8025b294>] warn_on_slowpath+0x64/0xa0
 [<ffffffff80261228>] iomem_map_sanity_check+0x98/0xc0
 [<ffffffff80247ad9>] __ioremap_caller+0x339/0x380
 [<ffffffff80491bdf>] i915_gem_entervt_ioctl+0x2cf/0x5a0
 [<ffffffff80491bdf>] i915_gem_entervt_ioctl+0x2cf/0x5a0
 [<ffffffff80491910>] i915_gem_entervt_ioctl+0x0/0x5a0
 [<ffffffff80480bf2>] drm_ioctl+0x112/0x340
 [<ffffffff802cc335>] vfs_ioctl+0x85/0xb0
 [<ffffffff802cc3dc>] do_vfs_ioctl+0x7c/0x460
 [<ffffffff802cc809>] sys_ioctl+0x49/0x80
 [<ffffffff8022a36b>] system_call_fastpath+0x16/0x1b
---[ end trace 7b9ce6d857e6ff4d ]---
[drm:i915_gem_object_bind_to_gtt] *ERROR* GTT full, but LRU list empty
[drm:i915_gem_object_pin] *ERROR* Failure to bind: -12<4>Clocksource tsc unstable (delta = -143772081 ns)
iounmap: bad address ffffc20011780000
Pid: 3057, comm: Xorg Tainted: G W 2.6.28-pgzh #1
Call Trace:
 [<ffffffff804911d9>] i915_gem_leavevt_ioctl+0x39/0x50
 [<ffffffff80480bf2>] drm_ioctl+0x112/0x340
 [<ffffffff802cc335>] vfs_ioctl+0x85/0xb0
 [<ffffffff802cc3dc>] do_vfs_ioctl+0x7c/0x460
 [<ffffffff802748d0>] hrtimer_wakeup+0x0/0x30
 [<ffffffff80673f9e>] do_nanosleep+0x7e/0xd0
 [<ffffffff802cc809>] sys_ioctl+0x49/0x80
 [<ffffffff80275487>] sys_nanosleep+0x77/0x80
 [<ffffffff8022a36b>] system_call_fastpath+0x16/0x1b
Xorg[3057]: segfault at 0 ip 00007f8c04da265f sp 00007fff10a12f10 error 6 in intel_drv.so[7f8c04d5a000+69000]

Without vesafb (just tried it because of the vesafb-related error):
mtrr: type mismatch for d0000000,10000000 old: write-back new: write-combining
[drm:i915_gem_object_bind_to_gtt] *ERROR* GTT full, but LRU list empty
[drm:i915_gem_object_pin] *ERROR* Failure to bind: -12<3>iounmap: bad address ffffc20011280000
Pid: 3037, comm: Xorg Not tainted 2.6.28-pgzh #3
Call Trace:
 [<ffffffff8048d4b9>] i915_gem_leavevt_ioctl+0x39/0x50
 [<ffffffff8047ced2>] drm_ioctl+0x112/0x340
 [<ffffffff802545ea>] set_next_entity+0x3a/0x80
 [<ffffffff802cc335>] vfs_ioctl+0x85/0xb0
 [<ffffffff802cc3dc>] do_vfs_ioctl+0x7c/0x460
 [<ffffffff802cc809>] sys_ioctl+0x49/0x80
 [<ffffffff8022a36b>] system_call_fastpath+0x16/0x1b
Xorg[3037]: segfault at 0 ip 00007f49f10e765f sp 00007ffffcd57f50 error 6 in intel_drv.so[7f49f109f000+69000]

Please tell me if this is some different bug and if I should file a new bug report in that case.

Peter: If you ever have doubt about whether you've got the same bug, just open a new report. Your information doesn't seem to be related to this bug at all.

*** Bug 18082 has been marked as a duplicate of this bug. ***

GEM should now be disabled with PAE, which at least fixes the corruption. We need to get a version of these patches that airlied will accept.

I just updated to 2.6.29-rc1 with CONFIG_HIGHMEM64G=y and without applying any additional patches, and it didn't crash. But I took a deeper look and saw that those patches are not part of -rc1.

Is this bug fixed otherwise?

Please look at another bug, http://bugs.freedesktop.org/show_bug.cgi?id=19415 , where I got a totally different result from this one.

*** Bug 19739 has been marked as a duplicate of this bug. ***

(In reply to comment #24)
> GEM should now be disabled with PAE, which at least fixes the corruption. We
> need to get a version of these patches that airlied will accept.

You should at least print some warning or information to the logs, that PAE is not supported.

Will GEM work with PAE at some future point? No NX bit protection without PAE.

Marius Gedminas (mgedmin) wrote :
Vitali Kulikou (sabotatore) wrote :

Actual!!!

Adjusting severity: crashes & hangs should be marked critical.

I believe there is an incompatibility between the High Memory Support <64GB> option in the kernel and UXA.
I've been compiling the 2.6.30-rcX and 2.6.29.x kernels from both kernel.org and Ubuntu sources, and that option seems to break UXA. The reason I was compiling was to get access to the full 4GB of RAM my laptop has.
I've been using UXA since the intel bug in Jaunty with great success and I'd like to continue doing so.

00:02.0 VGA compatible controller: Intel Corporation Mobile GM965/GL960 Integrated Graphics Controller (rev 0c)

If you need more info please let me know.

summary: - -server kernel does not support intel video with UXA
+ Kernel High Memory Support <64GB> incompatible with intel video UXA
VladNistor (vladnistor) on 2009-05-17
summary: - Kernel High Memory Support <64GB> incompatible with intel video UXA
+ Kernel High Memory Support 64GB [PAE] incompatible with intel video UXA
description: updated
VladNistor (vladnistor) wrote :

I am now testing Karmic Alpha 1 and the problem is still here.

UXA works on kernel 2.6.30-5-generic but not on 2.6.30-5-server, both of which are available on Karmic.

This bug is important because support for EXA will be dropped and we will all be using UXA, which means that everyone who uses -server kernels or has the 64GB memory option selected will be unable to use it.

description: updated
Changed in xserver-xorg-video-intel:
status: Unknown → In Progress
Bryce Harrington (bryce) on 2009-05-17
tags: added: xorg-needs-kernel-fix
Dirk (rptq) wrote :

I have the same problem on my HP laptop with Intel Mobile 4 Integrated Graphics Controller [8086:2a42] (rev 07). Tested in current Jaunty and Karmic Alpha 1.

One observation that may be related: the performance problems with the Intel drivers that have been reported elsewhere seem to affect me only when I'm running a kernel with the 64G option and PAE enabled. The "/proc/mtrr fix" (see Bug #314928) helps a lot. When I am running a generic kernel, I don't have any serious performance problems and no need to change /proc/mtrr.

Seems this should be addressed with the following spec - https://wiki.ubuntu.com/KernelTeam/Specs/KarmicKernelFlavours. Setting this to Triaged for kernel team to track. Thanks.

Changed in linux (Ubuntu):
importance: Undecided → High
status: New → Triaged

Fixed by commits
07613ba2f464f59949266f4337b75b91eb610795: agp: switch AGP to use page array instead of unsigned long array
95934f939c46ea2b37f3c91a4f8c82e003727761: drm/i915: enable GEM on PAE.
0b7af262aba912f52bc6ef76f1bc0960b01b8502: agp/intel: Make intel_i965_mask_memory use dma_addr_t for physical addresses

Changed in xserver-xorg-video-intel:
status: In Progress → Fix Released
Robert Hooker (sarvatt) wrote :

There is now PAE support for GEM in Linus' tree (pre 2.6.31-rc1):

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=43813f399c72aa22e01a680559c1cb5274bf2140

  drm/i915: enable GEM on PAE.
  agp: switch AGP to use page array instead of unsigned long array

Andy Whitcroft (apw) on 2009-06-30
Changed in linux (Ubuntu):
assignee: nobody → Andy Whitcroft (apw)
status: Triaged → In Progress
Andy Whitcroft (apw) wrote :

OK, this should now be fixed in Karmic. The fix was released in the 2.6.31-1 kernel. Closing this off as Fix Released.

Changed in linux (Ubuntu):
status: In Progress → Fix Released
Andy Whitcroft (apw) wrote :

As a side note, GRUB is selecting -server by default as a side effect of the issue reported against the kernel in Bug #364029.

David Kohen (kohen-d) wrote :

Can anyone please backport these fixes to Jaunty? There are programs I use that don't work on Karmic yet...

Changed in xserver-xorg-video-intel:
importance: Unknown → Critical
Changed in linux:
status: New → Fix Released
Changed in xserver-xorg-video-intel:
importance: Critical → Unknown
Changed in xserver-xorg-video-intel:
importance: Unknown → Critical