No PCI IOMEM space available below 4GB

Reported by TJ on 2009-03-14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Linux
Confirmed
High
linux (Ubuntu)
Medium
Unassigned
Nominated for Jaunty by DanYargici
Nominated for Karmic by DanYargici
nvidia-graphics-drivers-180 (Ubuntu)
Undecided
Unassigned
Nominated for Jaunty by DanYargici
Nominated for Karmic by DanYargici

Bug Description

Binary package hint: nvidia-glx-180

This can affect almost all releases (6.10, 7.04, 7.10, 8.04, 8.10, 9.04+).

It usually only shows up dramatically with video cards that have large memory (e.g. 256MB) and on systems that

a) have 3GB or more RAM and/or
b) have 64-bit CPUs on 32-bit north-bridge chipsets (e.g. Intel 945)

Typical symptom: "failed to load the nvidia kernel module"

This issue could affect all versions of the Nvidia proprietary drivers and others.

Recently I've seen several users in IRC #ubuntu asking for help but without solving the issue. When I saw Keith Dewitt asking the same question on 2009-03-14 I arranged with him to access his system via SSH and a multiuser screen session to diagnose the issue. Keith was very patient and supportive and his assistance led directly to this discovery.

There are also threads on the nvidia forums with the same issues.

The symptom users report is that systems with Nvidia-based graphics cards won't start the X server successfully. There is a wide range of reports, most of which don't immediately pin-point the cause.

The fact that the nvidia kernel module failed to load is the biggest clue.

Check dmesg and /var/log/kern.log for something along these lines:

[ 20.137717] nvidia: module license 'NVIDIA' taints kernel.
[ 20.412849] ACPI: PCI Interrupt Link [LNEB] enabled at IRQ 18
[ 20.412858] nvidia 0000:02:00.0: PCI INT A -> Link[LNEB] -> GSI 18 (level, low) -> IRQ 18
[ 20.412862] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
[ 20.412862] NVRM: BAR1 is 256M @ 0x30000000 (PCI:0002:00.0)
[ 20.412865] NVRM: This is a 64-bit BAR mapped above 4GB by the system BIOS or
[ 20.412865] NVRM: Linux kernel. The NVIDIA Linux graphics driver and other
[ 20.412866] NVRM: system software do not currently support this configuration
[ 20.412867] NVRM: reliably.
[ 20.412872] nvidia: probe of 0000:02:00.0 failed with error -1
[ 20.412887] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 20.412889] NVRM: None of the NVIDIA graphics adapters were initialized!
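
A minimal Python sketch of that check, for anyone who wants to scan a saved log rather than read it by eye. The marker strings are copied from the excerpts in this report; other driver or kernel versions may word them differently, and the names MARKERS, SAMPLE and find_bar_failures are just illustrative.

```python
# Marker strings taken from the log excerpts in this report.
# Assumption: other driver/kernel versions may phrase these differently.
MARKERS = (
    "NVRM: This PCI I/O region assigned to your NVIDIA device is invalid",
    "NVRM: This is a 64-bit BAR mapped above 4GB",
    "can't allocate resource",
    "failed with error",
)

# A few lines copied from the report, used here as sample input.
SAMPLE = """\
[ 0.477047] pci 0000:02:00.0: BAR 1: can't allocate resource
[ 20.412862] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
[ 20.412862] NVRM: BAR1 is 256M @ 0x30000000 (PCI:0002:00.0)
[ 20.412865] NVRM: This is a 64-bit BAR mapped above 4GB by the system BIOS or
[ 20.412872] nvidia: probe of 0000:02:00.0 failed with error -1
"""


def find_bar_failures(log_text):
    """Return the log lines that contain any of the marker strings."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in MARKERS)
    ]


for line in find_bar_failures(SAMPLE):
    print(line)
```

To run it against a real system, read the text of /var/log/kern.log (or the output of dmesg) into a string and pass that in place of SAMPLE.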

Also check /var/log/Xorg.0.log or /var/log/Xorg.0.log.old for this tell-tale:

(--) PCI:*(0@2:0:0) nVidia Corporation GeForce 8400 GS rev 161, Mem @ 0xfd000000/16777216, 0x130000000/268435456, 0xfa000000/33554432, I/O @ 0x0000ec00/128, BIOS @ 0x????????/131072

(**) NVIDIA(0): Depth 24, (--) framebuffer bpp 32
(==) NVIDIA(0): RGB weight 888
(==) NVIDIA(0): Default visual is TrueColor
(==) NVIDIA(0): Using gamma correction (1.0, 1.0, 1.0)
(**) NVIDIA(0): Enabling RENDER acceleration
(II) NVIDIA(0): Support for GLX with the Damage and Composite X extensions is
(II) NVIDIA(0): enabled.
(EE) NVIDIA(0): Failed to load the NVIDIA kernel module!
(EE) NVIDIA(0): *** Aborting ***
(II) UnloadModule: "nvidia"

Notice that the video card's 256MB IOMEM allocation starts at 0x130000000, which is 4.75GB and therefore above the 4GB boundary (0x130000000/268435456 = 4.75GB/256MB).
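
The address arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes the "address/size" pair format shown in the Xorg.0.log excerpt (hexadecimal address, decimal size in bytes), and describe_region is an illustrative name, not part of any tool.

```python
GIB = 1 << 30
MIB = 1 << 20


def describe_region(pair):
    """Parse an 'address/size' pair as printed in the Xorg.0.log excerpt.

    Returns (address, size, starts_at_or_above_4GiB).
    """
    addr_text, size_text = pair.split("/")
    addr = int(addr_text, 16)  # address is hexadecimal
    size = int(size_text)      # size is decimal bytes
    return addr, size, addr >= 4 * GIB


addr, size, above_4g = describe_region("0x130000000/268435456")
print(addr / GIB, size // MIB, above_4g)  # prints: 4.75 256 True
```

0x100000000 is exactly 4GB and 0x30000000 is a further 768MB, so the region starts at 4.75GB. The other two memory regions in the same log line (0xfd000000 and 0xfa000000) both sit below 4GB.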

/var/log/dmesg shows:

[ 0.000000] BIOS-e820: 0000000100000000 - 0000000130000000 (usable)

[ 0.477047] pci 0000:00:0b.0: BAR 9: can't allocate resource
[ 0.477047] pci 0000:02:00.0: BAR 1: can't allocate resource

lspci -nn reveals:
00:0b.0 PCI bridge [0604]: nVidia Corporation MCP73 PCI Express bridge [10de:056e] (rev a1)
02:00.0 VGA compatible controller [0300]: nVidia Corporation GeForce 8400 GS [10de:0404] (rev a1)

lspci -vvnn -s 02:00.0

02:00.0 VGA compatible controller [0300]: nVidia Corporation GeForce 8400 GS [10de:0404] (rev a1)
        Subsystem: eVga.com. Corp. Device [3842:c738]
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 18
        Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 130000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
        Region 5: I/O ports at ec00 [size=128]
        Expansion ROM at febe0000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel modules: nvidia, nvidiafb
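
For checking other machines, the Region lines of lspci -vv output can be parsed to flag any memory BAR that extends past the 4GB boundary. The following is a rough sketch, assuming the line format shown in the output above; newer lspci versions may append extra fields, and the names REGION_RE, parse_regions and regions_above_4g are illustrative only.

```python
import re

# Matches lines such as:
#   Region 1: Memory at 130000000 (64-bit, prefetchable) [size=256M]
# Assumption: the format is exactly as in the lspci output quoted in this report.
REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-f]+) \((32|64)-bit, (non-)?prefetchable\)"
    r" \[size=(\d+)([KMG])\]"
)
UNIT = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}

SAMPLE = """\
        Region 0: Memory at fd000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 130000000 (64-bit, prefetchable) [size=256M]
        Region 3: Memory at fa000000 (64-bit, non-prefetchable) [size=32M]
        Region 5: I/O ports at ec00 [size=128]
"""


def parse_regions(lspci_text):
    """Extract the memory BARs (I/O port regions are ignored)."""
    regions = []
    for m in REGION_RE.finditer(lspci_text):
        regions.append({
            "bar": int(m.group(1)),
            "start": int(m.group(2), 16),
            "width": int(m.group(3)),
            "prefetchable": m.group(4) is None,
            "size": int(m.group(5)) * UNIT[m.group(6)],
        })
    return regions


def regions_above_4g(regions):
    """Return the BARs whose end address lies beyond the 4GB boundary."""
    return [r for r in regions if r["start"] + r["size"] > (1 << 32)]


for region in regions_above_4g(parse_regions(SAMPLE)):
    print("BAR %d at 0x%x extends above 4GB" % (region["bar"], region["start"]))
```

With the sample above, only Region 1 (the 256MB prefetchable BAR) is reported.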

The reason for the failure is that the video chipset's PCI IOMEM region (in this case 256MB) cannot be allocated in the PCI IOMEM window below 4GB (from 3GB-4GB): other devices have already been given assignments that leave no free 256MB gap on a 256MB boundary, in other words at either 3GB or 3.25GB.

The host system has 4GB of system RAM which causes the kernel to prevent the use of the 3GB-3.25GB range, and other allocations prevent the use of other ranges.
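
To illustrate the alignment constraint, here is a small Python sketch that searches a window for a free, naturally aligned slot of a given size. It is not how the kernel's allocator is implemented. The first two entries in USED come from the lspci output above; the last two are made-up placeholders standing in for the "other allocations" on this machine, which this report does not list.

```python
MIB = 1 << 20
GIB = 1 << 30

# (start, length) pairs of regions already assigned below 4GB.
USED = [
    (0xFA000000, 32 * MIB),   # Region 3 of the video card (from lspci)
    (0xFD000000, 16 * MIB),   # Region 0 of the video card (from lspci)
    (0xD0000000, 16 * MIB),   # HYPOTHETICAL: some other device
    (0xE0000000, 256 * MIB),  # HYPOTHETICAL: some other device
]


def find_aligned_gap(window_start, window_end, used, size):
    """Return the lowest free address that is a multiple of `size`, or None."""
    # Round the window start up to the next multiple of `size`.
    candidate = (window_start + size - 1) // size * size
    while candidate + size <= window_end:
        overlaps = any(
            candidate < start + length and start < candidate + size
            for start, length in used
        )
        if not overlaps:
            return candidate
        candidate += size
    return None


# RAM covers 3GB-3.25GB, so the usable window starts at 3.25GB: no slot fits.
print(find_aligned_gap(0xD0000000, 4 * GIB, USED, 256 * MIB))

# If the 3GB-3.25GB range were free, the BAR would fit at 3GB.
print(hex(find_aligned_gap(3 * GIB, 4 * GIB, USED, 256 * MIB)))
```

Under these assumed allocations every aligned 256MB slot from 3.25GB upwards is partly occupied, so the search fails until the range starting at 3GB is released.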

Workarounds:

a) alter the BIOS video IOMEM position to below 4GB (on 64-bit architectures), or
b) some BIOSes allow setting the "top of memory below 4GB". If so, set it to less than or equal to 2.75GB and try progressively lower values - one should allow a hole large enough for the video IOMEM to fit starting at 3GB (below 4GB), or
c) reduce the system's RAM to less than 2.5GB to leave sufficient free space.

For the last year I've had an ongoing project to write a completely new PCI IOMEM system for the Linux kernel. I've added the mainline bug report that triggered the development to this report. Additionally, here is a link to my Wiki describing the issue and solutions.

http://tjworld.net/wiki/Linux/PCIDynamicResourceAllocationManagement

TJ (tj) on 2009-03-14
description: updated
Changed in linux:
status: Unknown → Confirmed
TJ (tj) on 2009-03-14
Changed in linux (Ubuntu):
assignee: nobody → intuitivenipple
importance: Undecided → Medium
status: New → Confirmed
keithdewitt (keithdewitt) wrote :

Changes made to BIOS:

Enter BIOS setup

Select:

Advanced Chipset Setup

Select:

Top of Memory Under 4GB

Set to:

2.75GB (Has to be set under 3GB)

Save Changes

====
ECS GF7100PVT-M3 LGA 775 NVIDIA GeForce 7100 HDMI Micro ATX Intel Motherboard.

For this report the video card is the Gf8400 GS overriding the onboard Gf7100. The Gf7100 should work as is.

Processor x86 Family 6 Model 23 Stepping 6 GenuineIntel ~2666 Mhz

BIOS Version/Date American Megatrends Inc. 080015, 10/30/2007

SMBIOS Version 2.5
Total Physical Memory 4,096.00 MB

Bryce Harrington (bryce) wrote :

Hi intuitivenipple,

Please attach the output of `lspci -vvnn`, and attach your /var/log/Xorg.0.log (and maybe Xorg.0.log.old) file from after reproducing this issue. If you've made any customizations to your /etc/X11/xorg.conf please attach that as well.

[This is an automated message. Apologies if it has reached you inappropriately; please just reply to this message indicating so.]

tags: added: needs-xorglog
tags: added: needs-lspci-vvnn
Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: New → Incomplete
Duane (duane-e164) wrote :

Bryce, this has already been reported against the other bugs, and on the nvidia forum, the only fix at present is to patch the kernel.

http://www.nvnews.net/vbulletin/showthread.php?t=113682

TJ (tj) wrote :

The nvidia package is attached to this report since it is one of the most visibly affected packages.

I am almost finished writing the new kernel PCI IOMEM allocation system which will solve these issues. It should land for either mainline .30 or .31, time permitting.

Ettore (shinder) wrote :

Sorry, what does that mean? Must we wait for kernel 2.6.30 or 2.6.31?

On Wed, 2009-03-25 at 08:21 +0000, Ettore wrote:
> Sorry, what does that mean? Must we wait for kernel 2.6.30 or 2.6.31?

Or later, yes.

Ettore (shinder) wrote :

OK, thanks. But for the moment, is there something I can do?
I tried to patch the kernel
http://www.nvnews.net/vbulletin/showthread.php?t=113682
but with no luck.
The problem is that I can't get a resolution higher than 800x600.

Duane (duane-e164) wrote :

I'm currently running Jaunty; the details on the nvidia site are for Hardy (give or take). The current version as of a few days ago is 2.6.28-11-generic, so in the details below replace that version with whatever kernel version you are running, which you can check with:

uname -r

If you want to patch your current kernel the steps to take are as follows:

sudo su -
cd /usr/src
apt-get build-dep linux-image-2.6.28-11-generic
apt-get source linux-image-2.6.28-11-generic
patch -p0 < NVRM_512M_fix.txt
cd linux-2.6.28
cp /boot/config-2.6.28-11-generic debian/config/amd64/config.generic
CONCURRENCY_LEVEL=2 AUTOBUILD=1 NOEXTRAS=1 fakeroot debian/rules binary-generic
cd ..
dpkg -i linux-image-2.6.28-11-generic_2.6.28-11.36_amd64.deb

I'm not claiming the above is the best way to do this, merely that it works for me. Up until recently I was running 3G, but I swapped out a 1G module for a 2G and now have 4G of memory in total. When I boot up I see the nvidia logo, and dmesg confirms the kernel module is being loaded on bootup.

Duane (duane-e164) wrote :

I forgot to mention: I edited the 512M patch by hand to change the directory/path from 2.6.24 to 2.6.28, and I changed it from starting at line 122 to line 126. The line change is optional, since the patch appears to apply fine without it, just with a line offset warning.

Ettore (shinder) wrote :

Thanks Duane, I will re-re-re-retry :-)

TJ (tj) on 2009-03-26
Changed in linux (Ubuntu):
status: Confirmed → In Progress
Changed in nvidia-graphics-drivers-180 (Ubuntu):
status: Incomplete → Invalid
Ettore (shinder) wrote :

The patch doesn't work for me. I tried 4 times. It works only when I take out 2GB of RAM. I'll wait for better times :-)

Duane (duane-e164) wrote :

I have no idea what kernel version you are using, but the patch has worked fine for me on all versions I tried it against. You do need a little technical knowledge about your hardware: do you have 256M, 512M, 1G or ....? I only have 1 system to test against, with 512M of video RAM.

Also, when building against 2.6.24 kernels I was able to use custom-binary-generic rather than binary-generic. This one change has forced me to keep reinstalling my own kernel build whenever I apt-get -u dist-upgrade, because the system will pull in and install the copy from the repositories.

Ettore (shinder) wrote :

Nothing doing, my friend. This morning I retried 2 more times but had no luck.
I use the 2.6.27.11 kernel, my system is 32-bit and I have 512M of video RAM.

TJ (tj) wrote :

Please use IRC or the forums for support discussions. The bug report is for information valuable to resolving the issue or describing unreported aspects and affected situations.

Duane (duane-e164) wrote :

The reason it's failing for you is that you need a 64-bit processor and a 64-bit operating system; a 32-bit OS/CPU can't address that amount of RAM.

Ettore (shinder) wrote :

But will the new kernel PCI IOMEM allocation system that TJ is writing to solve these issues also be for 32-bit OS/CPU?

TJ (tj) wrote :

On Mon, 2009-03-30 at 10:07 +0000, Ettore wrote:
> But will the new kernel PCI IOMEM allocation system that TJ is writing
> to solve these issues also be for 32-bit OS/CPU?

Yes.

Martin Erik Werner (arand) wrote :

I just tried pulling a 2GB module out, and the driver works.

I'm using 32bit with 4GB physical memory, on GeForce 8600M GT, PCI-E 16x, 512 MB, 169 MHz.

Is there a way to manually patch with this configuration, if so, how?

When can we expect the proper patch to arrive? Karmic? Loony?

Martin Erik Werner (arand) wrote :

For anyone having this trouble in a similar case to mine (see above), the solution was embarrassingly simple: install the 64-bit version instead.

Ettore (shinder) wrote :

With or without the kernel patch?

Martin Erik Werner (arand) wrote :

No kernel patch at all, just downloaded the latest daily amd64 instead of 32bit and woosh! It all works with 4GB installed.

Martin Erik Werner (arand) wrote :

For anyone not as lucky as me who needs to go through the patching, there seem to be fairly up-to-date instructions at: http://ubuntuforums.org/showthread.php?t=849395

For the record, I am using a Dell XPS M1530 laptop, so if you are as well, a switch to 64-bit might be the very simple solution.

Ettore (shinder) wrote :

Arand, do you have a 32-bit or a 64-bit CPU? I have a 32-bit CPU and the patch doesn't work. I tried to install 64-bit Ubuntu but it is the same thing.

I have this bug. I have now installed amd64 Ubuntu Jaunty; the first time everything worked, but after a reboot I have the same problem: "failed to load the nvidia kernel module".

Brinley Ang (brinley) wrote :

I encountered the exact same error as first described in this report when I bumped up my RAM from 2GB to 4GB.

I am on an Asus Pro31sc running 64bit Jaunty and a BIOS update fixed the problem for me.

Martin Erik Werner (arand) wrote :

This seems to be working on my particular laptop [XPS M1530] on Karmic. Previously on this machine (on Jaunty) I was able to solve it by using amd64 and getting the nvidia driver working there; now it seems to be working as it should on i386.

For those for whom using an amd64 kernel made no difference, is this still happening on 9.10?

Rinias (rinias) wrote :

What's the status of this bug fix? I had no problem with openSUSE 11.2 x86_64, but in testing oS 11.3 RC1, I found that the nVidia driver (265) would not compile because of a memory mapping issue. This does not occur in Slackware64 13.1. Slackware does do some remapping, however, to fix the issue. The following is from my dmesg :

pnp: PnP ACPI init
ACPI: bus type pnp registered
pnp 00:0c: disabling [mem 0x00000000-0x0009ffff] because it overlaps 0000:01:00.0 BAR 1 [mem 0x00000000-0x0fffffff 64bit pref]
pnp 00:0c: disabling [mem 0x000c0000-0x000cffff] because it overlaps 0000:01:00.0 BAR 1 [mem 0x00000000-0x0fffffff 64bit pref]
pnp 00:0c: disabling [mem 0x000e0000-0x000fffff] because it overlaps 0000:01:00.0 BAR 1 [mem 0x00000000-0x0fffffff 64bit pref]
pnp 00:0c: disabling [mem 0x00100000-0xbfffffff] because it overlaps 0000:01:00.0 BAR 1 [mem 0x00000000-0x0fffffff 64bit pref]
pnp: PnP ACPI: found 13 devices
ACPI: ACPI bus type pnp unregistered
system 00:01: [mem 0xfed14000-0xfed19fff] has been reserved
system 00:08: [io 0x04d0-0x04d1] has been reserved
system 00:08: [io 0x0800-0x087f] has been reserved
system 00:08: [io 0x0400-0x041f] has been reserved
system 00:08: [io 0x0500-0x053f] has been reserved
system 00:08: [mem 0xfed1c000-0xfed1ffff] has been reserved
system 00:08: [mem 0xfed20000-0xfed3ffff] has been reserved
system 00:08: [mem 0xfed45000-0xfed89fff] has been reserved
system 00:08: [mem 0xffb00000-0xffbfffff] has been reserved
system 00:08: [mem 0xfff00000-0xffffffff] has been reserved
system 00:0a: [io 0x0250-0x0253] has been reserved
system 00:0a: [io 0x0256-0x025f] has been reserved
system 00:0a: [mem 0xfec00000-0xfec00fff] could not be reserved
system 00:0a: [mem 0xfee00000-0xfee00fff] has been reserved
system 00:0b: [mem 0xe0000000-0xefffffff] has been reserved

I'll attach the dmesg as well.

Changed in linux (Ubuntu):
assignee: TJ (intuitivenipple) → nobody
status: In Progress → Triaged
Changed in linux:
importance: Unknown → High
Alfred Homan (alfredhoman) wrote :

Take a look at https://bugs.launchpad.net/system76/+bug/661248 - is it connected somehow?
I have the same "probe of 0000:02:00.0 failed with error -1" with 1x2GB of RAM and a 9800 GTX 512.

TJ, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please test for this with the latest development release of Ubuntu? ISO images are available from http://cdimage.ubuntu.com/daily-live/current/ .

If it remains an issue, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

Also, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

Changed in linux (Ubuntu):
status: Triaged → Incomplete
TJ (tj) wrote :

I did some work upstream but it became too complicated. Others have also tried but nothing has changed. In the meantime, the hardware that has a 32-bit limit imposed by its southbridge on 64-bit CPUs is all now old legacy gear, so I don't foresee the kernel fixing this, ever.

It remains a problem for hardware still in use where the user wants to install and use the maximum RAM possible but the 32-bit limitation prevents the PCI IOMEM addresses being remapped above the 4GB boundary.

Changed in linux (Ubuntu):
status: Incomplete → Triaged

TJ, would it be possible to perform an apport-collect as requested in https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-180/+bug/342926/comments/29 so folks would have example hardware and logs to reference regarding this issue?

tags: added: needs-kernel-logs
removed: needs-lspci-vvnn needs-xorglog
TJ (tj) wrote :

No point. All work - if any - will be upstream not in Ubuntu and the information required is already available in the initial comment I created for this bug report.

tags: removed: needs-kernel-logs
tags: added: needs-kernel-logs
TJ (tj) wrote :

Christopher, please leave the bug status as it is now. I was the original reporter *and* member of the Ubuntu Kernel team and I created this bug specifically to track the issue against the mainline kernel. There is no work to be done nor data collection required against Ubuntu, as I have previously described.

Any updates should be posted to the linked mainline Linux bug-tracker.

tags: removed: needs-kernel-logs