Regression: no image when starting lucid beta 2 with Nvidia GeForce 6200 NV44A ("Error referencing VRAM ctxdma: -12"). Worked in alpha 3 and in karmic

Bug #564617 reported by Mossroy
This bug affects 1 person
Affects | Status | Importance | Assigned to
Nouveau Xorg driver | Fix Released | Critical |
Fedora | New | Undecided | Unassigned
xserver-xorg-video-nouveau (Ubuntu) | Fix Released | Undecided | Unassigned

Bug Description

Binary package hint: xserver-xorg-video-nouveau

I can't start the lucid beta 2 liveCD on a computer with an Nvidia GeForce 6200 (NV44A rev A1): the screen stays blank after booting.
It works properly with a lucid alpha 3 liveCD and with a karmic liveCD.

I managed to install beta 2 on this computer using the alternate CD: the behavior is the same when booting from the hard drive.
I installed all the updates (as of 16 April 2010): same behavior.

I see exactly the same behavior with Fedora live: Fedora 12 live boots correctly (but relies on the "nv" driver), while Fedora 13 beta live gives the same blank screen (and also uses the "nouveau" driver).

I think this is a regression in the "nouveau" graphics driver: the version bundled with lucid alpha 3 works, but the one in lucid beta 2 (or Fedora 13 beta) does not.
Here is an excerpt of the dmesg on beta 2:
  nouveau 0000:03:00.0: RAMHT space exhausted. ch=0
  nouveau 0000:03:00.0: Error referencing VRAM ctxdma: -12
  nouveau 0000:03:00.0: gpuobj -12

This error message does not appear when using lucid alpha 3.
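
In the excerpt above, the trailing `-12` appears to be a negated Linux errno value. Errno 12 is `ENOMEM` ("out of memory"), which would be consistent with the "RAMHT space exhausted" line just before it. The sketch below is a hypothetical helper (not part of any nouveau or Ubuntu tool) that picks nouveau lines out of dmesg text and decodes such trailing codes with Python's `errno` module; the line format it expects is an assumption based only on the three lines quoted here.

```python
import errno
import re

# Assumed format, taken from the excerpt: "nouveau <PCI address>: <message>"
NOUVEAU_LINE = re.compile(
    r"nouveau (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.*)"
)
# A message ending in "-<number>" is treated as carrying a negated errno value
TRAILING_ERRNO = re.compile(r"-(\d+)\s*$")


def nouveau_errors(dmesg_text):
    """Return (pci_address, message, errno_name_or_None) for each nouveau line."""
    results = []
    for line in dmesg_text.splitlines():
        match = NOUVEAU_LINE.search(line)
        if not match:
            continue
        msg = match.group("msg")
        code = TRAILING_ERRNO.search(msg)
        name = errno.errorcode.get(int(code.group(1))) if code else None
        results.append((match.group("pci"), msg, name))
    return results


sample = """\
nouveau 0000:03:00.0: RAMHT space exhausted. ch=0
nouveau 0000:03:00.0: Error referencing VRAM ctxdma: -12
nouveau 0000:03:00.0: gpuobj -12
"""

for pci, msg, name in nouveau_errors(sample):
    print(pci, name, msg)
```

On this excerpt it reports `ENOMEM` for the last two lines and no code for the "RAMHT space exhausted" line.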

Mossroy (mossroy) wrote :

Maybe this is related: the following lines appear in dmesg just before these error lines:
 agpgart-nvidia 0000:00:00.0: AGP 3.0 bridge
 agpgart: modprobe tried to set rate=x12. Setting to AGP3 x8 mode.
 agpgart-nvidia 0000:00:00.0: putting AGP V3 device into 8x mode
 nouveau 0000:03:00.0: putting AGP V3 device into 8x mode

They appear only in beta 2, not in alpha 3.

My video card is indeed plugged into an AGP slot that supports 8x mode. The motherboard is an Asus A7N8X-E Deluxe, and AGP 8x mode is enabled in the BIOS.

Mossroy (mossroy) wrote :

Occasionally the screen doesn't stay blank: I can see the ubuntu logo, and then a strange screen of grey lines appears (see attached photo).
This happened only twice, without my changing anything in the configuration, boot options or BIOS.

Also, the first time I started this computer this morning, it reached the ubuntu desktop, but froze just after (no ctrl-alt-F1 available or anything).

I want to stress that I don't see any of these behaviors on karmic, lucid alpha 3 or Fedora 12, so I don't think they are caused by damaged hardware.

Mossroy (mossroy) wrote :

When the screen stays blank, ubuntu is not completely frozen: if I press the power button (without holding it), the hard disk activity light flashes for a few seconds and then the machine powers down, which looks like a proper shutdown.

I found a temporary workaround by adding boot parameters in grub:
if I add "nouveau.modeset=0 nouveau.noaccel=1 blacklist=vga16fb", the desktop appears properly and ubuntu seems to work fine.

Mossroy (mossroy) wrote :

Adding "nouveau.noaccel=1" in the grub parameters is sufficient to work around the problem
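
To confirm that a boot parameter such as `nouveau.noaccel=1` actually reached the kernel, one can read `/proc/cmdline` on the running system. Below is a minimal sketch with hypothetical helper names, assuming simple space-separated `key=value` tokens (it ignores the kernel's quoting rules for values that contain spaces).

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to True."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else True
    return params


def noaccel_enabled(cmdline):
    """True if the command line contains nouveau.noaccel=1."""
    return parse_cmdline(cmdline).get("nouveau.noaccel") == "1"


# The parameters from the workaround above, on a made-up command line
example = "ro quiet splash nouveau.modeset=0 nouveau.noaccel=1 blacklist=vga16fb"
print(noaccel_enabled(example))
```

On a real machine, pass the contents of `/proc/cmdline` instead of the example string.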

In , Mossroy-mossroy (mossroy-mossroy) wrote :

Created an attachment (id=35110)
Dmesg on ubuntu lucid beta 2 installed (does not work)

I can't start the ubuntu lucid beta 2 liveCD on a computer with an Nvidia GeForce 6200 (NV44A rev A1): the screen stays blank after booting.
It works properly with an ubuntu lucid alpha 3 liveCD (which probably uses an earlier version of nouveau) and with an ubuntu karmic liveCD (which uses the nv driver).

I managed to install beta 2 on this computer using the alternate CD: the behavior is the same when booting from the hard drive.
I installed all the updates (as of 16 April 2010): same behavior.

I see exactly the same behavior with Fedora live: Fedora 12 live boots correctly (but relies on the "nv" driver), while Fedora 13 beta live gives the same blank screen (and also uses the "nouveau" driver).

I think this is a regression in the "nouveau" graphics driver: the version bundled with lucid alpha 3 works, but the one in lucid beta 2 (or Fedora 13 beta) does not.
Here is an excerpt of the dmesg on beta 2:
  nouveau 0000:03:00.0: RAMHT space exhausted. ch=0
  nouveau 0000:03:00.0: Error referencing VRAM ctxdma: -12
  nouveau 0000:03:00.0: gpuobj -12

This error message does not appear when using lucid alpha 3.

I found that adding "nouveau.noaccel=1" as a boot parameter is a workaround, both on ubuntu lucid beta 2 and on fedora 13 beta

My motherboard is an Asus A7N8X-E Deluxe, with a GeForce 6200 video card (MSI-branded) in the AGP slot.

You will find attached the dmesg of :
- ubuntu lucid beta 2 installed (does not work)
- ubuntu lucid alpha 3 liveCD (works)
- ubuntu lucid beta 2 installed, with nouveau.noaccel=1 (works)

lspci gives:
03:00.0 VGA compatible controller: nVidia Corporation NV44A [Geforce 6200] (rev a1)

I first opened this bug on ubuntu launchpad : https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/564617

In , Mossroy-mossroy (mossroy-mossroy) wrote :

Created an attachment (id=35111)
Dmesg on ubuntu lucid alpha 3 live CD (works)

In , Mossroy-mossroy (mossroy-mossroy) wrote :

Created an attachment (id=35112)
Dmesg on ubuntu lucid beta 2 installed, with nouveau.noaccel=1 (works)

In , Mossroy-mossroy (mossroy-mossroy) wrote :

Here are the versions of the relevant packages used by ubuntu (I don't know which ones matter in this case):

On beta 2 (with all current updates):
xorg 1:7.5+5ubuntu1
libdrmnouveau1 2.4.18-1ubuntu3
xserver-xorg-video-nouveau 1:0.0.15+git20100219+9b4118d-0ubuntu5
kernel 2.6.32-21-generic (on x86)

On alpha 3 liveCD:
xorg 1:7.5+1ubuntu8
libdrmnouveau1 2.4.18-1ubuntu2
xserver-xorg-video-nouveau 1:0.0.15+git20100219+9b4118d-0ubuntu2
kernel 2.6.32-14-generic (on x86)

Mossroy (mossroy) wrote :

Adding "nouveau.noaccel=1" to the grub parameters of the Fedora 13 beta liveCD makes it work too.

In , Xavier (chantry-xavier) wrote :

OK, first, I noticed these strange differences:
dmesg-beta2:[ 6.686828] Linux agpgart interface v0.103
dmesg-beta2:[ 6.709693] agpgart: Detected NVIDIA nForce2 chipset
dmesg-beta2:[ 6.725012] agpgart-nvidia 0000:00:00.0: AGP aperture is 64M @ 0xe0000000
dmesg-beta2:[ 8.814073] agpgart-nvidia 0000:00:00.0: AGP 3.0 bridge
dmesg-beta2:[ 8.814090] agpgart: modprobe tried to set rate=x12. Setting to AGP3 x8 mode.
dmesg-beta2:[ 8.814097] agpgart-nvidia 0000:00:00.0: putting AGP V3 device into 8x mode
dmesg-alpha3:[ 4.397083] Linux agpgart interface v0.103
dmesg-alpha3:[ 5.005571] agpgart: Detected NVIDIA nForce2 chipset
dmesg-alpha3:[ 5.012047] agpgart-nvidia 0000:00:00.0: AGP aperture is 64M @ 0xe0000000

It might be worth understanding why that changed before looking at the rest, unless someone tells you otherwise :)
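
The comparison above boils down to: strip the timestamps, then list the messages that appear in one log but not in the other. A minimal sketch of that idea, assuming plain dmesg text with `[seconds.micro]` prefixes; the helper names and the `agpgart` keyword filter are illustrative choices, not an existing tool.

```python
import re

# dmesg prefixes each line with a "[   6.686828] " style timestamp
TIMESTAMP = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def messages(log_text, keyword):
    """Set of timestamp-free messages that contain keyword."""
    found = set()
    for line in log_text.splitlines():
        msg = TIMESTAMP.sub("", line.strip())
        if keyword in msg:
            found.add(msg)
    return found


def only_in_first(first_log, second_log, keyword="agpgart"):
    """Messages (containing keyword) present in first_log but not in second_log."""
    return sorted(messages(first_log, keyword) - messages(second_log, keyword))


beta2 = """\
[    6.686828] Linux agpgart interface v0.103
[    6.709693] agpgart: Detected NVIDIA nForce2 chipset
[    6.725012] agpgart-nvidia 0000:00:00.0: AGP aperture is 64M @ 0xe0000000
[    8.814073] agpgart-nvidia 0000:00:00.0: AGP 3.0 bridge
[    8.814090] agpgart: modprobe tried to set rate=x12. Setting to AGP3 x8 mode.
[    8.814097] agpgart-nvidia 0000:00:00.0: putting AGP V3 device into 8x mode
"""
alpha3 = """\
[    4.397083] Linux agpgart interface v0.103
[    5.005571] agpgart: Detected NVIDIA nForce2 chipset
[    5.012047] agpgart-nvidia 0000:00:00.0: AGP aperture is 64M @ 0xe0000000
"""

for msg in only_in_first(beta2, alpha3):
    print(msg)
```

With these two excerpts it prints the three AGP 3.0 / 8x lines that only the beta 2 log contains.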

(In reply to comment #3)
> Here are the different versions of libraries used by ubuntu (I don't know which
> ones are relevant in this case) :
>
> On beta 2 (with all current updates) :
> xorg 1:7.5+5ubuntu1
> libdrmnouveau1 2.4.18-1ubuntu3
> xserver-xorg-video-nouveau 1:0.0.15+git20100219+9b4118d-0ubuntu5
> kernel 2.6.32-21-generic (on x86)
>
> On alpha 3 liveCD :
> xorg 1:7.5+1ubuntu8
> libdrmnouveau1 2.4.18-1ubuntu2
> xserver-xorg-video-nouveau 1:0.0.15+git20100219+9b4118d-0ubuntu2
> kernel 2.6.32-14-generic (on x86)

I suppose you can exclude xorg and the xorg nouveau driver, since you apparently get that problem right when nouveau initializes, before X is involved.

That leaves libdrm nouveau and the kernel. The problem is that these are ubuntu packages, and for people outside ubuntu (like all nouveau developers, and many users, like me), it's not easy to know exactly what code they contain.

If you could point out the equivalent upstream code, it would help.
For libdrm : http://cgit.freedesktop.org/mesa/drm/
For drm/ttm : http://git.kernel.org/?p=linux/kernel/git/airlied/drm-2.6.git;a=shortlog;h=drm-next
For nouveau drm : http://cgit.freedesktop.org/nouveau/linux-2.6/

More importantly, how do we know that the latest upstream code would not work for you?

In , Marcin Slusarz (marcin-slusarz) wrote :

Please try the video=vga16fb:off kernel parameter and attach the kernel log.

In , Mossroy-mossroy (mossroy-mossroy) wrote :

Thanks Xavier and Marcin for your quick answers.

Xavier, I had noticed the extra lines regarding AGP x8 (see https://bugs.launchpad.net/nouveau/+bug/564617/comments/3).
What I am sure of is that I did not change anything inside the PC or in the BIOS.

Is there an easy way to test with an upstream version of nouveau? I might run a liveCD, or even install another OS if it's really necessary.

Marcin,
I tried the "video=vga16fb:off" parameter: I can see the ubuntu logo, but then I only see grey lines on the screen (see attached photo).
Then the PC seems frozen: the only option is a hard power-off.
You'll find the dmesg and the corresponding kern.log attached.

In , Mossroy-mossroy (mossroy-mossroy) wrote :

Created an attachment (id=35118)
Photo of grey lines that appear with option "video=vga16fb:off"

In , Mossroy-mossroy (mossroy-mossroy) wrote :

Created an attachment (id=35120)
Dmesg on ubuntu lucid beta 2 installed, with video=vga16fb:off

In , Mossroy-mossroy (mossroy-mossroy) wrote :

Created an attachment (id=35123)
kern.log on ubuntu lucid beta 2 installed, with video=vga16fb:off

In , Mossroy-mossroy (mossroy-mossroy) wrote :

Maybe I should add that my screen is connected to the DVI output, not the VGA one.
AGP 8x is enabled in the BIOS.

In , Mossroy-mossroy (mossroy-mossroy) wrote :

Regarding the version number of libdrmnouveau1: ubuntu normally uses a prefix for the upstream version and a suffix for the patches it applies on top of it.
So I suppose it is based on version 2.4.18 ( http://cgit.freedesktop.org/mesa/drm/tag/?id=2.4.18 ).
The changelog of the ubuntu patches is here: http://changelogs.ubuntu.com/changelogs/pool/main/libd/libdrm/libdrm_2.4.18-1ubuntu3/changelog
The difference between versions "ubuntu2" and "ubuntu3" seems to be the fix for this bug: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/547124 , which apparently did not affect me, and which corresponds to the following commit: http://cgit.freedesktop.org/mesa/drm/commit/?id=df32c307e8f81b46ee8aa4dd7222fc18f175bbb3
Not sure if it's really relevant in our case.
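
The prefix/suffix convention described above is Debian's `[epoch:]upstream_version[-debian_revision]` format: the epoch is everything before the first colon, and the Debian/Ubuntu revision is everything after the last hyphen. A minimal sketch of that split (the `split_version` name is hypothetical; dpkg itself remains the authority on these rules):

```python
from typing import NamedTuple


class DebVersion(NamedTuple):
    epoch: int
    upstream: str
    revision: str  # empty for native packages (no hyphen)


def split_version(version):
    """Split '[epoch:]upstream[-revision]' into its three parts."""
    epoch, sep, rest = version.partition(":")
    if not sep:  # no colon means epoch 0
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:  # no hyphen means a native package without a revision
        upstream, revision = rest, ""
    return DebVersion(int(epoch), upstream, revision)


for v in ("2.4.18-1ubuntu3", "1:0.0.15+git20100219+9b4118d-0ubuntu5", "1:7.5+5ubuntu1"):
    print(v, "->", split_version(v))
```

For libdrm this gives upstream `2.4.18` and Ubuntu revision `1ubuntu3`. Note that `1:7.5+5ubuntu1` (xorg) has no hyphen, so under these rules the whole `7.5+5ubuntu1` counts as the upstream part.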

Maybe ubuntu alpha 3 had the option "nouveau.noaccel=1" set by default, and beta 2 does not? That would explain the difference in behavior even though the versions are very close.

In , Mossroy-mossroy (mossroy-mossroy) wrote :

Regarding the kernel used by ubuntu, the changelog is here: http://changelogs.ubuntu.com/changelogs/pool/main/l/linux/linux_2.6.32-21.32/changelog

There have been a few changes that consist of disabling acceleration for specific cards. See for example https://bugs.launchpad.net/ubuntu/+source/linux/+bug/544088
Well, disabling acceleration is a workaround in my opinion, not a real fix...

Mossroy (mossroy)
description: updated
Mossroy (mossroy) wrote :

I opened a bug on the nouveau bugtracker : https://bugs.freedesktop.org/show_bug.cgi?id=27706

In , Mossroy-mossroy (mossroy-mossroy) wrote :

In any case, I don't think this could be due to ubuntu patches, because the behavior is the same with Fedora 13 beta.

The versions used by Fedora seem a bit more recent:
kernel 2.6.33.1-24-fc13.i686
xorg-x11-drv-nouveau 1:0.0.16-2.20100218git2964702.fc13
libdrm 2.4.19-1.fc13

In , Xavier (chantry-xavier) wrote :

(In reply to comment #13)
> In any case, I don't think this could be due to ubuntu patches because the
> behavior is the same with fedora 13 beta
>
> The versions used by fedora seem a bit more recent :
> kernel 2.6.33.1-24-fc13.i686
> xorg-x11-drv-nouveau 1:0.0.16-2.20100218git2964702.fc13
> libdrm 2.4.19-1.fc13

Fedora has the advantage that it's maintained and supported by a nouveau developer (actually the main one working on the kernel side). But you should also use the distribution's bug tracker in that case.

Going back to ubuntu, it's not just a matter of which custom patches they have applied (though that's very important to know too), but also of what changed between the working and broken versions.
You said the main new changes were quirks to disable accel? How could that be the problem, since the new version only works when accel is disabled, and you have to do that manually?
Anyway, it seems the noaccel quirks do not affect your card, and they should cause a message to be displayed when they are used.
http://people.canonical.com/~apw/raof-nv-accel-lucid/

Building the latest code manually has several advantages:
- check if it has already been fixed there
- easy way for everyone to see what code you tried
- you can update whenever you want, and quickly apply patches if needed
- if it's indeed a regression, you can git bisect it to find the offending commit

Instructions are here:
http://nouveau.freedesktop.org/wiki/InstallNouveau
http://nouveau.freedesktop.org/wiki/InstallDRM

PS: a lot of what I said here is not specific to this bug; it's more a general concern about how nouveau bugs should be handled.

In , Chris Halse Rogers (raof) wrote :

So, the pertinent changes between Lucid Alpha 3 and Beta 2 would be:
Alpha 3 had the nouveau kernel module + drm + ttm from linux 2.6.33 + the ctxprog voodoo generator backported, in the out-of-tree linux-backports-modules.
Beta 2 has the drm stack from 2.6.33.2 + a number of fixes pulled in from nouveau/linux-2.6

If you're interested in bisecting, it should be quicker to start with 2.6.33.2: if that works, then the problem is in one of the small number of backported nouveau patches we've got. If it doesn't, it should be relatively quick to bisect from 2.6.33 to 2.6.33.2.
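
`git bisect` automates the search described here: given one known-good and one known-bad point, it repeatedly tests the midpoint, so roughly log2(N) test boots are needed for N candidate commits. A minimal sketch of that binary search, assuming a single regression (once broken, every later commit stays broken); `is_good` stands in for "build, boot and check for a picture", and the commit history is hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_good: Callable[[T], bool]) -> T:
    """Return the first commit for which is_good() is False.

    Assumes commits[0] is known good, commits[-1] is known bad, and that
    the breakage, once introduced, is present in every later commit.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            lo = mid
        else:
            hi = mid
    return commits[hi]


# Hypothetical history of 100 commits where commit 37 introduced the regression
tested = []


def boots_fine(commit):
    tested.append(commit)
    return commit < 37


print(first_bad(list(range(100)), boots_fine), "after", len(tested), "test boots")
```

Here it finds commit 37 after 6 test boots instead of up to 99.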

If you're uncomfortable building this stack from source, the xorg-edgers PPA has recent snapshots, available here: https://edge.launchpad.net/~xorg-edgers/+archive/ppa

If that works, then this bug can be closed because it's already fixed. If that doesn't work then you'll probably need to do some building from source in order to identify which change broke it.

In , Marcin Slusarz (marcin-slusarz) wrote :

Before you start bisecting:
video=vga16fb:off does not work because in Ubuntu vga16fb is compiled as a module, and kernel parameters do not affect it; you have to blacklist it by hand.
The kernel log contains "vga16fb: not registering due to another framebuffer present" (which is an Ubuntu addition), so it should not load, BUT it exercises a broken failure path in vga16fb that iounmaps a memory region which was not ioremapped by vga16fb (VGA_MAP_MEM is defined differently on different architectures; on x86 it's NOT ioremap), so it might affect nouveau. I need to investigate it more.

In reply to comment 13:
IIRC Fedora does not enable vga16fb, so either:
- it might be a different bug, or
- this bug is not related to vga16fb at all.

In , Mossroy-mossroy (mossroy-mossroy) wrote :

Christopher, you were right.
After upgrading to the PPA xorg-edgers packages, the problem seems to be solved : I can boot successfully without changing any boot parameter.

So, it looks like the problem has already been solved in nouveau.
For further reference, the PPA version number of libdrm is 2.4.20+git20100404.c7650003-0ubuntu0sarvatt3 (instead of 2.4.18-1ubuntu3)
xserver-xorg-video-nouveau version is 1:0.0.15+git20100416.40636169-0ubuntu0sarvatt instead of 1:0.0.15+git20100219+9b4118d-0ubuntu5

The fix must be between these 2 versions.

Sorry for the inconvenience, and thank you to everyone who looked at this issue.

In , Mossroy-mossroy (mossroy-mossroy) wrote :

Created an attachment (id=35135)
Dmesg on ubuntu lucid beta 2 installed, with updated PPA packages (works)

Chris Halse Rogers (raof) wrote :

Generally, bugs in Ubuntu should be filed in Ubuntu (which you have done, thanks!) and then forwarded only once (a) we're confident it's a real upstream bug and (b) we have a good set of debugging information - some of which is going to be Ubuntu-specific, like what changed between alpha 3 and beta 2.

I'll comment further on the upstream bug.

Bryce Harrington (bryce)
tags: added: lucid
tags: added: karmic
Mossroy (mossroy) wrote :

Christopher was right: I opened the upstream bug too quickly.
I saw the same behavior on Fedora 13 beta and thought these 2 distributions used very recent versions of nouveau.

I was wrong: the problem seems to be already solved upstream.
As suggested by Christopher, I enabled the xorg-edgers PPA https://edge.launchpad.net/~xorg-edgers/+archive/ppa and upgraded.
Now my lucid beta 2 installation starts without any additional boot parameter.

The PPA version number of libdrm is 2.4.20+git20100404.c7650003-0ubuntu0sarvatt3 (instead of 2.4.18-1ubuntu3).
The xserver-xorg-video-nouveau version is 1:0.0.15+git20100416.40636169-0ubuntu0sarvatt (instead of 1:0.0.15+git20100219+9b4118d-0ubuntu5).

The versions bundled with Fedora 13 beta are:
xorg-x11-drv-nouveau 1:0.0.16-2.20100218git2964702.fc13
libdrm 2.4.19-1.fc13

If the problem with Fedora is the same one (some people told me it could be a different one), that would probably give a hint about where the fix is: between versions 2.4.19-1.fc13 and 2.4.20+git20100404.c7650003-0ubuntu0sarvatt3. I know those versions come from different distros, so this is not entirely accurate. But if we assume the fix comes from upstream (and not from a distro-specific patch), it should be between versions 2.4.19-1 and 2.4.20+git20100404.c7650003.
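
Ordering such version strings by eye is error-prone. Debian and Ubuntu order them with dpkg's rules (`dpkg --compare-versions` on the command line remains the reference). The sketch below is a Python re-implementation of those rules for illustration only, with hypothetical function names: digit runs compare numerically, letters sort before other symbols, and `~` sorts before everything, even the end of the string. Fedora's RPM uses a related but different algorithm, so only the upstream parts of the Fedora versions are meaningful in this comparison.

```python
import string


def _order(c):
    """Sort weight of a single character in a non-digit run (dpkg rules)."""
    if c == "" or c in string.digits:
        return 0  # end of string, or start of a number
    if c in string.ascii_letters:
        return ord(c)
    if c == "~":
        return -1  # "~" sorts before anything, even the end of the string
    return ord(c) + 256  # other symbols sort after all letters


def _compare_part(a, b):
    """Compare two upstream (or revision) strings: negative, zero or positive."""
    i = j = 0
    while i < len(a) or j < len(b):
        # Non-digit run: compare character weights one by one.
        while (i < len(a) and a[i] not in string.digits) or (
            j < len(b) and b[j] not in string.digits
        ):
            ac = _order(a[i] if i < len(a) else "")
            bc = _order(b[j] if j < len(b) else "")
            if ac != bc:
                return ac - bc
            i += 1
            j += 1
        # Digit run: compare numerically, ignoring leading zeros.
        while i < len(a) and a[i] == "0":
            i += 1
        while j < len(b) and b[j] == "0":
            j += 1
        first_diff = 0
        while i < len(a) and j < len(b) and a[i] in string.digits and b[j] in string.digits:
            if not first_diff:
                first_diff = ord(a[i]) - ord(b[j])
            i += 1
            j += 1
        if i < len(a) and a[i] in string.digits:
            return 1  # a's number has more digits, hence is larger
        if j < len(b) and b[j] in string.digits:
            return -1
        if first_diff:
            return first_diff
    return 0


def _split(version):
    """Split '[epoch:]upstream[-revision]' into (epoch, upstream, revision)."""
    epoch, sep, rest = version.partition(":")
    if not sep:
        epoch, rest = "0", version
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    """Return -1, 0 or 1 as Debian version a is older, equal or newer than b."""
    epoch_a, up_a, rev_a = _split(a)
    epoch_b, up_b, rev_b = _split(b)
    if epoch_a != epoch_b:
        return -1 if epoch_a < epoch_b else 1
    result = _compare_part(up_a, up_b) or _compare_part(rev_a, rev_b)
    return (result > 0) - (result < 0)


print(compare_versions("2.4.19-1", "2.4.20+git20100404.c7650003-0ubuntu0sarvatt3"))
```

Under these rules `2.4.19-1` sorts before the PPA build, and both PPA packages sort after the corresponding lucid beta 2 versions.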

Would it be possible to upgrade libdrm to 2.4.20 before the final release of lucid?
Or else to backport the fix (if we manage to find it; I might try to help, but I'm no expert)?

Mossroy (mossroy) wrote :

At the very least, if that's not possible or too complicated/risky before the final release, the video card should be blacklisted for acceleration, as has been done for other cards: see https://bugs.launchpad.net/ubuntu/+source/linux/+bug/544088
It seems to be called the "noaccel quirk".

I think it's still better to have an unaccelerated video than no video at all...

Mossroy (mossroy) wrote :

The problem is still the same with the RC of lucid.

- on the first boot of the liveCD, I got the ubuntu logo, and then the grey lines symptom (see the photo of comment #4)
- on the second boot of the liveCD, I added "nouveau.noaccel=1" to the boot parameters, which allowed me to boot successfully
- on the third and following boots of this same liveCD, I don't get the ubuntu logo at all (the screen stays blank)

papukaija (papukaija)
tags: added: regression-release
Bryce Harrington (bryce)
Changed in xserver-xorg-video-nouveau (Ubuntu):
status: New → Confirmed
Changed in nouveau:
importance: Unknown → Critical
status: Unknown → Fix Released
Changed in nouveau:
importance: Critical → Unknown
Changed in nouveau:
importance: Unknown → Critical
Timo Aaltonen (tjaalton) wrote :

Fixed in maverick and newer.

Changed in xserver-xorg-video-nouveau (Ubuntu):
status: Confirmed → Fix Released