X server aborts if two primary devices found

Bug #459512 reported by Bryce Harrington on 2009-10-24
This bug affects 1 person
Affects                Importance  Assigned to
xorg-server (Ubuntu)   High        Bryce Harrington
Karmic                 High        Bryce Harrington

Bug Description

[Problem]

If multiple video devices are present, X checks which one is marked primary in the BIOS. If neither is so marked, or both are marked primary, X cannot choose between them and aborts with an error such as this in Xorg.0.log:

(!!) More than one possible primary device found
(--) PCI: (0:1:0:0) 1002:9400:1002:2552 ATI Technologies Inc R600 [Radeon HD 2900 Series] rev 0, Mem @ 0xc0000000/268435456, 0xfcce0000/65536, I/O @ 0x0000a000/256, BIOS @ 0x????????/131072
(--) PCI: (0:2:0:0) 1002:9400:1002:2552 ATI Technologies Inc R600 [Radeon HD 2900 Series] rev 0, Mem @ 0xd0000000/268435456, 0xfcde0000/65536, I/O @ 0x0000b000/256, BIOS @ 0x????????/131072
Primary device is not PCI
...

(II) VESA: driver for VESA chipsets: vesa
(II) FBDEV: driver for framebuffer: fbdev
(II) Primary Device is:
(WW) Falling back to old probe method for vesa
(WW) Falling back to old probe method for fbdev
...
(EE) open /dev/fb0: No such file or directory
(EE) No devices detected.

Fatal server error:
no screens found
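
For anyone triaging similar reports, here is a minimal sketch of how a log like the one above could be scanned for this failure. It is illustrative only: the function name scan_xorg_log and the returned structure are made up for this example, and it assumes the four numbers in parentheses on each PCI line are domain:bus:device:function, as the lspci output for this machine suggests.

```python
import re

# Matches PCI device lines such as:
#   (--) PCI: (0:1:0:0) 1002:9400:1002:2552 ATI Technologies Inc ...
# Assumption: the parenthesised numbers are domain:bus:device:function.
PCI_LINE = re.compile(r"^\(--\) PCI:\s*\*?\((\d+):(\d+):(\d+):(\d+)\)\s+(\S+)")

AMBIGUOUS_MARKER = "More than one possible primary device found"


def scan_xorg_log(text):
    """Return (ambiguous, devices) for the contents of an Xorg.0.log.

    ambiguous is True if the log contains the "more than one possible
    primary device" warning; devices lists every PCI video device line
    seen, in log order.
    """
    ambiguous = False
    devices = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if AMBIGUOUS_MARKER in line:
            ambiguous = True
        match = PCI_LINE.match(line)
        if match:
            _domain, bus, dev, func = (int(x) for x in match.groups()[:4])
            devices.append(
                {"bus_id": f"PCI:{bus}:{dev}:{func}", "ids": match.group(5)}
            )
    return ambiguous, devices
```

If the warning is present and two or more devices are listed, the report is likely an instance of this bug.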

[Discussion]
For some additional background on this class of issues, bug 267241 is worth reviewing. During Jaunty development we found that ALL dual-card setups were broken, including ones with primary/secondary states properly set.

As a temporary fix, I made a patch that simply prevented it from aborting in this case, by just selecting the card with the highest BusID. The problem with the patch is that it could cause X to boot with the secondary card even if the primary was properly marked. But at least it prevented X from failing to start up.

However, a patch from Fedora turned up in time for inclusion in Jaunty that gave a mostly better solution: it adds code to query the BIOS, detect the primary card, and use it. We felt this was a more elegant solution, so this was the patch we shipped in Jaunty. Debian also picked up this patch.

This patch has one flaw, however - it still exits in the specific condition where both cards are PCI (as opposed to one PCI / one VGA) and neither card is marked primary. While this sounds unusual, it does appear in the wild; see the attached Xorg.0.log and lspci for an example.

[Proposal]
The solution I used originally, forcing X to pick the device with the highest bus ID, would work on top of the Fedora patch. In fact, using the two together gives us a best-of-both-worlds solution, since each covers the other's flaw.

There is still one (potential) issue: it could select the wrong device as primary if neither device is marked primary and the expected primary has the lower bus ID. But in this case the bug will be much less severe; instead of not starting at all, X will come up with the monitors reversed from what the user expects.
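
The combined behaviour proposed here can be sketched as follows. This is not the X server code (which is C) and not either of the actual patches; VideoDevice, pick_primary, and the fallback parameter are hypothetical names used only to illustrate the policy: honour a single BIOS primary flag when there is one, and otherwise fall back to bus ID order instead of aborting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoDevice:
    """Hypothetical stand-in for a PCI video device entry."""
    bus: int
    dev: int
    func: int
    bios_primary: bool = False


def pick_primary(devices, fallback="highest"):
    """Choose a primary device without ever giving up.

    If exactly one device is flagged primary, use it. If none or several
    are flagged, pick by bus ID among the candidates according to
    fallback ("highest" or "lowest").
    """
    if fallback not in ("highest", "lowest"):
        raise ValueError("fallback must be 'highest' or 'lowest'")
    if not devices:
        return None

    flagged = [d for d in devices if d.bios_primary]
    if len(flagged) == 1:
        return flagged[0]

    # Ambiguous case: zero flagged (consider all) or several flagged
    # (consider only those), then break the tie by bus ID.
    candidates = flagged or list(devices)

    def bus_id(d):
        return (d.bus, d.dev, d.func)

    if fallback == "highest":
        return max(candidates, key=bus_id)
    return min(candidates, key=bus_id)
```

With the two identical cards from this report (bus 1 and bus 2, neither flagged), fallback="highest" selects the bus 2 card and fallback="lowest" selects the bus 1 card; a correctly flagged primary wins in either mode.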

Bryce Harrington (bryce) wrote :
Changed in xorg-server (Ubuntu):
assignee: nobody → Bryce Harrington (bryceharrington)
importance: Undecided → High
milestone: none → ubuntu-9.10
status: New → In Progress
Bryce Harrington (bryce) wrote :

From the lspci output, note the two video devices are identical types:

00:00.0 Host bridge: Intel Corporation 82X38/X48 Express DRAM Controller (rev 01)
 Subsystem: ASUSTeK Computer Inc. Device 8295

01:00.0 VGA compatible controller: ATI Technologies Inc R600 [Radeon HD 2900 Series]
 Subsystem: ATI Technologies Inc Device 2552

02:00.0 VGA compatible controller: ATI Technologies Inc R600 [Radeon HD 2900 Series]
 Subsystem: ATI Technologies Inc Device 2552

From my reading of the source, X will still abort even if the devices are not the same; it's just checking if one is marked primary or not. However, for testing purposes we should make sure the case of two identical video cards works properly. (I don't have hardware on hand that can replicate this scenario unfortunately.)

Bryce Harrington (bryce) wrote :

This patch fixes the issue of X being unable to make up its mind, by making it just pick the one with the larger busid.

(Or technically, the last appropriate device seen... it looks like the device list will always be sorted from lower busid to higher.)

Bryce Harrington (bryce) wrote :

This patch is just the inverse of the previous one - if more than one device is present, just select the first one seen.

I'm going to go with the other patch since it's closest to what we were doing before, though I don't have strong preferences here. The main reason we went with the last bus ID before was the use case where there is an on-board video chip and the user has added a second card that they'd prefer to be used. However, I think in these cases there is always going to be a primary device, so I don't think it is a relevant consideration any more.

Anyway, I'm posting both patches here and soliciting feedback on which would cover more cases in practice.

Bryce Harrington (bryce) wrote :

One other note worth mentioning - the ideal solution (mentioned in the other bug referenced above) is to probe the devices and see which ones have monitors attached. This would let us avoid setting the wrong card as primary.

This functionality is not available currently. As I understand it, the probing would need to be done by the kernel. Jesse has told me that this functionality has been implemented for -intel for KMS; I anticipate it will be available in Lucid's kernel. Presumably getting this for -ati/KMS should be feasible for Lucid as well.
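
To illustrate what such probing could look like from userspace once the kernel provides it, here is a rough sketch. It assumes a kernel with KMS that exposes connectors under /sys/class/drm as directories named like card0-VGA-1, each containing a status file that reads "connected" or "disconnected". That layout is an assumption about later kernels, not something available in the setup discussed here, and connected_outputs is a name invented for this example.

```python
from pathlib import Path


def connected_outputs(drm_root="/sys/class/drm"):
    """Map each DRM card to the connectors that report a monitor.

    Assumes connector directories named "<card>-<connector>" that
    contain a "status" file, as on kernels with KMS enabled.
    """
    result = {}
    for status_file in sorted(Path(drm_root).glob("card*-*/status")):
        connector = status_file.parent.name      # e.g. "card0-VGA-1"
        card = connector.split("-", 1)[0]        # e.g. "card0"
        try:
            state = status_file.read_text().strip()
        except OSError:
            # Unreadable entry: skip it rather than fail the whole probe.
            continue
        if state == "connected":
            result.setdefault(card, []).append(connector)
    return result
```

A card that appears in the result has at least one monitor attached, which is the information needed to avoid picking a card with nothing plugged in.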

Bryce Harrington (bryce) wrote :

I've packaged each of the patches into a PPA for tester convenience:

 highest bus id: https://edge.launchpad.net/~bryceharrington/+archive/black

 lowest bus id: https://edge.launchpad.net/~bryceharrington/+archive/white

I'd appreciate feedback on which of these two seems to work better. (And of course, if they *don't* solve the issue for some reason, I'd really like to know that!)

Joshua Harding wrote :

I have encountered this bug as well.

# lspci
04:00.0 VGA compatible controller: nVidia Corporation G94 [GeForce 9600 GT] (rev a1)
05:00.0 VGA compatible controller: nVidia Corporation G94 [GeForce 9600 GT] (rev a1)

Worked around by adding the following to xorg.conf:

Section "Device"
        Identifier "Configured Video Device"
        BusID "PCI:5:0:0"
        Driver "nvidia"
EndSection

Tried both of Bryce Harrington's PPA patches but without the stanza added to xorg.conf they had no effect. Currently using the 'high' bus ID.
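
One detail of this workaround that is easy to get wrong: lspci prints bus, device, and function numbers in hexadecimal, while the BusID entry in xorg.conf expects them in decimal. For 05:00.0 the two happen to look the same, but they diverge from bus 0a upward. A small helper along these lines (the name lspci_to_busid is made up for illustration, and it ignores any PCI domain prefix) does the conversion:

```python
def lspci_to_busid(address):
    """Convert an lspci address such as "05:00.0" to an xorg.conf BusID.

    lspci prints bus:device.function in hexadecimal; xorg.conf expects
    "PCI:bus:device:function" in decimal. A leading PCI domain
    ("0000:05:00.0") is accepted but dropped, so this sketch is only
    suitable for devices in domain 0.
    """
    parts = address.strip().split(":")
    if len(parts) == 3:          # leading domain, e.g. "0000:05:00.0"
        parts = parts[1:]
    if len(parts) != 2:
        raise ValueError(f"unrecognised lspci address: {address!r}")
    bus_hex, dev_func = parts
    dev_hex, func_hex = dev_func.split(".")
    return f"PCI:{int(bus_hex, 16)}:{int(dev_hex, 16)}:{int(func_hex, 16)}"
```

For the second card listed above, lspci_to_busid("05:00.0") gives "PCI:5:0:0", matching the stanza.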

Bryce Harrington (bryce) wrote :

Joshua, please post your Xorg.0.log from both with my PPA and without.

(You should always post your Xorg.0.log with X bugs.)


Joshua Harding wrote :

Attached is my Xorg.0.log. This is with your patched xserver-xorg from the PPA.

Mark Shuttleworth (sabdfl) wrote :

White works for me, thank you!

Black fails badly for me: X.org crashes. The Xorg.log for the crash is attached as xorg_black_crash_log.log. I don't know for sure, but I wonder if this would happen with white too if I plugged the monitor into the other card.

Can you comment on plans for the release, and this bug?

Steve Langasek (vorlon) wrote :

Mark,

Is it possible for you to confirm whether plugging the monitor into the other card does cause the crash with white?

If the currently available fix is only going to correct the problem for any given user 50% of the time, I think we should wait to be able to address this fully and fix it via SRU. A reupload of xorg-server at this point is going to set back CD validation, leaving us very little margin for dealing with any other issues that show up and necessitate a respin of media.

Though of course, not getting this fix in for final release does mean that the live CD will not work on this hardware without fiddling.

Mark Shuttleworth (sabdfl) wrote :

On 26/10/09 12:55, Steve Langasek wrote:
> Is it possible for you to confirm whether plugging the monitor into the
> other card does cause the crash with white?
>

Interestingly, it did not cause a crash. So, white worked in the case
where the monitor was connected to the card it picked, and it did not
crash when the monitor was connected to a different card.

More interestingly, though, the monitor stayed completely dark when it
was connected to the other card. The BIOS and boot sequence just didn't
display at all. It's as if the top card (the one picked by white) is
used at POST. I don't know if that's just luck, or if it's some
convention in the BIOS, or how this is generally determined, but trying
to boot with the monitor connected any other way just shows nothing.

The machine does boot, however, so I captured a Xorg log for that case.
It shows no crash, but it shows that there were no connected monitors.
Interestingly, plugging a monitor in after the boot did not bring
anything up on that monitor.

I don't think this provides enough new insight to decide what the likely
impact of the patch is on the average dual-card system, or other systems.

Mark

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package xorg-server - 2:1.6.4-2ubuntu4

---------------
xorg-server (2:1.6.4-2ubuntu4) karmic; urgency=low

  * Add 188_default_primary_to_first_busid.patch: X can abort if multiple
    video devices are present, and none are marked as primary. This makes
    X just pick the first one it sees and carry on.
    (LP: #459512)

 -- Bryce Harrington <email address hidden> Mon, 26 Oct 2009 10:05:44 -0700

Changed in xorg-server (Ubuntu Karmic):
status: In Progress → Fix Released
Bryce Harrington (bryce) wrote :

Great, thanks for the testing! To follow up on the remaining loose threads...

From the log file, the crash with black is occurring much later in the boot cycle; it is setting primary and picking the driver correctly, but failing somewhere down the road. My guess is that there is also something in the driver that looks at the PCI bus rather than honoring the primary display setting. In any case, since white works, we know we can bypass whatever the crash is.

As to the issue where the display isn't shown during boot, yes that's quite interesting - the kernel has decided the first card is primary, but black picks the secondary. So what you've described makes total sense... and again indicates that white is the more correct solution.

Regarding Joshua's issue, the patch clearly worked; it's just that with -nvidia you typically have to specify more in xorg.conf than with other drivers. From the log file it looks like it's getting past the primary card selection issue, but without the xorg.conf settings it's trying to use the open -nv driver instead of -nvidia. I'd suggest testing with Driver "nvidia" included in xorg.conf but not the BusID, and if it still fails, opening a new bug report.

Alan Johnson (nilgiri) wrote :

There does not seem to be an xorg-server package in the repos. You mean xserver-xorg-core?
