Comment 0 for bug 214814

TJ (tj) wrote :

See also upstream bug:

http://bugzilla.kernel.org/show_bug.cgi?id=10396

Systems based on the Intel 450NX chipset may fail to recognise some devices, which leads to drivers failing, unhandled IRQs, and other serious boot failures. The cause is that this chipset has three PCI root buses. When it was first released, some operating systems (read: Windows NT) didn't always discover the 2nd and 3rd PCI buses correctly. As a workaround, the PCI BIOS tables were 'hacked' to include a fake bridge device on PCI bus 0 that points to the same bus number as the peer bus, so that the bus would still be scanned by the OS.

$ lspci
00:0a.0 PCI bridge: Intel Corporation 21154 PCI-to-PCI Bridge
00:10.0 Host bridge: Intel Corporation 450NX - 82451NX Memory & I/O Controller (rev 03)
00:12.0 Host bridge: Intel Corporation 450NX - 82454NX/84460GX PCI Expander Bridge (rev 04)
00:13.0 Host bridge: Intel Corporation 450NX - 82454NX/84460GX PCI Expander Bridge (rev 04)
00:14.0 Host bridge: Intel Corporation 450NX - 82454NX/84460GX PCI Expander Bridge (rev 04)

As a result, in a well-behaved OS the 2nd and 3rd PCI buses would be scanned twice: once as secondaries of the 1st bus, and then again as root buses in their own right. This caused devices to be discovered twice.

A fix-up for all i450NX chipsets was introduced in arch/i386/pci/fixups.c::pci_fixup_i450nx(). (Note: arch/i386 was subsequently refactored into arch/x86/.) The fix-up checks the PCI config for the subsidiary buses and, if it finds them, scans them, which adds them to the pci_root_buses list. Later in the boot process the ACPI/PCI code reads the ACPI DSDT table, finds the PCI bus entries (PNP0A03) and tries to scan them. It fails when scanning the 2nd and 3rd buses with:

[ 0.910906] ACPI: PCI Root Bridge [PX0B] (0000:02)
[ 0.912085] ACPI: Bus 0000:02 not present in PCI namespace
[ 0.917111] ACPI: PCI Root Bridge [PX1A] (0000:03)
[ 0.920085] ACPI: Bus 0000:03 not present in PCI namespace

Unfortunately, the message is misleading: the real reason for the failure is that the bus is found to be already registered and is therefore ignored. The problem can be worked around by booting with "pci=noacpi".
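
To illustrate why the second scan is refused, here is a minimal userspace sketch in C. It is not kernel code: struct root_bus_list and register_root_bus() are hypothetical names invented for this example, and the real ACPI/PCI code path is more involved. It only models the idea that a bus number which the early fix-up has already registered cannot be registered again later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ROOT_BUSES 8

/* Hypothetical stand-in for the kernel's list of PCI root buses. */
struct root_bus_list {
    unsigned char nr[MAX_ROOT_BUSES]; /* registered bus numbers */
    size_t count;                     /* how many entries are in use */
};

/*
 * Try to register a root bus.
 * Returns true if the bus was newly added, false if it was already
 * present (or the list is full), in which case the caller ignores it.
 */
static bool register_root_bus(struct root_bus_list *list, unsigned char busnr)
{
    assert(list != NULL);

    for (size_t i = 0; i < list->count; i++) {
        if (list->nr[i] == busnr)
            return false; /* already registered by an earlier scan */
    }
    if (list->count == MAX_ROOT_BUSES)
        return false;

    list->nr[list->count++] = busnr;
    return true;
}
```

In this model, the early fix-up would register buses 02 and 03 first, so a later attempt to register the same bus numbers (the ACPI root bridges PX0B and PX1A in the log above) returns false even though the buses are physically present.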

The solution is to make pci_fixup_i450nx() conditional on the DMI data of the system. I've introduced a patch that does this. Initially the only system it matches is the Dell PowerEdge 6300; if other systems are found to be affected, the output of "sudo dmidecode" should be captured and reported so that additional DMI_MATCH entries can be added to the patch.
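
As a rough illustration of DMI-based selection, the following userspace C sketch models a small match table. It is not the attached patch: struct dmi_entry, i450nx_dmi_systems and i450nx_dmi_match() are hypothetical names, the vendor substring "Dell" is an assumption, and substring comparison is used on the assumption that it mirrors how the kernel's DMI_MATCH entries compare strings. What the caller does with the result (run or skip the fix-up) is deliberately left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one DMI match entry (not struct dmi_system_id). */
struct dmi_entry {
    const char *sys_vendor;   /* substring expected in the system vendor */
    const char *product_name; /* substring expected in the product name  */
};

/* Illustrative table; further affected systems would be appended here. */
static const struct dmi_entry i450nx_dmi_systems[] = {
    { "Dell", "PowerEdge 6300" }, /* vendor substring is an assumption */
};

/* Returns true if the given DMI strings match any entry in the table. */
static bool i450nx_dmi_match(const char *vendor, const char *product)
{
    size_t n = sizeof(i450nx_dmi_systems) / sizeof(i450nx_dmi_systems[0]);

    assert(vendor != NULL && product != NULL);

    for (size_t i = 0; i < n; i++) {
        const struct dmi_entry *e = &i450nx_dmi_systems[i];

        /* Both fields must match, using substring comparison. */
        if (strstr(vendor, e->sys_vendor) && strstr(product, e->product_name))
            return true;
    }
    return false;
}
```

The strings passed in would come from the "System Information" section of the dmidecode output; the exact values reported by a given machine should be checked rather than assumed.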

I found this reference to the issue in akpm's 2.6.0 mm tree and in the linux-scsi mailing list archive:

"I can tell you what's going on here. This is a 450NX based motherboard. The 450NX chipset from Intel was the first chipset to have peer PCI busses. For backwards compatibility, some machine makers hacked their PCI BIOS to have a fake bridge device on PCI bus 0 that points to the same bus number as the peer bus. This way if the OS didn't know about the peer bus registers it would still find the devices by scanning behind the bridge. In this case we are scanning behind this fake bridge and then also scanning based upon the peer bus registers in the chipset, and as a result we are finding the device twice. In order to fix this problem you need to change the peer bus quirk code for the 450NX chipset to scan the list of bus 0 devices looking for a bridge that has the same config as the peer bus registers and if so delete the bridge from the list. That will avoid double scanning and will avoid having the PCI code try and configure sub busses via a fake bridge when it should do all configurations via the 450NX peer bus registers.

--
  Doug Ledford <email address hidden>"

http://marc.info/?l=linux-scsi&m=106839680416899&w=2
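
The approach described in the quote (find a bridge on bus 0 whose configuration duplicates a peer bus, then drop it from the list) can be sketched as the following userspace C model. All names here (struct bus0_dev, struct peer_bus, drop_fake_bridges()) are hypothetical, and "same config" is assumed to mean that the bridge's secondary and subordinate bus numbers equal the bus range given by the chipset's peer bus registers. The real quirk would work on the kernel's own device structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a device found on PCI bus 0. */
struct bus0_dev {
    bool is_bridge;            /* true for a PCI-to-PCI bridge */
    unsigned char secondary;   /* first bus number behind the bridge */
    unsigned char subordinate; /* last bus number behind the bridge */
};

/* Hypothetical model of one bus range read from the peer bus registers. */
struct peer_bus {
    unsigned char first;
    unsigned char last;
};

/*
 * Remove, in place, every bridge whose bus range duplicates a peer bus.
 * Returns the number of devices left in devs[].
 */
static size_t drop_fake_bridges(struct bus0_dev *devs, size_t ndevs,
                                const struct peer_bus *peers, size_t npeers)
{
    size_t kept = 0;

    assert(devs != NULL || ndevs == 0);
    assert(peers != NULL || npeers == 0);

    for (size_t i = 0; i < ndevs; i++) {
        bool fake = false;

        for (size_t p = 0; devs[i].is_bridge && p < npeers; p++) {
            if (devs[i].secondary == peers[p].first &&
                devs[i].subordinate == peers[p].last)
                fake = true; /* bridge only mirrors a peer bus */
        }
        if (!fake)
            devs[kept++] = devs[i]; /* keep real devices and real bridges */
    }
    return kept;
}
```

With the fake bridge gone, the buses behind it would be reached only through the peer bus registers, which is the single-scan behaviour the quote asks for.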

In this particular case, a Dell PowerEdge 6300 with a PERC 2 RAID array controller (aacraid) fails to boot on any kernel after v2.6.20 (Feisty). Reports show:

[ 0.000000] Linux version 2.6.24-15-generic (root@PowerEdge6300) (gcc version 4.1.2 (Ubuntu 4.1.2-0ubuntu4)) #1 SMP Fri Apr 4 09:18:39 BST 2008 (Ubuntu 2.6.24-15.26-generic)

[ 436.079664] Adaptec aacraid driver 1.1-5[2449]-ms

[ 492.476969] BUG: soft lockup - CPU#2 stuck for 11s! [modprobe:1376]
[ 492.483317]
[ 492.484874] Pid: 1376, comm: modprobe Not tainted (2.6.24-15-generic #1)
[ 492.491642] EIP: 0060:[<c0216641>] EFLAGS: 00000287 CPU: 2
[ 492.497226] EIP is at delay_tsc+0x41/0x50
[ 492.501302] EAX: 0000059e EBX: 0000003f ECX: 00000000 EDX: 0000003f
[ 492.507640] ESI: 17c02b3e EDI: df84f278 EBP: 17c025a0 ESP: df9dfd4c
[ 492.513972] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 492.519443] CR0: 8005003b CR2: 0812574c CR3: 1f97b000 CR4: 00000690
[ 492.525781] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 492.532114] DR6: ffff0ff0 DR7: 00000400
[ 492.536029] [<c02165c6>] __delay+0x6/0x10
[ 492.540264] [<f89496aa>] aac_fib_send+0x21a/0x2d0 [aacraid]
[ 492.546108] [<c012363a>] enqueue_task_fair+0x1a/0x30
[ 492.551318] [<f8945a94>] aac_get_adapter_info+0x74/0x620 [aacraid]
[ 492.557753] [<f8942f54>] aac_probe_one+0x224/0x450 [aacraid]
[ 492.563642] [<f8949b80>] aac_command_thread+0x0/0x6d0 [aacraid]
[ 492.569801] [<c0223136>] pci_device_probe+0x56/0x80
[ 492.574903] [<c027e85e>] driver_probe_device+0x8e/0x190
[ 492.580373] [<c027eace>] __driver_attach+0x9e/0xa0
[ 492.585385] [<c027dc7b>] bus_for_each_dev+0x3b/0x60
[ 492.590491] [<c027e6d6>] driver_attach+0x16/0x20
[ 492.595330] [<c027ea30>] __driver_attach+0x0/0xa0
[ 492.600259] [<c027e00a>] bus_add_driver+0x8a/0x1e0
[ 492.605281] [<c02232e3>] __pci_register_driver+0x53/0xa0
[ 492.610815] [<f8850033>] aac_init+0x33/0x74 [aacraid]
[ 492.616098] [<c0151511>] sys_init_module+0x151/0x1990
[ 492.621377] [<c01778fa>] __do_fault+0x21a/0x410
[ 492.626170] [<c0166421>] handle_fasteoi_irq+0x91/0xf0
[ 492.631465] [<c01053b2>] syscall_call+0x7/0xb
[ 492.636066] =======================

[ 17.155571] irq 10: nobody cared (try booting with the "irqpoll" option)
[ 17.155571] Pid: 0, comm: swapper Not tainted 2.6.25-rc8-custom #1
[ 17.155571] [<c025ad74>] __report_bad_irq+0x24/0x80

This was first thought to be part of bug #149071 "-server kernel variant fails to boot on PowerEdge 2650 with AACRAID timeouts", but it now appears likely that that bug has a different root cause.

Attached here are patches for Gutsy and Hardy. An upstream patch for v2.6.25-rc8 is attached to the bugzilla report.