[nForce2] pata_amd pata_acpi can't load

Bug #1536397 reported by Sergey on 2016-01-20
This bug affects 3 people
Affects         Status   Importance  Assigned to  Milestone
Linux           Invalid  Undecided   Unassigned
linux (Ubuntu)           High        Unassigned

Bug Description

Since 3.19.0-33-generic, the system drops to the (initramfs) prompt. Now I can boot the system only with 3.19.0-15-generic.

In dmesg I found:
pata_amd 0000:00:09.0: can't enable device: BAR 0 [io 0x01f0-0x01f7] not claimed
pata_amd: probe of 0000:00:09.0 failed with error -22

The same happens with pata_acpi.

P.S. I suspect the option CONFIG_PCI_BUS_ADDR_T_64BIT=y.

WORKAROUND: Use kernel boot parameter:
pci=nocrs
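To confirm that the workaround is actually in effect after a reboot, one can look for `nocrs` among the `pci=` options in `/proc/cmdline`. A minimal Python sketch; `pci_options` and `nocrs_active` are hypothetical helpers, and the parser assumes plain whitespace-separated parameters (no quoted values) with `pci=` options given as a comma-separated list:

```python
from typing import List


def pci_options(cmdline: str) -> List[str]:
    """Collect the comma-separated values of every pci= parameter on a kernel command line."""
    options: List[str] = []
    for token in cmdline.split():
        if token.startswith("pci="):
            # e.g. "pci=nocrs,noacpi" -> ["nocrs", "noacpi"]
            options.extend(o for o in token[len("pci="):].split(",") if o)
    return options


def nocrs_active(cmdline: str) -> bool:
    """True if pci=nocrs (possibly combined with other pci= options) is present."""
    return "nocrs" in pci_options(cmdline)


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:  # not on Linux, or /proc unavailable
        cmdline = ""
    print("pci=nocrs active:", nocrs_active(cmdline))
```

For the command line recorded under ProcKernelCmdLine in the apport data, `nocrs_active` returns False, since that boot did not use the workaround.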

---
ApportVersion: 2.17.2-0ubuntu1.8
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: paly 2672 F.... pulseaudio
CurrentDesktop: XFCE
DistroRelease: Ubuntu 15.04
HibernationDevice: RESUME=UUID=d4137d34-9c31-41b2-9557-b14531398f02
InstallationDate: Installed on 2015-11-14 (67 days ago)
InstallationMedia: Xubuntu 15.04 "Vivid Vervet" - Release i386 (20150422.1)
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
Package: linux (not installed)
ProcFB: 0 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-3.19.0-15-generic root=/dev/sda2 ro rootflags=subvol=@ quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 3.19.0-15.15-generic 3.19.3
RelatedPackageVersions:
 linux-restricted-modules-3.19.0-15-generic N/A
 linux-backports-modules-3.19.0-15-generic N/A
 linux-firmware 1.143.7
RfKill:
 0: phy0: Wireless LAN
  Soft blocked: yes
  Hard blocked: no
Tags: vivid
UdevLog: Error: [Errno 2] No such file or directory: '/var/log/udev'
Uname: Linux 3.19.0-15-generic i686
UpgradeStatus: Upgraded to vivid on 2015-12-05 (47 days ago)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 07/07/2004
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 6.00 PG
dmi.board.name: nVidia-nForce
dmi.chassis.type: 3
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr6.00PG:bd07/07/2004:svn:pn:pvr:rvn:rnnVidia-nForce:rvr:cvn:ct3:cvr:

Sergey (xpaly) wrote :

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1536397

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Sergey (xpaly) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected vivid
description: updated
Sergey (xpaly) wrote : CRDA.txt

apport information

Sergey (xpaly) wrote : IwConfig.txt

apport information

Sergey (xpaly) wrote : Lspci.txt

apport information

Sergey (xpaly) wrote : PulseList.txt

apport information

Sergey (xpaly) wrote : UdevDb.txt

apport information

Sergey (xpaly) on 2016-01-21
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.4 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-wily

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: needs-bisect
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Sergey (xpaly) wrote :

I installed the 4.4.0 mainline kernel (wily build). My system booted correctly.

tags: added: kernel-fixed-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
A. Eibach (andi3) wrote :

Many, many thanks Sergey!!
This problem really bothered me, as I thought it was my hardware (as usual).
They must really have fixed a huge bug in 4.4.0.

For the Ubuntu folks here, note that 4.3.4 from Kernel Mainline still does NOT work (neither does 4.3.0-7, which made it into the main repository only 2 days ago, on Jan 21).

Sergey (xpaly) wrote :

I tried booting the Xubuntu 15.10 installation LiveCD from a USB stick. It booted normally (4.2.0-16-generic).

I tested the following mainline kernels (i386):
3.19.0-031900-generic = ok
3.19.8-031908-generic = ok
4.2.8-040208-generic = pata_amd, pata_acpi error
4.3.0-040300-generic = pata_amd, pata_acpi error
4.3.4-040304-generic = pata_amd, pata_acpi error

I can test anything else if needed.

P.S. Links to the kernels I used:
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.19-vivid/
http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.19.8-vivid/
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.2.8-wily/
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.3-wily/
http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.3.4-wily/

A. Eibach (andi3) wrote :

>4.3.0-040300-generic = pata_amd, pata_acpi error
>4.3.4-040304-generic = pata_amd, pata_acpi error

Confirmed!

> I tried to load Xubuntu 15.10 installation LiveCD from USB Stick. It was loaded normally. (4.2.0-16-generic)

Aha, _that_ worked?
Well, my 4.2.x was exactly ... let me see ... 4.2.0-10-generic and it FAILED.
Yours is a mere 6 patch levels newer and ... works. This _is_ peculiar.

A. Eibach (andi3) wrote :

@Sergey, to find the actual bugger we'd have to test somewhere _between_ 4.3.5 and 4.3.9[99].
However, there was nothing like that available for testing.

Sergey, the next step is to fully reverse commit bisect from kernel 4.3 to 4.4 in order to identify the last bad commit, followed immediately by the first good one. Once this good commit has been identified, it may be reviewed for backporting. Could you please do this following https://wiki.ubuntu.com/Kernel/KernelBisection#How_do_I_reverse_bisect_the_upstream_kernel.3F ?

Please note, finding adjacent kernel versions is not fully commit bisecting.
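In a reverse bisect the usual roles are flipped: the older endpoint (4.3) is broken, the newer one (4.4) is fixed, and the goal is the pair of adjacent commits where the behaviour changes. A minimal Python sketch of that binary search over a linear, ordered list of commits; `reverse_bisect` and the `boots` callback (standing in for building, installing and boot-testing one kernel) are hypothetical, and the sketch assumes the result flips from broken to fixed exactly once:

```python
from typing import Callable, Sequence, Tuple


def reverse_bisect(commits: Sequence[str], boots: Callable[[str], bool]) -> Tuple[str, str]:
    """Return (last_bad, first_good) for a history that goes from broken to fixed.

    Assumes commits[0] is known to be broken, commits[-1] is known to be fixed,
    and boots() flips from False to True exactly once along the list.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] broken, commits[hi] fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if boots(commits[mid]):
            hi = mid  # already fixed here, so the fix is at or before mid
        else:
            lo = mid  # still broken here, so the fix comes after mid
    return commits[lo], commits[hi]


# Example: 100 commits where the fix lands in commit c42.
history = [f"c{i}" for i in range(100)]
print(reverse_bisect(history, lambda c: int(c[1:]) >= 42))  # ('c41', 'c42')
```

With git itself the same flip is usually done by marking fixed kernels as "bad" and broken ones as "good", which is why git then reports the fix as the "first bad commit". Git 2.7 and later also accept custom terms, e.g. `git bisect start --term-old=broken --term-new=fixed`.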

After the fix commit (not kernel version) has been identified, then please mark this report Status Confirmed.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

tags: added: kernel-fixed-upstream-4.4 needs-reverse-bisect
removed: needs-bisect
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
A. Eibach (andi3) wrote :

Christopher, it's NOT that simple in this case.
I can confirm that 4.3.4 fails and that 4.4.0 works.
But there is definitely nothing in the mainline builds that would give us a chance to find the "offending" patch that broke it.

All we _could_ probably do is test 4.4-rc1 ... 4.4-rc8.

But that might become a 2-weekend job :)

Oleg Blashchuk (2contras) wrote :

4.4-rc1 works just fine, so now I'm bisecting from 4.3 to 4.4-rc1, but that is still a few thousand commits, about 13 rounds to find it.
I'm on 14.04 LTS and the bug appeared only in the 3.16 and 3.19 kernels; all installations with 3.13 work OK.
3.16.45 - ok
3.16.46 and higher - error
3.19.25 - ok
3.19.26 and higher - error
The bug was introduced at the end of July, but there were a few hundred commits starting from 28 July, and none of them is directly related to pata_amd or pata_acpi.
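The "about 13 rounds" estimate follows from each bisect step halving the remaining range: isolating one commit among N candidates takes ceil(log2(N)) test boots. A quick check in Python, assuming a linear history (git's own "roughly N steps" hint can differ slightly when merges are involved); `bisect_rounds` is a hypothetical helper:

```python
def bisect_rounds(candidates: int) -> int:
    """Number of test boots needed to isolate one commit among `candidates`: ceil(log2(n))."""
    if candidates < 1:
        raise ValueError("need at least one candidate commit")
    # For n >= 1, (n - 1).bit_length() equals ceil(log2(n)) using exact integer math.
    return (candidates - 1).bit_length()


for n in (2000, 4096, 4097, 8192):
    print(f"{n} commits -> {bisect_rounds(n)} rounds")
```

So 13 rounds corresponds to roughly 4,100 to 8,200 candidate commits, consistent with the "few thousand commits" between 4.3 and 4.4-rc1.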

Sergey (xpaly) wrote :

Hi!
About my system:
3.19.0-25-generic = fail
3.19.0-26-generic = fail

I tested mainline kernels between 4.3.4 and 4.4.0.

4.4.0-040400rc1-generic = ok (from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.4-rc1+cod1-wily/ )
4.3.5-040305-generic = fail ( from http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.3.5-wily/ )

4.3.4 = fail

I will try to bisect. What should I post here?

Sergey (xpaly) wrote :

Error message from 4.3.5-generic:

pata_amd 0000:00:09.0: can't enable device: BAR 0 [io size 0x0008] not assigned
pata_acpi 0000:00:09.0: can't enable device: BAR 0 [io size 0x0008] not assigned
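Compared with the original report, the failing BAR is the same (BAR 0 of 0000:00:09.0, an 8-byte I/O range that the earlier kernel showed at the legacy primary IDE ports 0x01f0-0x01f7), but 4.3.5 reports it as "not assigned" rather than "not claimed". When collecting results across many test kernels, a small filter over saved dmesg output can help. A minimal Python sketch; the regex is written against the exact messages quoted in this report and may need loosening for other formats, and all names are hypothetical:

```python
import re
from typing import List, NamedTuple


class BarFailure(NamedTuple):
    driver: str
    pci_addr: str
    bar: int
    resource: str
    reason: str


# Matches lines such as:
#   pata_amd 0000:00:09.0: can't enable device: BAR 0 [io 0x01f0-0x01f7] not claimed
#   pata_acpi 0000:00:09.0: can't enable device: BAR 0 [io size 0x0008] not assigned
BAR_RE = re.compile(
    r"(?P<driver>\S+) (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"can't enable device: BAR (?P<bar>\d+) \[(?P<res>[^\]]*)\] (?P<reason>not \w+)"
)


def bar_failures(dmesg_text: str) -> List[BarFailure]:
    """Extract every "can't enable device: BAR ..." failure from dmesg output."""
    return [
        BarFailure(m["driver"], m["addr"], int(m["bar"]), m["res"], m["reason"])
        for m in BAR_RE.finditer(dmesg_text)
    ]
```

Fed the two 4.3.5 lines above, it returns two entries (pata_amd and pata_acpi), both with reason "not assigned"; the unrelated "probe of ... failed" lines are skipped.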

Sergey (xpaly) wrote :

Following https://wiki.ubuntu.com/Kernel/KernelBisection#How_do_I_reverse_bisect_the_upstream_kernel.3F

I can't do "Reverse commit bisecting upstream kernel versions":

$ git bisect good v4.3.5
fatal: Needed a single revision
Bad rev input: v4.3.5

$ git bisect good v4.3-5
fatal: Needed a single revision
Bad rev input: v4.3-5

Oleg Blashchuk (2contras) wrote :

And the winner is....

4d6b4e69a245e9df4b84dba387596086cb66887d is the first bad commit
commit 4d6b4e69a245e9df4b84dba387596086cb66887d
Author: Jiang Liu <email address hidden>
Date: Wed Oct 14 14:29:41 2015 +0800

    x86/PCI/ACPI: Use common interface to support PCI host bridge

    Use common interface to simplify ACPI PCI host bridge implementation.

    Signed-off-by: Jiang Liu <email address hidden>
    Reviewed-by: Hanjun Guo <email address hidden>
    Acked-by: Bjorn Helgaas <email address hidden>
    Signed-off-by: Rafael J. Wysocki <email address hidden>

Actually, it is the good (fixing) commit, not a bad one :) So, what's next?

Sergey (xpaly) on 2016-02-02
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: cherry-pick reverse-bisect-done
removed: needs-reverse-bisect
Changed in linux (Ubuntu):
status: Confirmed → Triaged
Oleg Blashchuk (2contras) wrote :

I couldn't resist the temptation to find where the story began. It all started between 4.1 and 4.2-rc1. Bisecting points to this commit:

3d9fecf6bfb8b12bc2f9a4c7109895a2a2bb9436 is the first bad commit
commit 3d9fecf6bfb8b12bc2f9a4c7109895a2a2bb9436
Author: Bjorn Helgaas <email address hidden>
Date: Tue Jun 9 17:31:38 2015 -0500

    x86/PCI: Use host bridge _CRS info on systems with >32 bit addressing

    We enable _CRS on all systems from 2008 and later. On older systems, we
    ignore _CRS and assume the whole physical address space (excluding RAM and
    other devices) is available for PCI devices, but on systems that support
    physical address spaces larger than 4GB, it's doubtful that the area above
    4GB is really available for PCI.

    After d56dbf5bab8c ("PCI: Allocate 64-bit BARs above 4G when possible"), we
    try to use that space above 4GB *first*, so we're more likely to put a
    device there.

    On Juan's Toshiba Satellite Pro U200, BIOS left the graphics, sound, 1394,
    and card reader devices unassigned (but only after Windows had been
    booted). Only the sound device had a 64-bit BAR, so it was the only device
    placed above 4GB, and hence the only device that didn't work.

    Keep _CRS enabled even on pre-2008 systems if they support physical address
    space larger than 4GB.

    Fixes: d56dbf5bab8c ("PCI: Allocate 64-bit BARs above 4G when possible")
    Reported-and-tested-by: Juan Dayer <email address hidden>
    Reported-and-tested-by: Alan Horsfield <email address hidden>
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=99221
    Link: https://bugzilla.opensuse.org/show_bug.cgi?id=907092
    Signed-off-by: Bjorn Helgaas <email address hidden>
    CC: <email address hidden> # v3.14+
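The commit message states the rule in prose; a minimal Python model of just that rule may make it easier to see why a 2004 board could be affected. `uses_crs` is hypothetical: the BIOS year and the number of physical address bits are plain inputs, the `pci=nocrs` / `pci=use_crs` overrides are added to reflect the workaround discussed in this report, and the kernel's DMI quirk list is not modeled. The real logic lives in `arch/x86/pci/acpi.c`.

```python
from typing import Optional


def uses_crs(bios_year: int, phys_addr_bits: int, pci_option: Optional[str] = None) -> bool:
    """Simplified model of the x86 _CRS decision as described in commit 3d9fecf6bfb8.

    Hypothetical helper: ignores the kernel's DMI quirk tables and other details.
    """
    # An explicit pci=nocrs or pci=use_crs on the command line overrides the heuristic.
    if pci_option == "nocrs":
        return False
    if pci_option == "use_crs":
        return True
    # "We enable _CRS on all systems from 2008 and later."
    if bios_year >= 2008:
        return True
    # Older BIOS: after this commit, _CRS stays enabled if the CPU can address
    # more than 4GB, i.e. more than 32 physical address bits.
    return phys_addr_bits > 32


# A 2004 BIOS, like the one in this report:
print(uses_crs(2004, 32))           # False: _CRS ignored
print(uses_crs(2004, 36))           # True: _CRS used because of >32-bit addressing
print(uses_crs(2004, 36, "nocrs"))  # False: the workaround restores the old behaviour
```

Under this model, the report's `dmi.bios.date` of 07/07/2004 alone no longer disables _CRS once the kernel sees more than 32 physical address bits, which is consistent with the breakage appearing after this commit and with `pci=nocrs` working around it.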

Oleg Blashchuk (2contras) wrote :

Actually, this leads to a workaround. Adding the kernel option "pci=nocrs" in GRUB allows current kernels to boot. I'm not sure whether it causes other problems.

tags: added: bisect-done
tags: removed: bisect-done
A. Eibach (andi3) wrote :

Oleg, huge thanks for your hard work!!

You're a wonder!

Sergey (xpaly) wrote :

I tested the "pci=nocrs" kernel option on 3.19.0-50-generic. System can to boot.

Sergey:
>"System can to boot."

Are you saying it can boot successfully, or it is not able to boot?

Sergey (xpaly) wrote :

My system can boot successfully with this workaround (kernel option "pci=nocrs").

A. Eibach (andi3) wrote :

Guessed so. Oleg did a marvellous job in figuring that out. That was not easy-peasy by any means.

description: updated

A. Eibach, this report has nothing to do with the bugzilla report you linked, as the bug this report is scoped to is already fixed upstream.

Changed in linux:
importance: Unknown → Undecided
status: Unknown → New
status: New → Invalid
A. Eibach (andi3) wrote :

Nothing to do ??

I beg to differ!
kernel.org #111901 is a kind of RfC on whether the CRS usage on old hardware should be lifted, or whether users are, even in future, expected to boot their < 4.4.0 kernels with pci=nocrs.

And as the forced enabling of CRS was causing the problems that Sergey reported, I would not say "it has nothing to do".
At least I would not believe it unless a more verbose explanation is given _why_ _not_.

A. Eibach (andi3) wrote :

Besides, about your "fixed upstream": I think it was entirely accidental. Something got modified in PCI handling, and abracadabra, it made old hardware work again.
But as far as I could read the code fix, it didn't look as if Sergey's hardware and mine were actually targeted.
We were LUCKY that it worked from a certain 4.4.0 RC onward. That's about it.
