kernel fails to boot from SSD on non AHCI chipset (NCQ unsupported)

Bug #1591293 reported by FerVira
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

Ubuntu 16.04 64-bit WON'T BOOT from a Kingston Savage SSD (2016) on an ASUS M2N68-AM SE motherboard (2010) with non-AHCI SATA2.
SATA Settings: 32 bit transfers: ENABLED, AHCI mode: not available.
The freshly installed system FREEZES immediately after starting to boot, and the PC loses its video signal.
kern.log does not register anything.

This motherboard does not support AHCI, but the SSD does. The freshly installed 16.04 kernel does not disable NCQ (AHCI).

The same system boots slowly, with errors, when 32-bit transfers are disabled in the BIOS, and then freezes.
The same PC boots an Ubuntu Studio 12.04 live USB, which automatically disables NCQ when the SSD is connected.
The SAME SSD BOOTS without errors on an Intel Pine Trail-M chipset with SATA3 and AHCI.
(The same SSD boots a fresh Windows 10 install on both PCs in sixteen seconds without any issue.)

Expected behaviour:
Kernel recognizes that AHCI is not supported by the hardware, and then disables NCQ because it requires AHCI,
OR kernel recognizes that NCQ commands are failing and then disables NCQ,
and by disabling NCQ the system can boot without any problem.
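
The second expectation (stop using NCQ once NCQ commands keep timing out) can be sketched as a small policy object. This is only an illustration of the requested behaviour, not libata code: the class name, the threshold of two timeouts, and the 120-second window are all assumptions chosen for the example. The timestamps fed in are those of the three ata4.00 exceptions in the kern.log quoted further down.

```python
from collections import deque


class NcqFallbackPolicy:
    """Hypothetical policy: turn NCQ off after repeated NCQ timeouts.

    Illustrates the behaviour the report asks for; it is not kernel code.
    """

    def __init__(self, max_timeouts=2, window_seconds=120.0):
        self.max_timeouts = max_timeouts      # assumed threshold
        self.window_seconds = window_seconds  # assumed sliding window
        self.ncq_enabled = True
        self._timeouts = deque()

    def record_timeout(self, timestamp):
        """Record one NCQ timeout; return whether NCQ is still enabled."""
        self._timeouts.append(timestamp)
        # Forget timeouts that have fallen out of the sliding window.
        while timestamp - self._timeouts[0] > self.window_seconds:
            self._timeouts.popleft()
        if len(self._timeouts) >= self.max_timeouts:
            self.ncq_enabled = False  # fall back to non-queued commands
        return self.ncq_enabled


policy = NcqFallbackPolicy()
for t in (33.997707, 64.970594, 95.943478):  # exception timestamps from kern.log
    policy.record_timeout(t)
print(policy.ncq_enabled)
```

Under these assumed settings the policy would give up on NCQ at the second timeout, roughly 65 seconds into the boot, instead of retrying queued reads indefinitely.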

Real behaviour:
The kernel only disables NCQ when the SSD is on a blacklist; it does not disable it when the chipset does not support AHCI (NCQ), and it does not disable it even when NCQ commands are failing. As a result the system won't boot, or boots and then freezes.

It's not easy to understand why the system isn't booting. Many users are replacing their hard drives with SSDs to improve the performance of their not-so-old PCs, and if the chipset does not support AHCI they will need expert diagnosis and tweaks to make Ubuntu boot.

Other tests:
SATA Settings: 32 BIT TRANSFER: DISABLED, AHCI: not available.
The system boots from the SSD with ATA errors, taking several minutes when it should take seconds.
The kernel DOES NOT disable NCQ after the failures (it does on the 12.04 live session, same PC).
The system becomes permanently unresponsive a few minutes after startup: the mouse pointer still moves but does not respond to clicks, and the PC has to be hard reset.
(On the following boot, the system cleans up some orphan inodes on one of the dm-crypt volumes.)

kern.log:

Jun 9 22:24:41 equipment kernel: [ 0.000000] Linux version 4.4.0-21-generic (buildd@lgw01-21) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) ) #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 (Ubuntu 4.4.0-21.37-generic 4.4.6)
Jun 9 22:24:41 equipment kernel: [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.4.0-21-generic root=UUID=88b24888-31fd-4a8e-98cd-f7e34b2bcc39 ro quiet splash
....
Jun 9 22:24:41 equipment kernel: [ 0.000000] DMI: System manufacturer System Product Name/M2N68-AM SE, BIOS 0804 07/26/2010
...
Jun 9 22:24:41 equipment kernel: [ 2.073694] ata4.00: ATA-9: KINGSTON SHSS37A480G, SAFM00.U, max UDMA/133
Jun 9 22:24:41 equipment kernel: [ 2.073697] ata4.00: 937703088 sectors, multi 16: LBA48 NCQ (depth 31/32)
Jun 9 22:24:41 equipment kernel: [ 2.073777] ata4.00: configured for UDMA/133
Jun 9 22:24:41 equipment kernel: [ 2.076892] ata3: SATA link down (SStatus 0 SControl 300)
Jun 9 22:24:41 equipment kernel: [ 2.077116] scsi 3:0:0:0: Direct-Access ATA KINGSTON SHSS37A 00.U PQ: 0 ANSI: 5
Jun 9 22:24:41 equipment kernel: [ 2.077421] sd 3:0:0:0: [sda] 937703088 512-byte logical blocks: (480 GB/447 GiB)
Jun 9 22:24:41 equipment kernel: [ 2.077494] sd 3:0:0:0: [sda] Write Protect is off
Jun 9 22:24:41 equipment kernel: [ 2.077496] sd 3:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jun 9 22:24:41 equipment kernel: [ 2.077528] sd 3:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
...
Jun 9 22:24:41 equipment kernel: [ 33.997707] ata4.00: exception Emask 0x0 SAct 0x7ff4000 SErr 0x0 action 0x6 frozen
Jun 9 22:24:41 equipment kernel: [ 33.997731] ata4.00: failed command: READ FPDMA QUEUED
Jun 9 22:24:41 equipment kernel: [ 33.997742] ata4.00: cmd 60/08:70:f8:2f:80/00:00:11:00:00/40 tag 14 ncq 4096 in
Jun 9 22:24:41 equipment kernel: [ 33.997742] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 9 22:24:41 equipment kernel: [ 33.997757] ata4.00: status: { DRDY }
... ( 1 time for each partition ) ...
Jun 9 22:24:41 equipment kernel: [ 33.998050] ata4.00: status: { DRDY }
Jun 9 22:24:41 equipment kernel: [ 33.998059] ata4: hard resetting link
Jun 9 22:24:41 equipment kernel: [ 34.317656] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 9 22:24:41 equipment kernel: [ 34.318498] ata4.00: configured for UDMA/133
Jun 9 22:24:41 equipment kernel: [ 34.318504] ata4.00: device reported invalid CHS sector 0
...
Jun 9 22:24:41 equipment kernel: [ 34.318524] ata4.00: device reported invalid CHS sector 0
Jun 9 22:24:41 equipment kernel: [ 34.318535] ata4: EH complete
Jun 9 22:24:41 equipment kernel: [ 64.970594] ata4.00: exception Emask 0x0 SAct 0x4fc00097 SErr 0x0 action 0x6 frozen
Jun 9 22:24:41 equipment kernel: [ 64.970612] ata4.00: failed command: READ FPDMA QUEUED
Jun 9 22:24:41 equipment kernel: [ 64.970623] ata4.00: cmd 60/08:00:80:7f:80/00:00:28:00:00/40 tag 0 ncq 4096 in
Jun 9 22:24:41 equipment kernel: [ 64.970623] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 9 22:24:41 equipment kernel: [ 64.970638] ata4.00: status: { DRDY }
... (another time for each partition) ...
Jun 9 22:24:41 equipment kernel: [ 64.994812] ata4: hard resetting link
Jun 9 22:24:41 equipment kernel: [ 65.314544] ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 9 22:24:41 equipment kernel: [ 65.315365] ata4.00: configured for UDMA/133
Jun 9 22:24:41 equipment kernel: [ 65.315368] ata4.00: device reported invalid CHS sector 0
... (another time for each partition)
Jun 9 22:24:41 equipment kernel: [ 65.315391] ata4: EH complete
Jun 9 22:24:41 equipment kernel: [ 95.943478] ata4.00: exception Emask 0x0 SAct 0x404403fe SErr 0x0 action 0x6 frozen
Jun 9 22:24:41 equipment kernel: [ 95.943852] ata4.00: failed command: READ FPDMA QUEUED
Jun 9 22:24:41 equipment kernel: [ 95.944688] ata4.00: cmd 60/08:08:e0:72:80/00:00:27:00:00/40 tag 1 ncq 4096 in
Jun 9 22:24:41 equipment kernel: [ 95.944688] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jun 9 22:24:41 equipment kernel: [ 95.946796] ata4.00: status: { DRDY }
... (another time for each partition) ...
Jun 9 22:24:41 equipment kernel: [ 127.285212] ata4: EH complete
Jun 9 22:24:41 equipment kernel: [ 130.229590] random: nonblocking pool is initialized
Jun 9 22:24:41 equipment kernel: [ 138.186900] NET: Registered protocol family 38
Jun 9 22:24:41 equipment kernel: [ 148.716604] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
Jun 9 22:24:41 equipment kernel: [ 149.088932] lp: driver loaded but no devices found
Jun 9 22:24:41 equipment kernel: [ 149.097246] ppdev: user-space parallel port driver
Jun 9 22:24:41 equipment kernel: [ 149.254278] EXT4-fs (dm-0): re-mounted. Opts: errors=remount-ro
... (system finally boots, but freezes) ...
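
The exception lines above can be summarized mechanically. The sketch below pulls the device name and the SAct value out of each "exception" line and lists which bits are set; SAct is a bitmask with one bit per outstanding NCQ tag. The function names and the regular expression are assumptions written for this log format, not part of any existing tool.

```python
import re

# Matches libata exception lines such as:
#   ata4.00: exception Emask 0x0 SAct 0x7ff4000 SErr 0x0 action 0x6 frozen
EXCEPTION_RE = re.compile(r"(ata\d+\.\d+): exception .*?SAct (0x[0-9a-fA-F]+)")


def sact_tags(sact):
    """Return the NCQ tag numbers whose bits are set in an SAct value."""
    return [bit for bit in range(32) if (sact >> bit) & 1]


def summarize(log_lines):
    """Yield (device, outstanding_tags) for each exception line."""
    for line in log_lines:
        match = EXCEPTION_RE.search(line)
        if match:
            yield match.group(1), sact_tags(int(match.group(2), 16))


sample = [
    "[ 33.997707] ata4.00: exception Emask 0x0 SAct 0x7ff4000 SErr 0x0 action 0x6 frozen",
    "[ 33.997731] ata4.00: failed command: READ FPDMA QUEUED",
]
for device, tags in summarize(sample):
    print(device, len(tags), tags)
```

For the first exception, SAct 0x7ff4000 decodes to 12 outstanding tags (14 and 16 through 26), which agrees with the failed command reported as "tag 14" in the same block of the log.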

OTHER TESTS:
Ubuntu 16.04 64-bit boots from the SSD on a 2012 Intel Pine Trail-M with SATA3 ports, AHCI, and a conventional (non-UEFI) BIOS.
SATA Settings: AHCI mode: extended, AHCI extension: enabled, 32 bit transfers: enabled OR disabled.
The system boots normally from the SSD without any errors.

kern.log:

Jun 9 15:46:36 equipment kernel: [ 0.000000] Linux version 4.4.0-21-generic (buildd@lgw01-21) (gcc version 5.3.1 20160413 (Ubuntu 5.3.1-14ubuntu2) ) #37-Ubuntu SMP Mon Apr 18 18:33:37 UTC 2016 (Ubuntu 4.4.0-21.37-generic 4.4.6)
Jun 9 15:46:36 equipment kernel: [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-4.4.0-21-generic root=UUID=88b24885-31fd-4a8e-98cd-f7e34b2bcc39 ro quiet splash vt.handoff=7
...
Jun 9 16:36:54 equipment kernel: [ 0.000000] DMI: Intel Corporation Pine Trail - M /Pine Trail - M , BIOS 6.00 09/28/2011
...
Jun 9 15:46:36 equipment kernel: [ 4.196178] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jun 9 15:46:36 equipment kernel: [ 4.196259] ata2: SATA link down (SStatus 0 SControl 300)
Jun 9 15:46:36 equipment kernel: [ 4.197128] ata1.00: ATA-9: KINGSTON SHSS37A480G, SAFM00.U, max UDMA/133
Jun 9 15:46:36 equipment kernel: [ 4.197136] ata1.00: 937703088 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
Jun 9 15:46:36 equipment kernel: [ 4.197403] ata1.00: configured for UDMA/133
Jun 9 15:46:36 equipment kernel: [ 4.197846] scsi 0:0:0:0: Direct-Access ATA KINGSTON SHSS37A 00.U PQ: 0 ANSI: 5
Jun 9 15:46:36 equipment kernel: [ 4.198578] sd 0:0:0:0: [sda] 937703088 512-byte logical blocks: (480 GB/447 GiB)
Jun 9 15:46:36 equipment kernel: [ 4.198612] sd 0:0:0:0: Attached scsi generic sg0 type 0
Jun 9 15:46:36 equipment kernel: [ 4.198981] sd 0:0:0:0: [sda] Write Protect is off
Jun 9 15:46:36 equipment kernel: [ 4.198996] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jun 9 15:46:36 equipment kernel: [ 4.199117] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jun 9 15:46:36 equipment kernel: [ 4.203108] sda: sda1 sda2 < sda5 sda6 > sda3 sda4
Jun 9 15:46:36 equipment kernel: [ 4.206065] sd 0:0:0:0: [sda] Attached SCSI disk
...

The kernel then finishes booting and the system works perfectly on the Pine Trail-M.

I will try to disable NCQ with a kernel command-line parameter, as the M2N68-AM SE is my everyday work PC.
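
A minimal sketch of that workaround, assuming a stock Ubuntu GRUB 2 setup where the default kernel parameters live in the GRUB_CMDLINE_LINUX_DEFAULT line of /etc/default/grub. The parameter is libata.force=noncq, the one used later in this report. The helper name add_kernel_param is made up for the example, and it only transforms text: writing the file back and running update-grub afterwards are left to the user.

```python
import re


def add_kernel_param(grub_text, param="libata.force=noncq"):
    """Append `param` to GRUB_CMDLINE_LINUX_DEFAULT unless already present."""

    def patch(match):
        current = match.group(2).split()
        if param not in current:
            current.append(param)
        return match.group(1) + " ".join(current) + match.group(3)

    return re.sub(
        r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")',
        patch,
        grub_text,
        flags=re.MULTILINE,
    )


# Example input resembling a default /etc/default/grub (assumed contents).
example = 'GRUB_TIMEOUT=10\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
print(add_kernel_param(example))
```

For a one-off test the same parameter can instead be typed at the end of the "linux" line in the GRUB boot menu editor, which avoids touching any file.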
---
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: xion-rock 1768 F.... pulseaudio
CurrentDesktop: Unity
DistroRelease: Ubuntu 16.04
HibernationDevice: RESUME=UUID=27fa59aa-2cc6-43d2-83da-e64f02ffb27e
InstallationDate: Installed on 2016-06-09 (1 days ago)
InstallationMedia: Ubuntu 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.1)
Lsusb:
 Bus 002 Device 003: ID 0bda:8179 Realtek Semiconductor Corp. RTL8188EUS 802.11n Wireless Network Adapter
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: System manufacturer System Product Name
Package: linux (not installed)
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-21-generic root=UUID=88b24885-31fd-4a8e-98cd-f7e34b2bcc39 ro recovery nomodeset
ProcVersionSignature: Ubuntu 4.4.0-21.37-generic 4.4.6
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-21-generic N/A
 linux-backports-modules-4.4.0-21-generic N/A
 linux-firmware 1.157
RfKill:

StagingDrivers: r8188eu
Tags: xenial staging
Uname: Linux 4.4.0-21-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
_MarkForUpload: True
dmi.bios.date: 07/26/2010
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: 0804
dmi.board.asset.tag: To Be Filled By O.E.M.
dmi.board.name: M2N68-AM SE
dmi.board.vendor: ASUSTeK Computer INC.
dmi.board.version: Rev X.0x
dmi.chassis.asset.tag: Asset-1234567890
dmi.chassis.type: 3
dmi.chassis.vendor: Chassis Manufacture
dmi.chassis.version: Chassis Version
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr0804:bd07/26/2010:svnSystemmanufacturer:pnSystemProductName:pvrSystemVersion:rvnASUSTeKComputerINC.:rnM2N68-AMSE:rvrRevX.0x:cvnChassisManufacture:ct3:cvrChassisVersion:
dmi.product.name: System Product Name
dmi.product.version: System Version
dmi.sys.vendor: System manufacturer

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1591293/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1591293

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
FerVira (vira)
summary: - 16.10 kernel fails to boot from SSD on ASUS M2N68-AM SE
+ kernel fails to boot from SSD on non AHCI chipset (NCQ unsupported)
FerVira (vira) wrote : AlsaInfo.txt

apport information

tags: added: apport-collected staging
description: updated
FerVira (vira) wrote : CRDA.txt

apport information

FerVira (vira) wrote : CurrentDmesg.txt

apport information

FerVira (vira) wrote : IwConfig.txt

apport information

FerVira (vira) wrote : JournalErrors.txt

apport information

FerVira (vira) wrote : Lspci.txt

apport information

FerVira (vira) wrote : ProcCpuinfo.txt

apport information

FerVira (vira) wrote : ProcEnviron.txt

apport information

FerVira (vira) wrote : ProcInterrupts.txt

apport information

FerVira (vira) wrote : ProcModules.txt

apport information

FerVira (vira) wrote : PulseList.txt

apport information

FerVira (vira) wrote : UdevDb.txt

apport information

FerVira (vira) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
FerVira (vira) wrote :

NOTE: The apport reports were generated after booting the system in recovery mode with 32-bit transfers disabled.

description: updated
FerVira (vira)
description: updated
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.7-rc1 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.7-rc3-yakkety/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
FerVira (vira)
tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
FerVira (vira) wrote :

I tested the kernel Joseph suggested:
linux-image-4.7.0-040700rc1-lowlatency_4.7.0-040700rc1.201606100619_amd64.deb

Bug is still there.

I reset the PC as soon as the kernel took too long to boot and began reporting that NCQ commands were failing, with errors similar to those from the xenial kernel, such as:
equipment kernel: [ 64.970612] ata4.00: failed command: READ FPDMA QUEUED
equipment kernel: [ 64.970638] ata4.00: status: { DRDY }
equipment kernel: [ 64.994812] ata4: hard resetting link

After the reset and a reboot with libata.force=noncq, nothing was logged to syslog. I didn't want to sit through a fifteen-minute boot with a failing upstream kernel built for yakkety on my xenial installation, fearing it could corrupt data. I have already spent too much time installing and configuring xenial.
I hope this helps anyway.
BTW, can't you just check whether something has been done in upstream kernels to correct this behaviour?
Thanks for your work.
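
One way to confirm on a running system that the workaround took effect, offered as a sketch rather than an official procedure: compare the kernel command line with the queue depth the disk reports. The helper below takes the text of /proc/cmdline and of /sys/block/sda/device/queue_depth (sda being the device name shown in the logs above) and treats a depth of 1 as "no queuing". The function name and the sample queue depth are assumptions; the report itself does not include that value.

```python
def ncq_status(cmdline, queue_depth):
    """Summarize NCQ state from the text of /proc/cmdline and queue_depth."""
    forced = []
    for param in cmdline.split():
        if param.startswith("libata.force="):
            # Values are comma-separated and may carry a "PORT[.DEVICE]:" prefix.
            values = param.split("=", 1)[1].split(",")
            forced += [v.rsplit(":", 1)[-1] for v in values]
    depth = int(queue_depth.strip())
    return {
        "noncq_requested": "noncq" in forced,
        "queue_depth": depth,
        # A depth of 1 means commands are issued one at a time.
        "ncq_in_use": depth > 1,
    }


# Sample inputs: the report's command line plus the workaround, and an
# ASSUMED queue depth of 1 (not taken from the report).
sample_cmdline = (
    "BOOT_IMAGE=/vmlinuz-4.4.0-21-generic "
    "root=UUID=88b24888-31fd-4a8e-98cd-f7e34b2bcc39 ro quiet splash "
    "libata.force=noncq"
)
print(ncq_status(sample_cmdline, "1\n"))
```

On the affected machine the two arguments would come from reading those two files; a result with noncq_requested true but ncq_in_use also true would suggest the parameter was not applied to this disk.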

penalvch (penalvch)
tags: added: latest-bios-0804
removed: ahci ncq
tags: added: kernel-bug-exists-upstream-4.7-rc1