Comment 8 for bug 1363462

Jason M. (jason-ubuntu) wrote :

Hi @Joseph,

Good news and bad news.

Good news: the Crucial 1TB M550s were recognized and queued TRIM support was disabled:
  [ 3.661992] ata4.00: disabling queued TRIM support
  [ 3.661993] ata4.00: ATA-9: Crucial_CT1024M550SSD1, MU01, max UDMA/133
  [ 3.661994] ata4.00: 2000409264 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
  [ 3.662337] ata3.00: disabling queued TRIM support
  [ 3.662337] ata3.00: ATA-9: Crucial_CT1024M550SSD1, MU01, max UDMA/133
  [ 3.662338] ata3.00: 2000409264 sectors, multi 16: LBA48 NCQ (depth 31/32), AA

Bad news: disabling queued TRIM did not stop the "failed command: WRITE FPDMA QUEUED" errors, e.g.
  [ 12.098247] ata4.00: exception Emask 0x10 SAct 0x60000001 SErr 0x400100 action 0x6 frozen
  [ 12.098266] ata4.00: irq_stat 0x08000000, interface fatal error
  [ 12.098280] ata4: SError: { UnrecovData Handshk }
  [ 12.098291] ata4.00: failed command: WRITE FPDMA QUEUED
  [ 12.098304] ata4.00: cmd 61/00:00:48:23:01/04:00:03:00:00/40 tag 0 ncq 524288 out
  [ 12.098304] res 40/00:f4:48:1f:01/00:00:03:00:00/40 Emask 0x10 (ATA bus error)
  [ 12.098346] ata4.00: status: { DRDY }
  [ 12.098355] ata4.00: failed command: WRITE FPDMA QUEUED
  [ 12.098368] ata4.00: cmd 61/00:e8:48:1b:01/04:00:03:00:00/40 tag 29 ncq 524288 out
  [ 12.098368] res 40/00:f4:48:1f:01/00:00:03:00:00/40 Emask 0x10 (ATA bus error)
  [ 12.098409] ata4.00: status: { DRDY }
  [ 12.098417] ata4.00: failed command: WRITE FPDMA QUEUED
  [ 12.098429] ata4.00: cmd 61/00:f0:48:1f:01/04:00:03:00:00/40 tag 30 ncq 524288 out
  [ 12.098429] res 40/00:f4:48:1f:01/00:00:03:00:00/40 Emask 0x10 (ATA bus error)
  [ 12.098466] ata4.00: status: { DRDY }
  [ 12.098476] ata4: hard resetting link

A RAID1 scrub afterwards reported a repair on ata4, so these resets caused some data issue(s).
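To get a feel for how often each port is affected, the errors can be tallied from the kernel log. This is only a minimal sketch, assuming log lines in the same format as the excerpt above; the function name summarize_ata_errors is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   [ 12.098291] ata4.00: failed command: WRITE FPDMA QUEUED
FAILED_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: failed command: (.+?)\s*$")
# Matches lines such as:
#   [ 12.098476] ata4: hard resetting link
RESET_RE = re.compile(r"\b(ata\d+): hard resetting link")


def summarize_ata_errors(lines):
    """Count failed commands per (port, command) and hard resets per port.

    `lines` is any iterable of kernel log lines (hypothetical helper,
    not part of any existing tool).
    """
    failed = Counter()
    resets = Counter()
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            failed[(m.group(1), m.group(2))] += 1
            continue
        m = RESET_RE.search(line)
        if m:
            resets[m.group(1)] += 1
    return failed, resets
```

It could be fed from a saved log, e.g. `summarize_ata_errors(open("dmesg.txt"))`, where dmesg.txt is whatever file the `dmesg` output was written to.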

So far, only "libata.force=X.YY:noncq" (where X.YY is the port and device, e.g. 4.00 for ata4.00) seems to fix the problem. After reading quite a few articles, I'm beginning to think there are some pretty serious issues in the kernel regarding SSD and controller interactions. Many, many folks are having trouble, and the consistent response is to disable NCQ, although there may also be some interaction between MSI and SSDs, at least as indicated by
  - http://article.gmane.org/gmane.linux.ide/58740
  - https://bugzilla.kernel.org/show_bug.cgi?id=60731
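For reference, the workaround parameter for both drives can be built from the device IDs that libata prints in dmesg. A rough sketch, assuming IDs of the form ata3.00 / ata4.00 as in the logs above; the helper name libata_force_noncq is hypothetical.

```python
import re

# libata device IDs as printed in the kernel log, e.g. "ata4.00"
ATA_ID_RE = re.compile(r"^ata(\d+)\.(\d+)$")


def libata_force_noncq(device_ids):
    """Build a libata.force= kernel parameter that disables NCQ
    for each given device (hypothetical helper for illustration)."""
    entries = []
    for dev in device_ids:
        m = ATA_ID_RE.match(dev)
        if m is None:
            raise ValueError(f"not an ATA device ID: {dev!r}")
        port, device = m.groups()
        # Each entry has the form PORT.DEVICE:VAL
        entries.append(f"{port}.{device}:noncq")
    # Multiple entries are joined into one comma-separated list
    return "libata.force=" + ",".join(entries)


print(libata_force_noncq(["ata3.00", "ata4.00"]))
# prints: libata.force=3.00:noncq,4.00:noncq
```

On Ubuntu the resulting string would normally be appended to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub and a reboot.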

What surprises me is that so many vendors are providing SSD-backed VMs based on Linux, and yet it's mostly individuals who are seeing these problems with common distros, e.g. Ubuntu, Fedora, CentOS. Is there some major disconnect here?