ata NCQ error with md, drbd and high load

Bug #1516269 reported by Armin Schindler
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Low
Assigned to: Unassigned

Bug Description

With the Ubuntu wily kernel 4.2.0-18-generic, a resync as SyncTarget to a DRBD device on LVM on MD RAID1 produces these errors:

Nov 14 17:46:20 kobol kernel: [ 198.351892] ata1.00: exception Emask 0x60 SAct 0x38 SErr 0x800 action 0x6 frozen
Nov 14 17:46:20 kobol kernel: [ 198.351922] ata1.00: irq_stat 0x20000000, host bus error
Nov 14 17:46:20 kobol kernel: [ 198.351935] ata1: SError: { HostInt }
Nov 14 17:46:20 kobol kernel: [ 198.351944] ata1.00: failed command: WRITE FPDMA QUEUED
Nov 14 17:46:20 kobol kernel: [ 198.351957] ata1.00: cmd 61/48:18:00:d8:95/05:00:07:00:00/40 tag 3 ncq 692224 out
Nov 14 17:46:20 kobol kernel: [ 198.351957] res 40/00:20:40:e1:95/00:00:07:00:00/40 Emask 0x60 (host bus error)
Nov 14 17:46:20 kobol kernel: [ 198.351988] ata1.00: status: { DRDY }
Nov 14 17:46:20 kobol kernel: [ 198.351997] ata1.00: failed command: WRITE FPDMA QUEUED
Nov 14 17:46:20 kobol kernel: [ 198.352009] ata1.00: cmd 61/40:20:40:e1:95/00:00:07:00:00/40 tag 4 ncq 32768 out
Nov 14 17:46:20 kobol kernel: [ 198.352009] res 40/00:20:40:e1:95/00:00:07:00:00/40 Emask 0x60 (host bus error)
Nov 14 17:46:20 kobol kernel: [ 198.352039] ata1.00: status: { DRDY }
Nov 14 17:46:20 kobol kernel: [ 198.352047] ata1.00: failed command: WRITE FPDMA QUEUED
Nov 14 17:46:20 kobol kernel: [ 198.352059] ata1.00: cmd 61/80:28:80:e1:95/00:00:07:00:00/40 tag 5 ncq 65536 out
Nov 14 17:46:20 kobol kernel: [ 198.352059] res 40/00:20:40:e1:95/00:00:07:00:00/40 Emask 0x60 (host bus error)
Nov 14 17:46:20 kobol kernel: [ 198.352090] ata1.00: status: { DRDY }
Nov 14 17:46:20 kobol kernel: [ 198.352099] ata1: hard resetting link
Nov 14 17:46:20 kobol kernel: [ 198.352106] ata2.00: exception Emask 0x60 SAct 0x70000 SErr 0x800 action 0x6 frozen
Nov 14 17:46:20 kobol kernel: [ 198.352133] ata2.00: irq_stat 0x20000000, host bus error
Nov 14 17:46:20 kobol kernel: [ 198.352145] ata2: SError: { HostInt }
Nov 14 17:46:20 kobol kernel: [ 198.352154] ata2.00: failed command: WRITE FPDMA QUEUED
Nov 14 17:46:20 kobol kernel: [ 198.352166] ata2.00: cmd 61/48:80:00:d8:95/05:00:07:00:00/40 tag 16 ncq 692224 out
Nov 14 17:46:20 kobol kernel: [ 198.352166] res 40/00:88:40:e1:95/00:00:07:00:00/40 Emask 0x60 (host bus error)
Nov 14 17:46:20 kobol kernel: [ 198.352196] ata2.00: status: { DRDY }
Nov 14 17:46:20 kobol kernel: [ 198.352204] ata2.00: failed command: WRITE FPDMA QUEUED
Nov 14 17:46:20 kobol kernel: [ 198.352217] ata2.00: cmd 61/40:88:40:e1:95/00:00:07:00:00/40 tag 17 ncq 32768 out
Nov 14 17:46:20 kobol kernel: [ 198.352217] res 40/00:88:40:e1:95/00:00:07:00:00/40 Emask 0x60 (host bus error)
Nov 14 17:46:20 kobol kernel: [ 198.352246] ata2.00: status: { DRDY }
Nov 14 17:46:20 kobol kernel: [ 198.352254] ata2.00: failed command: WRITE FPDMA QUEUED
Nov 14 17:46:20 kobol kernel: [ 198.352266] ata2.00: cmd 61/80:90:80:e1:95/00:00:07:00:00/40 tag 18 ncq 65536 out
Nov 14 17:46:20 kobol kernel: [ 198.352266] res 40/00:88:40:e1:95/00:00:07:00:00/40 Emask 0x60 (host bus error)
Nov 14 17:46:20 kobol kernel: [ 198.352295] ata2.00: status: { DRDY }
Nov 14 17:46:20 kobol kernel: [ 198.352304] ata2: hard resetting link
Nov 14 17:46:21 kobol kernel: [ 198.671678] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Nov 14 17:46:21 kobol kernel: [ 198.673618] ata2.00: configured for UDMA/133
Nov 14 17:46:21 kobol kernel: [ 198.673628] ata2: EH complete
Nov 14 17:46:21 kobol kernel: [ 198.727651] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Nov 14 17:46:21 kobol kernel: [ 198.735336] ata1.00: configured for UDMA/133
Nov 14 17:46:21 kobol kernel: [ 198.735343] ata1: EH complete
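
For readers decoding the log above, here is a minimal sketch in POSIX sh. It assumes the usual libata taskfile layout of the "cmd" field (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and 512-byte sectors; the function names sact_tags and ncq_cmd_decode are hypothetical helpers, not part of any existing tool. For an NCQ command the sector count sits in the feature registers and the tag in bits 7:3 of nsect, which is why SAct 0x38 corresponds to tags 3, 4 and 5.

```shell
# Print the NCQ tags whose bits are set in an SActive (SAct) bitmask.
sact_tags() {
    mask=$(( $1 ))
    tag=0
    out=""
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$tag"
        fi
        mask=$(( mask >> 1 ))
        tag=$(( tag + 1 ))
    done
    echo "$out"
}

# Decode the "cmd" field of a failed READ/WRITE FPDMA QUEUED command.
# Assumption: 512-byte logical sectors (consistent with the byte counts in the log).
ncq_cmd_decode() {
    old_ifs=$IFS
    IFS='/:'
    set -- $1          # split the field into its twelve hex bytes
    IFS=$old_ifs
    # $2=feature $3=nsect $4..$6=lbal,lbam,lbah
    # $7=hob_feature $9..${11}=hob_lbal,hob_lbam,hob_lbah
    tag=$(( 0x$3 >> 3 ))
    sectors=$(( (0x$7 << 8) | 0x$2 ))
    lba=$(( 0x${11}${10}${9}${6}${5}${4} ))
    echo "tag=$tag sectors=$sectors bytes=$(( sectors * 512 )) lba=$lba"
}

sact_tags 0x38
sact_tags 0x70000
ncq_cmd_decode 61/48:18:00:d8:95/05:00:07:00:00/40
```

Run against the first failed command on ata1, this reproduces the "tag 3 ncq 692224" values the kernel printed, so both ports failed the same three writes, as would be expected for the two halves of a RAID1 mirror.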

The error didn't show up before the update from kernel 4.2.0-16-generic.
Disabling NCQ with
 echo 1 > /sys/block/sda/device/queue_depth
is a workaround.
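
Since the log shows errors on both ata1 and ata2, the workaround presumably has to be applied to each disk in the mirror. A minimal sketch, assuming the second disk is sdb (the report only names sda) and using a hypothetical helper set_queue_depth; the SYSFS_ROOT variable is also an assumption, added only so the function can be tried against a scratch directory:

```shell
# Write a queue depth to each named disk; a depth of 1 effectively disables NCQ.
# SYSFS_ROOT defaults to /sys and is overridable for dry runs (assumption).
set_queue_depth() {
    depth=$1
    shift
    for dev in "$@"; do
        path="${SYSFS_ROOT:-/sys}/block/$dev/device/queue_depth"
        if [ -w "$path" ]; then
            echo "$depth" > "$path"
            echo "$dev: queue_depth=$(cat "$path")"
        else
            echo "$dev: cannot write $path" >&2
        fi
    done
}

# Example (as root); sdb is an assumed device name:
#   set_queue_depth 1 sda sdb
```

The setting does not survive a reboot, so it would have to be reapplied at boot time.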

Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1516269

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Armin Schindler (armin-melware) wrote :

Cannot run apport-collect (error on headless server).

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Armin Schindler (armin-melware) wrote :

After checking with kernel 4.2.0-16-generic again, I got the error there too.
I have another server (same hardware) still running 4.2.0-16-generic (not updated yet) where the error doesn't show up, so I assumed the kernel update had introduced the problem. The server's hardware was completely exchanged, but the problem persists.

penalvch (penalvch) wrote :

Armin Schindler, could you please boot into a live environment via http://cdimage.ubuntu.com/daily-live/current/ and then run the apport-collect as requested in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1516269/comments/1 ?

Changed in linux (Ubuntu):
importance: Undecided → Low
status: Confirmed → Incomplete
Armin Schindler (armin-melware) wrote :

It is a headless server in a datacenter with remote access only. Is there another way to use apport-collect?

penalvch (penalvch) wrote :
Revision history for this message
Armin Schindler (armin-melware) wrote :

The ReportingBugs documentation says I should send the apport file via
 ubuntu-bug -c <apport_file.extension> -u <bug number>
But this gives me:
Usage: ubuntu-bug [options] [symptom|pid|package|program path|.apport/.crash file]
ubuntu-bug: error: -u/--update-bug option cannot be used together with options for a new report

Shall I just add the .apport file here as an attachment to this bug report?

penalvch (penalvch) wrote :

Armin Schindler, if you typed this literally:
ubuntu-bug -c <apport_file.extension> -u <bug number>

that wouldn't work, as the placeholders in angle brackets are meant to be replaced with the applicable information. For your report, it would be:

apport-cli -u 1516269

Armin Schindler (armin-melware) wrote :

When I use
  apport-cli -u 1516269
it returns:
ERROR: You need to use apport-collect for updating an existing bug

penalvch (penalvch)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed