Random file corruption when writing large files to hba mounted volume

Bug #1585668 reported by Rainier Ramos
This bug affects 2 people
Affects          Status    Importance   Assigned to   Milestone
linux (Ubuntu)   Expired   High         Unassigned

Bug Description

We have an Ubuntu 16.04 LTS server with a 3Par volume attached via a QLogic HBA. When the volume is mounted, writing large files to it repeatedly produces the following kernel errors:

[435753.554230] sd 5:0:0:1: [sde] tag#10 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[435753.554232] sd 5:0:0:1: [sde] tag#10 Sense Key : Illegal Request [current]
[435753.554233] sd 5:0:0:1: [sde] tag#10 Add. Sense: Invalid field in cdb
[435753.554235] sd 5:0:0:1: [sde] tag#10 CDB: Write same(16) 93 08 00 00 00 00 04 7f ff f7 00 7f ff ff 00 00
[435753.554236] blk_update_request: critical target error, dev sde, sector 75497463
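
For reference, the CDB in the log above can be decoded by hand. The following is a minimal sketch, not part of the original report: decode_write_same16 is a hypothetical helper name, and the byte layout assumed is the standard WRITE SAME(16) one (opcode in byte 0, UNMAP flag in bit 3 of byte 1, LBA in bytes 2-9, number of blocks in bytes 10-13).

```shell
#!/bin/sh
# Decode a WRITE SAME(16) CDB given as one string of 16 hex bytes.
# decode_write_same16 is a hypothetical helper, not an existing tool.
decode_write_same16() (
    set -f            # no globbing while the string is split into words
    set -- $1         # $1..$16 now hold the individual bytes
    if [ "$#" -ne 16 ]; then
        echo "expected 16 bytes, got $#" >&2
        exit 1
    fi
    # Byte 1, bit 3 (mask 0x08) is the UNMAP flag.
    unmap=$(( (0x$2 >> 3) & 1 ))
    # Bytes 2-9: 64-bit big-endian logical block address.
    lba=$(( 0x$3$4$5$6$7$8$9${10} ))
    # Bytes 10-13: 32-bit big-endian number of logical blocks.
    blocks=$(( 0x${11}${12}${13}${14} ))
    echo "opcode=0x$1 unmap=$unmap lba=$lba blocks=$blocks"
)

decode_write_same16 "93 08 00 00 00 00 04 7f ff f7 00 7f ff ff 00 00"
# prints: opcode=0x93 unmap=1 lba=75497463 blocks=8388607
```

The decoded LBA equals the sector number in the blk_update_request line, and the rejected request is a WRITE SAME with the UNMAP bit set covering 8388607 blocks.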

We see similar errors with both XFS and ext4; ZFS seems to work fine, though. Steps to reproduce:

# create a large file
$ sudo dd if=/dev/urandom of=/random_file.bin bs=1048576 count=800

# 3Par volume is /dev/sde
$ sudo mkfs.xfs /dev/sde

# mount to /test
$ sudo mount /dev/sde /test

# watch syslog
$ tail -f /var/log/syslog &

# copy the file to the volume; the errors appear during this step
$ sudo cp /random_file.bin /test
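
To check whether a copy really ended up corrupted, and not only whether the kernel logged errors, the source and the copy can be compared byte for byte. This is a minimal sketch that was not part of the original report; verify_copy is a hypothetical helper name, and the paths in the comment are the ones from the steps above.

```shell
#!/bin/sh
# Compare a source file with its copy; return 0 only if they are identical.
# verify_copy is a hypothetical helper name used for illustration.
verify_copy() {
    if cmp -s "$1" "$2"; then
        echo "OK: $2 matches $1"
    else
        echo "MISMATCH: $2 differs from $1 (or a file is unreadable)" >&2
        return 1
    fi
}

# Example, using the paths from the reproduction steps:
#   verify_copy /random_file.bin /test/random_file.bin
```

Unmounting and remounting the volume before comparing makes it more likely that the copy is read back from the device rather than from the page cache.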

WORKAROUND: sudo sh -c 'echo 16384 > /sys/block/sde/queue/max_sectors_kb'

We made this permanent by adding a udev rule:
$ cat /etc/udev/rules.d/61-block-max-sectors.rules
ACTION=="add|change", KERNEL=="sd[a-z]", SUBSYSTEM=="block", DRIVERS=="qla2xxx", RUN+="/bin/sh -c '/bin/echo 16384 > /sys/block/%k/queue/max_sectors_kb'"

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-22-generic 4.4.0-22.40
ProcVersionSignature: Ubuntu 4.4.0-22.40-generic 4.4.8
Uname: Linux 4.4.0-22-generic x86_64
NonfreeKernelModules: zfs zunicode zcommon znvpair zavl
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 May 24 12:30 seq
 crw-rw---- 1 root audio 116, 33 May 24 12:30 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Wed May 25 10:18:33 2016
HibernationDevice: RESUME=UUID=8e53840e-3198-4383-ac8b-0a811ed2d0b0
InstallationDate: Installed on 2016-05-19 (5 days ago)
InstallationMedia: Ubuntu-Server 16.04 LTS "Xenial Xerus" - Release amd64 (20160420.3)
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 002 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 0424:2514 Standard Microsystems Corp. USB 2.0 Hub
 Bus 001 Device 002: ID 8087:0020 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. PowerEdge R310
PciMultimedia:

ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-4.4.0-22-generic root=UUID=459ae48a-619a-4e7e-8001-bbdf1968a944 ro
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-22-generic N/A
 linux-backports-modules-4.4.0-22-generic N/A
 linux-firmware 1.157
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/17/2011
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.8.2
dmi.board.name: 05XKKK
dmi.board.vendor: Dell Inc.
dmi.board.version: A05
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.8.2:bd08/17/2011:svnDellInc.:pnPowerEdgeR310:pvr:rvnDellInc.:rn05XKKK:rvrA05:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R310
dmi.sys.vendor: Dell Inc.

Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Rainier Ramos (rkramos) wrote :

The issue seems to be that the current default for max_sectors_kb is too high for our setup:

$ cat /sys/block/sde/queue/max_sectors_kb
32767

Lowering it to 16384, roughly half of the default, resolved our issue:

$ sudo sh -c 'echo 16384 > /sys/block/sde/queue/max_sectors_kb'

We made this permanent by adding a udev rule:

$ cat /etc/udev/rules.d/61-block-max-sectors.rules
ACTION=="add|change", KERNEL=="sd[a-z]", SUBSYSTEM=="block", DRIVERS=="qla2xxx", RUN+="/bin/sh -c '/bin/echo 16384 > /sys/block/%k/queue/max_sectors_kb'"
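
To confirm that the setting took effect, the queue limits can be read back from sysfs. The sketch below was not part of the original comment; show_queue_limits is a hypothetical helper name, and its optional second argument (an alternative sysfs root) is an assumption added so the function can be tried against a test directory. max_hw_sectors_kb is the read-only upper bound reported for the device, and max_sectors_kb cannot be set above it.

```shell
#!/bin/sh
# Print max_sectors_kb and max_hw_sectors_kb for one block device.
# show_queue_limits is a hypothetical helper; $2 optionally overrides the
# sysfs root (default /sys), which is an assumption made for testability.
show_queue_limits() (
    dev=$1
    queue_dir="${2:-/sys}/block/$dev/queue"
    for attr in max_sectors_kb max_hw_sectors_kb; do
        if [ -r "$queue_dir/$attr" ]; then
            echo "$dev $attr=$(cat "$queue_dir/$attr")"
        else
            echo "$dev $attr=unavailable"
        fi
    done
)

# Example: show_queue_limits sde
```

Running it for each sd device after a reboot shows whether the udev rule was applied.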

Changed in linux (Ubuntu):
importance: Undecided → High
penalvch (penalvch) wrote :

Rainier Ramos, thank you for reporting this and helping make Ubuntu better.

In order to allow additional upstream developers to examine the issue, at your earliest convenience, could you please test the latest upstream kernel available from http://kernel.ubuntu.com/~kernel-ppa/mainline/?C=N;O=D ? Please keep in mind the following:
1) The one to test is on the very top line of the page (not the daily folder).
2) The release names are irrelevant.
3) The folder time stamps aren't indicative of when the kernel actually was released upstream.
4) Install instructions are available at https://wiki.ubuntu.com/Kernel/MainlineBuilds .

If testing on your main install would be inconvenient, one may:
1) Install Ubuntu to a different partition and then test this there.
2) Back up or clone the primary install.

If the latest kernel did not allow you to test for the issue (e.g., you couldn't boot into the OS), please make a comment in your report about this, and continue to test the next most recent kernel version until you can test for the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this issue is fixed in the mainline kernel, please add the following tags by clicking on the yellow circle with a black pencil icon, next to the word Tags, located at the bottom of the report description:
kernel-fixed-upstream
kernel-fixed-upstream-X.Y-rcZ

Here X and Y are the first two numbers of the kernel version, and Z is the release candidate number, if it exists.

If the mainline kernel does not fix the issue, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-X.Y-rcZ

Please note, a failure to install the kernel does not meet the criteria for kernel-bug-exists-upstream.

Also, you don't need to apport-collect further unless specifically requested to do so.

Once testing of the latest upstream kernel is complete, please mark this report Status Confirmed. Please let us know your results.

Thank you for your understanding.

tags: added: bios-outdated-1.12.0
description: updated
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-da-key
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired