htree_dirblock_to_tree:920: inode #53629599: block 214443464: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1667681412, rec_len=45654, name_len=39

Bug #1259829 reported by Ritesh Khadgaray on 2013-12-11
This bug affects 5 people
Affects: linux (Ubuntu)
Importance: Low
Assigned to: Unassigned

Bug Description

The filesystem goes into read-only mode while building LibreOffice.

WORKAROUND: disable the discard mount option. The affected /etc/fstab entry was:
/dev/mapper/volumegroup-root / ext4 discard,noatime,nodiratime,errors=remount-ro 0 1
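The workaround amounts to removing `discard` from the fourth (mount options) field of that fstab entry. A minimal sketch, assuming a standard whitespace-separated fstab line; the helper name `drop_mount_option` is hypothetical, and falling back to `defaults` when no options remain is a choice of this sketch:

```python
def drop_mount_option(fstab_line: str, option: str = "discard") -> str:
    """Return the fstab line with `option` removed from its options field.

    Comments, blank lines and lines with fewer than four fields are returned
    unchanged. Whitespace between fields is normalised to a single space.
    """
    stripped = fstab_line.strip()
    if not stripped or stripped.startswith("#"):
        return fstab_line
    fields = stripped.split()
    if len(fields) < 4:
        return fstab_line
    # Field 4 is the comma-separated list of mount options.
    opts = [o for o in fields[3].split(",") if o != option]
    # An empty options field is not valid in fstab, so use "defaults" instead.
    fields[3] = ",".join(opts) if opts else "defaults"
    return " ".join(fields)


if __name__ == "__main__":
    line = ("/dev/mapper/volumegroup-root / ext4 "
            "discard,noatime,nodiratime,errors=remount-ro 0 1")
    print(drop_mount_option(line))
```

The change only takes effect on the next mount; the currently mounted filesystem keeps its options until it is remounted.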

Disabling NCQ has no effect.

$ dmesg
...
[ 2045.473249] virbr0: port 1(vnet0) entered forwarding state
[ 2045.473283] IPv6: ADDRCONF(NETDEV_CHANGE): virbr0: link becomes ready
[10660.961381] perf samples too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[11822.935891] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629599: block 214443464: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1667681412, rec_len=45654, name_len=39
[11822.935896] Aborting journal on device dm-1-8.
[11822.935998] EXT4-fs (dm-1): Remounting filesystem read-only
[11822.960425] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629605: block 214443466: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=2707156714, rec_len=19312, name_len=162
[11850.985003] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629557: block 214443458: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=512948573, rec_len=8858, name_len=176
[11850.985276] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629465: block 214443455: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=160939375, rec_len=26085, name_len=126
[11850.985499] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629325: block 214443451: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2322664969, rec_len=33791, name_len=132
[11850.985927] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629467: block 214443456: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=954332768, rec_len=21653, name_len=30
[11850.986409] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629074: block 214443433: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=2061605548, rec_len=4984, name_len=3
[11850.986831] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53628835: block 214443432: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=3523041938, rec_len=53167, name_len=41
[11850.987001] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629098: block 214443436: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=4225920287, rec_len=35138, name_len=75
[11850.987466] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629275: block 214443449: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=923253145, rec_len=44001, name_len=144
[11850.988115] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629270: block 214443448: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1892288796, rec_len=55247, name_len=58
[11851.042303] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629300: block 214443450: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=1809316884, rec_len=4208, name_len=195
[11851.042938] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629401: block 214443453: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=4223616103, rec_len=36326, name_len=130
[11851.045745] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629406: block 214443454: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=415237227, rec_len=24702, name_len=59
[11851.125849] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629101: block 214443437: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=23292377, rec_len=28820, name_len=135
[11851.126086] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629543: block 214443457: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2624043515, rec_len=58633, name_len=250
[11851.126292] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629244: block 214443446: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=194839831, rec_len=15430, name_len=123
[11851.126529] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629345: block 214443452: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=3316601052, rec_len=49124, name_len=10
[11851.126751] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629080: block 214443434: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=2978861680, rec_len=55340, name_len=75
[11851.127326] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629254: block 214443447: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=2927900409, rec_len=17708, name_len=67
[11851.191574] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53628436: block 214443431: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=3175406590, rec_len=48803, name_len=238
[11851.191832] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629561: block 214443459: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=880872205, rec_len=23750, name_len=202
[11851.192102] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629565: block 214443460: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=3673769297, rec_len=36964, name_len=8
[11851.193231] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629577: block 214443462: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=164744460, rec_len=27548, name_len=230
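Every `EXT4-fs error` above is raised by ext4's directory-entry sanity check (`__ext4_check_dir_entry()` in the kernel), and each one fails either the "rec_len % 4 != 0" test or the "directory entry across range" test at offset 0. The sketch below is a simplified Python restatement of those checks, not the kernel code itself: it assumes a 4096-byte directory block (the log does not state the block size) and leaves out the inode-number bound check; the function names are hypothetical.

```python
from typing import Optional

EXT4_DIR_PAD = 4
EXT4_DIR_ROUND = EXT4_DIR_PAD - 1


def dir_rec_len(name_len: int) -> int:
    """Smallest valid rec_len for a name: 8-byte header + name, rounded up to 4."""
    return (name_len + 8 + EXT4_DIR_ROUND) & ~EXT4_DIR_ROUND


def check_dir_entry(offset: int, rec_len: int, name_len: int,
                    block_size: int = 4096) -> Optional[str]:
    """Return the ext4 complaint for a directory entry, or None if it looks sane.

    Simplified: the real check also rejects inode numbers larger than the
    filesystem's inode count, which is not modelled here.
    """
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across range"
    return None


if __name__ == "__main__":
    # (offset, rec_len, name_len) taken from three of the log lines above.
    for entry in [(0, 45654, 39), (0, 19312, 162), (0, 4984, 3)]:
        print(entry, "->", check_dir_entry(*entry))
```

Every rec_len in the log is either not a multiple of 4 or far larger than one block, and the inode numbers are equally implausible. That pattern is consistent with whole directory blocks holding unrelated data rather than a single damaged field, though the log alone does not prove it.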
[ritesh@x230t libreoffice-4.1.2~rc3]$ ^C
[ritesh@x230t libreoffice-4.1.2~rc3]$ smartctl -a /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.12.0-7-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

[ritesh@x230t libreoffice-4.1.2~rc3]$ sudo smartctl -a /dev/sda
sudo: unable to open /var/lib/sudo/ritesh/4: No such file or directory
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.12.0-7-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: Crucial_CT960M500SSD1
Serial Number: 1335094BE7CA
LU WWN Device Id: 5 00a075 1094be7ca
Firmware Version: MU03
User Capacity: 960,197,124,096 bytes [960 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Dec 11 11:38:48 2013 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
     was never started.
     Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
     without error or no self-test has ever
     been run.
Total time to complete Offline
data collection: ( 4470) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
     Auto Offline data collection on/off support.
     Suspend Offline collection upon new
     command.
     Offline surface scan supported.
     Self-test supported.
     Conveyance Self-test supported.
     Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
     power-saving mode.
     Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
     General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 74) minutes.
Conveyance self-test routine
recommended polling time: ( 3) minutes.
SCT capabilities: (0x0035) SCT Status supported.
     SCT Feature Control supported.
     SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate 0x002f 100 100 000 Pre-fail Always - 21
  5 Reallocated_Sector_Ct 0x0033 100 100 000 Pre-fail Always - 0
  9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 287
 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 32
171 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
172 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 0
173 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 1
174 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 13
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033 000 000 000 Pre-fail Always - 16523
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 056 050 000 Old_age Always - 44 (Min/Max 22/50)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 16
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 100 100 000 Old_age Always - 0
202 Unknown_SSD_Attribute 0x0031 100 100 000 Pre-fail Offline - 0
206 Unknown_SSD_Attribute 0x000e 100 100 000 Old_age Always - 0
210 Unknown_Attribute 0x0032 100 100 000 Old_age Always - 4
246 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 2001841864
247 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 63823477
248 Unknown_Attribute 0x0032 100 100 --- Old_age Always - 84291328

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Vendor (0xff) Completed without error 00% 279 -
# 2 Extended offline Completed without error 00% 273 -
# 3 Vendor (0xff) Completed without error 00% 90 -
# 4 Vendor (0xff) Completed without error 00% 5 -
# 5 Short offline Aborted by host 00% 2 -
# 6 Vendor (0xff) Completed without error 00% 2 -

SMART Selective self-test log data structure revision number 1
 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
    1 0 0 Not_testing
    2 0 0 Not_testing
    3 0 0 Not_testing
    4 0 0 Not_testing
    5 0 0 Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[ritesh@x230t libreoffice-4.1.2~rc3]$ uname -a
Linux x230t 3.12.0-7-generic #15-Ubuntu SMP Sun Dec 8 23:39:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ mount
/dev/mapper/ubuntu--vg-root on / type ext4 (rw,noatime,errors=remount-ro,discard)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/cgroup type tmpfs (rw)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
none on /sys/firmware/efi/efivars type efivarfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /tmp type tmpfs (rw,noatime,mode=1777)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
none on /sys/fs/pstore type pstore (rw)
tmpfs on /var/tmp type tmpfs (rw,noatime,mode=1777)
tmpfs on /var/log type tmpfs (rw,noatime,mode=0755)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,relatime,hugetlb)
/dev/sda2 on /boot type ext2 (rw)
/dev/sda1 on /boot/efi type vfat (rw)
systemd on /sys/fs/cgroup/systemd type cgroup (rw,noexec,nosuid,nodev,none,name=systemd)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,user=ritesh)

fsck and a read-only badblocks scan both report that the disk is clean.
---
ApportVersion: 2.12.7-0ubuntu1
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC1: ritesh 2141 F.... pulseaudio
 /dev/snd/controlC0: ritesh 2141 F.... pulseaudio
CRDA:
 country IN:
  (2402 - 2482 @ 40), (N/A, 20)
  (5170 - 5250 @ 40), (N/A, 20)
  (5250 - 5330 @ 40), (N/A, 20), DFS
  (5735 - 5835 @ 40), (N/A, 20)
CurrentDesktop: Unity
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=UUID=f9f2cec2-98f6-4f05-850a-9f60676c3299
MachineType: ASUSTeK Computer Inc. K43SA
MarkForUpload: True
Package: linux (not installed)
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.12.0-7-generic root=/dev/mapper/ubuntu--vg-root ro radeon.dpm=1 quiet splash
ProcVersionSignature: Ubuntu 3.12.0-7.15-generic 3.12.4
RelatedPackageVersions:
 linux-restricted-modules-3.12.0-7-generic N/A
 linux-backports-modules-3.12.0-7-generic N/A
 linux-firmware 1.117
StagingDrivers: rts5139
Tags: trusty staging
Uname: Linux 3.12.0-7-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip kvm libvirtd lpadmin plugdev sambashare sudo
WifiSyslog:

dmi.bios.date: 11/17/2011
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: K43SA.211
dmi.board.asset.tag: ATN12345678901234567
dmi.board.name: K43SA
dmi.board.vendor: ASUSTeK Computer Inc.
dmi.board.version: 1.0
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK Computer Inc.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrK43SA.211:bd11/17/2011:svnASUSTeKComputerInc.:pnK43SA:pvr1.0:rvnASUSTeKComputerInc.:rnK43SA:rvr1.0:cvnASUSTeKComputerInc.:ct10:cvr1.0:
dmi.product.name: K43SA
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK Computer Inc.
---
ApportVersion: 2.12.7-0ubuntu3
Architecture: amd64
CurrentDesktop: Unity
DistroRelease: Ubuntu 14.04
InstallationDate: Installed on 2013-12-19 (1 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20131021.1)
MarkForUpload: True
Package: linux (not installed)
Tags: trusty
Uname: Linux 3.13.0-999-generic x86_64
UnreportableReason: The running kernel is not an Ubuntu kernel
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1259829

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty

apport information

tags: added: apport-collected staging
description: updated

Ritesh Khadgaray (khadgaray) wrote :

Disabling the "discard" mount option fixed this for me.

Changed in linux (Ubuntu):
status: Incomplete → New

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue occur in a previous version of Ubuntu, or is this a new issue?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.13 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13-rc3-trusty/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Ritesh Khadgaray (khadgaray) wrote :

> Did this issue occur in a previous version of Ubuntu, or is this a new issue?

This is a new disk with the latest firmware (purchased on 25 Nov). The issue first appeared when I tried to build LibreOffice (disk usage during the build grows from 256 MB to 32 GB, then drops to 18 GB).

It worked fine when building Mozilla/Firefox and other packages.

> If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.
I have tested this with the mainline kernel (3.13-rc3) and with 3.12 from Trusty. The issue is reproducible on both with the discard option enabled.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Can you also test the 3.2 final kernel to see if this is a regression or not:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-precise/

tags: added: latest-bios-211 regression-potential
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Ritesh Khadgaray (khadgaray) wrote :

I am unable to reproduce this issue anymore. Another odd thing I noticed: "fstrim -v /" now reports "0 bytes" trimmed on repeated runs. This was not the case earlier.

Ritesh Khadgaray (khadgaray) wrote :

After enabling discard and running fstrim/discard, the system failed to reboot because of a corrupted partition table, so I was unable to test this any further.

The disk seems to be fine, based on a badblocks check, and memtest86 shows a clean system.

Changed in linux (Ubuntu):
status: Incomplete → New
Brad Figg (brad-figg) wrote :

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed

description: updated
Ritesh Khadgaray (khadgaray) wrote :

I saw something interesting with the 3.13 daily-build kernel:
Linux K43SA 3.13.0-999-generic #201312200414 SMP Fri Dec 20 09:16:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

[ 17.017210] psmouse serio4: elantech: assuming hardware version 3 (with firmware version 0x450f01)
[ 17.029210] psmouse serio4: elantech: Synaptics capabilities query result 0x78, 0x15, 0x0c.
[ 17.088588] input: ETPS/2 Elantech Touchpad as /devices/platform/i8042/serio4/input/input12
[ 18.284033] ata1: log page 10h reported inactive tag 0
[ 18.284039] ata1.00: exception Emask 0x1 SAct 0x30 SErr 0x0 action 0x0
[ 18.284040] ata1.00: irq_stat 0x40000008
[ 18.284042] ata1.00: failed command: READ FPDMA QUEUED
[ 18.284046] ata1.00: cmd 60/08:20:00:00:db/00:00:0b:00:00/40 tag 4 ncq 4096 in
[ 18.284046] res 40/00:2c:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
[ 18.284048] ata1.00: status: { DRDY }
[ 18.284050] ata1.00: failed command: SEND FPDMA QUEUED
[ 18.284053] ata1.00: cmd 64/01:28:00:00:00/00:00:00:00:00/a0 tag 5 ncq 512 out
[ 18.284053] res 40/00:2c:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
[ 18.284054] ata1.00: status: { DRDY }
[ 18.284292] ata1.00: supports DRM functions and may not be fully accessible
[ 18.290634] ata1.00: supports DRM functions and may not be fully accessible
[ 18.296726] ata1.00: configured for UDMA/133
[ 18.296729] ata1.00: device reported invalid CHS sector 0
[ 18.296735] sd 0:0:0:0: [sda]
[ 18.296737] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 18.296738] sd 0:0:0:0: [sda]
[ 18.296739] Sense Key : Aborted Command [current] [descriptor]
[ 18.296742] Descriptor sense data with sense descriptors (in hex):
[ 18.296743] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[ 18.296749] 00 00 00 00
[ 18.296751] sd 0:0:0:0: [sda]
[ 18.296753] Add. Sense: No additional sense information
[ 18.296762] sd 0:0:0:0: [sda] CDB:
[ 18.296763] Write same(16): 93 08 00 00 00 00 00 1b 36 00 00 00 00 b0 00 00
[ 18.296770] end_request: I/O error, dev sda, sector 1783296
[ 18.296777] EXT4-fs (dm-1): discard request in group:1 block:1984 count:22 failed with -5
[ 18.296778] ata1: EH complete
[ 18.667971] wlan0: authenticate with 00:1b:57:ba:8a:a5
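The SCSI command that failed is printed as `Write same(16): 93 08 ...`. Opcode 0x93 is WRITE SAME(16), and bit 0x08 of byte 1 is the UNMAP bit, so this is a discard; libata translates it into an ATA TRIM, which here appears to have gone out as the SEND FPDMA QUEUED command (tag 5) that failed just above. A hypothetical decoder for the CDB, assuming 512-byte logical sectors (as smartctl reports for this drive) and 4096-byte ext4 blocks (not stated in the log):

```python
import struct

SECTOR_SIZE = 512     # logical sector size reported by smartctl
FS_BLOCK_SIZE = 4096  # assumed ext4 block size


def decode_write_same_16(cdb_hex: str) -> dict:
    """Decode the UNMAP bit, LBA and block count of a WRITE SAME(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)  # fromhex() skips the spaces between bytes
    if len(cdb) != 16 or cdb[0] != 0x93:
        raise ValueError("not a WRITE SAME(16) CDB")
    # Bytes 2-9: 64-bit LBA; bytes 10-13: 32-bit block count (both big-endian).
    lba, num_blocks = struct.unpack(">QI", cdb[2:14])
    return {
        "unmap": bool(cdb[1] & 0x08),  # UNMAP bit set means "discard these blocks"
        "lba": lba,
        "num_blocks": num_blocks,
    }


if __name__ == "__main__":
    f = decode_write_same_16("93 08 00 00 00 00 00 1b 36 00 00 00 00 b0 00 00")
    fs_blocks = f["num_blocks"] * SECTOR_SIZE // FS_BLOCK_SIZE
    print(f["unmap"], f["lba"], f["num_blocks"], fs_blocks)
```

The decoded LBA (1783296) matches `sector 1783296` in the end_request line, and 176 sectors × 512 bytes = 22 × 4096 bytes matches `count:22` in the ext4 discard message, so both lines describe the same rejected discard.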

Ritesh Khadgaray (khadgaray) wrote :

I don't see these error messages with the 3.12 kernel, or if I boot with NCQ disabled:

 $ echo 1 | sudo tee /sys/block/sda/device/queue_depth

Additionally, after each reboot fstrim reports nearly the same amount trimmed:

first run
# fstrim -v /
/: 920118112256 bytes were trimmed

second run (after reboot)
/: 920175951872 bytes were trimmed

third run (after reboot)
# fstrim -v /boot/
/boot/: 75911168 bytes were trimmed
root@K43SA:/home/ritesh# fstrim -v /
/: 920183271424 bytes were trimmed

Repeated runs without an intervening reboot report 0 bytes trimmed.
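For scale, the reported figures can be compared with the drive's 960,197,124,096-byte "User Capacity" from the smartctl output. A small arithmetic sketch (variable and function names are illustrative only):

```python
CAPACITY_BYTES = 960_197_124_096  # "User Capacity" from smartctl

# fstrim -v / results reported above, one per boot.
trimmed_runs = {
    "first run": 920_118_112_256,
    "second run (after reboot)": 920_175_951_872,
    "third run (after reboot)": 920_183_271_424,
}


def describe(trimmed: int, capacity: int = CAPACITY_BYTES) -> str:
    """Express a trimmed byte count in GiB and as a share of raw capacity."""
    gib = trimmed / 2**30
    pct = 100 * trimmed / capacity
    return f"{gib:.1f} GiB ({pct:.1f}% of capacity)"


for label, trimmed in trimmed_runs.items():
    print(f"{label}: {describe(trimmed)}")
```

Each boot, fstrim discards roughly 857 GiB, close to 96% of the device, rather than only recently freed blocks. One plausible explanation, not confirmed in this thread, is that ext4 tracks which block groups were already trimmed only in memory, so that record is lost at reboot; that would also fit the 0-byte results on repeat runs within one boot.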

Notes to self (related links):
http://askubuntu.com/questions/133946/are-these-sata-errors-dangerous/133960#133960
https://bbs.archlinux.org/viewtopic.php?id=147189
http://www.itechlounge.net/2013/07/linux-ata-failed-command-read-fpdma-queued/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1094446

tags: added: kernel-bug-exists-upstream-v3.13-rc3
removed: kernel-bug-exists-upstream
description: updated
Ritesh Khadgaray (khadgaray) wrote :

[ 0.000000] Linux version 3.13.0-999-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201312200414 SMP Fri Dec 20 09:16:44 UTC 2013

With discard enabled, I am able to reproduce this by copying files over twice (and deleting the first copy).
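The reproduction recipe (copy a tree twice, then delete the first copy) can be scripted. A hypothetical sketch, assuming `src` is a large directory on the discard-mounted filesystem (a LibreOffice build tree, say) and `work` is a scratch directory on the same filesystem. It also compares SHA-256 digests so silent file corruption is caught, although directory-block damage like the one reported is more likely to show up as EXT4-fs errors in dmesg:

```python
import hashlib
import os
import shutil
from pathlib import Path
from typing import Dict


def tree_digest(root: Path) -> Dict[str, str]:
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink() or not path.is_file():
                continue
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(1 << 20), b""):
                    h.update(chunk)
            digests[str(path.relative_to(root))] = h.hexdigest()
    return digests


def copy_twice_delete_first(src: Path, work: Path, rounds: int = 1) -> bool:
    """Copy src twice into work, delete the first copy, verify the second.

    Returns False as soon as a surviving copy differs from src; that copy is
    left in place for inspection. Returns True if every round matched.
    """
    reference = tree_digest(src)
    for i in range(rounds):
        first, second = work / f"copy-{i}-a", work / f"copy-{i}-b"
        shutil.copytree(src, first, symlinks=True)
        shutil.copytree(src, second, symlinks=True)
        shutil.rmtree(first)  # frees blocks; with -o discard this issues discards
        if tree_digest(second) != reference:
            return False
        shutil.rmtree(second)
    return True
```

Running it in a loop while watching `dmesg` approximates the workload described above without needing a full LibreOffice build.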

Ritesh Khadgaray, thank you for your testing. I would avoid using the mainline daily folder as this would be a downstream construct, and upstream may not be terribly interested in it. Given you tested v3.13-rc3, this would be fine.

Despite this, would you mind testing a 3.2.x Ubuntu kernel series for regression purposes following https://wiki.ubuntu.com/Kernel/KernelBisection#Bisecting_Ubuntu_kernel_versions ?

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Ritesh Khadgaray (khadgaray) wrote :

[ 0.000000] Initializing cgroup subsys cpuset
[ 0.000000] Initializing cgroup subsys cpu
[ 0.000000] Linux version 3.2.53-030253-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201311281435 SMP Thu Nov 28 19:36:21 UTC 2013
[ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.2.53-030253-generic root=/dev/mapper/ubuntu--vg-root ro recovery nomodeset
[ 0.000000] KERNEL supported cpus:
...
[ 1.175970] SCSI subsystem initialized
[ 1.176000] libata version 3.00 loaded.
[ 1.176033] usbcore: registered new interface driver usbfs
[ 1.176043] usbcore: registered new interface driver hub
[ 1.176061] usbcore: registered new device driver usb
[ 2.047965] ahci 0000:00:1f.2: version 3.0
[ 2.047976] ahci 0000:00:1f.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19
[ 2.048835] ahci 0000:00:1f.2: irq 45 for MSI/MSI-X
[ 2.062968] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x5 impl SATA mode
[ 2.063847] ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ems apst
[ 2.064665] ahci 0000:00:1f.2: setting latency timer to 64
[ 2.071412] scsi0 : ahci
[ 2.072281] scsi1 : ahci
[ 2.073122] scsi2 : ahci
[ 2.073944] scsi3 : ahci
[ 2.074745] scsi4 : ahci
[ 2.075535] scsi5 : ahci
[ 2.076425] ata1: SATA max UDMA/133 abar m2048@0xdff06000 port 0xdff06100 irq 45
[ 2.077157] ata2: DUMMY
[ 2.077871] ata3: SATA max UDMA/133 abar m2048@0xdff06000 port 0xdff06200 irq 45
[ 2.078607] ata4: DUMMY
[ 2.079339] ata5: DUMMY
[ 2.080055] ata6: DUMMY
[ 2.081032] Fixed MDIO Bus: probed
[ 2.081743] tun: Universal TUN/TAP device driver, 1.6
[ 2.082431] tun: (C) 1999-2004 Max Krasnyansky <email address hidden>
[ 2.083154] PPP generic driver version 2.4.2
...
[ 2.271428] rtc_cmos 00:06: setting system clock to 2013-12-24 15:50:13 UTC (1387900213)
[ 2.273757] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[ 2.274368] EDD information not available.
[ 2.398657] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2.399336] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2.400529] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 2.401154] ata1.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
[ 2.401206] ata1.00: ACPI cmd ef/90:03:00:00:00:a0 (SET FEATURES) succeeded
[ 2.401527] ata1.00: supports DRM functions and may not be fully accessible
[ 2.402229] ata1.00: ATA-9: Crucial_CT960M500SSD1, MU03, max UDMA/133
[ 2.402871] ata1.00: 1875385008 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 2.404074] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 2.404721] ata1.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
[ 2.404772] ata1.00: ACPI cmd ef/90:03:00:00:00:a0 (SET FEATURES) succeeded
[ 2.405012] ata1.00: supports DRM functions and may not be fully accessible
[ 2.405767] ata1.00: configured for UDMA/133
[ 2.406557] scsi 0:0:0:0: Direct-Access ATA Crucial_CT960M50 MU03 PQ: 0 ANSI: 5
[ 2.407392] sd 0:0:0:0: Attached scsi generic sg0 type 0
[ 2.407427] sd 0:0:0:0: [sda] 1875385008 512-byte logical blocks:...

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Ritesh Khadgaray (khadgaray) wrote :

fs layout

/boot -> sda2, ext4 with discard
/boot/efi -> sda1, vfat
/ -> sda3_crypt (LUKS with LVM, discard enabled)

I enabled discard as described in http://askubuntu.com/a/122206/125818 .

description: updated
Ritesh Khadgaray (khadgaray) wrote :

The fstrim run after the crash:

$ fstrim -v /
/: 601705070592 bytes were trimmed

Additionally, it seems TRIM may not be required on this drive, though I could be wrong:
http://forum.crucial.com/t5/Solid-State-Drives-SSD/M500-on-OS-X-Let-SSD-sort-itself-out-or-enable-TRIM/td-p/127854q

description: updated
description: updated

Ritesh Khadgaray, thank you for your testing results. Given you tested this in a recent mainline kernel (although not the most recent), and you have an SSD (relatively new hardware from the kernel's perspective), I'm going to mark this Triaged for now. The issue you are reporting is therefore an upstream one. Could you please report this problem through the appropriate channel (I would check in with linux-ext4 and CC "Theodore Ts'o" <email address hidden> and Andreas Dilger <email address hidden>) by following the instructions _verbatim_ at https://wiki.ubuntu.com/Bugs/Upstream/kernel#KernelTeam.2BAC8-KernelTeamBugPolicies.Overview_on_Reporting_Bugs_Upstream ?

Please provide a direct URL to your post once you have made it so that it may be tracked.

Thank you for your understanding.

Changed in linux (Ubuntu):
importance: Medium → Low
status: Confirmed → Triaged
tags: removed: regression-potential
Ritesh Khadgaray (khadgaray) wrote :

Posted this to the linux-ext4 list on my second attempt; the list rejects multipart (HTML-containing) messages.

http://thread.gmane.org/gmane.comp.file-systems.ext4/41969 .

Ritesh Khadgaray (khadgaray) wrote :

Ubuntu should enable discard by default.

-------------------------------------------
http://article.gmane.org/gmane.comp.file-systems.ext4/41974

From Theodore Ts'o via thunk.org

This is a hardware bug, unfortunately. And it's also the reason why
discard is not on by default.

These days, what I normally tell people is to not use the discard
mount option at all, and instead use the fstrim program, run out of
cron maybe once a week or even every night if you are anal. (But for
most workloads, once a week is plenty.) The main place where the
discard option makes sense is if you are using a very expensive PCIe
attached flash device. Those devices are much more likely to have a
competently implemented DISCARD command, and they generally don't
destroy performance forcing a queue flush for every single DISCARD
request.

However, in your case, if discard commands are causing on-disk
corruption, I'm not sure I can even in good conscience recommend using
fstrim.

> Device Model: Crucial_CT960M500SSD1
> Serial Number: 1335094BE7CA
> LU WWN Device Id: 5 00a075 1094be7ca
> Firmware Version: MU03

Instead, all I can do is suggest that you consider whether you should
replace your SSD. Historically, I've stuck with Intel SSD's because
they are the ones that have tended to be the most reliable. Intel has
unfortunately, been slow to market because they insist on testing
their products extensively and only releasing them when they are
solid, which has cost them market share. Unfortunately, the market
doesn't always reward quality. More recently, I've started using
Samsung SSD's. I have a Samsung 840 PRO and the Intel 525 240GB mSATA
SSD's in my laptop, and so far, I've not had any problems with either.
They are definitely not the cheapest nor the most performant devices
in head-to-head testing, but that's not the only dimension that I care
about....

More (somewhat depressing) investigations about the quality of SSD's
these days:

https://plus.google.com/+MarcMERLIN/posts/Us8yjK9SPs6
http://lkcl.net/reports/ssd_analysis.html
https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault

                                                - Ted

P.S. Some really crappy SSD devices have brick'ed themselves when
they are given a heavy discard load, particularly one which is mixed
with other traffic, and this is what the "discard" mount option
provides. Note that if the fstrim command is executed while you are
also trying to put the device under heavy read/write workloads, it
could also result in the same kind of corruption and/or brick'ing of
the SSD. Which is why I hesitate to recommend switching to fstrim for
a device which is known to mishandle the DISCARD command, and to
suggest simply not using the DISCARD feature at all --- and if this
results in increased performance loss or increased write wear, to just
replace the SSD as an inferior quality product before it does any
further damage to your data.
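Ted's general recommendation for drives with a sound DISCARD implementation, periodic fstrim from cron instead of the discard mount option, could be wrapped as in the hypothetical sketch below. It relies only on `fstrim -v <mountpoint>` and on the `<mountpoint>: N bytes were trimmed` output format seen earlier in this thread (other fstrim versions may word it differently). Per Ted's own caveat, it is not something to run on the drive in this report.

```python
import re
import subprocess
from typing import Optional

# Matches lines such as "/: 920118112256 bytes were trimmed", the format printed
# by the fstrim used in this thread.
TRIM_LINE = re.compile(r"^(?P<mount>\S+): (?P<bytes>\d+) bytes were trimmed$")


def parse_fstrim_output(text: str) -> Optional[int]:
    """Return the number of bytes fstrim reports as trimmed, or None."""
    for line in text.splitlines():
        m = TRIM_LINE.match(line.strip())
        if m:
            return int(m.group("bytes"))
    return None


def trim(mountpoint: str = "/") -> Optional[int]:
    """Run `fstrim -v` on one mountpoint (needs root) and return bytes trimmed."""
    result = subprocess.run(
        ["fstrim", "-v", mountpoint],
        capture_output=True, text=True, check=True,
    )
    return parse_fstrim_output(result.stdout)
```

A weekly cron script would call `trim("/")` once and log the result; running it while the disk is otherwise idle also sidesteps the mixed discard and read/write load that Ted's P.S. warns about.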

Martin Pitt (pitti) wrote :

For the record, due to this bug we limited automatic fstrim to Samsung and Intel drives for now: https://launchpad.net/ubuntu/+source/util-linux/2.20.1-5.1ubuntu14

I can't say I like that much, as it excludes a lot of other "good" SSDs from being trimmed by default, but better safe than sorry. We can extend the list of known-good models/brands, or get more evidence that this problem affects so few devices that we should rather use a blacklist approach, or we find that this SSD model is broken anyway and ignore the bug (for example, it's likely that this also throws up if more than one application is doing large I/O on it at the same time?)
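The allow-list approach described here boils down to matching the drive's model string against known-good vendors. The sketch below is hypothetical: it is not the code from the util-linux upload linked above, only the vendor names Samsung and Intel come from the comment, and whole-word, case-insensitive matching is an assumption. A vendor-level match like this also cannot see which controller a given model uses.

```python
import re

# Vendors named in the comment above; matching on vendor name alone is an
# assumption of this sketch.
TRIM_ALLOWED_VENDORS = ("samsung", "intel")


def trim_allowed(model: str, vendors=TRIM_ALLOWED_VENDORS) -> bool:
    """True if the model string contains an allowed vendor name as a word."""
    # Split on spaces, underscores and hyphens, e.g. "Crucial_CT960M500SSD1".
    words = re.split(r"[\s_\-]+", model.lower())
    return any(v in words for v in vendors)
```

With this rule, `Crucial_CT960M500SSD1` is excluded while a `Samsung SSD 840 PRO Series` would be trimmed automatically.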

Dave Chiluk (chiluk) wrote :

@pitti Since it appears that we are going for a whitelist, here are the drives I have:
Device Model: Patriot Wildfire
Serial Number: PT1145A00024835
LU WWN Device Id: 0 000120 000000000
Firmware Version: 502ABBF0
User Capacity: 120,034,123,776 bytes [120 GB]

Device Model: OCZ-VERTEX3 MI
Serial Number: OCZ-D38A6W57990V6TG5
LU WWN Device Id: 5 e83a97 53ae3311c
Firmware Version: 2.25
User Capacity: 240,057,409,536 bytes [240 GB]

Model Family: Indilinx Barefoot_2/Everest/Martini based SSDs
Device Model: OCZ-VERTEX4
Serial Number: OCZ-RVC8K85817A17KIN
LU WWN Device Id: 5 e83a97 fcc5107e1
Firmware Version: 1.5.1
User Capacity: 256,060,514,304 bytes [256 GB]

Device Model: SanDisk SSD i100 24GB
Serial Number: 122800133466
LU WWN Device Id: 5 001b44 7a30a515a
Firmware Version: 11.50.02
User Capacity: 24,015,495,168 bytes [24.0 GB]

So far none of the above have shown this problem, and I've been running with discard on for at least a year on all the above drives.

@cking
@pitti
If you guys want me to run any real tests on these devices let me know.

Dave Chiluk (chiluk) wrote :

Also, it just occurred to me that whitelisting by manufacturer may still not catch all problematic drives, as most manufacturers source their controllers from third parties and ship drives with controllers from several different suppliers.

For what it's worth, the Crucial drive mentioned above uses a Marvell 88SS9187 controller. The only other manufacturer that has released a drive with that controller is Plextor (at least that's what Google tells me).

It's a fairly safe assumption that if it's broken for one OEM, it'll be broken for another.

@pitti I don't envy you.

Martin Pitt (pitti) wrote :

If we have evidence that Ritesh's SSD is the only (or one of very few) affected models, I'm also not opposed to just blacklisting that. But we really can't know. At least it's now fairly easy to enable in /etc/cron.weekly/fstrim on all models, but of course that whole TRIM business is still way too incomprehensible for the average user.

Ritesh, Dave, would you mind giving me the output of "sudo hdparm -I /dev/sda | head -n 10", so that I see the "Model number" as hdparm sees it? I'm not sure how you got the "Device Model:" strings, but they don't seem to be the same as hdparm's. Thanks!
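
For anyone scripting this check, here is a minimal Python sketch, under stated assumptions, of pulling the "Model Number" and "Firmware Revision" fields out of "hdparm -I" output and testing the model against a list of whitelisted prefixes. The prefix list is hypothetical (loosely based on the Samsung/Intel restriction mentioned above) and is not the list actually used by Ubuntu's fstrim-all; the function names are illustrative only.

```python
import re

# Hypothetical model-number prefixes, for illustration only; this is NOT the
# whitelist shipped in Ubuntu's fstrim-all.
TRIM_WHITELIST_PREFIXES = ("INTEL", "Samsung")


def parse_hdparm_identity(text):
    """Return the Model Number and Firmware Revision fields from 'hdparm -I' output."""
    fields = {}
    for key in ("Model Number", "Firmware Revision"):
        # Each field sits on its own indented line, e.g. " Model Number: Foo".
        match = re.search(rf"^[ \t]*{key}:[ \t]*(.+?)[ \t]*$", text, re.MULTILINE)
        if match:
            fields[key] = match.group(1)
    return fields


def model_is_whitelisted(model, prefixes=TRIM_WHITELIST_PREFIXES):
    """True if the model number starts with any of the whitelisted prefixes."""
    return model.startswith(prefixes)


# Usage with the first lines of the hdparm output posted in this thread.
sample = """ATA device, with non-removable media
 Model Number: Crucial_CT960M500SSD1
 Serial Number: 1335094BE7CA
 Firmware Revision: MU03
"""
identity = parse_hdparm_identity(sample)
print(identity, model_is_whitelisted(identity["Model Number"]))
```

Prefix matching keeps the list short, but as noted above, a brand-level prefix says nothing about which controller or firmware is inside the drive.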

Ritesh Khadgaray (khadgaray) wrote :

ATA device, with non-removable media
 Model Number: Crucial_CT960M500SSD1
 Serial Number: 1335094BE7CA
 Firmware Revision: MU03
 Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
 Used: unknown (minor revision code 0x0028)
 Supported: 9 8 7 6 5
 Likely used: 9
Configuration:
 Logical max current
 cylinders 16383 16383
 heads 16 16
 sectors/track 63 63
 --
 CHS current addressable sectors: 16514064
 LBA user addressable sectors: 268435455
 LBA48 user addressable sectors: 1875385008
 Logical Sector size: 512 bytes
 Physical Sector size: 4096 bytes
 Logical Sector-0 offset: 0 bytes
 device size with M = 1024*1024: 915715 MBytes
 device size with M = 1000*1000: 960197 MBytes (960 GB)
 cache/buffer size = unknown
 Form Factor: 2.5 inch
 Nominal Media Rotation Rate: Solid State Device
Capabilities:
 LBA, IORDY(can be disabled)
 Queue depth: 32
 Standby timer values: spec'd by Standard, with device specific minimum
 R/W multiple sector transfer: Max = 16 Current = 16
 Advanced power management level: 254
 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
      Cycle time: min=120ns recommended=120ns
 PIO: pio0 pio1 pio2 pio3 pio4
      Cycle time: no flow control=120ns IORDY flow control=120ns

Dave Chiluk (chiluk) wrote :

We were using smartctl -a /dev/sd<blah> from smartmontools.

ATA device, with non-removable media
 Model Number: OCZ-VERTEX3 MI
 Serial Number: OCZ-D38A6W57990V6TG5
 Firmware Revision: 2.25
 Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0

ATA device, with non-removable media
 Model Number: Patriot Wildfire
 Serial Number: PT1145A00024835
 Firmware Revision: 502ABBF0
 Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0

ATA device, with non-removable media
 Model Number: OCZ-VERTEX4
 Serial Number: OCZ-RVC8K85817A17KIN
 Firmware Revision: 1.5.1
 Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0

Also, rtg, who is running a Samsung 840 EVO with fstrim, may also be experiencing a corruption bug, but he doesn't see anything in his logs. We'll keep you posted once he figures out what is going on. His problem could just be related to running crazy kernels.

Martin Pitt (pitti) wrote :

Dave, thanks; so smartctl and hdparm indeed show the same model number, just wanted to confirm. I whitelisted these in http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/trusty/util-linux/trusty/revision/103

What kind of simple but heavy test can I run on my Kingston and Kingspec SSDs to find out whether they are also affected by this kind of problem?

A simple testing method should help build the whitelist/blacklist.

I have access to several SSDs from a lot of manufacturers. The only ones that are not already whitelisted are Kingston and Kingspec; I have several of their drives, in various models, so I may be able to help.

I'm already using them with discard enabled, but I'm not usually doing anything crazy on them. If you want me to compile LibreOffice or do some other kind of stress testing on them, just ask.

Martin Pitt (pitti) wrote :

A simple smoke test might be to run something like

  sudo dd if=/dev/sda of=/dev/zero &
  sudo find / >/dev/null 2>&1 &
  sudo grep -r . / >/dev/null &

and run fstrim-all in parallel. You can of course do the original thing of setting up a schroot and building libreoffice in it with sbuild (or just "sudo apt-get build-dep libreoffice; apt-get source -b libreoffice" on your workstation if you don't mind it getting cluttered with lots and lots of build dependencies).
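
The smoke test above can also be driven from a script. This is a minimal Python sketch under several assumptions: it is run as root (so there is no sudo inside), the device is /dev/sda, the filesystem is mounted at /, and it calls plain "fstrim -v" rather than fstrim-all. The rounds and pause parameters are made up for illustration. With dry_run=True (the default) it only returns the commands it would run.

```python
import subprocess
import time


def smoke_test(device="/dev/sda", mountpoint="/", rounds=5, pause=60.0, dry_run=True):
    """Run read-heavy background loads while repeatedly trimming the filesystem.

    Returns the list of commands that were (or, with dry_run, would be) run.
    """
    loads = [
        # Read the whole device; /dev/null discards the data just like /dev/zero.
        ["dd", f"if={device}", "of=/dev/null"],
        # Walk the directory tree, then read every file's contents.
        ["find", mountpoint],
        ["grep", "-r", ".", mountpoint],
    ]
    trim = ["fstrim", "-v", mountpoint]
    planned = loads + [trim] * rounds
    if dry_run:
        return planned

    # Start the loads in the background, discarding their output.
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        for cmd in loads
    ]
    try:
        for _ in range(rounds):
            subprocess.run(trim, check=True)  # raises if fstrim fails
            time.sleep(pause)
    finally:
        # Stop whatever background load is still running.
        for proc in procs:
            proc.terminate()
            proc.wait()
    return planned


print(smoke_test(rounds=2))
```

After a real run, dmesg should be checked for EXT4-fs errors like the ones in the original report.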

Riccardo 'c10ud' (c10ud) wrote :

Hello, yesterday I installed trusty on my Kingston 60GB SSD (Model=KINGSTON SV300S37A60G, FwRev=525ABBF0)

After running the simple test without errors, I added --no-model-check to /etc/cron.weekly/fstrim. So far, so good.

robi (robreg) wrote :

KINGSTON SV200S364G with fw E111008a seems OK too

robi (robreg) wrote :

Also KINGSTON SVP200S37A60G with fw 502ABBF0

I tested on a KINGSTON v300 and a v100 and everything works fine.

I used the discard mount option, as it seems to be the most effective way to detect this misbehavior.
And then I started:

sudo apt-get build-dep libreoffice; apt-get source -b libreoffice

It worked fine. I also tried running fstrim -v / periodically.
No problems either.

I'm now starting to do the same test on a Kingspec C3000.6 and several other models from the same manufacturer.

Todor Andreev (toshko3) wrote :

I could test my KINGSTON SV300S37A120G with firmware 521ABBF0.
I used Martin's test:

  sudo dd if=/dev/sda of=/dev/zero &
  sudo find / >/dev/null 2>&1 &
  sudo grep -r . / >/dev/null &

and ran "fstrim -v /" manually several times.
I did this for about 5 minutes.
No problems, and "fsck.ext4 /dev/sda" reported the filesystem as clean.
If Ritesh Khadgaray could report whether the problem occurs with this test, our results would be useful. Otherwise, disregard my test.

Joshua (njj) wrote :

Has anyone already tried to report this issue to Crucial?

Joshua (njj) wrote :

@Ritesh, there is a newer firmware [1] available for your device, could you test if this fixes the issue?

[1] http://forum.crucial.com/t5/Solid-State-Drives-SSD/Feedback-Thread-Firmware-MU05-for-the-M500/td-p/146872

Joshua (njj) wrote :

Off-topic: isn't it possible to edit previous posts? It feels like a monologue to keep posting new comments all the time.

However, I opened a thread on the Crucial forum [1] and a nice guy referred me to another thread [2], which says that the bug is fixed in firmware MU05. That thread also links to two kernel mailing list posts: one blacklisting queued TRIM on all Crucial SSDs [3] and one reverting this for MU05 [4]. Is this the same "whitelist" the Ubuntu people are talking about? If I want to test TRIM with the newest firmware, MU05, is it enough to add "--no-model-check", or do I also need a special kernel?

[1] http://forum.crucial.com/t5/Solid-State-Drives-SSD/M500-on-Linux-with-fstrim-or-discard-under-heavy-load-could/td-p/151522
[2] http://forum.crucial.com/t5/Solid-State-Drives-SSD/M500-M5x0-QUEUED-TRIM-data-corruption-alert-mostly-for-Linux/td-p/151028
[3] http://comments.gmane.org/gmane.linux.ide/56084
[4] http://www.spinics.net/lists/linux-ide/msg48361.html

Martin Spacek (mspacek) on 2014-06-30
summary: - htree_dirblock_to_tree:920: inode #53629599: block 214443464: comm rm:
- bad entry in directory: rec_len % 4 != 0 - offset=0(0),
- inode=1667681412, rec_len=45654, name_len=39
+ fstrim corrpution on some SSDs
summary: - fstrim corrpution on some SSDs
+ fstrim corrupution on some SSDs

I've installed Xubuntu 14.04 on a brand new Crucial MX100 SSD (CT512MX100SSD1, original MU01 firmware, no newer firmware available) on a Thinkpad W510, and saw the note in the cron.weekly/fstrim file. After running fstrim manually the first two times, I got this:

$ sudo fstrim -v /
/: 498557419520 bytes were trimmed
$ sudo fstrim -v /
/: 0 bytes were trimmed
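
The second run reporting 0 bytes is most likely ext4 behaviour rather than a sign that TRIM is disabled: as far as I know, ext4 remembers which block groups it has already trimmed during the current mount and skips them until blocks are freed there again. To compare runs in a script, here is a small Python sketch that parses the "fstrim -v" line in the format shown above. Newer util-linux versions may print a different format, which this pattern would reject; the function name is illustrative.

```python
import re

# Matches the "fstrim -v" output format seen in this thread, e.g.
# "/: 498557419520 bytes were trimmed".
FSTRIM_LINE = re.compile(r"^(?P<mount>\S+): (?P<nbytes>\d+) bytes were trimmed$")


def parse_fstrim_verbose(line):
    """Return (mountpoint, bytes_trimmed) from one line of 'fstrim -v' output."""
    match = FSTRIM_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised fstrim output: {line!r}")
    return match.group("mount"), int(match.group("nbytes"))


first = parse_fstrim_verbose("/: 498557419520 bytes were trimmed")
second = parse_fstrim_verbose("/: 0 bytes were trimmed")
print(first, second)
```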

I also ran some of Martin Pitt's load tests for a few minutes while calling fstrim -v repeatedly. No problems.

Since Crucial updated the firmware to MU05 on 2014-03-25 for the affected M500, and since the MX100 drive apparently has a slightly newer controller...

"These new Crucial MX100 SSD series features the Marvell 88SS9189 controller, a minor upgrade from the 88SS9187 that was found in the M500 drive."
http://www.legitreviews.com/crucial-mx100-256gb-512gb-ssd-review_143984

...I'll be brave and add --no-model-check to the "exec fstrim-all" line in the cron fstrim file. If I have any problems, I'll report back.

from hdparm -I:

ATA device, with non-removable media
 Model Number: Crucial_CT512MX100SSD1
 Serial Number: xxxxxxxxxxx
 Firmware Revision: MU01
 Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
 Used: unknown (minor revision code 0x0028)
 Supported: 9 8 7 6 5
 Likely used: 9
Configuration:
 Logical max current
 cylinders 16383 16383
 heads 16 16
 sectors/track 63 63
 --
 CHS current addressable sectors: 16514064
 LBA user addressable sectors: 268435455
 LBA48 user addressable sectors: 1000215216
 Logical Sector size: 512 bytes
 Physical Sector size: 4096 bytes
 Logical Sector-0 offset: 0 bytes
 device size with M = 1024*1024: 488386 MBytes
 device size with M = 1000*1000: 512110 MBytes (512 GB)
 cache/buffer size = unknown
 Form Factor: 2.5 inch
 Nominal Media Rotation Rate: Solid State Device
Capabilities:
 LBA, IORDY(can be disabled)
 Standby timer values: spec'd by Standard, with device specific minimum
 R/W multiple sector transfer: Max = 16 Current = 16
 Advanced power management level: 254
 DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
      Cycle time: min=120ns recommended=120ns
 PIO: pio0 pio1 pio2 pio3 pio4
      Cycle time: no flow control=120ns IORDY flow control=120ns
Commands/features:
 Enabled Supported:
    * SMART feature set
      Security Mode feature set
    * Write cache
    * Look-ahead
    * Host Protected Area feature set
    * WRITE_BUFFER command
    * READ_BUFFER command
    * NOP cmd
    * DOWNLOAD_MICROCODE
    * Advanced Power Management feature set
      SET_MAX security extension
    * 48-bit Address feature set
    * Device Configuration Overlay feature set
    * Mandatory FLUSH_CACHE
    * FLUSH_CACHE_EXT
    * SMART error logging
    * SMART self-test
    * General Purpose Logging feature set
    * WRITE_{DMA|MULTIPLE}_FUA_EXT
    * 64-bit World wide name
    * IDLE_IMMEDIATE with UNLOAD
      Write-Read-Verify feature set
    * WRITE_UNCORRECTABLE_EXT command
    * {READ,WRITE}_DMA_EXT_GPL commands
    * Segmented DOWNLOAD_MICROCODE
    ...

Martin Spacek, thank you for your comment. So that your hardware and problem can be tracked, could you please file a new report with Ubuntu by executing the following in a terminal while booted into the default Ubuntu kernel (not a mainline one):
ubuntu-bug linux

For more on this, please read the official Ubuntu documentation:
Ubuntu Bug Control and Ubuntu Bug Squad: https://wiki.ubuntu.com/Bugs/BestPractices#X.2BAC8-Reporting.Focus_on_One_Issue
Ubuntu Kernel Team: https://wiki.ubuntu.com/KernelTeam/KernelTeamBugPolicies#Filing_Kernel_Bug_reports
Ubuntu Community: https://help.ubuntu.com/community/ReportingBugs#Bug_reporting_etiquette

When opening up the new report, please feel free to subscribe me to it.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

summary: - fstrim corrupution on some SSDs
+ htree_dirblock_to_tree:920: inode #53629599: block 214443464: comm rm:
+ bad entry in directory: rec_len % 4 != 0 - offset=0(0),
+ inode=1667681412, rec_len=45654, name_len=39
Martin Spacek (mspacek) wrote :

@penalvch,

Huh? I'm not currently having this problem. Just reporting that my Crucial CT512MX100SSD1 currently doesn't seem to exhibit the problem, and therefore should maybe be added to the whitelist.

If you still want me to file a separate bug report (to get more hardware details?), please let me know.

Martin Spacek (mspacek) wrote :

Note that there are reports on the Crucial forum that the Crucial M550 has the same problem, and therefore that the MX100 does too, because it uses the same controller (and firmware?):

http://forum.crucial.com/t5/Solid-State-Drives-SSD/M500-M5x0-QUEUED-TRIM-data-corruption-alert-mostly-for-Linux/m-p/154146#M43579

I'm not so brave any more, so I've removed --no-model-check from the "exec fstrim-all" line in the cron fstrim file for my MX100.

Joshua (njj) wrote :

@Martin: The "--no-model-check" option has no effect at the moment anyway, because trim is disabled for these Crucial SSDs in all recent kernel versions.

Martin Spacek (mspacek) wrote :

@njj: I'm not so sure. According to a post from hmh just a few minutes ago on the above posted Crucial forum thread, the MX100 is not blacklisted in the kernel. Also, when I run "fstrim -v /", I get a response that bytes were trimmed. When I run it again, I get 0 bytes trimmed (see #53). Is this expected behaviour for a trim-disabled SSD?

Allard Pruim (allardpruim) wrote :

@Martin: I'm also using a Crucial M500, which had the MU03 firmware. I updated to MU05 and ran trim manually. When I execute the command, I get a message saying that bytes were trimmed. When I run it again, I also get the 0 bytes trimmed message. I don't know whether this is normal behavior. Also, when I run the trim command on my SSD, I don't see any strange messages in my logs, as far as I can tell.

Allard Pruim (allardpruim) wrote :

@Martin, when I looked in my dmesg log, I found that Ubuntu is indeed blocking the (queued) trim:

allard@ubuntu-pc-allard:~$ dmesg | grep ata1
[ 0.949268] ata1: SATA max UDMA/133 abar m2048@0xfe30c000 port 0xfe30c100 irq 50
[ 1.439698] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 1.440024] ata1.00: supports DRM functions and may not be fully accessible
[ 1.442830] ata1.00: disabling queued TRIM support
[ 1.442831] ata1.00: ATA-9: Crucial_CT120M500SSD1, MU05, max UDMA/133
[ 1.442833] ata1.00: 234441648 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 1.446452] ata1.00: supports DRM functions and may not be fully accessible
[ 1.449364] ata1.00: disabling queued TRIM support
[ 1.452636] ata1.00: configured for UDMA/133
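
Checking for that message can be scripted. Below is a minimal Python sketch that scans dmesg text for the "disabling queued TRIM support" line and reports which ATA devices it applies to; on a live system the input would come from the dmesg command. Note that this message only means queued (NCQ) TRIM is turned off; the kernel can still issue ordinary non-queued TRIM, which would explain why fstrim keeps reporting trimmed bytes on these drives. The function name is hypothetical.

```python
import re

# libata logs e.g. "ata1.00: disabling queued TRIM support" when its NCQ TRIM
# blacklist applies to a device.
QUEUED_TRIM_OFF = re.compile(r"\b(ata\d+\.\d+): disabling queued TRIM support")


def devices_with_queued_trim_disabled(dmesg_text):
    """Return the sorted, de-duplicated ATA device IDs with queued TRIM disabled."""
    return sorted({m.group(1) for m in QUEUED_TRIM_OFF.finditer(dmesg_text)})


# Excerpts of the M500 and MX100 logs posted in this thread.
m500_log = """\
[ 1.442830] ata1.00: disabling queued TRIM support
[ 1.442831] ata1.00: ATA-9: Crucial_CT120M500SSD1, MU05, max UDMA/133
[ 1.449364] ata1.00: disabling queued TRIM support
"""
mx100_log = "[ 2.319343] ata1.00: ATA-9: Crucial_CT512MX100SSD1, MU01, max UDMA/133\n"

print(devices_with_queued_trim_disabled(m500_log))
print(devices_with_queued_trim_disabled(mx100_log))
```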

Martin Spacek (mspacek) wrote :

@Allard, OK, it's good to know that a message shows up in dmesg saying that queued trim is disabled. I don't have that message for my MX100, so it seems trim isn't disabled in the kernel for the MX100:

$ dmesg | grep ata1
[ 1.995490] ata1: SATA max UDMA/133 abar m2048@0xf2627000 port 0xf2627100 irq 54
[ 2.313452] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2.314803] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 2.314811] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 2.314976] ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 (SET FEATURES) rejected by device (Stat=0x51 Err=0x04)
[ 2.314978] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 2.315360] ata1.00: supports DRM functions and may not be fully accessible
[ 2.319343] ata1.00: ATA-9: Crucial_CT512MX100SSD1, MU01, max UDMA/133
[ 2.319349] ata1.00: 1000215216 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[ 2.325081] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 2.325089] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 2.325152] ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 (SET FEATURES) rejected by device (Stat=0x51 Err=0x04)
[ 2.325157] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 2.325524] ata1.00: supports DRM functions and may not be fully accessible
[ 2.333873] ata1.00: configured for UDMA/133
[ 35.495513] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 35.496830] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 35.496832] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 35.496875] ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 (SET FEATURES) rejected by device (Stat=0x51 Err=0x04)
[ 35.496876] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 35.497154] ata1.00: supports DRM functions and may not be fully accessible
[ 35.506398] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 35.506399] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 35.506439] ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 (SET FEATURES) rejected by device (Stat=0x51 Err=0x04)
[ 35.506441] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 35.506712] ata1.00: supports DRM functions and may not be fully accessible
[ 35.514807] ata1.00: configured for UDMA/133
[ 72.563250] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 72.564536] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 72.564538] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 72.564596] ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 (SET FEATURES) rejected by device (Stat=0x51 Err=0x04)
[ 72.564597] ata1.00: ACPI cmd ef/10:03:00:00:00:a0 (SET FEATURES) filtered out
[ 72.564868] ata1.00: supports DRM functions and may not be fully accessible
[ 72.574059] ata1.00: ACPI cmd ef/02:00:00:00:00:a0 (SET FEATURES) succeeded
[ 72.574061] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[ 72.574108] ata1.00: ACPI cmd ef/5f:00:00:00:00:a0 (SET FEATURE...

Hi,

I also ran Martin's test on my XPS 13 (9333) SSD and didn't detect any issues.

This is hdparm's output:

$ sudo hdparm -I /dev/sda | head -n 10

/dev/sda:

ATA device, with non-removable media
 Model Number: LITEONIT LMT-256M6M mSATA 256GB
 Serial Number: TW0XXM305508541F5115
 Firmware Revision: DM8110F
 Transport: Serial, ATA8-AST, SATA II Extensions, SATA Rev 2.6, SATA Rev 3.0
Standards:
 Used: ATA/ATAPI-7 T13 1532D revision 4a

And this is the link on the Manufacturer website: http://www.liteonssd.com/index.php?option=com_zoo&view=item&category_id=0&item_id=2484&Itemid=154

FWIW, this drive seems to use the same Marvell 88SS9187 flash controller as the Crucial MX100 SSD series.

Martin Spacek (mspacek) wrote :

@Jeffrey

AFAIK, the Crucial M500 uses the Marvell 88SS9187 controller, while the M550 and MX100 both use the Marvell 88SS9189 controller, and therefore probably share the same firmware (or maybe the MX100 firmware was forked off of the M550 firmware):

"These new Crucial MX100 SSD series features the Marvell 88SS9189 controller, a minor upgrade from the 88SS9187 that was found in the M500 drive."

http://www.legitreviews.com/crucial-mx100-256gb-512gb-ssd-review_143984

"The short summary is that the MX100 builds on the same architecture as the M500 and M550. The only fundamental difference is the NAND inside as the controller is the same Marvell 88SS9189 silicon as found inside the M550. The 9189 is a minor upgrade over the 9187 and what it does is provide better support for DevSLP along with some bandwidth optimizations."

http://www.anandtech.com/show/8066/crucial-mx100-256gb-512gb-review

@Martin you're right, thanks for the correction.

This means my drive has the same controller as OP's faulty Crucial drive. How probable is it that the corruption problem has anything to do with the flash controller (as opposed to, let's say, the drive's firmware)?

Regards,

Martin Spacek (mspacek) wrote :

@Jeffrey, I don't know for sure, but I'd say it's likely that two drives from the same manufacturer using the same controller will at the very least be both derived from the same firmware code base, and their firmware will therefore likely have a lot of bugs (and features) in common. Since your drive is from another manufacturer (or is it just a relabelled Crucial drive?), it's maybe more likely to have significantly different firmware, and maybe less likely to have the corruption bug. My bet would be the corruption problem is a firmware problem, one that could be fixed. If it is indeed a silicon problem, it could be worked around in the firmware, or features could be disabled to ensure data integrity.

juanmanuel (rockerito99) wrote :

OMG!
Has anyone besides the OP reproduced the issue on these drives?

I'm not even sure that his FS corruption, in this particular instance, is because of the hardware layer:

       The Original Poster (Ritesh) states in post #31 (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1259829/comments/31) that he is using an encryption layer below ext4 for the root partition.

It is totally NOT recommended to use TRIM or DISCARD together with a software encryption layer, and this might be causing the problems that the OP encountered. The ext4 errors are easy to explain (filesystem corruption due to messing with the "empty space" on which the encryption solution relied). The read errors could also be easily explained: a corrupted filesystem with corrupted pointers to invalid regions of the disk will provoke ATA read errors.

All this is because the encryption layer might store internal info on more blocks than the EXT4 fs reports. Fstrim/discard only looks at used blocks from the EXT4 filesystem's perspective. The result is that one could end up trimming a block used by LUKS (dm-crypt), and corrupt a little or a lot of data, depending on luck. (unless luks/dm-crypt guarantees that all used blocks are also mapped by EXT4).

Cheers,
Juan Manuel Cabo

Martin Pitt (pitti) wrote :

This device is blacklisted now in the kernel: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/drivers/ata/libata-core.c Thus this should be fixed for utopic.

I dropped our whitelisting now as the kernel already does the blacklisting and it does not seem to affect lots of drives:

util-linux (2.25-8ubuntu3) utopic; urgency=medium
  [...]
  * Drop debian/fstrim-all{,.8}, and adjust debian/fstrim-all.cron to call
    "fstrim --all" instead. Drop our whitelisting. There do not seem to be
    many broken drives around with broken TRIM support, and the kernel already
    has a blacklist (tree/drivers/ata/libata-core.c, ATA_HORKAGE_NO_NCQ_TRIM).
 [...]
 -- Martin Pitt <email address hidden> Wed, 20 Aug 2014 07:18:28 +0200
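
As I understand it, the kernel blacklist works by matching each device's model number and, optionally, its firmware revision against shell-style glob patterns, with the first matching entry supplying the quirk flags. Here is a minimal Python sketch of that kind of lookup, using fnmatch for the globbing. The table entries and the flag constant are illustrative stand-ins, not copied from libata-core.c, so that file should be consulted for the real entries.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for the kernel's ATA_HORKAGE_NO_NCQ_TRIM bit.
NO_NCQ_TRIM = 1 << 0

# Illustrative entries: (model glob, firmware glob or None for "any", flags).
# These are NOT the real libata-core.c entries.
BLACKLIST = [
    ("Crucial_CT???M500SSD*", None, NO_NCQ_TRIM),
    ("Example_SSD*", "FW0[1-4]", NO_NCQ_TRIM),
]


def quirks_for(model, firmware, table=BLACKLIST):
    """Return the flags of the first entry whose globs match, else 0."""
    for model_glob, fw_glob, flags in table:
        if not fnmatchcase(model, model_glob):
            continue
        if fw_glob is None or fnmatchcase(firmware, fw_glob):
            return flags
    return 0


print(quirks_for("Crucial_CT120M500SSD1", "MU05"))
print(quirks_for("Crucial_CT512MX100SSD1", "MU01"))
```

Keying on both model and firmware globs is what allows an entry to be narrowed to specific firmware revisions once a fixed firmware ships.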

Changed in linux (Ubuntu):
status: Triaged → Fix Released