htree_dirblock_to_tree:920: inode #53629599: block 214443464: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1667681412, rec_len=45654, name_len=39

Bug #1259829 reported by Ritesh Khadgaray on 2013-12-11

This bug affects 5 people

Affects		Status	Importance	Assigned to	Milestone
	linux (Ubuntu)	Fix Released	Low	Unassigned

Bug Description

The filesystem is remounted read-only while building LibreOffice.

WORKAROUND: Disable the discard mount option, i.e. remove "discard" from this fstab entry: /dev/mapper/volumegroup-root / ext4 discard,noatime,nodiratime,errors=remount-ro 0 1
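
Whether discard is actually in effect for a mounted filesystem can be read from the options field of /proc/mounts. A minimal Python sketch, assuming the usual whitespace-separated /proc/mounts layout; the function name has_discard is made up for illustration:

```python
def has_discard(mounts_text: str, mount_point: str = "/") -> bool:
    """Return True if the filesystem mounted at mount_point lists 'discard'."""
    found = False
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry for the same mount point shadows an earlier one.
            found = "discard" in fields[3].split(",")
    return found
```

Calling has_discard(open("/proc/mounts").read(), "/") on the system described here should return True while the root filesystem is still mounted with discard, and False once the option has been removed and the filesystem remounted.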

Disabling NCQ has no effect.

$ dmesg
...
[ 2045.473249] virbr0: port 1(vnet0) entered forwarding state
[ 2045.473283] IPv6: ADDRCONF(NETDEV_CHANGE): virbr0: link becomes ready
[10660.961381] perf samples too long (2505 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[11822.935891] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629599: block 214443464: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1667681412, rec_len=45654, name_len=39
[11822.935896] Aborting journal on device dm-1-8.
[11822.935998] EXT4-fs (dm-1): Remounting filesystem read-only
[11822.960425] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629605: block 214443466: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=2707156714, rec_len=19312, name_len=162
[11850.985003] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629557: block 214443458: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=512948573, rec_len=8858, name_len=176
[11850.985276] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629465: block 214443455: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=160939375, rec_len=26085, name_len=126
[11850.985499] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629325: block 214443451: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2322664969, rec_len=33791, name_len=132
[11850.985927] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629467: block 214443456: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=954332768, rec_len=21653, name_len=30
[11850.986409] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629074: block 214443433: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=2061605548, rec_len=4984, name_len=3
[11850.986831] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53628835: block 214443432: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=3523041938, rec_len=53167, name_len=41
[11850.987001] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629098: block 214443436: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=4225920287, rec_len=35138, name_len=75
[11850.987466] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629275: block 214443449: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=923253145, rec_len=44001, name_len=144
[11850.988115] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629270: block 214443448: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=1892288796, rec_len=55247, name_len=58
[11851.042303] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629300: block 214443450: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=1809316884, rec_len=4208, name_len=195
[11851.042938] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629401: block 214443453: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=4223616103, rec_len=36326, name_len=130
[11851.045745] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629406: block 214443454: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=415237227, rec_len=24702, name_len=59
[11851.125849] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629101: block 214443437: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=23292377, rec_len=28820, name_len=135
[11851.126086] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629543: block 214443457: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=2624043515, rec_len=58633, name_len=250
[11851.126292] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629244: block 214443446: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=194839831, rec_len=15430, name_len=123
[11851.126529] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629345: block 214443452: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=3316601052, rec_len=49124, name_len=10
[11851.126751] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629080: block 214443434: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=2978861680, rec_len=55340, name_len=75
[11851.127326] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629254: block 214443447: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=2927900409, rec_len=17708, name_len=67
[11851.191574] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53628436: block 214443431: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=3175406590, rec_len=48803, name_len=238
[11851.191832] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629561: block 214443459: comm rm: bad entry in directory: rec_len % 4 != 0 - offset=0(0), inode=880872205, rec_len=23750, name_len=202
[11851.192102] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629565: block 214443460: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=3673769297, rec_len=36964, name_len=8
[11851.193231] EXT4-fs error (device dm-1): htree_dirblock_to_tree:920: inode #53629577: block 214443462: comm rm: bad entry in directory: directory entry across range - offset=0(0), inode=164744460, rec_len=27548, name_len=230
[ritesh@x230t libreoffice-4.1.2~rc3]$ ^C
[ritesh@x230t libreoffice-4.1.2~rc3]$ smartctl -a /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.12.0-7-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

[ritesh@x230t libreoffice-4.1.2~rc3]$ sudo smartctl -a /dev/sda
sudo: unable to open /var/lib/sudo/ritesh/4: No such file or directory
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.12.0-7-generic] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model: Crucial_CT960M500SSD1
Serial Number: 1335094BE7CA
LU WWN Device Id: 5 00a075 1094be7ca
Firmware Version: MU03
User Capacity: 960,197,124,096 bytes [960 GB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: Solid State Device
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-2, ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Dec 11 11:38:48 2013 IST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x80) Offline data collection activity
     was never started.
     Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
     without error or no self-test has ever
     been run.
Total time to complete Offline
data collection: ( 4470) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
     Auto Offline data collection on/off support.
     Suspend Offline collection upon new
     command.
     Offline surface scan supported.
     Self-test supported.
     Conveyance Self-test supported.
     Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
     power-saving mode.
     Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
     General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 74) minutes.
Conveyance self-test routine
recommended polling time: ( 3) minutes.
SCT capabilities: (0x0035) SCT Status supported.
     SCT Feature Control supported.
     SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       21
  5 Reallocated_Sector_Ct   0x0033   100   100   000    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       287
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       32
171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
173 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1
174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       13
180 Unused_Rsvd_Blk_Cnt_Tot 0x0033   000   000   000    Pre-fail  Always       -       16523
                            0x0032   100   100   000    Old_age   Always       -       0
                            0x0032   100   100   000    Old_age   Always       -       0
                            0x0032   100   100   000    Old_age   Always       -       0
                            0x0022   056   050   000    Old_age   Always       -       44 (Min/Max 22/50)
    Event_Count             0x0032   100   100   000    Old_age   Always       -       16
    Pending_Sector          0x0032   100   100   000    Old_age   Always       -       0
    Uncorrectable           0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Unknown_SSD_Attribute   0x0031   100   100   000    Pre-fail  Offline      -       0
206 Unknown_SSD_Attribute   0x000e   100   100   000    Old_age   Always       -       0
210 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       4
246 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       2001841864
247 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       63823477
248 Unknown_Attribute       0x0032   100   100   ---    Old_age   Always       -       84291328

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Vendor (0xff) Completed without error 00% 279 -
# 2 Extended offline Completed without error 00% 273 -
# 3 Vendor (0xff) Completed without error 00% 90 -
# 4 Vendor (0xff) Completed without error 00% 5 -
# 5 Short offline Aborted by host 00% 2 -
# 6 Vendor (0xff) Completed without error 00% 2 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
    1 0 0 Not_testing
    2 0 0 Not_testing
    3 0 0 Not_testing
    4 0 0 Not_testing
    5 0 0 Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

[ritesh@x230t libreoffice-4.1.2~rc3]$ uname -a
Linux x230t 3.12.0-7-generic #15-Ubuntu SMP Sun Dec 8 23:39:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ mount
/dev/mapper/ubuntu--vg-root on / type ext4 (rw,noatime,errors=remount-ro,discard)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/cgroup type tmpfs (rw)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
none on /sys/firmware/efi/efivars type efivarfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /tmp type tmpfs (rw,noatime,mode=1777)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
none on /sys/fs/pstore type pstore (rw)
tmpfs on /var/tmp type tmpfs (rw,noatime,mode=1777)
tmpfs on /var/log type tmpfs (rw,noatime,mode=0755)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,relatime,perf_event)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,relatime,hugetlb)
/dev/sda2 on /boot type ext2 (rw)
/dev/sda1 on /boot/efi type vfat (rw)
systemd on /sys/fs/cgroup/systemd type cgroup (rw,noexec,nosuid,nodev,none,name=systemd)
gvfsd-fuse on /run/user/1000/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,user=ritesh)

fsck and a read-only badblocks run both report the disk as clean.
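
The two failure reasons in the dmesg output above ("rec_len % 4 != 0" and "directory entry across range") correspond to sanity checks ext4 applies to each directory entry. The following Python sketch parses such a log line and re-applies a simplified version of those checks. It is loosely modeled on the kernel's checks and is not the kernel code; the names parse_bad_entry and classify are hypothetical, and the 4096-byte block size is an assumption (the ext4 default), not something stated in this report.

```python
import re

# Matches the "bad entry in directory" messages as they appear in dmesg.
BAD_ENTRY = re.compile(
    r"inode #(?P<dir_inode>\d+): block (?P<block>\d+): comm (?P<comm>\S+): "
    r"bad entry in directory: (?P<reason>.+?) - "
    r"offset=(?P<offset>\d+)\(\d+\), inode=(?P<inode>\d+), "
    r"rec_len=(?P<rec_len>\d+), name_len=(?P<name_len>\d+)"
)


def parse_bad_entry(line):
    """Return the fields of one EXT4-fs 'bad entry' line, or None if no match."""
    m = BAD_ENTRY.search(line)
    if m is None:
        return None
    fields = m.groupdict()
    return {k: v if k in ("comm", "reason") else int(v) for k, v in fields.items()}


def classify(rec_len, name_len, offset=0, block_size=4096):
    """Re-apply simplified directory entry checks; block_size is assumed."""
    # Minimal entry size: 8-byte header plus the name, rounded up to 4 bytes.
    min_len = (8 + name_len + 3) & ~3
    if rec_len < 12:
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < min_len:
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across range"
    return None
```

Applied to the logged values, rec_len=45654 fails the alignment check (45654 % 4 is 2), while rec_len=19312 is aligned but runs past the end of an assumed 4096-byte block. Every entry in the log fails at offset 0 with a seemingly random inode number, which suggests the directory blocks no longer contain directory data at all.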
---
ApportVersion: 2.12.7-0ubuntu1
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/controlC1: ritesh 2141 F.... pulseaudio
/dev/snd/controlC0: ritesh 2141 F.... pulseaudio
CRDA:
country IN:
  (2402 - 2482 @ 40), (N/A, 20)
  (5170 - 5250 @ 40), (N/A, 20)
  (5250 - 5330 @ 40), (N/A, 20), DFS
  (5735 - 5835 @ 40), (N/A, 20)
CurrentDesktop: Unity
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=UUID=f9f2cec2-98f6-4f05-850a-9f60676c3299
MachineType: ASUSTeK Computer Inc. K43SA
MarkForUpload: True
Package: linux (not installed)
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.12.0-7-generic root=/dev/mapper/ubuntu--vg-root ro radeon.dpm=1 quiet splash
ProcVersionSignature: Ubuntu 3.12.0-7.15-generic 3.12.4
RelatedPackageVersions:
linux-restricted-modules-3.12.0-7-generic N/A
linux-backports-modules-3.12.0-7-generic N/A
linux-firmware 1.117
StagingDrivers: rts5139
Tags: trusty staging
Uname: Linux 3.12.0-7-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip kvm libvirtd lpadmin plugdev sambashare sudo
WifiSyslog:

dmi.bios.date: 11/17/2011
dmi.bios.vendor: American Megatrends Inc.
dmi.bios.version: K43SA.211
dmi.board.asset.tag: ATN12345678901234567
dmi.board.name: K43SA
dmi.board.vendor: ASUSTeK Computer Inc.
dmi.board.version: 1.0
dmi.chassis.asset.tag: No Asset Tag
dmi.chassis.type: 10
dmi.chassis.vendor: ASUSTeK Computer Inc.
dmi.chassis.version: 1.0
dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvrK43SA.211:bd11/17/2011:svnASUSTeKComputerInc.:pnK43SA:pvr1.0:rvnASUSTeKComputerInc.:rnK43SA:rvr1.0:cvnASUSTeKComputerInc.:ct10:cvr1.0:
dmi.product.name: K43SA
dmi.product.version: 1.0
dmi.sys.vendor: ASUSTeK Computer Inc.
---
ApportVersion: 2.12.7-0ubuntu3
Architecture: amd64
CurrentDesktop: Unity
DistroRelease: Ubuntu 14.04
InstallationDate: Installed on 2013-12-19 (1 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20131021.1)
MarkForUpload: True
Package: linux (not installed)
Tags: trusty
Uname: Linux 3.13.0-999-generic x86_64
UnreportableReason: The running kernel is not an Ubuntu kernel
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo

Brad Figg (brad-figg) wrote on 2013-12-11: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1259829

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status:	New → Incomplete
tags:	added: trusty

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: AlsaInfo.txt

AlsaInfo.txt (29.4 KiB, text/plain)

apport information

tags:	added: apport-collected staging
description:	updated

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: BootDmesg.txt

BootDmesg.txt (93.3 KiB, text/plain)

apport information

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: CurrentDmesg.txt

CurrentDmesg.txt (21.5 KiB, text/plain)

apport information

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: IwConfig.txt

IwConfig.txt (545 bytes, text/plain)

apport information

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: Lspci.txt

Lspci.txt (12.0 KiB, text/plain)

apport information

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: Lsusb.txt

Lsusb.txt (644 bytes, text/plain)

apport information

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: ProcCpuinfo.txt

ProcCpuinfo.txt (7.2 KiB, text/plain)

apport information

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: ProcEnviron.txt

ProcEnviron.txt (325 bytes, text/plain)

apport information

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: ProcInterrupts.txt

#10

ProcInterrupts.txt (4.8 KiB, text/plain)

apport information

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: ProcModules.txt

#11

ProcModules.txt (5.9 KiB, text/plain)

apport information

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: PulseList.txt

#12

PulseList.txt (22.7 KiB, text/plain)

apport information

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: RfKill.txt

#13

RfKill.txt (240 bytes, text/plain)

apport information

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: UdevDb.txt

#14

UdevDb.txt (148.3 KiB, text/plain)

apport information

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11:

#15

disabling "discard" option fixed this for me.

Ritesh Khadgaray (khadgaray) wrote on 2013-12-11: UdevLog.txt

#16

UdevLog.txt (346.8 KiB, text/plain)

apport information

Ritesh Khadgaray (khadgaray) on 2013-12-11

Changed in linux (Ubuntu):
status:	Incomplete → New

Brad Figg (brad-figg) wrote on 2013-12-11: Status changed to Confirmed

#17

This change was made by a bot.

Changed in linux (Ubuntu):
status:	New → Confirmed

Joseph Salisbury (jsalisbury) wrote on 2013-12-11:

#18

Did this issue occur in a previous version of Ubuntu, or is this a new issue?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds. Please test the latest v3.13 release candidate kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.13-rc3-trusty/

Changed in linux (Ubuntu):
importance:	Undecided → Medium
status:	Confirmed → Incomplete

Ritesh Khadgaray (khadgaray) wrote on 2013-12-12:

#19

> Did this issue occur in a previous version of Ubuntu, or is this a new issue?

This is a new disk with the latest firmware (purchased on 25 Nov). The issue was first seen when I tried to build LibreOffice (the build goes from 256 MB to 32 GB, and then back down to 18 GB).

It worked fine when building Mozilla Firefox and other packages.

> If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.
I have tested this with the mainline kernel (3.13-rc3) and with 3.12 from trusty. The issue is reproducible on both with the discard option enabled.

tags:	added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

Joseph Salisbury (jsalisbury) wrote on 2013-12-12:

#20

Can you also test the 3.2 final kernel to see if this is a regression or not:

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.2-precise/

penalvch (penalvch) on 2013-12-17

tags:	added: latest-bios-211 regression-potential
Changed in linux (Ubuntu):
status:	Confirmed → Incomplete

Ritesh Khadgaray (khadgaray) wrote on 2013-12-18:

#21

I am unable to reproduce this issue anymore. Another odd thing I noticed: "fstrim -v /" now reports "0 bytes" trimmed on multiple runs. This was not the case earlier.

Ritesh Khadgaray (khadgaray) wrote on 2013-12-19:

#23

After enabling discard and running fstrim/discard, the system failed to reboot because of a corrupted partition table, so I was unable to test this any further.

The disk seems to be fine based on a badblocks check, and memtest86 shows a clean system.

Ritesh Khadgaray (khadgaray) on 2013-12-20

Changed in linux (Ubuntu):
status:	Incomplete → New

Brad Figg (brad-figg) wrote on 2013-12-20:

#24

This change was made by a bot.

Changed in linux (Ubuntu):
status:	New → Confirmed

Ritesh Khadgaray (khadgaray) wrote on 2013-12-21: ProcEnviron.txt

#25

ProcEnviron.txt (100 bytes, text/plain)

apport information

description:	updated

Ritesh Khadgaray (khadgaray) wrote on 2013-12-21:

#26

Saw something interesting with 3.13 daily build kernel
Linux K43SA 3.13.0-999-generic #201312200414 SMP Fri Dec 20 09:16:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

[   17.017210] psmouse serio4: elantech: assuming hardware version 3 (with firmware version 0x450f01)
[   17.029210] psmouse serio4: elantech: Synaptics capabilities query result 0x78, 0x15, 0x0c.
[   17.088588] input: ETPS/2 Elantech Touchpad as /devices/platform/i8042/serio4/input/input12
[   18.284033] ata1: log page 10h reported inactive tag 0
[   18.284039] ata1.00: exception Emask 0x1 SAct 0x30 SErr 0x0 action 0x0
[   18.284040] ata1.00: irq_stat 0x40000008
[   18.284042] ata1.00: failed command: READ FPDMA QUEUED
[   18.284046] ata1.00: cmd 60/08:20:00:00:db/00:00:0b:00:00/40 tag 4 ncq 4096 in
[   18.284046]          res 40/00:2c:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
[   18.284048] ata1.00: status: { DRDY }
[   18.284050] ata1.00: failed command: SEND FPDMA QUEUED
[   18.284053] ata1.00: cmd 64/01:28:00:00:00/00:00:00:00:00/a0 tag 5 ncq 512 out
[   18.284053]          res 40/00:2c:00:00:00/00:00:00:00:00/a0 Emask 0x1 (device error)
[   18.284054] ata1.00: status: { DRDY }
[   18.284292] ata1.00: supports DRM functions and may not be fully accessible
[   18.290634] ata1.00: supports DRM functions and may not be fully accessible
[   18.296726] ata1.00: configured for UDMA/133
[   18.296729] ata1.00: device reported invalid CHS sector 0
[   18.296735] sd 0:0:0:0: [sda]  
[   18.296737] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[   18.296738] sd 0:0:0:0: [sda]  
[   18.296739] Sense Key : Aborted Command [current] [descriptor]
[   18.296742] Descriptor sense data with sense descriptors (in hex):
[   18.296743]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
[   18.296749]         00 00 00 00 
[   18.296751] sd 0:0:0:0: [sda]  
[   18.296753] Add. Sense: No additional sense information
[   18.296762] sd 0:0:0:0: [sda] CDB: 
[   18.296763] Write same(16): 93 08 00 00 00 00 00 1b 36 00 00 00 00 b0 00 00
[   18.296770] end_request: I/O error, dev sda, sector 1783296
[   18.296777] EXT4-fs (dm-1): discard request in group:1 block:1984 count:22 failed with -5
[   18.296778] ata1: EH complete
[   18.667971] wlan0: authenticate with 00:1b:57:ba:8a:a5
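
The failed command in the log above can be decoded by hand from the CDB bytes. The Python sketch below does this for a WRITE SAME(16) CDB, assuming the standard SCSI layout: opcode 0x93 in byte 0, the UNMAP bit (0x08) in byte 1, a big-endian 64-bit LBA in bytes 2-9, and a big-endian 32-bit block count in bytes 10-13. The function name decode_write_same16 is hypothetical.

```python
def decode_write_same16(cdb_hex: str) -> dict:
    """Decode the fields of interest from a WRITE SAME(16) CDB given as hex."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between bytes are accepted
    if len(cdb) != 16 or cdb[0] != 0x93:
        raise ValueError("not a WRITE SAME(16) CDB")
    return {
        # UNMAP bit set means the range is being discarded, not overwritten.
        "unmap": bool(cdb[1] & 0x08),
        "lba": int.from_bytes(cdb[2:10], "big"),
        "num_blocks": int.from_bytes(cdb[10:14], "big"),
    }
```

For the CDB logged here (93 08 00 00 00 00 00 1b 36 00 00 00 00 b0 00 00), this yields an UNMAP request starting at LBA 1783296 for 176 logical blocks. That LBA matches the "I/O error, dev sda, sector 1783296" line, and 176 sectors of 512 bytes equal the 22 filesystem blocks in the ext4 message if the filesystem uses 4096-byte blocks (an assumption). The "-5" in the ext4 line is errno EIO.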

Ritesh Khadgaray (khadgaray) wrote on 2013-12-21:

#27

I don't see these error messages with the 3.12 kernel, or if I boot with NCQ disabled:

$ echo 1 > /sys/block/sda/device/queue_depth
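
To verify that the setting took effect, the same sysfs file can be read back. A small Python sketch, assuming the device is sda and that a queue depth of 1 means NCQ is effectively off (only one command outstanding at a time); both helper names are made up for illustration:

```python
from pathlib import Path


def ncq_active(queue_depth_text: str) -> bool:
    """Interpret the contents of queue_depth; a depth above 1 means NCQ is in use."""
    return int(queue_depth_text.strip()) > 1


def read_queue_depth(device: str = "sda", sysfs_root: str = "/sys") -> str:
    """Read the raw queue_depth value for a block device from sysfs."""
    path = Path(sysfs_root) / "block" / device / "device" / "queue_depth"
    return path.read_text()
```

On the machine in this report, ncq_active(read_queue_depth("sda")) should be True by default, since the drive advertises "NCQ (depth 31/32)" in dmesg, and False after writing 1 to queue_depth. Reading the file should not require root, whereas writing to it does.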

Additionally, fstrim reports roughly the same amount trimmed after every reboot:

first run
# fstrim -v /
/: 920118112256 bytes were trimmed

second run ( after reboot )
/: 920175951872 bytes were trimmed

third run (after reboot)
fstrim -v /boot/
/boot/: 75911168 bytes were trimmed
root@K43SA:/home/ritesh# fstrim -v /
/: 920183271424 bytes were trimmed

Without a reboot, fstrim does report 0 bytes trimmed on each subsequent run.
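
To put the numbers above in proportion, the fstrim output can be parsed and compared with the drive capacity from the smartctl output (960,197,124,096 bytes). A Python sketch, assuming the "<mount>: <N> bytes were trimmed" format shown in this report (other fstrim versions may print a different format); parse_fstrim and trimmed_fraction are hypothetical helper names:

```python
import re

# Output format as printed by the fstrim version used in this report.
FSTRIM_LINE = re.compile(
    r"^(?P<mount>\S+): (?P<bytes>\d+) bytes were trimmed$", re.MULTILINE
)


def parse_fstrim(output: str):
    """Return a list of (mount point, bytes trimmed) tuples."""
    return [(m["mount"], int(m["bytes"])) for m in FSTRIM_LINE.finditer(output)]


def trimmed_fraction(trimmed_bytes: int, capacity_bytes: int) -> float:
    """Fraction of the device capacity covered by one trim run."""
    return trimmed_bytes / capacity_bytes
```

Each post-reboot run on / trims about 920 GB, which is more than 95% of the 960 GB device.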

__self:
http://askubuntu.com/questions/133946/are-these-sata-errors-dangerous/133960#133960
https://bbs.archlinux.org/viewtopic.php?id=147189
http://www.itechlounge.net/2013/07/linux-ata-failed-command-read-fpdma-queued/
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1094446

penalvch (penalvch) on 2013-12-21

tags:	added: kernel-bug-exists-upstream-v3.13-rc3
	removed: kernel-bug-exists-upstream

penalvch (penalvch) on 2013-12-21

description:	updated

Ritesh Khadgaray (khadgaray) wrote on 2013-12-22:

#28

dmesg (101.6 KiB, text/plain)

[ 0.000000] Linux version 3.13.0-999-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201312200414 SMP Fri Dec 20 09:16:44 UTC 2013

With discard enabled, I am able to reproduce this by copying files over twice and deleting the first copy.
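
The copy-twice-then-delete pattern described above can be scripted for repeated testing. This is only a sketch of that workload in Python, not the exact procedure used in this report: the function name, file count, and file size are arbitrary assumptions. It should only be run on a test system, in a directory on the filesystem mounted with discard.

```python
import shutil
import tempfile
from pathlib import Path


def copy_twice_delete_first(base: Path, n_files: int = 100, size: int = 64 * 1024):
    """Create a source tree, copy it twice, then delete the first copy."""
    src = base / "src"
    src.mkdir()
    for i in range(n_files):
        (src / f"file{i:04d}").write_bytes(bytes(size))  # zero-filled files
    first = base / "copy1"
    second = base / "copy2"
    shutil.copytree(src, first)
    shutil.copytree(src, second)
    # Deleting frees blocks; with the discard option this triggers discard
    # requests for the freed ranges.
    shutil.rmtree(first)
    return first, second
```

A call such as copy_twice_delete_first(Path(tempfile.mkdtemp(dir="/home/user/test"))) runs one cycle, where the directory path is a placeholder. Watching dmesg for EXT4-fs errors afterwards shows whether the cycle triggered the problem.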

penalvch (penalvch) wrote on 2013-12-22:

#29

Ritesh Khadgaray, thank you for your testing. I would avoid using the mainline daily folder as this would be a downstream construct, and upstream may not be terribly interested in it. Given you tested v3.13-rc3, this would be fine.

Despite this, would you mind testing a 3.2.x Ubuntu kernel series for regression purposes following https://wiki.ubuntu.com/Kernel/KernelBisection#Bisecting_Ubuntu_kernel_versions ?

Changed in linux (Ubuntu):
status:	Confirmed → Incomplete

Ritesh Khadgaray (khadgaray) wrote on 2013-12-24:

#30

[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.2.53-030253-generic (apw@gomeisa) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201311281435 SMP Thu Nov 28 19:36:21 UTC 2013
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.2.53-030253-generic root=/dev/mapper/ubuntu--vg-root ro recovery nomodeset
[    0.000000] KERNEL supported cpus:
...
[    1.175970] SCSI subsystem initialized
[    1.176000] libata version 3.00 loaded.
[    1.176033] usbcore: registered new interface driver usbfs
[    1.176043] usbcore: registered new interface driver hub
[    1.176061] usbcore: registered new device driver usb
[    2.047965] ahci 0000:00:1f.2: version 3.0
[    2.047976] ahci 0000:00:1f.2: PCI INT B -> GSI 19 (level, low) -> IRQ 19
[    2.048835] ahci 0000:00:1f.2: irq 45 for MSI/MSI-X
[    2.062968] ahci 0000:00:1f.2: AHCI 0001.0300 32 slots 6 ports 6 Gbps 0x5 impl SATA mode
[    2.063847] ahci 0000:00:1f.2: flags: 64bit ncq sntf pm led clo pio slum part ems apst 
[    2.064665] ahci 0000:00:1f.2: setting latency timer to 64
[    2.071412] scsi0 : ahci
[    2.072281] scsi1 : ahci
[    2.073122] scsi2 : ahci
[    2.073944] scsi3 : ahci
[    2.074745] scsi4 : ahci
[    2.075535] scsi5 : ahci
[    2.076425] ata1: SATA max UDMA/133 abar m2048@0xdff06000 port 0xdff06100 irq 45
[    2.077157] ata2: DUMMY
[    2.077871] ata3: SATA max UDMA/133 abar m2048@0xdff06000 port 0xdff06200 irq 45
[    2.078607] ata4: DUMMY
[    2.079339] ata5: DUMMY
[    2.080055] ata6: DUMMY
[    2.081032] Fixed MDIO Bus: probed
[    2.081743] tun: Universal TUN/TAP device driver, 1.6
[    2.082431] tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
[    2.083154] PPP generic driver version 2.4.2
...
[    2.271428] rtc_cmos 00:06: setting system clock to 2013-12-24 15:50:13 UTC (1387900213)
[    2.273757] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[    2.274368] EDD information not available.
[    2.398657] ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[    2.399336] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    2.400529] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[    2.401154] ata1.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
[    2.401206] ata1.00: ACPI cmd ef/90:03:00:00:00:a0 (SET FEATURES) succeeded
[    2.401527] ata1.00: supports DRM functions and may not be fully accessible
[    2.402229] ata1.00: ATA-9: Crucial_CT960M500SSD1, MU03, max UDMA/133
[    2.402871] ata1.00: 1875385008 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
[    2.404074] ata1.00: ACPI cmd f5/00:00:00:00:00:a0 (SECURITY FREEZE LOCK) filtered out
[    2.404721] ata1.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
[    2.404772] ata1.00: ACPI cmd ef/90:03:00:00:00:a0 (SET FEATURES) succeeded
[    2.405012] ata1.00: supports DRM functions and may not be fully accessible
[    2.405767] ata1.00: configured for UDMA/133
[    2.406557] scsi 0:0:0:0: Direct-Access     ATA      Crucial_CT960M50 MU03 PQ: 0 ANSI: 5
[    2.407392] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    2.407427] sd 0:0:0:0: [sda] 1875385008 512-byte logical blocks: (960 GB/894 GiB)
[    2.407437] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    2.407784] sd 0:0:0:0: [sda] Write Protect is off
[    2.407795] sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    2.407904] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.409888]  sda: sda1 sda2 sda3
[    2.411297] ata3.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
[    2.412274] ata3.00: ACPI cmd ef/90:03:00:00:00:a0 (SET FEATURES) succeeded
[    2.412451] sd 0:0:0:0: [sda] Attached SCSI disk
[    2.415232] ata3.00: ATAPI: HL-DT-ST BDDVDRW CT30F, YT04, max UDMA/133
[    2.420305] ata3.00: ACPI cmd ef/10:06:00:00:00:a0 (SET FEATURES) succeeded
[    2.421301] ata3.00: ACPI cmd ef/90:03:00:00:00:a0 (SET FEATURES) succeeded
[    2.422606] usb 1-1: new high-speed USB device number 2 using ehci_hcd
[    2.424203] ata3.00: configured for UDMA/133
[    2.427691] scsi 2:0:0:0: CD-ROM            HL-DT-ST BDDVDRW CT30F    YT04 PQ: 0 ANSI: 5
[    2.432803] sr0: scsi3-mmc drive: 24x/24x writer dvd-ram cd/rw xa/form2 cdda tray
[    2.433555] cdrom: Uniform CD-ROM driver Revision: 3.20
[    2.434370] sr 2:0:0:0: Attached scsi CD-ROM sr0
[    2.434438] sr 2:0:0:0: Attached scsi generic sg1 type 5
[    2.436374] Freeing unused kernel memory: 924k freed
...
[ 6584.458441] EXT4-fs error (device dm-1): htree_dirblock_to_tree:587: inode #12849936: block 51389249: comm gvfsd-trash: bad entry in directory: directory entry across blocks - offset=0(12288), inode=2021400416, rec_len=26628, name_len=28
[ 6584.458450] Aborting journal on device dm-1-8.
[ 6584.458454] EXT4-fs (dm-1): ext4_da_writepages: jbd2_start: 1023 pages, ino 32779068; err -30
[ 6584.458902] EXT4-fs (dm-1): Remounting filesystem read-only
[ 6608.989429] show_signal_msg: 54 callbacks suppressed
[ 6608.989434] unity-scope-loa[6242]: segfault at 0 ip 00007f8bccc29cf0 sp 00007fff5da61ea8 error 4 in libc-2.18.so[7f8bccbb8000+1bc000]
[ 6611.000707] unity-scope-loa[6248]: segfault at 0 ip 00007ffdc7c58cf0 sp 00007fffe5bf8418 error 4 in libc-2.18.so[7ffdc7be7000+1bc000]

Disabling discard fixes this for me on kernels 3.2 and above. "rm" seems to trigger this bug. I have not yet been able to trigger it with discard disabled.

Changed in linux (Ubuntu):
status:	Incomplete → Confirmed

Ritesh Khadgaray (khadgaray) wrote on 2013-12-24:

#31

fs layout

/boot -> sda2, ext4 with discard
/boot/efi -> sda1, vfat
/ -> sda3_crypt (LUKS with LVM, discard enabled)

I enabled discard as described at http://askubuntu.com/a/122206/125818
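
For anyone wanting to confirm whether a filesystem is really mounted with discard, one option is to look at its option list in /proc/mounts. This is only a rough sketch: has_discard is a hypothetical helper name, and it reads mounts-style lines from stdin so it can be tried on sample text. It only inspects the filesystem mount options, not whether the LUKS or LVM layers underneath pass discards through.

```shell
# has_discard MOUNTPOINT
# Reads /proc/mounts-style lines on stdin and succeeds (exit 0) if
# MOUNTPOINT is listed with the "discard" mount option.
# (Hypothetical helper, not part of any package.)
has_discard() {
    awk -v mp="$1" '
        $2 == mp {
            # Field 4 is the comma-separated option list.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "discard") found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage would be something like: has_discard / < /proc/mounts && echo "discard is enabled on /"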

penalvch (penalvch) on 2013-12-24

description:

updated

Ritesh Khadgaray (khadgaray) wrote on 2013-12-24:

#32

smartctl logs (11.5 KiB, text/plain)

The run after the crash was seen:

$ fstrim -v /
/: 601705070592 bytes were trimmed

Additionally, it seems trim is not required. I could be wrong though.
http://forum.crucial.com/t5/Solid-State-Drives-SSD/M500-on-OS-X-Let-SSD-sort-itself-out-or-enable-TRIM/td-p/127854q

description:	updated
description:	updated

penalvch (penalvch) wrote on 2013-12-24:

#33

Ritesh Khadgaray, thank you for your testing results. Given you tested this in a recent mainline kernel (although not the most recent), and you have an SSD (relatively newer HW from kernel perspective), I'm going to mark this Triaged for now. Hence, the issue you are reporting is an upstream one. Could you please report this problem through the appropriate channel (I would check in with linux-ext4 and CC "Theodore Ts'o" <email address hidden> and Andreas Dilger <email address hidden>) by following the instructions _verbatim_ at https://wiki.ubuntu.com/Bugs/Upstream/kernel#KernelTeam.2BAC8-KernelTeamBugPolicies.Overview_on_Reporting_Bugs_Upstream ?

Please provide a direct URL to your post once you have made it so that it may be tracked.

Thank you for your understanding.

Changed in linux (Ubuntu):
importance:	Medium → Low
status:	Confirmed → Triaged
tags:	removed: regression-potential

Ritesh Khadgaray (khadgaray) wrote on 2014-01-02:

#34

Posted this to the linux-ext4 list, on my second attempt. They reject multipart (containing HTML) messages.

http://thread.gmane.org/gmane.comp.file-systems.ext4/41969 .

Ritesh Khadgaray (khadgaray) wrote on 2014-01-06:

#35

ubuntu should enabled discard by default.

-------------------------------------------
http://article.gmane.org/gmane.comp.file-systems.ext4/41974

From Theodore Ts'o via thunk.org

This is a hardware bug, unfortunately. And it's also the reason why
discard is not on by default.

These days, what I normally tell people is to not use the discard
mount option at all, and instead use the fstrim program, run out of
cron maybe once a week or even every night if you are anal. (But for
most workloads, once a week is plenty.) The main place where the
discard option makes sense is if you are using a very expensive PCIe
attached flash device. Those devices are much more likely to have a
competently implemented DISCARD command, and they generally don't
destroy performance forcing a queue flush for every single DISCARD
request.

However, in your case, if discard commands are causing on-disk
corruption, I'm not sure I can even in good conscience recommend using
fstrim.

> Device Model: Crucial_CT960M500SSD1
> Serial Number: 1335094BE7CA
> LU WWN Device Id: 5 00a075 1094be7ca
> Firmware Version: MU03

Instead, all I can do is suggest that you consider whether you should
replace your SSD. Historically, I've stuck with Intel SSD's because
they are the ones that have tended to be the most reliable. Intel has
unfortunately, been slow to market because they insist on testing
their products extensively and only releasing them when they are
solid, which has cost them market share. Unfortunately, the market
doesn't always reward quality. More recently, I've started using
Samsung SSD's. I have a Samsung 840 PRO and the Intel 525 240GB mSATA
SSD's in my laptop, and so far, I've not had any problems with either.
They are definitely not the cheapest nor the most performant devices
in head-to-head testing, but that's not the only dimension that I care
about....

More (somewhat depressing) investigations about the quality of SSD's
these days:

https://plus.google.com/+MarcMERLIN/posts/Us8yjK9SPs6
http://lkcl.net/reports/ssd_analysis.html
https://www.usenix.org/conference/fast13/understanding-robustness-ssds-under-power-fault

- Ted

P.S. Some really crappy SSD devices have brick'ed themselves when
they are given a heavy discard load, particularly one which is mixed
with other traffic, and this is what the "discard" mount option
provides. Note that if the fstrim command is executed while you are
also trying to put the device under heavy read/write workloads, it
could also result in the same kind of corruption and/or brick'ing of
the SSD. Which is why I hesitate to recommend switching to fstrim for
a device which is known to mishandle the DISCARD command, and to
suggest simply not using the DISCARD feature at all --- and if this
results in increased performance loss or increased write wear, to just
replace the SSD as an inferior quality product before it does any
further damage to your data
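
For drives that do handle DISCARD correctly, the "fstrim out of cron once a week" approach described above could look roughly like the following. This is a minimal sketch, not the script Ubuntu ships: the FSTRIM variable and the trim_mounts name are assumptions made for illustration, and the list of mount points is whatever the caller passes in. On a drive known to mishandle DISCARD, such as the one in this report, the advice above is to not trim at all.

```shell
#!/bin/sh
# Sketch of a weekly trim job, e.g. saved under /etc/cron.weekly/.
# FSTRIM can be overridden for dry runs; it defaults to the real tool.
FSTRIM="${FSTRIM:-fstrim}"

# trim_mounts MOUNTPOINT...
# Runs "fstrim -v" on each mount point in turn (never in parallel),
# and reports failures on stderr without aborting the loop.
trim_mounts() {
    for mp in "$@"; do
        "$FSTRIM" -v "$mp" || echo "fstrim failed on $mp" >&2
    done
}
```

A cron script would then end with a call such as: trim_mounts / /home (adjust to the filesystems that live on the SSD). Running it at a quiet time matters, given the warning above about trimming under heavy read/write load.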

Martin Pitt (pitti) wrote on 2014-01-22:

#36

For the record, due to this bug we limited automatic fstrim to Samsung and Intel drives for now: https://launchpad.net/ubuntu/+source/util-linux/2.20.1-5.1ubuntu14

I can't say I like that much as it excludes a lot of other "good" SSDs from being trimmed by default, but better safe than sorry. We can extend the list of known-good models/brands, or get more evidence that this problem affects so few devices that we should rather use a blacklist approach, or we find that this SSD model is broken anyway and ignore the bug (for example, it's likely that this also throws up if more than one application is doing large I/O on it at the same time?)

Dave Chiluk (chiluk) wrote on 2014-01-23:

#37

@pitti Since it appears that we are going for a whitelist, here is what I have:
Device Model: Patriot Wildfire
Serial Number: PT1145A00024835
LU WWN Device Id: 0 000120 000000000
Firmware Version: 502ABBF0
User Capacity: 120,034,123,776 bytes [120 GB]

Device Model: OCZ-VERTEX3 MI
Serial Number: OCZ-D38A6W57990V6TG5
LU WWN Device Id: 5 e83a97 53ae3311c
Firmware Version: 2.25
User Capacity: 240,057,409,536 bytes [240 GB]

Model Family: Indilinx Barefoot_2/Everest/Martini based SSDs
Device Model: OCZ-VERTEX4
Serial Number: OCZ-RVC8K85817A17KIN
LU WWN Device Id: 5 e83a97 fcc5107e1
Firmware Version: 1.5.1
User Capacity: 256,060,514,304 bytes [256 GB]

Device Model: SanDisk SSD i100 24GB
Serial Number: 122800133466
LU WWN Device Id: 5 001b44 7a30a515a
Firmware Version: 11.50.02
User Capacity: 24,015,495,168 bytes [24.0 GB]

So far none of the above have shown this problem, and I've been running with discard on for at least a year on all the above drives.

@cking
@pitti
If you guys want me to run any real tests on these devices let me know.

Dave Chiluk (chiluk) wrote on 2014-01-23:

#38

Also it just occurred to me that white-listing by manufacturer may still not catch all problematic drives, as most third party manufacturers that source their controllers have drives with controllers from other suppliers as well.

For what it's worth, the Crucial drive mentioned above uses a Marvell 88SS9187. The only other manufacturer that has released a drive with that controller is Plextor (at least that's what Google tells me).

It's a fairly safe assumption that if it's broken for one OEM it'll be broken for another.

@pitti I don't envy you.

Martin Pitt (pitti) wrote on 2014-01-27:

#39

If we have evidence that Ritesh's SSD is the only (or one of very few) affected models, I'm also not opposed to just blacklisting that. But we really can't know. At least it's now fairly easy to enable in /etc/cron.weekly/fstrim on all models, but of course that whole TRIM business is still way too incomprehensible for the average user.

Ritesh, Dave, would you mind giving me the output of "sudo hdparm -I /dev/sda | head -n 10", so that I see the "Model number" as hdparm sees it? I'm not sure how you got the "Device Model:" strings, but they don't seem to be the same as hdparm's. Thanks!
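
To illustrate the kind of model check being discussed, here is a rough sketch that pulls the "Model Number" out of hdparm -I output and compares it against a list of known-good patterns. The function names and the patterns are assumptions for illustration only; they are not the actual list or logic used in util-linux.

```shell
# model_from_hdparm
# Reads "hdparm -I" output on stdin and prints the Model Number value
# with surrounding whitespace removed.
model_from_hdparm() {
    sed -n 's/^[[:space:]]*Model Number:[[:space:]]*//p' |
        sed 's/[[:space:]]*$//' |
        head -n 1
}

# model_allowed MODEL
# Succeeds if MODEL matches one of the illustrative known-good patterns.
# (The patterns here are made up for the example; extend as evidence comes in.)
model_allowed() {
    case "$1" in
        SAMSUNG*|Samsung*|INTEL*|Intel*) return 0 ;;
        *) return 1 ;;
    esac
}
```

It could be used as: model_allowed "$(sudo hdparm -I /dev/sda | model_from_hdparm)" && echo "OK to trim". Matching on the model string alone cannot tell firmware revisions apart, so a drive fixed by a firmware update would still need a separate check.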

Ritesh Khadgaray (khadgaray) wrote on 2014-01-27:

#40

ATA device, with non-removable media
Model Number: Crucial_CT960M500SSD1
Serial Number: 1335094BE7CA
Firmware Revision: MU03
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0
Standards:
Used: unknown (minor revision code 0x0028)
Supported: 9 8 7 6 5
Likely used: 9
Configuration:
Logical max current
cylinders 16383 16383
heads 16 16
sectors/track 63 63
--
CHS current addressable sectors: 16514064
LBA user addressable sectors: 268435455
LBA48 user addressable sectors: 1875385008
Logical Sector size: 512 bytes
Physical Sector size: 4096 bytes
Logical Sector-0 offset: 0 bytes
device size with M = 1024*1024: 915715 MBytes
device size with M = 1000*1000: 960197 MBytes (960 GB)
cache/buffer size = unknown
Form Factor: 2.5 inch
Nominal Media Rotation Rate: Solid State Device
Capabilities:
LBA, IORDY(can be disabled)
Queue depth: 32
Standby timer values: spec'd by Standard, with device specific minimum
R/W multiple sector transfer: Max = 16 Current = 16
Advanced power management level: 254
DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
Cycle time: min=120ns recommended=120ns
PIO: pio0 pio1 pio2 pio3 pio4
Cycle time: no flow control=120ns IORDY flow control=120ns

Dave Chiluk (chiluk) wrote on 2014-01-28:

#41

We were using smartctl -a /dev/sd<blah> from smartmontools.

ATA device, with non-removable media
Model Number: OCZ-VERTEX3 MI
Serial Number: OCZ-D38A6W57990V6TG5
Firmware Revision: 2.25
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0

ATA device, with non-removable media
Model Number: Patriot Wildfire
Serial Number: PT1145A00024835
Firmware Revision: 502ABBF0
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0

ATA device, with non-removable media
Model Number: OCZ-VERTEX4
Serial Number: OCZ-RVC8K85817A17KIN
Firmware Revision: 1.5.1
Transport: Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6, SATA Rev 3.0

Also, rtg, who is running a Samsung 840 EVO with fstrim, may also be experiencing a corruption bug, but he doesn't get any notice in his logs. We'll keep you posted once he figures out what is going on. His could just be related to running crazy kernels.

Martin Pitt (pitti) wrote on 2014-03-10:

#42

Dave, thanks; so smartctl and hdparm indeed show the same model number, just wanted to confirm. I whitelisted these in http://bazaar.launchpad.net/~ubuntu-branches/ubuntu/trusty/util-linux/trusty/revision/103

Fabien Lusseau (fabien-beosfrance) wrote on 2014-04-02:

#43

What kind of simple, heavy test can I run on my Kingston and Kingspec SSDs to find out whether they are also affected by this kind of problem?

A simple testing method should help build the whitelist/blacklist.

I have access to several SSDs from a lot of manufacturers. The only ones that are not already whitelisted are Kingston and Kingspec; I have various models of them, so I may be able to help.

I'm already using them with discard enabled, but I'm not usually doing anything crazy on them. If I should compile LibreOffice or do some kind of stress testing on them, I will if you ask me.

Martin Pitt (pitti) wrote on 2014-04-03:

#44

A simple smoke test might be to run something like

  sudo dd if=/dev/sda of=/dev/zero &
  sudo find / >/dev/null 2>&1 &
  sudo grep -r . / >/dev/null &

and run fstrim-all in parallel. You can of course do the original thing of setting up a schroot and sbuild libreoffice in that (or just "sudo apt-get build-dep libreoffice; apt-get download -b libreoffice" on your workstation if you don't mind it getting cluttered with lots and lots of build depends).
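
After a run like this, the kernel log is where the corruption showed up in the original report, so a quick check could be to count the relevant lines in dmesg. A minimal sketch, assuming the three message types seen in the logs above are the ones worth looking for (ext4_trouble_count is a hypothetical name):

```shell
# ext4_trouble_count
# Reads kernel log text on stdin and prints how many lines mention an
# ext4 error, a journal abort, or a read-only remount.
ext4_trouble_count() {
    # grep -c exits non-zero when the count is 0; "|| true" hides that.
    grep -c -E 'EXT4-fs error|Aborting journal|Remounting filesystem read-only' || true
}
```

For example: dmesg | ext4_trouble_count should print 0 on a healthy run. A zero here does not prove the drive is safe, it only means nothing was logged during the test.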

Riccardo 'c10ud' (c10ud) wrote on 2014-04-16:

#45

Hello, yesterday I installed trusty on my Kingston 60GB SSD (Model=KINGSTON SV300S37A60G, FwRev=525ABBF0)

After running the simple test with no error I added --no-model-check to /etc/cron.weekly/fstrim...so far so good

robi (robreg) wrote on 2014-04-18:

#46

KINGSTON SV200S364G with fw E111008a seems OK too

robi (robreg) wrote on 2014-04-18:

#47

Also KINGSTON SVP200S37A60G with fw 502ABBF0

Fabien Lusseau (fabien-beosfrance) wrote on 2014-04-18:

#48

I tested on a KINGSTON v300 and a v100 and everything works fine.

I used the discard mount option as it seems to be the most efficient to detect this misbehavior.
And then I started:

sudo apt-get build-dep libreoffice; apt-get source -b libreoffice

It worked fine, I also tried with periodic fstrim -v /
No problem either.

I'm now starting to do the same test on a Kingspec C3000.6 and several other models from the same manufacturer.

Todor Andreev (toshko3) wrote on 2014-05-21:

#49

I could test my KINGSTON SV300S37A120G with firmware 521ABBF0.
I used Martin's test:

  sudo dd if=/dev/sda of=/dev/zero &
  sudo find / >/dev/null 2>&1 &
  sudo grep -r . / >/dev/null &

and manual "fstrim -v /" several times.
I did this for about 5 minutes.
No problems, and "fsck.ext4 /dev/sda" reported the filesystem as clean.
If Ritesh Khadgaray could report whether this problem occurs with this test, it would be of use. Otherwise, discard my test.

Joshua (njj) wrote on 2014-05-26:

#50

Has anyone already tried to report this issue to Crucial?

Joshua (njj) wrote on 2014-05-26:

#51

@Ritesh, there is a newer firmware [1] available for your device, could you test if this fixes the issue?

[1] http://forum.crucial.com/t5/Solid-State-Drives-SSD/Feedback-Thread-Firmware-MU05-for-the-M500/td-p/146872

Joshua (njj) wrote on 2014-05-27:

#52

@Offtopic: Isn't it possible to edit previous posts? Because it feels like a monologue to post new comments all the time.

However, I opened a thread on the Crucial forum [1] and a nice guy referred me to another thread [2], which says that the bug is fixed with firmware MU05. And there are two links to kernel mailing list posts, one blacklisting all Crucial SSDs for queued TRIM [3] and one reverting this for MU05 [4]. Is this the same "whitelist" the Ubuntu people are talking about? When I want to test TRIM with the newest firmware MU05, is it enough to add "--no-model-check", or do I also have to use a special kernel?

[1] http://forum.crucial.com/t5/Solid-State-Drives-SSD/M500-on-Linux-with-fstrim-or-discard-under-heavy-load-could/td-p/151522
[2] http://forum.crucial.com/t5/Solid-State-Drives-SSD/M500-M5x0-QUEUED-TRIM-data-corruption-alert-mostly-for-Linux/td-p/151028
[3] http://comments.gmane.org/gmane.linux.ide/56084
[4] http://www.spinics.net/lists/linux-ide/msg48361.html
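
If the fix really does arrive with MU05 as the threads above suggest, a check by firmware revision rather than by model could be sketched like this. It is an illustration only: fw_at_least_mu05 is a hypothetical name, it assumes M500 revisions always look like "MU" followed by two digits, and it is not how the kernel or util-linux decide anything.

```shell
# fw_at_least_mu05 REVISION
# Exit 0 if REVISION is MU05 or later, 1 if it is older,
# 2 if it does not look like "MUnn" at all.
fw_at_least_mu05() {
    case "$1" in
        MU[0-9][0-9]) ;;
        *) return 2 ;;
    esac
    num=${1#MU}     # "MU03" -> "03"
    num=${num#0}    # drop one leading zero: "03" -> "3"
    [ "$num" -ge 5 ]
}
```

With the drive from this report, fw_at_least_mu05 MU03 fails, while fw_at_least_mu05 MU05 succeeds. The revision string is the "Firmware Revision" field shown by hdparm -I.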

Martin Spacek (mspacek) on 2014-06-30

summary:	- htree_dirblock_to_tree:920: inode #53629599: block 214443464: comm rm: - bad entry in directory: rec_len % 4 != 0 - offset=0(0), - inode=1667681412, rec_len=45654, name_len=39 + fstrim corrpution on some SSDs
summary:	- fstrim corrpution on some SSDs + fstrim corrupution on some SSDs

Martin Spacek (mspacek) wrote on 2014-06-30: Re: fstrim corrupution on some SSDs

#53

I've installed Xubuntu 14.04 on a brand new Crucial MX100 SSD (CT512MX100SSD1, original MU01 firmware, no newer firmware available) on a Thinkpad W510, and saw the note in the cron.weekly/fstrim file. After running fstrim manually the first two times, I got this:

$ sudo fstrim -v /
/: 498557419520 bytes were trimmed
$ sudo fstrim -v /
/: 0 bytes were trimmed

I also ran some of Martin Pitt's load tests for a few minutes while calling fstrim -v repeatedly. No problems.

Since Crucial updated the firmware to MU05 on 2014-03-25 for the affected M500, and since the MX100 drive apparently has a slightly newer controller...

"These new Crucial MX100 SSD series features the Marvell 88SS9189 controller, a minor upgrade from the 88SS9187 that was found in the M500 drive."
http://www.legitreviews.com/crucial-mx100-256gb-512gb-ssd-review_143984

...I'll be brave and add --no-model-check to the "exec fstrim-all" line in the cron fstrim file. If I have any problems, I'll report back.

from hdparm -I:

Remote bug watches

linux-kernel-bugs #71371
[RESOLVED OBSOLETE]

Changed in linux (Ubuntu):
status:	Triaged → Fix Released
