btrfs discard issue after power event

Bug #1634377 reported by mrturtledev on 2016-10-18
This bug affects 1 person
Affects: btrfs-tools (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

----Overview----

Automation scripts test SSD firmware across power transitions during interoperability testing, using the following procedure:

1) Create four partitions, each 25% of the drive (varying file systems), and mount them as a secondary data drive
2) Mount the BTRFS partition with the discard flag via /etc/fstab
3) Create a 10G unique data pattern file on the root fs
4) Copy it to each target
5) Verify each target
6) Perform a power transition (restart, shutdown, sleep, or hibernate)
7) Verify each target
8) Remove the target file
9) Copy the file from the internal drive to each target again
10) Verify targets
11) Perform a power transition
...and so on
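
The copy, verify, and remove steps above can be sketched roughly as follows. This is only an illustration, not the actual test harness: the real scripts verify with fio, whereas this sketch swaps in a SHA-256 comparison, and the function names (sha256_of, copy_and_verify, remove_targets) are hypothetical. It also makes no attempt to bypass the page cache or to drive the power transition.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, targets: list[Path], name: str) -> dict[Path, bool]:
    """Copy `source` into each target directory, then compare hashes.

    Assumption: a hash comparison stands in for the fio verify pass.
    A read served from the page cache would hide on-disk corruption,
    so a real harness would need to read back from the device.
    """
    expected = sha256_of(source)
    results: dict[Path, bool] = {}
    for target_dir in targets:
        dest = target_dir / name
        shutil.copyfile(source, dest)
        results[dest] = sha256_of(dest) == expected
    return results


def remove_targets(targets: list[Path], name: str) -> None:
    """Delete the copied file from each target (step 8 of the procedure)."""
    for target_dir in targets:
        (target_dir / name).unlink(missing_ok=True)
```

In the scenario described here, the targets would be the four mount points and the name something like restart-3.bin; the verification result for the btrfs target is the one that comes back False.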

BTRFS fails at step 10: after the machine has come back up from the power event, verified the target files, deleted them, and copied the file from the internal drive again, verification of the freshly copied file fails.

----Failure----

On failure, the fio verify threads fail with an invalid header (the data is ALWAYS "101" where fio's ACCA header is expected, which I assume is a quirk of fio), and dmesg shows csum failed messages:

 csum failed ino 262 off 9985851392 csum 1474905414 expected csum 210901362

The file is readable up to a certain point, beyond which dd returns an I/O error:

root@xxxxx:$ dd if=/mnt/g/restart-3.bin of=/tmp/fio bs=512 count=1 skip=19503615
1+0 records in
1+0 records out
512 bytes copied, 0.000311177 s, 1.6 MB/s

root@xxxxx:$ dd if=/mnt/g/restart-3.bin of=/tmp/fio bs=512 count=1 skip=19503616
dd: error reading '/mnt/g/restart-3.bin': Input/output error
0+0 records in
0+0 records out
0 bytes copied, 0.000773759 s, 0.0 kB/s

Here we see that both files report the correct size, but restart-3.bin is unreadable starting at the failing offset above.

-rw-r--r-- 1 root root 10737418240 Oct 17 17:34 restart-1.bin
-rw-r--r-- 1 root root 10737418240 Oct 17 17:44 restart-3.bin
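
As a cross-check, the dd probes and the dmesg warning can be related with a little arithmetic. The sketch below uses only numbers that appear in this report (the dd block size and skip count, the file size from the listing, and the first csum offset); the variable names are just for illustration.

```python
SECTOR = 512                  # bs=512 used in the dd probes
FIRST_BAD_SKIP = 19503616     # first skip value at which dd returned an I/O error
FILE_SIZE = 10737418240       # size reported for restart-3.bin
FIRST_CSUM_OFFSET = 9985851392  # "off" field of the first csum failed warning

# Byte offset of the first unreadable 512-byte block.
first_bad_byte = FIRST_BAD_SKIP * SECTOR
print(first_bad_byte)                       # 9985851392

# The first unreadable block starts exactly at the first csum failure.
print(first_bad_byte == FIRST_CSUM_OFFSET)  # True

# Bytes from the first failing offset to the end of the file.
tail_bytes = FILE_SIZE - first_bad_byte
print(tail_bytes)                           # 751566848
```

So the first block that dd cannot read begins at the same byte offset that the first csum failed warning reports for inode 262.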

This fails on Ubuntu Server 16.04 with btrfs-progs 4.4 and 4.8, and now on Ubuntu Server 16.10 as well. The failure does not reproduce if the discard flag is removed from the btrfs entry in fstab, nor if the power event is removed from the test.
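
Since the discard flag is the deciding factor, it may help to confirm whether a given mount actually has it set before each run. The following is a minimal sketch that parses text in the /proc/mounts layout (device, mount point, fs type, options, dump, pass). The helper name mount_options and the sample line are hypothetical; the sample is not taken from the affected machine.

```python
from __future__ import annotations


def mount_options(mounts_text: str, mount_point: str) -> list[str] | None:
    """Return the option list for `mount_point`, or None if it is not mounted."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[1] == mount_point:
            return fields[3].split(",")
    return None


# Hypothetical sample in /proc/mounts format, for illustration only.
SAMPLE = "/dev/sdb2 /mnt/g btrfs rw,relatime,discard,ssd 0 0\n"

options = mount_options(SAMPLE, "/mnt/g")
print(options is not None and "discard" in options)
```

On a live system the text would come from reading /proc/mounts rather than from a sample string.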

----Reproducibility----

Ubuntu Server 16.04 / BTRFS-PROGS 4.4 : 100% within 10 restarts, 25-30 reproductions
Ubuntu Server 16.04 / BTRFS-PROGS 4.8 : 100% within 10 restarts, 5 reproductions
Ubuntu Server 16.10 / BTRFS-PROGS 4.7 : 100% within 10 restarts, 3 reproductions

----System Information----

Distro : ubuntu 16.10
Kernel : Linux 4.8.0-22-generic #24-Ubuntu SMP Sat Oct 8 09:15:00 UTC 2016 x86_64 x86_64 x86_64

CPU : Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz (1261.444)
CPUCores: 4
Model : Gigabyte Technology Co., Ltd. Z170M-D3H-CF
BIOS : American Megatrends Inc. F2

---DUT Controller Info---
PCI Bus ID : 0000:00:17.0
Device Path: /sys/bus/pci/devices//0000:00:17.0
Module Name: ahci
Module Vers: 3.0

---DUT Controller Bus---
00:17.0 SATA controller [0106]: Intel Corporation Sunrise Point-H SATA controller [AHCI mode] [8086:a102] (rev 31) (prog-if 01 [AHCI 1.0])

---DUT Layout---
/dev/sdb4 ext4 110G 23G 82G 22% /mnt/i
/dev/sdb1 ext4 110G 39G 66G 37% /mnt/f
/dev/sdb2 btrfs 112G 33G 79G 30% /mnt/g
/dev/sdb3 xfs 112G 25G 88G 22% /mnt/h

btrfs-tools:
  Installed: 4.7.3-1
  Candidate: 4.7.3-1
  Version table:
 *** 4.7.3-1 500
        500 http://gb.archive.ubuntu.com/ubuntu yakkety/main amd64 Packages
        100 /var/lib/dpkg/status

---- Logs ----
17-10 17:43:41 | --------------------------------------------------------
17-10 17:43:41 | loopy : restart 3 - pre-power copy
17-10 17:43:41 | --------------------------------------------------------
17-10 17:43:41 | Copying from /systemtest/files/loopy_small.bin to /mnt/f/restart-3.bin
17-10 17:43:41 | Started tag cp-_mnt_f_restart-3.bin [2922]
17-10 17:43:41 | Copying from /systemtest/files/loopy_small.bin to /mnt/g/restart-3.bin
17-10 17:43:41 | Started tag cp-_mnt_g_restart-3.bin [2933]
17-10 17:43:41 | Copying from /systemtest/files/loopy_small.bin to /mnt/h/restart-3.bin
17-10 17:43:41 | Started tag cp-_mnt_h_restart-3.bin [2944]
17-10 17:43:41 | Copying from /systemtest/files/loopy_small.bin to /mnt/i/restart-3.bin
17-10 17:43:41 | Started tag cp-_mnt_i_restart-3.bin [2955]
17-10 17:43:41 | --------------------------------------
17-10 17:43:41 | Monitoring 4 pids for 999 minutes
17-10 17:44:15 | PID 2933 - cp-_mnt_g_restart-3.bin - Finished. Exit: 0
17-10 17:45:07 | PID 2944 - cp-_mnt_h_restart-3.bin - Finished. Exit: 0
17-10 17:45:12 | PID 2922 - cp-_mnt_f_restart-3.bin - Finished. Exit: 0
17-10 17:45:13 | PID 2955 - cp-_mnt_i_restart-3.bin - Finished. Exit: 0
17-10 17:45:14 | All tags exhausted
17-10 17:45:14 | --------------------------------------
17-10 17:45:14 |
17-10 17:45:25 |
17-10 17:45:25 | --------------------------------------------------------
17-10 17:45:25 | loopy : restart 3 - pre-power verification
17-10 17:45:25 | --------------------------------------------------------
17-10 17:45:25 | Verifying /mnt/f/restart-3.bin
17-10 17:45:25 | Started tag restart-_mnt_f_-pre [5031]
17-10 17:45:25 | Verifying /mnt/g/restart-3.bin
17-10 17:45:25 | Started tag restart-_mnt_g_-pre [5045]
17-10 17:45:25 | Verifying /mnt/h/restart-3.bin
17-10 17:45:25 | Started tag restart-_mnt_h_-pre [5059]
17-10 17:45:25 | Verifying /mnt/i/restart-3.bin
17-10 17:45:25 | Started tag restart-_mnt_i_-pre [5073]
17-10 17:45:25 | --------------------------------------
17-10 17:45:25 | Monitoring 4 pids for 999 minutes
17-10 17:46:40 | PID 5045 - restart-_mnt_g_-pre - FAILED. Exit: 1
17-10 17:46:40 | FAILED: 5045 has failed.
17-10 17:46:40 | --------------------------------------
17-10 17:46:40 | ERROR: Failed during restart 3 pre-power event verification [Line:499]

BTRFS warning (device sdb2): csum failed ino 262 off 9985851392 csum 1474905414 expected csum 210901362
BTRFS warning (device sdb2): csum failed ino 262 off 9985982464 csum 1218422395 expected csum 1497608406
BTRFS warning (device sdb2): csum failed ino 262 off 9986113536 csum 3058027576 expected csum 25891403
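
For anyone triaging similar logs, the warnings above can be pulled apart with a small parser. This is a sketch under the assumption that the message layout matches the three lines shown here; the regular expression and the name parse_csum_failures are illustrative and are not part of any btrfs tool.

```python
import re

# Matches the message layout seen in the warnings above.
CSUM_RE = re.compile(
    r"csum failed ino (\d+) off (\d+) csum (\d+) expected csum (\d+)"
)

LOG = """\
BTRFS warning (device sdb2): csum failed ino 262 off 9985851392 csum 1474905414 expected csum 210901362
BTRFS warning (device sdb2): csum failed ino 262 off 9985982464 csum 1218422395 expected csum 1497608406
BTRFS warning (device sdb2): csum failed ino 262 off 9986113536 csum 3058027576 expected csum 25891403
"""


def parse_csum_failures(text):
    """Return (inode, offset, actual_csum, expected_csum) tuples as integers."""
    return [tuple(int(g) for g in m.groups()) for m in CSUM_RE.finditer(text)]


failures = parse_csum_failures(LOG)
offsets = [off for _, off, _, _ in failures]

# Distance in bytes between consecutive failing offsets.
gaps = [b - a for a, b in zip(offsets, offsets[1:])]
print(gaps)  # [131072, 131072]
```

All three warnings refer to inode 262, and the reported offsets are 131072 bytes (128 KiB) apart.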

mrturtledev (mrturtledev) wrote :

This also fails on Antergos with kernel 4.8.2-1-ARCH and btrfs-progs 4.8.1.

Will update this bug if/when patches go in from the btrfs mailing list.