mdadm+lvm+cryptsetup+big disks: "bio too big device"

Bug #1382335 reported by Tapani Tarvainen
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

I installed two new 6TB disks (WD Red) as RAID1 and pvmove'd stuff over from the old
5x3TB RAID10 array to it (actually the move is still in progress, going to take some
17 hours or so) and I get lots of these (4971 so far) in dmesg:

[58693.807553] bio too big device dm-7 (664 > 8)
[58694.037060] bio too big device dm-7 (1024 > 8)
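
For anyone triaging a similar log, a minimal Python sketch for summarizing these messages per device follows. It assumes the two numbers in parentheses are the size of the rejected bio and the queue limit of the device, both in 512-byte sectors (so a limit of 8 would be 4 KiB); that reading is an assumption based on the message format, not something confirmed in this report. The function name `summarize` is made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches lines such as: [58693.807553] bio too big device dm-7 (664 > 8)
BIO_RE = re.compile(r"bio too big device (\S+) \((\d+) > (\d+)\)")
SECTOR_BYTES = 512  # assumption: both numbers are counts of 512-byte sectors


def summarize(lines):
    """Return {device: (message_count, largest_bio_sectors, limit_sectors)}."""
    counts = Counter()
    largest = {}
    limit = {}
    for line in lines:
        m = BIO_RE.search(line)
        if not m:
            continue  # not a "bio too big" message
        dev, size, lim = m.group(1), int(m.group(2)), int(m.group(3))
        counts[dev] += 1
        largest[dev] = max(largest.get(dev, 0), size)
        limit[dev] = lim  # keep the most recently reported limit
    return {dev: (counts[dev], largest[dev], limit[dev]) for dev in counts}


if __name__ == "__main__":
    # Reads kernel log text from standard input.
    for dev, (n, big, lim) in summarize(sys.stdin).items():
        print(f"{dev}: {n} messages, "
              f"largest bio {big * SECTOR_BYTES // 1024} KiB, "
              f"limit {lim * SECTOR_BYTES // 1024} KiB")
```

It can be fed from the kernel log, for example `dmesg | python3 bio_summary.py` (the script name is hypothetical).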

Looks like an old kernel bug that was supposedly fixed in 2012:
https://lkml.org/lkml/2012/10/15/398

The LV that apparently triggered that is 3TB in size (90% full) and
encrypted with cryptsetup.
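
Since the LV sits on a stack (dm-crypt over LVM over md RAID), one thing that may be worth checking is whether the request size limits differ between layers. The Python sketch below walks a device's `slaves` entries in sysfs and prints `max_sectors_kb` and `max_hw_sectors_kb` for each layer. It assumes a Linux system with sysfs mounted at `/sys`; the function names are made up for illustration, and this is a diagnostic aid only, not a confirmed explanation of the bug.

```python
import os


def queue_dir(dev, sysfs_root):
    """Find the queue directory for dev; partitions use their parent disk's queue."""
    base = os.path.realpath(os.path.join(sysfs_root, dev))
    for candidate in (base, os.path.dirname(base)):
        q = os.path.join(candidate, "queue")
        if os.path.isdir(q):
            return q
    return None


def queue_limits(dev, sysfs_root="/sys/class/block"):
    """Return (max_sectors_kb, max_hw_sectors_kb), or None if no queue is found."""
    q = queue_dir(dev, sysfs_root)
    if q is None:
        return None
    values = []
    for name in ("max_sectors_kb", "max_hw_sectors_kb"):
        with open(os.path.join(q, name)) as f:
            values.append(int(f.read().strip()))
    return tuple(values)


def walk_stack(dev, sysfs_root="/sys/class/block", depth=0):
    """Yield (depth, device, limits) for dev and every device it is stacked on."""
    yield depth, dev, queue_limits(dev, sysfs_root)
    slaves = os.path.join(sysfs_root, dev, "slaves")
    if os.path.isdir(slaves):
        for slave in sorted(os.listdir(slaves)):
            yield from walk_stack(slave, sysfs_root, depth + 1)


if __name__ == "__main__":
    import sys
    # Device name such as dm-7 is taken from the command line.
    for depth, dev, limits in walk_stack(sys.argv[1]):
        print("  " * depth + f"{dev}: {limits}")
```

Run as, for example, `python3 stack_limits.py dm-7` (script name hypothetical). A lower layer reporting a much smaller limit than the layer above it would be the thing to look for.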

I don't know if this is reproducible but I can try - it'd just take some
two days every time (starting with restoring 3TB from backup).

Ubuntu 14.04.1, 3.13.0-37-generic x86_64 kernel,
Intel DQ57TM mobo, LSI 9211-8i SAS/SATA controller.

Tags: trusty
Tapani Tarvainen (ubuntu-tapani) wrote :

Looks like I misfiled this as a grub bug. Actually it's probably in the kernel.

apport-cli --save output attached.

affects: grub2 (Ubuntu) → ubuntu
tags: added: trusty
affects: ubuntu → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1382335

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.18 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.18-rc1-utopic/

Changed in linux (Ubuntu):
importance: Undecided → Medium
Tapani Tarvainen (ubuntu-tapani) wrote :

I'll try to reproduce the bug with a somewhat smaller data set first (and non-production data), as testing this with the real thing takes rather too long (about two days per test). Also, I haven't yet determined whether this resulted in data corruption (I will, by comparing the data with backup, but that will also take several days).

(I assume I can ignore the apport-collect request from the robot. If the apport-cli output isn't sufficient, let me know. The machine is headless, however, so running apport-collect is a bit difficult.)

Tapani Tarvainen (ubuntu-tapani) wrote :

Comparing the moved data against backup I found exactly one corrupt file (out of about 1.4 million), despite thousands of those "bio too big" messages. Also, I found one "buffer I/O error" in the logs. So I suspect the "bio too big" messages did not cause data corruption at all (even though the earlier bug causing the same message reportedly did). So it would be less bad than I originally thought.
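
For reference, a comparison of this kind can be sketched in Python as below. This is not the exact procedure used here, just a minimal illustration that hashes every regular file under one tree and checks it against the file at the same relative path in a backup tree; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(moved_root, backup_root):
    """Yield (relative_path, reason) for files that differ from the backup."""
    for dirpath, _dirnames, filenames in os.walk(moved_root):
        for name in filenames:
            moved = os.path.join(dirpath, name)
            if os.path.islink(moved):
                continue  # symlinks are skipped in this sketch
            rel = os.path.relpath(moved, moved_root)
            backup = os.path.join(backup_root, rel)
            if not os.path.isfile(backup):
                yield rel, "missing in backup"
            elif file_digest(moved) != file_digest(backup):
                yield rel, "content differs"


if __name__ == "__main__":
    import sys
    # Usage: script MOVED_ROOT BACKUP_ROOT
    for rel, reason in compare_trees(sys.argv[1], sys.argv[2]):
        print(f"{rel}: {reason}")
```

Files changed legitimately since the backup was taken would also show up as differing, so the output still needs a manual look.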

Also, I have been unable to reproduce the error with a smaller dataset and can't easily recreate the original situation, or even one of almost the same size, unless I buy more disks. I may do that ahead of schedule just to retest this, but at the moment I can't test whether the mainline kernel would have made a difference.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired