severe ecryptfs corruption

Bug #521523 reported by Thomas O. on 2010-02-13
This bug affects 3 people
Affects                   Importance   Assigned to
eCryptfs                  Critical     Tyler Hicks
ecryptfs-utils (Ubuntu)   Critical     Unassigned

Bug Description

I am experiencing filesystem corruption with ecryptfs on my /home partition, which is separate from /.

I am attempting to back up important data, but struggling to do so: it is incredibly slow, with 100% CPU usage on both cores.

In general, the following errors are occurring:
 - Random disk IO errors (mostly during writing)
 - Very slow access times for files on the file system. In most cases, files seem intact, which is good...
 - 100% CPU on one or both cores from various programs; rsyslogd and dd are the most common offenders.
 - Programs such as Firefox and Chrome do not exit when quit but persist, and IO is continuous from that point onwards (there is no way to stop this IO usage short of a reboot)
 - dmesg shows hundreds of thousands of ecryptfs errors (see attached file)

I will be reinstalling soon (and am thankful I noticed this before it caused a major problem), but I am aware that this may be a problem affecting more than me...

Thomas O. (thomas-tgohome) wrote :

Attached is the dmesg log. Note that the event log is much larger than this, but only the last N entries or so are recorded.

description: updated
Thomas O. (thomas-tgohome) wrote :

I have just noticed that my USB drive is showing a write speed of 594MiB/s on Conky. It is disconnected. I am not sure if this is related.

Monkey (monkey-libre) wrote :

I've assigned this bug to the ecryptfs-utils package. Thank you for making Ubuntu better.

affects: ubuntu → ecryptfs-utils (Ubuntu)
Thomas O. (thomas-tgohome) wrote :

I'm getting more strange errors like this, but it seems to have improved on my system. I think the bug might be triggered by low disk space since when this bug occurred I had less than 200 MB of space available.

[ 104.399013] Valid eCryptfs headers not found in file header region or xattr region
[ 104.399019] Either the lower file is not in a valid eCryptfs format, or the key could not be retrieved. Plaintext passthrough mode is not enabled; returning -EIO
[ 104.399138] Valid eCryptfs headers not found in file header region or xattr region
[ 104.399141] Either the lower file is not in a valid eCryptfs format, or the key could not be retrieved. Plaintext passthrough mode is not enabled; returning -EIO
[ 109.572511] wlan0: no IPv6 routers present
[ 119.714109] iwlagn 0000:04:00.0: iwl_tx_agg_start on ra = 00:24:2c:78:ff:4d tid = 0
[ 241.450544] Valid eCryptfs headers not found in file header region or xattr region
[ 241.450551] Either the lower file is not in a valid eCryptfs format, or the key could not be retrieved. Plaintext passthrough mode is not enabled; returning -EIO
[ 257.999752] Valid eCryptfs headers not found in file header region or xattr region
[ 257.999757] Either the lower file is not in a valid eCryptfs format, or the key could not be retrieved. Plaintext passthrough mode is not enabled; returning -EIO

I am getting more of these errors than any others.

Dustin Kirkland  (kirkland) wrote :

These messages from eCryptfs are usually benign, and result from 0-byte files being written to the underlying filesystem.

Look for 0-byte files:
  find $HOME/.Private -size 0

In most cases, these can be simply deleted and your error messages will stop.

Dustin Kirkland  (kirkland) wrote :

Make that:

find $HOME/.Private/ -size 0
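
For anyone who would rather review the candidates before deleting anything, the same search can be sketched in Python. This is only an illustration: the function name `find_empty_files` is made up here, the lower directory is assumed to be `~/.Private/` as in the command above, and unlike `find` it reports regular files only.

```python
import os
import stat


def find_empty_files(root):
    """Return sorted paths of regular files under root whose size is 0 bytes."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat() so that symlinks inside the tree are not followed
            st = os.lstat(path)
            if stat.S_ISREG(st.st_mode) and st.st_size == 0:
                empty.append(path)
    return sorted(empty)


if __name__ == "__main__":
    # Assumed location of the lower (encrypted) directory.
    for path in find_empty_files(os.path.expanduser("~/.Private/")):
        print(path)
```

The script only lists files; removing them is left as a deliberate manual step.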

Dustin Kirkland  (kirkland) wrote :

But you're claiming corruption...

Do you in fact have files that were okay when you wrote them to eCryptfs, but are now garbled somehow?

Changed in ecryptfs-utils (Ubuntu):
status: New → Incomplete
Thomas O. (thomas-tgohome) wrote :

In one case, many files could not be copied across to my memory stick (with 2 out of 4 gigs free, NTFS) and each continually reported "Input/Output error" or "Unknown error" (I think this was one of them.)

I am not sure if this is corruption, but the file could not be completely read. The data might be there somewhere, but it wasn't when I tried to copy it. After resolving my disk space issue the file was fine. So I think that when disk space is low, ecryptfs freaks out, and this is bad. Imagine editing a file and getting only half of the content read and saved.

Chris Halse Rogers (raof) wrote :

I seem to have hit this also, although in my case it was causing desktop-couch to fail to start up:
└─(16:41:%)── ls ~/.cache/desktop-couch/log -lah ──(Mon,Feb22)─┘
total 84K
drwxr-xr-x 2 chris chris 4.0K 2010-02-04 10:20 .
drwx------ 3 chris chris 4.0K 2010-02-22 11:28 ..
-rw-r--r-- 1 chris chris 12K 2009-12-14 09:27 desktop-couch-replication.log
-rw-r--r-- 1 chris chris 6.5K 2009-11-04 22:10 desktop-couch-replication.log.2009-11-04
-rw-r--r-- 1 chris chris 38K 2010-02-22 11:36 desktop-couch-startup.log

└─(16:41:%)── less .cache/desktop-couch/log/desktop-couch-replication.log ──(Mon,Feb22)─┘
.cache/desktop-couch/log/desktop-couch-replication.log: Input/output error

I've stupidly deleted that file, though.

Thomas O. (thomas-tgohome) wrote :

I have one file which is corrupted: my VirtualBox config file. VirtualBox reports an XML error, and from looking at the file, I see it is truncated. This happened when my disk space was low. I am going to be reformatting ASAP (at the weekend; I cannot afford to lose my laptop during the week), but I will not be using ecryptfs again.

It seems that when ecryptfs experiences a low disk space condition, files get truncated. So far only VirtualBox has been hit. That can be repaired reasonably easily, by deleting the config and letting VB create it again. But I worry that a more important file will be corrupted eventually, so I have backed up all important stuff on to another, non-ecryptfs HDD.

The files only get corrupted if they are in use at the time of the disk space condition. Attached is the config file which was damaged; note that it is truncated.

Thomas O. (thomas-tgohome) wrote :

Anybody? I think this is fairly serious. I've decided to upgrade when 10.04 comes out, but I've already backed up stuff. Currently I'm avoiding this bug by keeping at least 200 MB disk space free. If I run out of disk space, then files get truncated, and then I have to repair the damage.
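
The workaround of keeping some space free could be automated with a small check along these lines. It is a rough sketch, not a fix: the 200 MB figure is simply the threshold mentioned above (read here as 200 MiB, which is an assumption), and the name `has_enough_free_space` is invented for the example.

```python
import os
import shutil

# Assumption: "200 MB" is interpreted as 200 MiB.
MIN_FREE_BYTES = 200 * 1024 * 1024


def has_enough_free_space(path, min_free_bytes=MIN_FREE_BYTES):
    """Return True if the filesystem holding path has at least min_free_bytes free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free >= min_free_bytes


if __name__ == "__main__":
    home = os.path.expanduser("~")
    if not has_enough_free_space(home):
        print("Warning: less than 200 MiB free on the filesystem holding", home)
```

Run from cron or a login script, this would at least give a warning before the low-space condition is reached.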

Launchpad Janitor (janitor) wrote :

[Expired for ecryptfs-utils (Ubuntu) because there has been no activity for 60 days.]

Changed in ecryptfs-utils (Ubuntu):
status: Incomplete → Expired
Paolo Bonzini (bonzini) wrote :

I have many GB of free space and I get random stuff added at the end of files. git repositories seem to be more susceptible to this. Next time it happens I'll make sure to check how many bytes are added and other suspicious things (e.g. whether the full size becomes close to a 4kb multiple or something like that).

Dustin Kirkland  (kirkland) wrote :

I'm un-expiring this bug since Paolo just responded to it.

Paolo (or anyone else) -- can you give us precise instructions to reproduce this? i.e., what git repository you're working with, and how to make the error occur?

Also, what kernel and what version of ecryptfs-utils are you using?

Changed in ecryptfs-utils (Ubuntu):
status: Expired → Confirmed
status: Confirmed → Incomplete
Paolo Bonzini (bonzini) wrote :

Any git repository I use heavily (mostly kernel, Xen and qemu in my case, but it is not relevant I think) will corrupt at least once a day, as proved by the fact that I just reproduced it. :) Probably interactive rebases and switching branches help.

However, I don't know how to make the error occur. I created the repositories in a non-ecryptfs dir and then rsynced a lot of files to ecryptfs -- however I don't believe this is related to the bug.

The kernel is 2.6.35.10.

My suspicion about 4kb multiples is correct:

-rw-rw-r-- 1 pbonzini users 20480 Sep 22 17:04 ui/sdl_keysym.h
-rw-rw-r-- 1 pbonzini users 12288 Sep 22 17:04 ui/sdl_zoom.c
-rw-rw-r-- 1 pbonzini users 12288 Sep 22 17:04 ui/sdl_zoom.h
-rw-rw-r-- 1 pbonzini users 16384 Sep 22 17:04 ui/sdl_zoom_template.h
-rw-rw-r-- 1 pbonzini users 28672 Feb 2 15:23 ui/spice-core.c
-rw-rw-r-- 1 pbonzini users 20480 Nov 29 12:35 ui/spice-display.c
-rw-rw-r-- 1 pbonzini users 12288 Nov 29 12:35 ui/spice-display.h
-rw-rw-r-- 1 pbonzini users 16384 Oct 14 12:59 ui/spice-input.c

and even better, umount + mount fixes it, so somehow the stat cache in ecryptfs is getting corrupted.
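
Every size in the listing above is an exact multiple of 4096 (20480 = 5 x 4096, 12288 = 3 x 4096, and so on). A quick way to scan a working tree for the same symptom might look like the sketch below. The function name `page_multiple_files` is made up for the example, and a hit is only a hint: files that legitimately happen to be a multiple of 4 KiB long will be flagged too.

```python
import os
import stat


def page_multiple_files(root, page_size=4096):
    """Return sorted (path, size) pairs for regular files under root whose
    size is a nonzero multiple of page_size."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # do not follow symlinks
            if not stat.S_ISREG(st.st_mode):
                continue
            if st.st_size > 0 and st.st_size % page_size == 0:
                hits.append((path, st.st_size))
    return sorted(hits)


if __name__ == "__main__":
    for path, size in page_multiple_files("."):
        print(size, path)
```

Comparing the output before and after a umount + mount cycle would show which files had only their reported size changed.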

Paolo Bonzini (bonzini) wrote :

I'm not using Ubuntu, so adding parent ecryptfs product.

Changed in ecryptfs:
status: New → Confirmed

Thanks, Paolo. What kernel are you running?

Tyler, do you use git-on-ecryptfs very often? Have you seen this behavior?

Paolo Bonzini (bonzini) wrote :

2.6.35.10 is the Fedora 14 kernel. I can upgrade to a newer 2.6.37 kernel if you want me to test that, but I don't see anything relevant in the log.

Changed in ecryptfs:
assignee: nobody → Tyler Hicks (tyhicks)
importance: Undecided → Critical
Changed in ecryptfs-utils (Ubuntu):
importance: Undecided → Critical
Paolo Bonzini (bonzini) wrote :

I'm seeing errors like this in /var/log/messages at the same time as corruptions:

Feb 9 14:03:26 yakj kernel: [135241.310137] ecryptfs_read_and_validate_header_region: Error reading header region; rc = [-4]
Feb 9 14:03:26 yakj kernel: [135241.310146] ecryptfs_decrypt_page: Error attempting to read lower page; rc = [-4]
Feb 9 14:03:26 yakj kernel: [135241.310147] ecryptfs_readpage: Error decrypting page; rc = [-4]

-4 is EINTR.
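
The mapping can be double-checked from userspace with Python's standard `errno` module. The helper name `describe_errno` is just for this example, and the numeric values are those of the platform the script runs on (on Linux, 4 is EINTR and 5 is EIO).

```python
import errno
import os


def describe_errno(rc):
    """Map a kernel-style negative return code such as -4 to (name, message)."""
    code = -rc if rc < 0 else rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    for rc in (-4, -5):
        name, message = describe_errno(rc)
        print(rc, name, message)
```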

Serge Hallyn (serge-hallyn) wrote :

Quoting Paolo Bonzini (<email address hidden>):
> I'm seeing errors like this in /var/log/messages at the same time as
> corruptions:
>
> Feb 9 14:03:26 yakj kernel: [135241.310137] ecryptfs_read_and_validate_header_region: Error reading header region; rc = [-4]
> Feb 9 14:03:26 yakj kernel: [135241.310146] ecryptfs_decrypt_page: Error attempting to read lower page; rc = [-4]
> Feb 9 14:03:26 yakj kernel: [135241.310147] ecryptfs_readpage: Error decrypting page; rc = [-4]
>
> -4 is EINTR.

That sounds to me like ecryptfs should retry. The reading of the lower
page just happened to get interrupted by a signal handler. Something like:

From bd6289242f82df4ef254b3e59d38dc39b9b2879d Mon Sep 17 00:00:00 2001
From: Serge E. Hallyn <email address hidden>
Date: Mon, 14 Feb 2011 14:48:22 +0000
Subject: [PATCH 1/1] ecryptfs: retry ecryptfs_read_and_validate_header_region on -EINTR

Signed-off-by: Serge E. Hallyn <email address hidden>
---
 fs/ecryptfs/crypto.c | 6 ++++--
 1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index bfd8b68..81d5fa7 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -1211,8 +1211,10 @@ int ecryptfs_read_and_validate_header_region(char *data,

 	if (crypt_stat->extent_size == 0)
 		crypt_stat->extent_size = ECRYPTFS_DEFAULT_EXTENT_SIZE;
-	rc = ecryptfs_read_lower(data, 0, crypt_stat->extent_size,
-				 ecryptfs_inode);
+	do {
+		rc = ecryptfs_read_lower(data, 0, crypt_stat->extent_size,
+					 ecryptfs_inode);
+	} while (rc == -EINTR);
 	if (rc < 0) {
 		printk(KERN_ERR "%s: Error reading header region; rc = [%d]\n",
 		       __func__, rc);
--
1.7.0.4

totally untested.

-serge

Serge Hallyn (serge-hallyn) wrote :

Paolo, which release are you on? I will put a kernel with the proposed fix for your release into a ppa so you can test.

Paolo Bonzini (bonzini) wrote :

I'll apply the patch myself, thanks. But I think hunks are missing for the other two functions.

Paolo Bonzini (bonzini) wrote :

Here is the actual patch I am testing.

tags: added: patch
Gioele Barabucci (gioele) wrote :

Might this be related to bug #509180? All of the symptoms are the same.

Paolo Bonzini (bonzini) wrote :

Yes, it is.

Dustin Kirkland  (kirkland) wrote :

Tyler, can you have a look at the attached patch?

Another user in Bug #509180 claims that this patch solves his corruption issues...

Tyler Hicks (tyhicks) wrote :

This bug report is all over the place. I posted a fix for bug #509180 in that bug report and pieces of this bug report do seem related.

I'm not sure I like loops to handle -EINTR in kernel code. That's normally for user space to handle and I don't see why this is any different. I may consider one retry after an -EINTR error from vfs_read(), but not a loop.

I'm fairly confident that the fix I posted in bug #509180 will solve the issues that the -EINTR patch aims to address. Let me know if anyone feels differently.

Serge Hallyn (serge-hallyn) wrote :

Quoting Tyler Hicks (<email address hidden>):
> I'm not sure I like loops to handle -EINTR in kernel code. That's
> normally for user space to handle and I don't see why this is any

But you don't let it, bc ecryptfs_lookup_and_interpose_lower
doesn't expose that it got -EINTR. So either loop, or pass
the return value to userspace. Current code is broken.
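
The options under discussion (loop until the read succeeds, allow a single retry, or pass the error up) can be compared in a small userspace analogy. This is not kernel code and not either patch; `call_with_eintr_policy` and `make_flaky_reader` are hypothetical names, and the "lower read" is simulated by a function that raises `InterruptedError` a fixed number of times.

```python
import errno


def call_with_eintr_policy(read_fn, max_retries=None):
    """Call read_fn(), retrying when it raises InterruptedError (EINTR).

    max_retries=None retries forever (the loop in the patch above),
    max_retries=1 allows a single retry, and max_retries=0 never retries,
    so the error always reaches the caller.
    """
    attempts = 0
    while True:
        try:
            return read_fn()
        except InterruptedError:
            if max_retries is not None and attempts >= max_retries:
                raise  # propagate EINTR instead of hiding it
            attempts += 1


def make_flaky_reader(interruptions, payload=b"header"):
    """Hypothetical stand-in for a lower read interrupted `interruptions` times."""
    state = {"left": interruptions}

    def read_fn():
        if state["left"] > 0:
            state["left"] -= 1
            raise InterruptedError(errno.EINTR, "Interrupted system call")
        return payload

    return read_fn


if __name__ == "__main__":
    # Unbounded loop: succeeds after three interruptions.
    print(call_with_eintr_policy(make_flaky_reader(3)))
    # No retry: the caller sees the interruption and can decide what to do.
    try:
        call_with_eintr_policy(make_flaky_reader(1), max_retries=0)
    except InterruptedError as exc:
        print("propagated:", exc)
```

Either policy leaves the caller with a correct result or a visible error; the broken case is the one where the error is swallowed and neither happens.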

Tyler Hicks (tyhicks) wrote :

On Thu Feb 24, 2011 at 02:40:03PM -0000, Serge Hallyn <email address hidden> wrote:
> Quoting Tyler Hicks (<email address hidden>):
> > I'm not sure I like loops to handle -EINTR in kernel code. That's
> > normally for user space to handle and I don't see why this is any
>
> But you don't let it, bc ecryptfs_lookup_and_interpose_lower
> doesn't expose that it got -EINTR. So either loop, or pass
> the return value to userspace. Current code is broken.

It doesn't sound like you looked at the patch I posted in bug #509180.

Sorry for jumping around between bug reports, but 509180 is the
original report for the interrupted metadata read issue and this one
should likely be marked as a duplicate.

Quoting Tyler Hicks (<email address hidden>):
> On Thu Feb 24, 2011 at 02:40:03PM -0000, Serge Hallyn <email address hidden> wrote:
> > But you don't let it, bc ecryptfs_lookup_and_interpose_lower
> > doesn't expose that it got -EINTR. So either loop, or pass
> > the return value to userspace. Current code is broken.
>
> It doesn't sound like you looked at the patch I posted in bug #509180.

Indeed not :)

> Sorry for jumping around between bug reports, but 509180 is the
> original report for the interrupted metadata read issue and this one
> should likely be marked as a duplicate.

Ok - your comment in this report made it sound like you wanted
to leave the code as is.

Your patch propagates the error code, so +1 from me, thanks.

-serge

Paolo Bonzini (bonzini) wrote :

Agreed, let's close this bug.

Dustin Kirkland  (kirkland) wrote :

I'm going to make this bug a duplicate of Bug #509180, which seems to have the better discussion around the actual issue. Thanks.
