LibreOffice Productivity Suite

encrypted swap corrupts application stack/heap [was: soffice.bin SIGSEGV cppu::throwException()]

Reported by Scott Kitterman on 2011-03-30
This bug affects 75 people
Affects | Status | Importance | Assigned to | Milestone
LibreOffice Productivity Suite | Won't Fix | Critical | |
ecryptfs-utils (Ubuntu) | | Critical | Tyler Hicks |
  Lucid | | Undecided | Unassigned |
  Maverick | | Undecided | Unassigned |
  Natty | | Undecided | Unassigned |
  Oneiric | | Critical | Tyler Hicks |
libreoffice (Ubuntu) | | | |
  Lucid | | Undecided | Unassigned |
  Maverick | | Undecided | Unassigned |
  Natty | | Undecided | Unassigned |
  Oneiric | | High | Unassigned |
linux (Ubuntu) | | Undecided | Unassigned |
  Lucid | | High | Colin King |
  Maverick | | High | Unassigned |
  Natty | | High | Tyler Hicks |
  Oneiric | | Undecided | Unassigned |
openoffice.org (Ubuntu) | | Undecided | Unassigned |
  Lucid | | Undecided | Unassigned |
  Maverick | | Undecided | Unassigned |
  Natty | | Undecided | Unassigned |
  Oneiric | | Undecided | Unassigned |

Bug Description

Binary package hint: libreoffice

1) lsb_release -rd
Description: Ubuntu 11.04
Release: 11.04

2) apt-cache policy libreoffice-calc
libreoffice-calc:
  Installed: 1:3.3.3-1ubuntu2
  Candidate: 1:3.3.3-1ubuntu2
  Version table:
 *** 1:3.3.3-1ubuntu2 0
        100 /var/lib/dpkg/status
     1:3.3.2-1ubuntu5 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty-updates/main i386 Packages
     1:3.3.2-1ubuntu4 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty/main i386 Packages

apt-cache policy libreoffice-writer
libreoffice-writer:
  Installed: 1:3.3.3-1ubuntu2
  Candidate: 1:3.3.3-1ubuntu2
  Version table:
 *** 1:3.3.3-1ubuntu2 0
        100 /var/lib/dpkg/status
     1:3.3.2-1ubuntu5 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty-updates/main i386 Packages
     1:3.3.2-1ubuntu4 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty/main i386 Packages

3) What is expected to happen: in Natty, in either a KDE session (with the KDE integration active) or a GNOME session, when one tries to edit a Writer or Calc file that has been left untouched for a long period of time (e.g. 1 hour or more), the application does not crash.

4) What happens instead is that it crashes. This is highly correlated with both EcryptfsInUse and resource-constrained (Memory & CPU >> 50%) environments. It occurs with:

+ Intel drivers, Compiz not enabled, only Writer open (bug 745836)
+ binary ATI drivers, Compiz enabled, only Calc open (bug 799047)

ProblemType: Crash
DistroRelease: Ubuntu 11.04
Package: libreoffice-core 1:3.3.2-1ubuntu2
ProcVersionSignature: Ubuntu 2.6.38-7.39-generic 2.6.38
Uname: Linux 2.6.38-7-generic i686
Architecture: i386
Date: Wed Mar 30 12:34:39 2011
Disassembly: => 0x100000: Cannot access memory at address 0x100000
EcryptfsInUse: Yes
ExecutablePath: /usr/lib/libreoffice/program/soffice.bin
ProcCmdline: /usr/lib/libreoffice/program/soffice.bin -writer -splash-pipe=5
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SegvAnalysis:
 Segfault happened at: 0x100000: Cannot access memory at address 0x100000
 PC (0x00100000) not located in a known VMA region (needed executable region)!
SegvReason: executing unknown VMA
Signal: 11
SourcePackage: libreoffice
StacktraceTop:
 ?? ()
 cppu::throwException(com::sun::star::uno::Any const&) () from /usr/lib/libreoffice/program/../basis-link/program/../ure-link/lib/libuno_cppuhelpergcc3.so.3
 ucbhelper::cancelCommandExecution(com::sun::star::ucb::IOErrorCode, com::sun::star::uno::Sequence<com::sun::star::uno::Any> const&, com::sun::star::uno::Reference<com::sun::star::ucb::XCommandEnvironment> const&, rtl::OUString const&, com::sun::star::uno::Reference<com::sun::star::ucb::XCommandProcessor> const&) () from /usr/lib/libreoffice/program/../basis-link/program/libucbhelper4gcc3.so
 ?? () from /usr/lib/libreoffice/program/../basis-link/program/libucpfile1.so
 ?? () from /usr/lib/libreoffice/program/../basis-link/program/libucpfile1.so
Title: soffice.bin crashed with SIGSEGV in cppu::throwException()
UpgradeStatus: Upgraded to natty on 2011-03-29 (0 days ago)
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare

Scott Kitterman (kitterman) wrote :

StacktraceTop:
 ?? ()
 throwException () from /usr/lib/libreoffice/program/../basis-link/program/../ure-link/lib/libuno_cppuhelpergcc3.so.3
 cancelCommandExecution () from /usr/lib/libreoffice/program/../basis-link/program/libucbhelper4gcc3.so
 throw_handler () from /usr/lib/libreoffice/program/../basis-link/program/libucpfile1.so
 endTask () from /usr/lib/libreoffice/program/../basis-link/program/libucpfile1.so

Changed in libreoffice (Ubuntu):
importance: Undecided → Medium
tags: removed: need-i386-retrace
visibility: private → public

looks like the call to FStatHelper::IsDocument() in
sal_Bool SvxAutoCorrect::CreateLanguageFile( LanguageType eLang, sal_Bool bNewFile ) in
editeng/source/misc/svxacorr.cxx
needs to catch exceptions.

Evan Huus (eapache) wrote :

Setting to confirmed given the number of dupes and the discussion upstream.

Changed in libreoffice (Ubuntu):
status: New → Confirmed

If I leave LibreOffice with a document open for a while (doc, calc, any type), then resume typing in that document, it immediately crashes.

LibreOffice 3.3.2
OOO330m19 (Build:202)
tag libreoffice-3.3.2.2, Ubuntu package 1:3.3.2-1ubuntu2~maverick1

uname -a
Linux will 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 18:42:20 UTC 2011 x86_64 GNU/Linux

dmesg output:
[24354.244064] soffice.bin[4180] general protection ip:7fa9d70d9f0e sp:7fff592c1a80 error:0 in libuno_cppuhelpergcc3.so.3[7fa9d70ba000+9e000]
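
The dmesg line gives both the faulting instruction pointer and the address at which libuno_cppuhelpergcc3.so.3 is mapped, so the offset of the fault inside the library follows by subtraction. A minimal sketch in shell; the helper name `lib_offset` is made up for this illustration:

```shell
# lib_offset IP BASE: print the offset of IP inside a mapping that starts at BASE.
# Both arguments are the hex addresses from the kernel message, with 0x prepended.
lib_offset() {
    printf '0x%x\n' "$(( $1 - $2 ))"
}

# Values from the dmesg line: ip:7fa9d70d9f0e, mapping 7fa9d70ba000+9e000
lib_offset 0x7fa9d70d9f0e 0x7fa9d70ba000   # prints 0x1ff0e
```

The result is smaller than the mapping size (0x9e000), as it should be. With matching debug symbols installed, an offset like this could presumably be handed to a tool such as addr2line to name the function, though nobody in this report did that.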

From postings it seems this also affects 32 bit users and is not specific to LO either. It was happening to me on OOO before I migrated.

Ubuntu specific, for Björn?

Just to clarify: by IDLE I mean not using LO application - I go off and do other tasks on other apps. Then when I come back and resume typing into LO it crashes. Other apps all remain stable.

description: updated
Rolf Leggewie (r0lf) on 2011-06-20
tags: added: lucid
Rolf Leggewie (r0lf) wrote :

This is 100% reproducible for me on lucid given sufficient time has elapsed. Let me know if there is any further information I can provide. I'd be happy to run valgrind but would need some guidance.

tags: added: lo33 metabug

Created attachment 48444
gdb backtrace and all thread backtrace

Recall the bug: work for some time in OOo, then do something else, come back to OOo after a while -> crash (sometimes). Or press Ctrl+S after having spent some time on other things -> crash (sometimes).

From my past experience, this kind of bug first appeared with version 3 of OpenOffice. A try with IBM Lotus Symphony resulted in the same crash!!!
In the end I thought LibreOffice was not affected... but that is not the case.

I was using Ubuntu 10.10 (32-bit) when the problem started, with compiz (disabling OpenGL for OOo doesn't affect the bug).
I now have libreoffice 3.1 on Ubuntu 10.10 and it seems not to be affected by this bug (I was not able to reproduce it).
I also recently got Ubuntu 10.04.... libreoffice 3.3 crashes... 3.4 also!!!
The backtrace seems explicit... the trouble comes from uno, ure. Apart from it being .NET-like stuff, I have no idea how it works or what part it plays in LibreOffice, OOo or Lotus Symphony.

However, it seems related to file I/O, with a badly handled (uncaught) exception producing a SIGSEGV.

My guess is that this bug should be redirected to the uno framework. The Ubuntu distro might also play a role in it.

If you need testing for this bug, I have a document (a professional one, not to be shared) that seems almost guaranteed to trigger the bug within an hour.

(In reply to comment #3)
> Created an attachment (id=48444) [details]
> gdb backtrace and all thread backtrace
>
> remember the bug : work sometime on OOo, then do something else, come back
> after a while on OOo -> crash (sometime). or do CTRL+s after having spent
> sometime on other stuff -> crash(sometime).
>
> From my past experience, this kind of bug started first with the version 3 of
> openoffice. a try with IBM lotus symphony was resulting in the same crash!!!
> finally I though libreoffice was not affected... but that is not the case.
>
> I use ubuntu 10.10 (32b) when the problem started with compiz. (disabling
> opengl for OOO doesn't affect the bug).
> I now have on ubuntu 10.10 libreoffice 3.1 and it seems not to be affected by
> this bug (I was not able to reproduce it).
> I also recently got Ubuntu 10.04.... libreoffice 3.3 crash... 3.4 also!!!

OOOps, mistake in ubuntu version
read Ubuntu 10.04 as 11.04, sorry...


Possibly related to bug 185600 and freedesktop bug 37121.
also possibly related: http://ubuntuforums.org/showthread.php?t=1237608

Additional information needed (because of the observed behaviour from other bugs):
Is this reproducible:

- on compiz
- without compiz

- on nvidia binary drivers
- on non-nvidia binary drivers

- with LibreOffice Draw open
- with only LibreOffice Writer/Calc open

=> set to incomplete

Changed in libreoffice (Ubuntu):
status: Confirmed → Incomplete
Scott Kitterman (kitterman) wrote :

I use KDE without compiz and my systems are all Intel, so compiz and nvidia have nothing to do with it.

Changed in libreoffice (Ubuntu):
status: Incomplete → Confirmed
Evan Huus (eapache) wrote :

I've seen this with only Writer open, so it's not particularly related to Draw either.

The observation in https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-180/+bug/185600/comments/42 was that it stops happening when Draw is open too.

@Evan: Since you are seeing the bug, could you check whether it disappears when you have Draw open too?

Evan Huus (eapache) wrote :

That'll teach me to only read the summary :)

I'll see what I can do to test, but given the required time to reproduce a single instance it might be a while before I can give a definite yes/no.

Rolf Leggewie (r0lf) wrote :

I am observing this problem in OOo.org Writer and Calc on my lucid system. And indeed, having a Draw window open seems to prevent the occurrence of this problem.

Rolf Leggewie (r0lf) wrote :

The Draw window seems to help somewhat but not completely prevent crashes after inactivity.


gdb backtrace and valgrind from 10.10 (Mint) 64 bit.

Created attachment 49669
gdb backtrace taken from 10.10 (mint), 64bit

2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:32:27 UTC 2010 x86_64 GNU/Linux
libreoffice-core 1:3.3.2-1ubuntu2~maverick1

Created attachment 49670
valgrind taken from 10.10 (mint), 64bit

2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:32:27 UTC 2010 x86_64 GNU/Linux
libreoffice-core 1:3.3.2-1ubuntu2~maverick1

I am seeing the exact same behavior for many months, also since OOo. I have previously submitted two gdb traces which might be useful to people debugging this issue - below.

I am seeing this issue both in Writer and Impress very often, and it makes working with these applications extremely unpredictable and unpleasant. Maybe 50% of the times I start the application, I need to go through document recovery first.

https://bugs.freedesktop.org/show_bug.cgi?id=35424

description: updated
Scott Kitterman (kitterman) wrote :

"Use Gnumeric" isn't a workaround.

description: updated

Scott Kitterman, I'm not here to argue with you. Gnumeric is an acceptable workaround, though it is agreed not an ideal one, as one may use a spreadsheet application in Ubuntu without it crashing every minute. :) Providing similar workaround information is helpful to other Ubuntu users who just want the functionality but do not care what program delivers it.

description: updated
Scott Kitterman (kitterman) wrote :

I gather you are fairly new to the Ubuntu community. There is a long standing tradition in Ubuntu not to have reversion wars, so rather than revert my reversion, you should have asked someone else to take a look at it. BTW, if you are "not here to argue with [me]" you demonstrate it in a very odd way. I will ask someone else to review it. A few thoughts for you in the meantime:

1. I filed the bug, so I think it's a reasonable definition of an acceptable workaround that it would work for me. Virtually all of my work on office type documents involves emailing them to and receiving them from other people in MS Office formats. I need broad, reliable support for the MS Office formats that only OOo/LO provide (this is unfortunate, but it's where we are now).

2. Gnumeric doesn't integrate with my desktop environment. If I didn't care about #1, kchart or kword would be better choices.

3. This isn't about a crash "every minute". It's about what happens when you leave a document sit idle for some time and come back, so this doesn't make LO unusable. It just means I'm glad document recovery works as well as it does. Switching to a different office suite is totally unnecessary because of this bug.

Marking Importance to High based on easily and frequently reproduced crash among a multitude of duplicates.

Changed in libreoffice (Ubuntu):
status: Confirmed → Triaged
assignee: nobody → Björn Michaelsen (bjoern-michaelsen)
assignee: Björn Michaelsen (bjoern-michaelsen) → nobody
importance: Medium → High

In an attempt to work around this OOo crash problem I maxed out my system's RAM so it no longer needs swap. I have had no further crashes, regardless of how long documents and spreadsheets are left open. I believe this is strong evidence that the failure occurs once data has been moved from RAM to swap space. I also did a rather unscientific search to see whether this problem exists on other distros (SUSE, Fedora, etc.) but did not find a similar bug report, which leads me to believe that this issue is Ubuntu specific.

Changed in openoffice.org (Ubuntu):
status: New → Confirmed
Changed in openoffice.org (Ubuntu):
status: Confirmed → Won't Fix
Rolf Leggewie (r0lf) wrote :

Is it really too much to ask for a short explanation when a ticket or task is closed? -> bug 825837

I guess in this case it's easy enough to guess that only libreoffice will receive the fix, but why do we need to guess? And if the fix is easily applicable to the package in lucid, what's the rush to decide now NOT to fix it?

Does this bug still show up in libreoffice-3.4.3-1ubuntu2?
https://launchpad.net/ubuntu/+source/libreoffice/1:3.4.3-1ubuntu2

Changed in libreoffice (Ubuntu):
status: Triaged → Incomplete

[This is an automated message.]
There are no new official OpenOffice.org releases in Ubuntu packaging anymore => Won't Fix

If the problem persists, please mark this bug as "also affects project Libreoffice" or "also affects distribution Libreoffice (Ubuntu)" if that has not happened already.

Please leave references to upstream OpenOffice.org bugs in place to allow cross pollination.

I don't know. The system where I mostly do office type document work is still running Natty. Putting back to confirmed so it doesn't time out.

Changed in libreoffice (Ubuntu):
status: Incomplete → Confirmed
zmago (zmago-fluks) wrote :

And how long will it take to fix this bug? Because it's a really nasty bug. It is not possible to work reliably with the office suite in Ubuntu.

@zmago: You can help fix this bug by providing the requested information:
 - Is there a reliably reproducible scenario? ("sometimes crashes after hours" is neither reliable nor reproducible)
 - Does this still happen with 3.4.3?

Just to reiterate -- this needs a good reproducible scenario. I just failed to reproduce this with the available info:
According to reports so far, this should happen with 3.3.3 on natty when no Draw window is open and the following steps are performed:
- Open a Writer document
- Type some text
- Leave the window alone for > 1 hour
- Return to the window and type something again.

Shahar Or (mightyiam) wrote :

Next time I hit this I will try to work out whether there's a pattern.

zmago (zmago-fluks) wrote :

Thank you. I will try to figure out the pattern. The same problem occurs on my netbook and desktop... on both I have ecryptfs turned on for the user profile where the crash is happening. But on another laptop I'm not using folder encryption, and there is no crashing there. A temporary solution for me is changing autosave to 5 minutes or even less. I will tell you more once I have paid attention to the crashing pattern. I'm not used to a community that listens to you and that you can also help in some way :) So I will do what I can.

Cheers.

ricardo (rh-) wrote :

Hi,
I was struggling with this bug for over a year. What I figured out is that it always crashed when I had some documents open and had not been working with LibreOffice for some time. When I came back to write a word, to save the document or to paste information, LibreOffice crashed.
I had the home directory encrypted.
When the .libreoffice directory is encrypted, LibreOffice will often crash. So I moved this directory to a non-encrypted directory and there were no more crashes.
I had the crashes with all versions of LibreOffice and OpenOffice 3.0 and higher. That was the version I had when I encrypted my home directory while upgrading to Ubuntu 10.04.
I hope this will help to fix this bug soon.
I guess there is a timeout while OpenOffice is waiting for some information from the encrypted config directory. As the information does not arrive in time, the program crashes.

After waiting over a year for a bugfix, my solution was to decrypt my home directory. This is what I did today. I hope that this will fix this bug for me.
Bye

Rolf Leggewie (r0lf) wrote :

As previously said, I am also affected by this bug. I do have an encrypted (ecryptfs) home partition, so this may indeed be necessary. For me, the crashes are easily reproducible.

Shahar Or (mightyiam) wrote :

I also have encrypted home.

indrek (indrek-seppo) wrote :

As is already mentioned in this thread it seems to be dependent on whether the computer is swapping. I only have the problem when my RAM is maxed out. I currently have plenty of RAM, thus it has happened rarely and only when I am doing something very RAM-intensive.

Evan Huus (eapache) wrote :

I've scanned through a whole bunch of the duplicates for this bug. Every single one of them has ecryptfs enabled, and several of them mention being low-ram at the time of the crash.

With those two requirements in mind, I'm currently creating a low-memory Natty VM that uses ecryptfs. Assuming I can easily reproduce the problem there, I'll upgrade the VM to Oneiric and try again.

I'll report any progress back here.

description: updated
Evan Huus (eapache) wrote :

Success! I can easily and quickly reproduce the problem in Natty. It isn't a time-related issue at all, simply resource-related.

Setup:
On Virtualbox 4.1.2, create a 32-bit Ubuntu VM with low memory (472MB in my case). Make sure your virtual hard-drive is at least 12GB so that Ubuntu allocates enough swap space to pull this off.
Install 11.04 from CD and install all updates, then reboot.
Vista-32 host if that makes any difference (which I doubt).

Steps to reproduce:
Open up LO writer, type a few words and save the file to disk. Leave the doc open, but minimized or otherwise inactive.
Open up every other app you can think of. System Monitor reported >200MB swapped for me.
Go back to LO and type a mis-spelled word, followed by a space.
Wait until the disk stops churning.
LO has crashed with this bug.

Once set up, it takes me less than ten minutes to produce another instance.
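
Since the trigger is swapping rather than elapsed time, it can help to confirm how much swap is actually in use before returning to LO. A small sketch that reads the standard SwapTotal and SwapFree fields of /proc/meminfo; the function name `swap_used_kb` is an invention for this example:

```shell
# swap_used_kb [FILE]: print the amount of swap in use, in kB.
# FILE defaults to /proc/meminfo; any file in the same format works.
swap_used_kb() {
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END           {print total - free}' "${1:-/proc/meminfo}"
}

swap_used_kb
```

On the VM described above, a value above roughly 200000 kB would correspond to the ">200MB swapped" reading from System Monitor.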

I stayed on the safe side and tried to replicate the most extreme of all reported conditions. It's likely that some of the steps I list are not actually required to reproduce this, but narrowing it down will take time.

I am currently upgrading the VM to oneiric to determine whether or not this is still an issue there.

I can easily reproduce this problem on my Toshiba L300D laptop. I am running Maverick with an ecryptfs home directory. I just have to put the original 1GB of RAM in, launch Ubuntu and OOo and then work with memory-intensive applications. This causes swap to populate, and then any action on any open OOo document will cause OOo to crash. It is much harder to force it to crash with 4GB of RAM, so for now that is my workaround. These crashes apparently affect both OOo and LibreOffice the same way.

SergeiS (sergei-redleafsoft) wrote :

Bjorn: this is definitely an ecryptfs-related issue. A workaround I found somewhere is to simply repoint HOME to an unincorporated directory, like:

env HOME=/home/sergei/Unprotected libreoffice -calc

Which is not a fix but something to get by. OO and LO save all state there from that point, which might be a security risk.

Note to other users, in case you didn't realize, /home/sergei/Unprotected is a link to a directory outside of my encrypted home directory. You can make one by:

sudo mkdir /home/Unprotected
sudo chown your_login:your_login /home/Unprotected
ln -s /home/Unprotected ~
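
Before relying on this workaround it may be worth checking that the new directory really sits outside the eCryptfs mount. A sketch assuming GNU coreutils `stat`, whose `-f -c %T` prints the filesystem type name; the helper names `fs_type` and `is_ecryptfs` are hypothetical:

```shell
# fs_type DIR: print the type of the filesystem that holds DIR (GNU stat).
fs_type() {
    stat -f -c %T "$1" 2>/dev/null
}

# is_ecryptfs DIR: succeed if DIR lives on an eCryptfs mount.
is_ecryptfs() {
    [ "$(fs_type "$1")" = "ecryptfs" ]
}

# Report on the encrypted home and on the directory created above.
for dir in "$HOME" /home/Unprotected; do
    [ -d "$dir" ] || continue
    if is_ecryptfs "$dir"; then
        echo "$dir: on ecryptfs"
    else
        echo "$dir: not on ecryptfs ($(fs_type "$dir"))"
    fi
done
```

If /home/Unprotected is reported as being on ecryptfs, the workaround is not doing what it is meant to.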

SergeiS (sergei-redleafsoft) wrote :

Sorry, not "unincorporated", but "unencrypted". Spellchecker's playing smart.

Evan Huus (eapache) wrote :

Something I missed in my previous comment: I selected the encrypted home option while installing Ubuntu in the VM.

Shahar Or (mightyiam) wrote :

Excellent work, folks!

Changed in ecryptfs-utils (Ubuntu):
importance: Undecided → High
status: New → Confirmed
Changed in libreoffice (Ubuntu):
status: Confirmed → Invalid
Changed in df-libreoffice:
status: New → Invalid
Changed in libreoffice (Ubuntu):
status: Invalid → Confirmed
summary: - soffice.bin crashed with SIGSEGV in cppu::throwException()
+ ecryptfs encrypted swap corrupts application stack/heap [was:
+ soffice.bin SIGSEGV cppu::throwException()]
Changed in libreoffice (Ubuntu):
status: Confirmed → Invalid
Changed in ecryptfs-utils (Ubuntu):
importance: High → Critical
Brad Figg (brad-figg) on 2011-09-23
Changed in linux (Ubuntu):
status: New → Incomplete
summary: - ecryptfs encrypted swap corrupts application stack/heap [was:
- soffice.bin SIGSEGV cppu::throwException()]
+ encrypted swap corrupts application stack/heap [was: soffice.bin SIGSEGV
+ cppu::throwException()]
Changed in ecryptfs-utils (Ubuntu Oneiric):
milestone: none → ubuntu-11.10
Tyler Hicks (tyhicks) on 2011-09-23
Changed in ecryptfs-utils (Ubuntu Oneiric):
assignee: nobody → Tyler Hicks (tyhicks)

Not really a LibreOffice bug, but an issue with encrypted home/swap.

*** Bug 35424 has been marked as a duplicate of this bug. ***

Changed in df-libreoffice:
importance: Undecided → Unknown
status: Invalid → Unknown

*** Bug 33025 has been marked as a duplicate of this bug. ***

*** Bug 40766 has been marked as a duplicate of this bug. ***

Changed in df-libreoffice:
importance: Unknown → Critical
status: Unknown → Won't Fix
Rolf Leggewie (r0lf) on 2011-09-26
Changed in linux (Ubuntu Oneiric):
status: Incomplete → Fix Released
Changed in libreoffice (Ubuntu Maverick):
status: New → Invalid
Changed in libreoffice (Ubuntu Natty):
status: New → Invalid
Changed in linux (Ubuntu Natty):
status: New → Confirmed
importance: Undecided → High
Changed in openoffice.org (Ubuntu Natty):
status: New → Won't Fix
Changed in openoffice.org (Ubuntu Maverick):
status: New → Won't Fix
Yaron Sheffer (yaronf) wrote :

I implemented the workaround in #71 and things were better at first (thanks Luca!). But today, after a suspend/resume, Writer went Boom as soon as I touched it. (I was using a feature - Track Changes - for the first time, which may have something to do with it). So as far as I'm concerned this is not a complete solution.

bug 825161 and bug 807759 are also dupes of this.

tags: added: rls-mgr-o-tracking
Dustin Kirkland  (kirkland) wrote :

Marking the ecryptfs-utils tasks invalid, as this turned out to be a kernel issue

Changed in ecryptfs-utils (Ubuntu Oneiric):
status: Confirmed → Invalid
Changed in ecryptfs-utils (Ubuntu Natty):
status: New → Invalid
Changed in ecryptfs-utils (Ubuntu Maverick):
status: New → Invalid
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu Maverick):
status: New → Confirmed
Rolf Leggewie (r0lf) wrote :

OK, according to the feedback here this does seem to be fixed in oneiric, great! But let's not get back to business as usual just yet. This is a very visible and highly annoying bug that affects a core package in three releases including the latest LTS. We need to identify the patch that fixed this and backport it.

Tyler? Dustin? Andy?

Changed in linux (Ubuntu Maverick):
importance: Undecided → High
milestone: none → maverick-updates
Changed in linux (Ubuntu Natty):
milestone: none → natty-updates
Rolf Leggewie (r0lf) wrote :

I encourage all people from the duplicates to visit https://bugs.launchpad.net/ubuntu/natty/+source/linux/+bug/745836/+affectsmetoo and indicate that they are affected as well. This should hopefully increase priority with the devs, too.

On 2011-09-30 00:23:31, Rolf Leggewie wrote:
> This is a very visible and highly annoying bug that affects a core
> package in three releases including the latest LTS. We need to identify
> the patch that fixed this and backport it.

I agree.

I've got a feeling that the fix for this is upstream commit
3b06b3ebf44170c90c893c6c80916db6e922b9f2 but when I wrote that patch it
depended on several other patches that aren't in the natty (or older)
kernels.

I started working on a backport, but it was getting too large. I've got
to rethink the approach and come up with a simpler fix that is suitable
for a backport.

Tyler Hicks (tyhicks) wrote :

This turned out to be a tricky one. It is definitely an eCryptfs bug. The upstream fix that I thought would solve this issue ended up not being the right fix. Instead, it turned out to be the following two commits:

bd4f0fe8bb7c73c738e1e11bc90d6e2cf9c6e20e
fed8859b3ab94274c986cbdf7d27130e0545f02c

However, I didn't write those patches as bug fixes. I was simply cleaning out some crufty looking code. It turned out to be buggy code, too.

Creating a file, extending that file, the file's pages being reclaimed, finally followed by reading the file is what triggers this. In the case of this bug report, the system being under memory pressure is what forced the file's pages out of the page cache.

The easiest way to reproduce the bug is with the following shell commands:

$ touch foo && truncate -s 4096 foo && sync && echo 1 | sudo tee /proc/sys/vm/drop_caches && hexdump -C foo

hexdump should show a file filled with zeroes, but it doesn't.
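
The visual hexdump check can also be automated. The sketch below wraps the same touch/truncate/sync sequence in a function and compares the result against /dev/zero with GNU `cmp -n`. The names `is_all_zero`, `check_truncate` and the `DROP_CACHES` variable are made up for this example, and the cache drop (which needs root) only runs when explicitly requested:

```shell
# is_all_zero FILE: succeed if every byte of FILE is zero.
is_all_zero() {
    size=$(wc -c < "$1")
    # -n limits the comparison to the first $size bytes of /dev/zero (GNU cmp)
    cmp -s -n "$size" "$1" /dev/zero
}

# check_truncate DIR: create a file in DIR, extend it with truncate,
# optionally drop the page cache, then verify that it reads back as zeros.
check_truncate() {
    f="$1/foo.$$"
    touch "$f" && truncate -s 4096 "$f" && sync || return 2
    if [ "${DROP_CACHES:-0}" = 1 ]; then
        # Forces the file's pages out of the page cache; requires root.
        echo 1 | sudo tee /proc/sys/vm/drop_caches > /dev/null
    fi
    if is_all_zero "$f"; then
        echo "OK: $f reads back as zeros"
        rc=0
    else
        echo "CORRUPT: $f contains non-zero bytes"
        rc=1
    fi
    rm -f "$f"
    return "$rc"
}
```

A possible invocation is `DROP_CACHES=1 check_truncate "$HOME"` from inside an eCryptfs-mounted home directory. Without the cache drop the pages are most likely still cached, so the check would be expected to pass even on an affected kernel.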

Data corruption is a possibility if the file is written to before the eCryptfs directory is unmounted.

It looks like all kernels before 2.6.39 are affected, possibly all the way back to the beginning of eCryptfs being merged upstream. Patch, with all the technical eCryptfs details in the commit message, to follow...

Shahar Or (mightyiam) wrote :

Wow, Tyler.

Good job. Can't wait for the fix release.

zmago (zmago-fluks) wrote :

Thank you Tyler! So this topic is now going to be solved finally?

omarly666 (omarly666) wrote :

nice if it's "fixed" :)
think i got this again when trying import from my sdcard into Shotwell yesterday.

drop me a line at 48045666@.... if you still want some output from my laptop

Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Natty):
assignee: nobody → Tyler Hicks (tyhicks)
status: Confirmed → In Progress
Tim Gardner (timg-tpi) wrote :

Tyler - this patch fixes the corruption caused by the reproducer in #86 using a 2.6.38 kernel. However, I'm not able to reproduce this using a 2.6.35 kernel. Do you have any advice as to whether this patch is really applicable to kernels older than 2.6.38?

Herton R. Krzesinski (herton) wrote :

This bug is awaiting verification that the kernel for Natty in -proposed solves the problem (2.6.38-13.52). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-natty' to 'verification-done-natty'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-natty
Tim Gardner (timg-tpi) wrote :

I'm clearing the Natty verification tag as I was able to reproduce the corruption and verify that this patch fixes the symptoms caused by the reproducer.

tags: added: verification-done-natty
removed: verification-needed-natty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.38-13.52

---------------
linux (2.6.38-13.52) natty-proposed; urgency=low

  [ Herton R. Krzesinski ]

  * Release Tracking Bug
    - LP: #887379

  [ Konrad Rzeszutek Wilk ]

  * SAUCE: x86/paravirt: Partially revert "remove lazy mode in interrupts"
    - LP: #854050

  [ Ming Lei ]

  * SAUCE: [media] uvcvideo: Set alternate setting 0 on resume if the bus
    has been reset
    - LP: #816484

  [ Seth Forshee ]

  * SAUCE: acer-wmi: Add wireless quirk for Lenovo 3000 N200
    - LP: #857297

  [ Upstream Kernel Changes ]

  * Make TASKSTATS require root access, CVE-2011-2494
    - LP: #866021
    - CVE-2011-2494
  * proc: restrict access to /proc/PID/io, CVE-2011-2495
    - LP: #866025
    - CVE-2011-2495
  * proc: fix a race in do_io_accounting(), CVE-2011-2495
    - LP: #866025
    - CVE-2011-2495
  * staging: comedi: fix infoleak to userspace, CVE-2011-2909
    - LP: #869261
    - CVE-2011-2909
  * perf tools: do not look at ./config for configuration, CVE-2011-2905
    - LP: #869259
    - CVE-2011-2905
  * e1000e: workaround for packet drop on 82579 at 100Mbps
    - LP: #870127
  * eCryptfs: Remove unnecessary grow_file() function
    - LP: #745836
  * eCryptfs: Remove ECRYPTFS_NEW_FILE crypt stat flag
    - LP: #745836
  * block: blkdev_get() should access ->bd_disk only after success
    - LP: #857170
  * ipv6: restore correct ECN handling on TCP xmit
    - LP: #872179
  * nl80211: fix overflow in ssid_len - CVE-2011-2517
    - LP: #869245
    - CVE-2011-2517
  * ksm: fix NULL pointer dereference in scan_get_next_rmap_item() -
    CVE-2011-2183
    - LP: #869227
    - CVE-2011-2183
  * NLM: Don't hang forever on NLM unlock requests - CVE-2011-2491
    - LP: #869237
    - CVE-2011-2491
  * KVM: fix kvmclock regression due to missing clock update
    - LP: #795717
  * drm/i915: don't enable plane, pipe and PLL prematurely
    - LP: #812638
  * drm/i915: add pipe/plane enable/disable functions
    - LP: #812638
 -- Herton Ronaldo Krzesinski <email address hidden> Mon, 07 Nov 2011 22:11:51 -0200

Changed in linux (Ubuntu Natty):
status: In Progress → Fix Released
Colin King (colin-king) wrote :

SRU justification for Lucid:

Impact:

The ECRYPTFS_NEW_FILE crypt_stat flag is set upon creation of a new
eCryptfs file. When the flag is set, eCryptfs reads directly from the
lower filesystem when bringing a page up to date. This means that no
offset translation (for the eCryptfs file metadata in the lower file)
and no decryption is performed. The flag is cleared just before the
first write is completed (at the beginning of ecryptfs_write_begin()).

It was discovered that if a new file was created and then extended with
truncate, the ECRYPTFS_NEW_FILE flag was not cleared. If pages
corresponding to this file are ever reclaimed, any subsequent reads
would result in userspace seeing eCryptfs file metadata and encrypted
file contents instead of the expected decrypted file contents.

Data corruption is possible if the file is written to before the
eCryptfs directory is unmounted. The data written will be copied into
pages which have been read directly from the lower file rather than
zeroed pages, as would be expected after extending the file with
truncate.

Fix: Clear the ECRYPTFS_NEW_FILE flag, if set, during truncate. The fix is
originally from Tyler Hicks and needed a little massaging to apply to the
current Lucid kernel; see
https://launchpadlibrarian.net/82254993/0001-eCryptfs-Clear-ECRYPTFS_NEW_FILE-flag-during-truncat.patch
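
The failure mode described under Impact can be illustrated with a small toy model. This is only a sketch and not kernel code: the class `ToyEcryptfsFile`, the `HEADER` bytes, and the XOR "cipher" are hypothetical stand-ins chosen for illustration, and the model assumes that extending a file writes encrypted zeros after a metadata header in the lower file.

```python
# Toy model of the ECRYPTFS_NEW_FILE read path. All names are hypothetical;
# the XOR "cipher" and HEADER are placeholders, not real eCryptfs formats.
HEADER = b"ECRYPTFS-METADATA"  # stand-in for the metadata at the start of the lower file
KEY = 0x5A                     # toy XOR key, not real cryptography


def xor(data: bytes) -> bytes:
    """Toy symmetric 'encryption': XOR every byte with KEY."""
    return bytes(b ^ KEY for b in data)


class ToyEcryptfsFile:
    def __init__(self, clear_flag_on_truncate: bool):
        self.clear_flag_on_truncate = clear_flag_on_truncate
        self.new_file = True            # models the ECRYPTFS_NEW_FILE flag set at creation
        self.lower = bytearray(HEADER)  # lower file: header, then ciphertext
        self.page_cache = None          # cached plaintext; None means "reclaimed"

    def truncate(self, size: int) -> None:
        """Extend the empty file to `size` bytes of zeros."""
        self.lower = bytearray(HEADER + xor(bytes(size)))
        self.page_cache = bytes(size)   # pages are up to date right after truncate
        if self.clear_flag_on_truncate:
            self.new_file = False       # behaviour with the fix applied

    def drop_caches(self) -> None:
        """Model page reclaim."""
        self.page_cache = None

    def read(self, size: int) -> bytes:
        if self.page_cache is None:
            if self.new_file:
                # Flag still set: read the lower file directly,
                # with no offset translation and no decryption.
                self.page_cache = bytes(self.lower[:size])
            else:
                start = len(HEADER)
                self.page_cache = xor(bytes(self.lower[start:start + size]))
        return self.page_cache[:size]


def read_after_reclaim(clear_flag_on_truncate: bool, size: int = 64) -> bytes:
    f = ToyEcryptfsFile(clear_flag_on_truncate)
    f.truncate(size)
    f.drop_caches()
    return f.read(size)
```

In this model, `read_after_reclaim(False)` returns bytes that begin with the header followed by ciphertext, while `read_after_reclaim(True)` returns only zeros.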

Testcase:

touch foo && truncate -s 4096 foo && sync && echo 1 | sudo tee /proc/sys/vm/drop_caches && hexdump -C foo

With the fix, hexdump shows a file filled with zeros, as expected. Without
the fix, the file is full of garbage.
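
The test case above can also be sketched as a small Python check, which may be handy when verifying several lower file systems in a row. This is a minimal sketch, not part of the original SRU: the function names are hypothetical, and it assumes the directory passed in is inside a mounted eCryptfs directory. Dropping the page cache needs root; if that step is not permitted, the read may be served from cached pages and the result says nothing about the bug.

```python
import os


def drop_page_cache() -> bool:
    """Ask the kernel to drop the page cache (needs root).

    Returns False if writing to /proc/sys/vm/drop_caches is not permitted.
    """
    try:
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("1\n")
        return True
    except OSError:
        return False


def truncated_file_is_zeroed(directory: str, size: int = 4096) -> bool:
    """Create an empty file, extend it with truncate, and check it reads back as zeros."""
    path = os.path.join(directory, "foo")
    with open(path, "wb"):
        pass                      # create the new, empty file
    os.truncate(path, size)       # extend it, as `truncate -s 4096 foo` does
    os.sync()
    drop_page_cache()             # result is only meaningful if this succeeded
    with open(path, "rb") as f:
        data = f.read()
    os.remove(path)
    return data == bytes(size)
```

Calling `truncated_file_is_zeroed("/path/inside/ecryptfs/mount")` as root should return `True` on a fixed kernel; the path shown is a placeholder.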

Andy Whitcroft (apw) on 2012-03-16
Changed in ecryptfs-utils (Ubuntu Lucid):
status: New → Invalid
Changed in libreoffice (Ubuntu Lucid):
status: New → Invalid
Changed in linux (Ubuntu Lucid):
status: New → Confirmed
assignee: nobody → Colin King (colin-king)
importance: Undecided → High
Rolf Leggewie (r0lf) on 2012-03-16
Changed in openoffice.org (Ubuntu Lucid):
status: New → Won't Fix
Andy Whitcroft (apw) on 2012-03-16
Changed in openoffice.org (Ubuntu Lucid):
status: Won't Fix → Invalid
no longer affects: libreoffice (Ubuntu)
Colin King (colin-king) wrote :

+ SRU for Maverick too.

Tim Gardner (timg-tpi) on 2012-03-16
Changed in linux (Ubuntu Maverick):
status: Confirmed → Fix Committed
Colin King (colin-king) wrote :

Tested and verified for 2.6.35-32.68 -proposed with ext2, ext3, ext4, xfs and btrfs lower file systems

tags: added: verification-done-maverick
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel for Lucid in -proposed solves the problem (2.6.32-41.88). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lucid' to 'verification-done-lucid'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-lucid
Colin King (colin-king) wrote :

verified on lucid 2.6.32-41.88 -proposed with ext2, ext3, ext4, xfs lower file systems

tags: added: verification-done-lucid
removed: verification-needed-lucid
Rolf Leggewie (r0lf) wrote :

I can also verify that lucid 2.6.32-41.88 from lucid-proposed fixes the issue. Please release. Thank you.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.32-41.88

---------------
linux (2.6.32-41.88) lucid-proposed; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #966443

  [ Andy Whitcroft ]

  * [Config] restore build-% shortcut

  [ Tim Gardner ]

  * SAUCE: ubuntu drivers: use UMH_WAIT_PROC consistently
    - LP: #963685

  [ Upstream Kernel Changes ]

  * Revert "Revert "USB: xhci - fix unsafe macro definitions""
    - LP: #948139
  * Revert "Revert "USB: xhci - fix math in xhci_get_endpoint_interval()""
    - LP: #948139
  * Revert "Revert "xhci: Fix full speed bInterval encoding.""
    - LP: #948139
  * bsg: fix sysfs link remove warning
    - LP: #946928
  * hwmon: (f75375s) Fix bit shifting in f75375_write16
    - LP: #948139
  * lib: proportion: lower PROP_MAX_SHIFT to 32 on 64-bit kernel
    - LP: #948139
  * relay: prevent integer overflow in relay_open()
    - LP: #948139
  * mac80211: timeout a single frame in the rx reorder buffer
    - LP: #948139
  * kernel.h: fix wrong usage of __ratelimit()
    - LP: #948139
  * printk_ratelimited(): fix uninitialized spinlock
    - LP: #948139
  * hwmon: (f75375s) Fix automatic pwm mode setting for F75373 & F75375
    - LP: #948139
  * crypto: sha512 - Use binary and instead of modulus
    - LP: #948139
  * crypto: sha512 - Avoid stack bloat on i386
    - LP: #948139
  * crypto: sha512 - use standard ror64()
    - LP: #948139
  * SCSI: 3w-9xxx fix bug in sgl loading
    - LP: #948139
  * ARM: 7321/1: cache-v7: Disable preemption when reading CCSIDR
    - LP: #948139
  * ARM: 7325/1: fix v7 boot with lockdep enabled
    - LP: #948139
  * USB: Added Kamstrup VID/PIDs to cp210x serial driver.
    - LP: #948139
  * USB: Fix handoff when BIOS disables host PCI device.
    - LP: #948139
  * xhci: Fix encoding for HS bulk/control NAK rate.
    - LP: #948139
  * hdpvr: fix race conditon during start of streaming
    - LP: #948139
  * cdrom: use copy_to_user() without the underscores
    - LP: #948139
  * autofs: work around unhappy compat problem on x86-64
    - LP: #948139
  * Fix autofs compile without CONFIG_COMPAT
    - LP: #948139
  * compat: fix compile breakage on s390
    - LP: #948139
  * PM: Print a warning if firmware is requested when tasks are frozen
    - LP: #948139
  * firmware loader: allow builtin firmware load even if usermodehelper is
    disabled
    - LP: #948139
  * PM / Sleep: Fix freezer failures due to racy
    usermodehelper_is_disabled()
    - LP: #948139
  * PM / Sleep: Fix read_unlock_usermodehelper() call.
    - LP: #948139
  * Linux 2.6.32.58
    - LP: #948139
  * regset: Prevent null pointer reference on readonly regsets
    - LP: #949905
    - CVE-2012-1097
  * regset: Return -EFAULT, not -EIO, on host-side memory fault
    - LP: #949905
    - CVE-2012-1097
  * KVM: Remove ability to assign a device without iommu support
    - LP: #897812
    - CVE-2011-4347
  * eCryptfs: Copy up lower inode attrs after setting lower xattr
  * eCryptfs: Improve statfs reporting
    - LP: #885744
  * drm/i915: no lvds quirk for AOpen MP45
    - LP: #955078
  * drm/radeon/kms: fix MSI re-arm on rv370+
    - LP: #955078
  * Linux 2.6.32.58+drm33.24
    - LP: #955078
  ...


Changed in linux (Ubuntu Lucid):
status: Confirmed → Fix Released
JC Hulce (soaringsky) wrote :

This bug affects Ubuntu 10.10, Maverick Meerkat. Maverick has reached end-of-life and is no longer supported, so I am closing the bugtask for Maverick. Please upgrade to a newer version of Ubuntu.
More information here: https://lists.ubuntu.com/archives/ubuntu-announce/2012-April/000158.html

Changed in linux (Ubuntu Maverick):
status: Fix Committed → Invalid