encrypted swap corrupts application stack/heap [was: soffice.bin SIGSEGV cppu::throwException()]

Bug #745836 reported by Scott Kitterman on 2011-03-30
This bug affects 76 people
Affects                     Status     Importance  Assigned to
LibreOffice                 Won't Fix  Critical
ecryptfs-utils (Ubuntu)                Critical    Tyler Hicks
  Lucid                                Undecided   Unassigned
  Maverick                             Undecided   Unassigned
  Natty                                Undecided   Unassigned
  Oneiric                              Critical    Tyler Hicks
libreoffice (Ubuntu)
  Lucid                                Undecided   Unassigned
  Maverick                             Undecided   Unassigned
  Natty                                Undecided   Unassigned
  Oneiric                              High        Unassigned
linux (Ubuntu)                         Undecided   Unassigned
  Lucid                                High        Colin Ian King
  Maverick                             High        Unassigned
  Natty                                High        Tyler Hicks
  Oneiric                              Undecided   Unassigned
openoffice.org (Ubuntu)                Undecided   Unassigned
  Lucid                                Undecided   Unassigned
  Maverick                             Undecided   Unassigned
  Natty                                Undecided   Unassigned
  Oneiric                              Undecided   Unassigned

Bug Description

Binary package hint: libreoffice

1) lsb_release -rd
Description: Ubuntu 11.04
Release: 11.04

2) apt-cache policy libreoffice-calc
libreoffice-calc:
  Installed: 1:3.3.3-1ubuntu2
  Candidate: 1:3.3.3-1ubuntu2
  Version table:
 *** 1:3.3.3-1ubuntu2 0
        100 /var/lib/dpkg/status
     1:3.3.2-1ubuntu5 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty-updates/main i386 Packages
     1:3.3.2-1ubuntu4 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty/main i386 Packages

apt-cache policy libreoffice-writer
libreoffice-writer:
  Installed: 1:3.3.3-1ubuntu2
  Candidate: 1:3.3.3-1ubuntu2
  Version table:
 *** 1:3.3.3-1ubuntu2 0
        100 /var/lib/dpkg/status
     1:3.3.2-1ubuntu5 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty-updates/main i386 Packages
     1:3.3.2-1ubuntu4 0
        500 http://us.archive.ubuntu.com/ubuntu/ natty/main i386 Packages

3) What is expected to happen: in Natty, in either a KDE session with the KDE integration active or a GNOME session, when one tries to edit a Writer or Calc file that has been left untouched for a long period of time (e.g. 1 hour or more), the application does not crash.

4) What happens instead is that it crashes. This is highly correlated with both EcryptfsInUse and resource-constrained environments (memory and CPU usage >> 50%). It occurs with:

+ Intel drivers, Compiz not enabled, only Writer open (bug 745836)
+ binary ATI drivers, Compiz enabled, only Calc open (bug 799047)

ProblemType: Crash
DistroRelease: Ubuntu 11.04
Package: libreoffice-core 1:3.3.2-1ubuntu2
ProcVersionSignature: Ubuntu 2.6.38-7.39-generic 2.6.38
Uname: Linux 2.6.38-7-generic i686
Architecture: i386
Date: Wed Mar 30 12:34:39 2011
Disassembly: => 0x100000: Cannot access memory at address 0x100000
EcryptfsInUse: Yes
ExecutablePath: /usr/lib/libreoffice/program/soffice.bin
ProcCmdline: /usr/lib/libreoffice/program/soffice.bin -writer -splash-pipe=5
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SegvAnalysis:
 Segfault happened at: 0x100000: Cannot access memory at address 0x100000
 PC (0x00100000) not located in a known VMA region (needed executable region)!
SegvReason: executing unknown VMA
Signal: 11
SourcePackage: libreoffice
StacktraceTop:
 ?? ()
 cppu::throwException(com::sun::star::uno::Any const&) () from /usr/lib/libreoffice/program/../basis-link/program/../ure-link/lib/libuno_cppuhelpergcc3.so.3
 ucbhelper::cancelCommandExecution(com::sun::star::ucb::IOErrorCode, com::sun::star::uno::Sequence<com::sun::star::uno::Any> const&, com::sun::star::uno::Reference<com::sun::star::ucb::XCommandEnvironment> const&, rtl::OUString const&, com::sun::star::uno::Reference<com::sun::star::ucb::XCommandProcessor> const&) () from /usr/lib/libreoffice/program/../basis-link/program/libucbhelper4gcc3.so
 ?? () from /usr/lib/libreoffice/program/../basis-link/program/libucpfile1.so
 ?? () from /usr/lib/libreoffice/program/../basis-link/program/libucpfile1.so
Title: soffice.bin crashed with SIGSEGV in cppu::throwException()
UpgradeStatus: Upgraded to natty on 2011-03-29 (0 days ago)
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare

Scott Kitterman (kitterman) wrote :

StacktraceTop:
 ?? ()
 throwException () from /usr/lib/libreoffice/program/../basis-link/program/../ure-link/lib/libuno_cppuhelpergcc3.so.3
 cancelCommandExecution () from /usr/lib/libreoffice/program/../basis-link/program/libucbhelper4gcc3.so
 throw_handler () from /usr/lib/libreoffice/program/../basis-link/program/libucpfile1.so
 endTask () from /usr/lib/libreoffice/program/../basis-link/program/libucpfile1.so

Changed in libreoffice (Ubuntu):
importance: Undecided → Medium
tags: removed: need-i386-retrace
visibility: private → public

It looks like the call to FStatHelper::IsDocument() in
sal_Bool SvxAutoCorrect::CreateLanguageFile( LanguageType eLang, sal_Bool bNewFile ) in
editeng/source/misc/svxacorr.cxx
needs to catch exceptions.

Evan Huus (eapache) wrote :

Setting to confirmed given the number of dupes and the discussion upstream.

Changed in libreoffice (Ubuntu):
status: New → Confirmed

If I leave LibreOffice with a document open for a while (doc, calc, any type), then resume typing in that document, it immediately crashes.

LibreOffice 3.3.2
OOO330m19 (Build:202)
tag libreoffice-3.3.2.2, Ubuntu package 1:3.3.2-1ubuntu2~maverick1

uname -a
Linux will 2.6.35-28-generic #50-Ubuntu SMP Fri Mar 18 18:42:20 UTC 2011 x86_64 GNU/Linux

dmesg output:
[24354.244064] soffice.bin[4180] general protection ip:7fa9d70d9f0e sp:7fff592c1a80 error:0 in libuno_cppuhelpergcc3.so.3[7fa9d70ba000+9e000]

From other postings, it seems this also affects 32-bit users and is not specific to LO either. It was happening to me on OOo before I migrated.

Ubuntu specific, for Björn?

Just to clarify: by idle I mean not using the LO application; I go off and do other tasks in other apps. Then, when I come back and resume typing in LO, it crashes. All other apps remain stable.

description: updated
Rolf Leggewie (r0lf) on 2011-06-20
tags: added: lucid
Rolf Leggewie (r0lf) wrote :

This is 100% reproducible for me on lucid given sufficient time has elapsed. Let me know if there is any further information I can provide. I'd be happy to run valgrind but would need some guidance.

tags: added: lo33 metabug

Created attachment 48444
gdb backtrace and all thread backtrace

To recap the bug: work for some time in OOo, then do something else, and come back to OOo after a while -> crash (sometimes). Or press Ctrl+S after having spent some time on other things -> crash (sometimes).

From my past experience, this kind of bug first started with version 3 of OpenOffice. A try with IBM Lotus Symphony resulted in the same crash!
I finally thought LibreOffice was not affected... but that is not the case.

I was using Ubuntu 10.10 (32-bit) with Compiz when the problem started (disabling OpenGL for OOo doesn't affect the bug).
I now have LibreOffice 3.1 on Ubuntu 10.10 and it seems not to be affected by this bug (I was not able to reproduce it).
I also recently got Ubuntu 10.04... LibreOffice 3.3 crashes... 3.4 also!
The backtrace seems explicit: the trouble comes from UNO/URE. Apart from it being a .NET-like component, I have no idea how it works or what part it plays in LibreOffice, OOo or Lotus Symphony.

However, it seems related to file I/O, with an exception that is not properly caught producing a SIGSEGV.

My guess is that this bug should be redirected to the UNO framework. The Ubuntu distro might also play a role in it.

If you need testing for this bug, I have a document (professional, so not shareable) that triggers the bug almost every time within an hour.

(In reply to comment #3)
> I also recently got Ubuntu 10.04.... libreoffice 3.3 crash... 3.4 also!!!

Oops, I made a mistake in the Ubuntu version:
read Ubuntu 10.04 as 11.04, sorry...

Possibly related to bug 185600 and freedesktop bug 37121.
also possibly related: http://ubuntuforums.org/showthread.php?t=1237608

Additional information needed (because of the behaviour observed in other bugs):
Is this reproducible:

- on compiz
- without compiz

- on nvidia binary drivers
- on non-nvidia binary drivers

- with LibreOffice Draw open
- with only LibreOffice Writer/Calc open

=> set to incomplete

Changed in libreoffice (Ubuntu):
status: Confirmed → Incomplete
Scott Kitterman (kitterman) wrote :

I use KDE without compiz and my systems are all Intel, so compiz and nvidia have nothing to do with it.

Changed in libreoffice (Ubuntu):
status: Incomplete → Confirmed
Evan Huus (eapache) wrote :

I've seen this with only Writer open, so it's not particularly related to Draw either.

The observation in https://bugs.launchpad.net/ubuntu/+source/nvidia-graphics-drivers-180/+bug/185600/comments/42 was that it stops happening when Draw is open too.

@Evan: As you see the bug, could you check whether it disappears when Draw is open too?

Evan Huus (eapache) wrote :

That'll teach me to only read the summary :)

I'll see what I can do to test, but given the required time to reproduce a single instance it might be a while before I can give a definite yes/no.

Rolf Leggewie (r0lf) wrote :

I am observing this problem in OpenOffice.org Writer and Calc on my Lucid system. And indeed, having a Draw window open seems to prevent this problem from occurring.

Rolf Leggewie (r0lf) wrote :

The Draw window seems to help somewhat, but it does not completely prevent crashes after inactivity.

gdb backtrace and valgrind from 10.10 (Mint) 64 bit.

Created attachment 49669
gdb backtrace taken from 10.10 (mint), 64bit

2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:32:27 UTC 2010 x86_64 GNU/Linux
libreoffice-core 1:3.3.2-1ubuntu2~maverick1

Created attachment 49670
valgrind taken from 10.10 (mint), 64bit

2.6.35-22-generic #33-Ubuntu SMP Sun Sep 19 20:32:27 UTC 2010 x86_64 GNU/Linux
libreoffice-core 1:3.3.2-1ubuntu2~maverick1

I have been seeing the exact same behavior for many months, going back to OOo. I have previously submitted two gdb traces, below, which might be useful to people debugging this issue.

I see this issue very often in both Writer and Impress, and it makes working with these applications extremely unpredictable and unpleasant. Maybe 50% of the time when I start the application, I need to go through document recovery first.

https://bugs.freedesktop.org/show_bug.cgi?id=35424

description: updated
Scott Kitterman (kitterman) wrote :

"Use Gnumeric" isn't a workaround.

description: updated

Scott Kitterman, I'm not here to argue with you. Gnumeric is an acceptable workaround, though admittedly not an ideal one, as it lets one use a spreadsheet application in Ubuntu without it crashing every minute. :) Providing similar workaround information is helpful to other Ubuntu users who just want the functionality and do not care which program delivers it.

description: updated
Scott Kitterman (kitterman) wrote :

I gather you are fairly new to the Ubuntu community. There is a long-standing tradition in Ubuntu not to have reversion wars, so rather than revert my reversion, you should have asked someone else to take a look at it. BTW, if you are "not here to argue with [me]" you demonstrate it in a very odd way. I will ask someone else to review it. A few thoughts for you in the meantime:

1. I filed the bug, so I think it's reasonable to define an acceptable workaround as one that would work for me. Virtually all of my work on office-type documents involves emailing them to, and receiving them from, other people in MS Office formats. I need broad, reliable support for the MS Office formats that only OOo/LO provide (this is unfortunate, but it's where we are now).

2. Gnumeric doesn't integrate with my desktop environment. If I didn't care about #1, kchart or kword would be better choices.

3. This isn't about a crash "every minute". It's about what happens when you let a document sit idle for some time and then come back, so this doesn't make LO unusable. It just means I'm glad document recovery works as well as it does. Switching to a different office suite is totally unnecessary because of this bug.

Marking Importance to High based on easily and frequently reproduced crash among a multitude of duplicates.

Changed in libreoffice (Ubuntu):
status: Confirmed → Triaged
assignee: nobody → Björn Michaelsen (bjoern-michaelsen)
assignee: Björn Michaelsen (bjoern-michaelsen) → nobody
importance: Medium → High

In an attempt to work around this OOo crash, I maxed out my system's RAM so it no longer needs swap. I have had no further crashes, regardless of how long documents and spreadsheets are left open. I believe this is strong evidence that the failure occurs once data is moved from RAM to swap. I did a rather unscientific search to see whether this problem exists on other distros (e.g. SUSE, Fedora) but did not find a similar bug report, which leads me to believe that this issue is Ubuntu-specific.

Changed in openoffice.org (Ubuntu):
status: New → Confirmed
Changed in openoffice.org (Ubuntu):
status: Confirmed → Won't Fix
Rolf Leggewie (r0lf) wrote :

Is it really too much to ask for a short explanation when a ticket or task is closed? -> bug 825837

I guess in this case it's easy enough to guess that only libreoffice will receive the fix, but why do we need to guess? And if the fix is easily applicable to the package in lucid, what's the rush to decide now NOT to fix it?

Does this bug still show up in libreoffice-3.4.3-1ubuntu2?
https://launchpad.net/ubuntu/+source/libreoffice/1:3.4.3-1ubuntu2

Changed in libreoffice (Ubuntu):
status: Triaged → Incomplete

[This is an automated message.]
There are no new official OpenOffice.org releases in Ubuntu packaging anymore => Won't Fix

If the problem persists, please mark this bug as "also affects project Libreoffice" or "also affects distribution Libreoffice (Ubuntu)" if that has not happened already.

Please leave references to upstream OpenOffice.org bugs in place to allow cross pollination.

I don't know. The system where I mostly do office type document work is still running Natty. Putting back to confirmed so it doesn't time out.

Changed in libreoffice (Ubuntu):
status: Incomplete → Confirmed
zmago (zmago-fluks) wrote :

And how long would it take to fix this bug otherwise? It's a really nasty bug. It is not possible to work reliably with an office suite in Ubuntu.

@zmago: You can help fixing this bug by providing the asked information:
 - Is there a reliable, reproducible scenario ("sometimes crashes after hours" is neither reliable nor reproducible)?
 - Does this still happen with 3.4.3?

Just to reiterate -- this needs a good reproducible scenario. I just failed to reproduce this with the available info:
According to reports so far, this should happen on 3.3.3 on Natty when no Draw window is open and the following steps are performed:
- Open a Writer document
- type some text
- leave windows alone for > 1hour
- return to window and type something again.

Shahar Or (mightyiam) wrote :

Next time I have this, I'll try to understand whether there's a pattern.

zmago (zmago-fluks) wrote :

Thank you. I will try to figure out the pattern. The same problem happens on my netbook and my desktop; on both I have ecryptfs turned on for the user profile where the crash happens. On another laptop I don't use folder encryption, and there is no crashing there. A temporary solution for me is changing autosave to 5 minutes or even less. I will tell you more once I pay attention to the crashing pattern. I'm not used to a community that listens to you and where you can also help in some way :) So I will do what I can.

Cheers.

ricardo (rh-) wrote :

Hi,
I struggled with this bug for over a year. What I figured out is that it always crashed when I had some documents open and had not worked in LibreOffice for some time. When I came back to write a word, save the document or paste information, LibreOffice crashed.
I had my home directory encrypted.
When the .libreoffice directory is encrypted, LibreOffice will often crash. So I moved this directory to a non-encrypted directory and there were no more crashes.
I had the crashes with all versions of LibreOffice, and with OpenOffice 3.0 and higher. That was the version I was using when I encrypted my home directory while upgrading to Ubuntu 10.04.
I hope this will help to fix this bug soon.
I guess there is a timeout while OpenOffice is waiting for some information from the encrypted config directory. As the information does not arrive in time, the program crashes somehow.

After waiting over a year for a bug fix, my solution was to decrypt my home directory, which is what I did today. I hope that this will fix this bug for me.
Bye
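The "move only the profile directory" workaround above can be scripted. This is a minimal sketch, not something posted in the thread: it assumes the LibreOffice 3.x profile lives in ~/.libreoffice, that LibreOffice is closed while it runs, and that the target is a directory you own on an unencrypted filesystem. The function name move_lo_profile is hypothetical. As noted elsewhere in this thread, the profile then sits on disk unencrypted, which may be a privacy risk.

```shell
# Hypothetical helper: move the LibreOffice 3.x profile (~/.libreoffice)
# into TARGET_DIR (an unencrypted location you own) and leave a symlink
# behind, so LibreOffice keeps finding it at the old path.
# Close LibreOffice before running this.
move_lo_profile() {
    target_dir=$1
    profile="$HOME/.libreoffice"

    # Nothing to move if the profile is missing or already relocated.
    if [ ! -d "$profile" ] || [ -L "$profile" ]; then
        echo "skip: $profile is missing or already a symlink" >&2
        return 1
    fi
    # Never clobber an existing copy in the target directory.
    if [ -e "$target_dir/.libreoffice" ]; then
        echo "refusing to overwrite $target_dir/.libreoffice" >&2
        return 1
    fi

    # Move the profile, then point the old path at the new location.
    mv "$profile" "$target_dir/.libreoffice" &&
        ln -s "$target_dir/.libreoffice" "$profile"
}
```

For example, `move_lo_profile /home/Unprotected` after creating that directory with your own ownership. Running it a second time is a no-op that returns non-zero, because the profile path is already a symlink.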

Rolf Leggewie (r0lf) wrote :

As previously said, I am also affected by this bug. I do have an encrypted (ecryptfs) home partition, so this may indeed be necessary. For me, the crashes are easily reproducible.

Shahar Or (mightyiam) wrote :

I also have encrypted home.

indrek (indrek-seppo) wrote :

As is already mentioned in this thread it seems to be dependent on whether the computer is swapping. I only have the problem when my RAM is maxed out. I currently have plenty of RAM, thus it has happened rarely and only when I am doing something very RAM-intensive.

Evan Huus (eapache) wrote :

I've scanned through a whole bunch of the duplicates for this bug. Every single one of them has ecryptfs enabled, and several of them mention being low-ram at the time of the crash.

With those two requirements in mind, I'm currently creating a low-memory Natty VM that uses ecryptfs. Assuming I can easily reproduce the problem there, I'll upgrade the VM to Oneiric and try again.

I'll report any progress back here.

description: updated
Evan Huus (eapache) wrote :

Success! I can easily and quickly reproduce the problem in Natty. It isn't a time-related issue at all, simply resource-related.

Setup:
On Virtualbox 4.1.2, create a 32-bit Ubuntu VM with low memory (472MB in my case). Make sure your virtual hard-drive is at least 12GB so that Ubuntu allocates enough swap space to pull this off.
Install 11.04 from CD and install all updates, then reboot.
Vista-32 host if that makes any difference (which I doubt).

Steps to reproduce:
Open up LO writer, type a few words and save the file to disk. Leave the doc open, but minimized or otherwise inactive.
Open up every other app you can think of. System Monitor reported >200MB swapped for me.
Go back to LO and type a misspelled word, followed by a space.
Wait until the disk stops churning.
LO has crashed with this bug.
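The amount swapped can also be watched from a terminal instead of System Monitor while working through these steps. This is a sketch that assumes the standard Linux /proc/meminfo format (SwapTotal and SwapFree in kB); the function names and the 5-second polling interval are illustrative choices, not part of the original steps.

```shell
# Print swap currently in use, in MiB, from /proc/meminfo (or a file in
# that format). SwapTotal and SwapFree are reported there in kB.
swap_used_mib() {
    awk '/^SwapTotal:/ { t = $2 }
         /^SwapFree:/  { f = $2 }
         END { printf "%d\n", (t - f) / 1024 }' "${1:-/proc/meminfo}"
}

# Hypothetical helper: poll every 5 seconds until at least $1 MiB of swap
# is in use, then report the figure. $2 optionally names a meminfo file.
wait_for_swap() {
    while [ "$(swap_used_mib "$2")" -lt "$1" ]; do
        sleep 5
    done
    echo "swap in use: $(swap_used_mib "$2") MiB"
}
```

For example, `wait_for_swap 200` returns once roughly the 200 MB reported above has been swapped out, which would be the point to switch back to Writer.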

Once set up, it takes me less than ten minutes to produce another instance.

I stayed on the safe side and tried to replicate the most extreme of all reported conditions. It's likely that some of the steps I list are not actually required to reproduce this, but narrowing it down will take time.

I am currently upgrading the VM to oneiric to determine whether or not this is still an issue there.

I can easily reproduce this problem on my Toshiba L300D laptop. I am running Maverick with an ecryptfs home directory. I just have to put the original 1GB of RAM back in, launch Ubuntu and OOo, and then work with memory-intensive applications. This causes swap to fill up, and then any action on any open OOo document causes OOo to crash. It is much harder to force a crash with 4GB of RAM, so for now that is my workaround. These crashes apparently affect OOo and LibreOffice the same way.

SergeiS (sergei-redleafsoft) wrote :

Bjorn: this is definitely an ecryptfs-related issue. The workaround I've found somewhere is to simply repoint HOME to an unincorporated directory, like:

env HOME=/home/sergei/Unprotected libreoffice -calc

This is not a fix, but something to get by with. OOo and LO save all their state there from that point on, which might be a security risk.

Note to other users, in case you didn't realize, /home/sergei/Unprotected is a link to a directory outside of my encrypted home directory. You can make one by:

sudo mkdir /home/Unprotected
sudo chown your_login:your_login /home/Unprotected
ln -s /home/Unprotected ~

SergeiS (sergei-redleafsoft) wrote :

Sorry, not "unincorporated", but "unencrypted". Spellchecker's playing smart.

Evan Huus (eapache) wrote :

Something I missed in my previous comment: I selected the encrypted home option while installing Ubuntu in the VM.

Shahar Or (mightyiam) wrote :

Excellent work, folks!

Changed in ecryptfs-utils (Ubuntu):
importance: Undecided → High
status: New → Confirmed

OK, I tried again to reproduce this:
- Installed amd64 desktop natty in VirtualBox with little memory (512MB)
- Selected an encrypted home
- Fresh boot
- open Writer
- type some words
- open nautilus on another workspace
- return to Writer after >1 hours
=> no crash though here.

It would be great if those experiencing the error could provide logfiles from /var/log to identify what is going wrong with ecryptfs.

Closing as Invalid in LibreOffice for now, as the root cause seems to be in ecryptfs. Please reopen if LibreOffice is really at fault here.
Setting to confirmed/High in ecryptfs because of the number of dupes.

Changed in libreoffice (Ubuntu):
status: Confirmed → Invalid
Changed in df-libreoffice:
status: New → Invalid

Björn Michaelsen (bjoern-michaelsen), I was able to easily reproduce this in a high utilization native Natty environment with a blank Calc file. The only thing in /var/log was the following in kern.log:

Sep 21 12:37:54 monikerpc kernel: [906854.884810] soffice.bin[9674]: segfault at 100000 ip 00100000 sp bfdbf9fc error 4 in libm-2.13.so[110000+24000]

There was no other information about ecryptfs or soffice in any of the other logs.
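To pull such lines out of the kernel log, something like the following may help. It is a sketch that assumes the two kernel message formats quoted in this thread (`soffice.bin[PID]: segfault at ...` and `soffice.bin[PID] general protection ...`); soffice_faults and soffice_faults_at_100000 are hypothetical helper names.

```shell
# List LibreOffice faults (segfaults and general-protection faults) that
# the kernel logged. Defaults to /var/log/kern.log; pass another file,
# e.g. a rotated kern.log.1, as the first argument.
soffice_faults() {
    grep -E 'soffice\.bin\[[0-9]+]:? (segfault|general protection)' \
        "${1:-/var/log/kern.log}"
}

# Count only faults whose instruction pointer is 0x100000, the value that
# recurs in the i386 reports in this bug.
soffice_faults_at_100000() {
    soffice_faults "$1" | grep -c ' ip 0*100000 '
}
```

Comparing the two counts over time would show whether crashes keep landing on the same bogus address or are spread around, as in the 64-bit general-protection reports.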

Changed in libreoffice (Ubuntu):
status: Invalid → Confirmed
zmago (zmago-fluks) wrote :

It's exactly the same with my computers. A segfault always appears in the kernel log when LibreOffice crashes, but it is hard to find extra information that would help identify the cause of this problem. Any idea how to investigate even deeper?

zmago (zmago-fluks) wrote :

Well... this time, when LibreOffice crashed, the log file contained this:

soffice.bin[9804] general protection ip:7f73a09f9c09 sp:7fffb8f5f4a0 error:0 in libuno_cppuhelpergcc3.so.3[7f73a09db000+c8000]

This is a little different from before, when most of the time I saw a segfault.

How did this happen? I have 3 GB of RAM; at the time, 1.2 GB was used and 13.1 MB was swapped. Before that I had been running a KVM virtual machine, which forced the OS to swap. At that moment I was copying one big file from one folder to another, so the disk was busy. Office crashed after 1 minute of use.

I hope Microsoft did not inject this bug into LibreOffice to force us to buy MS Office :)

cheers

Ian! D. Allen (idallen) wrote :

From: https://bugs.launchpad.net/ecryptfs/+bug/509180

Linux linux 2.6.38-11-generic #47-Ubuntu SMP Fri Jul 15 19:27:09 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Description: Ubuntu 11.04
Codename: natty

After months of kernel and application crashes and corrupt files, I
was ready to abandon Ubuntu 11.04 and go to some other distribution.
Before doing that, I wrote a script to repeatedly md5sum the files in
my ecryptfs directory and compare the results. I went single-user to
ensure that nothing else was running and, sure enough, the script showed
that md5sums changed randomly on files that I never touched.

I went looking today and found this launchpad entry indicating that you
have known for 18 months (since 9.10!) that ecryptfs is broken and is
not suitable for production use.

Why didn't you disable it in recent releases? My 11.04 install offered
to encrypt my home directory during installation, yet you've known for
18 months that to do so would corrupt my system and crash it.

I am frustrated that nobody took prompt action to disable the use of
ecryptfs and to notify those of us using it.

zmago (zmago-fluks) wrote :

That's very interesting. Thank you!
Today, when my office suite crashed, I already had this new kernel update, which also includes this fix...

I forgot to add this to my post before:
Linux Kangaroo 2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12 21:17:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

The bug you're ranting about is fixed in 11.04, so that's not it.

Scott, is that fix in base 11.04? I have been away from my PC for a few days, and my Ubuntu 11.04 was still crashing LibreOffice before this.

I believe that people see a crash when the system has been inactive for a while because the inactive LibreOffice apps get swapped out. High RAM usage just makes the issue appear sooner.

This is the only app I have seen that seems to crash routinely due to ecryptfs, so I still think partial blame (or poor exception handling) should be attributed to the OpenOffice stack.

Since I'm one of the contributors who has worked hard to try to identify the encrypted home issue and has stuck with these apps longer than necessary (I'm using Google Docs now, as I can at least write a two-page doc without 5 crashes), I would prefer you avoid just calling this a rant.

Anything I can do to help (simple instructions) would be useful. Debugging to this point has been impossible for me.
Phil

(Sorry for any typos. Native English speaker using an iPhone, another platform I plan to abandon soon due to forced obsolescence.)

The particular bug (the ecryptfs one) you referred to is marked fixed. That
doesn't guarantee there aren't other issues with similar symptoms.

The ecryptfs bug wasn't fixed in Ubuntu 11.04 with kernel 2.6.38-11-generic x86_64 #47-Ubuntu in August.
I'll run my ecryptfs read-only tests again now using the current Ubuntu kernel I have, which is:
Linux linux 2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12 21:17:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Let's reiterate the current status:
- This only happens on a machine under heavy load with data on ecryptfs.
- The stacktraces show:
  a) this happening at various locations (the stacktraces are different);
  b) even more important: some of the stacktraces show things that can't possibly happen if the underlying system works correctly.
See the linked upstream discussion of this between me and Caolán McNamara.
Quote:
"The bug report's stack is from svx autocorrect, which means that ucbhelper::cancelCommandExecution and cppu::throwException have successfully thrown exceptions at least a hundred times or so before the crash, so its not the case that it's e.g. the first throw or two through the uno bridge."
This means that the following:
"With an eip of 0x100000 (in the i386 bug reports) meaning the call goes into nirvana."
can't really happen unless there is some serious memory corruption around (or the stacktraces themselves are wrong; however, in general they look sane). The eip == 0x100000 appears in a lot of stacktraces here, while it should instead be a random (and different) number pointing to the created ExceptionThrower object.

As of now, there is _no_ indication that LibreOffice itself is at fault. It might demonstrate the problem more clearly because:
a) it is such a big project;
b) such crashers appear as random noise in other projects.

@penalvch: I am not very happy about the setting of the bug state to confirmed again without discussion. My change was quite intentional given the above evidence. At least it needs to be set back to "Incomplete" until something shows Libreoffice to be at fault in this IMHO. To prevent a bug-state-war I will ask bugcontrol to have a look at it and leave state as is for now.

As an additional note: all of the stacktraces are related to exception handling -- while not being a simple uncaught exception flying through. The exception in the spell-checking stacktrace with cancelCommandExecution() in it would have been caught, at the latest, at:

#12 0x00b42e2c in IsDocument () from /usr/lib/libreoffice/program/../basis-link/program/libsvlli.so
http://cgit.freedesktop.org/libreoffice/libs-gui/tree/svl/source/misc/fstathelper.cxx?h=libreoffice-3-4#n74

Ian! D. Allen (idallen) wrote :

My testing is showing that the ecryptfs bug is still there in a
fully-updated Ubuntu 11.04 on x86_64 with this kernel:

Linux linux 2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12 21:17:25 UTC 2011
x86_64 x86_64 x86_64 GNU/Linux

Running md5sum repeatedly over the same 17GB ecryptfs directory gives
different md5sums on some of the files on some of the runs.

Another fact: I am using an OCZ SSD for this file system.


Ian,
Out of interest, are the checksums that are different on zero byte files? I saw that hinted elsewhere.

Phil


I've just had a crash and thought it might be this one, or might help with this one. It is reported as bug #857212.

The corrupted instruction pointer (ip) in many of the stack traces and the dependence on swapping make it likely that this is caused by the encrypted swap (which is enabled by default with encrypted home since 9.10). If that is broken, it corrupts the memory image of LibreOffice and there is absolutely nothing LO can or should do about it.

Could those who see the issue (I still can't reproduce it) retry with an encrypted home but unencrypted swap?

OK, I was able to reproduce this here now. LO does _not_ need to be idle to reproduce this. What I did was:
- Create a VM with little RAM
- enable encrypted home (and thus also encrypted swap)
- reboot at least once (I might have missed that last time; it might be needed to get encrypted swap too)
- Open an empty Calc document
- minimize Calc
- Open Firefox
- open 10-20 tabs with huge images
- while it is swapping like mad, return to LO => pretty reliable crash


On Fri, Sep 23, 2011 at 08:35:55AM -0000, Phil Ayres wrote:
> Out of interest, are the checksums that are different on zero byte
> files? I saw that hinted elsewhere.

No. No zero-size files. All are files with sizes from 125 to 133478
bytes (24-280 blocks). One unusual common factor is that all the files
whose checksums change have between 2 and 186 hard links. No single-link
files.

Below is the raw data collected from my tests today. I run an enhanced
md5sum on the directory and save the results. (Enhanced means the md5sum
is saved along with a Perl "stat" of the file - see "man perlfunc"
for the order of the 13 fields from "stat".) I then loop, running
the enhanced md5sum and comparing the new results with the saved.
When something differs, I output info about the saved and the new.
So what you see below is pairs of lines where the inode numbers (third
field) are the same between runs but the md5sums differ.

Not every run produces differences - it's about one run in four. A run
with differences usually shows only one or two differences, but one run
today turned up eight.

Most of the differences in md5sums are unique (only happened to that
inode once during my half-day of testing), but six inodes appear twice
and one appears three times (inode 5260333, the one with 186 hard links).

1f9656d3d4e6379cd02c81d6b5125a41 25 5125149 33188 2 777 777 0 875 1316720060 1001274528 1286771949 4096 24
4c19712378cc9074f5aa01c16a53dd8f 25 5125149 33188 2 777 777 0 875 1316720060 1001274528 1286771949 4096 24
464025ab8ecca844ef14927d2ebbfe7f 25 5124704 33188 2 777 777 0 147 1316720060 938293572 1286771949 4096 24
7806d1a26b5ce00b3079d97c29ff2bd6 25 5124704 33188 2 777 777 0 147 1316720060 938293572 1286771949 4096 24
19d7c14415e33af9cbd0e70fbb002e77 25 6685993 33056 2 777 777 0 10240 1316720036 1200422921 1286771775 4096 40
83a70208705f5934366b108aff9d3116 25 6685993 33056 2 777 777 0 10240 1316720036 1200422921 1286771775 4096 40
7c6351e26037a45d2279be4ffb4b6563 25 6686006 33056 2 777 777 0 10240 1316720036 1204740421 1286771775 4096 40
b7f958f96a2ec4a4835ed29fd2bbf70e 25 6686006 33056 2 777 777 0 10240 1316720036 1204740421 1286771775 4096 40
20df6ab0a39207fe7049ede1ea685f75 25 6686420 33056 2 777 777 0 10240 1316720036 1201029149 1286771775 4096 40
d222ff46ee1f1e489391e53c89f36909 25 6686420 33056 2 777 777 0 10240 1316720036 1201029149 1286771775 4096 40
22bfb8c1dd94b5f3813a2b25da67463f 25 5260333 33188 186 777 777 0 220 1316719636 1268985590 1306955878 4096 24
7faace8dee9768922d3dacc09723d1ae 25 5260333 33188 186 777 777 0 220 1316719636 1268985590 1306955878 4096 24
22bfb8c1dd94b5f3813a2b25da67463f 25 5260333 33188 186 777 777 0 220 1316719636 1268985590 1306955878 4096 24
7faace8dee9768922d3dacc09723d1ae 25 5260333 33188 186 777 777 0 220 1316719636 1268985590 1306955878 4096 24
260c0685b36144d0cad219a98a03079d 25 5124701 33188 2 777 777 0 125 1316720060 938295176 1286771948 4096 24
8e342f7915ece5de1b6a1924a2a00470 25 5124701 33188 2 777 777 0 125 1316720060 938295176 1286771948 4096 24
4ca1bba41e08525aa560ed866e51c24c 25 1705464 33248 2 777 777 0 12478 1316719610 1037991716 1286771882 4096 48
602838f84c4304c81e73066f47cd1b5d 25 1705464 33...
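The comparison described above (hash every file, record its `stat` fields, re-run, and report inodes whose checksum changed) can be sketched in Python. This is a hypothetical re-implementation, not the Perl script that produced the data: the helper names `snapshot` and `changed_inodes` are made up, results are keyed by inode so a file with many hard links is hashed only once, and an inode is flagged only when its size and modification time are unchanged but its MD5 differs, i.e. the content changed without a normal write.

```python
import hashlib
import os
import stat


def md5_of(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root):
    """Map inode -> (md5, size, mtime_ns, nlink) for regular files under root.

    Assumes root is on a single filesystem (inode numbers are per-filesystem).
    Hard links share an inode, so each inode is hashed only once.
    """
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode) or st.st_ino in result:
                continue
            result[st.st_ino] = (md5_of(path), st.st_size, st.st_mtime_ns, st.st_nlink)
    return result


def changed_inodes(saved, current):
    """Inodes whose MD5 changed while size and mtime stayed the same."""
    return sorted(
        ino
        for ino, (digest, size, mtime_ns, _nlink) in current.items()
        if ino in saved
        and saved[ino][0] != digest
        and saved[ino][1:3] == (size, mtime_ns)
    )
```

Usage would be `saved = snapshot("/path/to/test/dir")` once, then repeatedly `changed_inodes(saved, snapshot("/path/to/test/dir"))` in a loop.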

Nice work! Did you try with unencrypted swap? It could be totally unrelated to the ecryptfs.

Trying with unencrypted swap (but encrypted home), the issue vanishes:
- sudo bash
- swapoff -a
- cryptsetup remove /dev/mapper/cryptswap1
- fdisk -l /dev/sda <- find your swappartition there
- mkswap /dev/sda$SWAPPARTITION
- swapon -a

So encrypted swap corrupts the stack/heap of the application.

Swapping is not a userspace concern, so LibreOffice is _not_ at fault here. Closing as invalid in LibreOffice again. This does NOT mean that LibreOffice is not affected by this; it means it can't be fixed there because it is not its fault.

summary: - soffice.bin crashed with SIGSEGV in cppu::throwException()
+ ecryptfs encrypted swap corrupts application stack/heap [was:
+ soffice.bin SIGSEGV cppu::throwException()]
Changed in libreoffice (Ubuntu):
status: Confirmed → Invalid

@Ian: As the bug you are seeing is about encrypted /home (or fs in general), could you file a different bug on this, so we can keep this bug for encrypted swap breaking applications?

Changed in ecryptfs-utils (Ubuntu):
importance: High → Critical

Eh, in the instructions in https://bugs.launchpad.net/ubuntu/+source/libreoffice/+bug/745836/comments/60 I missed:
- edit fstab
between mkswap and swapon -a.

Andy Whitcroft (apw) wrote :

If this is indeed triggered by encrypting swap, then this would not be eCryptfs but dm-crypt (even though it is set up by selecting an eCryptfs home) and therefore a kernel issue.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 745836

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
summary: - ecryptfs encrypted swap corrupts application stack/heap [was:
- soffice.bin SIGSEGV cppu::throwException()]
+ encrypted swap corrupts application stack/heap [was: soffice.bin SIGSEGV
+ cppu::throwException()]
Dustin Kirkland  (kirkland) wrote :

Ian,

Can you give the output of:

 $ cat /proc/swaps
 $ free
 $ ls -alF /dev/mapper/cryptswap*
 $ grep swap /etc/fstab
 $ grep swap /etc/crypttab

Thanks.
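To make the answers to these questions easier to interpret, here is a small Python sketch that parses the text of `/proc/swaps` and `/etc/crypttab` and reports which active swap devices are dm-crypt mappings listed in crypttab. The function names are hypothetical and the check is only a heuristic: it assumes the encrypted swap appears in `/proc/swaps` under its `/dev/mapper/<name>` path (e.g. `/dev/mapper/cryptswap1`); a kernel that lists it as `/dev/dm-N` would not be matched.

```python
def parse_proc_swaps(text):
    """Parse /proc/swaps text into a list of dicts (sizes are in KiB)."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) != 5:
            continue
        filename, kind, size, used, priority = fields
        entries.append({
            "filename": filename,
            "type": kind,
            "size_kib": int(size),
            "used_kib": int(used),
            "priority": int(priority),
        })
    return entries


def crypttab_names(text):
    """Return the mapping names (first column) declared in /etc/crypttab text."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line.split()[0])
    return names


def encrypted_swaps(swaps_text, crypttab_text):
    """Active swap devices that are /dev/mapper/<name> with <name> in crypttab."""
    prefix = "/dev/mapper/"
    names = crypttab_names(crypttab_text)
    return [
        e["filename"]
        for e in parse_proc_swaps(swaps_text)
        if e["filename"].startswith(prefix) and e["filename"][len(prefix):] in names
    ]
```

On a live system this would be called as `encrypted_swaps(open("/proc/swaps").read(), open("/etc/crypttab").read())`; for the `/dev/sda2` output posted below it returns an empty list.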

Changed in ecryptfs-utils (Ubuntu Oneiric):
milestone: none → ubuntu-11.10
Tyler Hicks (tyhicks) on 2011-09-23
Changed in ecryptfs-utils (Ubuntu Oneiric):
assignee: nobody → Tyler Hicks (tyhicks)
Tyler Hicks (tyhicks) wrote :

I can reproduce the general protection fault mentioned in comment #44 pretty easily.

I ftraced the eCryptfs code while triggering the crash and don't see much going on other than a lookup(), the eCryptfs inode initialization functions being called, and a call to ecryptfs_readpage(). This reminded me of an upstream race condition that I fixed not too long ago. I'm backporting those patches to the natty kernel to give it a shot.

The patches are already in the oneiric kernel so if someone has successfully reproduced this on oneiric, please speak up.

Can't reproduce so far on oneiric beta2 with encrypted swap and home.

Ian! D. Allen (idallen) wrote :

@Björn - yes, after this final comment I'll move to a different ecryptfs bug report (or file a new one if it doesn't exist yet).

@Dustin - here are your answers. I'm only using an ecryptfs test directory (reading only; no writing) that used to be my HOME directory before I abandoned using ecryptfs. I'm not using encrypted swap.

$ uname -a
Linux linux 2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12 21:17:25 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/swaps
Filename Type Size Used Priority
/dev/sda2 partition 8499196 594192 -1

$ free
             total used free shared buffers cached
Mem: 8194980 7502036 692944 0 358220 5106780
-/+ buffers/cache: 2037036 6157944
Swap: 8499196 594188 7905008

$ ls -alF /dev/mapper/cryptswap*
[not using cryptswap]

$ grep swap /etc/fstab
UUID=02d21245-167c-48ab-811a-77b0ed45ca0e none swap sw 0 0

$ grep swap /etc/crypttab
[not using cryptswap]


I don't have the encrypted swap, only the encrypted home directory (the default Ubuntu
installer sets it up). Both OO and LO crash unless I override the HOME environment
variable to point to an unencrypted directory.
--
Regards, Sergei Serdyuk

On Fri, Sep 23, 2011 at 8:30 AM, Andy Whitcroft <email address hidden> wrote:

> If this is indeed triggered by encrypting swap then this would not be
> ecryptsfs by dmcrypt (even though setup by selecting ecryptfs home) and
> therefore a kernel issue.
>
> ** Also affects: linux (Ubuntu)
> Importance: Undecided
> Status: New
>
> --
> You received this bug notification because you are subscribed to a
> duplicate bug report (579966).
> https://bugs.launchpad.net/bugs/745836
>
> Title:
> ecryptfs encrypted swap corrupts application stack/heap [was:
> soffice.bin SIGSEGV cppu::throwException()]
>
> Status in LibreOffice Productivity Suite:
> Invalid
> Status in “ecryptfs-utils” package in Ubuntu:
> Confirmed
> Status in “libreoffice” package in Ubuntu:
> Invalid
> Status in “linux” package in Ubuntu:
> Incomplete
> Status in “openoffice.org” package in Ubuntu:
> Won't Fix

Not really a LibreOffice bug, but an issue with encrypted home/swap.

*** Bug 35424 has been marked as a duplicate of this bug. ***

Changed in df-libreoffice:
importance: Undecided → Unknown
status: Invalid → Unknown

*** Bug 33025 has been marked as a duplicate of this bug. ***

*** Bug 40766 has been marked as a duplicate of this bug. ***

Changed in df-libreoffice:
importance: Unknown → Critical
status: Unknown → Won't Fix
Luca Clivio (lucr) wrote :

Hello all, here is my "solution".

I have been struggling with the same problem for several months. My system is a standard Ubuntu 11.04 with the default encrypted home and thus an encrypted swap:
luca@idle:~$ uname -a
Linux idle 2.6.38-11-generic #50-Ubuntu SMP Mon Sep 12 21:18:14 UTC 2011 i686 i686 i386 GNU/Linux

I have just 1 GB of RAM, so constant swapping is normal for me with several applications open.
LibreOffice used to crash constantly after a few minutes of idle time, or even after a few minutes in which I wasn't using it, with a document, presentation, database, or spreadsheet open. Every time I left LibreOffice open for a few minutes without any activity, it crashed, 100% of the time.

Now, following the suggestion of Andrei but in a simpler way, I tried moving just LibreOffice's personal configuration folder outside the encrypted area, like this:

(I am in my home folder)
luca@idle:~$ sudo mkdir /home/unencryted
luca@idle:~$ sudo chown luca.luca /home/unencrypted
luca@idle:~$ mv .libreoffice /home/unencrypted/
luca@idle:~$ ln -s /home/unencrypted/.libreoffice .

I don't need to set HOME to that unencrypted folder before starting LibreOffice; the commands above are everything I did.
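The `mv` + `ln -s` steps above can also be expressed as a small Python helper, shown here only as a sketch: `relocate_with_symlink` is a hypothetical name, and it assumes LibreOffice is not running and that the destination directory (here `/home/unencrypted`) already exists and is writable by the user. Note that the moved profile is then stored unencrypted on disk.

```python
import os
import shutil


def relocate_with_symlink(src, dest_parent):
    """Move directory `src` into `dest_parent` and leave a symlink at `src`.

    Same idea as:  mv ~/.libreoffice /home/unencrypted/ && ln -s ... .
    """
    src = os.path.abspath(src)
    # Use an absolute target so the symlink does not depend on its own location.
    target = os.path.join(os.path.abspath(dest_parent), os.path.basename(src))
    if os.path.islink(src):
        raise ValueError(f"{src} is already a symlink")
    if os.path.exists(target):
        raise FileExistsError(target)
    shutil.move(src, target)  # copies then deletes if crossing filesystems
    os.symlink(target, src)   # the old path now points at the new location
    return target
```

For example: `relocate_with_symlink(os.path.expanduser("~/.libreoffice"), "/home/unencrypted")`.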

Now the problem seems to be gone for good; it doesn't seem to crash anymore.

I hope this helps someone.
luca

Luca Clivio (lucr) wrote :

Sorry, Sergei, not Andrei.
Also, sorry: the first command is missing a character; the correct one is:

luca@idle:~$ sudo mkdir /home/unencrypted

cheers
luca

...
> now, following the suggestion of andrei but in a simpler way i tried to
> move outside the encrypted area just the personal configuration settings
> folder of libreoffice ...

I just did this, and at least in a brief trial it seems to have cured a large
number of issues I was having with LO. In addition to this crash bug, I've
been seeing severe performance issues with LO on Natty, and on large documents
my GPU has gotten wedged (I haven't filed bugs on these because I couldn't
figure out how to file them in a useful way). In a brief test with this
directory moved out of the encrypted home directory, things are much more
responsive.

Scott Kitterman (kitterman) wrote :

I've just upgraded my laptop to oneiric and this, on first inspection, seems to
be much better if not fixed.

Rolf Leggewie (r0lf) wrote :

To me it sounds a lot as if the kernel patches that Tyler mentioned earlier to address race conditions are the fix for this issue. They've apparently already landed in oneiric. Can we look into identifying and backporting them, please?

Changed in linux (Ubuntu Oneiric):
status: Incomplete → Fix Released
Rolf Leggewie (r0lf) wrote :

Scott, can you please nominate for natty, maverick and lucid against the kernel? Seems like I'm not allowed to.

Scott Kitterman (kitterman) wrote :

This was originally reported on natty. It's still there. It went away when I upgraded to oneiric.

Changed in libreoffice (Ubuntu Maverick):
status: New → Invalid
Changed in libreoffice (Ubuntu Natty):
status: New → Invalid
Changed in linux (Ubuntu Natty):
status: New → Confirmed
importance: Undecided → High
Changed in openoffice.org (Ubuntu Natty):
status: New → Won't Fix
Changed in openoffice.org (Ubuntu Maverick):
status: New → Won't Fix
Rolf Leggewie (r0lf) wrote :

Thanks, please don't forget lucid.

Yaron Sheffer (yaronf) wrote :

I implemented the workaround in #71 and things were better at first (thanks Luca!). But today, after a suspend/resume, Writer went Boom as soon as I touched it. (I was using a feature - Track Changes - for the first time, which may have something to do with it). So as far as I'm concerned this is not a complete solution.

bug 825161 and bug 807759 are also dupes of this.

tags: added: rls-mgr-o-tracking
Dustin Kirkland  (kirkland) wrote :

Marking the ecryptfs-utils tasks invalid, as this turned out to be a kernel issue

Changed in ecryptfs-utils (Ubuntu Oneiric):
status: Confirmed → Invalid
Changed in ecryptfs-utils (Ubuntu Natty):
status: New → Invalid
Changed in ecryptfs-utils (Ubuntu Maverick):
status: New → Invalid
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu Maverick):
status: New → Confirmed
Rolf Leggewie (r0lf) wrote :

OK, according to the feedback here this does seem to be fixed in oneiric, great! But let's not get back to business quite yet. This is a very visible and highly annoying bug that affects a core package in three releases, including the latest LTS. We need to identify the patch that fixed this and backport it.

Tyler? Dustin? Andy?

Changed in linux (Ubuntu Maverick):
importance: Undecided → High
milestone: none → maverick-updates
Changed in linux (Ubuntu Natty):
milestone: none → natty-updates
Rolf Leggewie (r0lf) wrote :

I encourage all people from the duplicates to visit https://bugs.launchpad.net/ubuntu/natty/+source/linux/+bug/745836/+affectsmetoo and indicate that they are affected as well. This should hopefully increase priority with the devs, too.

Tyler Hicks (tyhicks) wrote :

On 2011-09-30 00:23:31, Rolf Leggewie wrote:
> This is a very visible and highly annoying bug that affects a core
> package in three releases including the latest LTS. We need to identify
> the patch that fixed this and backport it.

I agree.

I've got a feeling that the fix for this is upstream commit
3b06b3ebf44170c90c893c6c80916db6e922b9f2 but when I wrote that patch it
depended on several other patches that aren't in the natty (or older)
kernels.

I started working on a backport, but it was getting too large. I've got
to rethink the approach and come up with a simpler fix that is suitable
for a backport.

Tyler Hicks (tyhicks) wrote :

This turned out to be a tricky one. It is definitely an eCryptfs bug. The upstream fix that I thought would solve this issue ended up not being the right fix. Instead, it turned out to be the following two commits:

bd4f0fe8bb7c73c738e1e11bc90d6e2cf9c6e20e
fed8859b3ab94274c986cbdf7d27130e0545f02c

However, I didn't write those patches as bug fixes. I was simply cleaning out some crufty looking code. It turned out to be buggy code, too.

Creating a file, extending that file, the file's pages being reclaimed, finally followed by reading the file is what triggers this. In the case of this bug report, the system being under memory pressure is what forced the file's pages out of the page cache.

The easiest way to reproduce the bug is with the following shell commands:

$ touch foo && truncate -s 4096 foo && sync && echo 1 | sudo tee /proc/sys/vm/drop_caches && hexdump -C foo

hexdump should show a file filled with zeroes, but it doesn't.
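For reference, the same steps can be scripted in Python, which makes it easier to repeat the test in a loop. This is a sketch, not Tyler's test: the function name is hypothetical, it uses exclusive creation (`"xb"`) because the bug needs a newly created file, and the cache drop (which requires root) is optional so the function can also run as an ordinary user on a non-eCryptfs directory, where it should return `True`. Per the description above, an affected eCryptfs mount with `drop_caches=True` would be expected to return `False`.

```python
import os


def new_file_reads_back_as_zeroes(path, size=4096, drop_caches=False):
    """Python version of: touch foo && truncate -s 4096 foo && sync
    && echo 1 | sudo tee /proc/sys/vm/drop_caches && hexdump -C foo

    Returns True if the freshly extended file reads back as all zero bytes.
    """
    # Exclusive create: the bug needs a brand-new file, so refuse to reuse one.
    with open(path, "xb"):
        pass
    os.truncate(path, size)  # extend; the new range must read back as zeroes
    os.sync()
    if drop_caches:
        # Requires root. Writing "1" drops the page cache, so the next read
        # has to bring the pages up to date from the lower file again.
        with open("/proc/sys/vm/drop_caches", "w") as f:
            f.write("1\n")
    with open(path, "rb") as f:
        data = f.read()
    return data == bytes(size)
```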

Data corruption is a possibility if the file is written to before the eCryptfs directory is unmounted.

It looks like all kernels before 2.6.39 are affected, possibly all the way back to the beginning of eCryptfs being merged upstream. Patch, with all the technical eCryptfs details in the commit message, to follow...

Shahar Or (mightyiam) wrote :

Wow, Tyler.

Good job. Can't wait for the fix release.

zmago (zmago-fluks) wrote :

Thank you Tyler! So this topic is now going to be solved finally?

omarly666 (omarly666) wrote :

nice if it's "fixed" :)
I think I got this again yesterday when trying to import from my SD card into Shotwell.

drop me a line at 48045666@.... if you still want some output from my laptop

Tim Gardner (timg-tpi) wrote :
Changed in linux (Ubuntu Natty):
assignee: nobody → Tyler Hicks (tyhicks)
status: Confirmed → In Progress
Tim Gardner (timg-tpi) wrote :

Tyler - this patch fixes the corruption caused by the reproducer in #86 using a 2.6.38 kernel. However, I'm not able to reproduce this using a 2.6.35 kernel. Do you have any advice as to whether this patch is really applicable to kernels older than 2.6.38?

Herton R. Krzesinski (herton) wrote :

This bug is awaiting verification that the kernel for Natty in -proposed solves the problem (2.6.38-13.52). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-natty' to 'verification-done-natty'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-natty
Tim Gardner (timg-tpi) wrote :

I'm clearing the Natty verification tag as I was able to reproduce the corruption and verify that this patch fixes the symptoms caused by the reproducer.

tags: added: verification-done-natty
removed: verification-needed-natty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.38-13.52

---------------
linux (2.6.38-13.52) natty-proposed; urgency=low

  [Herton R. Krzesinski]

  * Release Tracking Bug
    - LP: #887379

  [ Konrad Rzeszutek Wilk ]

  * SAUCE: x86/paravirt: Partially revert "remove lazy mode in interrupts"
    - LP: #854050

  [ Ming Lei ]

  * SAUCE: [media] uvcvideo: Set alternate setting 0 on resume if the bus
    has been reset
    - LP: #816484

  [ Seth Forshee ]

  * SAUCE: acer-wmi: Add wireless quirk for Lenovo 3000 N200
    - LP: #857297

  [ Upstream Kernel Changes ]

  * Make TASKSTATS require root access, CVE-2011-2494
    - LP: #866021
    - CVE-2011-2494
  * proc: restrict access to /proc/PID/io, CVE-2011-2495
    - LP: #866025
    - CVE-2011-2495
  * proc: fix a race in do_io_accounting(), CVE-2011-2495
    - LP: #866025
    - CVE-2011-2495
  * staging: comedi: fix infoleak to userspace, CVE-2011-2909
    - LP: #869261
    - CVE-2011-2909
  * perf tools: do not look at ./config for configuration, CVE-2011-2905
    - LP: #869259
    - CVE-2011-2905
  * e1000e: workaround for packet drop on 82579 at 100Mbps
    - LP: #870127
  * eCryptfs: Remove unnecessary grow_file() function
    - LP: #745836
  * eCryptfs: Remove ECRYPTFS_NEW_FILE crypt stat flag
    - LP: #745836
  * block: blkdev_get() should access ->bd_disk only after success
    - LP: #857170
  * ipv6: restore correct ECN handling on TCP xmit
    - LP: #872179
  * nl80211: fix overflow in ssid_len - CVE-2011-2517
    - LP: #869245
    - CVE-2011-2517
  * ksm: fix NULL pointer dereference in scan_get_next_rmap_item() -
    CVE-2011-2183
    - LP: #869227
    - CVE-2011-2183
  * NLM: Don't hang forever on NLM unlock requests - CVE-2011-2491
    - LP: #869237
    - CVE-2011-2491
  * KVM: fix kvmclock regression due to missing clock update
    - LP: #795717
  * drm/i915: don't enable plane, pipe and PLL prematurely
    - LP: #812638
  * drm/i915: add pipe/plane enable/disable functions
    - LP: #812638
 -- Herton Ronaldo Krzesinski <email address hidden> Mon, 07 Nov 2011 22:11:51 -0200

Changed in linux (Ubuntu Natty):
status: In Progress → Fix Released
Colin Ian King (colin-king) wrote :

SRU justification for Lucid:

Impact:

The ECRYPTFS_NEW_FILE crypt_stat flag is set upon creation of a new
eCryptfs file. When the flag is set, eCryptfs reads directly from the
lower filesystem when bringing a page up to date. This means that no
offset translation (for the eCryptfs file metadata in the lower file)
and no decryption is performed. The flag is cleared just before the
first write is completed (at the beginning of ecryptfs_write_begin()).

It was discovered that if a new file was created and then extended with
truncate, the ECRYPTFS_NEW_FILE flag was not cleared. If pages
corresponding to this file are ever reclaimed, any subsequent reads
would result in userspace seeing eCryptfs file metadata and encrypted
file contents instead of the expected decrypted file contents.

Data corruption is possible if the file is written to before the
eCryptfs directory is unmounted. The data written will be copied into
pages which have been read directly from the lower file rather than
zeroed pages, as would be expected after extending the file with
truncate.
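The mechanism described above can be illustrated with a deliberately simplified Python model. Everything in it is an assumption for illustration: the header bytes, the XOR "cipher", and the single whole-file "page" stand in for eCryptfs's real metadata, encryption, and page cache, and the class name is hypothetical. The only behavior taken from the description is that a set `ECRYPTFS_NEW_FILE` flag makes a read skip the metadata offset and the decryption, and that extending the file with truncate did not clear the flag before the fix.

```python
HEADER = b"TOY-ECRYPTFS-METADATA"  # stand-in for the real metadata header
KEY = 0x5A                          # toy XOR "cipher", not eCryptfs crypto


def toy_crypt(data):
    """XOR every byte with KEY; the same call encrypts and decrypts."""
    return bytes(b ^ KEY for b in data)


class ToyEcryptfsFile:
    """Toy model of one eCryptfs file, its lower file and its page cache."""

    def __init__(self, clear_flag_on_truncate):
        self.clear_flag_on_truncate = clear_flag_on_truncate  # False = bug, True = fix
        self.new_file = True            # ECRYPTFS_NEW_FILE is set at creation
        self.lower = bytearray(HEADER)  # lower file: metadata, then ciphertext
        self.size = 0
        self.cache = None               # cached plaintext; None = reclaimed

    def truncate(self, new_size):
        """Extend the file; shrinking is not modeled."""
        assert new_size >= self.size
        self.lower += toy_crypt(bytes(new_size - self.size))  # encrypted zeroes
        self.size = new_size
        self.cache = bytes(new_size)    # zeroed pages are up to date in memory
        if self.clear_flag_on_truncate:
            self.new_file = False       # the fix: clear the flag on truncate

    def reclaim_pages(self):
        """Simulate memory pressure evicting the file's pages."""
        self.cache = None

    def read(self):
        if self.cache is None:
            if self.new_file:
                # Buggy path: raw lower bytes, no header offset, no decryption.
                self.cache = bytes(self.lower[: self.size])
            else:
                start = len(HEADER)
                self.cache = toy_crypt(self.lower[start : start + self.size])
        return self.cache
```

Before `reclaim_pages()` both variants read back zeroes from the cache; only after eviction does the buggy variant return metadata and ciphertext, which is consistent with memory pressure being needed to trigger the crashes.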

Fix: Clear the ECRYPTFS_NEW_FILE flag if it is set. The fix was originally from
Tyler Hicks and needed a little massaging to apply to the current Lucid kernel;
see https://launchpadlibrarian.net/82254993/0001-eCryptfs-Clear-ECRYPTFS_NEW_FILE-flag-during-truncat.patch

Testcase:

touch foo && truncate -s 4096 foo && sync && echo 1 | sudo tee /proc/sys/vm/drop_caches && hexdump -C foo

hexdump should show a file filled with zeroes. Without the fix the file
is full of garbage, whereas with the fix it is full of zeroes, as
expected.

Andy Whitcroft (apw) on 2012-03-16
Changed in ecryptfs-utils (Ubuntu Lucid):
status: New → Invalid
Changed in libreoffice (Ubuntu Lucid):
status: New → Invalid
Changed in linux (Ubuntu Lucid):
status: New → Confirmed
assignee: nobody → Colin King (colin-king)
importance: Undecided → High
Rolf Leggewie (r0lf) on 2012-03-16
Changed in openoffice.org (Ubuntu Lucid):
status: New → Won't Fix
Andy Whitcroft (apw) on 2012-03-16
Changed in openoffice.org (Ubuntu Lucid):
status: Won't Fix → Invalid
no longer affects: libreoffice (Ubuntu)
Colin Ian King (colin-king) wrote :

+ SRU for Maverick too.

Tim Gardner (timg-tpi) on 2012-03-16
Changed in linux (Ubuntu Maverick):
status: Confirmed → Fix Committed
Colin Ian King (colin-king) wrote :

Tested and verified for 2.6.35-32.68 -proposed with ext2, ext3, ext4, xfs and btrfs lower file systems

tags: added: verification-done-maverick
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel for Lucid in -proposed solves the problem (2.6.32-41.88). Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-lucid' to 'verification-done-lucid'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-lucid
Colin Ian King (colin-king) wrote :

verified on lucid 2.6.32-41.88 -proposed with ext2, ext3, ext4, xfs lower file systems

tags: added: verification-done-lucid
removed: verification-needed-lucid
Rolf Leggewie (r0lf) wrote :

I can also verify that lucid 2.6.32-41.88 from lucid-proposed fixes the issue. Please release. Thank you.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 2.6.32-41.88

---------------
linux (2.6.32-41.88) lucid-proposed; urgency=low

  [Luis Henriques]

  * Release Tracking Bug
    - LP: #966443

  [ Andy Whitcroft ]

  * [Config] restore build-% shortcut

  [ Tim Gardner ]

  * SAUCE: ubuntu drivers: use UMH_WAIT_PROC consistently
    - LP: #963685

  [ Upstream Kernel Changes ]

  * Revert "Revert "USB: xhci - fix unsafe macro definitions""
    - LP: #948139
  * Revert "Revert "USB: xhci - fix math in xhci_get_endpoint_interval()""
    - LP: #948139
  * Revert "Revert "xhci: Fix full speed bInterval encoding.""
    - LP: #948139
  * bsg: fix sysfs link remove warning
    - LP: #946928
  * hwmon: (f75375s) Fix bit shifting in f75375_write16
    - LP: #948139
  * lib: proportion: lower PROP_MAX_SHIFT to 32 on 64-bit kernel
    - LP: #948139
  * relay: prevent integer overflow in relay_open()
    - LP: #948139
  * mac80211: timeout a single frame in the rx reorder buffer
    - LP: #948139
  * kernel.h: fix wrong usage of __ratelimit()
    - LP: #948139
  * printk_ratelimited(): fix uninitialized spinlock
    - LP: #948139
  * hwmon: (f75375s) Fix automatic pwm mode setting for F75373 & F75375
    - LP: #948139
  * crypto: sha512 - Use binary and instead of modulus
    - LP: #948139
  * crypto: sha512 - Avoid stack bloat on i386
    - LP: #948139
  * crypto: sha512 - use standard ror64()
    - LP: #948139
  * SCSI: 3w-9xxx fix bug in sgl loading
    - LP: #948139
  * ARM: 7321/1: cache-v7: Disable preemption when reading CCSIDR
    - LP: #948139
  * ARM: 7325/1: fix v7 boot with lockdep enabled
    - LP: #948139
  * USB: Added Kamstrup VID/PIDs to cp210x serial driver.
    - LP: #948139
  * USB: Fix handoff when BIOS disables host PCI device.
    - LP: #948139
  * xhci: Fix encoding for HS bulk/control NAK rate.
    - LP: #948139
  * hdpvr: fix race conditon during start of streaming
    - LP: #948139
  * cdrom: use copy_to_user() without the underscores
    - LP: #948139
  * autofs: work around unhappy compat problem on x86-64
    - LP: #948139
  * Fix autofs compile without CONFIG_COMPAT
    - LP: #948139
  * compat: fix compile breakage on s390
    - LP: #948139
  * PM: Print a warning if firmware is requested when tasks are frozen
    - LP: #948139
  * firmware loader: allow builtin firmware load even if usermodehelper is
    disabled
    - LP: #948139
  * PM / Sleep: Fix freezer failures due to racy
    usermodehelper_is_disabled()
    - LP: #948139
  * PM / Sleep: Fix read_unlock_usermodehelper() call.
    - LP: #948139
  * Linux 2.6.32.58
    - LP: #948139
  * regset: Prevent null pointer reference on readonly regsets
    - LP: #949905
    - CVE-2012-1097
  * regset: Return -EFAULT, not -EIO, on host-side memory fault
    - LP: #949905
    - CVE-2012-1097
  * KVM: Remove ability to assign a device without iommu support
    - LP: #897812
    - CVE-2011-4347
  * eCryptfs: Copy up lower inode attrs after setting lower xattr
  * eCryptfs: Improve statfs reporting
    - LP: #885744
  * drm/i915: no lvds quirk for AOpen MP45
    - LP: #955078
  * drm/radeon/kms: fix MSI re-arm on rv370+
    - LP: #955078
  * Linux 2.6.32.58+drm33.24
    - LP: #955078
  ...


Changed in linux (Ubuntu Lucid):
status: Confirmed → Fix Released
JC Hulce (soaringsky) wrote :

This bug affects Ubuntu 10.10, Maverick Meerkat. Maverick has reached end-of-life and is no longer supported, so I am closing the bugtask for Maverick. Please upgrade to a newer version of Ubuntu.
More information here: https://lists.ubuntu.com/archives/ubuntu-announce/2012-April/000158.html

Changed in linux (Ubuntu Maverick):
status: Fix Committed → Invalid