stale lock files freeze apps

Bug #1576989 reported by Michael Zanetti on 2016-04-30
112
This bug affects 22 people
Affects Status Importance Assigned to Milestone
Canonical System Image
High
Zoltan Balogh
qtbase-opensource-src (Ubuntu)
Undecided
Michael Zanetti
qtbase-opensource-src (Ubuntu RTM)
High
Timo Jyrinki

Bug Description

Debugging why the Notes app would freeze for some people, I found there were some stale .lock files around. Those .lock files are autocreated by QSettings (with QLockFile) when this file is edited. Problem is, that it sometimes seems to happen that the app is suspended by our lifecycle and then killed (either by closing from spread or OOM killed) and those .lock files stick around. I have implemented a workaround in the notes app[1] to clear up those stale locks when the app starts app, however, today I ran twice in a row into the issue that the very same happened to music app. It would just freeze on startup and never come back. Digging around a bit I found such a stale lock file on its config file which is created with the QML Settings element. This implies that every app using QSettings or QML Settings {} are potentially affected. I suspect many more app freezes can be traced down to this.

Now, there is some code in Qt to detect stale lock files [2], however, not sure why it doesn't detect those lock files as stale ones and decides to wait on them forever though. Probably a bug in the code or some of the wrong #ifdefs are triggered for our package builds.

[1] https://code.launchpad.net/~mzanetti/reminders-app/remove-stale-lock-files/+merge/293355
[2] http://code.qt.io/cgit/qt/qtbase.git/tree/src/corelib/io/qlockfile_unix.cpp

affects: canonical-devices-system-image → qtbase-opensource-src (Ubuntu)
description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in qtbase-opensource-src (Ubuntu):
status: New → Confirmed
Victor Thompson (vthompson) wrote :

Do we want affected apps to possibly implement your workaround as a stopgap until the real solution is implemented? If so, could you provide the original workaround?

Michael Zanetti (mzanetti) wrote :

I've linked the workaround in the description. At this point however, I would not yet say all apps should work around it themselves.

description: updated

Sorry for not answering to the bug https://bugs.launchpad.net/music-app/+bug/1556376 in time.

I can confirm that removing a stale

 /home/phablet/.config/com.ubuntu.music/com.ubuntu.music.conf.lock

solves the issues i had with the music app.

Thanks for the work and yay i have music again :-)

The stale file was there since september 2015 so i hab around six month no music while commuting:

phablet@ubuntu-phablet:~/.config/com.ubuntu.music$ ls -lart
total 12
-rw-rw-r-- 1 phablet phablet 102 Sep 4 2015 com.ubuntu.music.conf
-rw-r--r-- 1 phablet phablet 0 Sep 4 2015 com.ubuntu.music.conf.lock
drwxrwxr-x 2 phablet phablet 4096 Sep 4 2015 .
drwx------ 47 phablet phablet 4096 Apr 7 08:47 ..
phablet@ubuntu-phablet:~/.config/com.ubuntu.music$

Andrea Bernabei (faenil) wrote :

I had the same problem with the Telegram app, it wasn't starting, I just got blank screen after running the app.

Michael suggested looking for .lock files...
turns out I had .config/com.ubuntu.telegram/<number>/auth.lock

after removing that file Telegram app works again :)

Changed in canonical-devices-system-image:
assignee: nobody → Zoltan Balogh (bzoltan)
importance: Undecided → Critical
milestone: none → 11
status: New → Confirmed
Timo Jyrinki (timo-jyrinki) wrote :

Note that to OTA-7 we fixed one stale process lock files bug #1500444.

That was the 2015-05-26 commit from http://code.qt.io/cgit/qt/qtbase.git/log/src/corelib/io/qlockfile_unix.cpp

The only other patch that could sound worthwhile would be the syncing-to-disk commit http://code.qt.io/cgit/qt/qtbase.git/commit/src/corelib/io/qlockfile_unix.cpp?id=4accd865c24b7ed918cb705913478bab5aeb5e6e , and it does state "to avoid leaving an empty lock file on crash".. maybe on kill too? https://bugreports.qt.io/browse/QTBUG-44771

Timo Jyrinki (timo-jyrinki) wrote :

Ticket https://requests.ci-train.ubuntu.com/#/ticket/1368 created, a vivid build will be available later there.

Timo Jyrinki (timo-jyrinki) wrote :

Testing would be welcome at the silo, which has been in built state since Wednesday.

Changed in canonical-devices-system-image:
milestone: 11 → 12
Timo Jyrinki (timo-jyrinki) wrote :

QA managed to reproduce it still. However, from going through the git still, there is a untried combination of patches that we haven't yet backported so I will rebuild the silo.

Michael Zanetti (mzanetti) wrote :

tbh I was a bit sceptical about that previous patch anyways. If I read it correctly, it only tried to not leave stale lock files in some circumstances. Problem with our case is that it might happen while the app is suspended and then OOM killed which means there is no chance for the Qt code to clean it up and it will leave a stale lock file, no matter how clever it tries to be in cleaning them up.

What I think needs to happen is that when the app starts up again (or tries to obtain a lock) it needs to be clever enough to determine that this lock is a stale one from a previous and ignore it.

Timo Jyrinki (timo-jyrinki) wrote :

So this _might_ help: https://codereview.qt-project.org/#/c/110346/. The patch is in upstream 5.4 branch but not yet in our 5.4.1 based package. From bug comments "I was myself bitten by the bug. After a crash, plasma would not start anymore because of an empty kglobalshortcutsrc.lock. Hence extra motivation to fix the bug". I also added https://codereview.qt-project.org/#/c/107600/ to the mix.

It's in the 5.4.1+dfsg-2ubuntu11~vivid10~3 build (armhf build not yet complete).

Michael Zanetti (mzanetti) wrote :

This indeed looks much more like what we need. Definitely worth a try, yes.

Michael Zanetti (mzanetti) wrote :

Here's a click package which I think allows reproducing the issue:

http://notyetthere.org/data/locked-crasher.mzanetti_0.1_armhf.click

*IMPORTANT NOTE*
I *think* this faked lock files are exactly like the stale ones we see, however, as I don't have one handy right now, I can't guarantee it's exactly the same. If I come by such a real lock file I'll compare them and make sure this app produces the exact same. For now I think we can assume if a patch fixes this app, it will fix the issue.

Changed in qtbase-opensource-src (Ubuntu RTM):
assignee: nobody → Timo Jyrinki (timo-jyrinki)
importance: Undecided → High
status: New → Confirmed
Changed in canonical-devices-system-image:
importance: Critical → High
milestone: 12 → 13
Pat McGowan (pat-mcgowan) wrote :

Using silo 009 and the test app, the app hangs when pressing edit config

Timo Jyrinki (timo-jyrinki) wrote :

I've updated the silo to sync with another upload. I get same behavior with and without the silo:

1. On first run, edit config does not hang.
2. When pressing Fake stale lock file, the subsequent edit config selections do hang.
3. When the app does not start (after fake stale lock file), the second attempt to start the app does work.
4. Removing .config/locked-crasher.mzanetti/ restores the behavior to 1.

I've pinged Michael yesterday but no comment yet. I wouldn't land the changes if there's no demonstrated improvement.

Timo Jyrinki (timo-jyrinki) wrote :

This won't be fixed for OTA-13 because of the uncertainty with the patches even though the silo is there. More likely is to have a final fix on the new LTS Qt basis.

Changed in canonical-devices-system-image:
milestone: 13 → xenial
Timo Jyrinki (timo-jyrinki) wrote :

Michael later confirmed my findings that there's no certainty in the patches fixing the issue or not causing new issues, partially because the problem is so hard to reproduce reliably in the real use case.

Pat McGowan (pat-mcgowan) wrote :

I just ran into this with Contacts. The lock file was created in April but removing it restored the app. Seems it got corrupted on a recent run. I was testing startup time and opening and closing the apps.

Do we have any line on a fix for this? or is it no longer present for 5.6?

Antti Kaijanmäki (kaijanmaki) wrote :

This also affect i-network (bug #1615474).

Looking at the upstream bug https://bugreports.qt.io/browse/QTBUG-44771 this has been fixed in Qt 5.5.

Antti Kaijanmäki (kaijanmaki) wrote :

Oh, and this should be "easily" reproduced by first creating a QSettings INI file, then filling up the hard drive and then trying to write to the settings.

Timo Jyrinki (timo-jyrinki) wrote :

All the fixes we had picked for https://bileto.ubuntu.com/#/ticket/1411 for backporting 5.4 are in 5.5 and 5.6, but we didn't and don't have a good test case to prove if they fix our problems or not.

For reference, the patches we tried to pick but decided not to land for potential risks were:
QLockFile-Avoid-zero-sized-lock-file-on-write-error.patch
QLockFile-fix-deadlock-when-the-lock-file-is-corrupt.patch
QLockFile-sync-to-disk-to-avoid-leaving-an-empty-loc.patch

They do sound like they should be fixing at least part of the problems, and there were one or two more still in the 5.5/5.6 branches to the related files.

Pat McGowan (pat-mcgowan) wrote :

That was already tested and did not fix the problem per comment #15

Timo Jyrinki (timo-jyrinki) wrote :

Right, that test case could be tested now on xenial+overlay, even though it doesn't necessarily reflect what happens in real life use case.

Changed in canonical-devices-system-image:
milestone: x1 → 14
Timo Jyrinki (timo-jyrinki) wrote :

On staging armhf (the click test app is armhf) it's still pretty much as described in comment #15. It is possible to get the "edit comnfig" to hang, and after the fake stale lock file one will need a second start attempt so that app really starts.

So this is with Qt 5.6.1 which includes all the patches tried before and some more. It probably fixes Antti's issue and some others, but not Michael's test app at least.

Maybe the test case could be submitted upstream stating it'd be nice to get a fix to 5.6 series? (giving partial assigning to Michael to note this) There are also no newer commits that seem meaningful to qlockfile.cpp or qlockfile_unix.cpp, so it should be reproducable on even dev.

Changed in qtbase-opensource-src (Ubuntu):
assignee: nobody → Michael Zanetti (mzanetti)
Changed in canonical-devices-system-image:
milestone: 14 → x1
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers