corrupted double-linked list probably caused by telegram scope

Bug #1472755 reported by errors.ubuntu.com bug bridge
This bug affects 3 people
Affects                            Status        Importance  Assigned to        Milestone
Canonical System Image             Fix Released  High        Alejandro J. Cura
libqtelegram (Ubuntu)              Incomplete    High        Unassigned
unity-scope-mediascanner (Ubuntu)  Invalid       High        Alejandro J. Cura
unity-scopes-api (Ubuntu)          Fix Released  High        Michi Henning
unity-scopes-api (Ubuntu RTM)      Fix Released  Undecided   Unassigned

Bug Description

The Ubuntu Error Tracker has been receiving reports about a problem regarding unity-scope-mediascanner. This problem was most recently seen with version 1.7.16; the problem page at https://errors.ubuntu.com/problem/21d9e7ddf91a26b21abfb2758315ad41fcfd3fa9 contains more details.

"/usr/lib/arm-linux-gnueabihf/unity-scopes/scoperunner:*** Error in `/usr/lib/arm-linux-gnueabihf/unity-scopes/scoperunner': corrupted double-linked list: ADDR ***"

Tags: vivid wily

Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in unity-scope-mediascanner (Ubuntu):
status: New → Confirmed
Changed in unity-scope-mediascanner (Ubuntu):
importance: Undecided → Medium
Revision history for this message
Pat McGowan (pat-mcgowan) wrote :

This is reported to happen at boot, and a symptom is that the Today scope is not initialized with the current date.
These reports coincide with the release of OTA-4.

Changed in unity-scope-mediascanner (Ubuntu):
assignee: nobody → Alejandro J. Cura (alecu)
importance: Medium → High
Changed in canonical-devices-system-image:
importance: Undecided → High
milestone: none → ww34-2015
status: New → Confirmed
Revision history for this message
Alejandro J. Cura (alecu) wrote :

Looking at the error tracker, all examples of the crash seem to come from the telegram scope. I'm reassigning there.

Changed in unity-scope-mediascanner (Ubuntu):
status: Confirmed → Invalid
Revision history for this message
Alejandro J. Cura (alecu) wrote :

dobey found that some of those crashes are coming from other scopes too (dashboard, video aggregator).
I'm moving the bug to scopes-api

Changed in unity-scopes-api (Ubuntu):
importance: Undecided → Critical
importance: Critical → High
status: New → Confirmed
Changed in canonical-devices-system-image:
assignee: nobody → Alejandro J. Cura (alecu)
Changed in unity-scopes-api (Ubuntu):
assignee: nobody → Michi Henning (michihenning)
Revision history for this message
Alejandro J. Cura (alecu) wrote :

Most of the errors are for the telegram scope, but here are a few that dobey found:

14:38:35 <dobey> https://errors.ubuntu.com/oops/d7fb3514-31ed-11e5-9eb1-fa163e525ba7 <- this one is dashboard
14:39:00 <dobey> https://errors.ubuntu.com/oops/36ab2c3c-3229-11e5-a54e-fa163e373683 is videoaggregator

Revision history for this message
Michi Henning (michihenning) wrote :

The stack trace indicates that the crash happens after main() completes, when global objects are destroyed via atexit().
The crash happens inside the destructor for boost::log::core. (Boost log uses a global singleton instance.) However, I suspect that this is coincidental. Most likely, memory has been corrupted.

Unfortunately, valgrind does not work for arm:

$ valgrind /usr/lib/arm-linux-gnueabihf/unity-scopes/scoperunner
...
disInstr(arm): unhandled instruction: 0xEC510F1E
                 cond=14(0xE) 27:20=197(0xC5) 4:4=1 3:0=14(0xE)
==27306== valgrind: Unrecognised instruction at address 0x57f4ec8.
==27306== at 0x57F4EC8: ??? (in /lib/arm-linux-gnueabihf/libcrypto.so.1.0.0)

I've run a bunch of scopes on the desktop using valgrind on the scoperunner (and therefore the scopes). It reports no errors or leaked memory.

I strongly suspect that a bug in the scope itself is responsible. If anyone can find a way to reproduce this problem, that would be immensely useful. Running the telegram scope under valgrind might yield some clues.

Changed in libqtelegram (Ubuntu):
importance: Undecided → High
Changed in libqtelegram (Ubuntu):
assignee: nobody → Michał Karnicki (karni)
Bill Filler (bfiller)
Changed in canonical-devices-system-image:
milestone: ww34-2015 → ww40-2015
Changed in unity-scopes-api (Ubuntu):
status: Confirmed → Triaged
status: Triaged → Confirmed
Revision history for this message
Michał Karnicki (karni) wrote :

I believe I may have found the source of the problem. When the scope is aggregated, the author assumed they can access results[0], which may not be true if:
1) the user is signed in
2) but has no messages whatsoever

That'd mean the scope would crash for all users who have set up a Telegram account but not really used it.
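
A minimal sketch of the guard this hypothesis calls for, assuming the scope collects its rows into a container before picking the first one. `Message` and `first_result` are hypothetical names, and `std::vector` stands in for whatever container the scope really uses; the actual query.cpp is not reproduced in this report.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a row read from the scope's database.
struct Message {
    std::string sender;
    std::string text;
};

// Return the first result for the aggregated view, or nothing when the
// signed-in user has no messages. Indexing results[0] on an empty
// container is undefined behaviour, so check before touching it.
std::optional<Message> first_result(const std::vector<Message>& results) {
    if (results.empty()) {
        return std::nullopt;
    }
    return results.front();
}
```

The caller then has to handle the "no messages" case explicitly instead of assuming that at least one row exists. `std::optional` needs C++17; returning a pointer or a bool plus an out-parameter would work the same way with older compilers.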

Michi, if you have time, perhaps you could verify my finding. In the meantime, I'm rewriting the scope source as the backing database schema has changed (after we moved to making use of the TelegramQML plugin).

Changed in libqtelegram (Ubuntu):
status: New → Triaged
Revision history for this message
Timo Jyrinki (timo-jyrinki) wrote :

I definitely did have https://errors.ubuntu.com/oops/0bb39a78-1c08-11e5-9c34-fa163e78b027 when I already had Telegram discussions.

Revision history for this message
Timo Jyrinki (timo-jyrinki) wrote :

Some more recent ones that may be related (but are incomplete as reports):
https://errors.ubuntu.com/oops/72595200-5186-11e5-a598-fa163e5bb1a2
https://errors.ubuntu.com/oops/73424db6-5186-11e5-8e22-fa163e22e467

And a more recent full report:
https://errors.ubuntu.com/oops/8cea3adc-5567-11e5-a560-fa163e707a72

If it's about "no messages", it could be related to the fact that the app shows "No messages" initially at times when it's launched, before it populates with all the content there is.

Changed in canonical-devices-system-image:
milestone: ww40-2015 → ww46-2015
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

And another one from today while switching between scopes:
https://errors.ubuntu.com/oops/6fc6ca80-78ce-11e5-b660-fa163e707a72

Revision history for this message
Michi Henning (michihenning) wrote :

As I suggested previously, this is memory corruption. The "corrupted double-linked list" message shows up whenever glibc detects memory corruption. The fact that this always shows up with the telegram scope is a strong suggestion that the problem is caused by the telegram scope. (We regularly run scopes-api tests with valgrind, as well as various demo scopes, and they always come up clean. I'm almost certain that the problem is not in scopes-api.)

One thing that might help is to run with env var MALLOC_CHECK_=2. This forces an abort as soon as the corruption is detected, rather than waiting until something falls over the corrupted memory region.
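
As a rough sketch, a launcher could export the variable for the suspect scope only, before it starts the scope process, because glibc picks up MALLOC_CHECK_ when a process starts. The function name and the scope identifier "telegram" are assumptions made for illustration; this is not the registry's actual code.

```cpp
#include <cassert>
#include <stdlib.h>   // setenv, getenv (setenv is POSIX, not ISO C++)
#include <string>

// Hypothetical helper: export MALLOC_CHECK_=2 for one scope only.
// It has to be called before the scope process is exec'ed; changing the
// variable in an already running process does not affect that process.
bool enable_malloc_check_for(const std::string& scope_id) {
    if (scope_id != "telegram") {   // assumed identifier, for illustration
        return false;               // leave all other scopes untouched
    }
    // The third argument (1) overwrites any existing value.
    return ::setenv("MALLOC_CHECK_", "2", 1) == 0;
}
```

A child process started afterwards inherits the variable, so only the scope launched right after a successful call runs with the stricter heap checks.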

Closing this for scopes-api.

Changed in unity-scopes-api (Ubuntu):
status: Confirmed → Invalid
Changed in canonical-devices-system-image:
assignee: Alejandro J. Cura (alecu) → Yuan-Chen Cheng (ycheng-twn)
description: updated
summary: - /usr/lib/arm-linux-gnueabihf/unity-scopes/scoperunner:*** Error in
- `/usr/lib/arm-linux-gnueabihf/unity-scopes/scoperunner': corrupted
- double-linked list: ADDR ***
+ corrupted double-linked list probably caused by telegram scope
Changed in canonical-devices-system-image:
status: Confirmed → Triaged
Changed in canonical-devices-system-image:
milestone: ww46-2015 → ww02-2016
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

errors.u.c is still reporting a significant number of crashes with telegram 2.0.4

Revision history for this message
Michał Karnicki (karni) wrote :

Since Jean-Baptiste pointed out that this affects Telegram 2.0.4.* as well, perhaps we could get some fellow developers to review the code, as we seem to be out of luck trying to address this, and there's not really much code in query.cpp:

http://bazaar.launchpad.net/~libqtelegram-team/telegram-app/telegram/files/head:/telegram/scope/

Either I'm doing something wrong, or the problem is not within Telegram. (Yes, I have read all the previous comments.)

Revision history for this message
Michi Henning (michihenning) wrote :

Did you run the code under valgrind or with MALLOC_CHECK_=2 ?

Is there any way for me to reproduce the issue? I'd like to help, but it's difficult without any more info.

Revision history for this message
Timo Jyrinki (timo-jyrinki) wrote :

Got this on OTA-8 / Telegram 2.

Revision history for this message
Michi Henning (michihenning) wrote :

Timo, any idea what it was you did to make the problem show up?

Revision history for this message
Timo Jyrinki (timo-jyrinki) wrote :

@Michi it happens when I have the Telegram setting enabled in Today scope, but otherwise I don't know. I didn't do anything Telegram related if I recall correctly, but Unity 8 just crashed and it was apparently caused by this scoperunner crash.

I used to disable the setting but enabled it again after OTA-8 + Telegram 2 upgrades to see if it still happens. It then didn't happen for 3 days or so but finally did.

Revision history for this message
Timo Jyrinki (timo-jyrinki) wrote :

Note that this particular crash was traceable to the https://errors.ubuntu.com/problem/21d9e7ddf91a26b21abfb2758315ad41fcfd3fa9 problem, but I had also already had one other, unrelated unity8 crash with OTA-8 that lacks the "Disassembly" field, so there's no 'problem' page for it. So there are definitely other shell crashers too.

Revision history for this message
Michał Karnicki (karni) wrote :

Hi guys, sorry for the late reply. I've been pulled onto a Tg-unrelated assignment and was neck-deep in documentation. First of all, thank you for your input, I appreciate this wasn't just left here.

Michi, I'm sorry, but I have not had a chance to run this with valgrind on the desktop. I'll have to refresh my memory on how to run this on the desktop; I recall it was fairly easy. How do I test a scope in aggregation? Should I run the Today scope with the Telegram scope installed on the system?

Regarding what Timo mentioned - I also read a while back disabling Telegram in Today scope "worked around" the problem.

Revision history for this message
Michi Henning (michihenning) wrote :

I've spent a lot of hours on this so far, without getting closer :-(

Up front, Timo, if there are shell crashes, that's almost certainly an unrelated problem. I don't see (even in theory) how a crashing scope could be responsible for that. For starters, the scope is a separate unrelated process. When a scope crashes in the middle of executing a query, the shell eventually gets a "query failed" message. But, in this case, the scope crashes *after* it has returned from main. The crash is caused by the C++ runtime clean-up, which uses an atexit handler to invoke global destructors. So, by the time the telegram scope dies, whatever queries (if any) have completed already.

As to the crash, I had a look at the code and can't spot anything off-hand that looks wrong. I also built the scope for the desktop and tinkered with it. A simple run under valgrind comes up clean. However, that really doesn't prove anything because, for the scope to do something interesting, it needs a DB with some data in it, by the looks of things. So, my test didn't exercise much of the code.

As far as I can see, there are no unit tests of any kind for the scope, so it's difficult to exercise the scope and see whether valgrind comes up with anything. Michael, I need some help getting the scope to run on the desktop with representative data so I can instrument it and see whether something comes up.

I have stared at the unity-scopes code for hours and I can't find anything that looks fishy. I'm positive that I'm finalising the boost::log machinery correctly.

For what it's worth, the crash happens when boost::log destroys the global singleton core instance. (That's a global in boost::log, not in unity-scopes; there are no globals in unity-scopes.) All of the crashes are in the exact same place. Basically, core is a class that stores a pimpl, and the destructor just calls "delete pimpl; pimpl = nullptr;" The call to delete causes the crash.
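
The pattern described here can be pictured roughly as follows. This is an illustrative stand-in, not Boost.Log's source; `Core`, `Impl` and the `live_impls` counter are hypothetical names. If the delete in the destructor runs on memory that has already been freed or overwritten, glibc's heap checks can abort the process at exactly that point.

```cpp
#include <cassert>

static int live_impls = 0;  // hypothetical counter, only to make lifetimes visible

struct Impl {
    Impl()  { ++live_impls; }
    ~Impl() { --live_impls; }
};

class Core {
public:
    Core() : pimpl_(new Impl) {}
    ~Core() {
        delete pimpl_;      // the call that fails if the heap is already damaged
        pimpl_ = nullptr;
    }
    Core(const Core&) = delete;             // a copy would delete the Impl twice
    Core& operator=(const Core&) = delete;
private:
    Impl* pimpl_;
};
```

In normal operation the Impl is created once and released once; the crash reports correspond to the case where that single delete trips over earlier damage or a second destruction.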

The corrupt linked list message is emitted by glibc whenever any sort of memory corruption is detected. This could be a double-free, or it could be because something has crapped over the heap earlier. It's not possible to tell with the information we have.

One thing that strikes me is that this crash happens only for the telegram scope, and we are not seeing it elsewhere. It's got to be something that is specific to the telegram scope. (This doesn't mean that the telegram scope is buggy; it could, for example, be caused by different ordering of global destructors due to the order in which things are linked.)

There is a similar known issue with boost log due to global destructor ordering: http://www.boost.org/doc/libs/1_55_0/libs/log/doc/html/log/rationale/why_crash_on_term.html

However, I don't think this is what's happening to us because, for that issue, the stack trace should end somewhere inside the locale conversion code.

One thought: I could put together a PPA with a hacked-up registry that sets MALLOC_CHECK_=2 just for the telegram scope when it is run. The hope is that this might give us a crash that's more informative. We could even release with this hack. After all, the scope is crashing as is; if we are lucky, it'll crash a l...

Revision history for this message
Michał Karnicki (karni) wrote :

You're a rockstar Michi, thank you for all this.

We don't have mock data for this purpose, but you could run Telegram off this branch (README.md will quickly tell you how to, basically ./setup.sh -t desktop -d && ./setup.sh -t desktop -b will download deps and build it for you). The branch is a quite fresh contribution, but should at least allow you to sign in to Telegram, which would populate the database with your current Tg data. In turn, the scope should surface this data, and hopefully be more representative of a more natural use case (as opposed to being empty).

If you decide to create this ppa, I could promote it in the Telegram feedback group, I'm sure we would have a bunch of helpful folks keen to install this to help pinpoint the problem.

Revision history for this message
Michi Henning (michihenning) wrote :

Hi Michael, which branch were you referring to?

The more I look, the more I'm sure that the problem isn't in the telegram scope or scopes API, but probably due to the use of the checked singleton in boost::log. I'm fresh out of ideas as to how to fix it at the moment :-(

Revision history for this message
Michi Henning (michihenning) wrote :

BTW, I haven't read enough of the code to know... Is there any chance that one of the libraries that the telegram scope calls into uses boost::log?

Revision history for this message
Michał Karnicki (karni) wrote :

Replied on IRC, for record purposes:

Sorry I only now noticed your comments, I forgot to leave the link :/
lp:~libqtelegram-team/telegram-app/telegram2-all-platforms

I will verify the external library for use of boost.

Revision history for this message
Michał Karnicki (karni) wrote :

Talked on IRC, but for the record:

While the app uses two dependencies, these don't use boost - and the scope uses none of those altogether. The scope's pure Qt talking to an sqlite db and surfacing results to scopes-api, not much going on in there, so at least on that end it's not related to boost::log.

Revision history for this message
Michi Henning (michihenning) wrote :

Looking through the unresolved externals in libsqlite.so, it doesn't appear to be using boost::log either, so that's not it.

Revision history for this message
Michi Henning (michihenning) wrote :

OK, I think I finally found it. There are some dummy loggers in the run time that I added because some of the unit tests run without a fully initialized run time (which provides access to the logger), but still need a working logger. For these tests, a global dummy instance is used. It's the clean-up of the dummy instance that causes the problem because, depending on global destructor ordering, the instance may have been destroyed at the time the atexit handler tries to finalize it.

Not sure what the best fix is yet. Working on it...

Changed in unity-scopes-api (Ubuntu):
status: Invalid → In Progress
Changed in libqtelegram (Ubuntu):
status: Triaged → Invalid
Changed in canonical-devices-system-image:
assignee: Yuan-Chen Cheng (ycheng-twn) → Michi Henning (michihenning)
Revision history for this message
Michał Karnicki (karni) wrote :

Wow, great catch Michi!!

Revision history for this message
Michi Henning (michihenning) wrote :

Michał, I've built a branch that I believe will fix things in silo 26.

Have you at all been able to make the crash happen on your phone? (As far as I know, it only happens on Arm, but I haven't been able to reproduce.) If you can reproduce the problem, could you please try with the PPA in silo 26 and let me know how it went?

Revision history for this message
Michał Karnicki (karni) wrote :

Hi Michi. I myself have seen it rarely, but I've asked Timo and the folks in the Telegram Feedback group to give this a try, as Timo seemed to be able to reproduce this quite easily. I'll install it too, and pay attention if I notice anything.

Revision history for this message
Michał Karnicki (karni) wrote :

My /var/crash is still empty, although I can't say I've done much testing beyond browsing the scopes and refreshing them now and then. (The bug was hard for me to reproduce earlier in general.) I'll repeat my request for other folks in the Telegram feedback group to give this a try if they can.

Revision history for this message
Michi Henning (michihenning) wrote :

Hi Michał, thanks for trying.

Unfortunately, it's a partially decidable problem. If you don't see the bug, that doesn't mean that the bug is fixed :-(
All we can be sure of is that, if you still do see the bug, it isn't fixed…

I have good reason to believe that the code in the PPA gets rid of the problem. But, seeing that I never managed to reproduce it myself, I'm looking for more data.

Timo, if you could give this a spin too, I'd appreciate it!

Revision history for this message
Michał Karnicki (karni) wrote :

Michi, this needs no explanation, I completely agree with every sentence you said. Same goal, just trying to get more data :)

Revision history for this message
Timo Jyrinki (timo-jyrinki) wrote :

I tried switching to the new libunity-scopes3 from the PPA, but the Today scope would no longer load its contents after a reboot. Reverting to the original one fixed the problem.

Note that this is my main phone and I don't want to modify the filesystem in any abnormal way. Therefore my method is:
- Check which packages from a PPA are actually installed - in this case, libunity-scopes3 only
- Unpack both the old and new .deb and diff them - the files scoperegistry, smartscopesproxy, scoperunner, libunity-scopes.so.1.0.2 and version in their respective directories were changed
- Do old and new tarballs of changed contents
- Remount / as rw only for the moment of unpacking either the new or old tarball
- Remount / as ro again

It took me 3 days of normal use after OTA-8 upgrade (with Telegram in Today scope enabled right after upgrade) before I hit this, so it's not that easy to reproduce.

Revision history for this message
Michi Henning (michihenning) wrote :

Timo, thanks for trying!

I normally just do a citrain device-upgrade to install a PPA, and the right magic happens automatically.

I haven't tried with this particular PPA on my Nexus. I can try tomorrow, but don't expect anything unusual.

Changed in canonical-devices-system-image:
status: Triaged → In Progress
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package unity-scopes-api - 1.0.2+16.04.20151218.2-0ubuntu1

---------------
unity-scopes-api (1.0.2+16.04.20151218.2-0ubuntu1) xenial; urgency=medium

  [ Michi Henning ]
  * Changed version number generation to use a common script. Removed
    symbols files because we are now using abi-compliance-checker.

  * Replaced global dummy loggers for testing with heap-allocated
    instances to avoid crash due to global destructor ordering (LP: #1472755).

  [ CI Train Bot ]
  * No-change rebuild.

 -- Pawel Stolowski <email address hidden> Fri, 18 Dec 2015 11:42:03 +0000
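
One common way to implement the change described in this entry is to create the test logger on the heap at first use and never destroy it, so that no destructor runs for it during static destruction. The sketch below assumes that approach; `DummyLogger` and `dummy_logger()` are hypothetical names, and the actual unity-scopes-api code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the dummy logger used by tests.
class DummyLogger {
public:
    void log(const std::string& message) { lines_.push_back(message); }
    std::size_t count() const { return lines_.size(); }
private:
    std::vector<std::string> lines_;
};

// Fragile variant (shown only as a comment): a namespace-scope object is
// destroyed during static destruction, and its order relative to objects
// in other translation units is not something this code controls.
// DummyLogger g_dummy_logger;

// Heap-allocated variant: constructed on first use and deliberately never
// deleted, so code that runs during program exit can still use it.
DummyLogger& dummy_logger() {
    static DummyLogger* instance = new DummyLogger;  // intentionally leaked
    return *instance;
}
```

The pointer itself has no destructor, so nothing happens to the logger at exit, and initialisation of the function-local static is thread-safe since C++11. The cost is one small allocation that is reclaimed by the operating system when the process ends.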

Changed in unity-scopes-api (Ubuntu):
status: In Progress → Fix Released
Revision history for this message
Timo Jyrinki (timo-jyrinki) wrote :

unity-scopes-api (1.0.2+15.04.20151218.2-0ubuntu1) vivid; urgency=medium

  [ Michi Henning ]
  * Changed version number generation to use a common script. Removed
    symbols files because we are now using abi-compliance-checker.

  * Replaced global dummy loggers for testing with heap-allocated
    instances to avoid crash due to global destructor ordering (LP: #1472755).

  [ CI Train Bot ]
  * No-change rebuild.

Changed in unity-scopes-api (Ubuntu RTM):
status: New → Fix Released
Changed in canonical-devices-system-image:
status: In Progress → Fix Committed
Revision history for this message
Jean-Baptiste Lallement (jibel) wrote :

I am reopening this report. It happened again today with:
current build number: 210
device name: arale
channel: ubuntu-touch/rc-proposed/meizu.en

https://errors.ubuntu.com/oops/540179b8-b3ba-11e5-b371-fa163e4aaad4

Changed in canonical-devices-system-image:
status: Fix Committed → In Progress
Changed in unity-scopes-api (Ubuntu):
status: Fix Released → In Progress
Changed in canonical-devices-system-image:
milestone: ww02-2016 → ww08-2016
Revision history for this message
Timo Jyrinki (timo-jyrinki) wrote :

Still happened on OTA-9, although only once for me so far according to the error logs (Feb 1st), even though I've kept Telegram enabled in the Today scope the whole time.

Revision history for this message
Michi Henning (michihenning) wrote :

Yes, I saw. It's caused by a global singleton object in boost::log. The crash happens due to ordering issues with global destructors. The only known fix would be to hold a boost::log instance in scope for the duration of main(). However, that's not possible because we have custom scope runners for Go and JS, for one.

boost::log is garbage and completely unusable in libraries without messing with the global program state. I'm going to rip it out, but haven't found the time to do this yet.

For what it's worth, the crash happens after returning from main() so, as far as the user is concerned, there is no problem. But it still needs doing (and the wasted bandwidth and battery for the crash dump are not nice).

Revision history for this message
Pete Woods (pete-woods) wrote :

log4cxx is what I've used before, works just like the extremely popular log4j, and is used by a wide variety of projects, not suffering from stupid initialisation issues like google logging.

Revision history for this message
unity-api-1-bot (unity-api-1-bot) wrote :

Thanks for the heads-up! log4cxx is not in main, and I've never used it. The question is whether we should risk trying this out, or just run with something really simple based on what Rodney knocked up some time ago. I'm just trying to minimise risk here. There are zero function points in this exercise for us :-(

Changed in unity-scopes-api (Ubuntu):
status: In Progress → Fix Committed
kevin gunn (kgunn72)
Changed in canonical-devices-system-image:
status: In Progress → Fix Committed
assignee: Michi Henning (michihenning) → Alejandro J. Cura (alecu)
Changed in unity-scopes-api (Ubuntu):
status: Fix Committed → Fix Released
Changed in canonical-devices-system-image:
status: Fix Committed → Fix Released
Revision history for this message
Timo Jyrinki (timo-jyrinki) wrote :

This continues to happen, with tens of reports each day, so it'd be useful to keep the bug open somehow, regardless of whether we know where the actual problem lies.

Changed in unity-scopes-api (Ubuntu):
status: Fix Released → Confirmed
Changed in unity-scopes-api (Ubuntu RTM):
status: Fix Released → Confirmed
Changed in libqtelegram (Ubuntu):
status: Invalid → Incomplete
assignee: Michał Karnicki (karni) → nobody
Changed in unity-scopes-api (Ubuntu):
status: Confirmed → Fix Released
Changed in unity-scopes-api (Ubuntu RTM):
status: Confirmed → Fix Released