Canonical System Image

Bug #1472755
Comment #20

Michi Henning (michihenning) wrote on 2015-11-25:

I've spent a lot of hours on this so far, without getting closer :-(

Up front, Timo, if there are shell crashes, that's almost certainly an unrelated problem. I don't see (even in theory) how a crashing scope could be responsible for that. For starters, the scope is a separate unrelated process. When a scope crashes in the middle of executing a query, the shell eventually gets a "query failed" message. But, in this case, the scope crashes *after* it has returned from main. The crash is caused by the C++ runtime clean-up, which uses an atexit handler to invoke global destructors. So, by the time the telegram scope dies, whatever queries (if any) have completed already.
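To illustrate the timing, here is a minimal sketch (not taken from the scope or from unity-scopes) showing that global destructors run only once the process is on its exit path, in reverse order of construction. All names (`Global`, `report`, `exit_sequence`) are made up for the example, and it assumes a POSIX system. It forks a child, lets the child finish its "work" and call std::exit (the same clean-up path as returning from main), and collects what the child's destructors write to a pipe.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

namespace {

// Write end of a pipe. Only the forked child sets this, so the parent's
// own global destructors stay silent.
int g_report_fd = -1;

void report(const char* msg)
{
    if (g_report_fd >= 0) {
        ssize_t ignored = write(g_report_fd, msg, std::strlen(msg));
        (void)ignored;
    }
}

// Hypothetical stand-in for any object with static storage duration.
struct Global {
    const char* name;
    explicit Global(const char* n) : name(n) {}
    ~Global() { report(name); report(" destroyed;"); }
};

Global first{"first"};    // constructed first, destroyed last
Global second{"second"};  // constructed second, destroyed first

}  // namespace

// Runs a child process to completion and returns everything it reported,
// including what its global destructors wrote during exit.
std::string exit_sequence()
{
    int fds[2];
    if (pipe(fds) != 0) {
        return "";
    }
    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }
    if (pid == 0) {
        close(fds[0]);
        g_report_fd = fds[1];
        report("work done;");  // all "queries" are finished at this point
        std::exit(0);          // runs atexit handlers and global destructors
    }
    close(fds[1]);
    std::string out;
    char buf[128];
    ssize_t n;
    while ((n = read(fds[0], buf, sizeof buf)) > 0) {
        out.append(buf, static_cast<std::size_t>(n));
    }
    close(fds[0]);
    waitpid(pid, nullptr, 0);
    return out;
}
```

Calling `exit_sequence()` returns "work done;second destroyed;first destroyed;": the destructors only fire after the work has completed, which is why a crash in that phase can't affect a query that has already been answered.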

As to the crash, I had a look at the code and can't spot anything off-hand that looks wrong. I also built the scope for the desktop and tinkered with it. A simple run under valgrind comes up clean. However, that really doesn't prove anything because, for the scope to do something interesting, it needs a DB with some data in it, by the looks of things. So, my test didn't exercise much of the code.

As far as I can see, there are no unit tests of any kind for the scope, so it's difficult to exercise the scope and see whether valgrind comes up with anything. Michael, I need some help getting the scope to run on the desktop with representative data so I can instrument it and see whether something comes up.

I have stared at the unity-scopes code for hours and I can't find anything that looks fishy. I'm positive that I'm finalising the boost::log machinery correctly.

For what it's worth, the crash happens when boost::log destroys the global singleton core instance. (That's a global in boost::log, not in unity-scopes; there are no globals in unity-scopes.) All of the crashes are in the exact same place. Basically, core is a class that stores a pimpl, and the destructor just calls "delete pimpl; pimpl = nullptr;" The call to delete causes the crash.
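For reference, the pattern looks roughly like the sketch below. This is not boost::log's actual code; `Core`, `Impl` and `live_impls` are invented names, and the counter exists only so the example can check that the implementation object is freed exactly once.

```cpp
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Number of Impl objects currently alive (illustration only).
int live_impls = 0;

class Core {
    // The hidden implementation that the public class points to.
    struct Impl {
        std::vector<std::string> sinks;
        Impl() { ++live_impls; }
        ~Impl() { --live_impls; }
    };

    Impl* pimpl_;

public:
    Core() : pimpl_(new Impl) {}

    // Copying would leave two owners of one pointer, i.e. a double delete.
    Core(const Core&) = delete;
    Core& operator=(const Core&) = delete;

    // The destructor shape described above: free the impl, clear the pointer.
    ~Core()
    {
        delete pimpl_;
        pimpl_ = nullptr;
    }

    void add_sink(const std::string& name) { pimpl_->sinks.push_back(name); }
    std::size_t sink_count() const { return pimpl_->sinks.size(); }
};

}  // namespace sketch
```

A destructor this simple can really only blow up if the pointer was already freed by someone else, or if the allocator's bookkeeping was damaged before the destructor ran. That is why the crash location by itself doesn't point at the culprit.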

The corrupt linked list message is emitted by glibc when its allocator detects heap corruption. This could be a double-free, or it could be because something has crapped over the heap earlier. It's not possible to tell with the information we have.

One thing that strikes me is that this crash happens only for the telegram scope, and we are not seeing it elsewhere. It's got to be something that is specific to the telegram scope. (This doesn't mean that the telegram scope is buggy; it could, for example, be caused by different ordering of global destructors due to the order in which things are linked.)

There is a similar known issue with boost log due to global destructor ordering: http://www.boost.org/doc/libs/1_55_0/libs/log/doc/html/log/rationale/why_crash_on_term.html

However, I don't think this is what's happening to us because, for that issue, the stack trace should end somewhere inside the locale conversion code.

One thought: I could put together a PPA with a hacked-up registry that sets MALLOC_CHECK_=2 just for the telegram scope when it is run. The hope is that this might give us a crash that's more informative. We could even release with this hack. After all, the scope is crashing as is; if we are lucky, it'll crash a little sooner.
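As a rough idea of what the hack would amount to, here is a sketch of a helper that builds the environment for a scope process and adds MALLOC_CHECK_=2 only for the one scope being debugged. This is not the registry's real code: the function name, its parameters and the scope IDs in the usage note are placeholders, and it assumes the result would be handed to something like execve() when the scope is launched.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns the environment ("KEY=value" strings) to use for scope_id.
// Only the scope named by checked_scope_id gets MALLOC_CHECK_=2; every
// other scope receives base_env unchanged.
std::vector<std::string> environment_for_scope(const std::string& scope_id,
                                               const std::string& checked_scope_id,
                                               std::vector<std::string> base_env)
{
    if (scope_id != checked_scope_id) {
        return base_env;
    }

    const std::string key = "MALLOC_CHECK_=";

    // Drop any inherited setting so that ours is the only one present.
    base_env.erase(std::remove_if(base_env.begin(), base_env.end(),
                                  [&key](const std::string& entry) {
                                      return entry.compare(0, key.size(), key) == 0;
                                  }),
                   base_env.end());

    // With glibc's malloc checking enabled, 2 asks for an abort as soon as
    // heap corruption is detected.
    base_env.push_back(key + "2");
    return base_env;
}
```

For example, `environment_for_scope("telegram", "telegram", env)` returns `env` with `MALLOC_CHECK_=2` appended, while `environment_for_scope("weather", "telegram", env)` returns `env` untouched, so no other scope pays for the extra checking.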
