Random-seeming crashes in trunk

Bug #1690519 reported by GunChleoc
This bug affects 2 people
Affects: widelands
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

We have recently been getting crashes while testing branches, and the crashes don't seem to be related to the changes in those branches. It must be some form of memory problem, maybe in the graphics system.

Crash attached to this post reported in https://code.launchpad.net/~widelands-dev/widelands/rendertarget_ints/+merge/322996/comments/847068

Tags: crash


GunChleoc (gunchleoc) wrote :
GunChleoc (gunchleoc) wrote :
GunChleoc (gunchleoc) wrote :

Crash reported in https://bugs.launchpad.net/widelands/+bug/1687542/comments/5

This one seems somewhat different.

kaputtnik (franku) wrote :

Should we add each crash here? Or only particular ones?

kaputtnik (franku) wrote :

A crash with branch fh1-multitexture.

As with the other crashes mentioned in the merge proposal, I wasn't doing anything in particular. I ran an old save game, played a bit with it and let it run while doing other stuff (e.g. playing other games).

Don't know where the 8 lines containing just '(gdb)' came from.

The console output always shows that the crashes appear after an autosave, but I am not sure whether they are related to the autosave, because I don't know how much time passed between the autosave and the crash.

kaputtnik (franku) wrote :

Another one in fh1-multitexture

Klaus Halfmann (klaus-halfmann) wrote :

Checked "man malloc" once again, there I find:

> The following environment variables change the behavior of the allocation-related functions.

I will try these first:
MallocGuardEdges=y, MallocStackLoggingNoCompact=y, MallocScribble=y,
MallocCheckHeapStart=20000, MallocCheckHeapEach=100, MallocCheckHeapSleep=1200

See also "leaks" (Xcode tool) and malloc_history.

GNU/Linux (e.g. Ubuntu) should have similar tools.

Klaus Halfmann (klaus-halfmann) wrote :

I had to go up to MallocCheckHeapEach=1000; otherwise this is quite slow (and consumes a _lot_ of CPU).

Now crashed at
load_image_as_sdl_surface(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, FileSystem*) + 444 (image_io.cc:78)

MallocScribble -> fill memory that has been allocated with 0xaa bytes. This increases the likelihood that a program making assumptions about the contents of freshly allocated memory will fail. Also if set, fill memory that has been deallocated with 0x55 bytes. This increases the likelihood that a program will fail due to accessing memory that is no longer allocated.

KERN_INVALID_ADDRESS at 0xfffffffffffffe1e looks like a decremented null-pointer.
malloc_zone_check -> so the check failed because the malloc structures were broken :-(
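
To make the "decremented null-pointer" reading concrete: 0xfffffffffffffe1e is 0x1e2 (482) bytes below address 0 in 64-bit wrap-around arithmetic. The following is only a sketch of that arithmetic, plus a check for the 0xaa/0x55 MallocScribble patterns quoted above. The function name and the 4096-byte window are assumptions made for illustration; they are not part of Widelands or of any Apple tool.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of Widelands or any Apple tool): gives a rough
// hint about what a 64-bit fault address might indicate.
std::string describe_fault_address(std::uint64_t addr) {
    const std::uint64_t kNearNull = 4096;  // assumed window around address 0
    if (addr < kNearNull) {
        return "null pointer plus " + std::to_string(addr) + " bytes";
    }
    // Unsigned wrap-around: 0 - addr is the distance below address 0.
    const std::uint64_t below = std::uint64_t{0} - addr;
    if (below <= kNearNull) {
        return "null pointer minus " + std::to_string(below) + " bytes";
    }
    // A pointer read from scribbled memory consists of the fill byte repeated.
    if (addr == 0xaaaaaaaaaaaaaaaaULL) {
        return "MallocScribble fill pattern for allocated memory";
    }
    if (addr == 0x5555555555555555ULL) {
        return "MallocScribble fill pattern for freed memory";
    }
    return "no obvious pattern";
}

// The address from the report above decodes to 482 bytes below a null pointer.
void check_reported_address() {
    assert(describe_fault_address(0xfffffffffffffe1eULL) ==
           "null pointer minus 482 bytes");
}
```

An address this close below zero is consistent with code stepping backwards from a null pointer, for example by reading a header in front of an object, but the address alone does not prove that.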

SirVer (sirver) wrote :

Crashes 1, 2 and 3 all seem to be different things to me.

The crashes in #5 and #6 seem to be the same, but also different from the first three.

The best approaches I know of for figuring out where these memory violations come from are MSAN [1] and ASAN [2]. This requires building widelands, and maybe its dependencies, with these settings turned on in the compiler. Other approaches are:

- Mac OS ships with some memory debugging tools. [3]
- A simple library that replaces malloc and gives some memory feedback is Electric Fence [4]
- valgrind's memcheck is also an excellent but slow tool [5]

[1] https://clang.llvm.org/docs/MemorySanitizer.html
[2] https://clang.llvm.org/docs/AddressSanitizer.html
[3] https://developer.apple.com/library/content/documentation/Performance/Conceptual/ManagingMemory/Articles/MallocDebug.html
[4] https://en.wikipedia.org/wiki/Electric_Fence
[5] http://valgrind.org/docs/manual/mc-manual.html

GunChleoc (gunchleoc) wrote :

I just got another one, this one definitely from the font renderer. I was working on converting some markup, so the C++ code is all the current trunk version. I guess the next step is to use unique_ptr for the RenderNodes after we have the multitexture branch finished - I had looked into it there and decided against it in order to keep the diff a bit smaller. Also, the change will be non-trivial.

Thread 1 "widelands" received signal SIGSEGV, Segmentation fault.
0x0000000000dede0f in RT::DivTagRenderNode::~DivTagRenderNode (this=0x8f48860, __in_chrg=<optimised out>)
    at /home/bratzbert/sources/widelands/fh1-editorhelp/src/graphic/text/rt_render.cc:688
688 delete n;
(gdb) backtrace
#0 0x0000000000dede0f in RT::DivTagRenderNode::~DivTagRenderNode (this=0x8f48860, __in_chrg=<optimised out>)
    at /home/bratzbert/sources/widelands/fh1-editorhelp/src/graphic/text/rt_render.cc:688
#1 0x0000000000dede9e in RT::DivTagRenderNode::~DivTagRenderNode (this=0x8f48860, __in_chrg=<optimised out>)
    at /home/bratzbert/sources/widelands/fh1-editorhelp/src/graphic/text/rt_render.cc:691
#2 0x0000000000df87ea in std::default_delete<RT::RenderNode>::operator() (this=0x7fffffffa050, __ptr=0x8f48860)
    at /usr/include/c++/5/bits/unique_ptr.h:76
#3 0x0000000000df5e19 in std::unique_ptr<RT::RenderNode, std::default_delete<RT::RenderNode> >::~unique_ptr (this=0x7fffffffa050,
    __in_chrg=<optimised out>) at /usr/include/c++/5/bits/unique_ptr.h:236
#4 0x0000000000dec348 in RT::Renderer::render (this=0x1c28810,
    text="<rt><p><font size=18 bold=1 color=D1D1D1>Àilean beinne<br></font></p><vspace gap=6><div width=100%><div width=100%><p><font size=12><img src=world/terrains/pics/desert/highmountainmeadow_00.png><vspa"..., width=338, allowed_tags=std::set with 0 elements)
    at /home/bratzbert/sources/widelands/fh1-editorhelp/src/graphic/text/rt_render.cc:1485
#5 0x0000000000dbd053 in (anonymous namespace)::RTImage::texture (this=0x8f47f80)
    at /home/bratzbert/sources/widelands/fh1-editorhelp/src/graphic/font_handler1.cc:91
#6 0x0000000000dbcf5e in (anonymous namespace)::RTImage::width (this=0x8f47f80)
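
The backtrace above crashes in a manual `delete n;` inside a node destructor. As a rough sketch of the unique_ptr idea mentioned in this comment, a node can own its children through `std::unique_ptr`, so that no destructor has to loop over raw pointers. The `Node` class here is a hypothetical, simplified stand-in, not the real RT::RenderNode, and the counter exists only to make the behaviour observable.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for a render node; not RT::RenderNode.
class Node {
public:
    explicit Node(int* destroyed_counter) : destroyed_(destroyed_counter) {}

    // No manual "delete child" loop: children_ is destroyed automatically
    // after this destructor body has run.
    virtual ~Node() { ++*destroyed_; }

    // Ownership of the child moves into this node.
    void add_child(std::unique_ptr<Node> child) {
        children_.push_back(std::move(child));
    }

    std::size_t child_count() const { return children_.size(); }

private:
    int* destroyed_;  // non-owning; counts destructor calls for the demo
    std::vector<std::unique_ptr<Node>> children_;
};

// Builds a root with two children, destroys the tree, and returns how many
// destructors ran.
int destroy_tree_and_count() {
    int destroyed = 0;
    {
        std::unique_ptr<Node> root(new Node(&destroyed));
        root->add_child(std::unique_ptr<Node>(new Node(&destroyed)));
        root->add_child(std::unique_ptr<Node>(new Node(&destroyed)));
        assert(root->child_count() == 2);
    }  // root goes out of scope: each node is destroyed exactly once
    return destroyed;
}
```

With this layout each node is freed once by its single owner. It does not by itself explain the crash above, which could equally come from memory corrupted elsewhere.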

Klaus Halfmann (klaus-halfmann) wrote :

Found #1690649 while testing; not sure whether it is related. I will continue without sound and autosave,
but with MallocCheckHeapEach=100 (which makes things really slow :-)

Klaus Halfmann (klaus-halfmann) wrote :

That's getting worse: I now used
export MallocCheckHeapStart=1 # start directly
export MallocCheckHeapEach=1 # check each call
 ./widelands --verbose --coredump=true --fullscreen=false --xres=1024 --yres=768

When I quit using only the keyboard, this works; when I use the mouse
(or perhaps use some graphics), it crashes on quitting:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_malloc.dylib 0x00007fffa9342657 szone_check_all + 3160
1 libsystem_malloc.dylib 0x00007fffa9346149 malloc_zone_check + 58
2 libsystem_malloc.dylib 0x00007fffa9345ea2 internal_check + 17
3 libsystem_malloc.dylib 0x00007fffa9337f3b free + 358
4 com.apple.GeForceGLDriver 0x00007fff8ed5083c 0x7fff8ea2c000 + 3295292
5 libGPUSupportMercury.dylib 0x00007fffa1690eba gldDestroyTexture + 20
6 libGFXShared.dylib 0x00007fff97c3fe3a gfxDestroyPluginTexture + 60
7 GLEngine 0x00007fff98908801 gleFreeTextureObject + 36
8 libGFXShared.dylib 0x00007fff97c415ca gfxReleaseSharedStateAndHash + 217
9 GLEngine 0x00007fff987d9621 gliDestroyContext + 175
10 com.apple.opengl 0x00007fff987b3232 CGLReleaseContext + 187
11 com.apple.AppKit 0x00007fff913e5551 -[NSOpenGLContext dealloc] + 58
12 libSDL2-2.0.0.dylib 0x00000001098910fd Cocoa_GL_DeleteContext + 56
13 widelands 0x0000000107e74d84 Graphic::~Graphic() + 1220 (graphic.cc:125)
14 widelands 0x0000000107e751a5 Graphic::~Graphic() + 21 (graphic.cc:128)
15 widelands 0x0000000107d6332d WLApplication::shutdown_hardware() + 77 (wlapplication.cc:806)
16 widelands 0x0000000107d62fe8 WLApplication::~WLApplication() + 24 (wlapplication.cc:370)
17 widelands 0x0000000107d634b5 WLApplication::~WLApplication() + 21 (wlapplication.cc:397)
18 widelands 0x0000000107d588b1 main + 689 (main.cc:51)

So I would assume the very first drawing on the start screen already causes problems. (!)
Can someone confirm this with some other tool?

Klaus Halfmann (klaus-halfmann) wrote :

SirVer:

Will adding this to CMakeLists.txt enable AddressSanitizer?

if(CMAKE_BUILD_TYPE STREQUAL "Debug")
  set(WL_DEBUG_FLAGS "-g -DDEBUG -fsanitize=address -fno-omit-frame-pointer")

Klaus Halfmann (klaus-halfmann) wrote :

Nah, this is incomplete: the flag must be passed to the linker, too, because I get linker errors:

[ 84%] Linking CXX executable test_scripting
Undefined symbols for architecture x86_64:
  "___asan_after_dynamic_init", referenced from:
      __GLOBAL__sub_I_notifications_test.cc in notifications_test.cc.o
and a lot more

kaputtnik (franku) wrote :

A completely different crash with trunk.

SirVer (sirver) wrote :

#15 looks like a double free: the texture is owned by the texture cache, but somebody else freed it. Something like this:

unique_ptr<...> a(new Texture(...));
texture_cache.insert(unique_ptr<...>(a.get()));
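
To spell out why the pattern above is a double free: two `unique_ptr`s built from the same raw pointer will each call `delete` on it. A minimal sketch of the single-owner alternative follows, in which ownership is moved into the cache and the caller keeps only a non-owning pointer. `Texture` and `TextureCache` here are hypothetical stand-ins, not the real Widelands classes.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a texture; counts how often it is freed.
struct Texture {
    explicit Texture(int* frees) : frees_(frees) {}
    ~Texture() { ++*frees_; }
    int* frees_;
};

// Hypothetical cache that is the sole owner of its textures.
class TextureCache {
public:
    // Takes ownership and hands back a non-owning pointer for later use.
    const Texture* insert(const std::string& key, std::unique_ptr<Texture> texture) {
        const Texture* raw = texture.get();
        entries_[key] = std::move(texture);
        return raw;
    }

private:
    std::map<std::string, std::unique_ptr<Texture>> entries_;
};

// Returns how many times the texture was freed once everything is destroyed.
int count_frees_with_single_owner() {
    int frees = 0;
    {
        TextureCache cache;
        std::unique_ptr<Texture> a(new Texture(&frees));
        // Bug pattern from the comment above (do NOT do this):
        //   cache.insert("x", std::unique_ptr<Texture>(a.get()));
        // Both 'a' and the cache would then delete the same object.
        const Texture* borrowed = cache.insert("x", std::move(a));
        assert(a == nullptr);         // 'a' gave up ownership
        assert(borrowed != nullptr);  // still usable while the cache lives
        (void)borrowed;
    }
    return frees;
}
```

If several parts of the code really need to keep the texture alive, `std::shared_ptr` would be the other option; which one fits Widelands is a design decision this sketch does not settle.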

GunChleoc (gunchleoc) wrote :

I suspect that the culprit is representative_images_ in animation.h - it always had a unique_ptr, but since we now add playercolor images directly to the image cache, we have 2 unique_ptrs.

Changed in widelands:
assignee: nobody → GunChleoc (gunchleoc)
GunChleoc (gunchleoc) wrote :

I'm talking nonsense, those images are scaled -> new textures that need caching. I still suspect the playercolor though.

kaputtnik (franku) wrote :

Don't know if this is related (bzr 8357):

Very rarely, when clicking on some building to attack it, the corresponding window changes to a different window. E.g. this just happened: I was attacking a barbarian tower and clicking a lot, and suddenly the window showed one of my Mills (with the settings for how much wheat is in there) at a far-away spot. I couldn't say whether it is the same window with changed content, or whether the Attack window closes and the mill's window opens immediately afterwards, because this happens very fast (during many clicks).

This has happened to me about twice in the last few months.

GunChleoc (gunchleoc) wrote :

#19 is a completely different bug - I have opened a new bug report: https://bugs.launchpad.net/widelands/+bug/1691336

kaputtnik (franku) wrote :

Thanks :-)

GunChleoc (gunchleoc) wrote :

#2 is this bug: https://bugs.launchpad.net/widelands/+bug/1648785.

#5, #6 are particular to the fh1-multitexture branch and should have been fixed there.

#8, #15 have hopefully been fixed with r8363.

This leaves the following crashes still to analyze:

#1
#3
#10

Klaus Halfmann (klaus-halfmann) wrote :

Got another one in bzr8371[widelands] when quitting

6 GLEngine 0x000000010d48e621 gliDestroyContext + 175
7 com.apple.opengl 0x00007fffcd74e232 CGLReleaseContext + 187
8 com.apple.AppKit 0x00007fffc656520d -[NSOpenGLContext dealloc] + 58
9 libSDL2-2.0.0.dylib 0x0000000106f730fd Cocoa_GL_DeleteContext + 56
10 widelands 0x0000000105541234 Graphic::~Graphic() + 1220 (graphic.cc:125)
11 widelands 0x0000000105541655 Graphic::~Graphic() + 21 (graphic.cc:128)

Either something basic is broken at the very beginning and crashes when cleaning up,
or some structure from the beginning is corrupted during the game.

kaputtnik (franku) wrote :

Another one:

Thread 1 "widelands" received signal SIGSEGV, Segmentation fault.
0x0000000000c5818c in std::__atomic_base<int>::fetch_sub (__m=std::memory_order_acq_rel, __i=1, this=0x10b714978)
    at /usr/include/c++/6.3.1/bits/atomic_base.h:524
524 { return __atomic_fetch_sub(&_M_i, __i, __m); }

Backtrace attached.

Klaus Halfmann (klaus-halfmann) wrote :

I created a branch for testing this on OSX:
https://code.launchpad.net/~widelands-dev/widelands/osx-malloc-check

I still get more or less random crashes in std functions when e.g. iterating over
WL objects. I will try to add more and more assertions to that branch
to narrow down the problem.

Klaus Halfmann (klaus-halfmann) wrote :

I checked with the malloc code the hard way (it takes 3 min up to the splash screen).
What I found:
* If I just quit via CMD-Q all is fine.
* If I click "Exit Widelands" with the "Hand Pointer" Image I get the crash.

I get the crash most times in Graphic::~Graphic() at SDL_GL_DeleteContext(gl_context_);

I now assume _every_ image drawing does something potentially bad; we only
get away with it for quite some time.

I pushed some minor optimizations to the osx-malloc-check branch. Found #1697703.

Gun:
* Can we defer executing build_texture_atlas.cc until _after_ the splash screen?
* Can I drop that code (for debugging) completely?
* Do you plan to put the images directly into the Image cache?

Could someone on Linux/gcc verify this with the glibc malloc?

So much for my findings.

kaputtnik (franku) wrote :

I got no crash, but I ran valgrind, waited until the screen with the buttons appeared and hit "Exit Widelands" (German language).

The command I used is:

valgrind --leak-check=full --track-origins=yes ./widelands

Summary (full log attached):

==1355== LEAK SUMMARY:
==1355== definitely lost: 7,791 bytes in 89 blocks
==1355== indirectly lost: 680 bytes in 85 blocks
==1355== possibly lost: 0 bytes in 0 blocks
==1355== still reachable: 219,010 bytes in 1,872 blocks
==1355== suppressed: 0 bytes in 0 blocks
==1355== Reachable blocks (those to which a pointer was found) are not shown.
==1355== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==1355==
==1355== For counts of detected and suppressed errors, rerun with: -v
==1355== ERROR SUMMARY: 22 errors from 20 contexts (suppressed: 0 from 0)

Klaus Halfmann (klaus-halfmann) wrote :

Thanks for your test. Because of the different SDL implementations, our results will vary.
I just found that I can install valgrind via MacPorts.
What I extracted from the logs:

==1355== Syscall param writev(vector[...]) points to uninitialised byte(s)
...
==1355== by 0xC868C2: WLApplication::WLApplication(int, char const* const*)
==1355==
...
Uninitialised value was created by a stack allocation
-> this _can_ cause a random crash, hmm
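
For context on this class of warning: valgrind reports it when a buffer handed to a system call contains bytes that nothing ever wrote, for example padding or reserved bytes in a stack variable. The log excerpt does not show which variable is involved (it may well live inside a library), so the following is only a generic sketch of how such bytes can be given defined values before sending. The 8-byte header layout and the function name are assumptions made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical wire format for illustration only: 1 byte kind, 3 reserved
// bytes, 4 bytes length in host byte order.
std::array<unsigned char, 8> encode_header(std::uint8_t kind, std::uint32_t length) {
    std::array<unsigned char, 8> out{};  // value-initialised: all 8 bytes are 0
    out[0] = kind;
    // Bytes 1..3 stay 0, so no uninitialised byte would reach write()/writev().
    std::memcpy(out.data() + 4, &length, sizeof length);
    return out;
}

// The reserved bytes have a defined value even though nothing was assigned to them.
void check_reserved_bytes_are_defined() {
    const std::array<unsigned char, 8> header = encode_header(7, 1024);
    assert(header[0] == 7);
    assert(header[1] == 0 && header[2] == 0 && header[3] == 0);
}
```

Had `out` been declared without the `{}`, the reserved bytes would hold whatever was on the stack, which is the situation the valgrind message describes.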

Those memory leaks are not good, but will not cause a crash.
We should open a kind of cleanup bug for those, too.

I'll try valgrind once I have some spare time again. Thanks for your checks.

kaputtnik (franku) wrote :

Got another boost crash:

Thread 1 "widelands" received signal SIGSEGV, Segmentation fault.
      0x0000000000cdcb3a in boost::shared_ptr<boost::signals2::detail::connection_body<std::pair<boost::signals2::detail::slot_meta_group, boost::optional<int> >, boost::signals2::slot<void (unsigned int), boost::function<void (unsigned int)> >, boost::signals2::mutex> >::operator->() const (this=0x1120fa320)
at /usr/include/boost/smart_ptr/shared_ptr.hpp:711
711 BOOST_ASSERT( px != 0 );

Will attach the save game later. This save game crashed two times after a bit of playing. Unfortunately, on the first crash I didn't run gdb, so I am not sure if the second crash is the same.

kaputtnik (franku) wrote :
kaputtnik (franku) wrote :

Another one with the previously sent save game, I guess when opening the statistics menu:

Thread 1 "widelands" received signal SIGSEGV, Segmentation fault.
0x0000000000de731e in Widelands::Command::duetime (this=0x107362d30)
    at /home/kaputtnik/Quellcode/widelands-repo/trunk/src/logic/cmd_queue.h:80
80 return duetime_;

Full backtrace attached.

GunChleoc (gunchleoc) wrote :

* Can we defer executing build_texture_atlas.cc until _after_ the splash screen?

No, unless we want to drop all graphics in it.

* Can I drop that code (for debugging) completely?

Not easily - will probably need some recoding in g_gr->images(), because it's completely built on top of the texture atlas.

* Do you plan to put the images directly into the Image cache?

I do not understand this question.

Klaus Halfmann (klaus-halfmann) wrote :

>> * Do you plan to put the images directly into the Image cache?
> I do not understand this question.

Currently the images are first put into some map and from there into
the image cache. I think this could be changed to put them into that
cache directly. (or maybe I still do not understand that cache)

GunChleoc (gunchleoc) wrote :

>>> * Do you plan to put the images directly into the Image cache?
>> I do not understand this question.
>
> Currently the images are first put into some map and from there into
> the image cache. I think this could be changed to put them into that
> cache directly. (or maybe I still do not understand that cache)

SirVer is dropping all the important bits into the first texture atlas
to reduce the frequency of texture swapping. This is an important
performance feature, so no, there are no plans to remove this.

kaputtnik (franku) wrote :

I don't know if I should post every crash I notice here. If I should stop spamming, please let me know :)

After playing yesterday for about 4 hrs with no crash, now there is another one after playing for half an hour:

widelands: /home/kaputtnik/widelands-repo/trunk/src/ui_basic/panel.cc:609: void UI::Panel::grab_mouse(bool): Assertion `!mousegrab_ || mousegrab_ == this' failed.

Thread 1 "widelands" received signal SIGABRT, Aborted.
0x00007ffff510b670 in raise () from /usr/lib/libc.so.6

Yesterday I played mostly in window mode, whereas today I immediately pressed "F" while the save game was loading. The crashes I mentioned before all started this way: start widelands (window mode) -> load saved game -> switch to fullscreen. Since I encounter these bugs only in fullscreen mode, I have the feeling that this may be important.

kaputtnik (franku) wrote :

Happened again:

      Thread 1 "widelands" received signal SIGSEGV, Segmentation fault.
      0x0000000000dd9d3a in boost::unordered::detail::functions<boost::hash<unsigned int>, std::equal_to<unsigned int> >::current (
         this=0x1ffffbe88) at /usr/include/boost/unordered/detail/implementation.hpp:2335
2335 static_cast<void const*>(&funcs_[current_]));

To be sure that there is no issue with my computer, I will run a memory test soon.

kaputtnik (franku) wrote :

Now a crash in my video driver:

Thread 1 "widelands" received signal SIGSEGV, Segmentation fault.
      0x00007fffec062591 in ?? () from /usr/lib/xorg/modules/dri/r600_dri.so

I ran a memory test beforehand, but it found no issues.

GunChleoc (gunchleoc) wrote :

Looks like a symptom of a bug in the font renderer.

kaputtnik (franku) wrote :

Well, that's really mad... today I played for hours with no crash.

GunChleoc (gunchleoc) wrote :

I think we have squashed them all now with the help of ASan. Let's open new bug reports if we should run into any of these again.

Changed in widelands:
status: Confirmed → Fix Committed
assignee: GunChleoc (gunchleoc) → nobody
Klaus Halfmann (klaus-halfmann) wrote :

Confirmed, let's clean up all of these.

GunChleoc (gunchleoc) wrote :

Fixed in build20-rc1

Changed in widelands:
status: Fix Committed → Fix Released