Desync + crash in local multiplayer

Bug #1797549 reported by Toni Förster
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
widelands
Fix Released
Undecided
Unassigned

Bug Description

With build 8869 I have a desync. 2 players, 1 Mac (Host) , 1 Windows and 2 AI. Loading the savegame also results in an immediate desync.

The Mac version was build from source. This is the windows version we used:

https://ci.appveyor.com/project/widelands-dev/widelands/builds/19306840/job/9yeu7wxn3lbcmsmo/artifacts

The savegame is attached to the bug report. This is the output:

Done reloading the game from replay
Replay writer has started
[sync] Reset
ComputerPlayer(3): initializing as type 2
    ... member of team 2
    ... DNA initialized
  3: 0 basic buildings in savegame file.
[Host]: Requesting sync reports for time 14167589
 3: expedition max duration = 8100 (135 minutes), map area root: 192
ComputerPlayer(4): initializing as type 2
    ... member of team 2
    ... DNA initialized
  4: 0 basic buildings in savegame file.
 4: expedition max duration = 8100 (135 minutes), map area root: 192
[Host]: Client 0: got pong
[Host]: Client 0: Time 14167589
[Host]: Client 0 reports time 14167589 (networktime = 14167610) during hang
[Host]: comparing syncreports for time 14167589
[Host]: lost synchronization with client 0!
I have: 8500fc12479833420dd9e386f98ef607
Client has: 5afded9210edb167fd2d1ac2a484a655
FATAL ERROR - game crashed. Attempting emergency save.
Game: Writing Preload Data ... took 40ms
Game: Writing Game Class Data ... took 1ms
Game: Writing Player Info ... took 395ms
Game: Writing Map Data!
Writing Elemental Data ... took 2ms
 Writing Player Names And Tribe Data ... took 0ms
 Writing Port Spaces Data ... took 1ms
 Writing Heights Data ... took 47ms
 Writing Terrain Data ... took 85ms
 Writing Player Start Position Data ... took 0ms
 Writing Player Message Data ... took 22ms
 Writing Resources Data ... took 60ms
 Writing Map Version ... took 0ms
 Writing Allowed Worker Types Data ... took 0ms
 Writing Allowed Building Types Data ... took 7ms
 Writing Flag Data ... took 25ms
 Writing Road Data ... took 11ms
 Writing Building Data ... took 13ms
 Writing Map Objects ... took 349ms
 Writing Flagdata Data ... took 24ms
 Writing Roaddata Data ... took 104ms
 Writing Buildingdata Data ... took 79ms
 Writing Node Ownership Data ... took 18ms
 Writing Exploration Data ... took 33ms
 Writing Players Unseen Data ... took 288ms
 Writing Scripting Data ... took 156ms
 Writing Objective Data ... took 0ms
 Writing map images ... took 2ms
 MapSaver::save() for 'Calvisson' took 1372ms
Game: Writing Map Data took 1375ms
Game: Writing Player Economies Info ... took 4ms
Game: Writing ai persistent data ... took 1ms
Game: Writing Command Queue Data ... took 347ms
Game: Writing Interactive Player Data ... took 0ms
GameSaver::save() took 2166ms
SaveHandler::save_game() took 2167ms
[Host]: disconnect_player_controller(1, 3==D)
[Host]: disconnect_client(0, CLIENT_DESYNCED, )
[NetHost] Closing network connection to 192.168.1.230:54239.
[Host]: Requesting sync reports for time 14167611

Revision history for this message
Toni Förster (stonerl) wrote :
GunChleoc (gunchleoc)
tags: added: crash multiplayer
Changed in widelands:
milestone: none → build20-rc1
GunChleoc (gunchleoc)
tags: added: desync
Revision history for this message
Toni Förster (stonerl) wrote :

And another desync, with r8898. Again host was a Mac and client was a Windows 10 machine. The windows machine disconnected suddenly. And the game also cannot be continued as soon as the savegame is transferred and the game starts it desyncs again, with this message on the mac-host:

Done reloading the game from replay
Replay writer has started
[sync] Reset
ComputerPlayer(3): initializing as type 2
    ... DNA initialized
  3: 0 basic buildings in savegame file.
 3: expedition max duration = 9506 (158 minutes), map area root: 232
ComputerPlayer(4): initializing as type 2
    ... DNA initialized
  4: 0 basic buildings in savegame file.
 4: frisians_beekeepers_house strictly requires supporters
 4: frisians_aqua_farm strictly requires supporters
 4: frisians_collectors_house strictly requires supporters
[Host]: Requesting sync reports for time 9439154
 4: expedition max duration = 9506 (158 minutes), map area root: 232
ComputerPlayer(5): initializing as type 2
    ... DNA initialized
  5: 0 basic buildings in savegame file.
 5: expedition max duration = 9506 (158 minutes), map area root: 232
[Host]: Client 0: got pong
[Host]: Client 0: Time 9439154
[Host]: Client 0 reports time 9439154 (networktime = 9439167) during hang
[Host]: comparing syncreports for time 9439154
[Host]: lost synchronization with client 0!
I have: e6c3002b3179ad25a01d7948e3ca1ca6
Client has: 5269ea930f0cac4e9a047a648c3e6357
FATAL ERROR - game crashed. Attempting emergency save.

Revision history for this message
Toni Förster (stonerl) wrote :
Download full text (3.3 KiB)

We used the same file but this time the windows machine was the host. The mac-client desyncedwith this message here:

Done reloading the game from replay
Replay writer has started
[sync] Reset
[Client]: sending timestamp: 9439167
[Client] speed: 1.000
[Client] speed: 0.000
[Client]: server is waiting.
[Client] Pong!
MO(6743,fish): cancel_moving
MO(3861,frisians_builder): [transfer]: starting task [movepath] and setting location to road 9085
MO(3609,frisians_builder): [transfer]: starting task [movepath] and setting location to road 8169
MO(7018,barbarians_soldier): [transfer]: starting task [movepath] and setting location to road 8635
MO(3779,barbarians_ox): [transfer]: starting task [movepath] and setting location to road 8754
MO(6788,water): cancel_moving
MO(3650,barbarians_builder): [transfer]: starting task [movepath] and setting location to road 8708
MO(3671,atlanteans_builder): [transfer]: starting task [movepath] and setting location to road 8405
MO(3578,barbarians_soldier): [transfer]: starting task [movepath] and setting location to road 8726
MO(7179,barbarians_carrier): [fetchfromflag]: back home
MO(6914,wheat): cancel_moving
MO(3395,barbarians_soldier): [transfer]: starting task [movepath] and setting location to road 8554
MO(6640,wheat): cancel_moving
MO(6984,frisians_carrier): [transfer]: set location to road 9100
MO(6379,flour): cancel_moving
MO(6584,granite): cancel_moving
MO(6424,fish): cancel_moving
MO(4817,frisians_blacksmith_master): [transfer]: starting task [movepath] and setting location to road 8986
MO(3898,frisians_builder): [transfer]: starting task [movepath] and setting location to road 8986
MO(6825,wheat): cancel_moving
MO(6879,log): cancel_moving
MO(6838,frisians_carrier): [transfer]: set location to road 9085
MO(7153,frisians_blacksmith): [transfer]: starting task [movepath] and setting location to road 9025
MO(3633,barbarians_builder): [gowarehouse]: Got transfer
MO(3633,barbarians_builder): [transfer]: starting task [movepath] and setting location to road 8686
MO(6949,empire_donkey): [transfer]: starting task [movepath] and setting location to road 8447
MO(6577,frisians_fisher): [transfer]: starting task [movepath] and setting location to road 8225
MO(3547,barbarians_soldier): [transfer]: starting task [movepath] and setting location to road 8726
MO(3583,frisians_builder): [transfer]: starting task [movepath] and setting location to road 9100
MO(4145,empire_brewer): [fetchfromflag]: back home
MO(6759,water): cancel_moving
MO(4419,barbarians_ox): [transfer]: set location to road 8637
MO(6557,water): cancel_moving
MO(7240,barbarians_carrier): [transfer]: starting task [movepath] and setting location to road 8708
MO(3530,frisians_soldier): [transfer]: starting task [movepath] and setting location to road 8227
MO(6808,frisians_carrier): [transfer]: set location to road 9076
MO(6883,log): cancel_moving
MO(6851,ax_sharp): cancel_moving
MO(5241,barbarians_builder): [transfer]: starting task [movepath] and setting location to road 8772
MO(3484,atlanteans_soldier): [transfer]: starting task [movepath] and setting location to road 8364
[Client] received NETCMD_INFO_DESYNC. Trying to salvage some information for debugging...

Read more...

Revision history for this message
GunChleoc (gunchleoc) wrote : Re: Crash in local multiplayer

This is a crash, not a desync. So, what we need to have is some log output or backtrace of the crashing client, preferably with the last autosave too.

summary: - desync in local multiplayer
+ Crash in local multiplayer
summary: - Crash in local multiplayer
+ Desync + crash in local multiplayer
Changed in widelands:
status: New → Confirmed
Revision history for this message
GunChleoc (gunchleoc) wrote :

Sorry, not quite awake yet. So, we have first a desync and when it's received a crash. That's probably an exception that would be caught in a release build.

Revision history for this message
Toni Förster (stonerl) wrote :

By release build you mean compiling with the switch release instead of debug? If so, thats what we're already doing...

Revision history for this message
Toni Förster (stonerl) wrote :

I think I failed to describe the bug properly, so I'll do a step be step description:

 1. For Windows we downloaded the x86-Release build [1]
 2. For Mac we build r8898 with the "release" switch
 3. The Mac hosts a LAN-game: 2 human players 3 AI-Players
 4. Windows joins the host.
 5. The game starts and lasts roughly 2,5h in-game hours.
 6. The Client disconnects from the server.
 7. The host auto-saves a crash-save.

 8. Mac hosts again and loads the crash-save file.
 9. Windows connects and receives the file from host.
10. Host starts the game and the game desyncs/crashes.
    See log in #2

11. Windows hosts the game uses the transferred crash-save.
12. Mac connects.
13. Windows starts the game.
14. Game desyncs/crashes.
    See log in #3

No log was created when the first desync/crash in step 7 happened.

When loading the game on either Windows or Mac and substituting the second human player with an AI player, the game runs without any problems.

The Crash-Save we used can be found in #2. I also attach the last auto-save before the crash with this comment.

[1] https://ci.appveyor.com/api/buildjobs/bhtprind9wbslqku/artifacts/Widelands-master-3962-Release-x64.exe

Revision history for this message
Toni Förster (stonerl) wrote :

We continued testing. We used the auto-save from #7 (wl_autosave_nethost_00.wgf) to start the game. Gametime was 2h27min, the crash happened around 2:37 the last time. This time 2:39. Before the desync happened, this time I manually saved and shortly after the desynced and crashed. I was prepared this time and logged everything.

Revision history for this message
Toni Förster (stonerl) wrote :
Revision history for this message
Toni Förster (stonerl) wrote :
Revision history for this message
Toni Förster (stonerl) wrote :

Another interesting note. I opened the last savefile, I created manually shortly before the desync/crash and we are able to continue the game...

Revision history for this message
Toni Förster (stonerl) wrote :

The desyncs/crashes are somehow correlating with the map size, it seems. The bigger the map, the earlier a desync is likely to happen.

Revision history for this message
Toni Förster (stonerl) wrote :

Just another addition. I'm adding the replay for #2 because this on lasts the longest. When I start the replay it desyncs at the 15 minutes mark, the replay should last 2:30 hours

REPLAY: Lost synchronization at time 910200
I have: 31cc2400c98213244f817257c9f77e1d
Replay has: 35834365bf28a8d3e50a880deb60cc09

Dindn't know this was possible

Revision history for this message
Toni Förster (stonerl) wrote :

One thing I don't understand is, why the output has mixed langugages?

The Windows client was playing with the german local:

MO(33329,empire_foresters_house): Bauarbeiter leaving

The same line one the mac host with english local:

MO(33329,empire_foresters_house): Builder leaving

All lines regarding the Builder leaving hat mixed locals on the windows machine.

Revision history for this message
Toni Förster (stonerl) wrote :

One Problems seems to be the naming of the ships. Before the desync; this is what the mac log says:

(17711,barbarians_ship): New ship: Odin

and Windows:

(17711,barbarians_ship): New ship: Dellingr

The ship is named different.

Revision history for this message
Toni Förster (stonerl) wrote :

Also it seems that the desync happened shortly after a game had been saved (manually and/or auto). The replay-desync happend at the autosave mark of 15 mins.

Revision history for this message
Toni Förster (stonerl) wrote :

Okay on of the problems is the definitely the ships' name. As soon as my first shipyard was build the game desynced/crashed.

This was the log on the mac:

MO(17845,barbarians_ship): New ship: Hnoss
[Host]: Requesting sync reports for time 9543841
MO(5673,fleet): Fleet::act
MO(17507,barbarians_carrier): [transfer]: starting task [movepath] and setting location to road 8508
[Host]: Client 0: Time 9543585
MO(17616,barbarians_carrier): [dropoff]: flag is overloaded
MO(17463,water): cancel_moving
MO(17018,empire_lumberjack): [transfer]: starting task [movepath] and setting location to road 8145
MO(17343,empire_donkey): [transfer]: starting task [movepath] and setting location to road 8088
MO(4004,barbarians_builder): [transfer]: starting task [movepath] and setting location to road 8401
MO(17356,barbarians_carrier): [transfer]: starting task [movepath] and setting location to road 8401
MO(16895,barley): cancel_moving
[Host]: Client 0: Time 9543841
[Host]: comparing syncreports for time 9543841
[Host]: lost synchronization with client 0!
I have: 87c43794a5ff22cd2c280389c0aa6399
Client has: ab308c5c760f137238aafb29cc6ea025
FATAL ERROR - game crashed. Attempting emergency save.

and this on the windows machine:

MO(17845,barbarians_ship): New ship: Volf
[Client]: sending timestamp: 9543585
MO(5673,fleet): Fleet::act
MO(17507,barbarians_carrier): [transfer]: starting task [movepath] and setting location to road 8508
MO(17616,barbarians_carrier): [dropoff]: flag is overloaded
MO(17463,water): cancel_moving
MO(17018,empire_lumberjack): [transfer]: starting task [movepath] and setting location to road 8145
MO(17343,empire_donkey): [transfer]: starting task [movepath] and setting location to road 8088
MO(4004,barbarians_builder): [transfer]: starting task [movepath] and setting location to road 8401
MO(17356,barbarians_carrier): [transfer]: starting task [movepath] and setting location to road 8401
MO(16895,barley): cancel_moving
MO(4062,atlanteans_fishbreeder): Breed(fish, 1)
MO(4062,atlanteans_fishbreeder): Successfully bred
[Client] received NETCMD_INFO_DESYNC. Trying to salvage some information for debugging.

Revision history for this message
Toni Förster (stonerl) wrote :

I'm going to split the findings in several single bugreports and reference this one here.

Revision history for this message
Teppo Mäenpää (kxq) wrote :

#15:
logic/player.h has "std::unordered_set<std::string> remaining_shipnames_;"

cplusplus.com says: "Internally, the elements in the unordered_set are not sorted in any particular order, but organized into buckets depending on their hash values to allow for fast access to individual elements directly by their values (with a constant average time complexity on average)."

Player::pick_shipname() iterates through the remaining_shipnames_ in a way which might need content order to be the same across hosts. Maybe the hashing details of unordered_set differ between various gcc and clang versions?

I do not have time to actually try this, at least right now. Sorry.

Revision history for this message
Notabilis (notabilis27) wrote :

With r8898 and r8934 I am able to load the attached savegames and two of the three are immediately resulting in a desync. With the patch from https://code.launchpad.net/~widelands-dev/widelands/bug-1811030-desync-ai/+merge/361689 applied I no longer receive this desyncs. So I guess this bug can be considered fixed now as well?

Revision history for this message
kaputtnik (franku) wrote :

Since all payed maps allow seafaring, i would also consider this as fixed.

Changed in widelands:
status: Confirmed → Fix Committed
Revision history for this message
GunChleoc (gunchleoc) wrote :

Fixed in build20-rc1

Changed in widelands:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.