Scenarios: Some files get removed on autosave

Bug #1753230 reported by Tomek Szczęsny
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
widelands
Fix Released
High
Unassigned

Bug Description

I guess I have fulfilled one of the mission's objectives, the screen moves near a fresh built tower, and game exits with no information.

dmesg returns one line:
[12782.205364] widelands[5368]: segfault at 0 ip 00000000008a6c9a sp 00007fffba74e1b0 error 4 in widelands[400000+6b7000]

Savegame included about one minute before crash.

EDIT: I play Widelands Build 19 (Release) on Linux Mint 17.3.

Tags: savegame

Related branches

Revision history for this message
Tomek Szczęsny (rageagainstthebath) wrote :
description: updated
Revision history for this message
Tomek Szczęsny (rageagainstthebath) wrote :

I ran Widelands with --verbose option, that reveals the source of a problem:

"
Fatal exception: [/build/widelands-H_4qMe/widelands-19-1/src/graphic/image_io.h:35] Image not found: map:princess.png
Game: Writing Preload Data ... Naruszenie ochrony pamięci
"

(The line in polish states: Memory protection violation)

Revision history for this message
kaputtnik (franku) wrote :

Hi Tomek, thanks for your bug report.

While i can reproduce your problem with the attached savegame, i have no idea why this happens. The file princess.png is used for message boxes when Jundlina speaks. Until the time of the savegame was made, she had 'spoken' some times to the player, e.g. at the beginning of the campaign. So it is curious that the problem didn't occur former.

My guess is that the savegame is broken somehow.

Can you load a previous saved game and play until the same point where the game crashes?

tags: added: savegame
Revision history for this message
Tomek Szczęsny (rageagainstthebath) wrote :

I did what you say and I succesfully expanded to that area from some autosave from earlier in the game. I got a dialog box about another task: building a warehouse and stables (which is where my provided savegame crashes).

I doubt the savegame is broken. The same thing happened to my savegame (provided), and last autosave before it.

I found out something very strange, however. If you load a very early savegame from this mission, or even start it from beginning, then immediately leave the game (without leaving Widelands) and load the provided savegame, the game WILL NOT crash!
However, this cannot be just any game running beforehand (I'm guessing it must contain princess.png).

The reason I stumbled upon this bug was that I took a break from a game, and saved it at this exact moment, and apparently it doesn't like being resumed from here.

My guess is that the game depends on that picture being somehow cached beforehand, and for any reason it is not when running a savegame.

I'll be happy to help if I can! :)

Revision history for this message
kaputtnik (franku) wrote :

> The reason I stumbled upon this bug was that I took a break from a game, and saved it at this exact moment,

Can you explain a bit more? What do you mean with "took a break"?

I also can provide a somehow broken save game for the atlantean campaign with r8613. Not sure how to reproduce it with specific steps. In the attached save game all text in message boxes are broken (falling back to old font renderer). Just explore the territory by building a tower at position 110,148 and wait until the message box appears.

Revision history for this message
GunChleoc (gunchleoc) wrote :

The "pics" folder is missing from both savegames.

Changed in widelands:
status: New → Confirmed
milestone: none → build20-rc1
importance: Undecided → High
Revision history for this message
kaputtnik (franku) wrote :

As said i am not sure, but the save game was made manually close before an automatic savegame regarding an objective was done. This was my idea: Make a savegame close to the same time when an objective triggers a savegame. But i couldn't reproduce it yet.

Revision history for this message
Tomek Szczęsny (rageagainstthebath) wrote :

> Can you explain a bit more? What do you mean with "took a break"?

I saved the game, left Widelands and resumed the next day. Shortly after I loaded the savegame, the game crashed for the first time. I tried last autosave with the same result, before I reported a bug.
This is why I believe the game has troubles merely resuming from this point.

Anyway, I used my trick to finish the game with no problems. I started a new game of "atl01", then immediately loaded my savegame, and it worked flawlessly.

> I also can provide a somehow broken save game for the atlantean campaign with r8613.

I could try testing and finding possible workaround for that savegame later today, if it's compatible with my version of Widelands. I'm hoping the same trick just might work in this case too. Did the problem happen before the game was saved?

> The "pics" folder is missing from both savegames.

I'll take a look and compare working and broken savegame dirs for obvious differences.

Also pardon my chaotic reporting habits, it's a new experience to me.

Revision history for this message
kaputtnik (franku) wrote :

You can't load my savegame, because at least one file will be missing.

> I started a new game of "atl01", then immediately loaded my savegame, and it worked flawlessly.

Well, that's strange, this trick works also with my saved game.

Revision history for this message
Tomek Szczęsny (rageagainstthebath) wrote :

I compared my buggy savefile with an early autosave that proved to be working before.

Files missing from crashing savefile:
/map/pics directory, like #6 pointed out
/map/scrpting/*.lua - all script files are missing.

The same applies to autosave from #5.

Supplying missing files from any savegame of the same mission solves the problem (since missing files are static content anyway).

I also found out that when I was continuing my game using "the trick", all following autosaves are corrupted too, in the same way.
However, continuing a repaired savegame yields repaired autosaves in the future.

Revision history for this message
Tomek Szczęsny (rageagainstthebath) wrote :

Okay, I found a way to create a corrupted savefile!

The missing files from buggy savegames are the only ones that rely on being copied from the current game ("egbase") filesystem. (map_saver.cc line 276 onwards)

So my guess was that the original files must be missing before they are saved in a new location.

Yup, you guessed it.. All you have to do is to save game on top of your current working savefile. Or just save it somewhere twice in a row to be sure.

One can also achieve it by setting an amount of autosaves to one.

Revision history for this message
kaputtnik (franku) wrote :

Didn't test it, but if i understand correctly the steps to trigger this bug is:

1. Save a normal (non scenario) game as 'test_save'
2. Run a scenario and save it manually to the same savegame ('test_save')

After doing this all other save games will be corrupt? Also autosave's?

Revision history for this message
Tomek Szczęsny (rageagainstthebath) wrote :

Not exactly. Here are the detailed procedures.

1. Load a scenario game 'test_save'
2. Save a scenario game in 'test_save'.
(no other save, including autosave, can happen between 1 and 2).

OR
1. Load or start any scenario game
2. Save your scenario game in a selected destination, like 'test_save',
3. Save your game again in the same file, 'test_save',
(no other save, including autosave, can happen between 2 and 3).

OR
1. Set an amount of autosaves to one in game settings
2. Load or start any scenario game
3. Wait for 2 consecutive autosaves to happen (no other saves allowed in between).
Autosave #00 was forced to rewrite itself.

In all three procedures, the last created savegame will be corrupt, and all following savegames will be broken as well (manual, autosave, probably those objective-triggered too but I haven't checked).

The corrupted savegame lacks pictures and scripts. The bug doesn't affect savegames not depending on scripts and pictures (or where the whole script content has already been processed). This is why there was no problem on regular maps.

The reason for that bug has to do with map saving procedure. According to the procedure, it simply copies script and image portion from current game map filesystem.
The curent map filesystem is the last one the game has been saved to.
Prior to overwriting an existing savegame, the destination folder is deleted.
If source folder and destination folder happen to be the same (overwriting last savegame), the source data will be deleted before it was copied to the destination folder.

game_main_menu_save_game.cc, line 180 says:
"
 g_fs->fs_unlink(filename_);
"
If I'm not mistaken, this is the moment when the target savefile is deleted, just after the player agreed to overwrite it in a dialog box.

save_handler.cc, line 63:
"
 g_fs->fs_unlink(filename_previous); // Delete last of the rolling files
"
This is a part of rolling autosave mechanism, this time even with a relevant comment. If there's only one autosave slot to roll through, it will be constantly overwritten and eventually lead to corrupted saves.

However, what puzzles me the most is.. why would a trick described in #8 work? I guess that's a completely separate bug.

Revision history for this message
GunChleoc (gunchleoc) wrote :

Yes, #8 looks like a bug too.

You seem to know something about programming - would you enjoy working on this bug yourself?

Revision history for this message
Tomek Szczęsny (rageagainstthebath) wrote :

I'll give it a try. My skills are not impressive and I never worked in collaborative software project, but everyone started somewhere.

Revision history for this message
GunChleoc (gunchleoc) wrote :

Great, thanks!

Our 2 important wiki pages for C++ development are:

https://wl.widelands.org/wiki/BzrPrimer/

https://wl.widelands.org/wiki/Building%20Widelands/

Let us know whenever you get stuck or need help :)

Revision history for this message
Tomek Szczęsny (rageagainstthebath) wrote :

Hey, thanks. I already did that. I created a branch locally and compiled it successfully.

I can't think of an elegant fix for this bug. Here are the options I've considered:

- Don't delete the savegame prior overwriting it, if the target savegame is the same as the current working one. But this might make the user end up with unnecessary files in the future (also, I don't know if existing procedures overwrite files by default)

- Make the game work on temporary savegame all the time, but I fell it's too much of a major change for a petty bug like that. Also, I think this might affect scenario development.

- Copy existing savegame somewhere and swap working filesystem with a temporary one. Then, after successful save, the game would swap back to last savegame. Delete temp savegame. Kinda convoluted, but essentially eliminates the source of a problem in all possible cases.

I'm open to any suggestions. Personally I'll see if I can make #2 or #3 work.

Revision history for this message
kaputtnik (franku) wrote :

I think the third option would be the best. But i am not a c++ developer, so i can't validate the effort... nevertheless it seems to be a clear solution to me.

How do you check for an successful save? Or what will be happen if the save isn't successful?

Revision history for this message
Tomek Szczęsny (rageagainstthebath) wrote :

I think that checking amount of files in a savegame is enough to recognize it as corrupted. Saved game should contain at least as many files as the original.

I looked into LUA test scripts too, and I think that creating a test for this bug will help prevent it in the future. I'll try to make one before attempting the fix.

Revision history for this message
GunChleoc (gunchleoc) wrote :

Option 3 sounds good to me too - we already do some swapping with a temporary file, so maybe all we need is to change that swapping to be more intelligent for this case.

Revision history for this message
Arty (artydent) wrote :
Download full text (6.4 KiB)

I guess Tomek didn't go through with this, so I was looking a little into this today.

For starters I can't reproduce the exact behaviour (tested both Release Build 19 as in a recent development build). I CAN reproduce somewhat similar things though, and I am pretty sure that the source of it is the same, but there are additional issue on Windows (regarding modifications of the "current save file" of a running game) which changes the scenarions where the main bug really occurs.

Based purely on game behaviour (haven't checked the code yet) I think there are three bugs:
- Bug 1: As already explained in posts above, Widelands doesn't handle it properly when it tries to save to the same file that is the "current save file" associated with a running game. Specifically, it deletes the save file content of the existing file to use it as new save file, but then tries to get data from the "current save file" (which was just deleted), resulting in corrupt save file (missing some data).
- Bug 2: In Windows, any modifications of this "current save file" are prevented, but Widelands doesn't realize or handle it.
- Bug 3: In Windows, reassociating the "current save file" to the game doesn't happen during saving. No idea why.

Those three things would explain the behaviours in the different scenarios and why it happens differently on Windows. I'll list the various scenarios (starting with the three mentioned in #13) with some explanations.

Scenario 1
1. Load a scenario game 'test_save'
2. Save a scenario game in 'test_save'.
(no other save, including autosave, can happen between 1 and 2).

A) reported result:
- Save file is corrupted (Bug 1).

B) my result on Windows:
- Game gives an error message (http://prntscr.com/kxtu1x), save file stays untouched.
- Presumably, deletion of the file content is prevented (Bug 2), then trying to add the "binary" dir in the zip file system fails because it still exists, so game gives an error message and aborts the saving.
- Also, it doesn't actually matter here if other saves happen inbetween (whether manual or autosave). Even after several other saves, the associated "current save file" of the game is still the one it was one originally loaded from (Bug 3).

Scenario 2
1. Load or start any scenario game
2. Save your scenario game in a selected destination, like 'test_save',
3. Save your game again in the same file, 'test_save',
(no other save, including autosave, can happen between 2 and 3).

A) reported result:
- Save file is corrupted (Bug 1).

B) my result on Windows:
- Works fine.
- Presumably, the associated "current save file" wasn't changed when first saving (Bug 3), so deleting/rewriting content went through without a hitch.

Scenario 3:
1. Set an amount of autosaves to one in game settings
2. Load or start any scenario game
3. Wait for 2 consecutive autosaves to happen (no other saves allowed in between).
Autosave #00 was forced to rewrite itself.

A) reported result:
- Save file is corrupted (Bug 1).

B) my result on Windows:
- Works fine. (Unless the game was loaded from the autosave, see next scenario.)
- Same explanation as in scenario 2.

Scenario 4
1. Set an amount of autosaves to one in game setting...

Read more...

kaputtnik (franku)
summary: - Game crashes in Atlantis tutorial mission
+ Scenarios: Some files get removed on autosave
Revision history for this message
Arty (artydent) wrote :

I've started properly working on it. I should have time this week to dive deeper into it and fix things.

Am currently just doing some "pre-work", i.e. cleaning up some filesystem stuff by throwing proper exceptions, and catching those file-related exceptions in the save handler to initiate retries (and a message to the player) instead of having the game crash.

After fixing those crashes, I'll upload a branch. Shall I make an extra bug report and branch for that or just keep it in the branch related to this bug report? (Not sure what your usual procedure is for this.)

Btw. during this stuff I also fixed another tiny bug that would have skipped the waiting period beween retries after a failed save. I'd just include that in the branch it that's okay. It fits in here (and is just moving one line of code up a few lines anyway).

Then I'll implement some save file swapping when the "current" save is gonna be overwritten, which will fix the main problem of ending up with corrupt save files (or just save failures on Windows).

After that - if I haven't stumbled upon it already - I'll investigate why the "current" save doesn't change properly on Windows and fix that.

Revision history for this message
GunChleoc (gunchleoc) wrote :

Hi Arty,

thanks for working on this. Please link your branch to this bug report. You can also work with multiple branches for each issue if you want, as long as they don't conflict with each other.

Note that we just merged a fix for rolling autosave into trunk yesterday, so make sure to merge trunk to avoid duplicating the stuff that's already fixed.

Changed in widelands:
assignee: nobody → Arty (artydent)
GunChleoc (gunchleoc)
Changed in widelands:
status: Confirmed → In Progress
Revision history for this message
Arty (artydent) wrote :

After some delay (got other stuff to do) I finally found time working on this, but it's going not as smoothly as expected.
- I confirmed that the bugs were caused by trying to rename/delete the save file that the game still had a handle on (filesystem of the current game's map pointing to the save file).
- This map's filesystem is only ever changed when loading a game, but not when saving. Which explains the behavour I saw, but contradicts the reported cases by Tomek. (Maybe this was different back then, or maybe Tomek's insistence on "no saves inbetween" were just guesswork.)
- I fixed this in the save_handler: Whenever the file that the map fs points to is supposed to be changed (saved to or autosave rolled), it gets temporarily renamed and the map fs reassigned. And I made sure that the map fs always points to the last save.
- The fix worked like a charm for scenarios, but it turns out that it broke games that allow replays. For those games, right after starting (new game or loading a save), the game is saved immediately in the replay dir, then reloaded from this save (not sure why). Which also means that before my fix, those games weren't even affected by those bugs because after reloading from the replay save the map fs would point to this one and stay that way forever, thus never being in conflict with regular saving. Now after the fix, saving in the replay dir already automatically sets the map fs to that replay save, but then the reloading from it fails.

I am not even 100% sure yet why this happens. It can't just be that the map fs already points to the file that is being loaded from. (When the map fs points to a regular save, then opening the save menu reads preload data from this save file whithout a problem, but in the reload attempt from a replay save, reading the preload data already fails.)

A dirty fix would be to particularly handle the replay save thing (e.g. simply not reassign the map fs when saving in the replay dir), but that would just hide the deeper problem. I guess if we had a quicksave/quickload feature there would be the same issue.

I am starting to think that just swapping map fs in some places might not be the proper approach, because there are apparently several places where some conflict arises. Maybe the game really should always work with some temporary save file (in its own dir) for the map fs. It would only change when starting/loading a game. Any opinions about that?

Revision history for this message
GunChleoc (gunchleoc) wrote :

I think that might work, go for it :)

Revision history for this message
Tomek Szczęsny (rageagainstthebath) wrote :

Hi! I'm sorry about myself looking like I abandoned the issue, but my attempts to fix it failed to work. I'm not a C++ coder, I was merely able to read the existing code.

I guess something must have changed in the code at some point, because I'm sure that during my original investigation I've seen filesystem swaps after each save, into the target savegames. This is why I successfully recreated this bug with all savegame-breaking procedures described in #13, at least on Linux.

Swapping filesystem into the most recent savefile is something I didn't find any reason for, but since it seems to not work on Windows (#21) yet remained unnoticed, I'd tell it held no particular purpose and can be discarded.

I agree that the cleanest solution to all load/save problems is to make the game work with its own temporary map copy and not ever "swap fs" while saving the game. If you think about it, right now one might load a game from a removable medium, like a USB drive, and unmount it during the game, causing all hell to break loose. Not that common scenario, but possible.

Like I said I'm not handy with C++, but I think the ultimate solution is to load the game into the memory, rather than read it from file system on demand. And use it exclusively for processing content and sourcing files for future saves.
Using a temporary savefile is a good compromise for the sake of programming effort, but please keep in mind clogging HDD with abandoned files when the game exits in an unexpected manner.

Anyway, I share the confidence that this solution will probably work just fine and provide stability benefits in the future. I wish I could be more helpful than this, but after my pathetic attempts at tinkering with the code, I guess I'll stick to testing from now on.

Revision history for this message
GunChleoc (gunchleoc) wrote :

Good testers are just as valuable as good programmers, so thank you for your analysis :)

If we always use the same filename for the temporary file and delete it when Widelands starts or exits, clogging the hard disk should not be a problem.

Revision history for this message
Arty (artydent) wrote :

Temp files for are implemented now. Branch uploaded.

I think it's a pretty clean solution. The active temp file is handled completely by EditorGameBase. In EditorGameBase::postload - if a map filesystem exists - the map is saved to a new temp file, and the map filesystem is reassigned. The temp file is deleted in EditorGameBase::cleanup_for_load and the destructor.

This way, any map loaded in the editor or for a game or replay is automatically dealt with. No need to ever take special care of map fs conflicts when handling map/save/replay files ingame. In fact, map fs swapping for an already loaded editor or game map should even be avoided elsewhere, because it would circumvent the temp file solution, thus reintroducing the problems.

For that reason, the old map fs swapping in MainMenuSaveMap::save_map is also removed now. I actually replaced the whole "save to tmp file first, then replace old file" approach by a more straightforward "delete old file, save map".

Strangely, EditorGameBase::postload was never called when loading a map in the editor (only when generating a map), which seems to have been a bug on its own. I fixed that, EditorInteractive::load calls it now.

Btw, the basic filesystem stuff I mentioned I was doing in preparation for this bugfix turned out to be independent of this, so I kept it out of this bugfix branch and will make another branch for that stuff (where I'll keep adding more filesystem-related error handling where it's missing).

Revision history for this message
GunChleoc (gunchleoc) wrote :

> I actually replaced the whole "save to tmp file first, then replace old file" approach by a more straightforward "delete old file, save map".

The reason that was there was in case something went wrong during the save, so at there will be a backup that can be restored. Maybe "rename old file, save map, if error: rename old file back else: delete renamed old file"?

If you think this branch is ready, please create a merge request :)

Revision history for this message
Arty (artydent) wrote :

merge requested

As for the overwriting approach, I think the way it was before wasn't good, because it potentially puts the user in unwanted or at least uncertain positions.
 User perspective:
 - Why does the game tell about <filname.tmp>, I just wanted to save under <filename>?
 - The game said the save failed, but I still have <filname> here. Ist it just the old one? Ist it corrupt now?
It was also kind of the wrong way around: making a proper save, then deleting it again if the old file couldn't be deleted.

Not sure if it's so important to keep the old file at all in case of save failure, but the "rename old file, save map, if error: rename old file back else: delete renamed old file" is definitely better. More precisely I'd say:
- rename the old file (possibly even try a few alternative filenames if files exist)
- if renaming fails, just tell the player that the old file couldn't be removed.
- save map or game or whatever (I think we should generally use the same approach for any user-initiated save files).
- if it fails, delete file from botched attempt (if it exists), rename old file back (unlikely to fail at this point but catch this anyway and tell the player about the backup), tell the player not only that the save failed, but that the old file was restored.
- if successful, delete backup
- if backup deletion fails, ignore it and don't bother the player with it (but potentially have a cleanup routine to try deleting such files during application start)

I have looked a lot through all the standard saving stuff recently and am pretty familiar with it now, so I won't mind implementing this.

Revision history for this message
GunChleoc (gunchleoc) wrote :

Yes, this sounds like a plan!

Maybe create a new branch off the branch that's already up for review? This will make the diffs smaller.

I'd like to have this in Build 20.

Revision history for this message
Arty (artydent) wrote :

Saving is generally robust and consistent now.

I added a generic Save Handler class that deals with all the stuff around the actual saving, i.e., temporary backups and handling any file errors. It's invoked whenever maps or games are saved.

It's a sub-sub-branch of the temp file branch. (There is one branch inbetween that actually throws proper filesystem errors. Gonna branch off there to catch some more fs-related errors in other places.)

Revision history for this message
GunChleoc (gunchleoc) wrote :

Did you see my comments in the merge request?

Revision history for this message
Arty (artydent) wrote :

Oh, I didn't see those comments. Before you just pointed it out, I didn't even realize that there are comments at all or how to find them. I found them now, and will answer there. (Sorry, I am still new to Launchpad and don't know my way around yet.)

Revision history for this message
GunChleoc (gunchleoc) wrote :

No problem :)

After the first branch is in, we can then merge trunk into your other branches to make the diffs smaller and the review easier.

GunChleoc (gunchleoc)
Changed in widelands:
status: In Progress → Fix Committed
assignee: Arty (artydent) → nobody
Revision history for this message
GunChleoc (gunchleoc) wrote :

Fixed in build20-rc1

Changed in widelands:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.