Wrong encoding in savegame filenames

Bug #1530635 reported by Tino
This bug affects 1 person
Affects: widelands
Status: Won't Fix
Importance: Low
Assigned to: Unassigned
Milestone: none

Bug Description

On Windows, Widelands fails to write savegame filenames with the correct encoding.
To reproduce, just play the warfare tutorial with German language settings and have a look at the autosaves:

Kriegsführung (Besiege den feindlichen Stamm).wgf
Kriegsführung (Erweitere deine Festung zu einer Zitadelle).wgf
Kriegsführung (Reiße den nordwestlichen Wachposten ab).wgf
...

Or just save a game and enter special characters as the name:

äöü.wgf

Screenshot of directory and load dialog in widelands: http://imgur.com/a/aF4Nr
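
The garbled names above are consistent with UTF-8 bytes being interpreted one byte at a time in a single-byte code page. The following is a minimal sketch of that effect, not Widelands code: the function name `misread_as_latin1` is hypothetical, and Latin-1 is assumed as a simplification of the Windows ANSI code page (the two differ in the range 0x80 to 0x9F).

```cpp
#include <string>

// Hypothetical helper: treats every input byte as one Latin-1 character
// and re-encodes the result as UTF-8. Feeding it a UTF-8 string shows
// what that string looks like when it is misread as a single-byte encoding.
std::string misread_as_latin1(const std::string& bytes) {
    std::string out;
    for (unsigned char c : bytes) {
        if (c < 0x80) {
            // ASCII is identical in both encodings.
            out += static_cast<char>(c);
        } else {
            // Code points U+0080..U+00FF need two bytes in UTF-8.
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}
```

Under these assumptions, the two UTF-8 bytes of "ü" (0xC3 0xBC) come back as the two characters "Ã¼", which matches the pattern in the autosave names.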

Tags: windows
Hans Joachim Desserud (hjd) wrote :

Hm, interesting. I assume saving a map in the editor with special characters yields the same result?

(We've had a couple of other reports recently which might be somewhat related, see bug 1530124 and bug 1526916)

GunChleoc (gunchleoc) wrote :

I have just seen this too on r7662 for Windows, in the editor "Load map" screen with default map names - "Together we're strong" does not display the curly quote correctly. The same name is displayed correctly in the list when starting a new game.

GunChleoc (gunchleoc) wrote :

I have done some more testing. Seems like Windows needs converting to/from ANSI when creating/loading filenames.
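
A conversion of this kind could look roughly like the sketch below. It is an illustration only, not the Widelands file system code: the helper name `utf8_to_ansi` is hypothetical, and the sketch assumes the Win32 functions `MultiByteToWideChar` and `WideCharToMultiByte` are used for the two conversion steps. On non-Windows platforms it returns the input unchanged.

```cpp
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: converts a UTF-8 string to the active ANSI code
// page on Windows. Elsewhere the string is returned as-is.
std::string utf8_to_ansi(const std::string& utf8) {
#ifdef _WIN32
    if (utf8.empty()) {
        return std::string();
    }
    const int in_len = static_cast<int>(utf8.size());
    // Step 1: UTF-8 -> UTF-16. The first call only computes the length.
    const int wide_len =
        MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, nullptr, 0);
    if (wide_len <= 0) {
        return utf8;  // conversion failed, keep the raw bytes
    }
    std::wstring wide(static_cast<size_t>(wide_len), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), in_len, &wide[0], wide_len);
    // Step 2: UTF-16 -> active ANSI code page (CP_ACP).
    const int ansi_len = WideCharToMultiByte(
        CP_ACP, 0, wide.data(), wide_len, nullptr, 0, nullptr, nullptr);
    if (ansi_len <= 0) {
        return utf8;
    }
    std::string ansi(static_cast<size_t>(ansi_len), '\0');
    WideCharToMultiByte(
        CP_ACP, 0, wide.data(), wide_len, &ansi[0], ansi_len, nullptr, nullptr);
    return ansi;
#else
    return utf8;
#endif
}
```

Note that characters missing from the active code page are replaced by a default character in step 2, so this approach cannot represent every script on every system.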

Changed in widelands:
milestone: none → build19-rc1
GunChleoc (gunchleoc) wrote :
SirVer (sirver) wrote :

This is a very difficult problem - Rust (the language) uses an abstraction called an OsStr[1] to represent file names internally. If you want to convert a UTF-8 string into an OsStr or vice versa, Rust enforces error checking.

[1] https://doc.rust-lang.org/nightly/std/ffi/struct.OsStr.html

It is infeasible for us to do something along those lines in Widelands - the amount of work to get this right is too much. I only see three options:

1) Accept the wrong encoding/status quo and do nothing.
2) Purge non-ASCII characters from file names and replace them with _. Ugly, but functional and easy to implement.
3) Do not use the player-provided string as the filename. Instead use something we control (like for replays: 2016-08-07T09.43.38_single_player.wgf) and save the player-provided string as UTF-8 in the "preload" file that is in every WGF.

I think 3) is the right choice here: A gamer will never care about the actual filename of her saves since she only accesses them from inside the game anyways. But if we require a save (for example for debugging) the timestamp should be sufficient to identify the file. 3 is also implementable in a backwards compatible way after b19.

I think there is no minimally invasive fix for this bug, so I'd say we live with it for b19 and try implementing 3) after that. Thoughts?

Changed in widelands:
status: Confirmed → Incomplete
GunChleoc (gunchleoc) wrote :

Imagine an Arabic speaker typing in a filename in Arabic script, or a Russian speaker in Cyrillic script. This is what will happen:

1) Encoding is screwed up, resulting in total gibberish, but it will work.

2) Filenames can only be distinguished by their length, because everything will be replaced by _. Very bad.

3) The time and date information is already encoded in the computer's file system, so we have extra code for no real gain.
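
Point 2) can be illustrated with a small sketch. This is not code from Widelands: the function name `replace_non_ascii` is hypothetical, and the input is assumed to be valid UTF-8. It emits one underscore per non-ASCII character by replacing each UTF-8 lead byte and dropping the continuation bytes.

```cpp
#include <string>

// Hypothetical helper: replaces every non-ASCII character in a UTF-8
// string with a single underscore.
std::string replace_non_ascii(const std::string& utf8_name) {
    std::string out;
    for (unsigned char c : utf8_name) {
        if (c < 0x80) {
            out += static_cast<char>(c);  // plain ASCII, keep as-is
        } else if (c >= 0xC0) {
            out += '_';  // lead byte of a multi-byte character
        }
        // Continuation bytes (0x80..0xBF) are dropped, so each
        // character yields exactly one underscore.
    }
    return out;
}
```

With this sketch, two different six-letter Cyrillic names such as "Привет" and "Москва" both become "______", so they could no longer be told apart.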

I vote for 1) - it is the least of all evils IMO. We can still keep this bug around as low priority without milestone, in case anybody would enjoy cracking the problem.

We could also consider an option

4) Use a library like boost::filesystem, which would mean redoing our file system code. Big new bug potential, so definitely not for Build 19.

Changed in widelands:
milestone: build19-rc1 → none
status: Incomplete → Confirmed
SirVer (sirver) wrote :

I am voting for 1) for now too. I still think 3) is the best solution.

Using timestamps as filenames is not meant to encode the time this file was created, but to guarantee unique filenames and lexical sorting == time sorting. And we can guarantee UTF-8 encoding in our files - i.e. the in-game names of the saves.
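
The sorting property can be sketched as follows. This is an illustration under assumptions, not the replay naming code: the function name `timestamp_filename` is hypothetical, and the format string is inferred from the example name 2016-08-07T09.43.38_single_player.wgf given earlier in this thread.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: builds an ASCII-only filename from a broken-down
// time. All fields are zero-padded and ordered from most to least
// significant, so comparing two names as strings orders them by time.
std::string timestamp_filename(const std::tm& when, const std::string& suffix) {
    char buf[32];
    const std::size_t n =
        std::strftime(buf, sizeof(buf), "%Y-%m-%dT%H.%M.%S", &when);
    return std::string(buf, n) + "_" + suffix + ".wgf";
}
```

Because the name contains only ASCII characters, the encoding problem does not arise for the file itself. The name typed by the player would be stored separately inside the savegame.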

4) will not solve this issue. Boost does not contain code to convert std::string into valid os strings, so we will still have the same encoding issues for filenames as we have now.

GunChleoc (gunchleoc) wrote :
Changed in widelands:
status: Confirmed → Won't Fix