> OK, so let me explain what's going on a bit more explicitly. There are application programmers who are rewriting application files like this:
>
> 1.a) open and read file ~/.kde/foo/bar/baz
> 1.b) fd = open("~/.kde/foo/bar/baz", O_WRONLY|O_TRUNC|O_CREAT) --- this truncates the file
> 1.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
> 1.d) close(fd)
>
> Slightly more sophisticated application writers will do this:
>
> 2.a) open and read file ~/.kde/foo/bar/baz
> 2.b) fd = open("~/.kde/foo/bar/baz.new", O_WRONLY|O_TRUNC|O_CREAT)
> 2.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
> 2.d) close(fd)
> 2.e) rename("~/.kde/foo/bar/baz.new", "~/.kde/foo/bar/baz")
>
> What emacs (and very sophisticated, careful application writers) will do is this:
>
> 3.a) open and read file ~/.kde/foo/bar/baz
> 3.b) fd = open("~/.kde/foo/bar/baz.new", O_WRONLY|O_TRUNC|O_CREAT)
> 3.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
> 3.d) fsync(fd) --- and check the error return from the fsync
> 3.e) close(fd)
> 3.f) rename("~/.kde/foo/bar/baz", "~/.kde/foo/bar/baz~") --- this is optional
> 3.g) rename("~/.kde/foo/bar/baz.new", "~/.kde/foo/bar/baz")
>
> The fact that series (1) and (2) works at all is an accident. Ext3 in its default configuration happens to have the property that 5 seconds after (1) and (2) completes, the data is safely on disk. (3) is the ***only*** thing which is guaranteed not to lose data. For example, if you are using laptop mode, the 5 seconds is extended to 30 seconds.
Variant (1) is unsafe by design: data can be lost due to a software failure. But variant (2) is correct. Both the application developer and ext3 assume the following logic behind the scenes:

2.a) open and read file ~/.kde/foo/bar/baz
2.b) fd = open("~/.kde/foo/bar/baz.new", O_WRONLY|O_TRUNC|O_CREAT)
transaction_start(fd); // Hidden logic
2.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
2.d) close(fd)
2.e) rename("~/.kde/foo/bar/baz.new", "~/.kde/foo/bar/baz")
transaction_finish(fd); // Hidden logic

While ext4 and XFS assume the following logic:

2.a) open and read file ~/.kde/foo/bar/baz
2.b) fd = open("~/.kde/foo/bar/baz.new", O_WRONLY|O_TRUNC|O_CREAT)
2.c) write(fd, buf-of-new-contents-of-file, size-of-new-contents-of-file)
2.d) close(fd)
transaction_start(); // Hidden logic
2.e) rename("~/.kde/foo/bar/baz.new", "~/.kde/foo/bar/baz")
transaction_finish(); // Hidden logic

Because of that, the same problem might show up in many other areas. It cannot be fixed easily just by adding a call to fsync(fd) (which is not available in every programming language, BTW).

IMHO, ext4 should respect these hidden transactions, i.e., it should not reorder file and filesystem operations that come from the same process.
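
For reference, here is roughly what sequence (3) from the quoted message looks like as real C on a POSIX system. This is only a sketch: the helper name replace_file, the 0644 mode, and the fixed-size buffer for the ".new" name are my own assumptions, and the optional backup rename (3.f) is left out.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `len` bytes from `buf`
 * using steps 3.b-3.e and 3.g (the optional backup rename 3.f is skipped).
 * Returns 0 on success, -1 on failure with errno set. */
int replace_file(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    const char *p = buf;
    size_t left = len;
    int fd, saved;

    /* Build the temporary name "<path>.new". */
    int n = snprintf(tmp, sizeof tmp, "%s.new", path);
    if (n < 0 || (size_t)n >= sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = open(tmp, O_WRONLY | O_TRUNC | O_CREAT, 0644);    /* 3.b */
    if (fd < 0)
        return -1;

    while (left > 0) {                                      /* 3.c */
        ssize_t w = write(fd, p, left);
        if (w < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry */
            goto fail;
        }
        p += w;                     /* handle short writes */
        left -= (size_t)w;
    }

    if (fsync(fd) != 0)                                     /* 3.d */
        goto fail;

    if (close(fd) != 0) {                                   /* 3.e */
        fd = -1;                    /* do not close twice */
        goto fail;
    }
    fd = -1;

    if (rename(tmp, path) != 0)                             /* 3.g */
        goto fail;
    return 0;

fail:
    saved = errno;                  /* keep the original error */
    if (fd >= 0)
        close(fd);
    unlink(tmp);                    /* do not leave a stale .new file */
    errno = saved;
    return -1;
}
```

Calling replace_file("baz", data, size) either leaves the old "baz" untouched (on error) or renames a fully written and fsync'ed "baz.new" over it. Variant (2) is the same code with the fsync() check removed, which is exactly the step that application writers usually do not know they need.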