newly added files always trigger IN_MEMORY_MODIFIED

Bug #765881 reported by John A Meinel on 2011-04-19
A while back, Robert added a nice change to 'bzr status' so that we would not compute the 'sha1' of files which are newly added. The code looks like this:
        if (stat_value.st_mtime < self._cutoff_time
            and stat_value.st_ctime < self._cutoff_time
            and len(entry[1]) > 1
            and entry[1][1][0] != 'a'):
                # Could check for size changes for further optimised
                # avoidance of sha1's. However the most prominent case of
                # over-shaing is during initial add, which this catches.
            link_or_sha1 = self._sha1_file(abspath)
            entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
                           executable, packed_stat)
        else:
            entry[1][0] = ('f', '', stat_value.st_size,
                           executable, DirState.NULLSTAT)
    self._dirblock_state = DirState.IN_MEMORY_MODIFIED
    return link_or_sha1

However, for a newly added file this rewrites the entry with NULLSTAT on every run and then sets DirState.IN_MEMORY_MODIFIED, even when the stored entry is identical to what was already there.

We probably want a bit of an overhaul here anyway, since we shouldn't be caching these sha1 values, and instead should just be using the "it is the same/different as the last commit" logic. Regardless, this makes 'bzr status' in a tree with added files *always* rewrite the dirstate file, which is definitely bad.
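
To make the symptom concrete, here is a toy model of the behaviour described above. It is only a sketch: ToyDirState, its dirty flag, and the NULLSTAT stand-in are hypothetical names for illustration, not bzrlib's real classes. The entry layout just mimics the entry[1][0] tuples shown in the snippet.

```python
# Toy model only: ToyDirState and its attributes are hypothetical,
# not bzrlib's real DirState API.
NULLSTAT = 'x' * 32  # stand-in for DirState.NULLSTAT


class ToyDirState:
    def __init__(self):
        # Stands in for _dirblock_state == IN_MEMORY_MODIFIED.
        self.dirty = False
        # One added file: current-tree details first, then the basis
        # details, whose minikind 'a' means "absent in the basis".
        self.entry = [('', 'new.txt', 'file-id'),
                      [('f', '', 10, False, NULLSTAT),
                       ('a', '', 0, False, '')]]

    def update_entry(self, size, executable):
        # Added files skip the sha1 and store NULLSTAT instead ...
        self.entry[1][0] = ('f', '', size, executable, NULLSTAT)
        # ... but the state is marked modified unconditionally.
        self.dirty = True


state = ToyDirState()
before = state.entry[1][0]
state.update_entry(10, False)
print(state.entry[1][0] == before)  # stored details did not change
print(state.dirty)                  # yet the state is flagged for a rewrite
```

In this model the stored tuple is the same before and after the call, but the dirty flag is still set, which is the pattern that would force a rewrite on every status.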

John A Meinel (jameinel) wrote :

A possible patch: basically, just short-circuit out if we aren't actually changing the entry.

=== modified file 'bzrlib/_dirstate_helpers_pyx.pyx'
--- bzrlib/_dirstate_helpers_pyx.pyx 2010-08-27 18:02:22 +0000
+++ bzrlib/_dirstate_helpers_pyx.pyx 2011-04-19 12:58:26 +0000
@@ -916,8 +916,12 @@
             entry[1][0] = ('f', link_or_sha1, stat_value.st_size,
                            executable, packed_stat)
         else:
-            entry[1][0] = ('f', '', stat_value.st_size,
-                           executable, DirState.NULLSTAT)
+            new = ('f', '', stat_value.st_size, executable, DirState.NULLSTAT)
+            if entry[1][0] == new:
+                # We explicitly return early here, because we aren't changing
+                # anything, so we don't want to set IN_MEMORY_MODIFIED
+                return None
+            entry[1][0] = new
     elif minikind == c'd':
         link_or_sha1 = None
         entry[1][0] = ('d', '', 0, False, packed_stat)
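
The idea behind the patch can be sketched in isolation as "compare before you write". The helper below is a hypothetical illustration, not the function in _dirstate_helpers_pyx.pyx. It returns a flag instead of returning None early, so that a caller could decide whether to mark the state as modified.

```python
# Hypothetical helper for illustration; not part of bzrlib.
NULLSTAT = 'x' * 32  # stand-in for DirState.NULLSTAT


def update_added_file(entry, size, executable):
    """Store NULLSTAT details for an added file.

    Returns True only if the stored details actually changed, so the
    caller knows whether the state needs to be marked as modified.
    """
    new = ('f', '', size, executable, NULLSTAT)
    if entry[1][0] == new:
        # Nothing to update, so nothing needs saving.
        return False
    entry[1][0] = new
    return True


entry = [('', 'new.txt', 'file-id'),
         [('f', '', 10, False, NULLSTAT),
          ('a', '', 0, False, '')]]

unchanged = update_added_file(entry, 10, False)  # same details as stored
resized = update_added_file(entry, 25, False)    # size differs
print(unchanged, resized)
```

With this shape, a repeated status over an untouched added file reports no change, while a real change (here, a different size) still updates the entry.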

John A Meinel (jameinel) on 2011-04-27
Changed in bzr:
status: In Progress → Fix Released
milestone: none → 2.4b2
