.stale directory was removed after cluster format

Bug #1402887 reported by Saeki Masaki
Affects    Status  Importance  Assigned to  Milestone
sheepdog   New     High        Unassigned

Bug Description

The reproduction procedure is as follows.
The problem does not occur every time; whether it occurs depends on timing.
----------------------------------------------
1. start sheepdog cluster
2. check .stale directory
3. format cluster
4. recheck .stale directory
----------------------------------------------

# sheep -p 7000 -z 0 -l dir=/var/log/sheep0 /var/lib/sheepdog/data0
# sheep -p 7001 -z 1 -l dir=/var/log/sheep1 /var/lib/sheepdog/data1
# sheep -p 7002 -z 2 -l dir=/var/log/sheep2 /var/lib/sheepdog/data2

# ls -la /var/lib/sheepdog/data*/obj/
/var/lib/sheepdog/data0/obj/:
total 0
drwxr-x--- 3 root root 19 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .stale

/var/lib/sheepdog/data1/obj/:
total 0
drwxr-x--- 3 root root 19 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .stale

/var/lib/sheepdog/data2/obj/:
total 0
drwxr-x--- 3 root root 19 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .stale

# dog cluster format
using backend plain store

# ls -la /var/lib/sheepdog/data*/obj/
/var/lib/sheepdog/data0/obj/:
total 0
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..

/var/lib/sheepdog/data1/obj/:
total 0
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..

/var/lib/sheepdog/data2/obj/:
total 0
drwxr-x--- 2 root root 6 Dec 16 10:52 2014 .
drwxr-x--- 4 root root 63 Dec 16 10:52 2014 ..
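
Steps 2 and 4 (checking for .stale before and after the format) can be automated instead of reading the ls output by eye. The following is a minimal sketch in C, assuming the <store>/obj/.stale layout shown in the listings above. The function name stale_dir_exists is hypothetical and is not part of sheepdog.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Return true if <store_path>/obj/.stale exists and is a directory.
 * The obj/.stale layout is taken from the listings in this report;
 * this helper is illustrative only and does not exist in sheepdog.
 */
bool stale_dir_exists(const char *store_path)
{
	char p[4096];
	struct stat st;
	int n;

	assert(store_path != NULL);

	n = snprintf(p, sizeof(p), "%s/obj/.stale", store_path);
	if (n < 0 || (size_t)n >= sizeof(p))
		return false;	/* path does not fit, treat as missing */

	return stat(p, &st) == 0 && S_ISDIR(st.st_mode);
}
```

Calling stale_dir_exists("/var/lib/sheepdog/data0") once before and once after "dog cluster format" would be expected to return true and then false when the problem is reproduced.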

----------------------------------------------
I think this is a race condition in cluster format.
I tested with embedded debugging statements.

Here is the debugging patch.

# git diff
diff --git a/lib/util.c b/lib/util.c
index 21e0143..90b4f66 100644
--- a/lib/util.c
+++ b/lib/util.c
@@ -485,6 +485,7 @@ int rmdir_r(const char *dir_path)
        ret = purge_directory(dir_path);
        if (ret == 0)
                ret = rmdir(dir_path);
+       sd_notice("rmdir %s", dir_path);

        return ret;
 }
diff --git a/sheep/plain_store.c b/sheep/plain_store.c
index 876582c..97c5078 100644
--- a/sheep/plain_store.c
+++ b/sheep/plain_store.c
@@ -230,6 +230,7 @@ static int make_stale_dir(const char *path)
        char p[PATH_MAX];

        snprintf(p, PATH_MAX, "%s/.stale", path);
+       sd_notice("mkdir %s", p);
        if (xmkdir(p, sd_def_dmode) < 0) {
                sd_err("%s failed, %m", p);
                return SD_RES_EIO;

--------------------------
The log messages are below.

# grep NOTICE /var/log/sheep*/sheep.log
/var/log/sheep0/sheep.log:Dec 16 10:52:15 NOTICE [main] nfs_init(607) nfs server service is not compiled
/var/log/sheep0/sheep.log:Dec 16 10:52:59 NOTICE [main] make_stale_dir(233) mkdir /var/lib/sheepdog/data0/obj/.stale
/var/log/sheep0/sheep.log:Dec 16 10:52:59 NOTICE [util] rmdir_r(488) rmdir /var/lib/sheepdog/data0/obj/.stale

/var/log/sheep1/sheep.log:Dec 16 10:52:15 NOTICE [main] nfs_init(607) nfs server service is not compiled
/var/log/sheep1/sheep.log:Dec 16 10:52:59 NOTICE [main] make_stale_dir(233) mkdir /var/lib/sheepdog/data1/obj/.stale
/var/log/sheep1/sheep.log:Dec 16 10:52:59 NOTICE [util] rmdir_r(488) rmdir /var/lib/sheepdog/data1/obj/.stale

/var/log/sheep2/sheep.log:Dec 16 10:52:15 NOTICE [main] nfs_init(607) nfs server service is not compiled
/var/log/sheep2/sheep.log:Dec 16 10:52:59 NOTICE [main] make_stale_dir(233) mkdir /var/lib/sheepdog/data2/obj/.stale
/var/log/sheep2/sheep.log:Dec 16 10:52:59 NOTICE [util] rmdir_r(488) rmdir /var/lib/sheepdog/data2/obj/.stale

Tags: v0.9.0
Changed in sheepdog-project:
importance: Undecided → High