Disable recovery when there's not enough space

Bug #1377402 reported by sirio81
This bug affects 1 person
Affects   Status  Importance  Assigned to  Milestone
sheepdog  New     Critical    Unassigned

Bug Description

This is the simplest example: a cluster with 3 nodes and --copies 2.
All nodes are at about 80-90% used space.
When I kill a node, the cluster tries to re-replicate the copies that were on the lost node, but there is obviously not enough space.
I think sheepdog should behave like this:
- as soon as there is not enough space on the cluster to re-replicate the data of any one node, recovery has to be disabled.
- if a node dies, the cluster is still able to work, but it has to show a 'degraded' state in dog cluster info.
  (This is similar to mdadm showing 'clean,degraded' when a disk is missing)
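
The rule proposed above could be sketched roughly as follows. This is only an illustration in Python, not sheepdog code: the names (Nodes, unsafe_losses, recovery_should_be_enabled) are hypothetical, and it makes the simplifying assumption that everything stored on a lost node has to fit into the free space of the remaining nodes. The figures are the ones from the dog node info output in this report.

```python
from typing import Dict, List, Tuple

# Hypothetical model: node id -> (size_gb, used_gb). Not a sheepdog structure.
Nodes = Dict[int, Tuple[float, float]]


def unsafe_losses(nodes: Nodes) -> List[int]:
    """Return the ids of nodes whose loss the survivors could not absorb.

    Simplifying assumption: all data stored on the lost node must be
    re-replicated into the free space of the remaining nodes.
    """
    unsafe = []
    for lost_id, (_, lost_used) in nodes.items():
        free_elsewhere = sum(
            size - used
            for node_id, (size, used) in nodes.items()
            if node_id != lost_id
        )
        if lost_used > free_elsewhere:
            unsafe.append(lost_id)
    return unsafe


def recovery_should_be_enabled(nodes: Nodes) -> bool:
    # Proposed rule: disable recovery as soon as any single loss is unsafe.
    return not unsafe_losses(nodes)


# Size and Used per node in GB, taken from `dog node info` in this report.
cluster = {0: (4.6, 4.1), 1: (5.0, 3.8), 2: (5.0, 4.1)}
print(unsafe_losses(cluster))               # [0, 1, 2]
print(recovery_should_be_enabled(cluster))  # False
```

With these figures the loss of any one of the three nodes cannot be absorbed, so under the proposed rule recovery would already be disabled before the node is killed.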

dog node info
Id Size Used Avail Use%
 0 4.6 GB 4.1 GB 479 MB 89%
 1 5.0 GB 3.8 GB 1.1 GB 77%
 2 5.0 GB 4.1 GB 894 MB 82%
Total 15 GB 12 GB 2.5 GB 83%

df -h /mnt/sheep/0
/dev/sda6 4,7G 4,2G 479M 90% /mnt/sheep/0

dog cluster info
Cluster status: running, auto-recovery enabled
Cluster created at Sat Oct 4 10:34:30 2014
Epoch Time Version
2014-10-04 10:34:30 1 [192.168.10.4:7000, 192.168.10.5:7000, 192.168.10.6:7000]
dog cluster info -v
Cluster status: running, auto-recovery enabled
Cluster store: plain with 2 redundancy policy
Cluster vnode mode: node
Cluster created at Sat Oct 4 10:34:30 2014

dog node kill 2

dog node info
Id Size Used Avail Use%
 0 4.6 GB 4.6 GB 2.7 MB 99%
 1 5.0 GB 5.0 GB 1.5 MB 99%
Total 9.6 GB 9.6 GB 4.2 MB 99%
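
A rough estimate of why both remaining nodes fill up, using only the figures in this report. It is a sketch under two assumptions that the report does not confirm: the Used column is entirely object replicas, and every object has exactly 2 replicas.

```python
COPIES = 2  # from "--copies 2" in this report

# Used space per node before the kill, in GB (first `dog node info`).
used_before = {0: 4.1, 1: 3.8, 2: 4.1}
# Sizes of the surviving nodes, in GB (second `dog node info`).
size_after = {0: 4.6, 1: 5.0}

# Assumption: all used space is replicas, so unique data = total / COPIES.
replica_total = sum(used_before.values())
unique_data = replica_total / COPIES

# With 2 survivors and 2 copies, each survivor has to hold every object.
copies_kept = min(COPIES, len(size_after))
needed_per_node = unique_data * copies_kept / len(size_after)

# Positive values mean the node is too small for a full recovery.
shortfall = {
    node_id: round(needed_per_node - size, 1)
    for node_id, size in size_after.items()
}
print(f"needed per node: {needed_per_node:.1f} GB")
print(f"shortfall per node (GB): {shortfall}")
```

Under these assumptions each surviving node would need about 6 GB, which is more than either of them has, so recovery can only end with both nodes full.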

/var/lib/sheepdog/sheep.log
Oct 04 10:37:39 ERROR [rw 4593] prealloc(385) failed to preallocate space, No space left on device
Oct 04 10:37:39 ERROR [rw 4593] err_to_sderr(108) diskfull, oid=fd38150000005b
Oct 04 10:37:39 ALERT [rw 4593] recover_replication_object(404) cannot access any replicas of fd38150000005b at epoch 1
Oct 04 10:37:39 ALERT [rw 4593] recover_replication_object(405) clients may see old data
Oct 04 10:37:39 ERROR [rw 4593] recover_replication_object(412) can not recover oid fd38150000005b
Oct 04 10:37:39 ERROR [rw 4593] recover_object_work(576) failed to recover object fd38150000005b

dog vdi check
Server has no space for new objects

Sheepdog daemon version 0.8.0_353_g4d282d3

Changed in sheepdog-project:
importance: Undecided → Critical
Changed in sheepdog-project:
milestone: none → v1.0