Disable recovery when there's not enough space

Bug #1377402 reported by sirio81
This bug affects 1 person
Affects: sheepdog
Importance: Critical
Assigned to: Unassigned

Bug Description

This is the simplest example: a cluster with 3 nodes and --copies 2.
All nodes have about 80-90% of their space used.
When I kill a node, the cluster tries to replicate the copies that were on the lost node, but there's obviously not enough space.
I think sheepdog should behave like this:
- as soon as there's not enough space on the cluster to re-replicate the data of any single lost node, recovery has to be disabled.
- if a node dies, the cluster is still able to work, but it has to show a 'degraded' state in dog cluster info.
  (This is similar to mdadm showing 'clean, degraded' when a disk is missing.)
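
The check proposed in the first point could be sketched roughly as follows. This is only an illustration, not sheepdog code: the function names are hypothetical, the sizes are the `dog node info` figures from this report rounded to MB, and the rule is a simplification (the data used on a lost node must fit into the free space of the surviving nodes).

```python
# Hypothetical sketch of the proposed "is recovery safe?" check.
# Not part of sheepdog; names and the rule itself are assumptions.

def can_recover_after_loss(nodes, lost_id):
    """Return True if the space used on `lost_id` fits into the
    free space of the remaining nodes.

    nodes: dict mapping node id -> (used_mb, avail_mb)
    """
    used_on_lost = nodes[lost_id][0]
    free_elsewhere = sum(
        avail for node_id, (_used, avail) in nodes.items()
        if node_id != lost_id
    )
    return free_elsewhere >= used_on_lost


def recovery_safe(nodes):
    """Recovery stays enabled only if the loss of ANY single node
    could be re-replicated onto the others."""
    return all(can_recover_after_loss(nodes, node_id) for node_id in nodes)


# Approximate figures from the `dog node info` output in this report (MB).
nodes = {
    0: (4100, 479),
    1: (3800, 1100),
    2: (4100, 894),
}

if recovery_safe(nodes):
    print("recovery can stay enabled")
else:
    print("not enough space: recovery should be disabled")
```

With these figures, losing node 2 would need about 4.1 GB of free space, but nodes 0 and 1 together have only about 1.6 GB available, so the check reports that recovery should be disabled.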

dog node info
Id Size Used Avail Use%
 0 4.6 GB 4.1 GB 479 MB 89%
 1 5.0 GB 3.8 GB 1.1 GB 77%
 2 5.0 GB 4.1 GB 894 MB 82%
Total 15 GB 12 GB 2.5 GB 83%

df -h /mnt/sheep/0
/dev/sda6 4,7G 4,2G 479M 90% /mnt/sheep/0

dog cluster info
Cluster status: running, auto-recovery enabled
Cluster created at Sat Oct 4 10:34:30 2014
Epoch Time Version
2014-10-04 10:34:30 1 [192.168.10.4:7000, 192.168.10.5:7000, 192.168.10.6:7000]
root@test004:~# dog cluster info -v
Cluster status: running, auto-recovery enabled
Cluster store: plain with 2 redundancy policy
Cluster vnode mode: node
Cluster created at Sat Oct 4 10:34:30 2014

dog node kill 2

dog node info
Id Size Used Avail Use%
 0 4.6 GB 4.6 GB 2.7 MB 99%
 1 5.0 GB 5.0 GB 1.5 MB 99%
Total 9.6 GB 9.6 GB 4.2 MB 99%

/var/lib/sheepdog/sheep.log
Oct 04 10:37:39 ERROR [rw 4593] prealloc(385) failed to preallocate space, No space left on device
Oct 04 10:37:39 ERROR [rw 4593] err_to_sderr(108) diskfull, oid=fd38150000005b
Oct 04 10:37:39 ALERT [rw 4593] recover_replication_object(404) cannot access any replicas of fd38150000005b at epoch 1
Oct 04 10:37:39 ALERT [rw 4593] recover_replication_object(405) clients may see old data
Oct 04 10:37:39 ERROR [rw 4593] recover_replication_object(412) can not recover oid fd38150000005b
Oct 04 10:37:39 ERROR [rw 4593] recover_object_work(576) failed to recover object fd38150000005b

dog vdi check
Server has no space for new objects

Sheepdog daemon version 0.8.0_353_g4d282d3

Changed in sheepdog-project:
importance: Undecided → Critical
Changed in sheepdog-project:
milestone: none → v1.0