Recovery with EC fails
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
sheepdog | New | Critical | Unassigned | |
Bug Description
The cluster has four nodes (columns: Id, Host:Port, V-Nodes, Zone):
0 192.168.2.44:7000 3 738371776
1 192.168.2.45:7000 3 755148992
2 192.168.2.46:7000 3 771926208
3 192.168.2.47:7000 3 788703424
1) I create a test disk
dog vdi create -P test 10G
2) I write to it continuously, including during recovery.
Note: because my CPU is slow and I'm reading from /dev/urandom, the write rate stays below 3 MB/s.
dd if=/dev/urandom bs=1M count=2048 | dog vdi write test
3) I kill the 4th node with 'dog node kill 3'. I get many warning messages, but the recovery completes successfully:
Jul 15 10:16:26 WARN [rw 6866] read_erasure_
Jul 15 10:16:26 WARN [rw 6867] read_erasure_
Jul 15 10:16:26 WARN [rw 6865] read_erasure_
4) I kill the 3rd node with 'dog node kill 2'. I get the same warning messages, and the recovery completes successfully:
Jul 15 10:37:37 WARN [rw 6832] read_erasure_
Jul 15 10:37:37 WARN [rw 6867] read_erasure_
Jul 15 10:37:37 WARN [rw 6865] read_erasure_
5) I add node id 2 back (note that I first "clean" the node with 'rm -r /var/lib/sheepdog; rm -r /mnt/sheep/0').
During recovery, I get 'object not found' messages:
Jul 15 10:39:05 WARN [rw 6867] read_erasure_
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_
Jul 15 10:39:05 WARN [rw 6865] sheep_exec_
Jul 15 10:39:05 WARN [rw 6865] sheep_exec_
Jul 15 10:39:05 WARN [rw 6865] sheep_exec_
Jul 15 10:39:05 WARN [rw 6865] sheep_exec_
6) I add node id 3 back:
Jul 15 10:38:28 NOTICE [main] cluster_
Jul 15 10:39:05 INFO [main] local_vdi_
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_
Jul 15 10:39:05 WARN [rw 6832] sheep_exec_
Jul 15 10:39:05 WARN [rw 6832] read_erasure_
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_
Jul 15 10:39:05 WARN [rw 6867] sheep_exec_
...
Jul 15 10:40:26 WARN [rw 6832] sheep_exec_
Jul 15 10:40:57 ERROR [main] check_request_
Jul 15 10:40:57 ERROR [main] check_request_
Jul 15 10:40:57 ERROR [main] check_request_
Jul 15 10:40:57 INFO [main] local_vdi_
Jul 15 10:40:58 INFO [main] recover_
Jul 15 10:40:59 INFO [main] recover_
...
Jul 15 10:41:25 INFO [main] recover_
Jul 15 10:41:26 INFO [main] recover_
Jul 15 10:41:26 ERROR [rw 6832] err_to_sderr(74) diskfull, oid=7c2b250000057d
Jul 15 10:41:26 ERROR [rw 6832] recover_
Jul 15 10:41:26 ERROR [rw 6986] err_to_sderr(74) diskfull, oid=7c2b250000057e
Jul 15 10:41:26 ERROR [rw 6986] recover_
Jul 15 10:41:26 ERROR [rw 6865] err_to_sderr(74) diskfull, oid=7c2b250000057f
Jul 15 10:41:26 ERROR [rw 6865] recover_
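When comparing runs of the steps above, it may help to condense the sheep log into a few numbers. The following is only a minimal sketch, assuming the log keeps the layout shown in this report (month, day, time, level, then the message) and that failing objects appear as "oid=..." on lines containing "diskfull". The function name summarize_sheep_log is hypothetical and not part of sheepdog.

```shell
#!/bin/sh
# summarize_sheep_log (hypothetical helper, not part of sheepdog):
# counts log lines per level and lists the object IDs reported as "diskfull".
# Reads the files given as arguments, or stdin when no file is given.
summarize_sheep_log() {
    awk '
        # Lines look like "Jul 15 10:41:26 ERROR [rw 6832] ...",
        # so the 4th whitespace-separated field is the level.
        $4 ~ /^(ERROR|WARN|NOTICE|INFO)$/ { count[$4]++ }

        # Collect the oid of every line that mentions diskfull.
        /diskfull/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^oid=/) {
                    o = $i
                    sub(/^oid=/, "", o)
                    oids = oids " " o
                }
            }
        }

        END {
            # Print levels in a fixed order; missing levels show as 0.
            n = split("ERROR WARN NOTICE INFO", levels, " ")
            for (i = 1; i <= n; i++)
                print levels[i], count[levels[i]] + 0
            print "diskfull_oids" oids
        }
    ' "$@"
}
```

It could be called as 'summarize_sheep_log /var/lib/sheepdog/sheep.log', where the log path is an assumption based on the store directory mentioned in step 5 and should be adjusted to the local setup.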
Changed in sheepdog-project:
importance: Undecided → Critical
Maybe this commit fixes the bug?
https://github.com/sheepdog-ng/sheepdog-ng/commit/4c58191cb6360d3f3191a5efd021cb2f17dd021e