A node does not leave the cluster when it is stopped
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
sheepdog | Fix Committed | Undecided | Unassigned |
Bug Description
I found a problem: when one of the nodes is stopped while recovery is running, that node does not leave the cluster.
I reproduced the problem and collected a sheepdog debug log.
It was reproduced with continuous writes from 2 nodes.
Run this on one node:
[root@13EV0104 ~]# dd if=/dev/zero |collie vdi write test1-100G
Run this on the other node:
[root@13EV0105 ~]# dd if=/dev/zero |collie vdi write test2-100G
Version
Corosync 2.3
sheepdog 0.7.6
When one node was stopped, all of the other nodes got the corosync callback.
The log shows 'cdrv_cpg_confchg':
Sep 11 21:20:43 DEBUG [main] client_handler(788) 1, rx 2, tx 3
Sep 11 21:20:43 DEBUG [main] client_handler(788) 1, rx 2, tx 3
Sep 11 21:20:43 DEBUG [main] finish_rx(590) 31, 10.0.0.14:41962
Sep 11 21:20:43 DEBUG [main] queue_request(347) WRITE_PEER, 1
Sep 11 21:20:43 DEBUG [io 1660] do_process_
Sep 11 21:20:43 DEBUG [io 1660] md_get_
Sep 11 21:20:43 DEBUG [main] client_handler(788) 4, rx 0, tx 3
Sep 11 21:20:43 DEBUG [main] finish_tx(677) connection from: 31, 10.0.0.14:41962
Sep 11 21:20:43 DEBUG [main] cdrv_cpg_
Sep 11 21:20:43 DEBUG [main] __corosync_
Sep 11 21:20:44 DEBUG [main] client_handler(788) 19, rx 0, tx 3
Sep 11 21:20:44 DEBUG [main] clear_client_
Sep 11 21:20:44 DEBUG [main] clear_client_
Sep 11 21:20:44 DEBUG [main] destroy_client(707) connection from: 10.0.0.10:44854
Sep 11 21:20:47 DEBUG [main] listen_handler(847) accepted a new connection: 23
Sep 11 21:20:47 DEBUG [main] client_handler(788) 1, rx 0, tx 0
Sep 11 21:20:47 DEBUG [main] finish_rx(590) 23, 127.0.0.1:53491
Sep 11 21:20:47 DEBUG [main] queue_request(347) GET_NODE_LIST, 1
But the epoch was not updated.
Then I tried restarting the stopped node.
sd_leave_handler was called when the cdrv_cpg_confchg joined callback arrived.
It seems that there is COROSYNC_
Sep 11 21:27:40 DEBUG [main] destroy_client(707) connection from: 127.0.0.1:53502
Sep 11 21:28:15 DEBUG [main] cdrv_cpg_
Sep 11 21:28:15 DEBUG [main] sd_leave_
Sep 11 21:28:15 DEBUG [main] sd_leave_
Sep 11 21:28:15 DEBUG [main] sd_leave_
Sep 11 21:28:15 DEBUG [main] sd_leave_
Sep 11 21:28:15 DEBUG [main] sd_leave_
Sep 11 21:28:15 DEBUG [main] sd_leave_
Sep 11 21:28:15 DEBUG [main] sd_leave_
Sep 11 21:28:15 DEBUG [main] sd_leave_
Sep 11 21:28:15 DEBUG [main] sd_leave_
Sep 11 21:28:15 DEBUG [main] recalculate_
Sep 11 21:28:15 DEBUG [main] recalculate_
Sep 11 21:28:15 DEBUG [main] recalculate_
Sep 11 21:28:15 DEBUG [main] recalculate_
Sep 11 21:28:15 DEBUG [main] recalculate_
Sep 11 21:28:15 DEBUG [main] recalculate_
Sep 11 21:28:15 DEBUG [main] recalculate_
Sep 11 21:28:15 DEBUG [main] recalculate_
Sep 11 21:28:15 DEBUG [main] update_
Sep 11 21:28:15 DEBUG [rw] prepare_
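The gap between the first corosync callback and the leave handling can be measured from the two excerpts above. The following is a minimal Python sketch, not part of sheepdog. It assumes that the truncated prefixes `cdrv_cpg_` and `sd_leave_` identify the relevant lines, that the year is 2014 (as in the `collie cluster info` timestamps), and that month names are parsed in an English locale. `first_seen` is a hypothetical helper.

```python
import re
from datetime import datetime

# A few lines copied from the excerpts above (function names are truncated
# in the report and are kept truncated here).
LOG = """\
Sep 11 21:20:43 DEBUG [main] finish_tx(677) connection from: 31, 10.0.0.14:41962
Sep 11 21:20:43 DEBUG [main] cdrv_cpg_
Sep 11 21:20:43 DEBUG [main] __corosync_
Sep 11 21:27:40 DEBUG [main] destroy_client(707) connection from: 127.0.0.1:53502
Sep 11 21:28:15 DEBUG [main] cdrv_cpg_
Sep 11 21:28:15 DEBUG [main] sd_leave_
"""

# timestamp, then "DEBUG", then "[thread]", then the message
LINE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) DEBUG \[[^\]]*\] (.*)$")


def first_seen(log, prefix, year=2014):
    """Return the time of the first line whose message starts with prefix."""
    for line in log.splitlines():
        m = LINE.match(line)
        if m and m.group(2).startswith(prefix):
            # The log has no year; it is supplied as an assumption.
            return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return None


confchg = first_seen(LOG, "cdrv_cpg_")
leave = first_seen(LOG, "sd_leave_")
delay_seconds = (leave - confchg).total_seconds()
print(delay_seconds)
```

For the lines quoted here, the first `cdrv_cpg_` entry is at 21:20:43 and the first `sd_leave_` entry is at 21:28:15, a gap of 452 seconds.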
[root@13EV0097 ~]# collie cluster info
Cluster status: running, auto-recovery enabled
Cluster created at Fri Sep 5 18:13:13 2014
Epoch Time Version
2014-09-11 21:28:15 14 [10.0.0.7:7000, 10.0.0.8:7000, 10.0.0.9:7000, 10.0.0.10:7000, 10.0.0.11:7000, 10.0.0.12:7000, 10.0.0.13:7000, 10.0.0.14:7000, 10.0.0.15:7000]
2014-09-11 21:28:15 13 [10.0.0.7:7000, 10.0.0.8:7000, 10.0.0.9:7000, 10.0.0.11:7000, 10.0.0.12:7000, 10.0.0.13:7000, 10.0.0.14:7000, 10.0.0.15:7000]
2014-09-10 13:34:52 12 [10.0.0.7:7000, 10.0.0.8:7000, 10.0.0.9:7000, 10.0.0.10:7000, 10.0.0.11:7000, 10.0.0.12:7000, 10.0.0.13:7000, 10.0.0.14:7000, 10.0.0.15:7000]
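The membership change between epochs can be checked by comparing the node lists in the output above. The following is a minimal Python sketch that uses only the addresses shown. `membership_changes` is a hypothetical helper, not a collie or sheepdog function.

```python
# Node lists per epoch, as printed by `collie cluster info` above.
ALL_NODES = [f"10.0.0.{i}:7000" for i in range(7, 16)]  # 10.0.0.7 .. 10.0.0.15
EPOCHS = {
    12: ALL_NODES,
    13: [n for n in ALL_NODES if n != "10.0.0.10:7000"],
    14: ALL_NODES,
}


def membership_changes(epochs):
    """Return (epoch, joined, left) for each epoch relative to the previous one."""
    ordered = sorted(epochs)
    changes = []
    for prev, cur in zip(ordered, ordered[1:]):
        before, after = set(epochs[prev]), set(epochs[cur])
        changes.append((cur, sorted(after - before), sorted(before - after)))
    return changes


for epoch, joined, left in membership_changes(EPOCHS):
    print(epoch, "joined:", joined, "left:", left)
```

This shows 10.0.0.10:7000 leaving in epoch 13 and rejoining in epoch 14. Both epochs carry the same timestamp, 21:28:15, so the leave was recorded only when the node was restarted.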
Changed in sheepdog-project:
status: New → Fix Committed