Some nodes do not join the Sheepdog cluster.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
sheepdog | New | Undecided | Unassigned |
Bug Description
While building a Sheepdog cluster of 12 nodes using Corosync, we found some strange behavior.
After launching a sheep process on each server, some nodes did not join the cluster.
Which nodes failed to join was not predictable (more information is below).
I'm not familiar with Corosync, but Corosync logged "enabling flow control",
apparently because the send message buffer was full.
In our environment, the problem is unlikely to occur when the number of nodes is small.
We could not reproduce it with sheepdog v0.7.8,
and we could not reproduce it with corosync v2.3.3.
I have some questions:
1. Did the size of the message sent when joining the cluster increase between sheepdog v0.7.8 and v0.8.1?
2. Which corosync version do you mainly use?
---
The environment in which the problem occurred:
CentOS6.5 ( 2.6.32-
sheepdog v0.8.1
corosync-
---
Preparation
Delete the Sheepdog cluster files and data completely.
---
Steps to Reproduce
[root@sds01 ~]# ssh sds01 "sheep -p 7000 -b 192.168.2.11 -i host=192.
[root@sds01 ~]# ssh sds02 "sheep -p 7000 -b 192.168.2.12 -i host=192.
(Snip 9 nodes)
[root@sds01 ~]# ssh sds12 "sheep -p 7000 -b 192.168.2.22 -i host=192.
---
Confirmation of the results
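Some nodes reported a different cluster membership and different logs. For example, sds01 (192.168.2.11) lists 12 nodes, while sds06 (192.168.2.16) lists only 9.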
[root@sds01 ~]# dog node list -a 192.168.2.11
Id Host:Port V-Nodes Zone
0 192.168.2.11:7000 33 184723648
1 192.168.2.12:7000 80 201500864
2 192.168.2.13:7000 140 218278080
3 192.168.2.14:7000 145 235055296
4 192.168.2.15:7000 147 251832512
5 192.168.2.16:7000 145 268609728
6 192.168.2.17:7000 147 285386944
7 192.168.2.18:7000 145 302164160
8 192.168.2.19:7000 146 318941376
9 192.168.2.20:7000 144 335718592
10 192.168.2.21:7000 146 352495808
11 192.168.2.22:7000 119 369273024
[root@sds01 ~]# dog node list -a 192.168.2.16
Id Host:Port V-Nodes Zone
0 192.168.2.11:7000 33 184723648
1 192.168.2.12:7000 82 201500864
2 192.168.2.13:7000 143 218278080
3 192.168.2.14:7000 148 235055296
4 192.168.2.15:7000 150 251832512
5 192.168.2.16:7000 148 268609728
6 192.168.2.17:7000 150 285386944
7 192.168.2.18:7000 149 302164160
8 192.168.2.19:7000 149 318941376
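
To make the difference between the two views easier to see, here is a minimal sketch that parses the two `dog node list` outputs above and reports which hosts one node knows about and the other does not. The helper names (`parse_node_list`, `missing_hosts`) are hypothetical and not part of Sheepdog; the script only works on the pasted text and assumes the four-column layout shown above.

```python
# Illustrative helper, not part of Sheepdog: compare two `dog node list` outputs.

def parse_node_list(text):
    """Return {"host:port": (vnodes, zone)} parsed from `dog node list` text."""
    nodes = {}
    for line in text.strip().splitlines():
        fields = line.split()
        # Skip the header row and any line that is not "Id Host:Port V-Nodes Zone".
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        _, host_port, vnodes, zone = fields
        nodes[host_port] = (int(vnodes), int(zone))
    return nodes


def missing_hosts(reference, other):
    """Hosts present in the reference view but absent from the other view."""
    return sorted(set(reference) - set(other))


# Output of `dog node list -a 192.168.2.11`, copied from this report.
VIEW_SDS01 = """\
Id Host:Port V-Nodes Zone
0 192.168.2.11:7000 33 184723648
1 192.168.2.12:7000 80 201500864
2 192.168.2.13:7000 140 218278080
3 192.168.2.14:7000 145 235055296
4 192.168.2.15:7000 147 251832512
5 192.168.2.16:7000 145 268609728
6 192.168.2.17:7000 147 285386944
7 192.168.2.18:7000 145 302164160
8 192.168.2.19:7000 146 318941376
9 192.168.2.20:7000 144 335718592
10 192.168.2.21:7000 146 352495808
11 192.168.2.22:7000 119 369273024
"""

# Output of `dog node list -a 192.168.2.16`, copied from this report.
VIEW_SDS06 = """\
Id Host:Port V-Nodes Zone
0 192.168.2.11:7000 33 184723648
1 192.168.2.12:7000 82 201500864
2 192.168.2.13:7000 143 218278080
3 192.168.2.14:7000 148 235055296
4 192.168.2.15:7000 150 251832512
5 192.168.2.16:7000 148 268609728
6 192.168.2.17:7000 150 285386944
7 192.168.2.18:7000 149 302164160
8 192.168.2.19:7000 149 318941376
"""

view_sds01 = parse_node_list(VIEW_SDS01)
view_sds06 = parse_node_list(VIEW_SDS06)
missing = missing_hosts(view_sds01, view_sds06)

print("nodes seen by sds01:", len(view_sds01))
print("nodes seen by sds06:", len(view_sds06))
print("missing from sds06 view:", ", ".join(missing))
```

For the two listings above, this reports that 192.168.2.20, 192.168.2.21 and 192.168.2.22 are known to sds01 but absent from the view of sds06.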
---
sds01 sheepdog log
May 12 15:33:25 DEBUG [main] tx_main(832) 37, 192.168.2.21:58765
May 12 15:33:25 DEBUG [block] sockfd_
May 12 15:33:28 DEBUG [main] cdrv_cpg_
May 12 15:33:28 DEBUG [main] cdrv_cpg_
May 12 15:33:28 DEBUG [main] sd_join_
May 12 15:33:28 DEBUG [main] sd_join_
May 12 15:33:28 DEBUG [main] cdrv_cpg_
May 12 15:33:28 DEBUG [main] sd_accept_
May 12 15:33:28 DEBUG [main] sd_accept_
May 12 15:33:28 DEBUG [main] sd_accept_
May 12 15:33:28 DEBUG [main] sd_accept_
May 12 15:33:28 DEBUG [main] sd_accept_
May 12 15:33:28 DEBUG [main] sd_accept_
May 12 15:33:28 DEBUG [main] sd_accept_
May 12 15:33:28 DEBUG [main] sd_accept_
May 12 15:33:28 DEBUG [main] sd_accept_
May 12 15:33:28 DEBUG [main] sd_accept_
May 12 15:33:28 DEBUG [main] sd_accept_
May 12 15:33:28 DEBUG [main] sd_accept_
May 12 15:33:28 DEBUG [main] sd_accept_
May 12 15:33:28 DEBUG [main] update_
May 12 15:33:28 DEBUG [main] sockfd_
May 12 15:33:28 DEBUG [main] recalculate_
May 12 15:33:28 DEBUG [main] recalculate_
(Snip)
May 12 15:33:28 DEBUG [main] recalculate_
May 12 15:33:28 DEBUG [block] do_get_vdis(495) try to get vdi bitmap from IPv4 ip:192.168.2.22 port:7000
May 12 15:33:28 DEBUG [block] sockfd_
May 12 15:33:28 DEBUG [main] cdrv_cpg_
May 12 15:33:28 DEBUG [main] __corosync_
May 12 15:33:28 DEBUG [main] cdrv_cpg_
May 12 15:33:28 DEBUG [block] connect_to(209) 38, 192.168.2.22:7001
May 12 15:33:28 DEBUG [main] cdrv_cpg_
May 12 15:33:28 DEBUG [main] cdrv_cpg_
May 12 15:33:28 DEBUG [main] cdrv_cpg_
May 12 15:33:28 DEBUG [main] cdrv_cpg_
May 12 15:33:28 DEBUG [main] cdrv_cpg_
May 12 15:33:28 DEBUG [main] cdrv_cpg_
May 12 15:33:28 DEBUG [block] sockfd_
May 12 15:33:28 DEBUG [main] listen_handler(996) accepted a new connection: 39
May 12 15:33:28 DEBUG [main] client_handler(916) 1, 0
May 12 15:33:28 DEBUG [main] rx_main(780) 39, 192.168.2.22:43156
May 12 15:33:28 DEBUG [main] queue_request(454) GET_VDI_COPIES, 2
May 12 15:33:28 DEBUG [io 12749] do_process_
May 12 15:33:28 DEBUG [main] client_handler(916) 4, 0
May 12 15:33:28 DEBUG [main] tx_main(832) 39, 192.168.2.22:43156
---
sds06 sheepdog log
May 12 15:33:25 DEBUG [main] tx_main(832) 35, 192.168.2.21:60702
May 12 15:33:28 DEBUG [main] listen_handler(996) accepted a new connection: 36
May 12 15:33:28 DEBUG [main] client_handler(916) 1, 0
May 12 15:33:28 DEBUG [main] rx_main(780) 36, 192.168.2.22:38787
May 12 15:33:28 DEBUG [main] queue_request(454) GET_VDI_COPIES, 2
May 12 15:33:28 DEBUG [io 15732] do_process_
May 12 15:33:28 DEBUG [main] client_handler(916) 4, 0
May 12 15:33:28 DEBUG [main] tx_main(832) 36, 192.168.2.22:38787