Node list is inconsistent when sheep is started immediately after shutdown
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
sheepdog | New | Undecided | Unassigned |
Bug Description
When the ZooKeeper cluster driver is used and the sheep daemons are started again immediately after a cluster shutdown, every sheep process starts.
However, the output of 'dog node list' is not consistent across the cluster.
The cause is related to the ephemeral znodes that remain from the previous session.
The problem can be avoided by starting sheep only after those ephemeral nodes have been deleted.
This problem has occurred since the QUEUE_POS znode was added.
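ZooKeeper deletes an ephemeral znode only when the session that created it is closed or expires, which is presumably why the member entries stay visible for a few seconds after the shutdown. The workaround above (start sheep only after those znodes are gone) can be scripted instead of waited out by hand. This is a minimal sketch, not part of sheepdog. It assumes the member znodes live under /sheepdog/member (the actual path is truncated in the transcripts below), that zkCli.sh is on PATH, and that ZooKeeper listens on localhost:2181; the function names members_empty and wait_for_no_members are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: wait until leftover ephemeral member znodes from the previous
# sheep session are gone before restarting the daemons.
# ASSUMPTIONS (hypothetical, not from the report): member znodes are under
# /sheepdog/member, zkCli.sh is on PATH, ZooKeeper is at localhost:2181.

ZK_SERVER=${ZK_SERVER:-localhost:2181}
MEMBER_PATH=${MEMBER_PATH:-/sheepdog/member}

# Read zkCli.sh "ls" output on stdin; succeed only if the child list is "[]".
members_empty() {
    local last
    # Keep lines that look like "[...]" but skip "[zk: ...]" prompt lines,
    # then take the last one, which should be the child list.
    last=$(grep '^\[' | grep -v '^\[zk:' | tail -n 1)
    [ "$last" = "[]" ]
}

# Poll ZooKeeper once per second until the member list is empty,
# giving up after $1 seconds (default 60).
wait_for_no_members() {
    local timeout=${1:-60} waited=0
    until zkCli.sh -server "$ZK_SERVER" ls "$MEMBER_PATH" 2>/dev/null | members_empty; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "ephemeral member znodes still present after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

Usage: source the script, run `wait_for_no_members 60`, and only when it succeeds run the same sheep start loop as in the reproduction steps. The timeout keeps the script from hanging forever; note that a wrong MEMBER_PATH ("Node does not exist") is treated as "not empty" and therefore also ends in a timeout rather than a false success.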
How to reproduce
***** Case 1: sheep started before the ephemeral nodes are deleted
# sheep -v
Sheepdog daemon version 0.9.0_130_
# zkServer.sh start
# mkdir -p /tmp/sheepdog/
# for i in {0..3}; do sheep -c zookeeper:
# dog node list
Id Host:Port V-Nodes Zone
0 172.16.3.217:7000 128 0
1 172.16.3.217:7001 128 1
2 172.16.3.217:7002 128 2
3 172.16.3.217:7003 128 3
# echo -e 'ls /sheepdog/
[zk: localhost:
[0000000000, 0000000001, 0000000002, 0000000003]
[zk: localhost:
[IPv4 ip:172.16.3.217 port:7002, IPv4 ip:172.16.3.217 port:7003, IPv4 ip:172.16.3.217 port:7000, IPv4 ip:172.16.3.217 port:7001]
[zk: localhost:
[IPv4 ip:172.16.3.217 port:7002, IPv4 ip:172.16.3.217 port:7003, IPv4 ip:172.16.3.217 port:7000, IPv4 ip:172.16.3.217 port:7001]
# dog cluster shutdown
# echo -e 'ls /sheepdog/
[zk: localhost:
[0000000002, 0000000003]
[zk: localhost:
[]
[zk: localhost:
[IPv4 ip:172.16.3.217 port:7002, IPv4 ip:172.16.3.217 port:7003, IPv4 ip:172.16.3.217 port:7000, IPv4 ip:172.16.3.217 port:7001]
# ps -fe | grep sheep | grep -v grep
# for i in {0..3}; do sheep -c zookeeper:
# ps -fe | grep sheep | grep -v grep
root 16196 1 9 13:55 ? 00:00:00 sheep -c zookeeper:
root 16198 16196 0 13:55 ? 00:00:00 sheep -c zookeeper:
root 16205 1 7 13:55 ? 00:00:00 sheep -c zookeeper:
root 16207 16205 0 13:55 ? 00:00:00 sheep -c zookeeper:
root 16214 1 1 13:55 ? 00:00:00 sheep -c zookeeper:
root 16216 16214 0 13:55 ? 00:00:00 sheep -c zookeeper:
root 16233 1 1 13:55 ? 00:00:00 sheep -c zookeeper:
root 16235 16233 0 13:55 ? 00:00:00 sheep -c zookeeper:
# for i in {0..3}; do dog node list -p 700${i};done
Id Host:Port V-Nodes Zone
0 172.16.3.217:7001 128 1
1 172.16.3.217:7003 128 3
There are no active sheep daemons
Id Host:Port V-Nodes Zone
0 172.16.3.217:7000 128 0
1 172.16.3.217:7001 128 1
2 172.16.3.217:7003 128 3
Id Host:Port V-Nodes Zone
0 172.16.3.217:7000 128 0
1 172.16.3.217:7001 128 1
# echo -e 'ls /sheepdog/
[zk: localhost:
[0000000004, 0000000005, 0000000006, 0000000007]
[zk: localhost:
[IPv4 ip:172.16.3.217 port:7002, IPv4 ip:172.16.3.217 port:7003, IPv4 ip:172.16.3.217 port:7000, IPv4 ip:172.16.3.217 port:7001]
[zk: localhost:
[]
# grep ERROR /tmp/sheepdog/
/tmp/sheepdog/
/tmp/sheepdog/
/tmp/sheepdog/
/tmp/sheepdog/
# grep -15 ERROR /tmp/sheepdog/
Mar 18 13:55:36 DEBUG [main] zk_handle_
Mar 18 13:55:36 DEBUG [main] sd_join_
Mar 18 13:55:36 DEBUG [main] sd_join_
Mar 18 13:55:36 DEBUG [main] zk_watcher(732) path:/queue/
Mar 18 13:55:36 DEBUG [main] push_join_
Mar 18 13:55:36 DEBUG [main] zk_handle_
Mar 18 13:55:36 DEBUG [main] zk_event_
Mar 18 13:55:36 DEBUG [main] zk_queue_
Mar 18 13:55:36 DEBUG [main] zk_handle_
Mar 18 13:55:36 DEBUG [main] init_node_
Mar 18 13:55:36 DEBUG [main] zk_handle_
Mar 18 13:55:36 DEBUG [main] zk_handle_
Mar 18 13:55:36 DEBUG [main] zk_watcher(732) path:/member/IPv4 ip:172.16.3.217 port:7002, type:1, state:3
Mar 18 13:55:36 DEBUG [main] zk_handle_
Mar 18 13:55:36 DEBUG [main] zk_watcher(732) path:/member, type:4, state:3
Mar 18 13:55:36 ERROR [main] zk_handle_
Mar 18 13:55:36 DEBUG [main] zk_event_
Mar 18 13:55:37 DEBUG [main] zk_watcher(732) path:/master, type:4, state:3
Mar 18 13:55:37 DEBUG [main] zk_watcher(732) path:/queue/
Mar 18 13:55:37 DEBUG [main] zk_event_
Mar 18 13:55:37 DEBUG [main] zk_queue_
Mar 18 13:55:37 DEBUG [main] zk_handle_
Mar 18 13:55:37 DEBUG [main] sd_join_
Mar 18 13:55:37 DEBUG [main] sd_join_
Mar 18 13:55:37 DEBUG [main] zk_watcher(732) path:/queue/
Mar 18 13:55:37 DEBUG [main] push_join_
Mar 18 13:55:37 DEBUG [main] zk_handle_
Mar 18 13:55:37 DEBUG [main] zk_event_
Mar 18 13:55:37 DEBUG [main] zk_queue_
Mar 18 13:55:37 DEBUG [main] zk_handle_
Mar 18 13:55:37 DEBUG [main] zk_handle_
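In Case 1 each daemon reports a different membership, and one reports "There are no active sheep daemons". Rather than comparing four listings by eye after each restart, the outputs of 'dog node list -p PORT' can be compared programmatically. This is a minimal sketch assuming the four daemons on ports 7000-7003 used in this report; node_set and node_lists_consistent are hypothetical helper names, and only the Host:Port column is compared.

```shell
#!/usr/bin/env bash
# Sketch: check whether every sheep reports the same node list.
# ASSUMPTION: daemons are reachable with "dog node list -p PORT" as in the report.

# Reduce "dog node list" output to a sorted list of Host:Port values.
# Rows whose first field is not a numeric Id (the header, error messages
# such as "There are no active sheep daemons") are dropped.
node_set() {
    awk '$1 ~ /^[0-9]+$/ { print $2 }' | sort
}

# Succeed only if all given ports report an identical node set.
node_lists_consistent() {
    local port cur ref="" first=1
    for port in "$@"; do
        cur=$(dog node list -p "$port" 2>&1 | node_set)
        if [ "$first" -eq 1 ]; then
            ref=$cur
            first=0
        elif [ "$cur" != "$ref" ]; then
            return 1
        fi
    done
}
```

Usage: `node_lists_consistent 7000 7001 7002 7003 || echo "node lists differ"`. With the listings shown in Case 1 this should report a difference (port 7000 sees only 7001 and 7003, port 7001 sees no daemons), while the listings in Case 2 below all agree.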
=======
***** Case 2: sheep started after waiting for the ephemeral nodes to be deleted
(Steps up to the cluster shutdown are the same as above.)
# dog cluster shutdown
# echo -e 'ls /sheepdog/
[zk: localhost:
[0000000002, 0000000003]
[zk: localhost:
[]
[zk: localhost:
[IPv4 ip:172.16.3.217 port:7002, IPv4 ip:172.16.3.217 port:7003, IPv4 ip:172.16.3.217 port:7000, IPv4 ip:172.16.3.217 port:7001]
(Wait a few seconds...)
# echo -e 'ls /sheepdog/
[zk: localhost:
[]
[zk: localhost:
[]
[zk: localhost:
[]
[zk: localhost:
# for i in {0..3}; do sheep -c zookeeper:
# for i in {0..3}; do dog node list -p 700${i};done
Id Host:Port V-Nodes Zone
0 172.16.3.217:7000 128 0
1 172.16.3.217:7001 128 1
2 172.16.3.217:7002 128 2
3 172.16.3.217:7003 128 3
Id Host:Port V-Nodes Zone
0 172.16.3.217:7000 128 0
1 172.16.3.217:7001 128 1
2 172.16.3.217:7002 128 2
3 172.16.3.217:7003 128 3
Id Host:Port V-Nodes Zone
0 172.16.3.217:7000 128 0
1 172.16.3.217:7001 128 1
2 172.16.3.217:7002 128 2
3 172.16.3.217:7003 128 3
Id Host:Port V-Nodes Zone
0 172.16.3.217:7000 128 0
1 172.16.3.217:7001 128 1
2 172.16.3.217:7002 128 2
3 172.16.3.217:7003 128 3
# echo -e 'ls /sheepdog/
[zk: localhost:
[0000000004, 0000000005, 0000000006, 0000000007]
[zk: localhost:
[IPv4 ip:172.16.3.217 port:7002, IPv4 ip:172.16.3.217 port:7003, IPv4 ip:172.16.3.217 port:7000, IPv4 ip:172.16.3.217 port:7001]
[zk: localhost:
[IPv4 ip:172.16.3.217 port:7002, IPv4 ip:172.16.3.217 port:7003, IPv4 ip:172.16.3.217 port:7000, IPv4 ip:172.16.3.217 port:7001]