https://product-ci.infra.mirantis.net/view/7.0_swarm/job/7.0.system_test.ubuntu.full_cluster_reinstallation/15/
Steps to reproduce:
Scenario:
1. Create a cluster:
   - Add 3 nodes with the controller role
   - Add a node with the compute and cinder roles
   - Add a node with the mongo role
   - Deploy the cluster
   - Verify that the deployment completed successfully
2. Create an empty sample file on each node to check that it is not
available after cluster reinstallation (see the sketch after this list)
3. Reinstall all cluster nodes
4. Verify that all nodes were reinstalled (not just rebooted),
i.e. the sample file is no longer present on any node
5. Run network verification
6. Run OSTF
7. Verify that the Ceilometer API service is up and running
8. Verify that all cinder services are up and running on the nodes
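A minimal sketch of the marker-file check from steps 2 and 4, assuming root access on each node; the file path is an arbitrary example, not taken from the test code:

root@node-1:~# touch /var/tmp/reinstall_marker    # step 2: create the sample file
(reinstall the cluster)
root@node-1:~# test -e /var/tmp/reinstall_marker || echo reinstalled    # step 4: the file must be gone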
Actual Result:
OSTF tests fail with a "keystone unavailable" message for the management endpoint. In fact, there is no connectivity to the management VIP from 2 of the controllers: mgmt_vip is reachable only from the controller where it is running.
As a result, no OpenStack services work from the other 2 controllers:
root@node-5:~# keystone user-list
/usr/lib/python2.7/dist-packages/keystoneclient/shell.py:65: DeprecationWarning: The keystone CLI is deprecated in favor of python-openstackclient. For a Python library, continue using python-keystoneclient.
'python-keystoneclient.', DeprecationWarning)
Authorization Failed: Unable to establish connection to http://10.109.17.3:5000/v2.0/tokens
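One quick way to confirm from an affected controller that the VIP itself is unreachable (these diagnostic commands are added here for illustration, they are not from the original session):

root@node-5:~# ping -c 3 10.109.17.3
root@node-5:~# curl -m 5 http://10.109.17.3:5000/v2.0/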
If we look at the MAC address cached for 10.109.17.3 on this node:
root@node-5:~# arp -i br-mgmt
Address HWtype HWaddress Flags Mask Iface
node-4.test.domain.loca ether 3e:57:71:b9:8a:24 C br-mgmt
node-2.test.domain.loca ether 64:39:32:7c:11:33 C br-mgmt
10.109.17.2 ether 72:34:ff:45:a1:48 C br-mgmt
node-1.test.domain.loca ether 64:2e:ce:b4:22:34 C br-mgmt
10.109.17.3 ether 56:a2:ec:9b:99:ba C br-mgmt
node-3.test.domain.loca ether 64:fc:10:5a:5d:cd C br-mgmt
root@node-5:~# arping -I br-mgmt
we can see that the cached entry is actually incorrect, which is why we fail to connect to management_vip.
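The MAC the VIP actually answers with can be probed directly; an illustrative command, with the target address assumed:

root@node-5:~# arping -I br-mgmt -c 3 10.109.17.3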
The correct MAC is shown below (see the command output from node-4):
root@node-4:~# arp -i br-mgmt
Address HWtype HWaddress Flags Mask Iface
node-3.test.domain.loca ether 64:fc:10:5a:5d:cd C br-mgmt
10.109.17.2 ether 72:34:ff:45:a1:48 C br-mgmt
node-5.test.domain.loca ether 64:80:fe:df:2b:0e C br-mgmt
node-2.test.domain.loca ether 64:39:32:7c:11:33 C br-mgmt
10.109.17.3 ether 26:3c:31:ad:fd:10 C br-mgmt
node-1.test.domain.loca ether 64:2e:ce:b4:22:34 C br-mgmt
root@node-4:~# ip netns exec ip a
Cannot open network namespace "ip": No such file or directory
root@node-4:~# ip netns exec haproxy ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
20: hapr-ns: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 32:d3:12:47:bd:a3 brd ff:ff:ff:ff:ff:ff
inet 240.0.0.2/30 scope global hapr-ns
valid_lft forever preferred_lft forever
inet6 fe80::30d3:12ff:fe47:bda3/64 scope link
valid_lft forever preferred_lft forever
25: b_management: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 26:3c:31:ad:fd:10 brd ff:ff:ff:ff:ff:ff
inet 10.109.17.3/24 scope global b_management
valid_lft forever preferred_lft forever
inet6 fe80::243c:31ff:fead:fd10/64 scope link
valid_lft forever preferred_lft forever
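The VIP's authoritative MAC can also be read straight from the namespace (a convenience command, equivalent to the output above):

root@node-4:~# ip netns exec haproxy ip link show b_management

This confirms 26:3c:31:ad:fd:10, which matches node-4's ARP cache but not the stale entry 56:a2:ec:9b:99:ba cached on node-5.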
Connectivity was restored after the following actions:
root@node-5:~# arp -d 10.109.17.3
root@node-5:~# ping 10.109.17.3
PING 10.109.17.3 (10.109.17.3) 56(84) bytes of data.
64 bytes from 10.109.17.3: icmp_seq=1 ttl=64 time=0.187 ms
64 bytes from 10.109.17.3: icmp_seq=2 ttl=64 time=0.285 ms
64 bytes from 10.109.17.3: icmp_seq=3 ttl=64 time=0.184 ms
The issue is related to the tests. We need to do two things:
1. Extend the time during which we send arpings (increase the count and the wait time).
2. Add "ip neigh flush all" to the restore task (along with the time sync).
A sketch of both changes follows.
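The sketch assumes iputils arping run from the node that holds the VIP; the count and deadline values are illustrative, not final:

# announce the VIP for longer so neighbours refresh their caches
arping -U -c 10 -w 20 -I br-mgmt 10.109.17.3
# in the restore task, drop every cached neighbour entry so stale MACs are relearned
ip neigh flush all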