Node Validations break when default route is pushed via bgp
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
tripleo | Fix Released | High | Michele Baldessari | |
Bug Description
When deploying on a Train-based pre-deployed server whose default routes are injected via BGP with ECMP, the deployment fails with:
TASK [AllNodesValida
fatal: [ctrl-1-0]: FAILED! => {"changed": true, "msg": "non-zero return code", "rc": 1, "stderr": "Shared connection to 99.99.1.1 closed.\r\n", "stderr_lines": ["Shared connection to 99.99.1.1 closed."], "stdout": "Trying to ping default gateway bgp...Ping to bgp failed. Retrying...\r\nPing to bgp failed. Retrying...\r\nPing to bgp failed. Retrying...\r\nPing to bgp failed. Retrying...\r\nPing to bgp failed. Retrying...\r\nPing to bgp failed. Retrying...\r\nPing to bgp failed. Retrying...\r\nPing to bgp failed. Retrying...\r\nPing to bgp failed. Retrying...\r\nPing to bgp failed. Retrying.
, "Ping to bgp failed. Retrying...", "Ping to bgp failed. Retrying...", "Ping to bgp failed. Retrying...", "Ping to bgp failed. Retrying...", "Ping to bgp failed. Retrying...", "Ping to bgp failed. Retrying...", "Ping to bgp failed. Retrying...", "Ping to bgp failed. Retrying...", "FAILURE", "bgp is not pingable."]}
The reason is that the code at https:/
[root@ctrl-1-0 ~]# ip r
default proto bgp src 99.99.1.1 metric 20
nexthop via 100.65.1.1 dev eth0 weight 1
nexthop via 100.64.0.1 dev eth1 weight 1
100.64.0.0/30 dev eth1 proto kernel scope link src 100.64.0.2
100.65.1.0/30 dev eth0 proto kernel scope link src 100.65.1.2
192.168.14.0/24 dev eth2 proto kernel scope link src 192.168.14.7
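In the routing table above, the third token of the `default` line is `bgp`, which matches the "default gateway bgp" the failing task tried to ping. A minimal sketch of a more robust extraction follows. It is not the actual fix, and the function name `default_gateways` is hypothetical. It assumes the gateway should be read from the token that follows `via`, on the `default` line itself or on the `nexthop` lines under it, rather than from a fixed field position:

```python
import ipaddress


def default_gateways(ip_route_output):
    """Return next-hop addresses of the default route(s) found in `ip route` text.

    Handles both the single-gateway form ("default via A.B.C.D dev eth0") and
    the ECMP form, where the gateways appear on following "nexthop via ..." lines.
    """
    gateways = []
    in_default = False
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "default":
            in_default = True
        elif tokens[0] != "nexthop":
            # Any other route entry ends the current default entry.
            in_default = False
        if not in_default:
            continue
        # Read the token after "via" instead of relying on a fixed position.
        for i, tok in enumerate(tokens[:-1]):
            if tok != "via":
                continue
            candidate = tokens[i + 1]
            try:
                ipaddress.ip_address(candidate)
            except ValueError:
                continue  # e.g. an address-family keyword, not an address
            gateways.append(candidate)
    return gateways


# Sample taken from the `ip r` output in this report.
SAMPLE = """\
default proto bgp src 99.99.1.1 metric 20
nexthop via 100.65.1.1 dev eth0 weight 1
nexthop via 100.64.0.1 dev eth1 weight 1
100.64.0.0/30 dev eth1 proto kernel scope link src 100.64.0.2
100.65.1.0/30 dev eth0 proto kernel scope link src 100.65.1.2
192.168.14.0/24 dev eth2 proto kernel scope link src 192.168.14.7
"""

if __name__ == "__main__":
    print(default_gateways(SAMPLE))
```

Splitting on whitespace keeps the sketch indifferent to how `nexthop` lines are indented, and validating each candidate with `ipaddress` avoids ever returning a keyword such as `bgp` as a gateway.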
This is already fixed in victoria/master thanks to the move to the Ansible node_validation role done there. This LP is to track that work.
Changed in tripleo: milestone: wallaby-1 → wallaby-2
Changed in tripleo: milestone: wallaby-2 → wallaby-3
Tested on Train on a BGP setup like the one above, and it all worked correctly with the following patches applied:
Tripleo-ansible -> https://review.opendev.org/763053
THT -> https://review.opendev.org/763064
TASK [tripleo_nodes_validation : Check Default IPv4 Gateway availability] ******
Wednesday 18 November 2020 08:31:21 +0000 (0:00:00.765) 0:04:47.918 ****
ok: [ctrl-1-0] => {"changed": false, "cmd": ["ping", "-w", "10", "-c", "1", "100.64.0.1"], "delta": "0:00:00.006779", "end": "2020-11-18 08:31:21.924376", "rc": 0, "start": "2020-11-18 08:31:21.917597", "stderr": "", "stderr_lines": [], "stdout": "PING 100.64.0.1 (100.64.0.1) 56(84) bytes of data.\n64 bytes from 100.64.0.1: icmp_seq=1 ttl=64 time=0.250 ms\n\n--- 100.64.0.1 ping statistics ---\n1 packets transmitted, 1 received, 0% packet loss, time 0ms\nrtt min/avg/max/mdev = 0.250/0.250/0.250/0.000 ms", "stdout_lines": ["PING 100.64.0.1 (100.64.0.1) 56(84) bytes of data.", "64 bytes from 100.64.0.1: icmp_seq=1 ttl=64 time=0.250 ms", "", "--- 100.64.0.1 ping statistics ---", "1 packets transmitted, 1 received, 0% packet loss, time 0ms", "rtt min/avg/max/mdev = 0.250/0.250/0.250/0.000 ms"]}
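The successful task above runs `ping -w 10 -c 1` against a real next-hop address. As a rough illustration only, the same check could be sketched as below. The helper names are hypothetical, and treating the check as passed when any one ECMP next hop answers is an assumption of this sketch, not something the report states about the patched role:

```python
import subprocess


def ping_command(gateway, deadline=10, count=1):
    """Build the same ping invocation that appears in the task log."""
    return ["ping", "-w", str(deadline), "-c", str(count), gateway]


def any_gateway_reachable(gateways, run=subprocess.run):
    """Return True if at least one gateway answers a ping.

    `run` is injectable so the logic can be exercised without a network.
    """
    for gateway in gateways:
        result = run(ping_command(gateway), capture_output=True, text=True)
        if result.returncode == 0:
            return True
    return False
```

Called with the two next hops from the routing table, for example `any_gateway_reachable(["100.65.1.1", "100.64.0.1"])`, it stops at the first address that replies.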
Ussuri backports:
Tripleo-ansible -> https://review.opendev.org/763052
THT -> https://review.opendev.org/763058
Train backports:
Tripleo-ansible -> https://review.opendev.org/763053
THT -> https://review.opendev.org/763064