Preamble: In a BGP deployment, a node has a number of /24 or /30 IP addresses for communicating with the switches, but one main /32 IP for the node itself (this is the IP that is announced via BGP).
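For reference, the prefix sizes involved can be checked with Python's standard `ipaddress` module. This is only a worked illustration using prefixes from the listings further down (the `describe` helper is made up for this note): a /30 link has four addresses, two of them usable hosts, while a /32 covers the node address and nothing else.

```python
import ipaddress
from typing import Tuple


def describe(prefix: str) -> Tuple[int, str, str]:
    """Return (address count, first address, last address) for a prefix."""
    net = ipaddress.ip_network(prefix)
    return net.num_addresses, str(net.network_address), str(net.broadcast_address)


# Prefixes seen on ctrl-1-0: a /24 management network, a /30 link
# towards a switch, and the /32 node address announced via BGP.
for prefix in ["192.168.1.0/24", "100.64.0.0/30", "172.30.1.1/32"]:
    count, first, last = describe(prefix)
    print(f"{prefix}: {count} address(es), {first} .. {last}")
```

On the /30 links the two usable addresses are .1 (the BGP next hop) and .2 (the node), which matches the routing table of ctrl-1-0.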
When deploying with cephadm in such an environment, bootstrap fails with:
2021-05-04 12:41:10,044 p=554830 u=stack n=ansible | 2021-05-04 12:41:10.044631 | 5254004b-fe7a-35d5-555e-000000000050 | TASK | Stat pre ceph conf file in case we should bootrap with it
2021-05-04 12:41:10,306 p=554830 u=stack n=ansible | 2021-05-04 12:41:10.305717 | 5254004b-fe7a-35d5-555e-000000000050 | OK | Stat pre ceph conf file in case we should bootrap with it | ctrl-1-0
2021-05-04 12:41:10,316 p=554830 u=stack n=ansible | 2021-05-04 12:41:10.316526 | 5254004b-fe7a-35d5-555e-000000000052 | TASK | Run cephadm bootstrap
2021-05-04 12:41:11,047 p=554830 u=stack n=ansible | 2021-05-04 12:41:11.047294 | 5254004b-fe7a-35d5-555e-000000000052 | FATAL | Run cephadm bootstrap | ctrl-1-0 | error={"changed": true, "cmd": "/usr/sbin/cephadm --image undercloud-0.ctlplane.bgp.ftw:8787/ceph-ci/daemon:v6.0.0-stable-6.0-pacific-centos-8-x86_64 bootstrap --skip-firewalld --ssh-private-key /home/ceph-admin/.ssh/id_rsa --ssh-public-key /home/ceph-admin/.ssh/id_rsa.pub --ssh-user ceph-admin --allow-fqdn-hostname --output-keyring /etc/ceph/bgpwin.client.admin.keyring --output-config /etc/ceph/bgpwin.conf --fsid 4b5c8c0a-ff60-454b-a1b4-9747aa737d19 --config /home/ceph-admin/bootstrap_bgpwin.conf \\--skip-monitoring-stack --skip-dashboard --mon-ip 172.30.1.1\n", "delta": "0:00:00.470593", "end": "2021-05-04 12:41:11.020758", "msg": "non-zero return code", "rc": 1, "start": "2021-05-04 12:41:10.550165", "stderr": "Verifying podman|docker is present...\nVerifying lvm2 is present...\nVerifying time synchronization is in place...\nUnit chronyd.service is enabled and running\nRepeating the final host check...\npodman|docker (/bin/podman) is present\nsystemctl is present\nlvcreate is present\nUnit chronyd.service is enabled and running\nHost looks OK\nCluster fsid: 4b5c8c0a-ff60-454b-a1b4-9747aa737d19\nVerifying IP 172.30.1.1 port 3300 ...\nVerifying IP 172.30.1.1 port 6789 ...\nERROR: Failed to infer CIDR network for mon ip 172.30.1.1; pass --skip-mon-network to configure it later", "stderr_lines": ["Verifying podman|docker is present...", "Verifying lvm2 is present...", "Verifying time synchronization is in place...", "Unit chronyd.service is enabled and running", "Repeating the final host check...", "podman|docker (/bin/podman) is present", "systemctl is present", "lvcreate is present", "Unit chronyd.service is enabled and running", "Host looks OK", "Cluster fsid: 4b5c8c0a-ff60-454b-a1b4-9747aa737d19", "Verifying IP 172.30.1.1 port 3300 ...", "Verifying IP 172.30.1.1 port 6789 ...", "ERROR: Failed to infer CIDR network for mon ip 172.30.1.1; pass --skip-mon-network to configure it later"], "stdout": "", "stdout_lines": []}
It seems that cephadm gets confused by this setup. Running the same command by hand fails the same way:
/usr/sbin/cephadm --image undercloud-0.ctlplane.bgp.ftw:8787/ceph-ci/daemon:v6.0.0-stable-6.0-pacific-centos-8-x86_64 bootstrap --skip-firewalld --ssh-private-key /home/ceph-admin/.ssh/id_rsa --ssh-public-key /home/ceph-admin/.ssh/id_rsa.pub --ssh-user ceph-admin --allow-fqdn-hostname --output-keyring /etc/ceph/bgpwin.client.admin.keyring --output-config /etc/ceph/bgpwin.conf --fsid 4b5c8c0a-ff60-454b-a1b4-9747aa737d19 --config /home/ceph-admin/bootstrap_bgpwin.conf --skip-monitoring-stack --skip-dashboard --mon-ip 172.30.1.1
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman|docker (/bin/podman) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
Verifying IP 172.30.1.1 port 3300 ...
Verifying IP 172.30.1.1 port 6789 ...
ERROR: Failed to infer CIDR network for mon ip 172.30.1.1; pass --skip-mon-network to configure it later
The network config is:
[root@ctrl-1-0 ~]# ip -o a
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
1: lo inet 172.30.1.1/32 brd 172.30.1.1 scope host lo\ valid_lft forever preferred_lft forever
1: lo inet6 f00d:f00d:f00d:f00d:f00d:f00d:f00d:1/128 scope global \ valid_lft forever preferred_lft forever
1: lo inet6 ::1/128 scope host \ valid_lft forever preferred_lft forever
2: enp1s0 inet 192.168.1.164/24 brd 192.168.1.255 scope global enp1s0\ valid_lft forever preferred_lft forever
2: enp1s0 inet6 fe80::5054:ff:fe76:c707/64 scope link \ valid_lft forever preferred_lft forever
3: enp2s0 inet 100.65.1.2/30 brd 100.65.1.3 scope global enp2s0\ valid_lft forever preferred_lft forever
3: enp2s0 inet6 fe80::5054:ff:feb9:10dc/64 scope link \ valid_lft forever preferred_lft forever
4: enp3s0 inet 100.64.0.2/30 brd 100.64.0.3 scope global enp3s0\ valid_lft forever preferred_lft forever
4: enp3s0 inet6 fe80::5054:ff:fe8d:67b4/64 scope link \ valid_lft forever preferred_lft forever
7: br-ex inet 192.168.222.1/32 scope global br-ex\ valid_lft forever preferred_lft forever
7: br-ex inet6 fd53:d91e:400:7f17::1/128 scope global \ valid_lft forever preferred_lft forever
7: br-ex inet6 fe80::c846:55ff:fe9d:5d43/64 scope link \ valid_lft forever preferred_lft forever
8: br-vlan inet 192.168.222.2/32 scope global br-vlan\ valid_lft forever preferred_lft forever
8: br-vlan inet6 fd53:d91e:400:7f17::2/128 scope global \ valid_lft forever preferred_lft forever
8: br-vlan inet6 fe80::1c80:6dff:fefd:3342/64 scope link \ valid_lft forever preferred_lft forever
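As a quick check of the listing above, the sketch below scans its IPv4 `inet` lines (trailing `valid_lft` fields trimmed) and reports every configured prefix that contains the mon IP. The regex and the `networks_containing` helper are hypothetical and only illustrate the point; they are not cephadm code.

```python
import ipaddress
import re
from typing import List, Tuple

# IPv4 lines from `ip -o a` on ctrl-1-0, with the trailing fields trimmed.
IP_O_A = """\
1: lo inet 127.0.0.1/8 scope host lo
1: lo inet 172.30.1.1/32 brd 172.30.1.1 scope host lo
2: enp1s0 inet 192.168.1.164/24 brd 192.168.1.255 scope global enp1s0
3: enp2s0 inet 100.65.1.2/30 brd 100.65.1.3 scope global enp2s0
4: enp3s0 inet 100.64.0.2/30 brd 100.64.0.3 scope global enp3s0
7: br-ex inet 192.168.222.1/32 scope global br-ex
8: br-vlan inet 192.168.222.2/32 scope global br-vlan
"""

# Hypothetical pattern for "<idx>: <ifname> inet <addr>/<prefixlen> ...".
INET_RE = re.compile(r"^\d+:\s+(\S+)\s+inet\s+(\S+/\d+)")


def networks_containing(ip: str, text: str) -> List[Tuple[str, str]]:
    """Return (interface, network) pairs whose configured prefix contains ip."""
    addr = ipaddress.ip_address(ip)
    hits = []
    for line in text.splitlines():
        m = INET_RE.match(line)
        if not m:
            continue
        iface, cidr = m.groups()
        # ip_interface keeps the host bits; .network gives the enclosing prefix.
        net = ipaddress.ip_interface(cidr).network
        if addr in net:
            hits.append((iface, str(net)))
    return hits


print(networks_containing("172.30.1.1", IP_O_A))  # [('lo', '172.30.1.1/32')]
```

For 172.30.1.1 the only hit is `lo` with 172.30.1.1/32, a prefix containing just the node's own address; 192.168.1.164, by contrast, resolves to 192.168.1.0/24 on enp1s0.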
I presume that the /32 address (172.30.1.1/32 on lo, which is also the mon IP) is what causes problems for cephadm in this case.
That is due to this code: https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L4638-L4661
which tries to parse the output of 'ip route ls'. In a BGP/BFD environment that output is more complex than the code expects:
[root@ctrl-1-0 sbin]# ip r
default proto bgp src 172.30.1.1 metric 20
nexthop via 100.64.0.1 dev enp3s0 weight 1
nexthop via 100.65.1.1 dev enp2s0 weight 1
100.64.0.0/30 dev enp3s0 proto kernel scope link src 100.64.0.2
100.65.1.0/30 dev enp2s0 proto kernel scope link src 100.65.1.2
192.168.1.0/24 dev enp1s0 proto kernel scope link src 192.168.1.164
192.168.2.0/24 via 192.168.1.1 dev enp1s0
192.168.3.0/24 via 192.168.1.1 dev enp1s0
192.168.4.0/24 via 192.168.1.1 dev enp1s0
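To illustrate why this routing table yields no network, the sketch below approximates the kind of parsing described above: it keeps only `ip route ls` lines of the form `<cidr> dev <iface> ... scope link ... src <ip>` and then looks for the mon IP among the collected `src` addresses. The regex and the helpers `parse_routes` and `infer_network` are hypothetical, written for illustration and not copied from cephadm, so check the linked code for the exact logic.

```python
import ipaddress
import re
from typing import Dict, List, Optional

# `ip route ls` output from ctrl-1-0.
IP_ROUTE_LS = """\
default proto bgp src 172.30.1.1 metric 20
\tnexthop via 100.64.0.1 dev enp3s0 weight 1
\tnexthop via 100.65.1.1 dev enp2s0 weight 1
100.64.0.0/30 dev enp3s0 proto kernel scope link src 100.64.0.2
100.65.1.0/30 dev enp2s0 proto kernel scope link src 100.65.1.2
192.168.1.0/24 dev enp1s0 proto kernel scope link src 192.168.1.164
192.168.2.0/24 via 192.168.1.1 dev enp1s0
192.168.3.0/24 via 192.168.1.1 dev enp1s0
192.168.4.0/24 via 192.168.1.1 dev enp1s0
"""

# Hypothetical pattern: "<net> dev <iface> ... scope link ... src <ip>".
ROUTE_RE = re.compile(r"^(\S+) dev (\S+) .*scope link .*src (\S+)")


def parse_routes(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Map each directly connected network to {interface: [source IPs]}."""
    nets: Dict[str, Dict[str, List[str]]] = {}
    for line in text.splitlines():
        m = ROUTE_RE.match(line)
        if not m:
            # Skips the default route, its nexthop lines and gatewayed routes.
            continue
        net, iface, src = m.groups()
        nets.setdefault(net, {}).setdefault(iface, []).append(src)
    return nets


def infer_network(mon_ip: str, text: str) -> Optional[str]:
    """Return the first connected network that lists mon_ip as a source."""
    want = ipaddress.ip_address(mon_ip)
    for net, ifaces in parse_routes(text).items():
        for ips in ifaces.values():
            if want in (ipaddress.ip_address(ip) for ip in ips):
                return net
    return None


print(infer_network("192.168.1.164", IP_ROUTE_LS))  # 192.168.1.0/24
print(infer_network("172.30.1.1", IP_ROUTE_LS))     # None
```

Here 172.30.1.1 only appears as the `src` of the multipath default route, which has no prefix and no `scope link`, so no CIDR network is found. The /32 on lo does not add a route to the main table at all (the kernel places it in the `local` table), so nothing in 'ip route ls' ties 172.30.1.1 to a network.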