cephadm fails in a BGP environment

Bug #1927097 reported by Michele Baldessari
Affects: tripleo
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

Preamble: In a BGP deployment a node has a number of /24 or /30 IP addresses for communication towards the switches, but has one main /32 IP for the node itself (which is the IP that is announced via BGP).

When deploying via cephadm in such an environment, we fail with:
2021-05-04 12:41:10,044 p=554830 u=stack n=ansible | 2021-05-04 12:41:10.044631 | 5254004b-fe7a-35d5-555e-000000000050 | TASK | Stat pre ceph conf file in case we should bootrap with it
2021-05-04 12:41:10,306 p=554830 u=stack n=ansible | 2021-05-04 12:41:10.305717 | 5254004b-fe7a-35d5-555e-000000000050 | OK | Stat pre ceph conf file in case we should bootrap with it | ctrl-1-0
2021-05-04 12:41:10,316 p=554830 u=stack n=ansible | 2021-05-04 12:41:10.316526 | 5254004b-fe7a-35d5-555e-000000000052 | TASK | Run cephadm bootstrap
2021-05-04 12:41:11,047 p=554830 u=stack n=ansible | 2021-05-04 12:41:11.047294 | 5254004b-fe7a-35d5-555e-000000000052 | FATAL | Run cephadm bootstrap | ctrl-1-0 | error={"changed": true, "cmd": "/usr/sbin/cephadm --image undercloud-0.ctlplane.bgp.ftw:8787/ceph-ci/daemon:v6.0.0-stable-6.0-pacific-centos-8-x86_64 bootstrap --skip-firewalld --ssh-private-key /home/ceph-admin/.ssh/id_rsa --ssh-public-key /home/ceph-admin/.ssh/id_rsa.pub --ssh-user ceph-admin --allow-fqdn-hostname --output-keyring /etc/ceph/bgpwin.client.admin.keyring --output-config /etc/ceph/bgpwin.conf --fsid 4b5c8c0a-ff60-454b-a1b4-9747aa737d19 --config /home/ceph-admin/bootstrap_bgpwin.conf \\--skip-monitoring-stack --skip-dashboard --mon-ip 172.30.1.1\n", "delta": "0:00:00.470593", "end": "2021-05-04 12:41:11.020758", "msg": "non-zero return code", "rc": 1, "start": "2021-05-04 12:41:10.550165", "stderr": "Verifying podman|docker is present...\nVerifying lvm2 is present...\nVerifying time synchronization is in place...\nUnit chronyd.service is enabled and running\nRepeating the final host check...\npodman|docker (/bin/podman) is present\nsystemctl is present\nlvcreate is present\nUnit chronyd.service is enabled and running\nHost looks OK\nCluster fsid: 4b5c8c0a-ff60-454b-a1b4-9747aa737d19\nVerifying IP 172.30.1.1 port 3300 ...\nVerifying IP 172.30.1.1 port 6789 ...\nERROR: Failed to infer CIDR network for mon ip 172.30.1.1; pass --skip-mon-network to configure it later", "stderr_lines": ["Verifying podman|docker is present...", "Verifying lvm2 is present...", "Verifying time synchronization is in place...", "Unit chronyd.service is enabled and running", "Repeating the final host check...", "podman|docker (/bin/podman) is present", "systemctl is present", "lvcreate is present", "Unit chronyd.service is enabled and running", "Host looks OK", "Cluster fsid: 4b5c8c0a-ff60-454b-a1b4-9747aa737d19", "Verifying IP 172.30.1.1 port 3300 ...", "Verifying IP 172.30.1.1 port 6789 ...", "ERROR: Failed to infer CIDR network for mon ip 172.30.1.1; pass --skip-mon-network to configure it later"], "stdout": "", "stdout_lines": []}

The reason is that the cephadm command seems to get confused by this setup. Running it by hand reproduces the error:
/usr/sbin/cephadm --image undercloud-0.ctlplane.bgp.ftw:8787/ceph-ci/daemon:v6.0.0-stable-6.0-pacific-centos-8-x86_64 bootstrap --skip-firewalld --ssh-private-key /home/ceph-admin/.ssh/id_rsa --ssh-public-key /home/ceph-admin/.ssh/id_rsa.pub --ssh-user
ceph-admin --allow-fqdn-hostname --output-keyring /etc/ceph/bgpwin.client.admin.keyring --output-config /etc/ceph/bgpwin.conf --fsid 4b5c8c0a-ff60-454b-a1b4-9747aa737d19 --config /home/ceph-admin/bootstrap_bgpwin.conf --skip-monitoring-stack --skip-dashboard --mon-ip 172.30.1.1
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman|docker (/bin/podman) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 4b5c8c0a-ff60-454b-a1b4-9747aa737d19
Verifying IP 172.30.1.1 port 3300 ...
Verifying IP 172.30.1.1 port 6789 ...
ERROR: Failed to infer CIDR network for mon ip 172.30.1.1; pass --skip-mon-network to configure it later

The network config is:
[root@ctrl-1-0 ~]# ip -o a
1: lo inet 127.0.0.1/8 scope host lo\ valid_lft forever preferred_lft forever
1: lo inet 172.30.1.1/32 brd 172.30.1.1 scope host lo\ valid_lft forever preferred_lft forever
1: lo inet6 f00d:f00d:f00d:f00d:f00d:f00d:f00d:1/128 scope global \ valid_lft forever preferred_lft forever
1: lo inet6 ::1/128 scope host \ valid_lft forever preferred_lft forever
2: enp1s0 inet 192.168.1.164/24 brd 192.168.1.255 scope global enp1s0\ valid_lft forever preferred_lft forever
2: enp1s0 inet6 fe80::5054:ff:fe76:c707/64 scope link \ valid_lft forever preferred_lft forever
3: enp2s0 inet 100.65.1.2/30 brd 100.65.1.3 scope global enp2s0\ valid_lft forever preferred_lft forever
3: enp2s0 inet6 fe80::5054:ff:feb9:10dc/64 scope link \ valid_lft forever preferred_lft forever
4: enp3s0 inet 100.64.0.2/30 brd 100.64.0.3 scope global enp3s0\ valid_lft forever preferred_lft forever
4: enp3s0 inet6 fe80::5054:ff:fe8d:67b4/64 scope link \ valid_lft forever preferred_lft forever
7: br-ex inet 192.168.222.1/32 scope global br-ex\ valid_lft forever preferred_lft forever
7: br-ex inet6 fd53:d91e:400:7f17::1/128 scope global \ valid_lft forever preferred_lft forever
7: br-ex inet6 fe80::c846:55ff:fe9d:5d43/64 scope link \ valid_lft forever preferred_lft forever
8: br-vlan inet 192.168.222.2/32 scope global br-vlan\ valid_lft forever preferred_lft forever
8: br-vlan inet6 fd53:d91e:400:7f17::2/128 scope global \ valid_lft forever preferred_lft forever
8: br-vlan inet6 fe80::1c80:6dff:fefd:3342/64 scope link \ valid_lft forever preferred_lft forever

I presume the /32 is what is problematic for cephadm in this case.
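
To make that concrete, here is a minimal sketch (Python, purely illustrative; not the cephadm code) of why the inference comes up empty: the mon IP exists locally only as a /32 host address, so no connected subnet on the box covers it.

import ipaddress

mon_ip = ipaddress.ip_address("172.30.1.1")

# Relevant IPv4 prefixes from the `ip -o a` output above (127.0.0.0/8 omitted).
local_prefixes = {
    "lo":     ipaddress.ip_network("172.30.1.1/32"),
    "enp1s0": ipaddress.ip_network("192.168.1.0/24"),
    "enp2s0": ipaddress.ip_network("100.65.1.0/30"),
    "enp3s0": ipaddress.ip_network("100.64.0.0/30"),
}

for iface, net in local_prefixes.items():
    print(iface, net, mon_ip in net)
# Only the /32 host address on lo covers the mon IP; no connected subnet does,
# which is consistent with the "Failed to infer CIDR network" error above.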

Michele Baldessari (michele) wrote:

That is due to this code: https://github.com/ceph/ceph/blob/master/src/cephadm/cephadm#L4638-L4661

It tries to parse 'ip route ls', and in a BGP/BFD environment that output is more complex than expected (see the parsing sketch after the listing):
[root@ctrl-1-0 sbin]# ip r
default proto bgp src 172.30.1.1 metric 20
        nexthop via 100.64.0.1 dev enp3s0 weight 1
        nexthop via 100.65.1.1 dev enp2s0 weight 1
100.64.0.0/30 dev enp3s0 proto kernel scope link src 100.64.0.2
100.65.1.0/30 dev enp2s0 proto kernel scope link src 100.65.1.2
192.168.1.0/24 dev enp1s0 proto kernel scope link src 192.168.1.164
192.168.2.0/24 via 192.168.1.1 dev enp1s0
192.168.3.0/24 via 192.168.1.1 dev enp1s0
192.168.4.0/24 via 192.168.1.1 dev enp1s0
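
As a rough sketch of the parsing problem (Python; one way to tolerate the indented ECMP "nexthop" continuation lines, not the actual cephadm code or the patch I used):

import ipaddress

ip_route_ls = """\
default proto bgp src 172.30.1.1 metric 20
        nexthop via 100.64.0.1 dev enp3s0 weight 1
        nexthop via 100.65.1.1 dev enp2s0 weight 1
100.64.0.0/30 dev enp3s0 proto kernel scope link src 100.64.0.2
100.65.1.0/30 dev enp2s0 proto kernel scope link src 100.65.1.2
192.168.1.0/24 dev enp1s0 proto kernel scope link src 192.168.1.164
192.168.2.0/24 via 192.168.1.1 dev enp1s0
192.168.3.0/24 via 192.168.1.1 dev enp1s0
192.168.4.0/24 via 192.168.1.1 dev enp1s0
"""

networks = []
for line in ip_route_ls.splitlines():
    # Indented "nexthop via ..." lines are continuations of the preceding
    # multipath route; naively taking the first column of every line trips
    # over them, so skip continuations and the default route itself.
    if not line.strip() or line[:1].isspace():
        continue
    first = line.split()[0]
    if first == "default":
        continue
    try:
        networks.append(ipaddress.ip_network(first, strict=False))
    except ValueError:
        continue

mon_ip = ipaddress.ip_address("172.30.1.1")
print([str(n) for n in networks])
print([str(n) for n in networks if mon_ip in n])
# [] -- even once the ECMP lines are handled, no route in the main table
# covers the /32 mon IP, so a mon network still cannot be derived from the
# routing table alone.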

Michele Baldessari (michele) wrote:

For completeness, note that even after patching cephadm to parse 'ip route ls' properly in the BGP/ECMP case, we still fail with:
May 06 10:33:31 ctrl-1-0.bgp.ftw systemd[1]: Started Ceph mgr.ctrl-1-0.bgp.ftw.hpwvwv for 4b5c8c0a-ff60-454b-a1b4-9747aa737d19.
May 06 10:33:31 ctrl-1-0.bgp.ftw conmon[74961]: debug 2021-05-06T10:33:31.388+0000 7f9e29044500 0 set uid:gid to 167:167 (ceph:ceph)
May 06 10:33:31 ctrl-1-0.bgp.ftw conmon[74961]: debug 2021-05-06T10:33:31.388+0000 7f9e29044500 0 ceph version 16.2.0 (0c2054e95bcd9b30fdd908a79ac1d8bbc3394442) pacific (stable), process ceph-mgr, pid 7
May 06 10:33:31 ctrl-1-0.bgp.ftw conmon[74961]: debug 2021-05-06T10:33:31.389+0000 7f9e29044500 -1 unable to find any IP address in networks '172.31.0.1/32,172.31.0.1/32' interfaces ''
May 06 10:33:31 ctrl-1-0.bgp.ftw systemd[1]: libpod-1c9f60cdb0bba3d228d24bbc8dbdd4be6091c9dca481577cf233376972f7963d.scope: Succeeded.
May 06 10:33:31 ctrl-1-0.bgp.ftw systemd[1]: libpod-1c9f60cdb0bba3d228d24bbc8dbdd4be6091c9dca481577cf233376972f7963d.scope: Consumed 64ms CPU time

This is because https://github.com/ceph/ceph/blob/master/src/common/pick_address.cc#L210 and https://github.com/ceph/ceph/blob/master/src/common/pick_address.cc#L147 do not cater to environments where the IP is a /32 and the routes are then just propagated via BGP.
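
For reference, the containment check itself is not the issue. A trivial sketch (Python's ipaddress, purely illustrative, using the /32 from the earlier 'ip -o a' listing; the mgr run above happens to be configured with 172.31.0.1/32, but the pattern is the same) shows the match one would expect to be found:

import ipaddress

configured_networks = [ipaddress.ip_network("172.30.1.1/32")]

# Local IPv4 addresses per the `ip -o a` output earlier in this report.
local_addrs = {
    "lo":     ipaddress.ip_address("172.30.1.1"),
    "enp1s0": ipaddress.ip_address("192.168.1.164"),
    "enp2s0": ipaddress.ip_address("100.65.1.2"),
    "enp3s0": ipaddress.ip_address("100.64.0.2"),
}

for net in configured_networks:
    for iface, addr in local_addrs.items():
        if addr in net:
            print(f"{iface}: {addr} is inside {net}")
# Prints "lo: 172.30.1.1 is inside 172.30.1.1/32": the match exists at the
# CIDR level; per the report above it is pick_address.cc's enumeration of
# local addresses that fails to find it in this /32-on-loopback layout.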

Michele Baldessari (michele) wrote:

https://tracker.ceph.com/issues/50688 is the upstream Ceph issue for this.
