Unable to introspect nodes via DHCPv6 relay

Bug #1899008 reported by Harald Jensås
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Harald Jensås

Bug Description

In a routed spine-and-leaf setup, introspecting baremetal nodes on a remote subnet fails.
On the undercloud, the DHCPSOLICIT and DHCPADVERTISE can be seen in the ironic-inspector dnsmasq log.

File: /var/log/containers/ironic-inspector/dnsmasq.log
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 available DHCP range: fd12:3456:789a:2::aaaa -- fd12:3456:789a:2::afff
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 vendor class: 343
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 client MAC address: fa:16:3e:03:33:fb
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 DHCPSOLICIT(br-ctlplane) 00:04:b8:ec:72:2e:93:51:46:ce:96:0f:b4:4d:98:c1:8e:fb
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 DHCPADVERTISE(br-ctlplane) fd12:3456:789a:2::aaaa 00:04:b8:ec:72:2e:93:51:46:ce:96:0f:b4:4d:98:c1:8e:fb
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 requested options: 23:dns-server, 24:domain-search, 59:bootfile-url,
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 requested options: 60:bootfile-param
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 tags: leaf1, known, ipxe6, dhcpv6, br-ctlplane
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 sent size: 18 option: 1 client-id 00:04:b8:ec:72:2e:93:51:46:ce:96:0f:b4:4d...
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 sent size: 14 option: 2 server-id 00:01:00:01:27:11:19:ae:fa:16:3e:6a:78:1f
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 sent size: 40 option: 3 ia-na IAID=2535401266 T1=300 T2=525
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 nest size: 24 option: 5 iaaddr fd12:3456:789a:2::aaaa PL=600 VL=600
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 sent size: 9 option: 13 status 0 success
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 sent size: 1 option: 7 preference 0
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 sent size: 48 option: 59 bootfile-url http://[fd12:3456:789a:1::1]:8088/inspector.ipxe

On the wire, only the response from neutron's DHCP server can be seen:

[root@undercloud ~]# ip netns exec qdhcp-6157272f-cabf-4225-a23b-988d06ecc4e4 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
8: tap213ee6d1-84: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1445 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:1a:a9:15 brd ff:ff:ff:ff:ff:ff
    inet6 fd12:3456:789a:1::4/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe1a:a915/64 scope link
       valid_lft forever preferred_lft forever

[root@undercloud ~]# tcpdump -i eth1 -n -vv udp port 546 or 547 or icmp6
10:42:11.708885 IP6 (class 0xc0, hlim 1, next-header UDP (17) payload length: 183) fd12:3456:789a:2::fffe.dhcpv6-server > ff05::1:3.dhcpv6-server: [bad udp cksum 0xa9d6 -> 0x7a70!] dhcp6 relay-fwd (linkaddr=fd12:3456:789a:2::fffe peeraddr=fe80::f816:3eff:fe03:33fb (opt_79) (relay-message (dhcp6 solicit (xid=59bfe0 (client-ID type 4) (IA_NA IAID:2535401266 T1:0 T2:0) (option-request DNS-server DNS-search-list opt_59 opt_60) (vendor-class) (opt_61) (opt_62) (user-class) (elapsed-time 792))))

10:42:11.711531 IP6 (class 0xc0, flowlabel 0x7ed8f, hlim 64, next-header UDP (17) payload length: 122) fd12:3456:789a:1::4.dhcpv6-server > fd12:3456:789a:2::fffe.dhcpv6-server: [bad udp cksum 0x5498 -> 0x1e46!] dhcp6 relay-reply (linkaddr=fd12:3456:789a:2::fffe peeraddr=fe80::f816:3eff:fe03:33fb (opt_79) (relay-message (dhcp6 advertise (xid=59bfe0 (client-ID type 4) (server-ID hwaddr/time type 1 time 655432311 fa163e1aa915) (status-code NoAddrsAvail))))

  ^^Notice fd12:3456:789a:1::4.dhcpv6-server is the Neutron DHCP server. We expect NoAddrsAvail from this DHCP server at this point.

But where is the DHCPADVERTISE(br-ctlplane) that the inspector's dnsmasq log shows?
I can't see it with tcpdump on either br-ctlplane or eth1 (br-ctlplane is wired to eth1).
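
To check that the relayed solicit in the capture comes from the same client as the one in the dnsmasq log, the client MAC address from the log can be turned into its link-local address and compared with the peeraddr in the relay-fwd message. This is a minimal sketch, assuming the client derives its link-local address from the MAC using modified EUI-64 (the capture is consistent with that); the function name is made up for illustration.

```python
import ipaddress


def mac_to_link_local(mac):
    """Derive the modified EUI-64 link-local IPv6 address for a MAC address."""
    octets = [int(part, 16) for part in mac.split(":")]
    octets[0] ^= 0x02  # flip the universal/local bit of the first octet
    # Insert ff:fe between the two halves of the MAC to form the interface ID.
    interface_id = octets[:3] + [0xFF, 0xFE] + octets[3:]
    prefix = [0xFE, 0x80] + [0] * 6  # fe80::/64
    return ipaddress.IPv6Address(bytes(prefix + interface_id))


# Client MAC address taken from the dnsmasq log in this report.
print(mac_to_link_local("fa:16:3e:03:33:fb"))  # fe80::f816:3eff:fe03:33fb
```

The result matches peeraddr=fe80::f816:3eff:fe03:33fb in the tcpdump output, so the solicit logged by dnsmasq and the one seen on the wire belong to the same node.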

[centos@undercloud ~]$ cat undercloud.conf
[DEFAULT]

# templates = ~/tripleo-heat-templates
# container_images_file = /home/cloud-user/containers-prepare-parameter.yaml

enable_routed_networks = true
enable_ui = false
overcloud_domain_name = localdomain
scheduler_max_attempts = 2
undercloud_ntp_servers = pool.ntp.org
undercloud_hostname = undercloud.rdocloud
local_interface = eth1
local_mtu = 1445
local_ip = fd12:3456:789a:1::1/64
undercloud_public_host = fd12:3456:789a:1::2
undercloud_admin_host = fd12:3456:789a:1::3
undercloud_nameservers = 8.8.8.8,8.8.4.4
local_subnet = ctlplane-subnet
subnets = ctlplane-subnet,leaf1,leaf2
ipv6_address_mode = dhcpv6-stateful

[ctlplane-subnet]
cidr = fd12:3456:789a:1::/64
gateway = fd12:3456:789a:1::fffe
inspection_iprange = fd12:3456:789a:1::aaaa,fd12:3456:789a:1::afff
dns_nameservers = fd12:3456:789a:1::1

[leaf1]
cidr = fd12:3456:789a:2::/64
gateway = fd12:3456:789a:2::fffe
inspection_iprange = fd12:3456:789a:2::aaaa,fd12:3456:789a:2::afff
dns_nameservers = fd12:3456:789a:1::1

[leaf2]
cidr = fd12:3456:789a:3::/64
gateway = fd12:3456:789a:3::fffe
inspection_iprange = fd12:3456:789a:3::aaaa,fd12:3456:789a:3::afff
dns_nameservers = fd12:3456:789a:1::1

[root@undercloud ~]# cat /var/lib/config-data/puppet-generated/ironic_inspector/etc/ironic-inspector/dnsmasq.conf
port=0
interface=br-ctlplane
log-dhcp

dhcp-range=set:ctlplane-subnet,fd12:3456:789a:1::aaaa,fd12:3456:789a:1::afff,64,10m
dhcp-option-force=tag:ctlplane-subnet,option:mtu,1445
dhcp-range=set:leaf1,fd12:3456:789a:2::aaaa,fd12:3456:789a:2::afff,64,10m
dhcp-option-force=tag:leaf1,option:mtu,1445
dhcp-range=set:leaf2,fd12:3456:789a:3::aaaa,fd12:3456:789a:3::afff,64,10m
dhcp-option-force=tag:leaf2,option:mtu,1445
dhcp-sequential-ip
dhcp-match=ipxe,175
dhcp-match=set:efi,option:client-arch,7
dhcp-match=set:efi,option:client-arch,9
dhcp-match=set:efi,option:client-arch,11
# dhcpv6s for Client System Architecture Type (61)
dhcp-match=set:efi6,option6:61,0007
dhcp-match=set:efi6,option6:61,0009
dhcp-match=set:efi6,option6:61,0011
dhcp-userclass=set:ipxe6,iPXE
# Client is already running iPXE; move to next stage of chainloading
dhcp-boot=tag:ipxe,http://[fd12:3456:789a:1::1]:8088/inspector.ipxe
dhcp-option=tag:ipxe6,option6:bootfile-url,http://[fd12:3456:789a:1::1]:8088/inspector.ipxe
# Client is PXE booting over EFI without iPXE ROM; send EFI version of iPXE chainloader
dhcp-boot=tag:efi,tag:!ipxe,ipxe.efi
dhcp-option=tag:efi6,tag:!ipxe6,option6:bootfile-url,tftp://[fd12:3456:789a:1::1]/ipxe.efi
# Client is running PXE over BIOS; send BIOS version of iPXE chainloader
dhcp-boot=undionly.kpxe,localhost.localdomain,fd12:3456:789a:1::1

dhcp-hostsdir=/var/lib/ironic-inspector/dhcp-hostsdir
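
The "leaf1" tag and the fd12:3456:789a:2::aaaa range in the dnsmasq log are consistent with the range being picked from the relay's link address (linkaddr=fd12:3456:789a:2::fffe in the capture). The following is a rough sketch of that matching, not dnsmasq's actual implementation; the parsing assumes the exact five-field dhcp-range layout used in this file, and the helper names are hypothetical.

```python
import ipaddress

# dhcp-range lines copied from the dnsmasq.conf in this report.
CONF = """\
dhcp-range=set:ctlplane-subnet,fd12:3456:789a:1::aaaa,fd12:3456:789a:1::afff,64,10m
dhcp-range=set:leaf1,fd12:3456:789a:2::aaaa,fd12:3456:789a:2::afff,64,10m
dhcp-range=set:leaf2,fd12:3456:789a:3::aaaa,fd12:3456:789a:3::afff,64,10m
"""


def parse_ranges(conf):
    """Map each tag to the network implied by its range start and prefix length."""
    ranges = {}
    for line in conf.splitlines():
        if not line.startswith("dhcp-range="):
            continue
        tag, start, _end, prefix_len, _lease = line.split("=", 1)[1].split(",")
        network = ipaddress.ip_network(f"{start}/{prefix_len}", strict=False)
        ranges[tag.split(":", 1)[1]] = network
    return ranges


def tag_for(link_address, ranges):
    """Return the tag of the first range whose network contains link_address."""
    addr = ipaddress.ip_address(link_address)
    for tag, network in ranges.items():
        if addr in network:
            return tag
    return None


print(tag_for("fd12:3456:789a:2::fffe", parse_ranges(CONF)))  # leaf1
```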

Harald Jensås (harald-jensas) wrote :

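Root cause: there is no default route for IPv6 on the undercloud. There are also no specific routes for the leaf subnets fd12:3456:789a:2::/64 and fd12:3456:789a:3::/64. Without either, the undercloud has no route back to the relay address fd12:3456:789a:2::fffe, which would explain why the DHCPADVERTISE that dnsmasq logs never shows up on the wire.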

(undercloud) [centos@undercloud ~]$ ip -6 route
::1 dev lo proto kernel metric 256 pref medium
unreachable ::/96 dev lo metric 1024 pref medium
unreachable ::ffff:0.0.0.0/96 dev lo metric 1024 pref medium
unreachable 2002:a00::/24 dev lo metric 1024 pref medium
unreachable 2002:7f00::/24 dev lo metric 1024 pref medium
unreachable 2002:a9fe::/32 dev lo metric 1024 pref medium
unreachable 2002:ac10::/28 dev lo metric 1024 pref medium
unreachable 2002:c0a8::/32 dev lo metric 1024 pref medium
unreachable 2002:e000::/19 dev lo metric 1024 pref medium
unreachable 3ffe:ffff::/32 dev lo metric 1024 pref medium
fd12:3456:789a:1::2 dev br-ctlplane proto kernel metric 256 pref medium
fd12:3456:789a:1::3 dev br-ctlplane proto kernel metric 256 pref medium
fd12:3456:789a:1::/64 dev br-ctlplane proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth2 proto kernel metric 256 pref medium
fe80::/64 dev br-ctlplane proto kernel metric 256 pref medium
fe80::/64 dev eth1 proto kernel metric 256 pref medium

Adding a default route for IPv6 via the ctlplane-subnet gateway fixes the problem:
sudo ip -6 route add default via fd12:3456:789a:1::fffe dev br-ctlplane
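
The missing route can be checked mechanically with a longest-prefix match over the routing table above. This is a minimal sketch using only the Python standard library; the route list is a hand-copied subset of the `ip -6 route` output (unreachable and link-local entries left out), and `find_route` is a hypothetical helper, not part of any tool mentioned in this report.

```python
import ipaddress

# Destination prefixes copied from the `ip -6 route` output in this comment.
ROUTES = [
    "fd12:3456:789a:1::2/128",
    "fd12:3456:789a:1::3/128",
    "fd12:3456:789a:1::/64",
]


def find_route(destination, routes):
    """Return the longest-prefix route covering destination, or None."""
    addr = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(route) for route in routes]
    matches = [net for net in networks if addr in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


relay = "fd12:3456:789a:2::fffe"  # leaf1 relay address seen in the capture

print(find_route(relay, ROUTES))             # None: no route back to the relay
print(find_route(relay, ROUTES + ["::/0"]))  # ::/0 once a default route exists
```

With the original table the lookup for the relay address finds nothing; once a default route is present, the same lookup succeeds, which matches the behaviour of the workaround.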

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (master)

Fix proposed to branch: master
Review: https://review.opendev.org/756683

Changed in tripleo:
assignee: nobody → Harald Jensås (harald-jensas)
status: Triaged → In Progress
tags: removed: train-backport-potential ussuri-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (master)

Reviewed: https://review.opendev.org/756683
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=6ac7c08257e37ecf9801936480e9b56a5c8f7343
Submitter: Zuul
Branch: master

commit 6ac7c08257e37ecf9801936480e9b56a5c8f7343
Author: Harald Jensås <email address hidden>
Date: Thu Oct 8 13:41:42 2020 +0200

    Generate routes for undercloud ctlplane network attrs

    In https://review.opendev.org/753195 we set up ctlplane
    network attributes and later use those in THT when setting
    group_vars for os_net_config templates in ansible.

    That change did not add 'host_routes' for peer subnets in
    a spine-and-leaf set-up. This caused introspection and
    provisioning to fail in spine-and-leaf set-ups because the
    undercloud didn't know how to reach the remote subnets.

    This change updates the code to include calculated routes
    to the remote subnets.

    Change-Id: I265b2b586ceaeaa98bbf6073bb79cde6a91627da
    Closes-Bug: #1899008
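
The idea of "calculated routes to the remote subnets" can be illustrated as follows. This is only a sketch of the concept and not the python-tripleoclient code from the review: the function name and the dictionary keys are assumptions made for illustration, and the subnet data is copied from the undercloud.conf in the bug description.

```python
import ipaddress

# Subnet definitions copied from undercloud.conf in the bug description.
SUBNETS = {
    "ctlplane-subnet": {"cidr": "fd12:3456:789a:1::/64",
                        "gateway": "fd12:3456:789a:1::fffe"},
    "leaf1": {"cidr": "fd12:3456:789a:2::/64",
              "gateway": "fd12:3456:789a:2::fffe"},
    "leaf2": {"cidr": "fd12:3456:789a:3::/64",
              "gateway": "fd12:3456:789a:3::fffe"},
}


def peer_host_routes(subnets, local_subnet):
    """Build one route per remote subnet, with the local gateway as next hop."""
    next_hop = subnets[local_subnet]["gateway"]
    routes = []
    for name, subnet in subnets.items():
        if name == local_subnet:
            continue  # the local subnet is directly connected
        # ip_network() also validates the CIDR string.
        destination = str(ipaddress.ip_network(subnet["cidr"]))
        routes.append({"destination": destination, "nexthop": next_hop})
    return routes


for route in peer_host_routes(SUBNETS, "ctlplane-subnet"):
    print(f"ip -6 route add {route['destination']} "
          f"via {route['nexthop']} dev br-ctlplane")
```

Compared with the default-route workaround from the earlier comment, per-subnet routes like these only cover the leaf networks and leave the rest of the IPv6 routing on the undercloud untouched.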

Changed in tripleo:
status: In Progress → Fix Released