Unable to introspect nodes via DHCPv6 relay

Bug #1899008 reported by Harald Jensås
Affects: tripleo
Importance: High
Assigned to: Harald Jensås

Bug Description

In a routed spine-and-leaf setup, introspecting baremetal nodes on a remote subnet fails.
On the undercloud, the DHCPSOLICIT and DHCPADVERTISE can be seen in the ironic-inspector dnsmasq log.

File: /var/log/containers/ironic-inspector/dnsmasq.log
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 available DHCP range: fd12:3456:789a:2::aaaa -- fd12:3456:789a:2::afff
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 vendor class: 343
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 client MAC address: fa:16:3e:03:33:fb
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 DHCPSOLICIT(br-ctlplane) 00:04:b8:ec:72:2e:93:51:46:ce:96:0f:b4:4d:98:c1:8e:fb
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 DHCPADVERTISE(br-ctlplane) fd12:3456:789a:2::aaaa 00:04:b8:ec:72:2e:93:51:46:ce:96:0f:b4:4d:98:c1:8e:fb
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 requested options: 23:dns-server, 24:domain-search, 59:bootfile-url,
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 requested options: 60:bootfile-param
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 tags: leaf1, known, ipxe6, dhcpv6, br-ctlplane
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 sent size: 18 option: 1 client-id 00:04:b8:ec:72:2e:93:51:46:ce:96:0f:b4:4d...
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 sent size: 14 option: 2 server-id 00:01:00:01:27:11:19:ae:fa:16:3e:6a:78:1f
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 sent size: 40 option: 3 ia-na IAID=2535401266 T1=300 T2=525
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 nest size: 24 option: 5 iaaddr fd12:3456:789a:2::aaaa PL=600 VL=600
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 sent size: 9 option: 13 status 0 success
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 sent size: 1 option: 7 preference 0
Oct 8 10:36:45 dnsmasq-dhcp[7]: 6023847 sent size: 48 option: 59 bootfile-url http://[fd12:3456:789a:1::1]:8088/inspector.ipxe

On the wire, only the response from neutron's DHCP server can be seen:

[root@undercloud ~]# ip netns exec qdhcp-6157272f-cabf-4225-a23b-988d06ecc4e4 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
8: tap213ee6d1-84: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1445 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ether fa:16:3e:1a:a9:15 brd ff:ff:ff:ff:ff:ff
    inet6 fd12:3456:789a:1::4/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe1a:a915/64 scope link
       valid_lft forever preferred_lft forever

[root@undercloud ~]# tcpdump -i eth1 -n -vv udp port 546 or 547 or icmp6
10:42:11.708885 IP6 (class 0xc0, hlim 1, next-header UDP (17) payload length: 183) fd12:3456:789a:2::fffe.dhcpv6-server > ff05::1:3.dhcpv6-server: [bad udp cksum 0xa9d6 -> 0x7a70!] dhcp6 relay-fwd (linkaddr=fd12:3456:789a:2::fffe peeraddr=fe80::f816:3eff:fe03:33fb (opt_79) (relay-message (dhcp6 solicit (xid=59bfe0 (client-ID type 4) (IA_NA IAID:2535401266 T1:0 T2:0) (option-request DNS-server DNS-search-list opt_59 opt_60) (vendor-class) (opt_61) (opt_62) (user-class) (elapsed-time 792))))

10:42:11.711531 IP6 (class 0xc0, flowlabel 0x7ed8f, hlim 64, next-header UDP (17) payload length: 122) fd12:3456:789a:1::4.dhcpv6-server > fd12:3456:789a:2::fffe.dhcpv6-server: [bad udp cksum 0x5498 -> 0x1e46!] dhcp6 relay-reply (linkaddr=fd12:3456:789a:2::fffe peeraddr=fe80::f816:3eff:fe03:33fb (opt_79) (relay-message (dhcp6 advertise (xid=59bfe0 (client-ID type 4) (server-ID hwaddr/time type 1 time 655432311 fa163e1aa915) (status-code NoAddrsAvail))))

  ^^Notice fd12:3456:789a:1::4.dhcpv6-server is the Neutron DHCP server. We expect NoAddrsAvail from this DHCP server at this point.

But where is the DHCPADVERTISE(br-ctlplane) that the inspector's dnsmasq log shows?
I can't see it with tcpdump on either br-ctlplane or eth1 (br-ctlplane is wired to eth1).

[centos@undercloud ~]$ cat undercloud.conf
[DEFAULT]

# templates = ~/tripleo-heat-templates
# container_images_file = /home/cloud-user/containers-prepare-parameter.yaml

enable_routed_networks = true
enable_ui = false
overcloud_domain_name = localdomain
scheduler_max_attempts = 2
undercloud_ntp_servers = pool.ntp.org
undercloud_hostname = undercloud.rdocloud
local_interface = eth1
local_mtu = 1445
local_ip = fd12:3456:789a:1::1/64
undercloud_public_host = fd12:3456:789a:1::2
undercloud_admin_host = fd12:3456:789a:1::3
undercloud_nameservers = 8.8.8.8,8.8.4.4
local_subnet = ctlplane-subnet
subnets = ctlplane-subnet,leaf1,leaf2
ipv6_address_mode = dhcpv6-stateful

[ctlplane-subnet]
cidr = fd12:3456:789a:1::/64
gateway = fd12:3456:789a:1::fffe
inspection_iprange = fd12:3456:789a:1::aaaa,fd12:3456:789a:1::afff
dns_nameservers = fd12:3456:789a:1::1

[leaf1]
cidr = fd12:3456:789a:2::/64
gateway = fd12:3456:789a:2::fffe
inspection_iprange = fd12:3456:789a:2::aaaa,fd12:3456:789a:2::afff
dns_nameservers = fd12:3456:789a:1::1

[leaf2]
cidr = fd12:3456:789a:3::/64
gateway = fd12:3456:789a:3::fffe
inspection_iprange = fd12:3456:789a:3::aaaa,fd12:3456:789a:3::afff
dns_nameservers = fd12:3456:789a:1::1

[root@undercloud ~]# cat /var/lib/config-data/puppet-generated/ironic_inspector/etc/ironic-inspector/dnsmasq.conf
port=0
interface=br-ctlplane
log-dhcp

dhcp-range=set:ctlplane-subnet,fd12:3456:789a:1::aaaa,fd12:3456:789a:1::afff,64,10m
dhcp-option-force=tag:ctlplane-subnet,option:mtu,1445
dhcp-range=set:leaf1,fd12:3456:789a:2::aaaa,fd12:3456:789a:2::afff,64,10m
dhcp-option-force=tag:leaf1,option:mtu,1445
dhcp-range=set:leaf2,fd12:3456:789a:3::aaaa,fd12:3456:789a:3::afff,64,10m
dhcp-option-force=tag:leaf2,option:mtu,1445
dhcp-sequential-ip
dhcp-match=ipxe,175
dhcp-match=set:efi,option:client-arch,7
dhcp-match=set:efi,option:client-arch,9
dhcp-match=set:efi,option:client-arch,11
# dhcpv6s for Client System Architecture Type (61)
dhcp-match=set:efi6,option6:61,0007
dhcp-match=set:efi6,option6:61,0009
dhcp-match=set:efi6,option6:61,0011
dhcp-userclass=set:ipxe6,iPXE
# Client is already running iPXE; move to next stage of chainloading
dhcp-boot=tag:ipxe,http://[fd12:3456:789a:1::1]:8088/inspector.ipxe
dhcp-option=tag:ipxe6,option6:bootfile-url,http://[fd12:3456:789a:1::1]:8088/inspector.ipxe
# Client is PXE booting over EFI without iPXE ROM; send EFI version of iPXE chainloader
dhcp-boot=tag:efi,tag:!ipxe,ipxe.efi
dhcp-option=tag:efi6,tag:!ipxe6,option6:bootfile-url,tftp://[fd12:3456:789a:1::1]/ipxe.efi
# Client is running PXE over BIOS; send BIOS version of iPXE chainloader
dhcp-boot=undionly.kpxe,localhost.localdomain,fd12:3456:789a:1::1

dhcp-hostsdir=/var/lib/ironic-inspector/dhcp-hostsdir
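
For reference, the dnsmasq log shows the request tagged `leaf1` and an address offered from the leaf1 range, which is consistent with the range being picked from the relay's link address (linkaddr=fd12:3456:789a:2::fffe in the tcpdump output). A minimal sketch of that matching, assuming Python's standard `ipaddress` module; `RANGES` and `range_for_link` are hypothetical names used only for illustration and this is not dnsmasq's actual code:

```python
import ipaddress

# (tag, first address, last address, prefix length), copied from the
# dhcp-range lines in dnsmasq.conf above.
RANGES = [
    ("ctlplane-subnet", "fd12:3456:789a:1::aaaa", "fd12:3456:789a:1::afff", 64),
    ("leaf1", "fd12:3456:789a:2::aaaa", "fd12:3456:789a:2::afff", 64),
    ("leaf2", "fd12:3456:789a:3::aaaa", "fd12:3456:789a:3::afff", 64),
]


def range_for_link(link_address):
    """Return the (tag, start, end) whose prefix contains link_address."""
    link = ipaddress.ip_address(link_address)
    for tag, start, end, prefix_len in RANGES:
        # strict=False lets us derive the /64 network from a host address.
        network = ipaddress.ip_network(f"{start}/{prefix_len}", strict=False)
        if link in network:
            return tag, start, end
    return None


# linkaddr taken from the relay-fwd packet in the tcpdump output
print(range_for_link("fd12:3456:789a:2::fffe"))
# ('leaf1', 'fd12:3456:789a:2::aaaa', 'fd12:3456:789a:2::afff')
```

So the server side selects the right range; the open question is only why the reply never shows up on the wire.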

Harald Jensås (harald-jensas) wrote :

Root cause is that there is no default route for IPv6 on the undercloud. There are also no specific routes for the leaf subnets fd12:3456:789a:2::/64 and fd12:3456:789a:3::/64.

(undercloud) [centos@undercloud ~]$ ip -6 route
::1 dev lo proto kernel metric 256 pref medium
unreachable ::/96 dev lo metric 1024 pref medium
unreachable ::ffff:0.0.0.0/96 dev lo metric 1024 pref medium
unreachable 2002:a00::/24 dev lo metric 1024 pref medium
unreachable 2002:7f00::/24 dev lo metric 1024 pref medium
unreachable 2002:a9fe::/32 dev lo metric 1024 pref medium
unreachable 2002:ac10::/28 dev lo metric 1024 pref medium
unreachable 2002:c0a8::/32 dev lo metric 1024 pref medium
unreachable 2002:e000::/19 dev lo metric 1024 pref medium
unreachable 3ffe:ffff::/32 dev lo metric 1024 pref medium
fd12:3456:789a:1::2 dev br-ctlplane proto kernel metric 256 pref medium
fd12:3456:789a:1::3 dev br-ctlplane proto kernel metric 256 pref medium
fd12:3456:789a:1::/64 dev br-ctlplane proto kernel metric 256 pref medium
fe80::/64 dev eth0 proto kernel metric 256 pref medium
fe80::/64 dev eth2 proto kernel metric 256 pref medium
fe80::/64 dev br-ctlplane proto kernel metric 256 pref medium
fe80::/64 dev eth1 proto kernel metric 256 pref medium

Adding a default route for IPv6 via the ctlplane-subnet gateway (fd12:3456:789a:1::fffe) fixes the problem:
sudo ip -6 route add default via fd12:3456:789a:1::fffe dev br-ctlplane
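
The effect of the missing route can be checked offline with a longest-prefix match. This is only a sketch, assuming Python's standard `ipaddress` module; `find_route` is a hypothetical helper, and the prefix list is copied from the usable global entries of the `ip -6 route` output above (the `unreachable` and link-local entries are left out):

```python
import ipaddress


def find_route(destination, prefixes):
    """Return the longest-prefix match for destination, or None."""
    dest = ipaddress.ip_address(destination)
    networks = [ipaddress.ip_network(p) for p in prefixes]
    matches = [net for net in networks if dest in net]
    return max(matches, key=lambda net: net.prefixlen, default=None)


# Usable global prefixes from the routing table shown above.
routes = [
    "fd12:3456:789a:1::2/128",
    "fd12:3456:789a:1::3/128",
    "fd12:3456:789a:1::/64",
]

# The DHCPv6 relay on leaf1 that the relay-reply has to be sent to.
relay = "fd12:3456:789a:2::fffe"

print(find_route(relay, routes))             # None
print(find_route(relay, routes + ["::/0"]))  # ::/0
```

The relay address on leaf1 matches nothing until a default (or leaf-specific) route exists, which would explain why the DHCPADVERTISE logged by dnsmasq never reaches the wire.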

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (master)

Fix proposed to branch: master
Review: https://review.opendev.org/756683

Changed in tripleo:
assignee: nobody → Harald Jensås (harald-jensas)
status: Triaged → In Progress
tags: removed: train-backport-potential ussuri-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (master)

Reviewed: https://review.opendev.org/756683
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=6ac7c08257e37ecf9801936480e9b56a5c8f7343
Submitter: Zuul
Branch: master

commit 6ac7c08257e37ecf9801936480e9b56a5c8f7343
Author: Harald Jensås <email address hidden>
Date: Thu Oct 8 13:41:42 2020 +0200

    Generate routes for undercloud ctlplane network attrs

    In https://review.opendev.org/753195 we set up ctlplane
    network attributes and later use those in THT when setting
    group_vars for os_net_config templates in ansible.

    That change did not add 'host_routes' for peer-subnets in
    a spine-and-leaf set-up. This caused introspection and
    provisioning to fail in spine-and-leaf set-ups because the
    undercloud didn't know how to reach the remote subnets.

    This change updates the code to include calculated routes
    to the remote subnets.

    Change-Id: I265b2b586ceaeaa98bbf6073bb79cde6a91627da
    Closes-Bug: #1899008
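
As an illustration of what "calculated routes to the remote subnets" could look like for the undercloud.conf in this report, here is a minimal sketch. It is not the python-tripleoclient implementation: `peer_routes`, the `SUBNETS` dict, and the `destination`/`nexthop` key names are assumptions made for the example.

```python
# Subnet definitions copied from the undercloud.conf in the bug description.
SUBNETS = {
    "ctlplane-subnet": {"cidr": "fd12:3456:789a:1::/64",
                        "gateway": "fd12:3456:789a:1::fffe"},
    "leaf1": {"cidr": "fd12:3456:789a:2::/64",
              "gateway": "fd12:3456:789a:2::fffe"},
    "leaf2": {"cidr": "fd12:3456:789a:3::/64",
              "gateway": "fd12:3456:789a:3::fffe"},
}


def peer_routes(subnets, local_subnet):
    """Build one route per remote subnet, via the local subnet's gateway."""
    gateway = subnets[local_subnet]["gateway"]
    return [
        {"destination": attrs["cidr"], "nexthop": gateway}
        for name, attrs in subnets.items()
        if name != local_subnet
    ]


for route in peer_routes(SUBNETS, "ctlplane-subnet"):
    print(route["destination"], "via", route["nexthop"])
# fd12:3456:789a:2::/64 via fd12:3456:789a:1::fffe
# fd12:3456:789a:3::/64 via fd12:3456:789a:1::fffe
```

With routes like these in place the undercloud can reach both leaf subnets without relying on a default route.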

Changed in tripleo:
status: In Progress → Fix Released