We hit this again last night. This is an HA-only issue; it's very clear from the logs that it has something to do with the second DHCP server handing out a lease for some reason.
52:54:00:50:fd:5b is the MAC address of the KVM node that failed to PXE boot.
Here is syslog from the primary rack controller:
http://paste.ubuntu.com/25579251/
You can see that node is getting an IP from the secondary rack server:
syslog:Sep 20 07:10:17 infra1 dhcpd[24849]: DHCPDISCOVER from 52:54:00:50:fd:5b via broam: load balance to peer failover-vlan-5022
syslog:Sep 20 07:10:18 infra1 dhcpd[24849]: DHCPREQUEST for 10.245.208.170 (10.245.208.31) from 52:54:00:50:fd:5b via broam: lease owned by peer
syslog:Sep 20 07:10:18 infra1 dhcpd[24849]: DHCPREQUEST for 10.245.208.170 (10.245.208.31) from 52:54:00:50:fd:5b via broam: lease owned by peer
syslog:Sep 20 07:10:19 infra1 maas.interface: [info] eno2 (physical) on infra1: New MAC, IP binding observed: 52:54:00:50:fd:5b, 10.245.208.170
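To pull these failover hand-offs for a single client out of a long syslog, a small filter along the following lines may help. It is a minimal sketch that assumes the dhcpd line format quoted above (an optional `syslog:` prefix left by grep, then timestamp, host, `dhcpd[pid]:`). `peer_deferrals` and `PEER_NOTES` are hypothetical names, and the two notes it matches are only the ones seen in these logs, not an exhaustive list of dhcpd failover messages.

```python
import re

# Matches dhcpd lines in the format quoted above, e.g.
# "syslog:Sep 20 07:10:17 infra1 dhcpd[24849]: DHCPDISCOVER from <mac> via broam: load balance to peer ..."
# The leading "syslog:" (left by grep) is optional.
DHCPD_LINE = re.compile(
    r"^(?:syslog:)?(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) dhcpd\[\d+\]: "
    r"(?P<msg>DHCP[A-Z]+)(?P<rest>.*)$"
)

# Assumption: these are the notes (seen in the logs above) meaning the
# failover peer, not this server, is expected to answer the client.
PEER_NOTES = ("load balance to peer", "lease owned by peer")


def peer_deferrals(lines, mac):
    """Return (timestamp, host, message, note) for each dhcpd line about `mac`
    in which this server deferred to its failover peer."""
    results = []
    for line in lines:
        m = DHCPD_LINE.match(line.strip())
        if not m or mac not in m.group("rest"):
            continue
        # dhcpd appends its reason after the interface name, separated by ": ".
        # The MAC itself contains colons but never ": ", so partition is safe.
        _, _, note = m.group("rest").partition(": ")
        if note.startswith(PEER_NOTES):
            results.append((m.group("ts"), m.group("host"), m.group("msg"), note))
    return results
```

Lines from other programs (such as the `maas.interface` binding message) and dhcpd lines without a peer note are skipped, so the output is just the points where infra1 handed the client to its peer.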
Later on, the node requests that IP again, and the primary rack server offers it a different one. The node, however, wants the IP that the secondary rack server is offering:
syslog:Sep 20 07:53:20 infra1 dhcpd[24849]: DHCPDISCOVER from 52:54:00:50:fd:5b via broam
syslog:Sep 20 07:53:20 infra1 dhcpd[24849]: DHCPOFFER on 10.245.208.222 to 52:54:00:50:fd:5b via broam
syslog:Sep 20 07:53:21 infra1 dhcpd[24849]: DHCPDISCOVER from 52:54:00:50:fd:5b via broam
syslog:Sep 20 07:53:23 infra1 dhcpd[24849]: DHCPREQUEST for 10.245.208.170 (10.245.208.31) from 52:54:00:50:fd:5b via broam: lease 10.245.208.170 unavailable.
Here's syslog from the secondary rack controller, showing it offering the 10.245.208.170 address at the same time the primary rack controller is offering the 10.245.208.222 address:
http://paste.ubuntu.com/25579275/
The two offers:
<secondary> Sep 20 07:53:21 infra2 dhcpd[20205]: DHCPOFFER on 10.245.208.170 to 52:54:00:50:fd:5b via broam
<primary> Sep 20 07:53:21 infra1 dhcpd[24849]: DHCPOFFER on 10.245.208.222 to 52:54:00:50:fd:5b
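The conflict only becomes visible when the logs from both rack controllers are read together, so a check like the one below could flag it automatically. This is a sketch, assuming the syslogs from infra1 and infra2 have been concatenated into one list of lines and that the two hosts' clocks agree to the second; `conflicting_offers` is a hypothetical helper, not part of MAAS or dhcpd.

```python
import re
from collections import defaultdict

# Matches DHCPOFFER lines such as
# "Sep 20 07:53:21 infra2 dhcpd[20205]: DHCPOFFER on 10.245.208.170 to 52:54:00:50:fd:5b via broam"
# search() is used, so prefixes like "<secondary> " or "syslog:" are ignored.
OFFER = re.compile(
    r"(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) dhcpd\[\d+\]: "
    r"DHCPOFFER on (?P<ip>[\d.]+) to (?P<mac>[0-9a-f:]{17})"
)


def conflicting_offers(lines):
    """Group DHCPOFFERs by (second, MAC) and return the groups in which
    different hosts offered different addresses to the same client."""
    offers = defaultdict(set)
    for line in lines:
        m = OFFER.search(line)
        if m:
            key = (m.group("ts"), m.group("mac"))
            offers[key].add((m.group("host"), m.group("ip")))
    return {
        key: sorted(seen)
        for key, seen in offers.items()
        # Conflict: more than one host and more than one address in the same second.
        if len({ip for _, ip in seen}) > 1 and len({host for host, _ in seen}) > 1
    }
```

Run over the two offers above, it reports a single conflict for 52:54:00:50:fd:5b at 07:53:21, with infra1 offering 10.245.208.222 and infra2 offering 10.245.208.170. Bucketing by the whole second is a deliberate simplification; with skewed clocks a small time window would be needed instead.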
So the question is: why is the secondary rack server offering an IP? Why is the "load balance to peer" behavior happening?