Comment 8 for bug 1747764

Jason Hobbs (jason-hobbs) wrote : Re: [Bug 1747764] Re: [2.3, ha] rack controller HA fails during a network partition

I blocked all IP traffic between the isolated system and the other two
systems, so it couldn't talk over either RPC or HTTP. dhcpd never
stopped on the isolated system, and is still running right now.
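
For anyone trying to reproduce the partition, here is a minimal sketch of the kind of rules that block all IP traffic to and from one host. The exact rules used are not recorded in this report, so treat the chains and flags below as an assumption; `partition_rules` is a hypothetical helper that only builds the commands and does not run them. They would be applied on each of the two non-isolated controllers.

```python
import ipaddress
from typing import List


def partition_rules(isolated_ip: str) -> List[List[str]]:
    """Build iptables commands that drop all IP traffic to and from one host.

    Assumption: plain DROP rules appended to INPUT and OUTPUT; the report
    does not say which rules were actually used.
    """
    # Validate the address; raises ValueError if it is malformed.
    addr = str(ipaddress.ip_address(isolated_ip))
    return [
        # Drop everything arriving from the isolated host (RPC, HTTP, ...).
        ["iptables", "-A", "INPUT", "-s", addr, "-j", "DROP"],
        # Drop everything sent to the isolated host.
        ["iptables", "-A", "OUTPUT", "-d", addr, "-j", "DROP"],
    ]


if __name__ == "__main__":
    # 10.245.31.4 is the isolated node named in the bug description.
    for argv in partition_rules("10.245.31.4"):
        print(" ".join(argv))
```

Because the rules match on address rather than port, they cover the RPC connections as well as HTTP.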

On Tue, Feb 6, 2018 at 4:03 PM, Blake Rouse <email address hidden> wrote:
> It is designed to stop DHCPD if the rack controller cannot talk to any
> region controllers. You prevented the rack controller from talking to
> the region over HTTP, but did you also block the RPC connections? Those
> use a different port.
>
> Title:
> [2.3, ha] rack controller HA fails during a network partition
>
> Status in MAAS:
> Incomplete
>
> Bug description:
> I have an HA setup with 3 MAAS controllers, each running rack
> controllers and region controllers.
>
> On two of the three controllers, I used iptables to drop traffic from
> the third, to simulate a network partition.
>
> Then I instructed MAAS to deploy a node. The node powered on fine,
> but when it started PXE booting, the isolated third rack controller
> responded to the DHCP request, gave it an IP address, and told it to
> fetch its pxelinux.cfg from that controller via TFTP.
>
> That rack controller was unable to provide the pxelinux.cfg because it
> couldn't reach the region controller via the VIP due to the network
> partition, and the node failed to PXE boot.
>
> I think that the isolated rack controller should not be running DHCP.
> If a rack controller can't reach the region controller, it can't
> handle PXE booting a node, and shouldn't try. If it had not
> responded, one of the functional rack controllers would have, and the
> node would have booted fine.
>
> In the attached logs, 10.245.31.4 is the node that was isolated. I
> started the isolation at about 21:15.
>
> This is with 2.3.0-6434-gd354690-0ubuntu1~16.04.1.
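
The behaviour described in the quoted reply (stop dhcpd when the rack controller cannot talk to any region controller) can be sketched roughly as below. This is not MAAS's actual implementation; `can_connect` and `should_run_dhcpd` are hypothetical names, and the region endpoints (host and RPC port) are left as parameters because the thread does not give them.

```python
import socket
from typing import Callable, Iterable, Tuple

Endpoint = Tuple[str, int]


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def should_run_dhcpd(
    region_endpoints: Iterable[Endpoint],
    probe: Callable[[str, int], bool] = can_connect,
) -> bool:
    """dhcpd should keep running only if at least one region is reachable.

    With no endpoints, or with every probe failing, this returns False,
    which corresponds to the isolated controller in this bug.
    """
    return any(probe(host, port) for host, port in region_endpoints)
```

Under this rule the isolated controller would have stopped answering DHCP requests, and one of the two connected rack controllers would have served the node instead.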