There is currently no periodic task that ensures that a conductor's local PXE boot environment (the TFTP config file and the cached kernels and ramdisks) matches the set of nodes currently mapped to that host by the hash ring. The two can get out of sync if, e.g., the size of the hash ring changes, the conductor was temporarily offline while a node was created or deleted, or the configuration option for the number of hash replicas was changed and the service restarted.
Related to this, ironic.common.hash_ring defines the hash_distribution_replicas option. The purpose of this option is to provide a faster fail-over time when a conductor goes offline by allowing the next conductor in the ring to pre-cache the environment. The "do_node_deploy" RPC message will only be sent to the first conductor that the node is mapped to; if the number of replicas is greater than one, the additional conductor(s) should pre-cache the kernel and ramdisk from within the proposed periodic task. Note that they should not cache the entire user image; that is only needed during the act of deploying.
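To illustrate how a replica count greater than one yields a primary conductor plus stand-by conductors for the same node, here is a minimal, self-contained sketch of a consistent hash ring. It is not the code in ironic.common.hash_ring; the class name ToyHashRing, the points_per_host parameter, and the use of MD5 are assumptions made purely for illustration.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ToyHashRing:
    """Hypothetical toy ring; NOT the ironic.common.hash_ring API."""

    def __init__(self, hosts, replicas=1, points_per_host=32):
        unique_hosts = set(hosts)
        # Never ask for more distinct hosts than exist, and at least one
        # when any host is present.
        self.replicas = min(max(replicas, 1), len(unique_hosts))
        # Each host is placed at several positions to spread the load.
        self._ring = sorted(
            (_hash("%s-%d" % (host, i)), host)
            for host in unique_hosts
            for i in range(points_per_host)
        )
        self._keys = [position for position, _ in self._ring]

    def get_hosts(self, node_uuid):
        """Return [primary, standby, ...] conductors for a node."""
        found = []
        if not self._ring:
            return found
        start = bisect.bisect(self._keys, _hash(node_uuid))
        # Walk clockwise, collecting distinct hosts until we have enough.
        for offset in range(len(self._ring)):
            host = self._ring[(start + offset) % len(self._ring)][1]
            if host not in found:
                found.append(host)
            if len(found) == self.replicas:
                break
        return found


ring = ToyHashRing(["conductor-a", "conductor-b", "conductor-c"], replicas=2)
hosts = ring.get_hosts("node-1")
primary, standbys = hosts[0], hosts[1:]
```

In this sketch, only the primary would receive the deploy RPC, while each stand-by would pre-cache the kernel and ramdisk so that it can take over quickly.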
Within this periodic task, the conductor should compare its locally-cached deploy environments with the list of nodes mapped to it, and then either prepare or clean up those deployment environments as appropriate.
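The comparison described above amounts to a set difference in each direction. The following is a rough sketch of that reconciliation logic only; the function names plan_pxe_sync and sync_pxe_environments, and the prepare / clean_up callables, are hypothetical placeholders and do not correspond to existing Ironic functions.

```python
def plan_pxe_sync(cached_node_uuids, mapped_node_uuids):
    """Work out which environments to build and which to remove.

    cached_node_uuids: nodes that have a local PXE environment on this host.
    mapped_node_uuids: nodes the hash ring currently maps to this host.
    """
    cached = set(cached_node_uuids)
    mapped = set(mapped_node_uuids)
    to_prepare = sorted(mapped - cached)   # mapped here but not yet cached
    to_clean_up = sorted(cached - mapped)  # cached here but no longer mapped
    return to_prepare, to_clean_up


def sync_pxe_environments(cached_node_uuids, mapped_node_uuids,
                          prepare, clean_up):
    """Apply the plan using caller-supplied (hypothetical) callables."""
    to_prepare, to_clean_up = plan_pxe_sync(cached_node_uuids,
                                            mapped_node_uuids)
    for node_uuid in to_prepare:
        prepare(node_uuid)    # e.g. write TFTP config, fetch kernel/ramdisk
    for node_uuid in to_clean_up:
        clean_up(node_uuid)   # e.g. remove TFTP config and cached images
    return to_prepare, to_clean_up


prepared, removed = [], []
sync_pxe_environments(
    cached_node_uuids=["node-1", "node-2"],
    mapped_node_uuids=["node-2", "node-3"],
    prepare=prepared.append,
    clean_up=removed.append,
)
# prepared is now ["node-3"] and removed is now ["node-1"]
```

Computing the plan separately from applying it keeps the comparison easy to test without touching the filesystem.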
Fix proposed to branch: master
Review: https://review.openstack.org/92115