Comment 1 for bug 1844605

Billy Olsen (billy-olsen) wrote :

This appears to be a bug in both the calico and canal charms.

From what I can tell, the problem arises when the calico/canal charm never invokes canal_upgrade.complete(). This occurs when the leader unit is on the same node as a kubernetes-master. canal_upgrade.complete() is called as part of the upgrade_v3_complete() method, which requires that the network policy controller is deployed. The problem is that *only* a worker node will actually deploy the network policy controller, so if the leader is on a master, the calico.npc.deployed flag will never be set and the upgrade will never be marked as completed.
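The failure mode can be modelled in a few lines of plain Python. This is only a sketch of the flag logic described above, not the charm's code: the Unit class and run_hooks function are hypothetical stand-ins for the reactive handlers, and it assumes that only the leader marks the upgrade as complete. Only the flag names are taken from the charm.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical stand-in for one calico/canal unit."""
    is_leader: bool
    is_worker: bool  # models the 'cni.is-worker' flag
    flags: set = field(default_factory=set)
    upgrade_complete: bool = False


def run_hooks(unit: Unit) -> None:
    # Models deploy_network_policy_controller: gated on 'cni.is-worker'.
    if unit.is_worker:
        unit.flags.add("calico.npc.deployed")
    # Models upgrade_v3_complete (assumed leader-only): it needs the
    # unit-local 'calico.npc.deployed' flag before completing the upgrade.
    if unit.is_leader and "calico.npc.deployed" in unit.flags:
        unit.upgrade_complete = True


leader_on_master = Unit(is_leader=True, is_worker=False)
follower_on_worker = Unit(is_leader=False, is_worker=True)
for unit in (leader_on_master, follower_on_worker):
    run_hooks(unit)
```

In this model the worker sets its local flag, but the leader on the master never does, so upgrade_complete stays False on every unit no matter how many times the hooks run.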

Furthermore, the network policy controller is deployed by *all* of the worker nodes, which on the surface seems unnecessary, as nothing in the rendered deployment yaml is specific to the local node. I believe the calico.npc.deployed flag should be moved to leader storage to ensure the controller is only deployed a single time.

The immediate fix should be relatively straightforward: remove the 'cni.is-worker' check from the deploy_network_policy_controller method in reactive/calico.py.

I believe an improved version would be to use leader storage only (and not local storage) for the calico.npc.deployed state. The leadership.is_leader flag should then be added as a condition on the deploy_network_policy_controller method so that only one node deploys the policy controller.
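As a rough illustration of that proposal, the gating could be modelled as follows. This is a plain-Python sketch under stated assumptions, not a patch against the charm: the Model class and its leader_storage dict are hypothetical stand-ins for Juju leader settings, and only the handler and flag names come from the discussion above.

```python
class Model:
    """Hypothetical model of the proposed gating; not the charm's code."""

    def __init__(self) -> None:
        self.leader_storage = {}  # stand-in for Juju leader settings
        self.deployments = 0      # how many times the controller is applied

    def deploy_network_policy_controller(self, is_leader: bool) -> None:
        # Proposed gate: leader only, regardless of master/worker role,
        # and only if leader storage does not already record a deployment.
        if not is_leader or self.leader_storage.get("calico.npc.deployed"):
            return
        self.deployments += 1
        self.leader_storage["calico.npc.deployed"] = True

    def upgrade_v3_complete(self, is_leader: bool) -> bool:
        # The leader reads the shared state, so it sees the deployment
        # even when it sits on a master.
        return is_leader and bool(self.leader_storage.get("calico.npc.deployed"))


model = Model()
# One leader (on a master) and two non-leader workers each run the handler.
for is_leader in (True, False, False):
    model.deploy_network_policy_controller(is_leader)
upgrade_done = model.upgrade_v3_complete(is_leader=True)
```

With this gating the controller is applied exactly once, and the leader can complete the upgrade whether it lands on a master or a worker.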