Comment 19 for bug 1924780

Gabriel Cocenza (gabrielcocenza) wrote : Re: easyrsa install hook fails on public address not found

Looking at debug_log.txt from the first crashdump, it's quite confusing to see that the IP used by machine-1, unit-easyrsa-0 and the controller changes 3 times in a short period. The final public address in juju status is 34.229.60.140 for machine-1 and easyrsa.

The timeline I saw is:

Line 1396:
machine-1: 2021-04-15 17:22:32 DEBUG juju.network including address public:34.239.49.54 for machine
Note: the machine receives a public address at the beginning

Line 1528:
machine-1: 2021-04-15 17:22:33 DEBUG juju.network including address local-cloud:172.31.43.239 for machine
Note: the public address disappears; only the local-cloud address is listed

Line 1684:
controller-0: 2021-04-15 17:22:36 INFO juju.worker.instancepoller machine "1" (instance ID "i-0b2d8dd9ad6b30451") has new addresses: [local-cloud:172.31.43.239@alpha public:34.229.60.140@alpha]
Note: controller gives the final public address for the machine and easyrsa

Line 1691:
machine-1: 2021-04-15 17:22:36 DEBUG juju.worker.machiner observed network config updated for "machine-1"...
Note: does not contain the final public address and contains the IP from line 1528

Line 3276:
machine-1: 2021-04-15 17:23:16 ERROR unit.easyrsa/0.juju-log Hook error:

My two cents is that when the hook is triggered the private address is in transition, which causes the error. I think those transitions might give a false signal that the machine is "ready" to start the unit when it actually is not.

I would like some opinions on how we could make the charms more resilient to this. Currently I see two options:

1) Change charm-helpers
Change the function unit_public_ip [1] in charm-helpers by adding the retry_on_exception decorator [2].

This might be a simple and effective fix for charms that are hitting this problem, but OTOH I don't know if it would be a good match with Juju, and the operator framework wouldn't benefit from it.
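
To make option 1 concrete, here is a rough sketch of the retry idea. It is not the charm-helpers code: retry_on_exception below is a local stand-in written for this example (its num_retries, base_delay and exc_type arguments follow my understanding of the decorator in [2]), and AddressNotReady and fake_unit_get are hypothetical names that only simulate an address lookup failing while the address is in transition.

```python
import functools
import time


class AddressNotReady(Exception):
    """Hypothetical error standing in for a failed public-address lookup."""


def retry_on_exception(num_retries, base_delay=0, exc_type=Exception):
    """Local stand-in: retry the call, growing the delay on each attempt."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            retries = num_retries
            multiplier = 1
            while True:
                try:
                    return func(*args, **kwargs)
                except exc_type:
                    if not retries:
                        raise  # out of retries, surface the original error
                delay = base_delay * multiplier
                multiplier += 1
                retries -= 1
                if delay:
                    time.sleep(delay)
        return wrapper
    return decorator


# Simulated lookups: the address is missing twice, then settles.
_responses = iter([None, None, "34.229.60.140"])


def fake_unit_get(attribute):
    """Hypothetical replacement for the real lookup, for illustration only."""
    value = next(_responses)
    if value is None:
        raise AddressNotReady("%s not found" % attribute)
    return value


@retry_on_exception(num_retries=3, base_delay=0, exc_type=AddressNotReady)
def unit_public_ip():
    return fake_unit_get("public-address")
```

In a real change base_delay would presumably be non-zero so the retries span the few seconds the address takes to settle; which exception type the real lookup raises is something I have not verified.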

2) Treat the error in each charm
Knowing that we might hit this race, every charm should have try/except logic when dealing with the private address. The charms would need to log the problem and wait for the next hook to run. In this case we would need to document somewhere that this is expected behavior and that the charm should be prepared for it.
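
A minimal sketch of what option 2 could look like inside a charm, assuming the lookup raises some exception while the address is in transition. AddressNotReady, install_hook and the get_public_ip parameter are hypothetical names used only for illustration; a real charm would call its own address helper and set a waiting status instead of just returning.

```python
import logging

logger = logging.getLogger("charm")


class AddressNotReady(Exception):
    """Hypothetical error standing in for a failed address lookup."""


def install_hook(get_public_ip):
    """Hypothetical hook body.

    Returns True when the work was done and False when it was deferred
    because the address was not available yet.
    """
    try:
        address = get_public_ip()
    except AddressNotReady as err:
        # Log the problem and give up for now; a later hook can try again.
        logger.warning("public address not available yet, deferring: %s", err)
        return False
    logger.info("using public address %s", address)
    return True
```

The important part is that the hook exits cleanly instead of erroring, so the unit is not left in an error state while the address settles.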

[1] https://github.com/juju/charm-helpers/blob/master/charmhelpers/core/hookenv.py#L874-L876
[2] https://github.com/juju/charm-helpers/blob/master/charmhelpers/core/decorators.py#L30