unit-get public address is unreliable in install hook

Bug #1910973 reported by Liam Young
This bug affects 1 person
Affects: Canonical Juju
Status: Won't Fix
Importance: Undecided
Assigned to: Joseph Phillips

Bug Description

I have recently started to see charms sometimes go into an error state during the install hook. I *think* this began when I upgraded my client and controller to 2.9-rc3. The error comes from:

`unit-get --format=json public-address` failing.

I have only been running tests using the openstack provider.

I have created a small charm, cs:~gnuoy/addr-test-0, to reproduce the error. It just runs `unit-get public-address` multiple times when a hook is called. The errors seem to happen only in the install hook; if the charm gets past the install hook, the error does not occur.
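For illustration, the kind of loop such a test hook runs could look roughly like the sketch below. This is not the code of cs:~gnuoy/addr-test-0; the function names, the attempt count, and the injectable `runner` parameter are assumptions made so the loop can be exercised outside a hook.

```python
import subprocess
from typing import Callable, List


def default_runner(cmd: List[str]) -> subprocess.CompletedProcess:
    # Only meaningful inside a Juju hook, where the unit-get hook tool is on PATH.
    return subprocess.run(cmd, capture_output=True, text=True)


def count_unit_get_failures(
    attempts: int = 10,
    runner: Callable[[List[str]], subprocess.CompletedProcess] = default_runner,
) -> int:
    """Run the failing command repeatedly and count non-zero exit codes."""
    failures = 0
    for _ in range(attempts):
        result = runner(["unit-get", "--format=json", "public-address"])
        if result.returncode != 0:
            failures += 1
    return failures
```

Calling `count_unit_get_failures()` from an install hook and logging the result would show whether the address lookup fails early in the unit's life.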

I would estimate that I see this error 10% to 20% of the time.

Revision history for this message
Liam Young (gnuoy) wrote :

I am deploying cs:~gnuoy/addr-test-0 in batches of 20. In recent tests:

Run 1: 0 failures
Run 2: 3 failures
Run 3: 1 failure
Run 4: 2 failures
Run 5: 2 failures

Revision history for this message
Liam Young (gnuoy) wrote :

I recreated my controller and reran the same test with Juju 2.8, and I didn't see the issue at all. In case there had been some problem with my first controller, I recreated the controller at 2.9-rc3 and reran the test, and the issue appeared again:

Run 1: 0 failures
Run 2: 0 failures
Run 3: 1 failure
Run 4: 5 failures
Run 5: 3 failures
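
As a quick check on the numbers, the per-test failure rate can be worked out from the run counts reported in this thread. The sketch below assumes every run deployed a batch of 20 units, which is stated for the first test and implied ("the same test") for the second.

```python
from typing import List

BATCH_SIZE = 20  # units deployed per run, as described for the first test

first_test = [0, 3, 1, 2, 2]    # failures per run on the original 2.9-rc3 controller
second_test = [0, 0, 1, 5, 3]   # failures per run on the recreated 2.9-rc3 controller


def failure_rate(failures: List[int], batch_size: int = BATCH_SIZE) -> float:
    """Fraction of deployed units that hit the error across all runs."""
    return sum(failures) / (len(failures) * batch_size)


print(f"first test:  {failure_rate(first_test):.0%}")
print(f"second test: {failure_rate(second_test):.0%}")
```

That works out to 8 failures in 100 units for the first test and 9 in 100 for the second, with the worst single run at 5 of 20.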

Revision history for this message
Liam Young (gnuoy) wrote :

Logs from units and controller

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.9-rc4
importance: Undecided → High
status: New → Triaged
Revision history for this message
Joseph Phillips (manadart) wrote :

I can see what this is. It's a race.

Upon creation the hook context members for private/public address are set.

If the context is created very early, the unit's machine may not yet have been assigned a preferred private or public address, in which case accessing the public address returns a "not found" error, as seen here.

I will see what avenues are available to us here.

Changed in juju:
assignee: nobody → Joseph Phillips (manadart)
Revision history for this message
Joseph Phillips (manadart) wrote :

I've discussed this with Ian.

Using unit-get to retrieve public/private addresses is deprecated, and planned for removal in Juju 3.0.

Instead, charm logic reasoning about public addresses should call network-get and use the result from the "ingress-addresses" key.
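
A minimal sketch of that approach from charm code, assuming the JSON output of `network-get` is a mapping with an "ingress-addresses" list. The helper names are made up for this example, and the binding name passed in is whatever endpoint the charm actually declares.

```python
import json
import subprocess
from typing import List


def parse_ingress_addresses(network_get_json: str) -> List[str]:
    """Extract the "ingress-addresses" list from network-get JSON output."""
    data = json.loads(network_get_json)
    # Treat a missing or null key as "no addresses yet" rather than raising.
    return list(data.get("ingress-addresses") or [])


def ingress_addresses(binding: str) -> List[str]:
    # Only works inside a hook, where the network-get hook tool is on PATH.
    output = subprocess.check_output(
        ["network-get", binding, "--format=json"], text=True
    )
    return parse_ingress_addresses(output)
```

Keeping the parsing separate from the subprocess call makes it possible to unit test the charm logic without a running hook context.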

Changed in juju:
milestone: 2.9-rc4 → none
status: Triaged → Won't Fix
Revision history for this message
Joseph Phillips (manadart) wrote :

I neglected to note above that the API backing network-get implements a polling strategy to give time for the relevant addresses to be populated, which works around the specific issue here.
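
To illustrate the general idea of polling until an address is populated, here is a generic sketch. It is not Juju's implementation; the function name, attempt count, and delay are assumptions chosen for the example.

```python
import time
from typing import Callable, Optional


def poll_for_address(
    fetch: Callable[[], Optional[str]],
    attempts: int = 5,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() until it returns a non-empty address or attempts run out."""
    for attempt in range(attempts):
        address = fetch()
        if address:
            return address
        # Do not sleep after the final attempt; just give up.
        if attempt < attempts - 1:
            sleep(delay)
    return None
```

A caller that gets `None` back still has to handle the "not found" case, but a short wait is enough to cover an address that appears shortly after the machine starts.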

Changed in juju:
importance: High → Undecided
Revision history for this message
John A Meinel (jameinel) wrote :

This is running on IAAS, though, and not K8s (at least the original post was 'on the openstack provider').
We certainly would have the unit running on a machine by the time you came to install.

It *is* true that on k8s you very likely hit install before the pod is up (since you haven't yet set a pod spec for it to exist). But there doesn't seem to be any reason not to have a private address on IAAS.

Revision history for this message
Joseph Phillips (manadart) wrote :

This is `unit-get --format=json public-address` though.

If you look at where it is set on the context in worker/uniter/runner/context/contextfactory.go, there is a specific comment that it may not have been set in time for a hook execution.

In fact "install" is probably the only hook where it could be accessed quickly enough to not be found.

There is an easy workaround here: we supply the public address on demand and stop setting it at context creation. The question is a strategic one: do we fix this tool that we want to deprecate, or push for the preferred access path (which, as of today, seems to be the subject of contention regarding whether it returns a suitable address)?
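
The difference between the two approaches can be sketched in a few lines. This is a simplified model in Python, not the Go code in contextfactory.go; the class names, the `lookup` callable, and the exception type are all hypothetical.

```python
from typing import Callable, Optional


class AddressNotFound(Exception):
    """Raised when the machine has no public address yet."""


class EagerContext:
    """Snapshots the address once, at context creation."""

    def __init__(self, lookup: Callable[[], str]) -> None:
        self._public: Optional[str] = None
        try:
            self._public = lookup()
        except AddressNotFound:
            # Created too early: the miss is baked in for this hook run.
            pass

    def public_address(self) -> str:
        if self._public is None:
            raise AddressNotFound("public address not found")
        return self._public


class LazyContext:
    """Looks the address up each time it is requested."""

    def __init__(self, lookup: Callable[[], str]) -> None:
        self._lookup = lookup

    def public_address(self) -> str:
        return self._lookup()
```

With the eager variant, a context created before the address exists keeps failing for the rest of that hook execution; the lazy variant succeeds as soon as the address is available, at the cost of a lookup per call.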

Revision history for this message
John A Meinel (jameinel) wrote :

We talked through this a bit in the last session with them. One question is:

a) the actual failure seems to be calling 'unit-get private-address' (public?) in a loop
That speaks to something failing talking to the Uniter (if it is cached in the context, then the unit agent shouldn't be calling back to the controller).

That should be investigated and understood, separately from the modelling question of whether this should be available or not.

b) making it strictly lazy probably makes this worse rather than better, but you're right that it could eliminate one of the round trips to the controller on every hook invocation.

c) A better fix would be to have a single API call for all of the context, rather than filling the context out slowly with a bunch of requests.

Revision history for this message
John A Meinel (jameinel) wrote :

Also (d): it would be better to make the shared code *more* binding-aware rather than having it assume there is a default that fits all use cases. (If you want to share code, then as soon as that shared code gets used twice, the default can't be the best answer. Also, exposing default means that it is 'yet another binding' that isn't as independent as the rest of the bindings, because changing default has an impact on otherwise unspecified endpoints.)

Revision history for this message
Liam Young (gnuoy) wrote :

An example of the bug in the keystone charm
