IP address sometimes not set or incorrect on pebble_ready event

Bug #1929364 reported by Ben Hoyt
This bug affects 3 people
Affects         Status        Importance  Assigned to    Milestone
Canonical Juju  Fix Released  High        Harry Pidcock

Bug Description

Per the report from "sed-i" in https://github.com/canonical/operator/issues/538: sometimes, after deploying and then adding more units, the IP address is not yet available when the pebble_ready event arrives at the charm. Details from canonical/operator#538:

ENVIRONMENT

microk8s in multipass (4 CPUs, 8 GB RAM):

$ juju --version
2.9.0-ubuntu-amd64

$ microk8s kubectl version
Client Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.7-34+df7df22a741dbc", GitCommit:"df7df22a741dbc18dc3de3000b2393a1e3c32d36", GitTreeState:"clean", BuildDate:"2021-05-12T21:08:20Z", GoVersion:"go1.15.10", Compiler:"gc", Platform:"linux/amd64"}

DESCRIPTION

When I deploy and then add 3 units (alertmanager, https://github.com/sed-i/alertmanager-operator/tree/feature/pebblization, in my case), occasionally the IP address is not ready.
This does not happen if I manually (slowly) add the units one by one.

Expected: An IP address is ready (and correct) by the time pebble_ready fires.
Actual: Occasionally, when adding multiple units at once, the IP address is not available (or is stale) when queried from within pebble_ready.

REPRODUCIBLE SCENARIO

While adding 3 units

Very consistently, the following code assigns None to bind_address for 1-2 of the added units:

    relation = self.model.get_relation("replicas")
    bind_address = self.model.get_binding(relation).network.bind_address

Similarly, unit-get occasionally returns an empty string under the same circumstances:

    bind_address = check_output(["unit-get", "private-address"]).decode().strip()

When restarting the machine

I have 4 units running, and then I `sudo reboot` the machine. When the application (alertmanager) comes back up, bind_address returns the IP address from the previous boot.
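Not part of any fix, but a minimal charm-side sketch of a possible workaround while this is open (hypothetical; assumes the ops framework, a workload container named "alertmanager", and the "replicas" peer relation used above): defer the event whenever Juju has not reported a bind address yet, and retry on a later dispatch.

```python
# Hypothetical workaround sketch, not part of any Juju fix.
from ops.charm import CharmBase
from ops.main import main


class AlertmanagerCharm(CharmBase):
    def __init__(self, *args):
        super().__init__(*args)
        # "alertmanager" is an assumed container name from metadata.yaml.
        self.framework.observe(
            self.on.alertmanager_pebble_ready, self._on_pebble_ready
        )

    def _on_pebble_ready(self, event):
        relation = self.model.get_relation("replicas")
        binding = self.model.get_binding(relation) if relation else None
        bind_address = binding.network.bind_address if binding else None
        if bind_address is None:
            # Address not yet available (this bug): retry on a later event.
            event.defer()
            return
        # ... configure the workload with str(bind_address) here ...


if __name__ == "__main__":
    main(AlertmanagerCharm)
```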

Jon Seager (jnsgruk)
tags: added: sidecar-charn
tags: added: sidecar-charm
removed: sidecar-charn
Revision history for this message
John A Meinel (jameinel) wrote :

It makes sense that pod-spec charms might not have a bind address at the time the charm code runs, since if they haven't declared a pod spec yet, there is no workload pod.
However, with sidecar charms, the pod is running both the charm container and the workload container, so by the time pebble_ready fires there should be a valid IP address for the pod. (I would actually expect us to have a valid IP by the time 'install' fires.)

Changed in juju:
status: New → Triaged
importance: Undecided → High
milestone: none → 2.9.4
Revision history for this message
Leon (sed-i) wrote :

Per suggestion of ~jnsgruk: `ip a` (via subprocess.check_output) shows a correct IP address while, at the same time, `bind_address` returns None.

Changed in juju:
milestone: 2.9.4 → 2.9.5
Revision history for this message
Canonical Juju QA Bot (juju-qa-bot) wrote :

I suspect the root cause here is as per https://bugs.launchpad.net/bugs/1930649

There's a PR to fix the above bug
https://github.com/juju/juju/pull/13049

Hopefully retesting with the above fix will show it's solved.

Revision history for this message
Leon (sed-i) wrote :

In Juju 2.9.5 (maybe via [this](https://github.com/juju/juju/pull/13049) PR), some entries were automatically added to the peer data bag:
```
self.model.get_relation("replicas").data.keys() =
KeysView({
  <ops.model.Unit alertmanager-k8s/0>: {
    'egress-subnets': '10.152.183.222/32',
    'ingress-address': '10.152.183.222',
    'private-address': '10.152.183.222',
    'private_address': '10.1.157.125'}, # added by me
  <ops.model.Application alertmanager-k8s>: {}
})
```

Note the difference between:
1. `'private-address': '10.152.183.222'` - auto-populated - the application address
2. `'private_address': '10.1.157.125'` - populated manually by me - the unit address

**Shouldn't the auto-populated `private-address` be the unit address instead of the app address?**

```
Model      Controller  Cloud/Region        Version  SLA          Timestamp
dev-model  my-ctrlr    microk8s/localhost  2.9.5    unsupported  11:09:18-04:00

App               Version  Status  Scale  Charm             Store  Channel  Rev  OS          Address         Message
alertmanager-k8s           active  1      alertmanager-k8s  local           0    kubernetes  10.152.183.222

Unit                 Workload  Agent  Address       Ports  Message
alertmanager-k8s/0*  active    idle   10.1.157.125

Relation provider          Requirer                   Interface             Type  Message
alertmanager-k8s:replicas  alertmanager-k8s:replicas  alertmanager-replica  peer
```
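
For reference, a rough sketch of how a unit might publish its own pod address into the peer data bag under the manually chosen `private_address` key shown above (hypothetical helper; assumes the ops framework, the "replicas" peer relation, and the unit-get fallback from the bug description):

```python
import subprocess

# Hypothetical helper, written as a method on the charm class. It publishes
# this unit's own address into its peer relation data under 'private_address',
# since the auto-populated 'private-address' held the application address here.
def _publish_unit_address(self):
    relation = self.model.get_relation("replicas")
    if relation is None:
        return
    address = self.model.get_binding(relation).network.bind_address
    if address is None:
        # Fall back to the unit-get hook tool; it may also return "" (this bug).
        out = subprocess.check_output(
            ["unit-get", "private-address"]).decode().strip()
        address = out or None
    if address is not None:
        relation.data[self.unit]["private_address"] = str(address)
```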

Revision history for this message
Leon (sed-i) wrote (last edit ):

Also, bind_address occasionally returns None from within the "on_peer_joined" event handler. AFAIU, by the time "on_peer_joined" runs, an IP address should be guaranteed. Observed with Juju 2.9.5.

Changed in juju:
milestone: 2.9.5 → 2.9.6
Harry Pidcock (hpidcock)
Changed in juju:
assignee: nobody → Harry Pidcock (hpidcock)
status: Triaged → In Progress
Revision history for this message
Harry Pidcock (hpidcock) wrote :
Changed in juju:
milestone: 2.9.6 → 2.9.7
Harry Pidcock (hpidcock)
Changed in juju:
status: In Progress → Fix Committed
milestone: 2.9.7 → 3.0.0
milestone: 3.0.0 → 2.9.7
status: Fix Committed → Fix Released
Harry Pidcock (hpidcock)
Changed in juju:
milestone: 2.9.7 → 2.9.6
John A Meinel (jameinel)
Changed in juju:
milestone: 2.9.6 → 2.9.7
status: Fix Released → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released
Revision history for this message
Leon (sed-i) wrote :

This is still an issue with Juju 2.9.21.

Revision history for this message
Ryan Barry (rbarry) wrote :

The issue present in Juju 2.9.21 seems to be that, during some sequence of events (@Leon-mintz can maybe provide logs), a sidecar charm unit ends up in a scenario where it has no address that can be used for the binding to a peer relation.

The pod *is* up and running, so there is an address, just not one which `network-get peer-relation` returns.

From our POV, a charm should *always* have binding addresses for the interfaces it provides, as long as the pod is up and charm code is running. That it may not is definitely a bug.
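
A small debugging sketch of the comparison described here (hypothetical; assumes it runs inside a hook context where Juju's network-get hook tool is on PATH, and that the peer binding is named "replicas" as above):

```python
import subprocess

# Hypothetical debug helper: print what network-get reports for the peer
# binding next to what the pod's own network stack shows, to spot the mismatch.
def dump_addresses():
    net_get = subprocess.check_output(
        ["network-get", "replicas", "--format=yaml"]).decode()
    ip_a = subprocess.check_output(["ip", "a"]).decode()
    print("network-get replicas:\n" + net_get)
    print("ip a:\n" + ip_a)
```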

Revision history for this message
Simon Aronsson (0x12b) wrote :

Yes, this is very much still an issue on 2.9.22 and 2.9.23.
