telegraf haproxy input broken with Juju >= 2.8.7

Bug #1910974 reported by Thomas Cuthbert on 2021-01-11
This bug affects 2 people
Affects              Status        Importance  Assigned to      Milestone
Content Cache Charm  Fix Released  Low         Thomas Cuthbert
Telegraf Charm       Fix Released  Low         Thomas Cuthbert  21.01
juju                 Fix Released  High        Joseph Phillips  2.8.8

Bug Description

# Code that configures the haproxy telegraf plugin.
## notice that it hasn't been changed since 2016.
29b50c58 src/reactive/telegraf.py (Xav Paice 2020-08-07 15:04:46 +1200 913)     addr = rel["private-address"]
^a7836fc reactive/telegraf.py (Guillermo Gonzalez 2016-04-29 18:24:43 -0300 914)     if addr == hookenv.unit_private_ip():
^a7836fc reactive/telegraf.py (Guillermo Gonzalez 2016-04-29 18:24:43 -0300 915)         addr = "localhost"

# The problem
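That code would work if we called "unit-get public-address": as shown below, it returns the FQDN, which matches the private-address in the relation data and would satisfy the comparison on line 914. "unit-get private-address" returns an IP address instead, so the comparison fails.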

juju run --unit content-cache-1ss/1 "relation-get -r haproxy-statistics:34 - telegraf-1ss/5"
ingress-address: darkbowser.canonical.com
private-address: darkbowser.canonical.com

juju run --unit telegraf-1ss/5 "relation-get -r haproxy:34 - content-cache-1ss/1"
enabled: "True"
ingress-address: darkbowser.canonical.com

port: "10000"
private-address: darkbowser.canonical.com
user: haproxy

juju run --unit telegraf-1ss/5 "unit-get private-address"
91.189.91.43

juju run --unit telegraf-1ss/5 "unit-get public-address"
darkbowser.canonical.com
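
The mismatch can be reproduced outside a hook context. This is a minimal sketch, not the charm's code: pick_haproxy_addr is a hypothetical helper that mirrors lines 913-915, with the result of hookenv.unit_private_ip() passed in as a plain argument.

```python
def pick_haproxy_addr(rel, unit_private_ip):
    """Mirror the charm logic at lines 913-915 (hypothetical helper)."""
    addr = rel["private-address"]
    if addr == unit_private_ip:
        addr = "localhost"
    return addr


# Relation data as reported by relation-get above.
rel = {"private-address": "darkbowser.canonical.com", "port": "10000"}

# unit-get private-address returns an IP, so the comparison fails
# and the FQDN is kept.
print(pick_haproxy_addr(rel, "91.189.91.43"))

# If unit-get returned the FQDN, the comparison would succeed
# and the address would become "localhost".
print(pick_haproxy_addr(rel, "darkbowser.canonical.com"))
```

In the first case telegraf keeps the FQDN instead of falling back to localhost.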


Haw Loeung (hloeung) wrote :

Not a bug in the content-cache charm.

Changed in content-cache-charm:
status: New → Invalid
Haw Loeung (hloeung) wrote :

I think this is a bug in the telegraf charm, in this logic[1]:

| addr = rel["private-address"]
| if addr == hookenv.unit_private_ip():
|     addr = "localhost"

If this is a 'subordinate' charm, would there ever be a case where it's not scraping haproxy stats from a locally running haproxy instance?

[1]https://git.launchpad.net/charm-telegraf/tree/src/reactive/telegraf.py#n913

Haw Loeung (hloeung) wrote :

I think the best solution here is to update the haproxy[1] and content-cache[2] charms to pass through the listen address for haproxy statistics; they both already pass through the username, password, and port. Then have the telegraf charm check for the presence of this address and use it if it's something other than 0.0.0.0. If it doesn't exist, try to work it out with the existing code (rel["private-address"] and hookenv.unit_private_ip()).

The HAProxy charm defaults to serving statistics on 0.0.0.0, which is why it doesn't appear broken with the latest Juju version: whether telegraf uses localhost or the private address, HAProxy statistics will answer on both interfaces. The content-cache charm, by contrast, configures statistics to listen on 127.0.0.1.

[1]https://bazaar.launchpad.net/~haproxy-team/charm-haproxy/trunk/view/head:/hooks/hooks.py#L1437
[2]https://git.launchpad.net/~hloeung/content-cache-charm/tree/reactive/content_cache.py#n483
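
The proposal above could look roughly like the following on the telegraf side. This is only a sketch under assumptions: the relation key name "listen-address" and the helper resolve_stats_addr are hypothetical, and treating 0.0.0.0 the same as a missing key (falling back to the existing logic) is one reading of the proposal. The unit's private IP is passed in as an argument so that the function runs outside a hook context.

```python
def resolve_stats_addr(rel, unit_private_ip):
    """Pick the address telegraf should scrape haproxy stats from.

    "listen-address" is an assumed relation key, not one the charms
    are known to publish.
    """
    listen = rel.get("listen-address")
    # Use the advertised listen address unless it is the wildcard.
    if listen and listen != "0.0.0.0":
        return listen
    # Otherwise fall back to the existing comparison.
    addr = rel["private-address"]
    if addr == unit_private_ip:
        addr = "localhost"
    return addr
```

With the content-cache charm advertising 127.0.0.1, the result would no longer depend on what "unit-get private-address" returns.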

Joseph Phillips (manadart) wrote :

The relation settings must have been written/re-written with the FQDN as the private address. The charm logic that does that might be of interest.

Note that for manually provisioned machines, the provider always returns the address used to provision it (presumably, "juju add-machine ssh:<email address hidden>") when queried.

"unit-get" appears to be getting an IP address because it is using the NetworkInfo method, which queries link-layer devices.

Something curious though; can you check your logs for errors? Based on the model.yaml on your private fileshare, all of your link-layer device addresses have an origin of "provider". This is set as an upgrade step and I would expect that many of these would be relinquished to the machine since.

Joseph Phillips (manadart) wrote :

Actually scratch that. Manual machines are not updated by the instance-poller, so the addresses will remain as they are.

Joseph Phillips (manadart) wrote :

This is happening upon (re)entering relation scope. I am looking into it.

This bug is not against Juju, but it looks like a symptom of https://bugs.launchpad.net/bugs/1911135.

Joseph Phillips (manadart) wrote :

I have got to the bottom of this.

Juju changed behaviour for the "network-get" tool, pushing host name resolution from the hook context to be behind the API backing.

What was missed is that for "unit-get private-address", we call the same backing API first before falling back if required to the "private-address" set on the hook context.

So now, "unit-get private-address" will return IPs in place of FQDNs where it can resolve them.

Although this is an unintended change (2.8.6 -> 2.8.7), using "unit-get" for addresses is deprecated and intended for removal in Juju 3.0. The network-get tool should be used instead.

Will your use-case work if instead of comparing the results from "relation-get" and "unit-get", we use "network-get --ingress-address ..." in each case, using "--relation <rel>" for the first?
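
One reading of that suggestion, sketched in Python. Everything here is an assumption rather than charm or Juju code: the helper names are hypothetical, the binding name argument is assumed to be required by network-get, and the command only works inside a hook context. The run parameter exists so the logic can be exercised without Juju.

```python
import subprocess


def network_get_ingress(binding, relation_id=None, run=subprocess.check_output):
    """Ask network-get for the ingress address of a binding.

    Assumes the flags quoted in the comment above:
    --ingress-address and --relation <rel>.
    """
    cmd = ["network-get", "--ingress-address"]
    if relation_id is not None:
        cmd += ["--relation", relation_id]
    cmd.append(binding)
    return run(cmd).decode("utf-8").strip()


def ingress_addresses_match(binding, relation_id, run=subprocess.check_output):
    """Compare the relation-scoped address with the unscoped one."""
    scoped = network_get_ingress(binding, relation_id, run)
    unscoped = network_get_ingress(binding, None, run)
    return scoped == unscoped
```

Because both values come from the same tool, they would be in the same form (both IPs or both host names), which is the property the "relation-get" versus "unit-get" comparison lost.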

Changed in juju:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Joseph Phillips (manadart)
milestone: none → 2.8.8
Joseph Phillips (manadart) wrote :

https://github.com/juju/juju/pull/12528 addresses this for 2.8.8.

Changed in juju:
status: In Progress → Fix Committed
Haw Loeung (hloeung) on 2021-01-28
Changed in charm-telegraf:
assignee: nobody → Thomas Cuthbert (tcuthbert)
status: New → Fix Committed
Changed in content-cache-charm:
assignee: nobody → Thomas Cuthbert (tcuthbert)
importance: Undecided → Low
Changed in charm-telegraf:
importance: Undecided → Low
Changed in content-cache-charm:
status: Invalid → Fix Committed
Xiyue Wang (ziyiwang) on 2021-02-03
Changed in charm-telegraf:
status: Fix Committed → Fix Released
milestone: none → 21.01
Changed in content-cache-charm:
status: Fix Committed → Fix Released
Changed in juju:
status: Fix Committed → Fix Released