deploying the candidate charm stuck in maintenance status

Bug #1988312 reported by Tom Haddon
Affects         Status        Importance  Assigned to     Milestone
Telegraf Charm  Fix Released  High        Robert Gildein

Bug Description

Deploying the following bundle in LXD on Juju 2.9.33:

series: jammy
applications:
  ubuntu:
    scale: 1
    charm: ch:ubuntu
  telegraf:
    charm: ch:telegraf
    channel: candidate
relations:
  - ["ubuntu", "telegraf"]

My juju status is stuck as follows:

Model      Controller           Cloud/Region         Version  SLA          Timestamp
tele-test  localhost-localhost  localhost/localhost  2.9.33   unsupported  15:28:35+02:00

App       Version  Status       Scale  Charm     Channel    Rev  Exposed  Message
telegraf           maintenance      1  telegraf  candidate   55  no       Installing python3-netifaces,python3-yaml,sysstat,telegraf
ubuntu    22.04    active           1  ubuntu    stable      20  no

Unit           Workload     Agent  Machine  Public address  Ports  Message
ubuntu/0*      active       idle   0        10.144.51.60
  telegraf/0*  maintenance  idle            10.144.51.60           Installing python3-netifaces,python3-yaml,sysstat,telegraf

Machine  State    Address       Inst id        Series  AZ  Message
0        started  10.144.51.60  juju-c85c5c-0  jammy       Running

Tags: bseng-382

Eric Chen (eric-chen)
tags: added: bseng-382
Changed in charm-telegraf:
importance: Undecided → High
Changed in charm-telegraf:
assignee: nobody → Robert Gildein (rgildein)
status: New → In Progress
Robert Gildein (rgildein) wrote :

I could reproduce this error with revisions above 49,
and the cause of the issue is MP [1].

The issue is that the function `configure_telegraf()` runs only
when the "plugins.prometheus-client.configured" flag is set, but this
flag is set in only two functions:
1. `configure_prometheus_client_with_relation(prometheus)` - Prometheus relation
2. `configure_prometheus_client()`

The function `configure_prometheus_client()` in turn runs only when `telegraf.configured`
is set, and that flag is set only in `configure_telegraf()`. So without a Prometheus
relation, neither `configure_telegraf()` nor `configure_prometheus_client()`
will ever be triggered.
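
The circular gating can be illustrated with a small stand-alone model. This is only a sketch, not the charm's code: the handler and flag names come from the description above, except for the relation flag `prometheus.relation.joined`, which is a hypothetical placeholder for whatever the Prometheus relation actually sets.

```python
# Simplified model of the flag gating described above; not the charm's real code.
# Each handler maps to (flags it requires, flags it sets).
HANDLERS = {
    "configure_telegraf": (
        {"plugins.prometheus-client.configured"},
        {"telegraf.configured"},
    ),
    "configure_prometheus_client": (
        {"telegraf.configured"},
        {"plugins.prometheus-client.configured"},
    ),
    # "prometheus.relation.joined" is a hypothetical flag name for this sketch.
    "configure_prometheus_client_with_relation": (
        {"prometheus.relation.joined"},
        {"plugins.prometheus-client.configured"},
    ),
}


def run_until_settled(handlers, initial_flags):
    """Fire every handler whose required flags are set until nothing changes."""
    flags = set(initial_flags)
    fired = []
    progress = True
    while progress:
        progress = False
        for name, (requires, sets) in handlers.items():
            if name not in fired and requires <= flags:
                fired.append(name)
                flags |= sets
                progress = True
    return fired, flags


# Without the Prometheus relation nothing can ever start.
fired, flags = run_until_settled(HANDLERS, initial_flags=set())
print(fired)  # → []

# With the (hypothetical) relation flag, all three handlers run.
fired, flags = run_until_settled(HANDLERS, {"prometheus.relation.joined"})
print(sorted(fired))
```

With no initial flags, each of the first two handlers waits on a flag that only the other one sets, which matches the unit sitting in maintenance forever.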

I tried to reproduce the issue mentioned in MP [1] by deploying
revision 49 as follows:

1.
 - `juju deploy ch:telegraf --revision 49 --channel stable --series focal`
 - change `install_method` configuration to `snap`
 - run `hooks/upgrade-charm`
2.
 - `juju deploy ch:telegraf --revision 49 --channel stable --series focal --config install_method=snap`
 - run `hooks/upgrade-charm`
3.
 - download charm with `juju download telegraf --series xenial --channel stable`
 - deploy charm from local file
 - change `install_method` configuration to `snap`
 - `juju upgrade-charm <name> --switch ch:telegraf-49 --channel stable`

4.
 - download charm with `juju download telegraf --series xenial --channel stable`
 - deploy charm from local file with `--config install_method=snap`
 - `juju upgrade-charm <name> --switch ch:telegraf-49 --channel stable`

None of the approaches reproduced the error.

In this situation we have two options:
1. revert changes made in [1]
2. use `@when("telegraf.installed")`

---
[1]: https://code.launchpad.net/~hloeung/charm-telegraf/+git/charm-telegraf/+merge/428004

Simon Fels (morphis) wrote :

The Anbox Cloud CI is currently running into this, and it is causing our tests to fail: we wait for the Juju model to converge to active using juju-wait, which waits for every workload status to switch to active.

Reproducing is rather simple here:

$ juju add-model lp1988312
$ juju deploy ubuntu
$ juju deploy telegraf
$ juju relate ubuntu telegraf

Once everything has settled

Model      Controller      Cloud/Region         Version  SLA          Timestamp
lp1988312  dev-controller  localhost/localhost  2.9.33   unsupported  16:12:22+02:00

App       Version  Status       Scale  Charm     Channel  Rev  Exposed  Message
telegraf           maintenance      1  telegraf  stable    57  no       Installing python3-netifaces,python3-yaml,sysstat,telegraf
ubuntu    20.04    active           1  ubuntu    stable    20  no

Unit           Workload     Agent  Machine  Public address  Ports  Message
ubuntu/0*      active       idle   0        10.25.83.190
  telegraf/0*  maintenance  idle            10.25.83.190           Installing python3-netifaces,python3-yaml,sysstat,telegraf

Machine  State    Address       Inst id        Series  AZ  Message
0        started  10.25.83.190  juju-8977bc-0  focal       Running

We're going to ignore the telegraf units for now when calling juju-wait.
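
For anyone needing a similar stopgap outside juju-wait, a check along these lines could be run against the output of `juju status --format=json`. It is a sketch only: the key names (`applications`, `units`, `subordinates`, `workload-status`, `current`) reflect my understanding of that JSON layout and should be verified against real output, and the sample data is hand-written to mirror the status above.

```python
import json


def unsettled_units(status, ignore_apps=()):
    """Return {unit name: workload status} for units that are not active."""
    pending = {}

    def check(units):
        for unit_name, unit in units.items():
            app = unit_name.split("/")[0]
            current = unit.get("workload-status", {}).get("current")
            if app not in ignore_apps and current != "active":
                pending[unit_name] = current
            # Subordinate units are assumed to be nested under their principal.
            check(unit.get("subordinates", {}))

    for app in status.get("applications", {}).values():
        check(app.get("units", {}))
    return pending


# Hand-written sample shaped like the status shown above (not real output).
sample = json.loads("""
{
  "applications": {
    "ubuntu": {
      "units": {
        "ubuntu/0": {
          "workload-status": {"current": "active"},
          "subordinates": {
            "telegraf/0": {"workload-status": {"current": "maintenance"}}
          }
        }
      }
    },
    "telegraf": {"subordinate-to": ["ubuntu"]}
  }
}
""")

print(unsettled_units(sample))                            # → {'telegraf/0': 'maintenance'}
print(unsettled_units(sample, ignore_apps=("telegraf",)))  # → {}
```

An empty result means every unit outside the ignored applications has reached active.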

Tom Haddon (mthaddon) wrote :

Confirmed this works for me now with the candidate channel for jammy.

I've also tested with the stable channel on focal and that seems to have been reverted to revision 54 as well (Simon has revision 57 above). Confirmed that works for me too.

Tianqi Xiao (txiao)
Changed in charm-telegraf:
milestone: none → 22.10
status: In Progress → Fix Committed
Tianqi Xiao (txiao)
Changed in charm-telegraf:
status: Fix Committed → Fix Released