'secret-changed' event not emitted/processed (occurs on pipelines ONLY)

Bug #2064876 reported by Judit Novak
This bug affects 1 person
Affects: Canonical Juju
Status: In Progress
Importance: High
Assigned to: Yang Kelvin Liu

Bug Description

I'm coming across this most peculiar syndrome.
The `secret-changed` event is emitted when the code below runs on AWS. However, there is NO `secret-changed` event on the pipeline, even after VERY long waits.

Environment: Juju 3.1, LXD (AWS: c5a.8xlarge)

In order to support debugging, I've added a separate branch with the impacted code, and debug messages to indicate secret changes and secret-changed event handlers being triggered (or not).

Essentially when the same code is executed on AWS, `secret-changed` events are triggered as expected (allowing for the charm to reconcile and change status accordingly).

When the same code runs on a pipeline, the `secret-changed` events are skipped. (Thus the charm is stuck in an outdated configuration state.)

What you find on the pipeline is this:

unit-opensearch-dashboards-1: 21:47:23 DEBUG unit.opensearch-dashboards/1.juju-log certificates:7: [SECRET_CHANGE_DEBUG] New content is different from old content: True
unit-opensearch-dashboards-1: 21:47:23 INFO unit.opensearch-dashboards/1.juju-log certificates:7: [SECRET_CHANGE_DEBUG] New certfificates fetched, peer secret updated

The same on AWS looks like this:

unit-opensearch-dashboards-1: 22:32:43 DEBUG unit.opensearch-dashboards/1.juju-log certificates:7: [SECRET_CHANGE_DEBUG] New content is different from old content: True
unit-opensearch-dashboards-1: 22:32:43 INFO unit.opensearch-dashboards/1.juju-log certificates:7: [SECRET_CHANGE_DEBUG] New certfificates fetched, peer secret updated
unit-opensearch-dashboards-1: 22:32:44 DEBUG unit.opensearch-dashboards/1.juju-log [SECRET_CHANGE_DEBUG] Secret changed event (libs) dashboard_peers.opensearch-dashboards.unit -- OpenSearch relation handler
unit-opensearch-dashboards-1: 22:32:44 DEBUG unit.opensearch-dashboards/1.juju-log [SECRET_CHANGE_DEBUG] Secret changed event (charm) dashboard_peers.opensearch-dashboards.unit
unit-opensearch-dashboards-1: 22:32:44 DEBUG unit.opensearch-dashboards/1.juju-log [SECRET_CHANGE_DEBUG] Reconciling
unit-opensearch-dashboards-1: 22:32:44 DEBUG unit.opensearch-dashboards/1.juju-log [SECRET_CHANGE_DEBUG] Fetched secret with label dashboard_peers.opensearch-dashboards.unit
unit-opensearch-dashboards-1: 22:32:44 DEBUG unit.opensearch-dashboards/1.juju-log [SECRET_CHANGE_DEBUG] Fetched secret with label opensearch_client.6.user.secret
unit-opensearch-dashboards-1: 22:32:44 DEBUG unit.opensearch-dashboards/1.juju-log [SECRET_CHANGE_DEBUG] Fetched secret with label opensearch_client.6.tls.secret
unit-opensearch-dashboards-1: 22:32:44 INFO unit.opensearch-dashboards/1.juju-log [SECRET_CHANGE_DEBUG] Unit status ActiveStatus('')

There are NO log entries for `secret-changed` events in the pipeline logs.
There's a number of them in the AWS logs (attached).
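
For anyone comparing the two attached logs, a minimal sketch of the check is below. It only counts the custom `[SECRET_CHANGE_DEBUG]` lines quoted above; the function name `summarize` and the returned keys are made up for illustration, and the matched phrases are simply the ones that appear in these excerpts.

```python
MARKER = "[SECRET_CHANGE_DEBUG]"


def summarize(log_text: str) -> dict:
    """Count secret updates and secret-changed handler runs in a debug-log dump."""
    updates = 0
    handlers = 0
    for line in log_text.splitlines():
        if MARKER not in line:
            continue  # not one of the custom debug messages
        if "peer secret updated" in line:
            updates += 1  # the secret content was rewritten
        elif "Secret changed event" in line:
            handlers += 1  # a secret-changed handler actually ran
    return {"secret_updates": updates, "handler_runs": handlers}
```

A log with `secret_updates > 0` and `handler_runs == 0` matches what the pipeline shows: the secret was updated, but no handler ever ran.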

NOTE the ***2-minute*** sleep, which ensures that this is not a matter of timing.

Regarding the demonstrative PR:
 - in case you are worried about complexity: I'm glad to offer a sync to highlight the essentials for debugging
 - despite the name of the pipeline, NO HA TESTS are invoked for this debugging
   - the syndrome happens on a simple relate/integrate
 - the pipeline failure is exactly due to the missing `secret-changed` event
   - TLS-related changes never make it to the config
 - the "failing" 'Check libs' pipeline is to signify that I've added a LOT of custom logging to data-platform-libs to facilitate debugging
   - we apologize for the high complexity in data-platform-libs (we hope to simplify that soon)
 - NOTE that if you sufficiently increase the time delay, the `update-status` hook will trigger a reconcile(), and the missing information will then be populated.

Demonstrative code: https://github.com/canonical/opensearch-dashboards-operator/pull/25
Example pipeline: https://github.com/canonical/opensearch-dashboards-operator/actions/runs/8953365858/job/24591875944

 - it's the ha/test_network_cut.py pipeline to investigate
 - note the "Upload logs" step at the bottom -- you can retrieve the full `debug-log` here

Judit Novak (juditnovak)
summary: - 'secret-changed' event emitted/processed
+ 'secret-changed' event not emitted/processed
Ian Booth (wallyworld)
Changed in juju:
milestone: none → 3.3.5
status: New → Triaged
importance: Undecided → High
Raúl Zamora (zmraul)
tags: added: canonical-data-platform-eng
Judit Novak (juditnovak)
summary: - 'secret-changed' event not emitted/processed
+ 'secret-changed' event not emitted/processed (occurs on pipelines ONLY)
Changed in juju:
status: Triaged → In Progress
assignee: nobody → Yang Kelvin Liu (kelvin.liu)
Yang Kelvin Liu (kelvin.liu) wrote :

Hi Judit
Do we have a reproducing step or steps to deploy locally for me to investigate?

Judit Novak (juditnovak) wrote :

In case a live sync is possible, I'd guide you through that way. If not, I'll write the steps here.

I mean...

The point is that... it is not possible to reproduce locally or on AWS.
It only happens on the pipeline.

So I can explain how to run the same code locally, but the bug won't appear. That's exactly the problem :-) :-(

Changed in juju:
milestone: 3.3.5 → 3.3.6
Judit Novak (juditnovak) wrote :

All right. Going over the problem again with John Meniel revealed the detail where the devil is most likely hiding.

The pipeline is on Juju 3.1.6. However, raising the `secret-changed` event for the secret owner (in this case, the unit) was only introduced in Juju 3.1.7. (The local and AWS runs must have had 3.1.7 or later installed, thus "all just worked".)
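
If the theory holds, a charm or test could guard against it with a version check along these lines. This is only a sketch: the 3.1.7 threshold is taken from the explanation above and is not yet confirmed, and the names `parse_version` and `owner_gets_secret_changed` are hypothetical helpers, not part of Juju or any library.

```python
# Assumption from this report: secret owners get `secret-changed` from 3.1.7 onwards.
MIN_OWNER_SECRET_CHANGED = (3, 1, 7)


def parse_version(version: str) -> tuple:
    """Turn '3.1.6' or '3.1.6-ubuntu-amd64' into a (major, minor, patch) tuple."""
    core = version.split("-")[0]  # drop any series/arch or pre-release suffix
    parts = [int(p) for p in core.split(".")[:3]]
    parts += [0] * (3 - len(parts))  # pad '3.1' to (3, 1, 0)
    return tuple(parts)


def owner_gets_secret_changed(juju_version: str) -> bool:
    """True if this Juju version is expected to notify the secret owner."""
    return parse_version(juju_version) >= MIN_OWNER_SECRET_CHANGED
```

For the versions mentioned here, `owner_gets_secret_changed("3.1.6")` is false (the pipeline) and `owner_gets_secret_changed("3.1.7")` is true. Inside a hook the version string would typically come from the `JUJU_VERSION` environment variable.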

I'm double-checking whether the theory stands (right after my week of holidays).
But it's more than likely. Worries over the ticket could be suspended until then :-)
