Some Relations hooks not firing over CMR

Bug #2022855 reported by Liam Young
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Ian Booth

Bug Description

Juju version: 3.2.0-genericlinux-amd64

In a setup with cross model relations such that:

Machine Model                               K8s Model
hypervisor (requires) <--ceph-access--> cinder-ceph (provides)
microceph (provides)  <------ceph-----> cinder-ceph (requires)

not all hooks fire. On one unit on one side of the relation, only the
hooks for the relation itself fire (created and broken); the hooks that
depend on a unit being present on the other end (joined and changed)
are missing. The same can be seen with relation-list, which shows no
remote unit.

Reproduce (jammy machine):

# Setup ssh keys for manual provider to provision this machine.
[ -f $HOME/.ssh/id_rsa ] || ssh-keygen -b 4096 -f $HOME/.ssh/id_rsa -t rsa -N ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
ssh-keyscan -H $(hostname --all-ip-addresses) >> $HOME/.ssh/known_hosts
ssh-copy-id $(uname -n)

sudo snap install juju --channel 3.2/stable

echo "clouds:
  manual:
    type: manual
    endpoint: $(uname -n)
    regions:
      default:
        endpoint: $(uname -n)
" > ~/clouds.yaml
mkdir -p ~/.local/share
juju add-cloud --client manual -f ~/clouds.yaml
juju bootstrap manual

MACHINE_MODEL="admin/controller"
K8S_MODEL="openstack"

juju deploy -m $MACHINE_MODEL --to 0 --series jammy microk8s
juju config -m $MACHINE_MODEL microk8s addons='dns hostpath-storage'
juju switch $MACHINE_MODEL

# wait for deployment to complete

mkdir -p ~/.kube;
juju ssh -m $MACHINE_MODEL microk8s/0 'microk8s config' > ~/.kube/config
cat ~/.kube/config | juju add-k8s k8s-cloud --controller manual-default

juju add-model $K8S_MODEL k8s-cloud
juju model-config -m $K8S_MODEL workload-storage=microk8s-hostpath

juju deploy -m $MACHINE_MODEL --channel 2023.1/edge/gnuoy \
  --series jammy --to 0 openstack-hypervisor hypervisor
juju deploy -m $MACHINE_MODEL --channel edge \
  --series jammy microceph --to 0
juju deploy -m $K8S_MODEL --channel 2023.1/edge/gnuoy \
  --series jammy cinder-ceph-k8s cinder-ceph
juju offer $K8S_MODEL.cinder-ceph:ceph-access
juju offer $MACHINE_MODEL.microceph:ceph
juju relate -m $MACHINE_MODEL \
  hypervisor:ceph-access admin/$K8S_MODEL.cinder-ceph
juju relate -m $K8S_MODEL $MACHINE_MODEL.microceph cinder-ceph

# wait for the charms in both models to become idle (ignore status messages
# about data or integrations being incomplete)

juju exec -m $MACHINE_MODEL --unit hypervisor/0 "relation-ids ceph-access"
ceph-access:2
juju exec -m $MACHINE_MODEL --unit hypervisor/0 "relation-list -r ceph-access:2"
cinder-ceph/0
juju exec -m $MACHINE_MODEL --unit microceph/0 "relation-ids ceph"
ceph:3
juju exec -m $MACHINE_MODEL --unit microceph/0 "relation-list -r ceph:3"

juju debug-log -m $MACHINE_MODEL --replay | grep "ceph-relation"
unit-microceph-0: 13:39:35 INFO juju.worker.uniter.operation ran "ceph-relation-created" hook (via hook dispatching script: dispatch)

# joined and changed hooks are missing

juju debug-log -m $K8S_MODEL --replay | grep "ceph-relation"
unit-cinder-ceph-0: 13:38:49 INFO juju.worker.uniter.operation ran "ceph-relation-created" hook (via hook dispatching script: dispatch)
unit-cinder-ceph-0: 13:41:15 INFO juju.worker.uniter.operation ran "ceph-relation-joined" hook (via hook dispatching script: dispatch)
unit-cinder-ceph-0: 13:41:16 INFO juju.worker.uniter.operation ran "ceph-relation-changed" hook (via hook dispatching script: dispatch)

# created, joined and changed hooks did correctly fire on the k8s side of the relation
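
The manual grep above can be wrapped in a small helper that lists which relation hooks never ran for an endpoint. This is only a sketch: the function name missing_relation_hooks is hypothetical, and it assumes the debug-log line format shown in this report (ran "<endpoint>-relation-<hook>" hook).

```shell
# Hypothetical helper: read `juju debug-log` output on stdin and print the
# created/joined/changed hooks that never ran for the given relation endpoint.
missing_relation_hooks() {
  endpoint="$1"
  log="$(cat)"
  for hook in created joined changed; do
    # Match the uniter's fixed 'ran "<hook-name>" hook' message.
    if ! printf '%s\n' "$log" | grep -qF "ran \"${endpoint}-relation-${hook}\" hook"; then
      echo "${endpoint}-relation-${hook}"
    fi
  done
}
```

For example, piping juju debug-log -m $MACHINE_MODEL --replay --no-tail into missing_relation_hooks ceph should, for the logs in this report, list ceph-relation-joined and ceph-relation-changed. The --no-tail flag is assumed here so the command exits rather than following the log.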

Liam Young (gnuoy) wrote :

Attaching the debug logs from both models (with cmr logging) and a dump of the db

Ian Booth (wallyworld) wrote :

Thanks for the attachments.
It would be wonderful to get the database dumps in YAML format - one file per (offering, consuming) model, like so:

JUJU_DEV_FEATURE_FLAGS=developer-mode juju dump-db -m <modelname>

It's a PITA to convert the bson one file at a time to get a holistic view of what we need to look at.

Changed in juju:
importance: Undecided → High
Ian Booth (wallyworld) wrote :

Is this a regression on 3.2 or do you also see it on 2.9 or 3.1?

Liam Young (gnuoy) wrote :

Attaching the debug logs from both models (with cmr logging) and a dump of the db using "juju dump-db"

Liam Young (gnuoy) wrote :

The issue also seems to exist in 3.1

John A Meinel (jameinel)
description: updated
Ian Booth (wallyworld) wrote :

It looks like a duplicate internal token is being saved in the machine model due to the same app from the k8s model acting as both a consumer and an offer.

Changed in juju:
milestone: none → 2.9.44
status: New → Confirmed
Ian Booth (wallyworld)
Changed in juju:
status: Confirmed → In Progress
assignee: nobody → Ian Booth (wallyworld)
Ian Booth (wallyworld) wrote :
Ian Booth (wallyworld)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released