[2.6-beta1] Configuration changes do not propagate correctly with HA enabled

Bug #1825500 reported by Florian Guitton on 2019-04-19
This bug affects 1 person
Affects: juju
Importance: High
Assigned to: Joseph Phillips

Bug Description

Hello everyone !

I hope this Easter period treats you well.
I was testing the new cross-cloud controller capabilities in Juju 2.6-beta1 when I realised that configuration changes are propagated only some of the time, or are seemingly not committed to the database.
I am aware of the checksumming changes that were brought in recently, but the behaviour seems very inconsistent.

I have a Juju controller deployed on MAAS with HA enabled (3 nodes).
None of the three controllers reports any error in its logs. When I change the configuration of an application, the config-changed hook triggers on the agents, but no commit is made to the database.
When I make the same change several times in a row, Juju accepts the input every time and triggers the config-changed hook on all agents every time, but it neither propagates nor commits the new values, as illustrated by the following:

root@:~# juju config percona-cluster os-access-hostname
foo.example.com
root@:~# juju config percona-cluster os-access-hostname=foo.example.com
WARNING the configuration setting "os-access-hostname" already has the value "foo.example.com"
root@:~# juju config percona-cluster os-access-hostname=bar.example.com
root@:~# juju config percona-cluster os-access-hostname
foo.example.com
root@:~# juju config percona-cluster os-access-hostname=bar.example.com
root@:~# juju config percona-cluster os-access-hostname=bar.example.com
root@:~# juju config percona-cluster os-access-hostname=bar.example.com
root@:~# juju config percona-cluster os-access-hostname
foo.example.com

I am happy to send anything that might be of use, just let me know.
Thank you very much for your support.

Best wishes,

description: updated
Ian Booth (wallyworld) wrote :

Just a note - the config checksums are used to prevent unnecessary firing of the config-changed hook on units. The fact that the bug report seems to say that the config-changed hooks are correctly firing each time a config change is made suggests that the change is correctly being stored and propagated to the unit agents. The config checksum is only used on the unit agents, not elsewhere.

Perhaps there's a mongo replication issue. It could be that the config change is being written to the primary, but the secondaries are stale. I can't recall offhand whether read operations are allowed on secondaries. You could open a mongo client session on the primary and check the db content directly to ensure the updated config is being written. Or you could run rs.status() in the mongo shell to see that everything is healthy.

Regardless, we'll try and reproduce.
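
The checksum gating described above can be sketched as follows. This is a minimal Python illustration, not Juju's implementation: the `UnitAgent` class, the `config_checksum` helper, and the choice of SHA-256 over canonical JSON are all assumptions made for the example.

```python
import hashlib
import json


def config_checksum(settings: dict) -> str:
    """Return a stable SHA-256 digest of a settings mapping (assumed scheme)."""
    canonical = json.dumps(settings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class UnitAgent:
    """Hypothetical stand-in for a unit agent; not Juju's real type."""

    def __init__(self) -> None:
        self.last_checksum = None
        self.hook_runs = 0

    def on_settings(self, settings: dict) -> bool:
        """Run config-changed only when the settings digest differs."""
        digest = config_checksum(settings)
        if digest == self.last_checksum:
            return False  # same content as last time: skip the hook
        self.last_checksum = digest
        self.hook_runs += 1  # stands in for running the config-changed hook
        return True


agent = UnitAgent()
first = agent.on_settings({"os-access-hostname": "foo.example.com"})
repeat = agent.on_settings({"os-access-hostname": "foo.example.com"})
changed = agent.on_settings({"os-access-hostname": "bar.example.com"})
```

Under this model the hook runs only when the settings the agent sees have actually changed, which is why a hook firing on every attempt suggests the agents are receiving new values.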

Changed in juju:
milestone: none → 2.6-rc1
importance: Undecided → High
status: New → Triaged

We don't read from secondaries. This looks more like the config isn't
getting set.

John
=:->

Florian Guitton (f-guitton) wrote :

Dear Ian,

Thank you for being so quick to respond.
I have run rs.status() and rs.slaveOk() on the mongo replica set.
Everything looks in order.

I have also run a query on the primary for the settings of the app (also attached). I can see that the database record for the application seems to have been updated in mongo, yet the #next record never progresses. The command line keeps accepting new values but only ever returns the original one.

Florian Guitton (f-guitton) wrote :

Oops, apologies for the email reply quack.

Florian Guitton (f-guitton) wrote :

Playing around a little, I have also noticed that the Juju GUI and the Juju CLI update different mongo documents on the replica set for the same application, and that changes made with the Juju GUI don't trigger the update hook.

Tim Penhey (thumper) wrote :

Tested with HA LXD controllers with tip of 2.6 branch today.

I can't reproduce. Also watched the database closely with logging to check, and all writes were occurring as expected, and information propagating out as expected.

Changed in juju:
status: Triaged → Incomplete
Joseph Phillips (manadart) wrote :

Can you look in ~/.local/share/juju/models.yaml and paste the YAML for the model you are working with?

If the configuration is being written to a key suffixed with "#next", this looks like a bug leaking out generations functionality from behind the feature flag.

Florian Guitton (f-guitton) wrote :

Here is the content of the models.yaml:

root@juju-playground:~# cat ~/.local/share/juju/models.yaml
controllers:
  dsi-juju-controller:
    models:
      admin/controller:
        uuid: d3084b9d-7841-4ba8-8530-b43f4a8cdb80
        type: iaas
        branch: ""
      admin/dsi-r1-ceph:
        uuid: 96731720-ae69-497a-87c6-35b453e04dfe
        type: iaas
        branch: ""
      admin/dsi-r1-os:
        uuid: 306dddc9-7dc3-4b12-8eac-005943f9ca83
        type: iaas
        branch: master
    current-model: admin/dsi-r1-os
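
In the listing above, admin/dsi-r1-os is the only model with a non-empty branch value. A minimal Python sketch for spotting such entries follows. It assumes the indentation shown in the listing, the `models_with_branch` helper is hypothetical, and a real tool would use a proper YAML parser rather than this line-based scan.

```python
import re

# Copy of the models.yaml content pasted above.
MODELS_YAML = """\
controllers:
  dsi-juju-controller:
    models:
      admin/controller:
        uuid: d3084b9d-7841-4ba8-8530-b43f4a8cdb80
        type: iaas
        branch: ""
      admin/dsi-r1-ceph:
        uuid: 96731720-ae69-497a-87c6-35b453e04dfe
        type: iaas
        branch: ""
      admin/dsi-r1-os:
        uuid: 306dddc9-7dc3-4b12-8eac-005943f9ca83
        type: iaas
        branch: master
    current-model: admin/dsi-r1-os
"""


def models_with_branch(text: str) -> dict:
    """Map model name to branch for models whose branch is non-empty.

    Relies on model names sitting at six spaces of indentation and their
    fields at eight, as in the listing; this is an assumption.
    """
    result = {}
    current = None
    for line in text.splitlines():
        model = re.match(r"^ {6}(\S+):\s*$", line)
        if model:
            current = model.group(1)
            continue
        branch = re.match(r"^ {8}branch:\s*(.*)$", line)
        if branch and current is not None:
            value = branch.group(1).strip().strip('"')
            if value:
                result[current] = value
    return result


flagged = models_with_branch(MODELS_YAML)
```

Here `flagged` holds only the models whose local record names a working branch, which is the condition under which writes could be routed to a "#next" key.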

Changed in juju:
status: Incomplete → Fix Committed
assignee: nobody → Joseph Phillips (manadart)
Joseph Phillips (manadart) wrote :

This should be addressed in the imminent release candidate.

It looks to be an error stemming from work in progress that should be dormant unless the "generations" feature flag is active.

Documents in the "settings" collection are no longer modified based on the working branch. I.e. nothing is written with a "#next" key suffix.

If this turns out not to be the case, please re-open this issue.
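
One reading of the reported symptom is that, with a branch set, writes landed in a document keyed with a "#next" suffix while reads kept using the base document. The toy Python model below illustrates that reading only. The `SettingsStore` class, its key layout, and the `honour_branch` switch are hypothetical and do not reflect Juju's actual schema or code.

```python
class SettingsStore:
    """Toy model of a settings collection; key names are illustrative."""

    def __init__(self, honour_branch: bool) -> None:
        self.honour_branch = honour_branch  # True models the pre-fix behaviour
        self.docs = {}

    def write(self, app: str, key: str, value: str, branch: str = "") -> None:
        # Assumed pre-fix behaviour: a non-empty branch redirects the write
        # to a document whose key carries a "#next" suffix.
        doc_key = app + "#next" if (self.honour_branch and branch) else app
        self.docs.setdefault(doc_key, {})[key] = value

    def read(self, app: str, key: str):
        # Reads always come from the base document.
        return self.docs.get(app, {}).get(key)


before_fix = SettingsStore(honour_branch=True)
before_fix.write("percona-cluster", "os-access-hostname", "foo.example.com")
before_fix.write("percona-cluster", "os-access-hostname", "bar.example.com",
                 branch="master")
stale = before_fix.read("percona-cluster", "os-access-hostname")

after_fix = SettingsStore(honour_branch=False)
after_fix.write("percona-cluster", "os-access-hostname", "foo.example.com")
after_fix.write("percona-cluster", "os-access-hostname", "bar.example.com",
                branch="master")
fresh = after_fix.read("percona-cluster", "os-access-hostname")
```

In the first store the read still returns the old value although the write was accepted, matching the `juju config` transcript in the description. In the second store, where nothing is written under a "#next" key, the read returns the new value.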

Changed in juju:
status: Fix Committed → Fix Released