Removing lead unit from existing service deployment causes failure in related services due to incomplete contexts
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| keystone (Juju Charms Collection) | Fix Released | High | Liam Young | |
| percona-cluster (Juju Charms Collection) | Fix Released | High | Liam Young | |
| rabbitmq-server (Juju Charms Collection) | Fix Released | High | Liam Young | |
Bug Description
We have an OpenStack deployment using Trusty/Icehouse and 3-node HA clusters for the core infrastructure services.
The HA instances are co-located on 3 physical MAAS-deployed nodes using LXC containers.
We recently needed to remove a percona unit because of hardware issues on the parent physical node, ahead of replacement with new hardware.
After removing a unit with juju remove-unit percona-cluster/1 and terminating the machine, we experienced DB-related issues, even though percona, corosync, etc. looked OK.
nova list returned no instances, and juju status failed because access to the deployed environments' bootstrap nodes was broken: the neutron gateway had lost database access, which caused an outage for all deployed VMs/environments due to the loss of tenant networking.
We confirmed that nova.conf, neutron.conf, etc. on the infrastructure services were missing the sql_connection details, and the juju unit logs for nova-cloud- showed:
2014-08-11 11:49:59 INFO juju-log shared-db:142: Missing required data: database_password database_host
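That log line comes from the charm's relation-context check: when required keys are absent from the shared-db relation, the context resolves to empty and the templates render without any database settings. A minimal sketch of the pattern, with hypothetical names (this is not the charm's actual code):

```python
def shared_db_context(relation_data,
                      required=('database_password', 'database_host')):
    """Return a template context from shared-db relation data, or {}
    when any required key is missing (mirroring the charm's behaviour)."""
    missing = [k for k in required if not relation_data.get(k)]
    if missing:
        # The charm logs this and then renders templates with no DB settings.
        print('Missing required data: ' + ' '.join(missing))
        return {}
    return {'sql_connection': 'mysql://%s:%s@%s/nova' % (
        relation_data.get('username', 'nova'),
        relation_data['database_password'],
        relation_data['database_host'])}

# After the leader was removed, the surviving peers had never published
# these keys, so every dependent service got an empty context:
print(shared_db_context({'username': 'nova'}))  # → {}
```

This is why the failure shows up as missing sql_connection lines in nova.conf and neutron.conf rather than as an explicit error.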
Manually triggering a config-changed hook with juju set did not resolve the issue. Ultimately, across all infrastructure services with a database relation, we removed the shared-db relation and then added it again; after doing this and bouncing the neutron nodes, service was restored.
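The relation bounce we applied can be scripted; a dry-run sketch that only prints the juju commands (the service names are illustrative, not the full list from our deployment):

```python
# Re-create the shared-db relation for each affected service so the
# surviving percona unit re-publishes credentials to it.
services = ['nova-cloud-controller', 'glance', 'keystone']  # examples only

cmds = []
for svc in services:
    for verb in ('remove-relation', 'add-relation'):
        cmds.append('juju %s %s:shared-db percona-cluster:shared-db'
                    % (verb, svc))

# Dry run: print the commands instead of executing them.
print('\n'.join(cmds))
```

In practice each remove/add pair has to complete (hooks settle) before moving on, and the dependent services may need their daemons restarted afterwards, as we found with the neutron nodes.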
Charms that support HA and clustering need to support scale-down as well as scale-up, without knock-on effects on dependent services.
Related branches
- Marco Ceppi (community): Approve
Diff: 900 lines (+667/-24), 11 files modified:
- hooks/charmhelpers/contrib/hahelpers/cluster.py (+55/-13)
- hooks/charmhelpers/contrib/network/ip.py (+19/-1)
- hooks/charmhelpers/contrib/peerstorage/__init__.py (+50/-0)
- hooks/charmhelpers/core/hookenv.py (+2/-1)
- hooks/charmhelpers/core/host.py (+34/-1)
- hooks/charmhelpers/core/services/__init__.py (+2/-0)
- hooks/charmhelpers/core/services/base.py (+305/-0)
- hooks/charmhelpers/core/services/helpers.py (+125/-0)
- hooks/charmhelpers/core/templating.py (+51/-0)
- hooks/charmhelpers/fetch/__init__.py (+1/-0)
- hooks/percona_hooks.py (+23/-8)
- Adam Israel (community): Approve
- Review Queue (community): Approve (cbt)
- Kevin W Monroe: Needs Fixing
- charmers: Pending requested
Diff: 51 lines (+17/-2), 1 file modified:
- hooks/rabbitmq_server_relations.py (+17/-2)
- OpenStack Charmers: Pending requested
Diff: 124 lines (+25/-9), 4 files modified:
- hooks/keystone_hooks.py (+15/-2)
- hooks/keystone_utils.py (+5/-4)
- unit_tests/test_keystone_hooks.py (+2/-1)
- unit_tests/test_keystone_utils.py (+3/-2)
summary: | Removing unit from existing cluster causes shared-db failure to related services → Removing unit from existing percona cluster causes shared-db failure to related services |
description: | updated |
Changed in charms: | |
importance: | Undecided → High |
summary: | Removing unit from existing percona cluster causes shared-db failure to related services → Removing lead unit from existing service deployment causes failure to related services |
summary: | Removing lead unit from existing service deployment causes failure to related services → Removing lead unit from existing service deployment causes failure in related services due to incomplete contexts |
Changed in percona-cluster (Juju Charms Collection): | |
assignee: | nobody → Liam Young (gnuoy) |
Changed in rabbitmq-server (Juju Charms Collection): | |
assignee: | nobody → Liam Young (gnuoy) |
Changed in percona-cluster (Juju Charms Collection): | |
status: | Triaged → Fix Committed |
Changed in rabbitmq-server (Juju Charms Collection): | |
status: | Triaged → In Progress |
Changed in keystone (Juju Charms Collection): | |
status: | Triaged → In Progress |
tags: | added: canonical-bootstack canonical-is |
tags: | added: openstack |
Changed in keystone (Juju Charms Collection): | |
status: | In Progress → Fix Released |
Changed in percona-cluster (Juju Charms Collection): | |
status: | Fix Committed → Fix Released |
Changed in rabbitmq-server (Juju Charms Collection): | |
status: | In Progress → Fix Released |
I see the issue: only the leader sets the username/password, and since Juju does not (yet) provide hooks for when the leader of a service changes, we cannot easily republish the credentials elsewhere when the existing leader is removed.
This will be easily fixed when the leader election features land in Juju; in the meantime we need to work something into the charms to deal with this.
I think the same problem will happen with any charm that provides credentials, so I am raising bug tasks for rabbitmq and keystone as well.
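The interim fix in the percona diff above adds charmhelpers' peer storage so that credentials outlive the leader. A self-contained sketch of the idea, with a plain dict standing in for the Juju peer relation (peer_store/peer_retrieve here are modelled on the charmhelpers.contrib.peerstorage API, not imported from it):

```python
# Stand-in for the data bag of a dedicated peer relation, which Juju
# replicates to every unit of the service via relation-set/relation-get.
_peer_data = {}

def peer_store(key, value):
    """Leader publishes a value onto the peer relation."""
    _peer_data[key] = value

def peer_retrieve(key):
    """Any unit (including one that survives the leader) reads it back."""
    return _peer_data.get(key)

# The leader generates credentials once and shares them with its peers...
peer_store('db-password', 's3cret')

# ...so when the leader unit is removed, a surviving unit can still
# publish the same credentials on shared-db instead of nothing at all.
assert peer_retrieve('db-password') == 's3cret'
```

Once proper leader election lands in Juju, the newly elected leader can take over publishing these values directly, and this workaround becomes unnecessary.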