need to update rabbit_hosts on non-controller nodes after deploying controllers

Bug #1368445 reported by Andrew Woodward
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Committed
Importance: High
Assigned to: Dima Shulyak

Bug Description

rabbit_hosts is rendered as the list of controllers known at the time the role is deployed.
So if I deploy one controller in HA mode, the list will have a single entry;
if I deploy 3 controllers, it will have 3 entries.
But if I deploy computes while there is only one controller,
and then add more controllers later,
and the compute role (or rabbit_hosts itself) is never updated, the computes will still have only the first controller in the list.
This is equally true when a single controller is replaced.
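
For illustration, a compute deployed while only the first controller existed ends up with a one-entry list in its nova.conf (the section placement and IPs below are illustrative, not taken from a real environment):

    [DEFAULT]
    rabbit_hosts=192.168.0.3:5672

That single entry survives even after controllers at, say, 192.168.0.4 and 192.168.0.5 join the cluster.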

This is fine for controllers, because every controller re-runs the Puppet controller role whenever any single controller is deployed.

However, it is not OK for roles like compute, which are left with a stale list of controllers.

This could be solved by re-running the compute role, or by running a short manifest via Astute, similar to what is done for /etc/hosts.

For computes we need to ensure that we update neutron.conf and nova.conf and restart neutron-ovs-agent and nova-compute; possibly other services as well. A rough manual equivalent is sketched below.
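
As a sketch only (crudini as the editing tool, the service names, and the IP list are all assumptions; service names in particular vary between distributions):

    # rewrite the stale rabbit_hosts list in both config files
    crudini --set /etc/nova/nova.conf DEFAULT rabbit_hosts 192.168.0.3:5672,192.168.0.4:5672,192.168.0.5:5672
    crudini --set /etc/neutron/neutron.conf DEFAULT rabbit_hosts 192.168.0.3:5672,192.168.0.4:5672,192.168.0.5:5672
    # restart the agents so they reconnect to the new AMQP endpoints
    service nova-compute restart
    service neutron-plugin-openvswitch-agent restart   # or neutron-openvswitch-agent, depending on distro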

We need to check whether other roles need this as well.

More info in the IRC conversation: http://irclog.perlgeek.de/fuel-dev/2014-09-11 (xarses, mihgen)

Revision history for this message
Mike Scherbakov (mihgen) wrote:

Set High priority as it affects the operations story. I believe we should have a test for it. It can easily be checked as follows:

Lightweight test; it can be considered an extension of the HA scale-up test:
1) Run HA deployment with one controller
2) Add one or two more controllers, deploy
3) Check rabbit_hosts in the compute config file; it must have more than one IP address in the list (2 or 3, depending on how many controllers end up in the env). See the check below.
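
A quick way to perform the check in step 3 on a compute node (the path is the usual nova.conf location; the expected output is illustrative):

    grep '^rabbit_hosts' /etc/nova/nova.conf
    # expect something like: rabbit_hosts=192.168.0.3:5672,192.168.0.4:5672,192.168.0.5:5672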

Heavier test, but it can catch more issues:
1) Run HA deployment with one controller and one compute, run OSTF
2) Add two more controllers, run OSTF
3) Destroy the first (initial) controller, run OSTF
OSTF should pass at all 3 steps. At the current moment it will not pass at step 3 unless the compute nodes are redeployed on every controller addition.

Changed in fuel:
importance: Undecided → High
Mike Scherbakov (mihgen)
description: updated
Andrew Woodward (xarses)
description: updated
Andrew Woodward (xarses)
description: updated
Revision history for this message
Dima Shulyak (dshulyak) wrote:

We can introduce this as an additional post-deployment step in Astute.
That would be quite easy right now, but it would add extra complexity to Astute.
Alternatively, we can add it as additional tasks in our granular deployment feature,
to be executed on computes when additional controllers are installed.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote: Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/132549

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Dima Shulyak (dshulyak)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote: Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/132549
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=1d3982f606b5d0d0cba78b52a5806abee6a97918
Submitter: Jenkins
Branch: master

commit 1d3982f606b5d0d0cba78b52a5806abee6a97918
Author: Dima Shulyak <email address hidden>
Date: Mon Nov 3 13:37:37 2014 +0200

    Redeploy cinder/compute nodes if new controller added

    Adding a new controller affects cluster messaging,
    particularly cinder/compute nodes, which rely on
    the rabbit_hosts setting

    In the current orchestration model, the only way to
    fix configuration on nodes is to redeploy them.
    Configuration is applied from /etc/astute.yaml, so
    it is not enough to simply add another PostDeployment
    method in astute

    Introduced an additional setting for roles_metadata:
    - update_required
      stores the list of roles that depend on this role

    At the deployment stage:
    - build the update_required list for the whole cluster
    - select ready nodes without pending_roles and deploy them

    No migration is added, so behaviour on old clusters stays as it is

    DocImpact
    Closes-Bug: 1368445

    Change-Id: I1735a8b06531018b1240726f5faa4f7ce6e6a631
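
For illustration, the new setting would look something like this in the roles_metadata section of Nailgun's release definition (the role list and file placement are assumptions based on the commit message, not quoted from the patch):

    roles_metadata:
      controller:
        name: "Controller"
        update_required:    # roles to redeploy whenever a controller is added
          - compute
          - cinder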

Changed in fuel:
status: In Progress → Fix Committed