custom roles cannot update other nodes on deletion

Bug #1485505 reported by Andrey Sledzinskiy
This bug affects 7 people
Affects             Status         Importance  Assigned to               Milestone
Fuel Plugins        Confirmed      Undecided   Unassigned
Fuel for OpenStack  Fix Committed  High        Vladimir Kuklin
7.0.x               Won't Fix      High        Fuel Python (Deprecated)
8.0.x               Won't Fix      High        Alexander Adamov
Mitaka              Fix Released   High        Vladimir Kuklin

Bug Description

Deleted rabbitmq node isn't deleted from amqp_hosts

UPD: Other steps to reproduce - https://bugs.launchpad.net/fuel/7.0.x/+bug/1485505/comments/10

Steps:
1. Install the detach-rabbit plugin on the master node (https://github.com/mattymo/detach-rabbitmq)
2. Create the following cluster: Ubuntu, all default values, separate_rabbit enabled, 3 controllers, 3 rabbitmq_nodes, 1 compute, 1 cinder
3. Deploy the cluster
4. After deployment, add 1 rabbit node and redeploy
5. After redeployment, delete one of the rabbit nodes and redeploy
6. After redeployment, check 'hiera amqp_hosts' on all nodes

Expected - only 3 running rabbit nodes are displayed

Actual - all 4 rabbit nodes are displayed
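
The check in step 6 can be scripted. A minimal sketch, assuming the output of 'hiera amqp_hosts' has already been collected from each node as a comma-separated string (the collection step, the function name, the output format and the sample addresses are all hypothetical, not taken from the report):

```python
def stale_amqp_hosts(hiera_output_by_node, running_rabbit_hosts):
    """Map each node to the hosts it still lists that are no longer running."""
    running = set(running_rabbit_hosts)
    stale = {}
    for node, raw in hiera_output_by_node.items():
        # Assumed format: "host:port, host:port, ..."
        listed = {h.strip() for h in raw.split(",") if h.strip()}
        extra = listed - running
        if extra:
            stale[node] = sorted(extra)
    return stale


# Hypothetical sample: 10.0.0.7 was deleted, but node-2 still lists it.
outputs = {
    "node-1": "10.0.0.4:5673, 10.0.0.5:5673, 10.0.0.6:5673",
    "node-2": "10.0.0.4:5673, 10.0.0.5:5673, 10.0.0.6:5673, 10.0.0.7:5673",
}
running = ["10.0.0.4:5673", "10.0.0.5:5673", "10.0.0.6:5673"]
print(stale_amqp_hosts(outputs, running))
```

An empty result would correspond to the expected behaviour; any entry points at a node whose hiera data was not refreshed.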

{
    "build_id": "2015-08-14_02-59-14",
    "build_number": "170",
    "release_versions": {
        "2015.1.0-7.0": {
            "VERSION": {
                "build_id": "2015-08-14_02-59-14",
                "build_number": "170",
                "api": "1.0",
                "fuel-library_sha": "a155b3b271704a1fb67ecea2e439893b923a0fdf",
                "nailgun_sha": "0089f1ba1782bd56f3d8c6729ebc23dd5c7d3698",
                "feature_groups": [
                    "mirantis"
                ],
                "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c",
                "openstack_version": "2015.1.0-7.0",
                "fuel-agent_sha": "57145b1d8804389304cd04322ba0fb3dc9d30327",
                "production": "docker",
                "python-fuelclient_sha": "f7284870598d84a9954efb88847b9c4b71d359b7",
                "astute_sha": "371bfd25f62fd4db39b2b1ae1d93ba76553e11c7",
                "fuel-ostf_sha": "17786b86b78e5b66d2b1c15500186648df10c63d",
                "release": "7.0",
                "fuelmain_sha": "aa89341dcebf127f3239b0a67550fdb3925d552c"
            }
        }
    },
    "auth_required": true,
    "api": "1.0",
    "fuel-library_sha": "a155b3b271704a1fb67ecea2e439893b923a0fdf",
    "nailgun_sha": "0089f1ba1782bd56f3d8c6729ebc23dd5c7d3698",
    "feature_groups": [
        "mirantis"
    ],
    "fuel-nailgun-agent_sha": "e01693992d7a0304d926b922b43f3b747c35964c",
    "openstack_version": "2015.1.0-7.0",
    "fuel-agent_sha": "57145b1d8804389304cd04322ba0fb3dc9d30327",
    "production": "docker",
    "python-fuelclient_sha": "f7284870598d84a9954efb88847b9c4b71d359b7",
    "astute_sha": "371bfd25f62fd4db39b2b1ae1d93ba76553e11c7",
    "fuel-ostf_sha": "17786b86b78e5b66d2b1c15500186648df10c63d",
    "release": "7.0",
    "fuelmain_sha": "aa89341dcebf127f3239b0a67550fdb3925d552c"
}

Andrey Sledzinskiy (asledzinskiy) wrote :
Bogdan Dobrelya (bogdando) wrote :
Changed in fuel:
status: New → Invalid
tags: added: feature-plugins
Andrey Sledzinskiy (asledzinskiy) wrote :

Sorry, the code hasn't been moved yet; I updated the link.

description: updated
Changed in fuel:
status: Invalid → New
status: New → Invalid
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Matthew Mosesohn (raytrac3r)
status: Invalid → Incomplete
Igor (ipukha)
Changed in fuel:
assignee: Matthew Mosesohn (raytrac3r) → Igor (ipukha)
Bogdan Dobrelya (bogdando) wrote :

Set to Incomplete, as it is not clear whether we should track plugin bugs in the Fuel project scope or in the Fuel plugins scope.

Igor (ipukha)
Changed in fuel:
assignee: Igor (ipukha) → nobody
Changed in fuel:
status: Incomplete → Confirmed
assignee: nobody → Fuel Library Team (fuel-library)
tags: added: life-cycle-management
Dmitry Borodaenko (angdraug) wrote :

Nastya, please comment on the change of bug status from Incomplete to Confirmed: why is this tracked as a Fuel bug? Is the detach-rabbitmq plugin needed to reproduce it? How did you confirm that it's a bug in Fuel and not in the plugin?

Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Matthew Mosesohn (raytrac3r)
Changed in fuel-plugins:
assignee: nobody → Matthew Mosesohn (raytrac3r)
Matthew Mosesohn (raytrac3r) wrote :

This is a nailgun issue. It seems that the update_required role field updates other nodes on addition, but not on removal.
http://paste.openstack.org/show/z1KXnWoGrw0qSrBKIZZE/

Changed in fuel:
assignee: Matthew Mosesohn (raytrac3r) → Fuel Python Team (fuel-python)
Matthew Mosesohn (raytrac3r) wrote :

I think we should just add node.pending_deletion to this if statement: https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/task/helpers.py#L169
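
To illustrate the proposal, here is a simplified, self-contained model of the selection logic. It is not the actual nailgun code: the Node class, the nodes_to_redeploy function, the include_deletion switch and the role names are hypothetical stand-ins. Only the update_required field and the pending_addition / pending_deletion flags come from this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    roles: list = field(default_factory=list)
    pending_roles: list = field(default_factory=list)
    pending_addition: bool = False
    pending_deletion: bool = False


def nodes_to_redeploy(nodes, roles_metadata, include_deletion=False):
    """Pick new nodes plus existing nodes whose roles are named in update_required."""
    update_required = set()
    selected = []
    for node in nodes:
        # Without include_deletion, only additions trigger updates (the reported behaviour).
        changed = node.pending_addition or (include_deletion and node.pending_deletion)
        if not changed:
            continue
        if node.pending_addition:
            selected.append(node.name)
        for role in node.pending_roles + node.roles:
            update_required.update(roles_metadata.get(role, {}).get("update_required", []))
    for node in nodes:
        # Nodes being deleted are never redeployed themselves.
        if node.name in selected or node.pending_deletion:
            continue
        if update_required & set(node.roles):
            selected.append(node.name)
    return selected


# Hypothetical role metadata: changes to "rabbitmq" nodes require updating "compute" nodes.
meta = {"rabbitmq": {"update_required": ["compute"]}}
cluster = [
    Node("node-1", roles=["compute"]),
    Node("node-2", roles=["rabbitmq"], pending_deletion=True),
]
print(nodes_to_redeploy(cluster, meta))                         # deletion ignored
print(nodes_to_redeploy(cluster, meta, include_deletion=True))  # deletion considered
```

With deletions ignored, nothing is selected for redeployment; once pending_deletion is taken into account, the dependent node is picked up, which is the behaviour being asked for.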

Dmitry Pyzhov (dpyzhov)
tags: added: feature
Changed in fuel-plugins:
status: New → Confirmed
Dmitry Pyzhov (dpyzhov) wrote :

It is dangerous to merge this in 7.0. But the 'separate services' feature cannot be fully tested without this fix. So we are going to fix it and merge into master after HCF. Keeping it in the 7.0 release in order to have it on the plate for the team. Adding the 'non-release' tag in order to highlight that this fix does not affect HCF.

tags: added: non-release
Dima Shulyak (dshulyak) wrote :

There is a legitimate workaround for this case. After the operator has removed a node, they should re-run all required tasks with the CLI [0].
Of course, they need to know exactly which tasks need to be executed, but in the worst case it is possible to just re-run the whole node, like

  fuel node --node <id> --deploy

0. https://review.openstack.org/#/c/161192/20/pages/reference-architecture/task-deployment/0020-api.rst

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 7.0 → 8.0
tags: removed: non-release
Changed in fuel:
status: Confirmed → Triaged
summary: - Deleted rabbitmq node isn't deleted from amqp_hosts
+ custom roles cannot update other nodes on deletion
Changed in fuel-plugins:
assignee: Matthew Mosesohn (raytrac3r) → nobody
Dmitry Pyzhov (dpyzhov)
tags: added: area-python
Vladimir Khlyunev (vkhlyunev) wrote :
description: updated
Vladimir Khlyunev (vkhlyunev) wrote :

Addition to #10:
file /etc/hiera/override/detach-keystone wasn't updated and contains
http://paste.openstack.org/show/478650/

Changed in fuel:
milestone: 8.0 → 9.0
status: Triaged → New
Artem Roma (aroma-x)
Changed in fuel:
status: New → Confirmed
Vladimir Khlyunev (vkhlyunev) wrote :
tags: added: swarm-blocker
Ihor Kalnytskyi (ikalnytskyi) wrote :

Well, since Fuel 8.0 we support "reexecute_on" tasks. So plugin developers could add a task:

  - id: my-task
    groups: [primary-controller, controller]
    reexecute_on: [deploy_changes]

and that would mean: "Run this task on all controllers *EACH* time deploy changes is pressed".

So I believe a plugin developer could handle the scale-down case using this approach.
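
As a rough illustration of the semantics described above, the following toy model selects which tasks run on a node for a given event. It is only a sketch: the tasks_for_event function, the node_is_new flag and the second task are hypothetical, and the real task scheduling in Fuel is more involved.

```python
def tasks_for_event(tasks, event, node_is_new):
    """Return ids of tasks to run: everything on a new node, otherwise only
    tasks whose reexecute_on list names the event."""
    return [
        task["id"]
        for task in tasks
        if node_is_new or event in task.get("reexecute_on", [])
    ]


tasks = [
    {"id": "my-task", "groups": ["primary-controller", "controller"],
     "reexecute_on": ["deploy_changes"]},
    # Hypothetical task without reexecute_on, for contrast.
    {"id": "other-task", "groups": ["primary-controller", "controller"]},
]
print(tasks_for_event(tasks, "deploy_changes", node_is_new=False))
```

In this model an already deployed controller re-runs only my-task on every "deploy changes", regardless of which nodes were added or removed.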

no longer affects: fuel/future
Matthew Mosesohn (raytrac3r) wrote :

So for 8.0 we can work around this, but it doesn't resolve the underlying issue: the update_required field only takes effect when nodes are added, not when they are deleted.

Simon Pasquier (simon-pasquier) wrote :

I confirm that using the reexecute_on stanza works as expected; it is what we have been using for the LMA toolchain since MOS 8.

And I agree with Matt: the behavior should be consistent between scale up and scale down operations. It's very frustrating right now for plugin developers to understand exactly when the tasks are executed or not.

Ihor Kalnytskyi (ikalnytskyi) wrote :

Matt, Simon,

I agree with you folks, but this issue *is not* about the plugin SDK. It's about the detached rabbit plugin. And the solution is to use what we have now. We simply can't *solve* the scale-down trigger problem two days before HCF. So either you fix it in your plugin, or I move the bug to the next release (since it's not a regression but a feature request).

Dmitriy Novakovskiy (dnovakovskiy) wrote :

This has to proceed in the following way:
1) The SDK needs to be updated with instructions for plugin developers on how to handle the "downscale" case correctly (in plugins like detached Keystone, separated RabbitMQ, etc.)
2) Upon the SDK update, Mirantis developers of the detached Keystone and separated RabbitMQ plugins need to be notified of how to make changes in their plugins

With these 2 conditions met, we can move this bug out of 8.0 and treat it as a feature request/Pluggable Framework enhancement in one of the upcoming cycles

Ihor Kalnytskyi (ikalnytskyi) wrote :

Dmitry,

We can and we should treat it as a feature request. It's something we never supported. We simply have no time to implement this enhancement.

Yet, what are the action items? Who is going to write this documentation?

Ihor Kalnytskyi (ikalnytskyi) wrote :

Well, I've created bug #1541329 to track the limitations of the Fuel Plugin SDK.

Doc Team, could you please contact Simon P. and document the workaround of this problem? It would be very useful for plugin developers.

Matthew Mosesohn (raytrac3r) wrote :

reexecute_on: [deploy_changes] doesn't reflect the use case and adds even more burden to plugin developers. If I'm a plugin developer, I need to override the following base tasks if I want to update the rabbitmq and memcache node lists:
${plugin}_hiera_config
hiera
globals (actually this is already in master)
cluster
memcache

But then you will re-execute these tasks on the controller as well... and again on add/remove of controllers, cinder nodes, compute nodes... etc. I want to narrow the scope of repeating puppet to just the case of adding/removing a node with my plugin's role.

The fact remains that the "update_required" parameter exists and has very clear criteria, but does not work as expected.

tags: added: release-notes
Alexander Adamov (aadamov) wrote :
tags: added: release-notes-done
removed: release-notes
Bug Checker Bot (esikachev-l) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

version

expected result

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Matthew Mosesohn (raytrac3r) wrote :

The original bug still exists: only the controller role can be redeployed on node deletion. You can repeat individual tasks, but only by applying them to ALL changes. Either we need a better set of conditions for the "redeploy-on:" field, or we need to fix the update_required node role field to work for deletion as well.

This bug is NOT high for the detach-rabbitmq plugin, because the controller is updated on node deletion and its list of amqp hosts is accurate.

Dmitry Belyaninov (dbelyaninov) wrote :

I think this issue was reproduced on 9.0 iso #101

https://product-ci.infra.mirantis.net/job/9.0.system_test.ubuntu.plugins.thread_keystone_separate_services/56/console

Test: separate_keystone_service_add_delete_node

2016-03-24 02:01:19,732 - ERROR decorators.py:123 -- Traceback (most recent call last):
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.plugins.thread_keystone_separate_services/fuelweb_test/helpers/decorators.py", line 117, in wrapper
    result = func(*args, **kwargs)
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.plugins.thread_keystone_separate_services/fuelweb_test/tests/tests_separate_services/test_separate_keystone.py", line 269, in separate_keystone_service_add_delete_node
    cmd='hiera memcache_roles')
  File "/home/jenkins/workspace/9.0.system_test.ubuntu.plugins.thread_keystone_separate_services/fuelweb_test/helpers/checkers.py", line 1132, in check_hiera_hosts
    ' others'.format(node['hostname']))
  File "/home/jenkins/venv-nailgun-tests-2.9/local/lib/python2.7/site-packages/proboscis/asserts.py", line 163, in assert_true
    raise ASSERTION_ERROR(message)
AssertionError: Hosts on node node-8 differ from others

Matthew Mosesohn (raytrac3r) wrote :

LCM changes will land soon. I'm assigning to Vladimir Kuklin and he can pass it back to me when the capability is introduced for 9.0.

tags: removed: need-info
Matthew Mosesohn (raytrac3r) wrote :

Now we can define 2.1.0 tasks in 9.0. An example task is here:
https://review.openstack.org/#/c/298341/6/deployment/puppet/osnailyfacter/modular/cluster/tasks.yaml

If network_scheme is updated (nodes added or deleted), we can repeat a given task.
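
The idea of repeating a task when part of the cluster data changes can be sketched as a comparison of the old and new state. This is an illustration only: the rerun_if_changed key, the tasks_to_repeat function and the sample data are hypothetical and do not reflect the actual 2.1.0 task schema in the linked review.

```python
def tasks_to_repeat(tasks, old_state, new_state):
    """Return ids of tasks whose watched keys differ between two cluster states."""
    repeat = []
    for task in tasks:
        watched = task.get("rerun_if_changed", [])
        if any(old_state.get(key) != new_state.get(key) for key in watched):
            repeat.append(task["id"])
    return repeat


tasks = [
    {"id": "cluster", "rerun_if_changed": ["network_scheme"]},
    {"id": "static-task"},  # watches nothing, so it is never repeated
]
# Hypothetical states before and after deleting node-2.
old = {"network_scheme": {"nodes": ["node-1", "node-2"]}}
new = {"network_scheme": {"nodes": ["node-1"]}}
print(tasks_to_repeat(tasks, old, new))
```

Because the comparison does not care whether a node was added or removed, scale-up and scale-down are handled symmetrically in this model.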

Maksym Strukov (unbelll)
tags: added: on-verification
Maksym Strukov (unbelll) wrote :

Verified as fixed in 9.0-mos-485

tags: removed: on-verification