Nailgun node status wasn't changed to "error" after a deletion of node's networkgroup

Bug #1644630 reported by Sergey Novikov
This bug affects 3 people
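Affects            | Status        | Importance | Assigned to     | Milestone
-------------------+---------------+------------+-----------------+----------
Fuel for OpenStack | Fix Released  | High       | Dmitry          |
Newton             | Fix Committed | High       | Georgy Kibardin |
Nominated for Ocata by Georgy Kibardin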

Bug Description

Detailed bug description:
The issue was found by the CI job https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.thread_7/135/testReport/(root)/delete_custom_nodegroup/

Steps to reproduce:
        1. Deploy cluster with custom nodegroup
        3. Reset cluster
        4. Remove custom nodegroup
        5. Check nodes from custom nodegroup have 'error' status
        6. Re-create custom nodegroup and upload saved network configuration
        7. Assign 'error' nodes to new nodegroup
        8. Check nodes from custom nodegroup are in 'discover' state
Expected result: all steps pass.
Actual result: step #5 fails (the nodes are not in 'error' status).

Description of the environment:
snapshot #549

Revision history for this message
Sergey Novikov (snovikov) wrote :
Changed in fuel:
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
importance: Undecided → Medium
status: New → Confirmed
tags: added: area-library
tags: added: area-python
removed: area-library
Changed in fuel:
importance: Medium → High
tags: added: swarm-fail
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

I am not sure that this behaviour is not expected. From what I see, if we have a cluster that was reset, there is no need to mark nodes as 'error' at all. 'Error' is the status of some real 'action', not of juggling nodes in the UI within an almost brand-new cluster.

Changed in fuel:
status: Confirmed → Incomplete
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Fuel QA Team (fuel-qa)
status: Incomplete → Confirmed
Revision history for this message
Artem Panchenko (apanchenko-8) wrote :

> I am not sure that this behaviour is not expected

That's definitely expected [0] unless it was changed by some other feature/bugfix.

[0] https://specs.openstack.org/openstack/fuel-specs/specs/8.0/multi-rack-static.html#notifications-impact

Revision history for this message
Alexander Kurenyshev (akurenyshev) wrote :

After many retries I was able to get the 'error' state after step 4 in my local environment:

[root@nailgun ~]# fuel node
id | status   | name                | cluster | ip         | mac               | roles | pending_roles | online | group_id
---+----------+---------------------+---------+------------+-------------------+-------+---------------+--------+---------
 1 | error    | slave-04_controller | 1       | 10.109.7.3 | 64:0a:33:89:23:da |       | controller    | 1      |
 4 | discover | slave-01_compute    | 1       | 10.109.0.3 | 64:30:c8:41:e4:29 |       | compute       | 1      | 1
 5 | discover | slave-02_cinder     | 1       | 10.109.0.4 | 64:18:52:c1:08:39 |       | cinder        | 1      | 1
 3 | error    | slave-05_controller | 1       | 10.109.7.4 | 64:f7:74:89:07:16 |       | controller    | 1      |
 2 | error    | slave-06_controller | 1       | 10.109.7.5 | 64:fd:85:11:22:4e |       | controller    | 1      |
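
For scripted checks, the `fuel node` table above can be parsed to pick out the nodes in 'error' status. This is a minimal sketch that assumes the pipe-separated layout shown in the output; `parse_fuel_node_table` is a hypothetical helper written for illustration, not part of the fuel client.

```python
# Sample copied from the `fuel node` output above (MAC column kept as-is).
SAMPLE = """\
id | status | name | cluster | ip | mac | roles | pending_roles | online | group_id
---+----------+---------------------+---------+------------+-------------------+-------+---------------+--------+---------
 1 | error | slave-04_controller | 1 | 10.109.7.3 | 64:0a:33:89:23:da | | controller | 1 |
 4 | discover | slave-01_compute | 1 | 10.109.0.3 | 64:30:c8:41:e4:29 | | compute | 1 | 1
 5 | discover | slave-02_cinder | 1 | 10.109.0.4 | 64:18:52:c1:08:39 | | cinder | 1 | 1
 3 | error | slave-05_controller | 1 | 10.109.7.4 | 64:f7:74:89:07:16 | | controller | 1 |
 2 | error | slave-06_controller | 1 | 10.109.7.5 | 64:fd:85:11:22:4e | | controller | 1 |
"""


def parse_fuel_node_table(text):
    """Hypothetical helper: turn the pipe-separated table into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        # Skip the '---+---' separator line.
        if set(ln.strip()) <= set("-+"):
            continue
        cells = [cell.strip() for cell in ln.split("|")]
        # Pad short rows so an empty trailing column (group_id) is preserved.
        cells += [""] * (len(header) - len(cells))
        rows.append(dict(zip(header, cells)))
    return rows


rows = parse_fuel_node_table(SAMPLE)
error_nodes = sorted(r["name"] for r in rows if r["status"] == "error")
print(error_nodes)
```

In the sample, the three nodes in 'error' status are exactly the ones with an empty `group_id`, which matches the expectation in step 5.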

Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Fuel Sustaining (fuel-sustaining-team)
Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Okay, so, according to the nailgun code, there is not a single place where a node's status is set to 'error' on node group deletion. It looks like this is a regression, but at the same time there is a simple workaround and it can be easily fixed. This means that this bug should be classified as medium and may be targeted to later versions.

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Anyway, the regression seems to have been introduced between the 21st and 23rd of December.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/422166

Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Vladimir Kuklin (vkuklin)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/422179

Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Georgy Kibardin (gkibardin)
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 9.2 → 9.3
Changed in fuel:
assignee: Georgy Kibardin (gkibardin) → Bulat Gaifullin (bulat.gaifullin)
Changed in fuel:
assignee: Bulat Gaifullin (bulat.gaifullin) → Georgy Kibardin (gkibardin)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/422166
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=cd2ee13830dbed7405cf5587c0fa30cf1f1f97fd
Submitter: Jenkins
Branch: master

commit cd2ee13830dbed7405cf5587c0fa30cf1f1f97fd
Author: Vladimir Kuklin <email address hidden>
Date: Wed Jan 18 21:46:42 2017 +0300

    Set nodes' statuses to 'error' when their nodegroup is deleted

    According to the bug below and the spec, we did not implement
    one multirack feature aspect.

    https://specs.openstack.org/openstack/fuel-specs/specs/8.0/multi-rack-static.html#notifications-impact

    Now we add resetting node to error to node group deletion callback and
    send a notification.

    Change-Id: I6b2bae5601ba7dbca620bb3861e95b0e554f8699
    Closes-bug: #1644630
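
The behaviour described in the commit message can be sketched roughly as follows. This is not the actual fuel-web change: `Node`, `Cluster`, `on_nodegroup_delete`, the group ids, and the notification wording are hypothetical stand-ins used only to illustrate "set the nodes to 'error' in the node group deletion callback and send a notification".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a nailgun node record."""
    name: str
    group_id: Optional[int]
    status: str = "discover"


@dataclass
class Cluster:
    """Hypothetical container for nodes and the notifications sent so far."""
    nodes: List[Node] = field(default_factory=list)
    notifications: List[str] = field(default_factory=list)


def on_nodegroup_delete(cluster: Cluster, group_id: int) -> List[Node]:
    """Illustrative deletion callback: flag every node of the deleted group."""
    affected = [n for n in cluster.nodes if n.group_id == group_id]
    for node in affected:
        node.group_id = None      # the group no longer exists
        node.status = "error"     # the node needs operator attention
        cluster.notifications.append(
            "Node '{0}': its nodegroup was deleted".format(node.name)
        )
    return affected


# Assumed example data: group 1 is kept, group 2 is the custom group being deleted.
cluster = Cluster(nodes=[
    Node("slave-01_compute", group_id=1),
    Node("slave-04_controller", group_id=2),
])
affected = on_nodegroup_delete(cluster, group_id=2)
print([(n.name, n.status) for n in cluster.nodes])
print(cluster.notifications)
```

Nodes in other groups are left untouched, so only the nodes that lost their group change status.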

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/431441

Changed in fuel:
status: Fix Committed → In Progress
Revision history for this message
Aleksey Kasatkin (alekseyk-ru) wrote :

Changed to 'in progress' as fix was not merged to stable/mitaka. Please add correct targets.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/newton)

Reviewed: https://review.openstack.org/431441
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=f26a168b72ded08c7982eb9f869b6539f28cd3cf
Submitter: Jenkins
Branch: stable/newton

commit f26a168b72ded08c7982eb9f869b6539f28cd3cf
Author: Vladimir Kuklin <email address hidden>
Date: Wed Jan 18 21:46:42 2017 +0300

    Set nodes' statuses to 'error' when their nodegroup is deleted

    According to the bug below and the spec, we did not implement
    one multirack feature aspect.

    https://specs.openstack.org/openstack/fuel-specs/specs/8.0/multi-rack-static.html#notifications-impact

    Now we add resetting node to error to node group deletion callback and
    send a notification.

    Change-Id: I6b2bae5601ba7dbca620bb3861e95b0e554f8699
    Closes-bug: #1644630
    (cherry picked from commit cd2ee13830dbed7405cf5587c0fa30cf1f1f97fd)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/fuel-web 11.0.0.0rc1

This issue was fixed in the openstack/fuel-web 11.0.0.0rc1 release candidate.

Changed in fuel:
milestone: 9.x-updates → 9.2-mu-2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/422179
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=e0f33824b49c80950350ad2b5774600b7777d8fc
Submitter: Jenkins
Branch: stable/mitaka

commit e0f33824b49c80950350ad2b5774600b7777d8fc
Author: Vladimir Kuklin <email address hidden>
Date: Wed Jan 18 21:46:42 2017 +0300

    Set nodes' statuses to 'error' when their nodegroup is deleted

    According to the bug below and the spec, we did not implement
    one multirack feature aspect.

    https://specs.openstack.org/openstack/fuel-specs/specs/8.0/multi-rack-static.html#notifications-impact

    Now we add resetting node to error to node group deletion callback and
    send a notification.

    Change-Id: I6b2bae5601ba7dbca620bb3861e95b0e554f8699
    Closes-bug: #1644630
    (cherry picked from commit cd2ee13830dbed7405cf5587c0fa30cf1f1f97fd)

tags: added: in-stable-mitaka
Changed in fuel:
status: In Progress → Fix Committed
Dmitry (dtsapikov)
tags: added: on-verification
Revision history for this message
Dmitry (dtsapikov) wrote :

The problem was reproduced on 9.2+mu2.
Step #5 fails.

Changed in fuel:
status: Fix Committed → Confirmed
Revision history for this message
Denis Meltsaykin (dmeltsaykin) wrote :

Moving from 9.2-mu-2 to mu-3 since it's not critical and takes more time to re-investigate.

Changed in fuel:
milestone: 9.2-mu-2 → 9.2-mu-3
Changed in fuel:
assignee: Georgy Kibardin (gkibardin) → Alexey Stupnikov (astupnikov)
Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

Worked for me in a lab, so moving back to Fix Committed. Assigning to dtsapikov to verify and release.

Changed in fuel:
assignee: Alexey Stupnikov (astupnikov) → Dmitry (dtsapikov)
Changed in fuel:
status: Confirmed → Fix Committed
Revision history for this message
Vladimir Jigulin (vjigulin) wrote :

Verified on 9.2 + mu3 proposed repo.

With the proposed repo, new notifications appear:
"Node 'node-4' nodegroup was deleted which means that it may not be able to boot correctly unless it is a member of another node group admin network" and
"Node 'node-4' has IP '10.109.20.3' that does not match any Admin network".
The status of nodes from the removed nodegroup becomes 'error'.
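
The second notification amounts to a membership check of the node's IP against the remaining admin networks. Below is a minimal sketch using Python's standard `ipaddress` module; the CIDR `10.109.0.0/24` is an assumption made for illustration (the report does not state the admin network ranges), and `matches_any_admin_network` is a hypothetical helper, not a nailgun function.

```python
import ipaddress


def matches_any_admin_network(ip, admin_cidrs):
    """Return True if `ip` belongs to at least one of the given networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in admin_cidrs)


# Assumed: only the default group's admin network remains after the deletion.
remaining_admin_networks = ["10.109.0.0/24"]

orphaned = matches_any_admin_network("10.109.20.3", remaining_admin_networks)
kept = matches_any_admin_network("10.109.0.3", remaining_admin_networks)
print(orphaned, kept)  # False True
```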

Changed in fuel:
status: Fix Committed → Fix Released