After successful cluster reset, slave node is marked as 'online', but it is unavailable

Bug #1588193 reported by Artem Panchenko
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: Medium
Assigned to: Georgy Kibardin
Milestone: none listed

Bug Description

Right after an environment reset (the 'reset_environment' task is 'ready'), a slave node can be marked as 'online' and 'discover' while it is actually unavailable (see bug #1586422). I believe this happens because the node erase action takes less than 2 minutes (the online/offline timeout): since the 9.0 release we erase only the MBR (https://github.com/openstack/fuel-astute/commit/e770d4ec7d302e958ffae8db87e633e9d5e3db91), so the node's online status isn't changed to 'False' before the 'reset_environment' task completes.

Marking the target slave node as 'offline' from Astute before erasing its drives seems like a good fix for this issue.
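
The timing argument above can be sketched with a toy model. This is only an illustration of the suspected race, not Nailgun or Astute code: the `Node` class, `status_when_reset_finishes`, and the idea that status is derived purely from the last check-in time are assumptions made for the example. The 120-second value is the "2 minutes" mentioned in the description.

```python
from dataclasses import dataclass

# Assumed online/offline timeout, taken from the "2 minutes" in the report.
OFFLINE_TIMEOUT = 120  # seconds


@dataclass
class Node:
    """Hypothetical node record; not the real Nailgun model."""
    last_checkin: float
    forced_offline: bool = False

    def is_online(self, now: float) -> bool:
        # A node explicitly marked offline is never reported as online.
        if self.forced_offline:
            return False
        # Otherwise it counts as online until the timeout has elapsed.
        return (now - self.last_checkin) <= OFFLINE_TIMEOUT


def status_when_reset_finishes(erase_seconds: float, mark_offline_first: bool) -> bool:
    """Return the reported 'online' flag at the moment the reset task is done."""
    node = Node(last_checkin=0.0)  # last agent check-in just before the erase
    if mark_offline_first:
        node.forced_offline = True  # the fix proposed in the description
    return node.is_online(now=erase_seconds)


if __name__ == "__main__":
    print(status_when_reset_finishes(30, mark_offline_first=False))
    print(status_when_reset_finishes(30, mark_offline_first=True))
```

In this model a quick erase finishes while the node is still inside the timeout window, so it is still reported as online; marking it offline first removes the window.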

Steps to reproduce:

 1. Reset an environment
 2. Right after the reset task finishes (e.g. shown as done in the GUI), try to SSH to the erased 'online' nodes

Expected result:

There are no online nodes belonging to the reset cluster, or all online nodes are accessible via SSH and in the 'bootstrap' state.

Actual result:

Online nodes are not accessible via SSH (they are still rebooting after the erase).
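
The check in step 2 can be approximated with a short script. This is a minimal sketch under assumptions: it only tests whether the SSH TCP port accepts a connection (it does not log in), and the node dictionaries with `ip` and `online` keys are a hypothetical shape, not the exact Nailgun API output.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def unreachable_online_nodes(nodes, port=22, timeout=3.0):
    """List IPs of nodes reported as online whose SSH port does not answer.

    `nodes` is assumed to be an iterable of dicts with 'ip' and 'online' keys.
    """
    return [
        node["ip"]
        for node in nodes
        if node.get("online") and not ssh_port_open(node["ip"], port, timeout)
    ]
```

A non-empty result right after the reset task finishes would correspond to the actual result described in this report.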

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-qa (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/324267

Changed in fuel:
status: New → Confirmed
tags: added: area-python area-qa
OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/324267
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=2cb874ce00416817c2286b5b921e226713ce294c
Submitter: Jenkins
Branch: master

commit 2cb874ce00416817c2286b5b921e226713ce294c
Author: Artem Panchenko <email address hidden>
Date: Thu Jun 2 10:13:13 2016 +0300

    Add sleep after reset to detect offline nodes

    There is a bug #1588193 in product which we have
    to workaround in tests by adding 'sleep(181)'
    after environment reset.

    Change-Id: I01b2fb0899e3d1cc5b3b6a323117d5da171517e1
    Related-bug: #1588193
    Closes-bug: #1586422
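
The workaround described in this commit message can be pictured roughly as follows. This is a hedged sketch and not the actual fuel-qa change: `online_nodes_after_reset`, `reset_env`, and `list_nodes` are hypothetical names, and only the 181-second delay comes from the commit message. The sleep function is injectable so the helper can be exercised without really waiting.

```python
import time

# Fixed delay quoted in the commit message ('sleep(181)').
RESET_SETTLE_SECONDS = 181


def online_nodes_after_reset(reset_env, list_nodes, sleep=time.sleep,
                             delay=RESET_SETTLE_SECONDS):
    """Reset the environment, wait, then return nodes still reported online.

    reset_env  -- hypothetical callable that triggers the reset and blocks
                  until the reset task is reported as ready
    list_nodes -- hypothetical callable returning dicts with an 'online' key
    """
    reset_env()
    # Wait for the fixed delay so that the reported node status can catch up
    # with the nodes that are still rebooting.
    sleep(delay)
    return [node for node in list_nodes() if node.get("online")]
```

A test would then continue only with the returned nodes, instead of trusting the status reported at the moment the reset task finishes.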

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-qa (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/326931

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-qa (stable/mitaka)

Reviewed: https://review.openstack.org/326931
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=85713d0e8a8809b8a571fcab45398bb876f9a8b5
Submitter: Jenkins
Branch: stable/mitaka

commit 85713d0e8a8809b8a571fcab45398bb876f9a8b5
Author: Artem Panchenko <email address hidden>
Date: Thu Jun 2 10:13:13 2016 +0300

    Add sleep after reset to detect offline nodes

    There is a bug #1588193 in product which we have
    to workaround in tests by adding 'sleep(181)'
    after environment reset.

    Change-Id: I01b2fb0899e3d1cc5b3b6a323117d5da171517e1
    Related-bug: #1588193
    Closes-bug: #1586422
    (cherry picked from commit 2cb874ce00416817c2286b5b921e226713ce294c)

tags: added: in-stable-mitaka
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Georgy Kibardin (gkibardin)
Georgy Kibardin (gkibardin) wrote :

When the environment is reset, the UpdateDnsmasq task is executed asynchronously; it is still active while the UI reports that the env has been reset. As a result, dnsmasq still resolves nodes by name. At the same time the nodes have not yet rebooted, which makes SSH connections succeed.

Changed in fuel:
status: Confirmed → In Progress
Georgy Kibardin (gkibardin) wrote :

BTW, I didn't manage to log in to the nodes; they were already rebooting at the time. After the nodes rebooted I was able to log in to them again.
So what is the point of the bug: a lag between the node's real status and its dashboard status?

Changed in fuel:
status: In Progress → Incomplete
Georgy Kibardin (gkibardin) wrote :

Actually, nodes are currently set to "discovered" status while they are still rebooting.

Georgy Kibardin (gkibardin) wrote :

No activity for a month already, so marking it invalid. Please reopen if necessary.

Changed in fuel:
status: Incomplete → Invalid