Fuel for OpenStack

After successful cluster reset, slave node is marked as 'online', but it is unavailable

Bug #1588193 reported by Artem Panchenko on 2016-06-02

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Fuel for OpenStack	Invalid	Medium	Georgy Kibardin	Fuel for OpenStack 10.0

Bug Description

Right after an environment reset (the 'reset_environment' task is 'ready'), a slave node can be marked as 'online' and 'discover' although it is actually unavailable (see bug #1586422). I believe this happens because the node erase action takes less than 2 minutes (the online/offline timeout): since the 9.0 release we erase only the MBR (https://github.com/openstack/fuel-astute/commit/e770d4ec7d302e958ffae8db87e633e9d5e3db91). As a result, the node's online status is not changed to 'False' before the 'reset_environment' task completes.

Marking the target slave node as 'offline' from Astute before erasing its drives seems like a good fix for this issue.
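
The timing race described above can be illustrated with a small model. This is only a sketch, not Nailgun or Astute code: the function names, the fixed timestamp, and the idea that status is derived purely from the last agent check-in are assumptions made for illustration. The 2-minute timeout is the value quoted in the description.

```python
from datetime import datetime, timedelta

# Assumption: the 2-minute online/offline timeout quoted in the description.
OFFLINE_TIMEOUT = timedelta(minutes=2)


def is_online(last_checkin, now, timeout=OFFLINE_TIMEOUT):
    """A node counts as online while its last check-in is newer than the timeout."""
    return now - last_checkin < timeout


def status_when_reset_finishes(erase_duration, mark_offline_first=False):
    """Status reported at the moment the reset task becomes 'ready'.

    Hypothetical model: the node stops checking in when the erase starts.
    """
    last_checkin = datetime(2016, 6, 2, 10, 0, 0)  # arbitrary illustrative time
    reset_ready_at = last_checkin + erase_duration
    if mark_offline_first:
        # Proposed fix: flip the flag explicitly before erasing the drives.
        return "offline"
    return "online" if is_online(last_checkin, reset_ready_at) else "offline"


# A fast MBR-only erase finishes inside the timeout window, so the stale
# 'online' flag survives; a slow full erase would have outlived it.
print(status_when_reset_finishes(timedelta(seconds=30)))
print(status_when_reset_finishes(timedelta(minutes=5)))
print(status_when_reset_finishes(timedelta(seconds=30), mark_offline_first=True))
```

In this model the explicit offline marking removes the dependency on how long the erase takes, which is the point of the proposed fix.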

Steps to reproduce:

1. Reset an environment
2. Right after the reset task finishes (e.g. it is shown as done in the GUI), try to SSH to the erased 'online' nodes

Expected result:

There are no online nodes belonging to the reset cluster, or all online nodes are accessible via SSH and in the 'bootstrap' state

Actual result:

Online nodes are not accessible via SSH (they are still rebooting after the erase)
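
The check from step 2 could be scripted roughly as follows. This is a minimal sketch that only tests whether the SSH port accepts a TCP connection; it does not log in. The helper names and the node dictionary keys ('ip', 'online') are assumptions for illustration and should be adapted to whatever the node listing actually returns.

```python
import socket


def ssh_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to the SSH port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unreachable_online_nodes(nodes, probe=ssh_port_open):
    """List IPs of nodes flagged online that do not answer the probe.

    `nodes` is assumed to be a list of dicts with 'ip' and 'online' keys.
    """
    return [n["ip"] for n in nodes if n.get("online") and not probe(n["ip"])]
```

A non-empty result right after the reset task finishes would correspond to the actual result reported here.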

OpenStack Infra (hudson-openstack) wrote on 2016-06-02: Related fix proposed to fuel-qa (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/324267

Maksim Malchuk (mmalchuk) on 2016-06-02

Changed in fuel:
status:	New → Confirmed
tags:	added: area-python area-qa

OpenStack Infra (hudson-openstack) wrote on 2016-06-08: Related fix merged to fuel-qa (master)

Reviewed: https://review.openstack.org/324267
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=2cb874ce00416817c2286b5b921e226713ce294c
Submitter: Jenkins
Branch: master

commit 2cb874ce00416817c2286b5b921e226713ce294c
Author: Artem Panchenko <email address hidden>
Date: Thu Jun 2 10:13:13 2016 +0300

Add sleep after reset to detect offline nodes

    There is a bug #1588193 in product which we have
    to workaround in tests by adding 'sleep(181)'
    after environment reset.

    Change-Id: I01b2fb0899e3d1cc5b3b6a323117d5da171517e1
    Related-bug: #1588193
    Closes-bug: #1586422
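
For comparison with the fixed 'sleep(181)' workaround, a polling wait is sketched below. This is not what the fuel-qa commit does; `wait_until` and its parameters are hypothetical, and the default timeout simply reuses the 181 seconds from the commit message. Note that polling only helps if the polled condition is trustworthy: the node's 'online' flag is exactly what this bug makes stale, so the predicate would need to check something else, such as real SSH reachability.

```python
import time


def wait_until(predicate, timeout=181, interval=5,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True on success and False on timeout. `clock` and `sleep`
    are injectable so the helper can be exercised without real waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```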

OpenStack Infra (hudson-openstack) wrote on 2016-06-08: Related fix proposed to fuel-qa (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/326931

OpenStack Infra (hudson-openstack) wrote on 2016-06-08: Related fix merged to fuel-qa (stable/mitaka)

Reviewed: https://review.openstack.org/326931
Committed: https://git.openstack.org/cgit/openstack/fuel-qa/commit/?id=85713d0e8a8809b8a571fcab45398bb876f9a8b5
Submitter: Jenkins
Branch: stable/mitaka

commit 85713d0e8a8809b8a571fcab45398bb876f9a8b5
Author: Artem Panchenko <email address hidden>
Date: Thu Jun 2 10:13:13 2016 +0300

Add sleep after reset to detect offline nodes

    There is a bug #1588193 in product which we have
    to workaround in tests by adding 'sleep(181)'
    after environment reset.

    Change-Id: I01b2fb0899e3d1cc5b3b6a323117d5da171517e1
    Related-bug: #1588193
    Closes-bug: #1586422
    (cherry picked from commit 2cb874ce00416817c2286b5b921e226713ce294c)

tags:	added: in-stable-mitaka

Georgy Kibardin (gkibardin) on 2016-06-28

Changed in fuel:
assignee:	Fuel Sustaining (fuel-sustaining-team) → Georgy Kibardin (gkibardin)

Georgy Kibardin (gkibardin) wrote on 2016-06-29:

When an environment is reset, the UpdateDnsmasq task is executed asynchronously and is still active when the UI reports that the environment has been reset. As a result, dnsmasq still resolves the nodes by name. At the same time, the nodes have not yet rebooted, so SSH connections succeed.

Changed in fuel:
status:	Confirmed → In Progress

Georgy Kibardin (gkibardin) wrote on 2016-06-29:

BTW, I didn't manage to log in to the nodes: they were already rebooting at the time. After the nodes rebooted, I was able to log in to them again.
So what is the point of the bug: the lag between a node's real status and its dashboard status?

Changed in fuel:
status:	In Progress → Incomplete

Georgy Kibardin (gkibardin) wrote on 2016-06-29:

Actually, nodes are currently set to "discovered" status while they are still rebooting.

Georgy Kibardin (gkibardin) wrote on 2016-08-04:

No activity for a month, so I am marking this as Invalid. Please reopen if necessary.

Changed in fuel:
status:	Incomplete → Invalid
