reboot_plugin: Cluster failed on deploy because nodes are offline

Bug #1624439 reported by ElenaRossokhina
This bug affects 1 person
Affects              Status    Importance  Assigned to       Milestone
Fuel for OpenStack   Invalid   High        Alexey Stupnikov

Bug Description

Detailed bug description:
Found on CI https://product-ci.infra.mirantis.net/job/9.x.system_test.ubuntu.fuel_plugin_reboot/62/testReport/(root)/deploy_cluster_with_reboot_plugin/
Steps to reproduce:
    1. Revert snapshot with 5 nodes
    2. Download and install fuel-plugin-builder
    3. Create plugin with reboot task
    4. Build plugin and copy it in var directory
    5. Install plugin to fuel
    6. Create cluster and enable plugin
    7. Provision nodes
    8. Collect timestamps from nodes
    9. Deploy cluster (fail here)
    10. Check if timestamps are changed

Alternatively, run the system test deploy_cluster_with_reboot_plugin.
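
Steps 8 and 10 amount to comparing a per-node boot timestamp taken before and after the deployment. A minimal Python sketch of that comparison follows. The function name `rebooted_nodes` and the sample timestamps are hypothetical and are not taken from the actual system test code.

```python
def rebooted_nodes(before, after):
    """Return the sorted names of nodes whose boot timestamp changed.

    `before` and `after` map node name -> boot time (seconds since epoch).
    Nodes missing from `after` are skipped, since they may simply be offline.
    """
    return sorted(
        name
        for name, stamp in before.items()
        if name in after and after[name] != stamp
    )


# Hypothetical sample data: only slave-01 has a new boot timestamp.
before = {"slave-01": 1474000000, "slave-02": 1474000050, "slave-03": 1474000100}
after = {"slave-01": 1474003600, "slave-02": 1474000050, "slave-03": 1474000100}
print(rebooted_nodes(before, after))
```

A check like this also makes the expectation explicit: a node that the reboot task should not touch must keep its original timestamp.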

Expected results:
Deploy is successful
Actual result:
Deployment failed with the following message: 'Nodes "slave-02_compute_ceph-osd (id=1, mac=64:2f:76:17:82:c4),slave-01_controller_ceph-osd (id=2, mac=64:c1:99:2c:34:33),slave-03_compute (id=3, mac=64:46:17:fb:91:a7)" are offline. Remove them from environment and try again.'
The time limit for the reboot may have expired.
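
When triaging failures like this, it can help to pull the node names, IDs, and MAC addresses out of the error text. The sketch below is only an illustration: the regular expression is inferred from the single message quoted above, and `parse_offline_nodes` is a hypothetical helper, not part of Fuel.

```python
import re

# Pattern inferred from the message in this report: "<name> (id=<n>, mac=<mac>)".
NODE_RE = re.compile(
    r"(?P<name>[\w-]+) \(id=(?P<id>\d+), mac=(?P<mac>[0-9a-f:]{17})\)"
)

MESSAGE = (
    'Nodes "slave-02_compute_ceph-osd (id=1, mac=64:2f:76:17:82:c4),'
    "slave-01_controller_ceph-osd (id=2, mac=64:c1:99:2c:34:33),"
    'slave-03_compute (id=3, mac=64:46:17:fb:91:a7)" are offline. '
    "Remove them from environment and try again."
)


def parse_offline_nodes(message):
    """Return a list of (name, id, mac) tuples found in the error message."""
    return [
        (m.group("name"), int(m.group("id")), m.group("mac"))
        for m in NODE_RE.finditer(message)
    ]


for name, node_id, mac in parse_offline_nodes(MESSAGE):
    print(node_id, name, mac)
```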

Tags: swarm-fail
ElenaRossokhina (esolomina) wrote :

It's strange that slave-03_compute (id=3, mac=64:46:17:fb:91:a7) was offline:
in the test we assume that the compute node is not rebooted by this task.

Changed in fuel:
assignee: nobody → Fuel QA Team (fuel-qa)
milestone: none → 9.1
ElenaRossokhina (esolomina) wrote :
Changed in fuel:
assignee: Fuel QA Team (fuel-qa) → Fuel Sustaining (fuel-sustaining-team)
importance: Undecided → High
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
status: New → Confirmed
milestone: 9.1 → 9.2
Dmitry Belyaninov (dbelyaninov) wrote :
Alexey Stupnikov (astupnikov) wrote :

Bug #1617329 had the same symptoms: the slave nodes did not finish rebooting in time. A time-out value was increased to fix that issue.
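
For illustration only, this kind of time-out usually comes down to polling a condition until a deadline passes. The Python sketch below shows that generic pattern; `wait_until` and its parameters are hypothetical names and do not reflect how the fix for bug #1617329 was actually implemented.

```python
import time


def wait_until(predicate, timeout, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True if the condition was met, False if the deadline passed first.
    `clock` and `sleep` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# A condition that is already true returns immediately, regardless of timeout.
print(wait_until(lambda: True, timeout=5))
```

If the time-out is shorter than the slowest node's reboot, the caller sees the node as offline even though it would have come back shortly afterwards, which matches the symptom described here.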

Alexey Stupnikov (astupnikov) wrote :

I have double-checked everything, and it looks like we now have a different situation (the reported problem was solved): tests are no longer failing because of offline nodes. The failures are now caused by a 500 error:

2016-11-24 22:29:09.533 ERROR [7f3540682880] (logger) Response code '500 Internal Server Error' for POST /api/clusters from 10.109.0.1:46530

This bug is invalid; a new one should be opened to report the new issue.

Changed in fuel:
status: Confirmed → Invalid
assignee: Fuel Sustaining (fuel-sustaining-team) → Alexey Stupnikov (astupnikov)
Alexey Stupnikov (astupnikov) wrote :

Opened new bug #1644794
