5.1.1 check_before_deployment task in error state

Bug #1393809 reported by Tatyanka
This bug affects 1 person
Affects:      Fuel for OpenStack
Status:       Invalid
Importance:   High
Assigned to:  Dima Shulyak
Milestone:    (none)

Bug Description

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "5.1.1"
  api: "1.0"
  build_number: "19"
  build_id: "2014-11-17_21-00-23"
  astute_sha: "fce051a6d013b1c30aa07320d225f9af734545de"
  fuellib_sha: "add3fdd3e2af57b20dbb73a6bc53a9ccc4701c9a"
  ostf_sha: "64cb59c681658a7a55cc2c09d079072a41beb346"
  nailgun_sha: "2fcab95dc43a248ba867065e96ab764ee73882d1"
  fuelmain_sha: "ff22ca819e6eb7c63b6d7978fdd80ef9b84457d9"

Scenario:
  1. Create cluster
  2. Add 1 node with controller role
  3. Add 1 node with compute role
  4. Run provisioning task
  5. Run deployment task
  6. Stop deployment
  7. Add 1 node with cinder role
  8. Re-deploy cluster
  9. Run OSTF

Actual Result:
All steps from 1 to 7 passed without error. The cluster has "stopped" status and the nodes are discovered, but on attempts to re-deploy the cluster the Fuel task check_before_deployment fails with an error and, as a result, the deployment is marked as failed.

But when I re-deployed the same environment manually over the CLI, the cluster was re-deployed successfully, reached "ready" status, and all OSTF tests passed.

So it is not clear to me what happened, since the nodes were online and the networks were fine.
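
For reference, the manual re-deploy over the CLI mentioned above boils down to a single call against the nailgun REST API. The minimal Python sketch below is an illustration only, not part of the original report: the master address, the environment id, and the /changes endpoint path are assumptions about the nailgun API of that release, and authentication is omitted.

    # Hedged illustration: trigger a re-deploy of pending changes via nailgun.
    # The URL, cluster id and endpoint path are assumptions, not from the report.
    import requests

    NAILGUN = "http://10.20.0.2:8000/api"   # hypothetical Fuel master address
    CLUSTER_ID = 1                          # hypothetical environment id

    # Roughly what the CLI "deploy-changes" action is assumed to do under the hood.
    resp = requests.put("{0}/clusters/{1}/changes".format(NAILGUN, CLUSTER_ID))
    resp.raise_for_status()
    print(resp.json())   # the deployment task record returned by nailgun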

Tags: nailgun
Dima Shulyak (dshulyak)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Dima Shulyak (dshulyak)
Dima Shulyak (dshulyak) wrote :

On the first attempt slave-01_controller was offline, or at least it was considered offline by the nailgun database; that is why this error occurred.

from nailgun/app.log

2014-11-18 01:15:48.669 WARNING [7f93d1652740] (manager) Checking prerequisites failed: Nodes "slave-01_controller (id=1, mac=64:a6:70:79:87:68)" are offline. Remove them from environment and try again.

from nailgun/assassind.log

2014-11-18 01:15:21.540 INFO [7ff0f23b0700] (notification) Notification: topic: error message: Node 'slave-01_controller' has gone away

On the second attempt (via the CLI) this node had been brought back by nailgun and the deployment was started correctly.
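
To make the failure mode concrete, here is a minimal sketch of the kind of offline-node prerequisite check described above. It is an illustration only, not the actual nailgun implementation; the Node fields mirror the attributes in the quoted log line, and the second node's MAC is a placeholder.

    # Illustration only: NOT the real nailgun code behind check_before_deployment.
    from collections import namedtuple

    Node = namedtuple("Node", ["id", "name", "mac", "online"])


    class CheckBeforeDeploymentError(Exception):
        """Raised when deployment prerequisites are not satisfied."""


    def check_nodes_are_online(nodes):
        """Fail if any node assigned to the cluster is recorded as offline."""
        offline = [n for n in nodes if not n.online]
        if offline:
            names = ", ".join(
                '"{0} (id={1}, mac={2})"'.format(n.name, n.id, n.mac)
                for n in offline
            )
            raise CheckBeforeDeploymentError(
                "Nodes {0} are offline. Remove them from environment "
                "and try again.".format(names)
            )


    # The controller from the log is still recorded as offline in the database,
    # so the check fails even if the node is already back on the network.
    nodes = [
        Node(1, "slave-01_controller", "64:a6:70:79:87:68", online=False),
        Node(2, "slave-02_compute", "00:00:00:00:00:02", online=True),
    ]
    try:
        check_nodes_are_online(nodes)
    except CheckBeforeDeploymentError as exc:
        print("Checking prerequisites failed: {0}".format(exc))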

Dima Shulyak (dshulyak) wrote :

The last message from slave-01 was: 2014-11-18 01:12:16.096 DEBUG [7f93d1652740

Dima Shulyak (dshulyak) wrote :

So it is just an expected lag in the nailgun-agent workflow, and the error message was appropriate.

Changed in fuel:
status: New → Invalid
Tatyanka (tatyana-leontovich) wrote :

But before starting the deploy we check that nailgun sees these nodes as online, so how is it possible that the API says the nodes are online while the DB still does not have up-to-date info?

Changed in fuel:
status: Invalid → Incomplete
Dima Shulyak (dshulyak) wrote :

You can see from the logs above that the lag between the node being marked offline and the deployment being started is ~27 sec.
I'm quite sure that the node status was not checked within this interval.
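
One way a test could avoid tripping over this lag is to re-check the node status immediately before triggering the deployment, rather than relying on a check made earlier. The sketch below is a hedged illustration only, reusing the hypothetical nailgun address and /nodes endpoint from the earlier sketch; the timeout and poll interval are arbitrary.

    # Hedged sketch: poll nailgun until every cluster node reports online.
    # Endpoint path, master address and timings are assumptions.
    import time

    import requests

    NAILGUN = "http://10.20.0.2:8000/api"   # hypothetical Fuel master address


    def wait_for_nodes_online(cluster_id, timeout=120, interval=5):
        """Block until all cluster nodes report online or the timeout expires."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            nodes = requests.get(
                "{0}/nodes".format(NAILGUN), params={"cluster_id": cluster_id}
            ).json()
            if nodes and all(n.get("online") for n in nodes):
                return True
            time.sleep(interval)
        return False


    # Usage: only start the re-deploy once nailgun itself agrees the nodes are online.
    assert wait_for_nodes_online(cluster_id=1), "nodes still reported offline"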

Tatyanka (tatyana-leontovich) wrote :

Dima, thanks, I got it now. So, moving to Invalid; I will test this behavior under scale. If it is reproduced, I'll be back :)

Changed in fuel:
status: Incomplete → Invalid