5.1.1 check_before_deployment task in error state

Bug #1393809 reported by Tatyanka
This bug affects 1 person
Affects:      Fuel for OpenStack
Status:       Invalid
Importance:   High
Assigned to:  Dima Shulyak
Milestone:    (none)

Bug Description

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "5.1.1"
  api: "1.0"
  build_number: "19"
  build_id: "2014-11-17_21-00-23"
  astute_sha: "fce051a6d013b1c30aa07320d225f9af734545de"
  fuellib_sha: "add3fdd3e2af57b20dbb73a6bc53a9ccc4701c9a"
  ostf_sha: "64cb59c681658a7a55cc2c09d079072a41beb346"
  nailgun_sha: "2fcab95dc43a248ba867065e96ab764ee73882d1"
  fuelmain_sha: "ff22ca819e6eb7c63b6d7978fdd80ef9b84457d9"

Scenario:
  1. Create cluster
  2. Add 1 node with controller role
  3. Add 1 node with compute role
  4. Run provisioning task
  5. Run deployment task
  6. Stop deployment
  7. Add 1 node with cinder role
  8. Re-deploy cluster
  9. Run OSTF

Actual Result:
All steps from 1 to 7 passed without error. The cluster has "stopped" status and the nodes are discovered, but on attempts to re-deploy the cluster the Fuel task check_before_deployment fails with an error and, as a result, the deployment is marked as failed.

But when I re-deployed the same environment manually over the CLI, the cluster was re-deployed successfully, reached "ready" status, and all OSTF tests passed.

So it is not clear to me what happened, since the nodes were online and the networks were fine.
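
For reference, the manual re-deploy over the CLI mentioned above boils down to a single call against the nailgun REST API. The minimal Python sketch below is an illustration only, not part of the original report: the master address, the environment id, and the /changes endpoint path are assumptions about the nailgun API of that release, and authentication is omitted.

    # Hedged illustration: trigger a re-deploy of pending changes via nailgun.
    # The URL, cluster id and endpoint path are assumptions, not from the report.
    import requests

    NAILGUN = "http://10.20.0.2:8000/api"   # hypothetical Fuel master address
    CLUSTER_ID = 1                          # hypothetical environment id

    # Roughly what the CLI "deploy-changes" action is assumed to do under the hood.
    resp = requests.put("{0}/clusters/{1}/changes".format(NAILGUN, CLUSTER_ID))
    resp.raise_for_status()
    print(resp.json())   # the deployment task record returned by nailgun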

Tags: nailgun
Dima Shulyak (dshulyak)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Dima Shulyak (dshulyak)
Dima Shulyak (dshulyak) wrote :

On the first attempt slave-01_controller was offline, or at least it was considered offline by the nailgun database; that is why this error occurred.

from nailgun/app.log

2014-11-18 01:15:48.669 WARNING [7f93d1652740] (manager) Checking prerequisites failed: Nodes "slave-01_controller (id=1, mac=64:a6:70:79:87:68)" are offline. Remove them from environment and try again.

from nailgun/assassind.log

2014-11-18 01:15:21.540 INFO [7ff0f23b0700] (notification) Notification: topic: error message: Node 'slave-01_controller' has gone away

On the second attempt (via the CLI) this node had been brought back by nailgun and the deployment was started correctly.
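
To make the failure mode concrete, here is a minimal sketch of the kind of offline-node prerequisite check described above. It is an illustration only, not the actual nailgun implementation; the Node fields mirror the attributes in the quoted log line, and the second node's MAC is a placeholder.

    # Illustration only: NOT the real nailgun code behind check_before_deployment.
    from collections import namedtuple

    Node = namedtuple("Node", ["id", "name", "mac", "online"])


    class CheckBeforeDeploymentError(Exception):
        """Raised when deployment prerequisites are not satisfied."""


    def check_nodes_are_online(nodes):
        """Fail if any node assigned to the cluster is recorded as offline."""
        offline = [n for n in nodes if not n.online]
        if offline:
            names = ", ".join(
                '"{0} (id={1}, mac={2})"'.format(n.name, n.id, n.mac)
                for n in offline
            )
            raise CheckBeforeDeploymentError(
                "Nodes {0} are offline. Remove them from environment "
                "and try again.".format(names)
            )


    # The controller from the log is still recorded as offline in the database,
    # so the check fails even if the node is already back on the network.
    nodes = [
        Node(1, "slave-01_controller", "64:a6:70:79:87:68", online=False),
        Node(2, "slave-02_compute", "00:00:00:00:00:02", online=True),
    ]
    try:
        check_nodes_are_online(nodes)
    except CheckBeforeDeploymentError as exc:
        print("Checking prerequisites failed: {0}".format(exc))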

Dima Shulyak (dshulyak) wrote :

The last message from slave-01 was: 2014-11-18 01:12:16.096 DEBUG [7f93d1652740

Dima Shulyak (dshulyak) wrote :

So it is just an expected lag in the nailgun-agent workflow, and the error message was appropriate.

Changed in fuel:
status: New → Invalid
Tatyanka (tatyana-leontovich) wrote :

But before starting the deploy we check that nailgun sees these nodes as online, so how is it possible that the API says the nodes are online while the DB still does not have up-to-date info?

Changed in fuel:
status: Invalid → Incomplete
Dima Shulyak (dshulyak) wrote :

You can see from the logs above that the lag between the node being marked offline and the deployment being started is ~27 sec.
I'm quite sure that the node status was not checked within this interval.
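
One way a test could avoid tripping over this lag is to re-check the node status immediately before triggering the deployment, rather than relying on a check made earlier. The sketch below is a hedged illustration only, reusing the hypothetical nailgun address and /nodes endpoint from the earlier sketch; the timeout and poll interval are arbitrary.

    # Hedged sketch: poll nailgun until every cluster node reports online.
    # Endpoint path, master address and timings are assumptions.
    import time

    import requests

    NAILGUN = "http://10.20.0.2:8000/api"   # hypothetical Fuel master address


    def wait_for_nodes_online(cluster_id, timeout=120, interval=5):
        """Block until all cluster nodes report online or the timeout expires."""
        deadline = time.time() + timeout
        while time.time() < deadline:
            nodes = requests.get(
                "{0}/nodes".format(NAILGUN), params={"cluster_id": cluster_id}
            ).json()
            if nodes and all(n.get("online") for n in nodes):
                return True
            time.sleep(interval)
        return False


    # Usage: only start the re-deploy once nailgun itself agrees the nodes are online.
    assert wait_for_nodes_online(cluster_id=1), "nodes still reported offline"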

Tatyanka (tatyana-leontovich) wrote :

Dima, thanks, I got it now. So, moving to Invalid; I will test this behavior under scale. If it is reproduced, I'll be back :)

Changed in fuel:
status: Incomplete → Invalid