Fail to immediately stop provisioning

Bug #1322573 reported by Kate Pimenova
This bug affects 3 people
Affects             Status         Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Released   High        Vladimir Sharshov
5.0.x               Fix Committed  High        Vladimir Sharshov

Bug Description

Reproduced on any recent ISO.
The bug is intermittent: it does not reproduce every time.

Steps to reproduce:

1. Create a new cluster with any configuration; add a controller and a couple more nodes.
2. Click the "Deploy changes" button, then VERY QUICKLY click "stop deployment".
3. Wait until the stop deployment task succeeds.

Expected: the nodes reboot, and after some time they return to the "pending addition" status. Then we can redeploy.

Observed:
- I didn't observe a node reboot; in fact, I didn't observe any activity on the slave nodes. They are all in active status, not offline. I can log in to every node using the console.
- On the UI, all nodes have "offline" status. Redeploy is not possible. I can delete the cluster and try to create a new one, but I no longer see these nodes.
- In the messages list on the UI there is an alert: " Fuel couldn't reach these nodes during deployment stopping: 'Untitled (b3:b5)', 'Untitled (9c:05)', 'Untitled (87:d8)', 'Untitled (54:e8)', 'Untitled (4c:0f)'"
  This alert appears before the message "Deployment of environment 'test' was successfully stopped".
- To bring the nodes back, I need to reset the VMs.

Revision history for this message
Kate Pimenova (kpimenova) wrote :
description: updated
Changed in fuel:
milestone: none → 5.1
assignee: nobody → Fuel Python Team (fuel-python)
importance: Undecided → High
Dima Shulyak (dshulyak)
Changed in fuel:
status: New → Confirmed
Dima Shulyak (dshulyak) wrote :

So the nodes were erased and /var/run/nodiscover was created,
but the reboot failed. I think this is fixed by using the IP address instead of the host name;
that fix was added in https://review.openstack.org/#/c/96116/

I am going to test it on a newer ISO and close this as a duplicate.

Dima Shulyak (dshulyak) wrote :

nailgun sends a message with an IP from the static pool, 10.108.10.3:

{
    "args": {
        "engine": {
            "url": "http://10.108.10.2:80/cobbler_api",
            "username": "cobbler",
            "password": "cobbler",
            "master_ip": "10.108.10.2"
        },
        "task_uuid": "a8be0271-936c-4c40-bf39-6fd7097f44ec",
        "stop_task_uuid": "f76909ad-8f32-436c-8461-ed3104960ed2",
        "nodes": [
            {
                "admin_ip": "10.108.10.3",
                "uid": "1",
                "roles": [
                    "controller"
                ],
                "slave_name": "node-1"
            }
        ]
    },
    "respond_to": "stop_deployment_resp",
    "method": "stop_deploy_task",
    "api_version": "1.0"
}

but in fact the node is assigned an IP from the DHCP pool, which in this case is 10.108.10.217
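
To make the mismatch concrete, here is a minimal Python sketch that compares the admin_ip values in a stop_deploy_task message with the addresses actually seen on the nodes. The helper name find_ip_mismatches and the observed_ips mapping are hypothetical and exist only for this illustration; they are not part of nailgun or astute. The message is trimmed to the fields the check needs.

```python
import json


def find_ip_mismatches(message, observed_ips):
    """Return (uid, admin_ip, observed_ip) for every node whose admin_ip
    in the message differs from the address observed on the node.

    observed_ips is a hypothetical mapping of node uid -> real IP,
    collected by hand (for example from the node console).
    """
    mismatches = []
    for node in message["args"]["nodes"]:
        observed = observed_ips.get(node["uid"])
        if observed is not None and observed != node["admin_ip"]:
            mismatches.append((node["uid"], node["admin_ip"], observed))
    return mismatches


# Trimmed copy of the message quoted in this comment.
message = json.loads("""
{
    "args": {
        "nodes": [
            {"admin_ip": "10.108.10.3", "uid": "1",
             "roles": ["controller"], "slave_name": "node-1"}
        ]
    },
    "method": "stop_deploy_task"
}
""")

# Address the node really had, taken from the DHCP pool as reported here.
observed_ips = {"1": "10.108.10.217"}

for uid, sent, actual in find_ip_mismatches(message, observed_ips):
    print("node %s: message says %s, node has %s" % (uid, sent, actual))
```

For the data in this report the check flags node 1, since the message carries 10.108.10.3 while the node holds 10.108.10.217.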

Dima Shulyak (dshulyak) wrote :

Adding logs; I have no trouble reproducing it.

As I understand it, astute receives the provision task and creates /var/run/nodiscover on the target node,
but after stop_provision it cannot do anything over ssh, because the IP address it received from nailgun is wrong.
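
One quick way to confirm this from the master node is to test whether the ssh port answers on each candidate address. The sketch below is only a diagnostic aid under assumptions: the function name ssh_port_open is hypothetical, port 22 is assumed to be the ssh port, and this is not how astute itself connects to nodes.

```python
import socket


def ssh_port_open(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only checks that something accepts connections on the port;
    it does not authenticate or run any command.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False
```

For the node in this report one would compare ssh_port_open("10.108.10.3") (the address nailgun sent) with ssh_port_open("10.108.10.217") (the address the node actually has).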

summary: - Sometimes nodes stays offline after stop deployment
+ Fail to immediately stop provisioning
Changed in fuel:
status: Confirmed → In Progress
assignee: Fuel Python Team (fuel-python) → Vladimir Sharshov (vsharshov)
Changed in fuel:
assignee: Vladimir Sharshov (vsharshov) → Artem Panchenko (apanchenko-8)
Changed in fuel:
assignee: Artem Panchenko (apanchenko-8) → Vladimir Sharshov (vsharshov)
Changed in fuel:
assignee: Vladimir Sharshov (vsharshov) → Artem Panchenko (apanchenko-8)
Changed in fuel:
assignee: Artem Panchenko (apanchenko-8) → Vladimir Sharshov (vsharshov)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/96554
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=5fa18e8e0873dd2455a984014b7b29a2382cfd0b
Submitter: Jenkins
Branch: master

commit 5fa18e8e0873dd2455a984014b7b29a2382cfd0b
Author: Vladimir Sharshov <email address hidden>
Date: Thu May 29 22:49:01 2014 +0400

    Erase provisioned node when cancel provisioning

    * always erase node in boostrap state (failsafe optimization);
    * do erase using shell script nodes in provisioned/boostrap state;
    * for provisioned/boostrap state use mcollective agent.

    Change-Id: I2a3df52920f57f9c66e237de0d0d48a814ebf409
    Related-Bug: #1316583
    Closes-Bug: #1322573

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (stable/5.0)

Fix proposed to branch: stable/5.0
Review: https://review.openstack.org/105260

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (stable/5.0)

Reviewed: https://review.openstack.org/105260
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=a4edb51661f50c66e247e0b8d00f2d01e0658fe6
Submitter: Jenkins
Branch: stable/5.0

commit a4edb51661f50c66e247e0b8d00f2d01e0658fe6
Author: Vladimir Sharshov <email address hidden>
Date: Thu May 29 22:49:01 2014 +0400

    Erase provisioned node when cancel provisioning

    * always erase node in boostrap state (failsafe optimization);
    * do erase using shell script nodes in provisioned/boostrap state;
    * for provisioned/boostrap state use mcollective agent.

    Change-Id: I2a3df52920f57f9c66e237de0d0d48a814ebf409
    Related-Bug: #1316583
    Closes-Bug: #1322573

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/5.1.x
Dmitry Tyzhnenko (dtyzhnenko) wrote :

Verified on 5.1 iso 11

api: '1.0'
astute_sha: f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13
auth_required: true
build_id: 2014-09-17_21-40-34
build_number: '11'
feature_groups:
- mirantis
fuellib_sha: d9b16846e54f76c8ebe7764d2b5b8231d6b25079
fuelmain_sha: 8ef433e939425eabd1034c0b70e90bdf888b69fd
nailgun_sha: eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d
ostf_sha: 64cb59c681658a7a55cc2c09d079072a41beb346
production: docker
release: '5.1'
release_versions:
  2014.1.1-5.1:
    VERSION:
      api: '1.0'
      astute_sha: f5fbd89d1e0e1f22ef9ab2af26da5ffbfbf24b13
      build_id: 2014-09-17_21-40-34
      build_number: '11'
      feature_groups:
      - mirantis
      fuellib_sha: d9b16846e54f76c8ebe7764d2b5b8231d6b25079
      fuelmain_sha: 8ef433e939425eabd1034c0b70e90bdf888b69fd
      nailgun_sha: eb8f2b358ea4bb7eb0b2a0075e7ad3d3a905db0d
      ostf_sha: 64cb59c681658a7a55cc2c09d079072a41beb346
      production: docker
      release: '5.1'

Changed in fuel:
status: Fix Committed → Fix Released