"Pending Addition" node switched to "Error" state after deployment starts for part of nodes

Bug #1561994 reported by Grigory Mikhailov
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Fix Committed | High | Bulat Gaifullin |
Mitaka | Fix Released | High | Bulat Gaifullin |
Newton | Fix Committed | High | Bulat Gaifullin |

Bug Description

"Pending Addition" node switched to "Error" state after deployment starts for part of nodes

Steps to reproduce on build 101:

01. Create cluster with default settings with 2 "Controller" and 1 "Compute" nodes.
-> All in "Pending Addition" state.
02. Select "Provisioning only" deployment mode.
03. Provision 1 "Controller" and 1 "Compute" nodes.
-> 1 "Controller" in "Pending Addition" state. 2 nodes in "Ubuntu is installed" state.
04. Select "Deployment only" deployment mode.
05. Deploy both provisioned 1 "Controller" and 1 "Compute" nodes.
-> 1 "Controller" state switched from "Pending Addition" to "Error" state, error_type="provision", after deploy is started.
06. Wait until deployment finishes.
-> 1 "Controller" still in "Error" state. 2 nodes in "Ready" state.

Version list in attachment.
Diagnostic Snapshot in attachment.
Screenshots in attachments.

Grigory Mikhailov (gmikhailov) wrote :
Grigory Mikhailov (gmikhailov) wrote :
Grigory Mikhailov (gmikhailov) wrote :
Grigory Mikhailov (gmikhailov) wrote :
Grigory Mikhailov (gmikhailov) wrote :
Grigory Mikhailov (gmikhailov) wrote :

Also reproduced on build 111:

01. Create cluster with default settings with 2 "Controller" and 1 "Compute" nodes.
-> All in "Pending Addition" state.
02. Select "Provisioning only" deployment mode.
03. Provision 1 "Controller" and 1 "Compute" nodes.
-> 1 "Controller" in "Pending Addition" state. 2 nodes in "Ubuntu is installed" state.
04. Select "Deployment only" deployment mode.
05. Deploy both provisioned 1 "Controller" and 1 "Compute" nodes.
-> 1 "Controller" and 1 "Compute" states are switched from "Ubuntu is installed" to "Error" state, error_type="deploy", after deploy is started.
-> 1 "Controller" still in "Pending Addition" state.

Version list in attachment.
Diagnostic Snapshot in attachment.
Screenshots in attachments.

description: updated
Grigory Mikhailov (gmikhailov) wrote :
Grigory Mikhailov (gmikhailov) wrote :
Grigory Mikhailov (gmikhailov) wrote :
Grigory Mikhailov (gmikhailov) wrote :
Dmitry Klenov (dklenov)
Changed in fuel:
milestone: none → 9.0
assignee: nobody → Fuel Library Team (fuel-library)
importance: Undecided → High
status: New → Confirmed
tags: added: area-library
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Kyrylo Galanov (kgalanov)
Bug Checker Bot (bug-checker) wrote : Autochecker

(This check performed automatically)
Please, make sure that bug description contains the following sections filled in with the appropriate data related to the bug you are describing:

actual result

expected result

For more detailed information on the contents of each of the listed sections see https://wiki.openstack.org/wiki/Fuel/How_to_contribute#Here_is_how_you_file_a_bug

tags: added: need-info
Changed in fuel:
status: Confirmed → In Progress
Kyrylo Galanov (kgalanov) wrote :

It is not a library bug. Nailgun tries to deploy to a node (controller) which was not selected for deployment.

Pay attention to node-1: http://paste.openstack.org/show/492254/

Changed in fuel:
status: In Progress → Confirmed
assignee: Kyrylo Galanov (kgalanov) → Fuel Python Team (fuel-python)
Dmitry Pyzhov (dpyzhov)
tags: added: area-python
removed: area-library
Alexey Shtokolov (ashtokolov) wrote :

It looks like we send all existing nodes to Astute instead of only the subset that should be deployed.
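
The suspected behavior can be illustrated with a minimal sketch. This is not Nailgun's actual code: the function name `select_nodes_for_deployment` and the dict-based node records are hypothetical, and the sample roles are taken from the deployment payload quoted later in this report.

```python
def select_nodes_for_deployment(cluster_nodes, requested_uids):
    """Return only the cluster nodes whose uid was requested for deployment.

    Hypothetical helper: Nailgun's real selection logic is not shown in this report.
    """
    requested = {str(uid) for uid in requested_uids}
    return [node for node in cluster_nodes if str(node["uid"]) in requested]


# All nodes that exist in the cluster.
cluster_nodes = [
    {"uid": "1", "role": "primary-controller"},
    {"uid": "2", "role": "controller"},
    {"uid": "3", "role": "compute"},
]

# Only nodes 1 and 3 were selected for deployment.
selected = select_nodes_for_deployment(cluster_nodes, [1, 3])
selected_uids = [node["uid"] for node in selected]
print(selected_uids)
```

Under this assumption, node 2 should never reach the orchestrator when only nodes 1 and 3 are requested.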

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Bulat Gaifullin (bgaifullin)
Grigory Mikhailov (gmikhailov) wrote :

Two new scenarios from build 129 with the same error.

Scenario 129_6:
Steps:
01. Create cluster with default settings with 2 "Controller" and 1 "Compute" nodes.
02. Provision 1 "Controller" and 1 "Compute" nodes.
03. Deploy 1 "Controller" node.
Expected results:
01. All 3 nodes in "Pending Addition" state.
02. 1 "Controller" and 1 "Compute" nodes in "Ubuntu is installed" state. 1 "Controller" still in "Pending Addition" state.
03. 1 "Controller" node in "Ready" state. 1 "Controller" node still in "Pending Addition" state. 1 "Compute" node still in "Ubuntu is installed" state.
Results:
03. Deploying "Controller" node state switched to "Error" state. Error message is observed.

Scenario 129_7:
Steps:
01. Create cluster with default settings with 2 "Controller" and 1 "Compute" nodes.
02. Provision 1 "Controller" and 1 "Compute" nodes.
03. Deploy 1 "Compute" node.
Expected results:
01. All 3 nodes in "Pending Addition" state.
02. 1 "Controller" and 1 "Compute" nodes in "Ubuntu is installed" state. 1 "Controller" still in "Pending Addition" state.
03. 1 "Compute" node in "Ready" state. 1 "Controller" node still in "Pending Addition" state. 1 "Controller" node still in "Ubuntu is installed" state.
Results:
03. Deploying "Compute" node state switched to "Error" state. Error message is observed.

Version list, diagnostic snapshots and screenshots in attachments.
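
The expected results of scenarios 129_6 and 129_7 follow one rule: only the nodes selected for deployment change state. A rough sketch of that rule, using hypothetical node labels (`controller-a`, `controller-b`, `compute`) and the state names shown in the UI:

```python
def expected_states(states, deployed):
    """Nodes selected for deployment become "Ready"; all others keep their state."""
    return {
        name: ("Ready" if name in deployed else state)
        for name, state in states.items()
    }


# State after step 02: one controller and the compute node are provisioned.
after_provisioning = {
    "controller-a": "Ubuntu is installed",
    "controller-b": "Pending Addition",
    "compute": "Ubuntu is installed",
}

scenario_129_6 = expected_states(after_provisioning, {"controller-a"})
scenario_129_7 = expected_states(after_provisioning, {"compute"})
print(scenario_129_6)
print(scenario_129_7)
```

In both observed results the deployed node ended up in "Error" instead of "Ready".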

Grigory Mikhailov (gmikhailov) wrote :
Grigory Mikhailov (gmikhailov) wrote :
Grigory Mikhailov (gmikhailov) wrote :
Dmitry Pyzhov (dpyzhov)
tags: added: team-enhancements
Alexandr Kostrikov (akostrikov-mirantis) wrote :

Provisioning:
Called with two nodes, executed on two nodes.
[pid: 24087|app: 0|req: 118/268] 10.109.15.1 () {54 vars in 996 bytes} [Fri Apr 1 11:35:02 2016] PUT /api/clusters/1/provision?nodes=1,3 => generated 159 bytes in 395 msecs (HTTP/1.1 202) 4 headers in 191 bytes (2 switches on core 0)
      "nodes"=>
       [{"profile"=>"ubuntu_1404_x86_64",
         "name_servers_search"=>"\"test.domain.local\"",
         "uid"=>"1",
        {"profile"=>"ubuntu_1404_x86_64",
         "name_servers_search"=>"\"test.domain.local\"",
         "uid"=>"3",

Deployment:
Called with two nodes, executed on three nodes.
[pid: 24085|app: 0|req: 110/1005] 10.109.15.1 () {54 vars in 990 bytes} [Fri Apr 1 11:55:53 2016] PUT /api/clusters/1/deploy?nodes=1,3 => generated 160 bytes in 1998 msecs (HTTP/1.1 202) 4 headers in 191 bytes (2 switches on core 0)
[29689] 'task_deploy' method called with data:
{"args"=>
  {"task_uuid"=>"2fb5011d-195a-4e79-b41c-69ce0c682b0d",
   "deployment_info"=>

      "nodes"=>
       [{"user_node_name"=>"Untitled (f6:5b)",
         "uid"=>"1",
         "public_address"=>"10.109.18.4",
         "internal_netmask"=>"255.255.255.0",
         "fqdn"=>"node-1.test.domain.local",
         "role"=>"primary-controller",
         "public_netmask"=>"255.255.255.0",
         "internal_address"=>"10.109.16.5",
         "storage_address"=>"10.109.17.3",
         "swift_zone"=>"1",
         "storage_netmask"=>"255.255.255.0",
         "name"=>"node-1"},
        {"user_node_name"=>"Untitled (d1:f6)",
         "uid"=>"2",
         "public_address"=>"10.109.18.5",
         "internal_netmask"=>"255.255.255.0",
         "fqdn"=>"node-2.test.domain.local",
         "role"=>"controller",
         "public_netmask"=>"255.255.255.0",
         "internal_address"=>"10.109.16.6",
         "storage_address"=>"10.109.17.4",
         "swift_zone"=>"2",
         "storage_netmask"=>"255.255.255.0",
         "name"=>"node-2"},
        {"user_node_name"=>"Untitled (d5:20)",
         "uid"=>"3",
         "internal_netmask"=>"255.255.255.0",
         "fqdn"=>"node-3.test.domain.local",
         "role"=>"compute",
         "internal_address"=>"10.109.16.4",
         "storage_address"=>"10.109.17.2",
         "swift_zone"=>"3",
         "storage_netmask"=>"255.255.255.0",
         "name"=>"node-3"}],
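
The mismatch in the logs above can be checked mechanically. The following sketch compares the `nodes` query parameter of the request with the uids found in the orchestrator payload. The helper names are hypothetical, and the uid list is copied by hand from the log excerpt rather than parsed from it.

```python
from urllib.parse import parse_qs, urlsplit


def requested_uids(request_target):
    """Extract node uids from a request target such as '/api/clusters/1/deploy?nodes=1,3'."""
    query = parse_qs(urlsplit(request_target).query)
    return {
        uid
        for value in query.get("nodes", [])
        for uid in value.split(",")
        if uid
    }


def unexpected_uids(request_target, payload_uids):
    """Return uids present in the payload that were not requested."""
    return sorted(set(payload_uids) - requested_uids(request_target))


# Request target and payload uids taken from the deployment log excerpt.
extra = unexpected_uids("/api/clusters/1/deploy?nodes=1,3", ["1", "2", "3"])
print(extra)
```

For the deployment call the check reports node 2, the controller that was never provisioned. For the provisioning call, which lists only uids 1 and 3, it reports nothing.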

Changed in fuel:
status: Confirmed → In Progress
Changed in fuel:
assignee: Bulat Gaifullin (bgaifullin) → Vladimir Kuklin (vkuklin)
Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Bulat Gaifullin (bgaifullin)
Changed in fuel:
assignee: Bulat Gaifullin (bgaifullin) → Vladimir Kuklin (vkuklin)
Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Bulat Gaifullin (bgaifullin)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/302162
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=fe73beacc60c8dd5b1438ad0f8bde13ca9a3b3ae
Submitter: Jenkins
Branch: master

commit fe73beacc60c8dd5b1438ad0f8bde13ca9a3b3ae
Author: Bulat Gaifullin <email address hidden>
Date: Mon Apr 4 18:02:48 2016 +0300

    Reworked ApplyChanges for LCM

    The following legacy tasks were reworked to use ClusterTransaction:
     - OpenstackConfigTaskManager
     - SpawnVMsTaskManager

    Change-Id: I4a6f5f37161e4290050ec4926cf029cd7af566e4
    Closes-Bug: 1565885
    Closes-Bug: 1561994
    Closes-Bug: 1565760

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/305962

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/305962
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=43ec837228fff8e1da65a400258b2f6148a27bb2
Submitter: Jenkins
Branch: stable/mitaka

commit 43ec837228fff8e1da65a400258b2f6148a27bb2
Author: Bulat Gaifullin <email address hidden>
Date: Mon Apr 4 18:02:48 2016 +0300

    Reworked ApplyChanges for LCM

    The following legacy tasks were reworked to use ClusterTransaction:
     - OpenstackConfigTaskManager
     - SpawnVMsTaskManager

    Change-Id: I4a6f5f37161e4290050ec4926cf029cd7af566e4
    (cherry picked from commit I4a6f5f37161e4290050ec4926cf029cd7af566e4)
    Closes-Bug: 1565885
    Closes-Bug: 1561994
    Closes-Bug: 1565760

tags: added: on-verification
ElenaRossokhina (esolomina) wrote :

Verified using all initial and comment #14 scenarios
cat /etc/fuel_build_id:
 417
cat /etc/fuel_build_number:
 417
cat /etc/fuel_release:
 9.0
cat /etc/fuel_openstack_version:
 mitaka-9.0
rpm -qa | egrep 'fuel|astute|network-checker|nailgun|packetary|shotgun':
 fuel-release-9.0.0-1.mos6347.noarch
 fuel-bootstrap-cli-9.0.0-1.mos284.noarch
 fuel-migrate-9.0.0-1.mos8398.noarch
 fuel-mirror-9.0.0-1.mos137.noarch
 fuel-notify-9.0.0-1.mos8398.noarch
 nailgun-mcagents-9.0.0-1.mos746.noarch
 python-fuelclient-9.0.0-1.mos316.noarch
 fuelmenu-9.0.0-1.mos270.noarch
 fuel-9.0.0-1.mos6347.noarch
 fuel-utils-9.0.0-1.mos8398.noarch
 fuel-setup-9.0.0-1.mos6347.noarch
 fuel-library9.0-9.0.0-1.mos8398.noarch
 shotgun-9.0.0-1.mos90.noarch
 fuel-agent-9.0.0-1.mos284.noarch
 fuel-ui-9.0.0-1.mos2706.noarch
 fuel-ostf-9.0.0-1.mos934.noarch
 fuel-misc-9.0.0-1.mos8398.noarch
 python-packetary-9.0.0-1.mos137.noarch
 fuel-nailgun-9.0.0-1.mos8709.noarch
 rubygem-astute-9.0.0-1.mos746.noarch
 fuel-provisioning-scripts-9.0.0-1.mos8709.noarch
 network-checker-9.0.0-1.mos72.x86_64
 fuel-openstack-metadata-9.0.0-1.mos8709.noarch

tags: removed: on-verification