[nailgun] It is not possible to rerun deployment after exception in deployment task

Bug #1436821 reported by Vadim Rovachev
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Roman Prykhodchenko
Milestone: 6.1

Bug Description

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "229"
  build_id: "2015-03-25_21-15-27"
  nailgun_sha: "64a3d380cefde4f341cd39be350c4c9f78b59b7d"
  python-fuelclient_sha: "e5e8389d8d481561a4d7107a99daae07c6ec5177"
  astute_sha: "631f96d5a09cc48bfbddcbf056b946c8a80438f0"
  fuellib_sha: "345a98b34dd0cd450a45d405ac47a6a9fa48b6d8"
  ostf_sha: "a4cf5f218c6aea98105b10c97a4aed8115c15867"
  fuelmain_sha: "320b5f46fc1b2798f9e86ed7df51d3bda1686c10"

Steps to reproduce:

1. Deploy a KVM Fuel master node

2. Start 3 KVM slaves

3. Reboot the Supermicro
All KVMs and the Supermicro are in one admin network
Also KVMs and Supermicro have networks for other

4. Wait until all KVM slaves and the Supermicro are connected to the Fuel Master

5. Create env with parameters:
3 KVMs: controller + mongo
networks on KVMs:
eth0: admin
eth1 (all networks on this interface are VLAN-tagged): public, management, storage

Supermicro: Compute + Ceph
eth0: empty
eth1 (all networks on this interface are VLAN-tagged; the admin network uses the native VLAN of eth1): admin, public, management, storage

parameters:
virt_type=kvm
config_mode=ha_compact
release_name=centos
net_provider=neutron
net_segment_type=gre
provision_method=cobbler/image
debug=true
nova_quota=true
volumes_lvm=false
volumes_ceph=true
images_ceph=true
ephemeral_ceph=false
objects_ceph=true
osd_pool_size=1
sahara=true
murano=true
ceilometer=true

6. Start the deployment.
The deployment progress of the env does not change (it always stays at 0%). If the provision method is cobbler, the slave nodes (KVMs and Supermicro) don't reboot.

Vadim Rovachev (vrovachev) wrote :
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
importance: Undecided → High
milestone: none → 6.1
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Aleksey Kasatkin (alekseyk-ru)
Aleksey Kasatkin (alekseyk-ru) wrote :

There was an error during the deployment task. It seems something went wrong with public IP assignment:

2015-03-26 11:44:46.557 INFO [7f63a46a9740] (manager) Assigning IP for node '1' in network 'management'
2015-03-26 11:44:46.557 INFO [7f63a46a9740] (manager) Assigning IP for node '2' in network 'management'
2015-03-26 11:44:46.557 INFO [7f63a46a9740] (manager) Assigning IP for node '3' in network 'management'
2015-03-26 11:44:46.557 INFO [7f63a46a9740] (manager) Assigning IP for node '4' in network 'management'
2015-03-26 11:44:46.584 DEBUG [7f63a46a9740] (task) Updating task: 770d10d1-1174-45fa-b09a-324706e7d135
2015-03-26 11:44:46.585 DEBUG [7f63a46a9740] (task) Updating cluster status: 770d10d1-1174-45fa-b09a-324706e7d135 cluster_id: 1 status: error
2015-03-26 11:44:46.585 DEBUG [7f63a46a9740] (task) Updating cluster (centos-neutron-gre-ha (id=1, mode=ha_compact)) status: from new to error

However, it is not yet clear what caused the error or why the tasks were not deleted from the DB.

Changed in fuel:
status: New → In Progress
Aleksey Kasatkin (alekseyk-ru) wrote :

Oh, the public IP range is "172.16.49.199 - 172.16.49.202". That is not enough for 4 nodes and 2 VIPs.
The problem with the tasks is being investigated.

Aleksey Kasatkin (alekseyk-ru) wrote :

More precisely, the range has to cover 3 controllers + 2 VIPs, and it is still not enough.
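
The arithmetic behind this comment can be checked with a short sketch. It uses only the Python standard library and is not Nailgun's own check; the helper name `count_addresses` is hypothetical, and the counts (3 controllers, 2 VIPs) are taken from the comment above.

```python
import ipaddress


def count_addresses(first, last):
    """Return the number of addresses in an inclusive IPv4 range."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    return end - start + 1


# Public range reported in this bug.
available = count_addresses("172.16.49.199", "172.16.49.202")

# Demand stated in the comment: 3 controllers plus 2 VIPs.
required = 3 + 2

print("available:", available)
print("required:", required)
print("enough:", available >= required)
```

The range holds four addresses while five are needed, so the assignment cannot succeed.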

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
importance: High → Medium
Dmitry Pyzhov (dpyzhov)
summary: - Environment deployment does not started
+ Deployment hangs if there are not enought public IPs
Aleksey Kasatkin (alekseyk-ru) wrote : Re: Deployment hangs if there are not enought public IPs

Actually, there are two issues here:
1. The IP address count check is wrong (it assumes that 1 IP is required for VIPs instead of 2).
2. Nailgun doesn't allow rerunning a deployment after an exception in the deployment task (in the UI it looks as if the deployment hangs).

I created a separate ticket for the first issue: https://bugs.launchpad.net/fuel/+bug/1437354 .
The current ticket will be kept to track the second one.

summary: - Deployment hangs if there are not enought public IPs
+ [nailgun] It is not possible to rerun deployment after exception in
+ deployment task
Changed in fuel:
status: In Progress → Confirmed
Dmitry Pyzhov (dpyzhov)
tags: added: module-serialization
Changed in fuel:
assignee: Aleksey Kasatkin (alekseyk-ru) → Fuel Python Team (fuel-python)
milestone: 6.1 → 7.0
Changed in fuel:
importance: Medium → High
milestone: 7.0 → 6.1
Roman Prykhodchenko (romcheg) wrote :

I discovered that even a generic exception, without any special conditions, causes the described problem.

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Roman Prykhodchenko (romcheg)
Kamil Sambor (ksambor)
Changed in fuel:
assignee: Roman Prykhodchenko (romcheg) → Kamil Sambor (ksambor)
Kamil Sambor (ksambor) wrote :

This looks like the same problem as in this bug, where the deployment task status does not change: https://bugs.launchpad.net/fuel/+bug/1446651

Changed in fuel:
assignee: Kamil Sambor (ksambor) → Roman Prykhodchenko (romcheg)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/177279

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/177279
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=5c6deb65230126cdc10748daa2355243e5232fe9
Submitter: Jenkins
Branch: master

commit 5c6deb65230126cdc10748daa2355243e5232fe9
Author: Roman Prykhodchenko <email address hidden>
Date: Fri Apr 24 16:13:32 2015 +0200

    Commit transactions after an exception in a task

    Tasks are now executed in mules so when an exception
    happens during an execution of one, task status is updated
    but not commited to the database because mules start separate
    transactions.

    This patch adds a commit clause to the error handler in tasks.

    Closes-bug: #1436821
    Closes-bug: #1446651
    Change-Id: I6c17d1cd88321940f2e23379965675e6b185ab1e

Changed in fuel:
status: In Progress → Fix Committed
Nikolay Markov (nmarkov)
Changed in fuel:
status: Fix Committed → Fix Released