Fuel for OpenStack

[nailgun] It is not possible to rerun deployment after exception in deployment task

Bug #1436821 reported by Vadim Rovachev on 2015-03-26

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	Fuel for OpenStack	Fix Released	High	Roman Prykhodchenko	Fuel for OpenStack 6.1

Bug Description

VERSION:
  feature_groups:
    - mirantis
  production: "docker"
  release: "6.1"
  api: "1.0"
  build_number: "229"
  build_id: "2015-03-25_21-15-27"
  nailgun_sha: "64a3d380cefde4f341cd39be350c4c9f78b59b7d"
  python-fuelclient_sha: "e5e8389d8d481561a4d7107a99daae07c6ec5177"
  astute_sha: "631f96d5a09cc48bfbddcbf056b946c8a80438f0"
  fuellib_sha: "345a98b34dd0cd450a45d405ac47a6a9fa48b6d8"
  ostf_sha: "a4cf5f218c6aea98105b10c97a4aed8115c15867"
  fuelmain_sha: "320b5f46fc1b2798f9e86ed7df51d3bda1686c10"

Steps to reproduce:

1. Deploy a KVM Fuel master node

2. Start 3 KVM slaves

3. Reboot Supermicro
All KVMs and Supermicro in one admin network
Also KVMs and Supermicro have networks for other

4. Wait until all KVM slaves and the Supermicro are connected to the Fuel master

5. Create env with parameters:
3 KVMs: controller + mongo
networks on KVMs:
eth0: admin
eth1(All networks on this interface have vlan): public, management, storage

Supermicro: Compute + Ceph
eth0: empty
eth1(All networks on this interface have vlan, for admin network eth1 interface has native vlan): admin, public, management, storage

parameters:
virt_type=kvm
config_mode=ha_compact
release_name=centos
net_provider=neutron
net_segment_type=gre
provision_method=cobbler/image
debug=true
nova_quota=true
volumes_lvm=false
volumes_ceph=true
images_ceph=true
ephemeral_ceph=false
objects_ceph=true
osd_pool_size=1
sahara=true
murano=true
ceilometer=true

6. Start deploy.
The progress of the env deployment does not change (it always equals 0%). If the provision method is cobbler, the slave nodes (KVMs and Supermicro) don't reboot.

Tags:

Vadim Rovachev (vrovachev) wrote on 2015-03-26:

fuel-snapshot-2015-03-26_12-16-27.tar.xz (8.2 MiB, application/octet-stream)

Dmitry Pyzhov (dpyzhov) on 2015-03-26

Changed in fuel:
importance:	Undecided → High
milestone:	none → 6.1

Dmitry Pyzhov (dpyzhov) on 2015-03-26

Changed in fuel:
assignee:	nobody → Fuel Python Team (fuel-python)

Aleksey Kasatkin (alekseyk-ru) on 2015-03-26

Changed in fuel:
assignee:	Fuel Python Team (fuel-python) → Aleksey Kasatkin (alekseyk-ru)

Aleksey Kasatkin (alekseyk-ru) wrote on 2015-03-26:

There was an error during the deployment task. It seems something went wrong while assigning public IPs:

2015-03-26 11:44:46.557 INFO [7f63a46a9740] (manager) Assigning IP for node '1' in network 'management'
2015-03-26 11:44:46.557 INFO [7f63a46a9740] (manager) Assigning IP for node '2' in network 'management'
2015-03-26 11:44:46.557 INFO [7f63a46a9740] (manager) Assigning IP for node '3' in network 'management'
2015-03-26 11:44:46.557 INFO [7f63a46a9740] (manager) Assigning IP for node '4' in network 'management'
2015-03-26 11:44:46.584 DEBUG [7f63a46a9740] (task) Updating task: 770d10d1-1174-45fa-b09a-324706e7d135
2015-03-26 11:44:46.585 DEBUG [7f63a46a9740] (task) Updating cluster status: 770d10d1-1174-45fa-b09a-324706e7d135 cluster_id: 1 status: error
2015-03-26 11:44:46.585 DEBUG [7f63a46a9740] (task) Updating cluster (centos-neutron-gre-ha (id=1, mode=ha_compact)) status: from new to error

However, it is not yet clear what caused the error or why the tasks were not deleted from the DB.

Changed in fuel:
status:	New → In Progress

Aleksey Kasatkin (alekseyk-ru) wrote on 2015-03-26:

Oh, the public IP range is "172.16.49.199 - 172.16.49.202". That is not enough for 4 nodes and 2 VIPs.
The problem with tasks is being investigated.

Aleksey Kasatkin (alekseyk-ru) wrote on 2015-03-26:

More precisely, for 3 controllers + 2 VIPs; still not enough.
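
The arithmetic behind this comment can be checked with a short sketch. It only counts the addresses in the quoted range and compares that with the 3 controllers + 2 VIPs mentioned above; the helper name `range_size` is hypothetical and is not part of Nailgun.

```python
import ipaddress


def range_size(first, last):
    """Return the number of addresses in an inclusive IPv4 range."""
    return int(ipaddress.IPv4Address(last)) - int(ipaddress.IPv4Address(first)) + 1


# Public IP range quoted in the comment above.
available = range_size("172.16.49.199", "172.16.49.202")

# Requirement as stated in the comment: 3 controllers + 2 VIPs.
required = 3 + 2

print("available:", available)
print("required:", required)
print("enough:", available >= required)
```

The range holds fewer addresses than the stated requirement, which matches the conclusion in the comment.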

Dmitry Pyzhov (dpyzhov) on 2015-03-27

Changed in fuel:
importance:	High → Medium

Dmitry Pyzhov (dpyzhov) on 2015-03-27

summary:

- Environment deployment does not started
+ Deployment hangs if there are not enought public IPs

Aleksey Kasatkin (alekseyk-ru) wrote on 2015-03-27: Re: Deployment hangs if there are not enought public IPs

Actually, there are two issues here:
1. The IP address count check is wrong (it assumes that 1 IP is required for VIPs instead of 2).
2. Nailgun doesn't allow rerunning the deployment after an exception in a deployment task (in the UI it looks like the deployment hangs).

I created a separate ticket for the first issue: https://bugs.launchpad.net/fuel/+bug/1437354
The current ticket will be kept to track the second one.
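
To illustrate the first issue, here is a minimal sketch of how a pre-deployment check that counts 1 VIP could pass for this environment while the real requirement of 2 VIPs does not fit. The function `has_enough_public_ips` is hypothetical and is not the actual Nailgun check; the numbers come from the comments above (a range of 4 addresses, 3 controllers).

```python
def has_enough_public_ips(available, controllers, vips):
    """Hypothetical check: does the public range cover controllers plus VIPs?"""
    return available >= controllers + vips


AVAILABLE = 4    # addresses in 172.16.49.199 - 172.16.49.202
CONTROLLERS = 3  # controllers in the environment

# Check as described in issue 1: only 1 IP is assumed for VIPs.
print(has_enough_public_ips(AVAILABLE, CONTROLLERS, vips=1))

# Actual need according to the comments: 2 VIPs.
print(has_enough_public_ips(AVAILABLE, CONTROLLERS, vips=2))
```

Under these assumptions the check with 1 VIP succeeds and the one with 2 VIPs fails, so the shortage would only surface later, during IP assignment in the deployment task.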

summary:

- Deployment hangs if there are not enought public IPs
+ [nailgun] It is not possible to rerun deployment after exception in deployment task
Changed in fuel:
status:	In Progress → Confirmed

Dmitry Pyzhov (dpyzhov) on 2015-03-27

tags:

added: module-serialization

Aleksey Kasatkin (alekseyk-ru) on 2015-04-02

Changed in fuel:
assignee:	Aleksey Kasatkin (alekseyk-ru) → Fuel Python Team (fuel-python)
milestone:	6.1 → 7.0

Nastya Urlapova (aurlapova) on 2015-04-19

Changed in fuel:
importance:	Medium → High
milestone:	7.0 → 6.1

Roman Prykhodchenko (romcheg) wrote on 2015-04-23:

Screenshot (782.3 KiB, image/png)

I discovered that even a generic exception without any special conditions causes the described problem.

Changed in fuel:
assignee:	Fuel Python Team (fuel-python) → Roman Prykhodchenko (romcheg)

Kamil Sambor (ksambor) on 2015-04-23

Changed in fuel:
assignee:	Roman Prykhodchenko (romcheg) → Kamil Sambor (ksambor)

Kamil Sambor (ksambor) wrote on 2015-04-24:

This looks like the same problem, with the deployment task status not changing, as in this bug: https://bugs.launchpad.net/fuel/+bug/1446651

Changed in fuel:
assignee:	Kamil Sambor (ksambor) → Roman Prykhodchenko (romcheg)

OpenStack Infra (hudson-openstack) wrote on 2015-04-24: Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/177279

Changed in fuel:
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2015-05-05: Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/177279
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=5c6deb65230126cdc10748daa2355243e5232fe9
Submitter: Jenkins
Branch: master

commit 5c6deb65230126cdc10748daa2355243e5232fe9
Author: Roman Prykhodchenko <email address hidden>
Date: Fri Apr 24 16:13:32 2015 +0200

    Commit transactions after an exception in a task

    Tasks are now executed in mules so when an exception
    happens during an execution of one, task status is updated
    but not commited to the database because mules start separate
    transactions.

    This patch adds a commit clause to the error handler in tasks.

    Closes-bug: #1436821
    Closes-bug: #1446651
    Change-Id: I6c17d1cd88321940f2e23379965675e6b185ab1e

Changed in fuel:
status:	In Progress → Fix Committed
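
The behaviour described in the commit message can be reproduced in miniature. The sketch below is not Nailgun code: it uses Python's standard `sqlite3` module as a stand-in for the real database layer, and the `tasks` table and the functions `run_task`, `task_status` and `demo` are hypothetical. It shows that a status update made in an error handler on a separate connection is lost unless that handler commits.

```python
import os
import sqlite3
import tempfile


def run_task(db_path, task_id, commit_on_error):
    """Simulate a worker that fails and records the error on its own connection."""
    conn = sqlite3.connect(db_path)
    try:
        try:
            raise RuntimeError("simulated failure in deployment task")
        except Exception:
            # Error handler: mark the task as failed.
            conn.execute("UPDATE tasks SET status = 'error' WHERE id = ?", (task_id,))
            if commit_on_error:
                # Without this commit the update stays in the worker's transaction.
                conn.commit()
    finally:
        # Closing without a commit discards the uncommitted update.
        conn.close()


def task_status(db_path, task_id):
    """Read the task status from a fresh connection, as another process would."""
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute("SELECT status FROM tasks WHERE id = ?", (task_id,)).fetchone()
        return row[0]
    finally:
        conn.close()


def demo():
    fd, db_path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(db_path)
        conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT)")
        conn.executemany("INSERT INTO tasks VALUES (?, 'running')", [(1,), (2,)])
        conn.commit()
        conn.close()

        run_task(db_path, 1, commit_on_error=False)  # behaviour before the patch
        run_task(db_path, 2, commit_on_error=True)   # behaviour after the patch
        return task_status(db_path, 1), task_status(db_path, 2)
    finally:
        os.remove(db_path)


if __name__ == "__main__":
    print(demo())
```

In this sketch task 1 still appears as running to every other connection, while task 2 is visible as failed. That is the kind of stale state that makes a deployment look hung in the UI.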

Nikolay Markov (nmarkov) on 2015-05-26

Changed in fuel:
status:	Fix Committed → Fix Released
