Nailgun failed the successfully completed deploy after timeout in 2 hours

Bug #1392462 reported by Dennis Dmitriev
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Fix Released
Importance: Critical
Assigned to: Artem Panchenko

Bug Description

Deployment of the cluster completed successfully.
Astute reported that to Nailgun.

But Nailgun failed the task after a 2-hour timeout:

=== /var/log/docker-logs/nailgun/receiverd.log:
http://paste.openstack.org/show/132924/

[root@nailgun nailgun]# fuel node
id | status | name                | cluster | ip         | mac               | roles      | pending_roles | online | group_id
---|--------|---------------------|---------|------------|-------------------|------------|---------------|--------|---------
3  | ready  | slave-02_compute    | 1       | 10.108.0.4 | 64:45:59:7b:08:8e | compute    |               | True   | 1
2  | ready  | slave-01_controller | 1       | 10.108.0.3 | 64:19:f0:69:56:9c | controller |               | True   | 1
1  | ready  | slave-03_cinder     | 1       | 10.108.0.5 | 64:4f:23:ee:2f:9f | cinder     |               | True   | 1

Revision history for this message
Dennis Dmitriev (ddmitriev) wrote :

Reproduced also on bvt_2: http://jenkins-product.srt.mirantis.net:8080/job/6.0.ubuntu.bvt_2/85/

================= /var/log/docker-logs/nailgun/receiverd.log:
2014-11-13 18:11:31.150 INFO [7f1fe7aad700] (notification) Notification: topic: done message: Deployment of environment 'TestHaVLAN' is done. Access the OpenStack dashboard (Horizon) at http://10.108.1.2/
2014-11-13 18:11:31.150 DEBUG [7f1fe7aad700] (task) Updating task: 94b5e7b1-39f4-4b13-83e2-21767eb6705f
2014-11-13 18:11:31.152 DEBUG [7f1fe7aad700] (task) Updating cluster status: 94b5e7b1-39f4-4b13-83e2-21767eb6705f cluster_id: 1 status: ready
2014-11-13 18:11:31.154 DEBUG [7f1fe7aad700] (task) Updating parent task: 4a324528-2fd5-4071-851e-735b6a1f224f.
2014-11-13 19:44:47.918 INFO [7f1fe7aad700] (receiver) RPC method dump_environment_resp received: {"status": "ready", "progress": 100, "task_uuid": "f6078c86-c49f-4169-8b2b-c8b0910dfaf0", "msg": "/var/www/nailgun/dump/fuel-snapshot-2014-11-13_19-43-24.tgz"}

Revision history for this message
Dima Shulyak (dshulyak) wrote :

interesting status of tasks:

nailgun=# select uuid, name, status from tasks;
                 uuid                 |      name      | status
--------------------------------------+----------------+---------
 c581cffe-7ec9-4871-82b3-71891f97ed19 | check_networks | ready
 bacae6bb-b314-49e0-9118-5ad92d9fe238 | provision      | running
 4a324528-2fd5-4071-851e-735b6a1f224f | deploy         | running
 94b5e7b1-39f4-4b13-83e2-21767eb6705f | deployment     | ready
 f6078c86-c49f-4169-8b2b-c8b0910dfaf0 | dump           | running
(5 rows)

Looks like the provision task wasn't reported properly, so the main task (deploy) was considered hung by the system tests.
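
One way to read the tasks table: the parent deploy task cannot finish while any of its subtasks is still running. A minimal sketch of that reading follows. It assumes that provision and deployment are subtasks of deploy and that a parent becomes ready only when every subtask is ready. That rule and the name parent_status are hypothetical, not Nailgun's actual code.

```python
def parent_status(subtask_statuses):
    """Aggregate subtask statuses into a parent status (assumed rule)."""
    statuses = list(subtask_statuses)
    if any(s == "error" for s in statuses):
        return "error"
    if statuses and all(s == "ready" for s in statuses):
        return "ready"
    return "running"


# Statuses copied from the tasks table in this comment.
subtasks = {"provision": "running", "deployment": "ready"}
print(parent_status(subtasks.values()))  # running
```

Under this rule the deploy task stays in "running" for as long as the provision task never receives its final "ready" update.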

Revision history for this message
Dima Shulyak (dshulyak) wrote :

Scanned through the logs: there is no status=ready message from Astute for the provision task.
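
A check like this could be scripted roughly as below. This is only a sketch: it assumes each receiverd.log record sits on one line in the "RPC method ... received: {json}" form seen in the excerpt earlier in this report, and that status and task_uuid are top-level keys of the payload. The helper name ready_reports is hypothetical.

```python
import json
import re

# Matches receiver lines of the form shown in the log excerpt:
#   ... (receiver) RPC method <name> received: {json payload}
RPC_LINE = re.compile(
    r"RPC method (?P<method>\w+) received: (?P<payload>\{.*\})\s*$"
)


def ready_reports(lines, task_uuid):
    """Yield (method, payload) for 'ready' messages addressed to task_uuid."""
    for line in lines:
        match = RPC_LINE.search(line)
        if not match:
            continue
        try:
            payload = json.loads(match.group("payload"))
        except ValueError:
            continue  # truncated or non-JSON payload, skip it
        if (payload.get("task_uuid") == task_uuid
                and payload.get("status") == "ready"):
            yield match.group("method"), payload


# The only 'ready' record in the excerpt belongs to the dump task.
sample = [
    '2014-11-13 19:44:47.918 INFO [7f1fe7aad700] (receiver) RPC method '
    'dump_environment_resp received: {"status": "ready", "progress": 100, '
    '"task_uuid": "f6078c86-c49f-4169-8b2b-c8b0910dfaf0", '
    '"msg": "/var/www/nailgun/dump/fuel-snapshot-2014-11-13_19-43-24.tgz"}',
]
provision_uuid = "bacae6bb-b314-49e0-9118-5ad92d9fe238"
print(list(ready_reports(sample, provision_uuid)))  # []
```

For a real run, the lines would come from the log file instead of the sample list, for example by iterating over open("/var/log/docker-logs/nailgun/receiverd.log").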

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Fuel Astute Team (fuel-astute)
status: New → Confirmed
Revision history for this message
Artem Panchenko (apanchenko-8) wrote :

I think that this issue is caused by the following commit:

https://github.com/stackforge/fuel-astute/commit/41c78572ef8c1c2ce44df96770cfc1695f5c2d02

The call to the 'report_about_progress' method was moved above the condition 'if nodes_not_booted.empty?', so when all nodes reach 100% progress Astute reports that to Nailgun, and the next report, which changes the status to 'ready', is cancelled due to:

https://github.com/stackforge/fuel-astute/blob/master/lib/astute/reporter.rb#L100
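
The effect can be illustrated with a toy model. It is not Astute's code (Astute is written in Ruby, and the real check in reporter.rb may differ). The sketch assumes a de-duplication rule that drops any report which does not raise a node's progress, and the class name ToyReporter is hypothetical.

```python
class ToyReporter:
    """Drops a report unless it moves a node's progress forward (assumed rule)."""

    def __init__(self):
        self.sent = []        # messages that actually reach the receiver
        self._progress = {}   # last progress value seen per node uid

    def report(self, node_uid, progress, status):
        if progress <= self._progress.get(node_uid, -1):
            return False      # treated as a stale duplicate and dropped
        self._progress[node_uid] = progress
        self.sent.append((node_uid, progress, status))
        return True


# Order described above: 100% progress is reported first...
buggy = ToyReporter()
buggy.report("1", 100, "provisioning")
accepted = buggy.report("1", 100, "ready")  # ...so 'ready' is dropped
print(accepted, buggy.sent)  # False [('1', 100, 'provisioning')]

# If the 100% progress report is not sent, 'ready' goes through.
fixed = ToyReporter()
fixed.report("1", 100, "ready")
print(fixed.sent)  # [('1', 100, 'ready')]
```

In this model the final status change carries the same progress value as the report before it, so a filter keyed on progress alone discards it.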

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/134380

Changed in fuel:
assignee: Fuel Astute Team (fuel-astute) → Artem Panchenko (apanchenko-8)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/134380
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=0085021fe327f6f910901b3ca55051b1df33a96e
Submitter: Jenkins
Branch: master

commit 0085021fe327f6f910901b3ca55051b1df33a96e
Author: Artem Panchenko <email address hidden>
Date: Fri Nov 14 00:08:55 2014 +0200

    Don't report about 100% provisioning status twice

    Do not report to Nailgun that status of provisioning
    on all nodes is 100%, because it cancels sending
    of 'provisioning task is ready' message.

    Change-Id: I0f677bd743413d68a98a4c817fa3694d28f57ef8
    Closes-bug: #1392462

Changed in fuel:
status: In Progress → Fix Committed
Revision history for this message
Vladimir Sharshov (vsharshov) wrote :

Please backport it to 5.1.1 too, because of https://review.openstack.org/#/c/134491/

Revision history for this message
Artem Panchenko (apanchenko-8) wrote :

The backport of the fix was included in https://review.openstack.org/#/c/134491/ , so this bug doesn't affect 5.1.x

no longer affects: fuel/5.1.x
Revision history for this message
Egor Kotko (ykotko) wrote :

{u'build_id': u'2014-11-18_22-00-23', u'ostf_sha': u'82465a94eed4eff1fc8d8e1f2fb7e9993c22f068', u'build_number': u'114', u'auth_required': True, u'nailgun_sha': u'b0add09c4361fee8fc70637c9a6ef42fbe738abe', u'production': u'docker', u'api': u'1.0', u'fuelmain_sha': u'e556f0e1b00c30ec5c4b374ca2878c047c8686c2', u'astute_sha': u'65eb911c38afc0e23d187772f9a05f703c685896', u'feature_groups': [u'mirantis'], u'release': u'6.0', u'release_versions': {u'2014.2-6.0': {u'VERSION': {u'build_id': u'2014-11-18_22-00-23', u'ostf_sha': u'82465a94eed4eff1fc8d8e1f2fb7e9993c22f068', u'build_number': u'114', u'api': u'1.0', u'nailgun_sha': u'b0add09c4361fee8fc70637c9a6ef42fbe738abe', u'production': u'docker', u'fuelmain_sha': u'e556f0e1b00c30ec5c4b374ca2878c047c8686c2', u'astute_sha': u'65eb911c38afc0e23d187772f9a05f703c685896', u'feature_groups': [u'mirantis'], u'release': u'6.0', u'fuellib_sha': u'5a5275370b33ab3b9a403728a1c7ad173289e4a0'}}}, u'fuellib_sha': u'5a5275370b33ab3b9a403728a1c7ad173289e4a0'}

Changed in fuel:
status: Fix Committed → Fix Released