[Scale] Receiverd processes messages too slowly (1 every 2 seconds)

Bug #1570509 reported by Vladimir Sharshov
This bug affects 2 people
Affects            | Status        | Importance | Assigned to       | Milestone
Fuel for OpenStack | Fix Committed | High       | Vladimir Sharshov |
Mitaka             | Fix Released  | High       | Vladimir Sharshov |

Bug Description

Detailed bug description:

In a large environment (~200 nodes), the deployment ended with an error according to the Astute log, but the UI and Nailgun still did not have this information after 5 hours.
Investigation showed that the problem is in Receiverd, which processes progress and status messages from Astute very slowly: 0.5 messages per second, while the expected rate is 15 per second. It also used too much memory: 1.3 GB.
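
To put the two rates in perspective, the following minimal sketch computes how long a backlog of queued messages takes to drain at each rate. The 0.5 and 15 messages-per-second figures come from the description above; the backlog of 9,000 messages is purely hypothetical (chosen for illustration, not measured), and the helper names are made up for this example.

```python
def drain_time_seconds(queued_messages: int, rate_per_second: float) -> float:
    """Time to process a backlog at a constant rate, ignoring new arrivals."""
    if rate_per_second <= 0:
        raise ValueError("rate_per_second must be positive")
    return queued_messages / rate_per_second


def format_hms(seconds: float) -> str:
    """Render a duration in seconds as HH:MM:SS."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


BACKLOG = 9000        # hypothetical queue depth, NOT a figure from this report
OBSERVED_RATE = 0.5   # messages per second, from the bug description
EXPECTED_RATE = 15.0  # messages per second, from the bug description

if __name__ == "__main__":
    print("observed:", format_hms(drain_time_seconds(BACKLOG, OBSERVED_RATE)))
    print("expected:", format_hms(drain_time_seconds(BACKLOG, EXPECTED_RATE)))
```

With these assumed numbers the backlog takes 5 hours to drain at the observed rate and 10 minutes at the expected rate, a 30x difference.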

Steps to reproduce:

Use a large cluster with many nodes (at least 100): create the cluster and run the deployment.

Expected results:

The deployment succeeds. Information about the end of the deployment is delivered and processed in real time.

Actual result:

Reproducibility:

Always, in large environments.

Workaround:

Split the deployment into small groups of nodes, for example 10-20 nodes each, and deploy the groups one at a time.
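
The grouping step of this workaround can be sketched as follows. This is only an illustration of splitting a node list into batches: the node IDs 1..200 and the batch size of 20 are assumptions, and how each batch is then deployed (UI or CLI) is deliberately left out.

```python
from typing import Iterator, List, Sequence


def batches(node_ids: Sequence[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive groups of at most `size` node IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(node_ids), size):
        yield list(node_ids[start:start + size])


# Hypothetical node IDs; a real environment would list its own nodes.
node_ids = list(range(1, 201))
plan = list(batches(node_ids, 20))

if __name__ == "__main__":
    for number, group in enumerate(plan, start=1):
        print(f"batch {number}: nodes {group[0]}-{group[-1]} ({len(group)} nodes)")
```

Each batch would be deployed only after the previous one has finished, so that Receiverd never has to handle progress messages from more than 20 nodes at once.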

Impact:

Scale

Description of the environment:

Fuel 9.0 #188, 200+ nodes

tags: added: scale
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/307361

Changed in fuel:
status: Confirmed → In Progress
Leontii Istomin (listomin) wrote :

The issue has been reproduced with the patch above:
1. Deploy Fuel.
2. Add fuel-agent from master due to https://bugs.launchpad.net/fuel/+bug/1543233 and regenerate the Ubuntu repo.
3. Build and activate a new bootstrap image.
4. Power on 197 nodes.
5. Create and configure an environment with 3 controllers, 20 compute+Ceph nodes, and 174 computes, then click Deploy Changes.
Result:
The environment was deployed successfully. From astute.log: http://paste.openstack.org/show/494553/
But the UI shows "Deploying..." and fuel task shows that the deployment is still in progress:
[root@fuel ~]# fuel task
id | status  | name           | cluster | progress | uuid
---|---------|----------------|---------|----------|-------------------------------------
7  | running | deployment     | 1       | 24       | b6f8e7a6-08d5-47aa-9218-ad8d1fc15386
2  | ready   | check_networks | 1       | 100      | 1101cc91-983b-4d94-9a0b-2fb879b76c80
3  | running | deploy         | 1       | 46       | 05ef70bb-8a56-452d-8b2e-3cc43a4d903c
6  | ready   | provision      | 1       | 100      | d9f348ae-4716-41df-876d-23c6bd04d935

Build 9.0-217 was used.

Changed in fuel:
assignee: Vladimir Sharshov (vsharshov) → Igor Kalnitsky (ikalnitsky)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/307361
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=d2ec3c1e49e87958b71eca9f65dc1674f444f907
Submitter: Jenkins
Branch: master

commit d2ec3c1e49e87958b71eca9f65dc1674f444f907
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Mon Apr 18 19:26:32 2016 +0300

    Mark additional tasks info column as deferred

    Fields:

    - deployment_info;
    - cluster_settings;
    - network_settings.

    in sum take 150Mb for cluster with 200 nodes. Every task
    request will create heavy db select, so it will take
    a lot of memory, cpu load and dramatically decrease
    receiver performance.

    With this change this fields will be loaded only if
    Nailgun directly required them.

    Change-Id: I2417b0370914a0bd3a7f92e897826d9d86e76773
    Closes-Bug: #1570509
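
The idea behind this commit, loading heavy columns only when they are actually read, can be illustrated with a small standalone sketch. It does not use Nailgun's code or SQLAlchemy's deferred column loading; it imitates the behavior with the standard-library sqlite3 module. The table layout, the Task class, and the `queries` list are all hypothetical and exist only for demonstration; only the three column names are taken from the commit message.

```python
import sqlite3

# Column names from the commit message; everything else here is hypothetical.
HEAVY_COLUMNS = ("deployment_info", "cluster_settings", "network_settings")


class Task:
    """Task row whose heavy columns are fetched only on first access."""

    def __init__(self, conn: sqlite3.Connection, task_id: int) -> None:
        self._conn = conn
        self.id = task_id
        self.queries = []  # records issued SQL, for demonstration only
        sql = "SELECT status, progress FROM tasks WHERE id = ?"
        self.queries.append(sql)
        self.status, self.progress = conn.execute(sql, (task_id,)).fetchone()

    def __getattr__(self, name: str):
        # Invoked only when normal lookup fails, i.e. for columns not loaded yet.
        if name not in HEAVY_COLUMNS:
            raise AttributeError(name)
        sql = f"SELECT {name} FROM tasks WHERE id = ?"  # name is whitelisted above
        self.queries.append(sql)
        value = self._conn.execute(sql, (self.id,)).fetchone()[0]
        setattr(self, name, value)  # cache, so later reads skip the database
        return value


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT, progress INTEGER, "
    "deployment_info TEXT, cluster_settings TEXT, network_settings TEXT)"
)
conn.execute(
    "INSERT INTO tasks VALUES (7, 'running', 24, ?, ?, ?)",
    ("x" * 1000, "y" * 1000, "z" * 1000),
)

task = Task(conn, 7)  # one light query; no heavy column is read
```

Reading `task.status` or `task.progress` costs nothing beyond the initial light query, while the first read of `task.deployment_info` issues one extra SELECT for that column alone.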

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
assignee: Igor Kalnitsky (ikalnitsky) → Vladimir Sharshov (vsharshov)
Leontii Istomin (listomin) wrote :

The issue has been reproduced the following way:
1. Install Fuel 9.0-244.
2. Add fuel-agent to the Ubuntu repo on Fuel and rebuild the bootstrap image due to https://bugs.launchpad.net/fuel/+bug/1543233
3. Patch Nailgun with https://review.openstack.org/gitweb?p=openstack/fuel-web.git;a=patch;h=d2ec3c1e49e87958b71eca9f65dc1674f444f907 due to https://bugs.launchpad.net/fuel/+bug/1570509
4. Apply the changes from https://review.openstack.org/gitweb?p=openstack/fuel-web.git;a=patch;h=67d97c00632601a2d5d4dc8100906cc1543bd3f4
(manager.py is attached; the file was changed manually)
5. Install uwsgi, uwsgi-plugin-python, and python-uwsgidecorator:
wget https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -Uvh epel-release-latest-7.noarch.rpm
sed -i 's/^enabled=1/enabled=0/g' /etc/yum.repos.d/epel.repo
yum --enablerepo=epel install uwsgi uwsgi-plugin-python python-uwsgidecorator
6. Restart nailgun and receiverd.

The UI shows that the deployment is still in progress.
[root@fuel ~]# fuel task
id | status  | name           | cluster | progress | uuid
---|---------|----------------|---------|----------|-------------------------------------
2  | ready   | check_networks | 1       | 100      | 5d511722-1660-4d16-b7e1-9e3480673d01
3  | running | deploy         | 1       | 43       | 48a26d07-2806-462c-8ac3-30ea16ee7cf0
6  | ready   | provision      | 1       | 100      | 618af080-b71e-4525-a3a0-81993e461384
7  | running | deployment     | 1       | 20       | 43743e19-afc0-449a-8238-3752c08c42aa

But from the Astute log:
2016-04-25 12:29:31 INFO [7935] Deployment summary: time was spent 00:51:39
2016-04-25 12:29:31 INFO [7935] Casting message to Nailgun:
{"method"=>"deploy_resp",
 "args"=>
  {"task_uuid"=>"43743e19-afc0-449a-8238-3752c08c42aa",
   "status"=>"ready",
   "progress"=>100}}

http://mos-scale-share.mirantis.com/fuel_logs_1569859_25-04-2016-2.tar.gz

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (stable/mitaka)

Related fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/314702

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to fuel-web (refs/changes/01/314701/1)

Related fix proposed to branch: refs/changes/01/314701/1
Review: https://review.openstack.org/314703

OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-web (refs/changes/01/314701/1)

Change abandoned by Vladimir Sharshov (<email address hidden>) on branch: refs/changes/01/314701/1
Review: https://review.openstack.org/314703

Changed in fuel:
status: Fix Committed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/315615

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/315615
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=ab032632f749b05da3c6ddcd542460f406d20d02
Submitter: Jenkins
Branch: stable/mitaka

commit ab032632f749b05da3c6ddcd542460f406d20d02
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Mon Apr 18 19:26:32 2016 +0300

    Mark additional tasks info column as deferred

    Fields:

    - deployment_info;
    - cluster_settings;
    - network_settings.

    in sum take 150Mb for cluster with 200 nodes. Every task
    request will create heavy db select, so it will take
    a lot of memory, cpu load and dramatically decrease
    receiver performance.

    With this change this fields will be loaded only if
    Nailgun directly required them.

    Change-Id: I2417b0370914a0bd3a7f92e897826d9d86e76773
    Closes-Bug: #1570509
    (cherry picked from commit d2ec3c1e49e87958b71eca9f65dc1674f444f907)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/310035
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=a1cafe9837b3d04ed4b2510eaf679b31f47c3fd0
Submitter: Jenkins
Branch: master

commit a1cafe9837b3d04ed4b2510eaf679b31f47c3fd0
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Mon Apr 25 21:24:28 2016 +0300

    Speed up processing of deploy message

    Load only progress column in nodes instead of upload all node
    columns in recalculate_deployment_task_progress in TaskHelper

    Related-Bug: #1570509

    Change-Id: Id3a7a0f8e2ae62d4130d36ae9d4c13b0e4419acf
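
What this commit describes, reading only the progress column instead of whole node rows, can be sketched as below. This is not the code of recalculate_deployment_task_progress; it is a standalone approximation using the standard-library sqlite3 module. The nodes table layout, the `meta` column standing in for heavy per-node data, and the plain-average formula are assumptions made for the example.

```python
import sqlite3


def average_progress(conn: sqlite3.Connection, cluster_id: int) -> int:
    """Average node progress for a cluster, selecting only the progress column."""
    rows = conn.execute(
        "SELECT progress FROM nodes WHERE cluster_id = ?", (cluster_id,)
    ).fetchall()
    if not rows:
        return 0
    # Plain average truncated to an integer: an assumed formula, for illustration.
    return int(sum(progress for (progress,) in rows) / len(rows))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes (id INTEGER PRIMARY KEY, cluster_id INTEGER, "
    "progress INTEGER, meta TEXT)"  # meta is a hypothetical heavy column
)
conn.executemany(
    "INSERT INTO nodes (cluster_id, progress, meta) VALUES (?, ?, ?)",
    [(1, 100, "m" * 10000), (1, 50, "m" * 10000), (1, 0, "m" * 10000)],
)

if __name__ == "__main__":
    print(average_progress(conn, 1))
```

Because the query names a single small column, the heavy `meta` values never leave the database, no matter how many nodes the cluster has.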

OpenStack Infra (hudson-openstack) wrote : Related fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/314702
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=5f77be8bf75f33774fee3ceb675afba11fe22ac9
Submitter: Jenkins
Branch: stable/mitaka

commit 5f77be8bf75f33774fee3ceb675afba11fe22ac9
Author: Vladimir Sharshov (warpc) <email address hidden>
Date: Mon Apr 25 21:24:28 2016 +0300

    Speed up processing of deploy message

    Load only progress column in nodes instead of upload all node
    columns in recalculate_deployment_task_progress in TaskHelper

    Related-Bug: #1570509

    Change-Id: Id3a7a0f8e2ae62d4130d36ae9d4c13b0e4419acf
    (cherry picked from commit a1cafe9837b3d04ed4b2510eaf679b31f47c3fd0)

tags: added: in-stable-mitaka
Dmitry Pyzhov (dpyzhov)
tags: added: 9.1-proposed
Changed in fuel:
status: In Progress → Fix Committed