deployment, deploy and provisioning tasks in pending status

Bug #1596987 reported by Leontii Istomin
This bug affects 2 people
Affects               Status   Importance  Assigned to          Milestone
Fuel for OpenStack    Invalid  High        Stanislaw Bogatkin
  Mitaka              Invalid  High        Fuel Sustaining
  Newton              Invalid  High        Fuel Sustaining
  Ocata               Invalid  High        Stanislaw Bogatkin

Bug Description

Detailed bug description:
 1. deployed 200 nodes (3 controllers, 20 computes+ceph, 177 computes)
 2. added 200 computes, clicked "deploy changes" and the changes were applied successfully
 3. added 200 more computes, clicked "deploy changes", but this time the cluster was not updated:
   a) fuel task "check_networks" was created
   b) fuel task "check_networks" has gone
   c) deployment, deploy and provisioning tasks are in pending status
Steps to reproduce:
 1. deployed 200 nodes (3 controllers, 20 computes+ceph, 177 computes)
 2. added 200 computes, clicked "deploy changes" and the changes were applied successfully
 3. added 200 more computes, clicked "deploy changes"; the cluster was not updated
Expected results:
 Cluster successfully deployed
Actual result:
 Nothing was changed
Reproducibility:
 each time
Workaround:
 -
Impact:
 Adding new nodes
Description of the environment:
 Operating system: ubuntu
 Versions of components: MOS-9.0
 Reference architecture: 3 controllers, 20 computes+ceph, 577 computes
 Network model: vxlan+dvr
 Related projects installed: LMA
Additional information:
 Logs from fuel node: http://mos-scale-share.mirantis.com/1596987_fuel_logs.tar.gz

description: updated
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: none → 10.0
assignee: nobody → Fuel Sustaining (fuel-sustaining-team)
importance: Undecided → High
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Sustaining (fuel-sustaining-team) → Dmitry Guryanov (dguryanov)
Revision history for this message
Dmitry Guryanov (dguryanov) wrote :

It seems we hit the 1 GB field size limit of PostgreSQL: http://www.paste.org/81103
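PostgreSQL caps any single field value at roughly 1 GB, so one serialized deployment_info blob for the whole cluster eventually cannot be stored. A minimal sketch, assuming the data is stored as JSON text, of checking a structure's serialized size against that limit before writing it; the helper names are hypothetical and not part of Nailgun.

```python
import json

# PostgreSQL rejects single field values above roughly 1 GB; using 1 GiB
# as the bound is a simplifying assumption of this sketch.
PG_FIELD_LIMIT_BYTES = 1 << 30


def serialized_size(obj):
    """Return the size in bytes of obj serialized as UTF-8 JSON."""
    return len(json.dumps(obj).encode("utf-8"))


def fits_in_one_field(obj, limit=PG_FIELD_LIMIT_BYTES):
    """True if the JSON form of obj stays under the per-field limit."""
    return serialized_size(obj) < limit
```

Such a check only reports the problem early; it does not shrink the data, which is what the later fixes address.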

Dmitry Klenov (dklenov)
tags: added: area-python
Changed in fuel:
status: New → Confirmed
Revision history for this message
Georgy Okrokvertskhov (gokrokvertskhov) wrote :

Please add Nailgun tests to test serialization for 10K nodes.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/336467

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/336467
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=052082bd71d80e1298116b88aaf7cce2a12c5231
Submitter: Jenkins
Branch: master

commit 052082bd71d80e1298116b88aaf7cce2a12c5231
Author: Dmitry Guryanov <email address hidden>
Date: Fri Jul 1 13:08:02 2016 +0300

    add node_deployment_info table

    The size of the deployment_info field in the tasks table grows as n**2
    (n being the number of nodes). With 200 nodes the structure is
    about 20 MB. With 600 nodes it would theoretically be about 720 MB;
    in practice it does not fit into 1 GB.

    A better solution is to move the common part to a separate place,
    but that is not quick to do. It also would not help if every node
    is deployed with customized deployment info.

    Change-Id: Id3154ab423b0863d9cc4952335293bf5fc30df38
    Partial-Bug: #1596987
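The idea behind a node_deployment_info table is to replace one huge per-task blob with one row per node, so that each stored value stays far below the per-field limit. A minimal sketch using an in-memory SQLite database; the table and column names are hypothetical and not taken from Nailgun's actual models or migrations.

```python
import json
import sqlite3

# Hypothetical schema, for illustration only.
SCHEMA = """
CREATE TABLE tasks (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE node_deployment_info (
    task_id INTEGER REFERENCES tasks(id),
    node_uid TEXT,
    deployment_info TEXT,
    PRIMARY KEY (task_id, node_uid)
);
"""


def store_per_node(conn, task_id, deployment_info):
    """Store one row per node instead of one blob for the whole task."""
    conn.executemany(
        "INSERT INTO node_deployment_info VALUES (?, ?, ?)",
        [(task_id, uid, json.dumps(info)) for uid, info in deployment_info.items()],
    )


def load_node(conn, task_id, node_uid):
    """Fetch and decode the deployment info of a single node, or None."""
    row = conn.execute(
        "SELECT deployment_info FROM node_deployment_info "
        "WHERE task_id = ? AND node_uid = ?",
        (task_id, node_uid),
    ).fetchone()
    return json.loads(row[0]) if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO tasks VALUES (1, 'deployment')")
store_per_node(conn, 1, {"1": {"role": "controller"}, "2": {"role": "compute"}})
```

As the commit message notes, this alone does not reduce the total amount stored: each row is O(n), so the table as a whole is still O(n^2).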

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/340839

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/342880

Changed in fuel:
assignee: Dmitry Guryanov (dguryanov) → Bulat Gaifullin (bgaifullin)
Changed in fuel:
assignee: Bulat Gaifullin (bgaifullin) → Dmitry Guryanov (dguryanov)
Changed in fuel:
assignee: Dmitry Guryanov (dguryanov) → Bulat Gaifullin (bgaifullin)
Changed in fuel:
assignee: Bulat Gaifullin (bgaifullin) → Dmitry Guryanov (dguryanov)
Changed in fuel:
assignee: Dmitry Guryanov (dguryanov) → Bulat Gaifullin (bgaifullin)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/342880
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=5167527dc442b81aeca56da2af96c7f1c835d993
Submitter: Jenkins
Branch: master

commit 5167527dc442b81aeca56da2af96c7f1c835d993
Author: Bulat Gaifullin <email address hidden>
Date: Wed Jul 13 20:09:39 2016 +0300

    Added methods for patching deployment info per object

    Instead of the wide callbacks process_deployment and process_provision,
    implemented methods to patch deployment or provisioning info per
    cluster or node.
    The code of extensions and tests was updated accordingly.
    Also added a helper to mark extension methods as deprecated.
    The extension loading behaviour was changed: instead of failing the
    operation when an extension cannot be loaded, Nailgun only logs an
    error that the extension is not loaded and continues.

    Partial-Bug: 1596987
    Change-Id: I577c8ffc105734e12646ca7c6a4fe4927e70b119
    DocImpact
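The three changes in this commit (per-object patch hooks, a deprecation helper, and tolerant extension loading) can be sketched in plain Python. This is an illustration under assumptions, not Nailgun's extension API: the hook names patch_cluster_deployment_info / patch_node_deployment_info, the deprecated decorator, and load_extensions are all hypothetical.

```python
import functools
import importlib
import logging
import warnings

logger = logging.getLogger("extensions")


def deprecated(func):
    """Mark an extension method as deprecated; warn on every call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            "%s is deprecated" % func.__qualname__,
            DeprecationWarning,
            stacklevel=2,
        )
        return func(*args, **kwargs)
    return wrapper


def load_extensions(module_names):
    """Import extension modules, logging failures instead of raising."""
    loaded = []
    for name in module_names:
        try:
            loaded.append(importlib.import_module(name))
        except Exception:
            logger.exception("Extension %s is not loaded", name)
    return loaded


class ExampleExtension:
    # Hypothetical per-object hooks: each receives the data of a single
    # cluster or node and patches it in place.
    def patch_cluster_deployment_info(self, cluster, info):
        info.setdefault("extensions", []).append("example")

    def patch_node_deployment_info(self, node, info):
        info["patched"] = True

    @deprecated
    def process_deployment(self, data, cluster, nodes):
        # Old wide callback that receives everything at once.
        return data
```

The per-object hooks never see the whole cluster's serialized data at once, which is the point of replacing the wide callbacks.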

Changed in fuel:
assignee: Bulat Gaifullin (bgaifullin) → Dmitry Guryanov (dguryanov)
Dmitry Pyzhov (dpyzhov)
tags: added: 9.1-proposed
Revision history for this message
Sergey Galkin (sgalkin) wrote :

FYI
I created a cluster with 378 nodes (adding 50-60 nodes at a time in 7 steps).
When I tried to add 40 more nodes to the 378-node cluster, the OOM killer was triggered as soon as deployment started.
Fuel VM RAM - 64261340
swap - 32243708

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/359749

Changed in fuel:
assignee: Dmitry Guryanov (dguryanov) → Bulat Gaifullin (bgaifullin)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/340839
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=7a83ee0cacf4584e247118cb959f8f1241be0fb2
Submitter: Jenkins
Branch: master

commit 7a83ee0cacf4584e247118cb959f8f1241be0fb2
Author: Dmitry Guryanov <email address hidden>
Date: Thu Aug 25 17:31:50 2016 +0300

    don't merge common_attrs with node data in deployment_info

    The size of deployment_info grows as n^2 with the
    number of nodes. That's because common_attrs, which is
    merged into each node's data, contains info about all nodes.

    For example, for 600 nodes we store about 1 GB of data in
    the database. So as a first step, let's store common_attrs
    separately in the deployment_info structure, both in the
    Python code and in the database.
    Also removed old migration tests that do not reflect the
    actual database state.

    Change-Id: I431062b3f9c8dedd407570729166072b780dc59a
    Partial-Bug: #1596987
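The quadratic growth described in this commit follows from a simple size model: common_attrs holds one entry per node, so copying it into every node's data costs n * (n * entry) bytes, while storing it once costs only n * entry. A minimal sketch of that model; the per-node byte sizes are made-up parameters, not measurements from this cluster.

```python
def merged_size(n_nodes, per_node_entry, node_specific):
    """Bytes stored when common_attrs (one entry per node) is copied into
    every node's deployment info: n * (n * entry + specific), i.e. O(n^2)."""
    common = n_nodes * per_node_entry
    return n_nodes * (common + node_specific)


def split_size(n_nodes, per_node_entry, node_specific):
    """Bytes stored when common_attrs is kept once and each node holds
    only its own part: n * entry + n * specific, i.e. O(n)."""
    common = n_nodes * per_node_entry
    return common + n_nodes * node_specific
```

With hypothetical sizes of 1000 bytes per common entry and 5000 bytes of node-specific data, 600 nodes give 363,000,000 bytes merged versus 3,600,000 bytes split, and doubling the node count quadruples the merged size.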

Changed in fuel:
assignee: Bulat Gaifullin (bgaifullin) → Dmitry Guryanov (dguryanov)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/359749
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=5c2b87951d359cfd083fc525edae940f4567a507
Submitter: Jenkins
Branch: stable/mitaka

commit 5c2b87951d359cfd083fc525edae940f4567a507
Author: Dmitry Guryanov <email address hidden>
Date: Fri Sep 2 16:06:20 2016 +0300

    add node_deployment_info table

    The size of the deployment_info field in the tasks table grows as n**2
    (n being the number of nodes). With 200 nodes the structure is
    about 20 MB. With 600 nodes it would theoretically be about 720 MB;
    in practice it does not fit into 1 GB.

    A better solution is to move the common part to a separate place,
    but that is not quick to do. It also would not help if every node
    is deployed with customized deployment info.

    Backported from 052082bd71d80e1298116b88aaf7cce2a12c5231

    Change-Id: Id3154ab423b0863d9cc4952335293bf5fc30df38
    Partial-Bug: #1596987

tags: added: in-stable-mitaka
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/365666

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/365666
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=bac4afcb34276f1d9110a89fbb426de4a3e40d1c
Submitter: Jenkins
Branch: stable/mitaka

commit bac4afcb34276f1d9110a89fbb426de4a3e40d1c
Author: Bulat Gaifullin <email address hidden>
Date: Wed Jul 13 20:09:39 2016 +0300

    Added methods for patching deployment info per object

    Instead of the wide callbacks process_deployment and process_provision,
    implemented methods to patch deployment or provisioning info per
    cluster or node.
    The code of extensions and tests was updated accordingly.
    Also added a helper to mark extension methods as deprecated.
    The extension loading behaviour was changed: instead of failing the
    operation when an extension cannot be loaded, Nailgun only logs an
    error that the extension is not loaded and continues.

    Cherry-pick of 5167527dc442b81aeca56da2af96c7f1c835d993

    Partial-Bug: 1596987
    Change-Id: I577c8ffc105734e12646ca7c6a4fe4927e70b119
    DocImpact

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/344687
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=1f31a850b8f162952b2b3fd816e005bf93e566ee
Submitter: Jenkins
Branch: master

commit 1f31a850b8f162952b2b3fd816e005bf93e566ee
Author: Dmitry Guryanov <email address hidden>
Date: Fri Sep 2 18:21:04 2016 +0300

    Add UnionDict class for lcm

    We don't merge common_attrs with the node's custom attributes
    in deployment info. But YAQL expressions still need to
    access the 'unified' deployment info for a node, so
    we were merging dicts every time YAQL wanted to access it.
    That is a bit slow, so this patch introduces a class that
    works like a union of several dicts without creating a new
    data structure.

    Partial-Bug: #1596987

    Change-Id: Iea32ab222421fc7a3c5df66e7e48f4d1a4b931f5
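A read-only "union of dicts" can be sketched as a collections.abc.Mapping that looks a key up in every layer on demand instead of building a merged copy. This is not Fuel's UnionDict implementation; the assumptions here are that later dicts (e.g. node attributes) override earlier ones (e.g. common_attrs) and that nested dicts are combined recursively. Python's standard collections.ChainMap offers a similar layered lookup, but it gives the first mapping priority and does not merge nested dicts.

```python
from collections.abc import Mapping


class UnionDict(Mapping):
    """Read-only view over several dicts; later dicts take precedence.

    Nested dicts under the same key are returned as another UnionDict,
    so no merged copy of the data is ever built.
    """

    def __init__(self, *dicts):
        self._dicts = dicts

    def __getitem__(self, key):
        values = [d[key] for d in self._dicts if key in d]
        if not values:
            raise KeyError(key)
        if not isinstance(values[-1], Mapping):
            return values[-1]  # highest-priority scalar wins
        # Keep the trailing run of mappings: a non-mapping value in a lower
        # layer hides everything beneath it, as a sequential deep merge would.
        layers = []
        for value in reversed(values):
            if not isinstance(value, Mapping):
                break
            layers.append(value)
        layers.reverse()
        return layers[0] if len(layers) == 1 else UnionDict(*layers)

    def __iter__(self):
        seen = set()
        for d in self._dicts:
            for key in d:
                if key not in seen:
                    seen.add(key)
                    yield key

    def __len__(self):
        return sum(1 for _ in self)
```

For example, UnionDict(common, node)["net"]["mtu"] returns the node's value when the node overrides it and falls back to the common value otherwise, while both source dicts stay unmodified.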

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/366647

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/366822

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/366844

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/366647
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=311611e5673a839a20f806e805eaec037145e7e5
Submitter: Jenkins
Branch: stable/mitaka

commit 311611e5673a839a20f806e805eaec037145e7e5
Author: Dmitry Guryanov <email address hidden>
Date: Thu Aug 25 17:31:50 2016 +0300

    don't merge common_attrs with node data in deployment_info

    The size of deployment_info grows as n^2 with the
    number of nodes. That's because common_attrs, which is
    merged into each node's data, contains info about all nodes.

    For example, for 600 nodes we store about 1 GB of data in
    the database. So as a first step, let's store common_attrs
    separately in the deployment_info structure, both in the
    Python code and in the database.
    Also removed old migration tests that do not reflect the
    actual database state.

    Change-Id: I431062b3f9c8dedd407570729166072b780dc59a
    Partial-Bug: #1596987

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/367592

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/367592
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=96fe943020630f979d26be06d994c9695e38dfac
Submitter: Jenkins
Branch: stable/mitaka

commit 96fe943020630f979d26be06d994c9695e38dfac
Author: Dmitry Guryanov <email address hidden>
Date: Fri Sep 2 18:21:04 2016 +0300

    Add UnionDict class for lcm

    We don't merge common_attrs with the node's custom attributes
    in deployment info. But YAQL expressions still need to
    access the 'unified' deployment info for a node, so
    we were merging dicts every time YAQL wanted to access it.
    That is a bit slow, so this patch introduces a class that
    works like a union of several dicts without creating a new
    data structure.

    Partial-Bug: #1596987

    Change-Id: Iea32ab222421fc7a3c5df66e7e48f4d1a4b931f5
    (cherry picked from commit 1f31a850b8f162952b2b3fd816e005bf93e566ee)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/368661

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/366822
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=88e9f5a209dc3cf710141c05e5e1be9311593b21
Submitter: Jenkins
Branch: master

commit 88e9f5a209dc3cf710141c05e5e1be9311593b21
Author: Dmitry Guryanov <email address hidden>
Date: Fri Sep 9 13:23:57 2016 +0300

    Add ability to get common or node part of context in lcm

    The main purpose of this commit is to have an ability
    to split configuration file astute.yaml into common
    and node parts. Common part is huge and we will
    dump it once and also there will be only one instance
    of this data in RAM which saves a lot of memory when
    you run deploy on many nodes (>100).

    This patch adds two new variables to context, which can be
    used in yaql_exp: $node and $common for node and common
    parts of the context. Functions changed, changed_all,
    changed_any, added, deleted don't work for these variables.

    DocImpact
    Change-Id: I56bf982652a5dc27882e4a401ca9ec124899fed7
    Partial-Bug: #1596987
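The memory saving described in this commit comes from serializing the common part once and letting every node's context reference the same object, with the node part kept separate. A minimal sketch of that split, assuming plain dicts; the function names are hypothetical, and JSON stands in for YAML only to keep the example dependency-free (the real astute.yaml is YAML).

```python
import json


def split_astute(common, nodes):
    """Serialize the common part once and each node part separately.

    Returns (common_blob, {uid: node_blob})."""
    common_blob = json.dumps(common, sort_keys=True)
    node_blobs = {uid: json.dumps(data, sort_keys=True) for uid, data in nodes.items()}
    return common_blob, node_blobs


def make_context(common, node):
    """Hypothetical expression context exposing both parts separately,
    mirroring the $common and $node variables described in the commit.
    Every context references the same common dict, so it is held in
    memory once regardless of how many nodes are deployed."""
    return {"common": common, "node": node}
```

Building contexts for all nodes with the same common object keeps memory proportional to one copy of the common part plus the per-node parts, rather than one merged copy per node.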

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (stable/mitaka)

Reviewed: https://review.openstack.org/368661
Committed: https://git.openstack.org/cgit/openstack/fuel-web/commit/?id=8fbcf681798144d11b650b05b47f6f2e56cbd081
Submitter: Jenkins
Branch: stable/mitaka

commit 8fbcf681798144d11b650b05b47f6f2e56cbd081
Author: Dmitry Guryanov <email address hidden>
Date: Fri Sep 9 13:23:57 2016 +0300

    Add ability to get common or node part of context in lcm

    The main purpose of this commit is to have an ability
    to split configuration file astute.yaml into common
    and node parts. Common part is huge and we will
    dump it once and also there will be only one instance
    of this data in RAM which saves a lot of memory when
    you run deploy on many nodes (>100).

    This patch adds two new variables to context, which can be
    used in yaql_exp: $node and $common for node and common
    parts of the context. Functions changed, changed_all,
    changed_any, added, deleted don't work for these variables.

    DocImpact
    Change-Id: I56bf982652a5dc27882e4a401ca9ec124899fed7
    Partial-Bug: #1596987

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/369358

Changed in fuel:
assignee: Dmitry Guryanov (dguryanov) → Vladimir Kuklin (vkuklin)
Changed in fuel:
assignee: Vladimir Kuklin (vkuklin) → Stanislaw Bogatkin (sbogatkin)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/mitaka)

Change abandoned by Fuel DevOps Robot (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/369358
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 10.0 → 10.1
Changed in fuel:
assignee: Stanislaw Bogatkin (sbogatkin) → Alexey Shtokolov (ashtokolov)
Changed in fuel:
assignee: Alexey Shtokolov (ashtokolov) → Stanislaw Bogatkin (sbogatkin)
Revision history for this message
Sergey Galkin (sgalkin) wrote :

Reproduced in 9.2 from http://mirror.fuel-infra.org/mos-repos/centos/mos9.0-centos7/snapshots/proposed-2017-01-13-184421/x86_64

Deployment hangs after a cluster reset; 'systemctl restart astute.service' did not help.

Task status
http://paste.openstack.org/show/596243/

Last astute logs
http://paste.openstack.org/show/596244/

Revision history for this message
Sergey Galkin (sgalkin) wrote :

Cluster with 243 nodes

Revision history for this message
Stanislaw Bogatkin (sbogatkin) wrote :

This bug is too broad to be fixed by a single patch, so I am closing it. All related fixes will be tracked in separate bug reports.

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

The logs provided above seem to be related to another bug: https://bugs.launchpad.net/fuel/+bug/1630299

This bug has become a catch-all for every kind of 'slow' deployment, regardless of the root cause, which has turned it into a dumping ground for reviews and log attachments. All remaining Fuel performance issues should be reported separately so that we can triage and fix them effectively.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Fuel DevOps Robot (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/369358
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.
