MOS/Fuel masternode high load during 99 nodes cluster provisioning/deployment

Bug #1381757 reported by Aleksandr Shaposhnikov
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Fuel for OpenStack | Fix Committed | High | Łukasz Oleś |

Bug Description

Once provisioning/deployment has started, at some point the receiverd service takes 100% of one CPU core and stays in that state for a very long time.

/api/versions

{"build_id": "2014-10-15_17-45-06", "ostf_sha": "de177931b53fbe9655502b73d03910b8118e25f1", "build_number": "23", "auth_required": true, "api": "1.0", "nailgun_sha": "d9facd6a32293da786b738d1a9b1459e36aa3006", "production": "docker", "fuelmain_sha": "18b8264c17782c4dbb77412d3d4fe256a2083d7d", "astute_sha": "c3e7c7a18528cf9acca48021488a93dff74f5c97", "feature_groups": ["mirantis"], "release": "6.0", "release_versions": {"2014.2-6.0": {"VERSION": {"build_id": "2014-10-15_17-45-06", "ostf_sha": "de177931b53fbe9655502b73d03910b8118e25f1", "build_number": "23", "api": "1.0", "nailgun_sha": "d9facd6a32293da786b738d1a9b1459e36aa3006", "production": "docker", "fuelmain_sha": "18b8264c17782c4dbb77412d3d4fe256a2083d7d", "astute_sha": "c3e7c7a18528cf9acca48021488a93dff74f5c97", "feature_groups": ["mirantis"], "release": "6.0", "fuellib_sha": "b3f6943326dac065464555a320ed4b2d4bdbb699"}}}, "fuellib_sha": "b3f6943326dac065464555a320ed4b2d4bdbb699"}

Tags: scale nailgun
Mike Scherbakov (mihgen) wrote :

I believe that's because it's receiving a lot of data from Astute over RPC, such as progress values. We need to look at the logs and, ideally, profile it to see which procedures take that much CPU.
Alex, can you attach a diagnostic snapshot, please? It should help resolve the issue faster.
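
For illustration only, the profiling step suggested above could look roughly like the sketch below, which uses Python's standard cProfile and pstats modules. The handler `handle_progress_message` and the helper `profile_call` are hypothetical stand-ins, not actual nailgun or receiverd code; the point is only to show how to get a ranked list of the calls that consume the most time.

```python
import cProfile
import io
import pstats


def handle_progress_message(nodes):
    """Hypothetical stand-in for an RPC handler that processes node progress."""
    return sum(node["progress"] for node in nodes) / len(nodes)


def profile_call(func, *args, top=5):
    """Run func under cProfile; return (result, text report of the hottest calls)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# 99 fake nodes, matching the cluster size in this report.
nodes = [{"uid": i, "progress": 50} for i in range(99)]
average, report = profile_call(handle_progress_message, nodes)
print(report)
```

In a real investigation the profiled function would be the actual message handler, and the report would be collected while the deployment is running.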

tags: added: scale
Aleksandr Shaposhnikov (alashai8) wrote :

The last time I tried to do so, the Launchpad thread that serviced the upload died in pain because of the size (around half a gig) =(

description: updated
Aleksandr Shaposhnikov (alashai8) wrote :
Mike Scherbakov (mihgen)
Changed in fuel:
milestone: none → 6.0
assignee: nobody → Fuel Python Team (fuel-python)
tags: added: nailgun
Changed in fuel:
importance: Undecided → High
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Łukasz Oleś (loles)
Changed in fuel:
status: New → Triaged
Aleksandr Shaposhnikov (alashai8) wrote :

top - 20:24:10 up 56 min, 1 user, load average: 1.44, 1.46, 0.89
Tasks: 370 total, 3 running, 367 sleeping, 0 stopped, 0 zombie
Cpu0 : 7.8%us, 15.4%sy, 0.0%ni, 61.8%id, 0.0%wa, 0.0%hi, 15.0%si, 0.0%st
Cpu1 : 70.0%us, 1.3%sy, 0.0%ni, 28.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.3%st
Cpu2 : 36.0%us, 5.1%sy, 0.0%ni, 56.2%id, 0.0%wa, 0.0%hi, 2.7%si, 0.0%st
Cpu3 : 5.0%us, 2.3%sy, 0.0%ni, 92.3%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu4 : 2.3%us, 2.7%sy, 0.0%ni, 94.6%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu5 : 8.8%us, 4.8%sy, 0.0%ni, 85.0%id, 0.0%wa, 0.0%hi, 1.4%si, 0.0%st
Mem: 8193372k total, 8031336k used, 162036k free, 454028k buffers
Swap: 5242872k total, 2172k used, 5240700k free, 4494040k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
16989 root 20 0 389m 231m 4736 R 95.6 2.9 1:57.99 receiverd
 5627 root 20 0 1365m 33m 6480 S 27.6 0.4 1:26.19 docker
12142 root 20 0 765m 2692 1524 S 18.3 0.0 0:59.83 rsyslogd
14627 saslauth 20 0 97292 3688 1108 S 9.3 0.0 0:28.68 nginx

Tomasz 'Zen' Napierala (tzn) wrote :

I believe Łukasz is preparing fixes for Fuel.
What is the general responsiveness of the Fuel master node during the test? Is it acceptable?

Changed in fuel:
status: Triaged → Confirmed
Aleksandr Shaposhnikov (alashai8) wrote :

It is pretty responsive, mainly because receiverd uses only one core and we have 6 of them.

Łukasz Oleś (loles)
Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/129759

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
status: Fix Committed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/129759
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=8c20977025067c662d99437f33226375d03e471a
Submitter: Jenkins
Branch: master

commit 8c20977025067c662d99437f33226375d03e471a
Author: Łukasz Oleś <email address hidden>
Date: Tue Oct 21 01:46:06 2014 +0200

    Do not load Task.cache column by default

    For 100 nodes, cache column may have 70MB. For sqlalchemy it takes almost 10s
    to load it. This column will be loaded only when accessed.

    Closes-Bug: 1381757
    blueprint 100-nodes-support

    Change-Id: I220faf3b49244f3eda8bde26ede85f149eb2caa2
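
The patch itself is not shown in this report. As a rough sketch of the idea in the commit message (skip the large column in the default query and fetch it only on first access), the example below uses the standard sqlite3 module instead of SQLAlchemy. The `Task` class, the `tasks` table layout, and the `get_task` helper are assumptions made for illustration; SQLAlchemy offers deferred column loading (`sqlalchemy.orm.deferred`) for the same purpose.

```python
import sqlite3


class Task:
    """Hypothetical task record whose large `cache` column is fetched lazily."""

    def __init__(self, conn, task_id, name):
        self._conn = conn
        self.id = task_id
        self.name = name
        self._cache = None
        self.cache_loads = 0  # counts how many times the big column was read

    @property
    def cache(self):
        # Query the large column only on first access, then keep it in memory.
        if self._cache is None:
            row = self._conn.execute(
                "SELECT cache FROM tasks WHERE id = ?", (self.id,)
            ).fetchone()
            self._cache = row[0]
            self.cache_loads += 1
        return self._cache


def get_task(conn, task_id):
    """Load a task without touching the `cache` column."""
    row = conn.execute(
        "SELECT id, name FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    return Task(conn, row[0], row[1])


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, name TEXT, cache TEXT)")
conn.execute("INSERT INTO tasks VALUES (1, 'deploy', ?)", ("x" * 1_000_000,))
task = get_task(conn, 1)
```

Code paths that only read `task.name` or the task status never pay for the large column; the cost is incurred once, the first time `task.cache` is read.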

Changed in fuel:
status: In Progress → Fix Committed