Network verification process is very long for 100 nodes

Bug #1378500 reported by Aleksandr Shaposhnikov
This bug affects 3 people
Affects: Fuel for OpenStack
Status: Fix Released
Importance: High
Assigned to: Łukasz Oleś

Bug Description

On 100 nodes, network verification takes at least 15-20 minutes to report whether everything is fine. FUEL/MOS should probably run the network tests in parallel instead of one by one. I'm not aware of the processes and communication behind the network verification procedure, but some optimization is needed.
Network verification should also print its current findings on the page while it runs, to keep the user informed about progress and to let them stop it if something goes wrong, without having to wait until it ends.
Network verification also shouldn't begin if some of the nodes are offline, because waiting for them takes forever (until the Astute timeout, actually).
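
The offline-node check requested above could look roughly like the following. This is a minimal sketch, not Fuel code: the node record layout (a dict with "uid" and "online" keys) and both function names are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical node record, e.g. {"uid": "7", "online": True}.
Node = Dict[str, object]


def split_by_status(nodes: List[Node]) -> Tuple[List[Node], List[Node]]:
    """Separate nodes into (online, offline) lists."""
    online = [n for n in nodes if n.get("online")]
    offline = [n for n in nodes if not n.get("online")]
    return online, offline


def can_start_verification(nodes: List[Node]) -> Tuple[bool, List[str]]:
    """Refuse to start when any node is offline and report which ones."""
    _, offline = split_by_status(nodes)
    return (not offline, [str(n["uid"]) for n in offline])
```

Failing fast like this would replace a wait for the full timeout with an immediate message naming the unreachable nodes.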

Tags: scale
Changed in fuel:
assignee: nobody → Fuel Python Team (fuel-python)
importance: Undecided → Medium
status: New → Triaged
milestone: none → 6.0
tags: added: scale
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Tomasz 'Zen' Napierala (tzn)
Changed in fuel:
importance: Medium → High
Łukasz Oleś (loles)
Changed in fuel:
assignee: Tomasz 'Zen' Napierala (tzn) → Łukasz Oleś (loles)
Łukasz Oleś (loles) wrote :

Current status: still looking for the best solution

Łukasz Oleś (loles) wrote :

I have a working solution. Just waiting for a working scale lab to test it.

Mike Scherbakov (mihgen) wrote :

Lukasz,
any update on it?

Łukasz Oleś (loles) wrote :

Still waiting for a stable lab. Last time I tried, they had misconfigured VLANs and verification was always failing.

Changed in fuel:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/138760

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 6.0 → 6.1
Łukasz Oleś (loles) wrote :

It will be implemented as part of the 200-nodes blueprint (BP).

Changed in fuel:
status: In Progress → Triaged
Dina Belova (dbelova) wrote :

Link to the BP, please

Łukasz Oleś (loles) wrote :
Changed in fuel:
status: Triaged → In Progress
Changed in fuel:
assignee: Łukasz Oleś (loles) → Kamil Sambor (ksambor)
Łukasz Oleś (loles)
Changed in fuel:
assignee: Kamil Sambor (ksambor) → Łukasz Oleś (loles)
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/138760
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=f595715750a2c4820722a96e0236f5c89ca6521c
Submitter: Jenkins
Branch: master

commit f595715750a2c4820722a96e0236f5c89ca6521c
Author: Łukasz Oleś <email address hidden>
Date: Wed Dec 3 16:27:11 2014 +0100

    Speedup network verification

    Instead of sending test packets from each node one by one, sending them from group of nodes.
    Number of nodes may be increased in future.

    Closes-Bug: #1378500
    Change-Id: Ia970692de1778b15d2401b4758d47bc1343c02b7
    Blueprint: 200-nodes-support
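
The idea in the commit message, sending probes from a group of nodes at a time rather than from one node at a time, can be sketched as below. This is an illustration only and not the fuel-astute change itself: `chunked`, `send_in_groups`, the `send_probe` callback and the group size are all hypothetical names and values, and the commit does not state the group size it uses.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def send_in_groups(node_ids: Sequence[str],
                   send_probe: Callable[[str], None],
                   group_size: int) -> int:
    """Trigger send_probe on every node, one group at a time.

    Nodes inside a group are handled concurrently; groups run in
    sequence. Returns the number of rounds that were needed.
    """
    rounds = 0
    for group in chunked(node_ids, group_size):
        with ThreadPoolExecutor(max_workers=len(group)) as pool:
            # list() waits for the whole group and re-raises worker errors.
            list(pool.map(send_probe, group))
        rounds += 1
    return rounds
```

With 100 nodes and an assumed group size of 10, this takes 10 rounds instead of 100 sequential sends.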

Changed in fuel:
status: In Progress → Fix Committed
Leontii Istomin (listomin) wrote :

api: '1.0'
astute_sha: 055b2d82fe8499b27c7047295e2e36a7a2c5d430
auth_required: true
build_id: 2015-04-16_21-30-10
build_number: '317'
feature_groups:
- mirantis
fuellib_sha: db5f39e96e7ab9f79691202755e547bf8242661f
fuelmain_sha: 0de2d2039e76839d339f977df45b111bef7200d6
nailgun_sha: 52d92c86e68602fb5dd2f3b8870a773d20a387ef
openstack_version: 2014.2-6.1
ostf_sha: b0991dbad159f53d335efa5d31cb94016ad5312e
production: docker
python-fuelclient_sha: 279ffc358e40dbdc162cfe76898cbd0874529f1f
release: '6.1'

baremetal 203 nodes
I have run into a similar issue:
http://paste.openstack.org/show/212600/

Leontii Istomin (listomin) wrote :

Here is the snapshot for the previous message: mos-scale-share.mirantis.com/fuel-snapshot-2015-04-29_19-33-40.tar.xz

Łukasz Oleś (loles) wrote :

It looks like get_probing_info should also be run in chunks.

Changed in fuel:
status: Fix Committed → Confirmed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-astute (master)

Fix proposed to branch: master
Review: https://review.openstack.org/181769

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-astute (master)

Reviewed: https://review.openstack.org/181769
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=5e188df55c5052898f0768f05ce5d5c07ebdcc7d
Submitter: Jenkins
Branch: master

commit 5e188df55c5052898f0768f05ce5d5c07ebdcc7d
Author: Łukasz Oleś <email address hidden>
Date: Mon May 11 01:37:21 2015 +0200

    Increase timeout to 10 min for net_probe actions

    For 200 nodes get_probing_info action may take more than 5 minutes
    to finish. To fix it net_checker requires some refactoring.
    For now increasing time should be enough.

    Change-Id: Ia3451392b1bfca9d9812c6a397626cdc295fefde
    Closes-bug: #1378500
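
The effect of a per-action timeout like the one raised in this commit can be illustrated with a short sketch. It is not Astute code: `run_with_timeout` and `NET_PROBE_TIMEOUT_S` are hypothetical names, and only the 10-minute value comes from the commit message.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# 10 minutes, the value named in the commit message.
NET_PROBE_TIMEOUT_S = 10 * 60


def run_with_timeout(action: Callable[[], T],
                     timeout_s: float = NET_PROBE_TIMEOUT_S) -> Optional[T]:
    """Return the action's result, or None if it exceeds timeout_s."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(action)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return None
    finally:
        # Do not block on an action that is still running.
        pool.shutdown(wait=False)
```

An action that needs more than 5 minutes but less than 10 returns its result under the new limit, where the old limit would have reported a timeout.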

Changed in fuel:
status: In Progress → Fix Committed
Leontii Istomin (listomin) wrote :

Network verification with 200 nodes works well (9 min). Tested on builds 511 and 521.

Changed in fuel:
status: Fix Committed → Fix Released