[nailgun] Network verification failed if controller is offline

Bug #1318659 reported by Egor Kotko
This bug affects 1 person
Affects             Status        Importance  Assigned to               Milestone
Fuel for OpenStack  Fix Released  High        Dima Shulyak
5.0.x               Won't Fix     High        Fuel Python (Deprecated)
6.0.x               Won't Fix     High        Fuel Python (Deprecated)

Bug Description

{"build_id": "2014-05-12_11-37-35", "mirantis": "yes", "build_number": "194", "ostf_sha": "cdb075090b752246a9c43db3e918c42f645b5873", "nailgun_sha": "4477ba3a6efc4379a6509386e7a9e2e6ae832041", "production": "docker", "api": "1.0", "fuelmain_sha": "97d7f6d5461db3afc27f58160cf9f6985230d255", "astute_sha": "5813d9b537ba6ac95f668321c682f339aac57e05", "release": "5.0", "fuellib_sha": "ff4e0182a94f9b17e5a02bcc65faaf4452a0ad35"}

Steps to reproduce:
1. Create an environment: CentOS, 3 controllers, Neutron VLAN.
2. Shut down the primary controller.
3. Verify networks.

Network verification fails:

2014-05-12 13:17:25 INFO [398] Casting message to fuel: {"method"=>"verify_networks_resp", "args"=>{"task_uuid"=>"77387ab2-2412-4261-9f3e-ab7b4874f27a", "status"=>"error", "error"=>"Error occurred while running method 'verify_networks'. Inspect Orchestrator logs for the details."}}
2014-05-12 13:17:25 ERR [398] Error running RPC method verify_networks: 77387ab2-2412-4261-9f3e-ab7b4874f27a: MCollective agents '1' didn't respond within the allotted time.
, trace: ["/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/mclient.rb:114:in `check_results_with_retries'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/mclient.rb:62:in `method_missing'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/network.rb:78:in `block in start_frame_listeners'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/network.rb:71:in `each'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/network.rb:71:in `start_frame_listeners'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/network.rb:39:in `check_network'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/orchestrator.rb:173:in `verify_networks'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/dispatcher.rb:114:in `verify_networks'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:126:in `dispatch_message'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:89:in `block in dispatch'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:64:in `call'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:64:in `block in each'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:56:in `each'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/task_queue.rb:56:in `each'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:87:in `each_with_index'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:87:in `dispatch'", "/usr/lib64/ruby/gems/2.1.0/gems/astute-0.0.2/lib/astute/server/server.rb:72:in `block in perform_main_job'"]
2014-05-12 13:17:25 ERR [398] MCollective agents '1' didn't respond within the allotted time.
2014-05-12 13:15:23 DEBUG [398] Retry #5 to run mcollective agent on nodes: '1'
2014-05-12 13:13:20 DEBUG [398] Retry #4 to run mcollective agent on nodes: '1'
2014-05-12 13:11:17 DEBUG [398] Retry #3 to run mcollective agent on nodes: '1'
2014-05-12 13:09:15 DEBUG [398] Retry #2 to run mcollective agent on nodes: '1'
2014-05-12 13:07:13 DEBUG [398] Retry #1 to run mcollective agent on nodes: '1'
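
The timestamps in the log above give a rough idea of how long the user waits before verification fails. The following minimal sketch only does arithmetic on those timestamps; it does not reproduce Astute's actual retry logic, and the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the log above, oldest first:
# five retry entries followed by the final error.
stamps = [
    "2014-05-12 13:07:13",  # Retry #1
    "2014-05-12 13:09:15",  # Retry #2
    "2014-05-12 13:11:17",  # Retry #3
    "2014-05-12 13:13:20",  # Retry #4
    "2014-05-12 13:15:23",  # Retry #5
    "2014-05-12 13:17:25",  # error reported
]

times = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S") for s in stamps]

# Seconds between consecutive log entries.
gaps = [int((later - earlier).total_seconds())
        for earlier, later in zip(times, times[1:])]

# Seconds from the first retry entry to the final error.
total = int((times[-1] - times[0]).total_seconds())

print(gaps)
print(total)
```

The entries are about two minutes apart, so a single offline node delays the failure report by roughly ten minutes after the first retry.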

Egor Kotko (ykotko) wrote :
Dima Shulyak (dshulyak) wrote :

This happens not only after a primary controller shutdown, but whenever any node goes offline.

I am thinking about adding a warning that some nodes in the cluster are offline, and not sending those nodes to the network verification task.

Also, it would be good if bug titles were a bit more descriptive.

Changed in fuel:
status: New → Confirmed
importance: Undecided → Medium
assignee: nobody → Fuel Python Team (fuel-python)
Mike Scherbakov (mihgen)
summary: - Network verification failed
+ Network verification failed if controller is offline
Dmitry Ilyin (idv1985)
summary: - Network verification failed if controller is offline
+ [nailgun] Network verification failed if controller is offline
Dima Shulyak (dshulyak) wrote :

I think it should be fixed in Astute. If a node is not reachable, Astute should simply skip the response info from that node.

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Fuel Astute Team (fuel-astute)
tags: added: release-notes
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
milestone: 5.1 → 6.0
Vladimir Sharshov (vsharshov) wrote :

Unlike Nailgun, Astute does not know the node status. If a node has offline status and this is expected, I think Nailgun should not send it to Astute. Hiding such logic in Astute could cause problems in the future.

Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Astute Team (fuel-astute) → Dima Shulyak (dshulyak)
Roman Prykhodchenko (romcheg) wrote :

Moving to 6.1 because it hits code freeze in 6.0

Changed in fuel:
milestone: 6.0 → 6.1
Dima Shulyak (dshulyak)
Changed in fuel:
assignee: Dima Shulyak (dshulyak) → Fuel Python Team (fuel-python)
Dmitry Pyzhov (dpyzhov)
tags: added: module-netcheck
removed: nailgun
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Aleksey Kasatkin (alekseyk-ru)
Aleksey Kasatkin (alekseyk-ru) wrote :

Node statuses should be taken into account when the message to Astute is being assembled: https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/task/task.py#L588

Changed in fuel:
assignee: Aleksey Kasatkin (alekseyk-ru) → Fuel Python Team (fuel-python)
milestone: 6.1 → 7.0
Changed in fuel:
importance: Medium → High
milestone: 7.0 → 6.1
Dmitry Pyzhov (dpyzhov)
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Aleksey Kasatkin (alekseyk-ru)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/175978

Changed in fuel:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/175978
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=c3d5ceb8dc4cbcf7af3853120b62eae89e72f4e5
Submitter: Jenkins
Branch: master

commit c3d5ceb8dc4cbcf7af3853120b62eae89e72f4e5
Author: Aleksey Kasatkin <email address hidden>
Date: Tue Apr 21 19:33:24 2015 +0300

    Do not check networks on offline nodes

    net_checker should not try to verify connectivity with offline nodes.
    There should be two or more online nodes in order to run net_checker.
    Error message is generated otherwise.
    Offline nodes quantity is saved in task's message to generate
    corresponding notice upon task completion.
    Notice messages should not replace error messages and each other. They
    are concatenated with '\n' delimiter.

    DocImpact
    Closes-Bug: #1318659
    Change-Id: Ic89593bb98241ec3d708bcbfc020b3df9610f284
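
The behavior described in the commit message above can be sketched as follows. This is a minimal illustration and not the actual fuel-web code: the `Node` class, the function names, and the message texts are all hypothetical. Only the rules themselves (skip offline nodes, require at least two online nodes, record the offline count for a notice, join messages with a newline) come from the commit message.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MIN_ONLINE_NODES = 2  # "two or more online nodes" per the commit message


@dataclass
class Node:
    """Hypothetical stand-in for a Nailgun node record."""
    uid: str
    online: bool


def prepare_verification(
    nodes: List[Node],
) -> Tuple[List[str], Optional[str], Optional[str]]:
    """Return (uids to verify, error message or None, notice or None)."""
    online = [n for n in nodes if n.online]
    offline_count = len(nodes) - len(online)

    # Too few online nodes: report an error instead of running the check.
    if len(online) < MIN_ONLINE_NODES:
        # Message text is illustrative only.
        return [], "At least two online nodes are required.", None

    notice = None
    if offline_count:
        # Message text is illustrative only.
        notice = f"{offline_count} offline node(s) were skipped."
    return [n.uid for n in online], None, notice


def append_message(existing: Optional[str], new: Optional[str]) -> Optional[str]:
    """Join messages with a newline so a notice never replaces an earlier one."""
    if not new:
        return existing
    if not existing:
        return new
    return existing + "\n" + new


cluster = [Node("1", online=False), Node("2", online=True), Node("3", online=True)]
uids, error, notice = prepare_verification(cluster)
print(uids, error, notice)
```

With the offline node "1" filtered out before the request is assembled, the orchestrator is never asked to wait for an agent that cannot respond.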

Changed in fuel:
status: In Progress → Fix Committed
Vasily Gorin (vgorin)
tags: added: on-verification
Vasily Gorin (vgorin) wrote :

Verified on build #357

{"build_id": "2015-04-27_22-54-38", "build_number": "357", "release_versions": {"2014.2.2-6.1": {"VERSION": {"build_id": "2015-04-27_22-54-38", "build_number": "357", "api": "1.0", "fuel-library_sha": "0e5b82d24853304befb22145ac4aaf3545d295e1", "nailgun_sha": "5e52637d9944c2f4170012560d15ecf89a691af6", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "8cd6cf575d3c101dee1032abb6877dfa8487e077", "astute_sha": "c1793f982fda7e3fc7b937ccaa613c649be6a144", "fuel-ostf_sha": "b38602c841deaa03ddffc95c02f319360462cbe3", "release": "6.1", "fuelmain_sha": "1ec588d364b9b97f124f6d602dbcc4aa13327218"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "0e5b82d24853304befb22145ac4aaf3545d295e1", "nailgun_sha": "5e52637d9944c2f4170012560d15ecf89a691af6", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "8cd6cf575d3c101dee1032abb6877dfa8487e077", "astute_sha": "c1793f982fda7e3fc7b937ccaa613c649be6a144", "fuel-ostf_sha": "b38602c841deaa03ddffc95c02f319360462cbe3", "release": "6.1", "fuelmain_sha": "1ec588d364b9b97f124f6d602dbcc4aa13327218"}

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification
Vasily Gorin (vgorin) wrote :

Reproduced on build #466

{"build_id": "2015-05-25_20-55-26", "build_number": "466", "release_versions": {"2014.2.2-6.1": {"VERSION": {"build_id": "2015-05-25_20-55-26", "build_number": "466", "api": "1.0", "fuel-library_sha": "d7128c27a1b76f4813f3697609f82875c68e85ed", "nailgun_sha": "61ef0edfbfe0c457265a62f0eab05af634ec3b91", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "e19f1b65792f84c4a18b5a9473f85ef3ba172fce", "astute_sha": "0bd72c72369e743376864e8e8dabfe873d40450a", "fuel-ostf_sha": "87819878bc0ca572900e1f6933d9b99e666d6f62", "release": "6.1", "fuelmain_sha": "5c8ebddf64ea93000af2de3ccdb4aa8bb766ce93"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "d7128c27a1b76f4813f3697609f82875c68e85ed", "nailgun_sha": "61ef0edfbfe0c457265a62f0eab05af634ec3b91", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "e19f1b65792f84c4a18b5a9473f85ef3ba172fce", "astute_sha": "0bd72c72369e743376864e8e8dabfe873d40450a", "fuel-ostf_sha": "87819878bc0ca572900e1f6933d9b99e666d6f62", "release": "6.1", "fuelmain_sha": "5c8ebddf64ea93000af2de3ccdb4aa8bb766ce93"}

Changed in fuel:
status: Fix Released → Confirmed
Aleksey Kasatkin (alekseyk-ru) wrote :

@vgorin, please provide a diagnostic snapshot if possible.

Dima Shulyak (dshulyak) wrote :

I think this regression is caused by the repo connectivity check.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-web (master)

Fix proposed to branch: master
Review: https://review.openstack.org/185655

Changed in fuel:
assignee: Aleksey Kasatkin (alekseyk-ru) → Dima Shulyak (dshulyak)
status: Confirmed → In Progress
Vasily Gorin (vgorin) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-web (master)

Reviewed: https://review.openstack.org/185655
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=10cb262275707743187ce1919673470fef6d3e33
Submitter: Jenkins
Branch: master

commit 10cb262275707743187ce1919673470fef6d3e33
Author: Dmitry Shulyak <email address hidden>
Date: Tue May 26 18:05:47 2015 +0200

    Do not include offline nodes in repo connectivity check

    Nodes are excluded from several tasks:
    - repo connectivity
    - repo connectivity with setup

    Change-Id: Iffdaae71b9e74e4e6a6f34f480f74bd52b93c085
    Partial-Bug: 1318659

Dima Shulyak (dshulyak)
Changed in fuel:
status: In Progress → Fix Committed
tags: added: on-verification
Anastasia Palkina (apalkina) wrote :

Verified on ISO #490

{"build_id": "2015-05-31_20-55-26", "build_number": "490", "release_versions": {"2014.2.2-6.1": {"VERSION": {"build_id": "2015-05-31_20-55-26", "build_number": "490", "api": "1.0", "fuel-library_sha": "c9a86ac0e6da95d36e328ce5130715792a2eb177", "nailgun_sha": "3830bdcb28ec050eed399fe782cc3dd5fbf31bde", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "4fc55db0265bbf39c369df398b9dc7d6469ba13b", "astute_sha": "5d570ae5e03909182db8e284fbe6e4468c0a4e3e", "fuel-ostf_sha": "7413186490e8d651b8837b9eee75efa53f5e230b", "release": "6.1", "fuelmain_sha": "6b5712a7197672d588801a1816f56f321cbceebd"}}}, "auth_required": true, "api": "1.0", "fuel-library_sha": "c9a86ac0e6da95d36e328ce5130715792a2eb177", "nailgun_sha": "3830bdcb28ec050eed399fe782cc3dd5fbf31bde", "feature_groups": ["mirantis"], "openstack_version": "2014.2.2-6.1", "production": "docker", "python-fuelclient_sha": "4fc55db0265bbf39c369df398b9dc7d6469ba13b", "astute_sha": "5d570ae5e03909182db8e284fbe6e4468c0a4e3e", "fuel-ostf_sha": "7413186490e8d651b8837b9eee75efa53f5e230b", "release": "6.1", "fuelmain_sha": "6b5712a7197672d588801a1816f56f321cbceebd"}

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification
Vitaly Sedelnik (vsedelnik) wrote :

Won't Fix for 6.0-updates as we don't expect new 6.0 deployments.