Error occurred while running method 'verify_networks'. In Orchestrator logs: MCollective call failed in agent 'net_probe', method 'start_frame_listeners', failed nodes: ID: 10 - Reason: Transport endpoint is not connected

Bug #1259935 reported by Anastasia Palkina on 2013-12-11
This bug affects 4 people
Affects             Importance  Assigned to
Fuel for OpenStack  High        Dima Shulyak
4.1.x               High        Dima Shulyak
5.0.x               High        Dima Shulyak

Bug Description

ISO #124
"release": "4.0",
"nailgun_sha": "8d80f823c38c2af6dc98173bcbe348d022960a3d",
"ostf_sha": "cf48dac2a6e7ad284fc93c529f3d1e4668504028",
"astute_sha": "ae026938f272f69afbe89c9900bf1c3df483557c",
"fuellib_sha": "687a554eb9b6ae4dcc114f34e9690e601b40610c"

1. Create new environment (Ubuntu, simple mode)
2. Add controller, compute, cinder
3. Move floating and public networks to eth1
4. Verifying networks was successful
5. Start deployment. Success
6. Start "Verify networks"
Verification failed.
Error occurred while running method 'verify_networks'. Inspect Orchestrator logs for the details.

In orchestrator logs:
2013-12-11 13:42:30 ERR

[8830] Error running RPC method verify_networks: 3e5cd019-735d-41ea-a7a3-2380e123dcc4: MCollective call failed in agent 'net_probe', method 'start_frame_listeners', failed nodes:
ID: 10 - Reason: Transport endpoint is not connected
, trace: ["/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/mclient.rb:98:in `check_results_with_retries'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/mclient.rb:51:in `method_missing'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:79:in `block in start_frame_listeners'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:72:in `each'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:72:in `start_frame_listeners'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:40:in `check_network'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/orchestrator.rb:156:in `verify_networks'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/dispatcher.rb:108:in `verify_networks'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:103:in `dispatch_message'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:70:in `block in dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `each'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `each_with_index'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:38:in `block (2 levels) in server_loop'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:49:in `block (2 levels) in consume_one'"]

2013-12-11 13:42:30 ERR

[8830] MCollective call failed in agent 'net_probe', method 'start_frame_listeners', failed nodes:
ID: 10 - Reason: Transport endpoint is not connected

Anastasia Palkina (apalkina) wrote :
Dmitry Pyzhov (dpyzhov) wrote :

Seen on Ubuntu twice.

Dmitry Pyzhov (dpyzhov) on 2013-12-12
Changed in fuel:
assignee: Dmitry Pyzhov (lux-place) → Andrey Danin (gcon-monolake)
importance: Undecided → High
Dmitry Pyzhov (dpyzhov) wrote :

The issue is in the init script. OSCI will create a new package.

Changed in fuel:
status: New → Fix Committed
Changed in fuel:
status: Fix Committed → In Progress
Changed in fuel:
status: In Progress → Fix Committed
Andrew (box857+launchpad) wrote :

I'm seeing the same from 4.0 GA version when verifying networks with just the bootstrapped nodes.

Dmitry Borodaenko (angdraug) wrote :

Reopening: the same problem is observed with bootstrap nodes. Does the CentOS init script have the same problem, and does it also need to be updated?

Changed in fuel:
status: Fix Committed → New
assignee: Andrey Danin (gcon-monolake) → nobody
assignee: nobody → Fuel Library Team (fuel-library)
Andrey Korolyov (xdeller) wrote :

No, this time the problem should be a different one. Of course, you can check whether two mco instances are running, which would match the original case.
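
One way to run the check suggested above, as a minimal sketch and not something taken from the Fuel code base. It assumes a procps-style `ps -eo pid,args` is available on the node and that the daemon's command line contains the string `mcollectived`; the helper names are hypothetical.

```python
import subprocess


def find_mcollectived_pids(ps_output):
    """Return the PIDs of all processes whose command line mentions mcollectived."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip the header row and any line that is not "<pid> <args>".
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = parts
        if "mcollectived" in args:
            pids.append(int(pid))
    return pids


def list_processes():
    """Return the process table as text (assumes `ps -eo pid,args` works here)."""
    out = subprocess.check_output(["ps", "-eo", "pid,args"])
    return out.decode("utf-8", "replace")


if __name__ == "__main__":
    pids = find_mcollectived_pids(list_processes())
    if len(pids) > 1:
        print("more than one mcollectived instance running: %s" % pids)
    else:
        print("mcollectived instances: %s" % pids)
```

More than one PID in the result would point back to the original init-script case; a single PID suggests a different cause.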

Dmitry Pyzhov (dpyzhov) on 2014-01-14
Changed in fuel:
status: New → Fix Released
Tatyanka (tatyana-leontovich) wrote :

reopened

Changed in fuel:
status: Fix Released → Confirmed
milestone: 4.0 → 4.1
Aleksandr Didenko (adidenko) wrote :

I've hit the same problem sporadically on 4.1 ISOs at the bootstrap stage, during pre-deployment network verification. The last time, 2 of 4 nodes reported problems with the "net_probe" MCollective agent. Here is an example from naily.log:

2014-01-23T16:53:10 debug: [9064] 085e75e9-8d1c-4cc5-9756-a1e3aed2f30e: MC agent 'net_probe', method 'start_frame_listeners', results: {:sender=>"5", :statuscode=>5, :statusmsg=>"Transport endpoint is not connected", :data=>{}}
2014-01-23T16:53:10 err: [9064] MCollective call failed in agent 'net_probe', method 'start_frame_listeners', failed nodes:
ID: 5 - Reason: Transport endpoint is not connected
2014-01-23T16:53:10 err: [9064] Error running RPC method verify_networks: 085e75e9-8d1c-4cc5-9756-a1e3aed2f30e: MCollective call failed in agent 'net_probe', method 'start_frame_listeners', failed nodes:
ID: 5 - Reason: Transport endpoint is not connected
, trace: ["/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/mclient.rb:114:in `check_results_with_retries'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/mclient.rb:60:in `method_missing'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:79:in `block in start_frame_listeners'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:72:in `each'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:72:in `start_frame_listeners'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/network.rb:40:in `check_network'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/astute-0.0.2/lib/astute/orchestrator.rb:157:in `verify_networks'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/dispatcher.rb:108:in `verify_networks'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:103:in `dispatch_message'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:70:in `block in dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `each'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `each_with_index'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:68:in `dispatch'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:38:in `block (2 levels) in server_loop'", "/opt/rbenv/versions/1.9.3-p392/lib/ruby/gems/1.9.1/gems/naily-0.1.0/lib/naily/server.rb:49:in `block (2 levels) in consume_one'"]
2014-01-23T16:53:10 info: [9064] Casting message to fuel: {"method"=>"verify_networks_resp", "args"=>{"task_uuid"=>"085e75e9-8d1c-4cc5-9756-a1e3aed2f30e", "status"=>"error", "error"=>"Error occurred while running method 'verify_networks'. Inspect Orchestrator logs for the details."}}

I've checked all 4 bootstrapped nodes and found processes like these ones on 2 of them:

root 715 0.1 0.1 86620 22872 ? Sl 16:52 0:03 /usr/bin/ruby /usr/sbin/mcollectived --pid=/var/run/mcol...


Mike Scherbakov (mihgen) on 2014-02-11
Changed in fuel:
assignee: Fuel Library Team (fuel-library) → Roman Vyalov (r0mikiam)
Aleksandr Didenko (adidenko) wrote :

I've checked on 4.1 ISO builds 116+: the MCollective init script is the same on bootstrapped and provisioned CentOS nodes.

Roman Vyalov (r0mikiam) wrote :

Probably need to apply the resolution for CentOS: https://mirantis.jira.com/browse/OSCI-1072

Changed in fuel:
status: Confirmed → Incomplete
Roman Vyalov (r0mikiam) wrote :
Changed in fuel:
assignee: Roman Vyalov (r0mikiam) → Dima Shulyak (dshulyak)
Dima Shulyak (dshulyak) wrote :

Reproduced by restarting the master node and then running network verification.

Changed in fuel:
milestone: 4.1 → 5.0
status: Incomplete → Confirmed
Matthew Mosesohn (raytrac3r) wrote :

What is the status of this bug? How much effort is it to fix?

Dima Shulyak (dshulyak) wrote :

I'm moving it back to Incomplete: I wasn't able to break the RabbitMQ client connection by restarting the master node or the MCollective agent.

Changed in fuel:
status: Confirmed → Incomplete
Mike Scherbakov (mihgen) wrote :

There was no activity on this bug for a while. If you reproduce it again, please reopen.

Changed in fuel:
status: Incomplete → Invalid
Aviram Bar-Haim (aviramb) wrote :

Reproduced with 3 NICs on a custom 4.1 ISO.
Diagnostic snapshot attached.

Dima Shulyak (dshulyak) wrote :

On each start_frame_listeners call, we need to check for existing net_probe processes and kill any that are found.
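
The cleanup proposed in the comment above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code from the eventual fix: it assumes leftover listeners can be recognised by `net_probe.py` appearing in their command line, that `ps -eo pid,args` output is the input, and that SIGTERM is enough to stop them. The function names are hypothetical.

```python
import errno
import os
import signal


def stale_probe_pids(ps_output, own_pid):
    """Return PIDs of net_probe.py processes other than the caller itself."""
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        # Skip the header row and any line that is not "<pid> <args>".
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        pid, args = int(parts[0]), parts[1]
        if pid != own_pid and "net_probe.py" in args:
            pids.append(pid)
    return pids


def kill_stale(pids, kill=os.kill):
    """Send SIGTERM to each PID and return the ones that were signalled.

    A process that exited between the scan and the kill (ESRCH) is ignored;
    any other error is re-raised.
    """
    killed = []
    for pid in pids:
        try:
            kill(pid, signal.SIGTERM)
            killed.append(pid)
        except OSError as exc:
            if exc.errno != errno.ESRCH:
                raise
    return killed
```

The `kill` parameter exists only so the logic can be exercised without signalling real processes; in normal use the default `os.kill` applies.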

Changed in fuel:
status: Invalid → Confirmed
milestone: 5.0 → 5.1

Fix proposed to branch: master
Review: https://review.openstack.org/102251

Changed in fuel:
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/102251
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=d4edf32197c59495fd48662ede67f3751ee15d9b
Submitter: Jenkins
Branch: master

commit d4edf32197c59495fd48662ede67f3751ee15d9b
Author: Dima Shulyak <email address hidden>
Date: Tue Jun 24 17:21:23 2014 +0300

    Perform cleanup before starting net_probe.py

    If net_probe.py fails for unknown reason,
    daemon process will hang there forever and next
    net_probe.py run will fail with socket.bind error

    Change-Id: I1f5889d59dc60aa374fdf3f9dd8e3e5e7c0a115b
    Closes-Bug: #1259935

Changed in fuel:
status: In Progress → Fix Committed
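
The failure mode named in the commit message (a leftover process still holding a socket, so the next run fails in bind) can be shown in isolation. The sketch below uses an ordinary TCP socket on loopback purely as a stand-in; which kind of socket net_probe.py actually binds is not stated in this report, and the function name is hypothetical.

```python
import errno
import socket


def second_bind_error():
    """Bind one socket, then try to bind a second one to the same address.

    Returns the errno raised by the second bind, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 lets the kernel pick a free port; this socket plays the
        # role of the hung daemon that never released its address.
        first.bind(("127.0.0.1", 0))
        port = first.getsockname()[1]
        # The "next run" tries to claim the same address and port.
        second.bind(("127.0.0.1", port))
    except socket.error as exc:
        return exc.errno
    finally:
        second.close()
        first.close()
    return None


if __name__ == "__main__":
    code = second_bind_error()
    print("second bind failed with EADDRINUSE: %s" % (code == errno.EADDRINUSE))
```

As long as the first holder stays alive, every later bind to that address fails in the same way, which is why the cleanup has to happen before the new run starts.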

Reviewed: https://review.openstack.org/103013
Committed: https://git.openstack.org/cgit/stackforge/fuel-astute/commit/?id=d5bebe745319c065752bfd891392c85aa94b5ccf
Submitter: Jenkins
Branch: stable/5.0

commit d5bebe745319c065752bfd891392c85aa94b5ccf
Author: Dima Shulyak <email address hidden>
Date: Tue Jun 24 17:21:23 2014 +0300

    Perform cleanup before starting net_probe.py

    If net_probe.py fails for unknown reason,
    daemon process will hang there forever and next
    net_probe.py run will fail with socket.bind error

    Change-Id: I1f5889d59dc60aa374fdf3f9dd8e3e5e7c0a115b
    Closes-Bug: #1259935
    (cherry picked from commit d4edf32197c59495fd48662ede67f3751ee15d9b)

Dmitry Tyzhnenko (dtyzhnenko) wrote :

Verified on ISO fuel-5.1-3

api: '1.0'
astute_sha: b622d9b36dbdd1e03b282b9ee5b7435ba649e711
auth_required: true
build_id: 2014-09-11_01-04-40
build_number: '3'
feature_groups:
- experimental
fuellib_sha: 6fc7ac9041894aa76b2e18d385149166e34f7b23
fuelmain_sha: d899675a5a393625f8166b29099d26f45d527035
nailgun_sha: 720e83bca37561fbc0452ad4e99f1f8cfe8e40cf
ostf_sha: 1de6ed1c0b72f6687ffb4bebc2c939b135a88e34
production: docker
release: '5.1'
release_versions:
  2014.1.1-5.1:
    VERSION:
      api: '1.0'
      astute_sha: b622d9b36dbdd1e03b282b9ee5b7435ba649e711
      build_id: 2014-09-11_01-04-40
      build_number: '3'
      feature_groups:
      - experimental
      fuellib_sha: 6fc7ac9041894aa76b2e18d385149166e34f7b23
      fuelmain_sha: d899675a5a393625f8166b29099d26f45d527035
      nailgun_sha: 720e83bca37561fbc0452ad4e99f1f8cfe8e40cf
      ostf_sha: 1de6ed1c0b72f6687ffb4bebc2c939b135a88e34
      production: docker
      release: '5.1'

Changed in fuel:
status: Fix Committed → Fix Released
Dmitry Pyzhov (dpyzhov) on 2015-03-30
tags: added: module-netcheck

Change abandoned by Vladimir Sharshov (<email address hidden>) on branch: stable/4.1
Review: https://review.openstack.org/103014
Reason: expired for 1 year
