nailgun-agent hangs and node shows false-positive offline status with "fuel node" command

Bug #1510989 reported by JohnsonYi
This bug affects 1 person
Affects: Fuel for OpenStack
Status: Invalid
Importance: Medium
Assigned to: Fuel Python (Deprecated)

Bug Description

After issuing the command "fuel node", one Ceph node was reported as "online False", but I can SSH to this node, and "ceph health" and "ceph osd tree" both look fine.

What I found is that the nailgun agent hangs:
root@node-4:/var/log# tail -f /var/log/nailgun-agent.log
I, [2015-10-27T09:11:01.898152 #21274] INFO -- : Trying to load agent config /etc/nailgun-agent/config.yaml
I, [2015-10-27T09:11:01.898631 #21274] INFO -- : Obtained service url from config file: 'http://10.14.20.2:8000/api'
I, [2015-10-27T09:11:03.156727 #21274] INFO -- : MCollective is up to date with identity = 4
I, [2015-10-27T09:11:03.157039 #21274] INFO -- : Wrote data to file '/etc/nailgun_uid'. Data: 4
I, [2015-10-27T09:12:17.431656 #21966] INFO -- : Trying to load agent config /etc/nailgun-agent/config.yaml
I, [2015-10-27T09:12:17.432275 #21966] INFO -- : Obtained service url from config file: 'http://10.14.20.2:8000/api'
I, [2015-10-27T09:12:18.744303 #21966] INFO -- : MCollective is up to date with identity = 4
I, [2015-10-27T09:12:18.744621 #21966] INFO -- : Wrote data to file '/etc/nailgun_uid'. Data: 4
I, [2015-10-27T09:13:04.994953 #22625] INFO -- : Trying to load agent config /etc/nailgun-agent/config.yaml
I, [2015-10-27T09:13:04.995538 #22625] INFO -- : Obtained service url from config file: 'http://10.14.20.2:8000/api'
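
One way to spot a stalled run in a log like the one above is to group the lines by PID and flag any run that never reaches the "Wrote data to file" line. The following is a minimal Python sketch, not part of nailgun-agent. The name `incomplete_runs` and the idea that "Wrote data to file" marks a finished run are assumptions, based only on the two complete runs in this excerpt.

```python
import re
from collections import OrderedDict

# Matches Ruby Logger lines such as:
#   I, [2015-10-27T09:11:01.898152 #21274] INFO -- : message
LINE_RE = re.compile(
    r"^\w, \[(?P<ts>\S+) #(?P<pid>\d+)\]\s+\w+ -- : (?P<msg>.*)$"
)

# Assumption: a run that finishes normally logs this message last,
# as the first two runs in the excerpt do.
DONE_MARKER = "Wrote data to file"


def incomplete_runs(lines):
    """Return PIDs, in order of first appearance, that never logged DONE_MARKER."""
    runs = OrderedDict()  # pid -> True once the done marker was seen
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip anything that is not a logger line
        pid = int(match.group("pid"))
        finished = match.group("msg").startswith(DONE_MARKER)
        runs[pid] = runs.get(pid, False) or finished
    return [pid for pid, done in runs.items() if not done]


SAMPLE = """\
I, [2015-10-27T09:12:17.431656 #21966] INFO -- : Trying to load agent config /etc/nailgun-agent/config.yaml
I, [2015-10-27T09:12:17.432275 #21966] INFO -- : Obtained service url from config file: 'http://10.14.20.2:8000/api'
I, [2015-10-27T09:12:18.744303 #21966] INFO -- : MCollective is up to date with identity = 4
I, [2015-10-27T09:12:18.744621 #21966] INFO -- : Wrote data to file '/etc/nailgun_uid'. Data: 4
I, [2015-10-27T09:13:04.994953 #22625] INFO -- : Trying to load agent config /etc/nailgun-agent/config.yaml
I, [2015-10-27T09:13:04.995538 #22625] INFO -- : Obtained service url from config file: 'http://10.14.20.2:8000/api'
"""

if __name__ == "__main__":
    print(incomplete_runs(SAMPLE.splitlines()))  # prints [22625]
```

On the lines quoted here, only PID 22625 is flagged: it logs the service URL and then nothing further.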

There are also some XFS and I/O errors on this Ceph node:
root@node-4:/var/log# dmesg | grep error
[1062182.525661] XFS (nbd11): SB validate failed with error 22.
[1062182.839223] XFS (nbd11): SB validate failed with error 22.
[1062381.143688] XFS (nbd9): SB validate failed with error 22.
[1062381.455537] XFS (nbd9): SB validate failed with error 22.
[2646674.388851] end_request: I/O error, dev nbd1, sector 0
[2646674.528707] end_request: I/O error, dev nbd5, sector 0
[2646674.668831] end_request: I/O error, dev nbd6, sector 0
[2646674.810431] end_request: I/O error, dev nbd9, sector 0
[2646674.952581] end_request: I/O error, dev nbd10, sector 0
[2646675.096139] end_request: I/O error, dev nbd11, sector 0
[2646675.239707] end_request: I/O error, dev nbd12, sector 0
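
To see which devices are affected, the dmesg lines above can be tallied per device. This is a rough Python sketch written for this report; the function name `errors_by_device` and the two regular expressions are assumptions that cover only the two message shapes quoted here.

```python
import re
from collections import defaultdict

# Patterns for the two kinds of kernel messages quoted in this report.
IO_ERROR_RE = re.compile(
    r"end_request: I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)"
)
XFS_SB_RE = re.compile(
    r"XFS \((?P<dev>\w+)\): SB validate failed with error (?P<errno>\d+)"
)


def errors_by_device(lines):
    """Count I/O errors and XFS superblock validation failures per device."""
    summary = defaultdict(lambda: {"io_errors": 0, "xfs_sb_failures": 0})
    for line in lines:
        match = IO_ERROR_RE.search(line)
        if match:
            summary[match.group("dev")]["io_errors"] += 1
            continue
        match = XFS_SB_RE.search(line)
        if match:
            summary[match.group("dev")]["xfs_sb_failures"] += 1
    return dict(summary)


SAMPLE = """\
[1062182.525661] XFS (nbd11): SB validate failed with error 22.
[1062182.839223] XFS (nbd11): SB validate failed with error 22.
[2646674.388851] end_request: I/O error, dev nbd1, sector 0
[2646675.096139] end_request: I/O error, dev nbd11, sector 0
"""

if __name__ == "__main__":
    for dev, counts in sorted(errors_by_device(SAMPLE.splitlines()).items()):
        print(dev, counts)
```

All of the devices named in the output above are nbd (network block device) devices.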

Tags: area-python
JohnsonYi (yichengli) wrote :

For more information, mcollective is working fine:
root@node-4:/var/log# tail -f /var/log/mcollective.log
D, [2015-10-28T15:31:24.516765 #8291] DEBUG -- : rabbitmq.rb:66:in `on_hbfire' Publishing heartbeat to stomp://mcollective@10.14.20.2:61613: send_fire, {:curt=>1446046284.5165255, :last_sleep=>30.499642848968506}
D, [2015-10-28T15:31:47.750883 #8291] DEBUG -- : rabbitmq.rb:64:in `on_hbfire' Received heartbeat from stomp://mcollective@10.14.20.2:61613: receive_fire, {:curt=>1446046307.7507}
D, [2015-10-28T15:31:55.016810 #8291] DEBUG -- : rabbitmq.rb:66:in `on_hbfire' Publishing heartbeat to stomp://mcollective@10.14.20.2:61613: send_fire, {:curt=>1446046315.016635, :last_sleep=>30.49959111213684}
D, [2015-10-28T15:32:17.251278 #8291] DEBUG -- : rabbitmq.rb:64:in `on_hbfire' Received heartbeat from stomp://mcollective@10.14.20.2:61613: receive_fire, {:curt=>1446046337.2510948}
D, [2015-10-28T15:32:25.516946 #8291] DEBUG -- : rabbitmq.rb:66:in `on_hbfire' Publishing heartbeat to stomp://mcollective@10.14.20.2:61613: send_fire, {:curt=>1446046345.5167692, :last_sleep=>30.499600172042847}
D, [2015-10-28T15:32:46.751646 #8291] DEBUG -- : rabbitmq.rb:64:in `on_hbfire' Received heartbeat from stomp://mcollective@10.14.20.2:61613: receive_fire, {:curt=>1446046366.7514794}
D, [2015-10-28T15:32:56.017065 #8291] DEBUG -- : rabbitmq.rb:66:in `on_hbfire' Publishing heartbeat to stomp://mcollective@10.14.20.2:61613: send_fire, {:curt=>1446046376.0168877, :last_sleep=>30.499657154083252}
D, [2015-10-28T15:33:16.252112 #8291] DEBUG -- : rabbitmq.rb:64:in `on_hbfire' Received heartbeat from stomp://mcollective@10.14.20.2:61613: receive_fire, {:curt=>1446046396.2519305}
D, [2015-10-28T15:33:26.517184 #8291] DEBUG -- : rabbitmq.rb:66:in `on_hbfire' Publishing heartbeat to stomp://mcollective@10.14.20.2:61613: send_fire, {:curt=>1446046406.5170097, :last_sleep=>30.499665021896362}
D, [2015-10-28T15:33:45.752556 #8291] DEBUG -- : rabbitmq.rb:64:in `on_hbfire' Received heartbeat from stomp://mcollective@10.14.20.2:61613: receive_fire, {:curt=>1446046425.752374}
D, [2015-10-28T15:33:57.017319 #8291] DEBUG -- : rabbitmq.rb:66:in `on_hbfire' Publishing heartbeat to stomp://mcollective@10.14.20.2:61613: send_fire, {:curt=>1446046437.0171432, :last_sleep=>30.49962067604065}

JohnsonYi (yichengli) wrote :

Fuel 6.1

JohnsonYi (yichengli) wrote :

Error message from docker-nailgun.log on the Fuel node:
2015-10-30 02:46:36,281 DEBG 'oswl_tenant_collectord' stdout output:
2015-10-30 02:46:36.280 INFO [7f21a46a9700] (utils) Deleting set http_proxy environment variable

2015-10-30 02:46:44,052 DEBG 'oswl_flavor_collectord' stdout output:
2015-10-30 02:46:44.051 INFO [7f839268d700] (utils) Deleting set http_proxy environment variable

2015-10-30 02:47:12,812 DEBG 'oswl_image_collectord' stdout output:
2015-10-30 02:47:12.810 INFO [7f5fb0176700] (utils) Deleting set http_proxy environment variable

2015-10-30 02:47:21,958 DEBG 'oswl_volume_collectord' stdout output:
2015-10-30 02:47:21.958 INFO [7f8b62a8d700] (utils) Deleting set http_proxy environment variable

2015-10-30 02:47:58,715 DEBG 'oswl_vm_collectord' stdout output:
2015-10-30 02:47:58.711 ERROR [7f1390385700] (utils) Error while talking to proxy. Details: string indices must be integers
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/nailgun/statistics/utils.py", line 127, in set_proxy
    yield
  File "/usr/lib/python2.6/site-packages/nailgun/statistics/oswl/collector.py", line 65, in collect
    client_provider, resource_type)
  File "/usr/lib/python2.6/site-packages/nailgun/statistics/oswl/helpers.py", line 180, in get_info_from_os_resource_manager
    additional_display_options
  File "/usr/lib/python2.6/site-packages/nailgun/statistics/utils.py", line 202, in _get_data_from_resource_manager
    rule.path, rule.transform_func, obj_dict
  File "/usr/lib/python2.6/site-packages/nailgun/statistics/utils.py", line 84, in get_attr_value
    attrs = attrs[p]
TypeError: string indices must be integers

2015-10-30 02:47:58,718 DEBG 'oswl_vm_collectord' stdout output:
2015-10-30 02:47:58.714 INFO [7f1390385700] (utils) Deleting set http_proxy environment variable
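
The TypeError in the traceback is what Python raises when a string is indexed with a string key, which can happen when a path of keys is walked through nested data and one level holds a string instead of a dict. The sketch below reproduces that failure mode in isolation. `walk_path` and `safe_walk_path` are hypothetical stand-ins, not the actual nailgun `get_attr_value`, and the sample data is invented for illustration.

```python
def walk_path(path, data):
    """Follow a list of keys through nested dicts (hypothetical stand-in)."""
    attrs = data
    for p in path:
        attrs = attrs[p]  # raises TypeError if attrs is a str and p is a str
    return attrs


def safe_walk_path(path, data, default=None):
    """Same walk, but return `default` when a level is missing or not a dict."""
    attrs = data
    for p in path:
        if not isinstance(attrs, dict) or p not in attrs:
            return default
        attrs = attrs[p]
    return attrs


# Invented sample data: "image" is a dict in one record and a string in the other.
ok = {"image": {"id": "abc"}}
bad = {"image": ""}

if __name__ == "__main__":
    print(walk_path(["image", "id"], ok))  # prints abc
    try:
        walk_path(["image", "id"], bad)
    except TypeError as exc:
        print("TypeError:", exc)
    print(safe_walk_path(["image", "id"], bad))  # prints None
```

Whether this is what happened in the oswl_vm_collectord run cannot be confirmed from the log alone; the sketch only shows one way the same exception can arise.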

Maciej Relewicz (rlu) wrote :

Please attach a diagnostic snapshot of your environment.

Changed in fuel:
status: New → Incomplete
Dmitry Klenov (dklenov)
tags: added: area-python
Changed in fuel:
milestone: none → 8.0
assignee: nobody → Fuel Python Team (fuel-python)
importance: Undecided → Medium
Dmitry Klenov (dklenov) wrote :

@JohnsonYi,

We still need a diagnostic snapshot for this issue. Can you please attach it?

JohnsonYi (yichengli) wrote :

@Dmitry,

Sorry for neglecting to submit the snapshot. The environment has since been upgraded to Kilo, so I am not sure whether I can still submit it.

Roman Prykhodchenko (romcheg) wrote :

Marking as Invalid since it was Incomplete for longer than 3 weeks. If you find a proper snapshot, please submit it and set a proper status for this bug.

Changed in fuel:
status: Incomplete → Invalid