ceilometer-agent-ipmi fails to start

Bug #1746736 reported by Dmitriy Rabotyagov
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Ceilometer | Fix Released | Medium | gordon chung |

Bug Description

Hi,

I tried to launch the ceilometer IPMI agent on the nova compute node, but it fails with the following error:

======================================================================
2018-02-01 14:49:43.161 19797 ERROR ceilometer.agent.manager [-] Skip loading extension for hardware.ipmi.node.temperature: ExtensionLoadError
2018-02-01 14:49:43.161 19797 ERROR ceilometer.agent.manager Traceback (most recent call last):
2018-02-01 14:49:43.161 19797 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/stevedore/extension.py", line 186, in _load_plugins
2018-02-01 14:49:43.161 19797 ERROR ceilometer.agent.manager verify_requirements,
2018-02-01 14:49:43.161 19797 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/stevedore/extension.py", line 218, in _load_one_plugin
2018-02-01 14:49:43.161 19797 ERROR ceilometer.agent.manager obj = plugin(*invoke_args, **invoke_kwds)
2018-02-01 14:49:43.161 19797 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/ceilometer/agent/plugin_base.py", line 169, in __init__
2018-02-01 14:49:43.161 19797 ERROR ceilometer.agent.manager raise ExtensionLoadError(err)
2018-02-01 14:49:43.161 19797 ERROR ceilometer.agent.manager ExtensionLoadError
2018-02-01 14:49:43.161 19797 ERROR ceilometer.agent.manager
2018-02-01 14:49:43.168 18883 INFO cotyledon._service_manager [-] Child 19797 exited with status 1
2018-02-01 14:49:43.169 18883 INFO cotyledon._service_manager [-] Forking too fast, sleeping
2018-02-01 14:49:43.206 18883 INFO cotyledon._service_manager [-] Caught SIGTERM signal, graceful exiting of master process
======================================================================

This error is raised for every metric under hardware.ipmi.node.*.
I tried upgrading ceilometer from 9.0.2 to the latest development version (9.0.5), but it didn't resolve the problem.

Now the following versions are installed:

======================================================================
ii ceilometer-agent-compute 1:9.0.5.dev4.201801152158.xenial-0ubuntu1 all ceilometer compute agent
ii ceilometer-agent-ipmi 1:9.0.5.dev4.201801152158.xenial-0ubuntu1 all ceilometer ipmi agent
ii ceilometer-common 1:9.0.5.dev4.201801152158.xenial-0ubuntu1 all ceilometer common files
ii python-ceilometer 1:9.0.5.dev4.201801152158.xenial-0ubuntu1 all ceilometer python libraries
ii python-ceilometerclient 2.9.0-0ubuntu1~cloud0 all Client library for Openstack Ceilometer API server - Python 2.7
======================================================================

In ceilometer.conf, only the basic options are set in the [ipmi] section:
======================================================================
[ipmi]

#
# From ceilometer
#

# Number of retries upon Intel Node Manager initialization failure (integer
# value)
node_manager_init_retry = 3

# Tolerance of IPMI/NM polling failures before disable this pollster. Negative
# indicates retrying forever. (integer value)
polling_retry = 3
======================================================================

And ipmitool is installed and working correctly:

======================================================================
root@cl-comp-node01:~# python
Python 2.7.12 (default, Nov 20 2017, 18:23:56)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from ceilometer.ipmi.platform import ipmi_sensor
>>> sensor = ipmi_sensor.IPMISensor()
>>> sensor.check_ipmi()
True
>>> sensor.read_sensor_any('Current')
{'Current': {'System Level (0x98)': {'Status': 'ok', 'Deassertions Enabled': 'unc+ ucr+', 'Sensor Reading': '196 (+/- 0) Watts', 'Entity ID': '7.1 (System Board)', 'Assertions Enabled': 'unc+ ucr+', 'Positive Hysteresis': 'Unspecified', 'Assertion Events': '', 'Upper non-critical': '917.000', 'Event Message Control': 'Per-threshold', 'Normal Maximum': '336.000', 'Maximum sensor range': '1778.000', 'Sensor Type (Threshold)': 'Current (0x03)', 'Readable Thresholds': 'unc ucr', 'Negative Hysteresis': 'Unspecified', 'Upper critical': '966.000', 'Sensor ID': 'System Level (0x98)', 'Settable Thresholds': '', 'Minimum sensor range': 'Unspecified', 'Nominal Reading': '329.000'}, 'Current (0x95)': {'Status': 'ok', 'Sensor Reading': '0.400 (+/- 0) Amps', 'Entity ID': '10.2 (Power Supply)', 'Assertions Enabled': '', 'Positive Hysteresis': 'Unspecified', 'Assertion Events': '', 'Event Message Control': 'Per-threshold', 'Normal Maximum': '0.000', 'Sensor Type (Threshold)': 'Current (0x03)', 'Readable Thresholds': 'No Thresholds', 'Negative Hysteresis': 'Unspecified', 'Maximum sensor range': 'Unspecified', 'Sensor ID': 'Current (0x95)', 'Settable Thresholds': 'No Thresholds', 'Minimum sensor range': 'Unspecified', 'Nominal Reading': '0.000'}, 'Current (0x94)': {'Status': 'ok', 'Sensor Reading': '0.480 (+/- 0) Amps', 'Entity ID': '10.1 (Power Supply)', 'Assertions Enabled': '', 'Positive Hysteresis': 'Unspecified', 'Assertion Events': '', 'Event Message Control': 'Per-threshold', 'Normal Maximum': '0.000', 'Sensor Type (Threshold)': 'Current (0x03)', 'Readable Thresholds': 'No Thresholds', 'Negative Hysteresis': 'Unspecified', 'Maximum sensor range': 'Unspecified', 'Sensor ID': 'Current (0x94)', 'Settable Thresholds': 'No Thresholds', 'Minimum sensor range': 'Unspecified', 'Nominal Reading': '0.000'}}}
>>>
======================================================================

Thanks in advance.

description: updated
gordon chung (chungg)
Changed in ceilometer:
status: New → Triaged
importance: Undecided → Medium
importance: Medium → High
Revision history for this message
gordon chung (chungg) wrote :

i don't have this enabled in my environment, but ipmi_sensor is not what produced the error you pasted. that error comes from the pollster attempting to query NodeManager.

can you try against that? i'm guessing it's not detecting the NodeManager version correctly.

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Hi,

We actually don't have Intel NodeManager on the node, and it seems that /ceilometer/ipmi/platform/intel_node_manager.py correctly detects that NodeManager is not available:

>>> from ceilometer.ipmi.platform import intel_node_manager
>>> manager = intel_node_manager.NodeManager(config)
>>> manager.node_manager_version()
0
>>> manager.check_node_manager()
0
>>>

Is NodeManager required for agent-ipmi to work, or should ipmitool be enough?

Revision history for this message
gordon chung (chungg) wrote :

thanks for confirming.

in truth, i don't think it should be raising an error; it should just log it quietly and skip over it.
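As a rough illustration of "log it and skip over it", here is a minimal Python sketch of a loader that catches a per-plugin load error, logs it at debug level, and keeps going. The classes (`SensorPollster`, `NodeManagerPollster`, `ExtensionLoadError`) and `load_pollsters` are hypothetical stand-ins, not ceilometer's code; stevedore's `ExtensionManager` offers a comparable hook through its `on_load_failure_callback` argument.

```python
import logging

LOG = logging.getLogger("ipmi_agent_sketch")


class ExtensionLoadError(Exception):
    """Raised by a hypothetical pollster whose backend is unavailable."""


class SensorPollster:
    """Hypothetical pollster backed by ipmitool, which works on this node."""
    name = "hardware.ipmi.temperature"

    def __init__(self, conf):
        self.conf = conf


class NodeManagerPollster:
    """Hypothetical pollster that needs Intel NodeManager (absent here)."""
    name = "hardware.ipmi.node.temperature"

    def __init__(self, conf):
        raise ExtensionLoadError("Intel NodeManager is not available")


def load_pollsters(plugin_classes, conf):
    """Instantiate each plugin; log and skip any that fail to load."""
    loaded = []
    for plugin in plugin_classes:
        try:
            loaded.append(plugin(conf))
        except ExtensionLoadError as err:
            # Quietly note the failure instead of aborting agent start-up.
            LOG.debug("Skip loading extension for %s: %s", plugin.name, err)
    return loaded


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    pollsters = load_pollsters([SensorPollster, NodeManagerPollster], conf={})
    print([p.name for p in pollsters])  # ['hardware.ipmi.temperature']
```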

can you selectively choose only the sensor metrics?

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

I'm actually new to OpenStack. Could you give me a hint where that selection is configured, so I can try it?

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

If you mean selecting metrics via ipmitool from ipmi_sensor, then yes; I provided that output at the beginning of the bug. Otherwise I don't quite understand what you meant :(

Revision history for this message
gordon chung (chungg) wrote :

in the polling.yaml file you can choose which metrics get generated: https://docs.openstack.org/ceilometer/latest/admin/telemetry-data-collection.html#polling
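For reference, a minimal sketch of how a source's `meters` list can select pollsters, assuming shell-style wildcards (via Python's `fnmatch`) and a leading `!` for exclusions. `supports_meter` is a hypothetical helper, not ceilometer's implementation. Note that an exact name only selects a pollster registered under exactly that name.

```python
from fnmatch import fnmatch


def supports_meter(patterns, meter_name):
    """Return True if meter_name is selected by a polling.yaml meters list.

    Assumed semantics: entries are shell-style wildcards, and entries
    starting with '!' exclude matching meters.
    """
    excludes = [p[1:] for p in patterns if p.startswith("!")]
    includes = [p for p in patterns if not p.startswith("!")]
    if any(fnmatch(meter_name, p) for p in excludes):
        return False
    if any(fnmatch(meter_name, p) for p in includes):
        return True
    # A list made only of exclusions allows everything it doesn't exclude;
    # an empty list selects nothing.
    return bool(excludes) and not includes


print(supports_meter(["hardware.ipmi.node.temperature"],
                     "hardware.ipmi.temperature"))          # False
print(supports_meter(["hardware.ipmi.*"], "hardware.ipmi.fan"))  # True
```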

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Oh, yes, I thought about that, so I specified only one IPMI metric as a test. My polling.yaml now looks like this:
---
sources:
    - name: node
      interval: 60
      meters:
        - compute.node.cpu.frequency
        - compute.node.cpu.kernel.percent
        - compute.node.cpu.idle.percent
        - compute.node.cpu.user.percent
        - compute.node.cpu.iowait.percent

    - name: ipmi
      interval: 600
      meters:
        - hardware.ipmi.node.temperature

However, after starting the ceilometer-agent-ipmi service, I see exceptions in error.log for meters that are not included in polling.yaml, such as hardware.ipmi.fan, hardware.ipmi.node.mem_util, hardware.ipmi.node.io_util, etc.
So it probably ignores polling.yaml for some reason.

Revision history for this message
gordon chung (chungg) wrote :

what are the exceptions?

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

The same ones I posted at the beginning of the bug:

2018-02-05 15:30:43.903 17195 ERROR ceilometer.agent.manager [-] Skip loading extension for hardware.ipmi.node.mem_util: ExtensionLoadError
2018-02-05 15:30:43.903 17195 ERROR ceilometer.agent.manager Traceback (most recent call last):
2018-02-05 15:30:43.903 17195 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/stevedore/extension.py", line 186, in _load_plugins
2018-02-05 15:30:43.903 17195 ERROR ceilometer.agent.manager verify_requirements,
2018-02-05 15:30:43.903 17195 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/stevedore/extension.py", line 218, in _load_one_plugin
2018-02-05 15:30:43.903 17195 ERROR ceilometer.agent.manager obj = plugin(*invoke_args, **invoke_kwds)
2018-02-05 15:30:43.903 17195 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/ceilometer/agent/plugin_base.py", line 169, in __init__
2018-02-05 15:30:43.903 17195 ERROR ceilometer.agent.manager raise ExtensionLoadError(err)
2018-02-05 15:30:43.903 17195 ERROR ceilometer.agent.manager ExtensionLoadError
2018-02-05 15:30:43.903 17195 ERROR ceilometer.agent.manager
2018-02-05 15:30:43.932 17195 ERROR ceilometer.agent.manager [-] Skip loading extension for hardware.ipmi.node.io_util: ExtensionLoadError
2018-02-05 15:30:43.932 17195 ERROR ceilometer.agent.manager Traceback (most recent call last):
2018-02-05 15:30:43.932 17195 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/stevedore/extension.py", line 186, in _load_plugins
2018-02-05 15:30:43.932 17195 ERROR ceilometer.agent.manager verify_requirements,
2018-02-05 15:30:43.932 17195 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/stevedore/extension.py", line 218, in _load_one_plugin
2018-02-05 15:30:43.932 17195 ERROR ceilometer.agent.manager obj = plugin(*invoke_args, **invoke_kwds)
2018-02-05 15:30:43.932 17195 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/ceilometer/agent/plugin_base.py", line 169, in __init__
2018-02-05 15:30:43.932 17195 ERROR ceilometer.agent.manager raise ExtensionLoadError(err)
2018-02-05 15:30:43.932 17195 ERROR ceilometer.agent.manager ExtensionLoadError
2018-02-05 15:30:43.932 17195 ERROR ceilometer.agent.manager
2018-02-05 15:30:43.960 17195 ERROR ceilometer.agent.manager [-] Skip loading extension for hardware.ipmi.node.temperature: ExtensionLoadError
2018-02-05 15:30:43.960 17195 ERROR ceilometer.agent.manager Traceback (most recent call last):
2018-02-05 15:30:43.960 17195 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/stevedore/extension.py", line 186, in _load_plugins
2018-02-05 15:30:43.960 17195 ERROR ceilometer.agent.manager verify_requirements,
2018-02-05 15:30:43.960 17195 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/stevedore/extension.py", line 218, in _load_one_plugin
2018-02-05 15:30:43.960 17195 ERROR ceilometer.agent.manager obj = plugin(*invoke_args, **invoke_kwds)
2018-02-05 15:30:43.960 17195 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/ceilometer/agent/plugin_base.py...


Revision history for this message
gordon chung (chungg) wrote :

does it still fail to start, or does it start but log errors?

when you start the service, it should log the polling.yaml file it loaded, with "Config File: <polling.yaml>" [1]

[1] https://github.com/openstack/ceilometer/blob/stable/pike/ceilometer/pipeline.py#L670

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

It still fails with the same error. When I submitted the bug, polling.yaml was already set up as shown.

I don't see that message in the log, so it probably fails before reaching that line. Here's the beginning of /var/log/ceilometer/ceilometer-agent-ipmi.log once the service starts:

2018-02-05 20:22:48.612 13372 WARNING oslo_reports.guru_meditation_report [-] Guru meditation now registers SIGUSR1 and SIGUSR2 by default for backward compatibility. SIGUSR1 will no longer be registered in a future release, so please use SIGUSR2 to generate reports.
2018-02-05 20:22:48.615 13372 WARNING oslo_config.cfg [-] Option "meter_dispatchers" from group "DEFAULT" is deprecated for removal (This option only be used in collector service, the collector service has been deprecated and will be removed in the future, this should also be deprecated for removal with collector service.). Its value may be silently ignored in the future.
2018-02-05 20:22:48.660 13614 ERROR ceilometer.agent.manager [-] Skip loading extension for hardware.ipmi.voltage: ExtensionLoadError
2018-02-05 20:22:48.660 13614 ERROR ceilometer.agent.manager Traceback (most recent call last):
2018-02-05 20:22:48.660 13614 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/stevedore/extension.py", line 186, in _load_plugins
2018-02-05 20:22:48.660 13614 ERROR ceilometer.agent.manager verify_requirements,
2018-02-05 20:22:48.660 13614 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/stevedore/extension.py", line 218, in _load_one_plugin
2018-02-05 20:22:48.660 13614 ERROR ceilometer.agent.manager obj = plugin(*invoke_args, **invoke_kwds)
2018-02-05 20:22:48.660 13614 ERROR ceilometer.agent.manager File "/usr/lib/python2.7/dist-packages/ceilometer/agent/plugin_base.py", line 169, in __init__
2018-02-05 20:22:48.660 13614 ERROR ceilometer.agent.manager raise ExtensionLoadError(err)
2018-02-05 20:22:48.660 13614 ERROR ceilometer.agent.manager ExtensionLoadError
2018-02-05 20:22:48.660 13614 ERROR ceilometer.agent.manager

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Ok, I guess the problem is that ceilometer-agent-ipmi runs as the ceilometer user, and getting IPMI info requires root privileges. But I don't know how to set this up the correct way. Can you help me with that?

root@cl-comp-node01:~# python
Python 2.7.12 (default, Nov 20 2017, 18:23:56)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from ceilometer.ipmi.platform import ipmi_sensor
>>> sensor = ipmi_sensor.IPMISensor()
>>>
root@cl-comp-node01:~# su ceilometer
ceilometer@cl-comp-node01:/home/techs$ python
Python 2.7.12 (default, Nov 20 2017, 18:23:56)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from ceilometer.ipmi.platform import ipmi_sensor
>>> sensor = ipmi_sensor.IPMISensor()
[sudo] password for ceilometer:

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

I tried adding the following lines to /etc/sudoers, but that was not enough:

ceilometer ALL=NOPASSWD: CEILOMETER
Cmnd_Alias CEILOMETER = /usr/bin/ipmitool, \
                        /dev/ipmi0

However, this line works and the agent starts:
ceilometer ALL=(ALL) NOPASSWD:ALL

So the question is: what else besides ipmitool does ceilometer ipmi need in order to work correctly?

Revision history for this message
gordon chung (chungg) wrote :

hmm.. ok, i think the code unfortunately loads all extensions before even looking at polling.yaml. i'll try to address this.

if i understand correctly, the ipmi agent should already run as root. we use the rootwrap utility to execute commands with elevated privileges.

Revision history for this message
gordon chung (chungg) wrote :

is ERROR ceilometer.agent.manager ExtensionLoadError the last log you see before the service stops running? can you turn on debug=True in ceilometer.conf?

the code should really keep going even with those errors.

Revision history for this message
gordon chung (chungg) wrote :

you need something like https://docs.openstack.org/nova/pike/admin/root-wrap-reference.html to set up rootwrap

you can also look at the devstack configure_rootwrap command.
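Rootwrap works by prefixing each privileged command with a root helper. The debug log further down in this thread shows the agent running `sudo ceilometer-rootwrap /etc/ceilometer/rootwrap.conf ipmitool sdr info`; the sketch below only rebuilds that argv in Python to make the shape of the call explicit. `wrap_command` and the `ROOT_HELPER` default are assumptions copied from that log line, not a ceilometer API.

```python
import shlex

# Root helper string as it appears in the agent's debug log in this thread;
# an assumption for this sketch, not a value read from ceilometer itself.
ROOT_HELPER = "sudo ceilometer-rootwrap /etc/ceilometer/rootwrap.conf"


def wrap_command(cmd, root_helper=ROOT_HELPER):
    """Prefix an unprivileged command (list of args) with the root helper."""
    return shlex.split(root_helper) + list(cmd)


argv = wrap_command(["ipmitool", "sdr", "info"])
print(" ".join(argv))
# sudo ceilometer-rootwrap /etc/ceilometer/rootwrap.conf ipmitool sdr info
```

Because sudo sees `ceilometer-rootwrap` as the command, the sudoers entry has to allow that binary, and rootwrap's own filter files then decide which wrapped commands (such as ipmitool) are permitted.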

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Ok, I've configured rootwrap for ceilometer and it works as expected. I hadn't configured rootwrap for any other service yet, so it wasn't obvious that rootwrap was required.

But it seems the pollsters still don't work, as I don't see them being launched in the log (as I do for agent-compute).

I've enabled debug in ceilometer.conf and attached the output.

PS: I've changed polling.yaml slightly; it now looks like this:
---
sources:
    - name: instance
      interval: 300
      meters:
        - cpu
        - memory.usage
        #- network.incoming.bytes
        #- network.incoming.packets
        #- network.outgoing.bytes
        #- network.outgoing.packets
        - disk.read.bytes
        - disk.read.requests
        - disk.write.bytes
        - disk.write.requests
        - hardware.cpu.util
        - hardware.memory.used
        - hardware.memory.total
        - hardware.memory.buffer
        - hardware.memory.cached
        - hardware.memory.swap.avail
        - hardware.memory.swap.total
        - hardware.system_stats.io.outgoing.blocks
        - hardware.system_stats.io.incoming.blocks
        - hardware.network.ip.incoming.datagrams
        - hardware.network.ip.outgoing.datagrams

    - name: node
      interval: 60
      meters:
        - compute.node.cpu.frequency
        - compute.node.cpu.kernel.percent
        - compute.node.cpu.idle.percent
        - compute.node.cpu.user.percent
        - compute.node.cpu.iowait.percent

    - name: ipmi
      interval: 600
      meters:
        - hardware.ipmi.node.temperature
        - hardware.ipmi.node.power

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Debug messages are pretty strange:

2018-02-06 12:56:56.794 14897 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): sudo ceilometer-rootwrap /etc/ceilometer/rootwrap.conf ipmitool sdr info execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:355
2018-02-06 12:56:56.934 14897 DEBUG oslo_concurrency.processutils [-] CMD "sudo ceilometer-rootwrap /etc/ceilometer/rootwrap.conf ipmitool sdr info" returned: 0 in 0.140s execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:385

But I don't see any problems running that command, either from the shell or via subprocess's communicate():

ceilometer@cl-comp-node01:/home/techs$ sudo ceilometer-rootwrap /etc/ceilometer/rootwrap.conf ipmitool sdr info
SDR Version : 0x51
Record Count : 125
Free Space : 1342 bytes
Most recent Addition : 02/07/2106 06:28:15
Most recent Erase : 02/07/2106 06:28:15
SDR overflow : no
SDR Repository Update Support : modal
Delete SDR supported : no
Partial Add SDR supported : no
Reserve SDR repository supported : yes
SDR Repository Alloc info supported : no
ceilometer@cl-comp-node01:/home/techs$ python
Python 2.7.12 (default, Nov 20 2017, 18:23:56)
[GCC 5.4.0 20160609] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from subprocess import Popen, PIPE
>>> obj = Popen(["sudo", "ceilometer-rootwrap", "/etc/ceilometer/rootwrap.conf", "ipmitool", "sdr", "info"], stdout=PIPE)
>>> obj_res = obj.communicate()
>>> print(obj_res[0])
SDR Version : 0x51
Record Count : 125
Free Space : 1342 bytes
Most recent Addition : 02/07/2106 06:28:15
Most recent Erase : 02/07/2106 06:28:15
SDR overflow : no
SDR Repository Update Support : modal
Delete SDR supported : no
Partial Add SDR supported : no
Reserve SDR repository supported : yes
SDR Repository Alloc info supported : no

>>>
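The actual data is on stdout as `Key : Value` lines. To inspect it programmatically, a small hypothetical helper (not part of ceilometer or ipmitool) can split each line on its first colon, which keeps timestamps such as `06:28:15` intact:

```python
def parse_sdr_info(text):
    """Parse 'Key : Value' lines, as printed by `ipmitool sdr info`, into a dict."""
    info = {}
    for line in text.splitlines():
        # partition() splits on the first colon only, so values that
        # themselves contain colons (timestamps) are kept whole.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


sample = """SDR Version : 0x51
Record Count : 125
Most recent Addition : 02/07/2106 06:28:15
"""
print(parse_sdr_info(sample)["Record Count"])  # 125
```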

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

My fault: the debug messages are correct, since they print the exit status of the launched command. As it finishes without error, it "returns" 0.
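The difference between the exit status and the output can be shown with Python's `subprocess` module. This sketch assumes Python 3.5+ for `subprocess.run` (the agent in this thread runs on Python 2.7), and uses a tiny Python child process as a portable stand-in for the ipmitool call:

```python
import subprocess
import sys


def run(argv):
    """Run a command and return (exit_code, stdout_text)."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE, universal_newlines=True)
    return proc.returncode, proc.stdout


# Stand-in for `ipmitool sdr info`: a child process that prints one line.
code, out = run([sys.executable, "-c", "print('SDR Version : 0x51')"])
print("returned:", code)  # what the DEBUG line reports (the exit code)
print(out, end="")        # the data the caller actually parses
```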

And I haven't seen the IPMI pollster run at all since the service started, so the IPMI agent daemon seems to be running but doing nothing.
So configuring rootwrap resolved the start-up failure, but the agent still isn't collecting anything.

Revision history for this message
gordon chung (chungg) wrote :

just fyi, the compute.node.* meters aren't polled; you need to enable compute monitors in nova and it will send notifications periodically: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html

can you add log messages to https://github.com/openstack/ceilometer/blob/stable/pike/ceilometer/ipmi/pollsters/sensor.py#L99

if something is logged here and you see nothing in the db, then something is wrong with the notification agent (probably an incorrect pipeline.yaml)

also, if we get this working, it'd be great if we document how to enable this stuff. the original maintainer of this code has moved on and took all knowledge with him

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Yep, I already got compute.node.* working but forgot to remove it from polling.yaml afterwards. It is removed now, thanks for the tip.

I added the following code to ceilometer/ipmi/pollsters/sensor.py at line 99:
with open("/var/log/ceilometer/debug-ipmi.log", "a") as log:
    log.write("sensor_id: {0} name: {1}\n".format(resource_id, self.METRIC.lower()))

It should write something to my debug-ipmi.log (which is owned by ceilometer), but the file is empty, so this block of code apparently never runs.

In pipeline.yaml I expect metrics to be gathered by this rule:
sources:
    - name: meter_source
      meters:
          - "*"
      sinks:
          - meter_sink
sinks:
    - name: meter_sink
      transformers:
      publishers:
          - gnocchi://

It seems that on agent start it uses stevedore for some reason, but I don't know how or why it is used. I pasted the following after the line https://github.com/openstack/stevedore/blob/stable/pike/stevedore/extension.py#L218:

with open("/var/log/ceilometer/debug-ipmi.log", "a") as log:
    log.write("plugin: {0}, args: {1}, kwargs: {2} \n".format(plugin, invoke_args, invoke_kwds))

and got the following on ipmi agent start:
plugin: <class 'ceilometer.ipmi.pollsters.sensor.VoltageSensorPollster'>, args: (<oslo_config.cfg.ConfigOpts object at 0x7faddebbff10>,), kwargs: {}
plugin: <class 'ceilometer.ipmi.pollsters.node.PowerPollster'>, args: (<oslo_config.cfg.ConfigOpts object at 0x7faddebbff10>,), kwargs: {}
plugin: <class 'ceilometer.ipmi.pollsters.node.CUPSIndexPollster'>, args: (<oslo_config.cfg.ConfigOpts object at 0x7faddebbff10>,), kwargs: {}
plugin: <class 'ceilometer.ipmi.pollsters.sensor.CurrentSensorPollster'>, args: (<oslo_config.cfg.ConfigOpts object at 0x7faddebbff10>,), kwargs: {}
plugin: <class 'ceilometer.ipmi.pollsters.node.CPUUtilPollster'>, args: (<oslo_config.cfg.ConfigOpts object at 0x7faddebbff10>,), kwargs: {}
plugin: <class 'ceilometer.ipmi.pollsters.node.OutletTemperaturePollster'>, args: (<oslo_config.cfg.ConfigOpts object at 0x7faddebbff10>,), kwargs: {}
plugin: <class 'ceilometer.ipmi.pollsters.node.AirflowPollster'>, args: (<oslo_config.cfg.ConfigOpts object at 0x7faddebbff10>,), kwargs: {}
plugin: <class 'ceilometer.ipmi.pollsters.sensor.TemperatureSensorPollster'>, args: (<oslo_config.cfg.ConfigOpts object at 0x7faddebbff10>,), kwargs: {}
plugin: <class 'ceilometer.ipmi.pollsters.sensor.FanSensorPollster'>, args: (<oslo_config.cfg.ConfigOpts object at 0x7faddebbff10>,), kwargs: {}
plugin: <class 'ceilometer.ipmi.pollsters.node.MemUtilPollster'>, args: (<oslo_config.cfg.ConfigOpts object at 0x7faddebbff10>,), kwargs: {}
plugin: <class 'ceilometer.ipmi.pollsters.node.IOUtilPollster'>, args: (<oslo_config.cfg.ConfigOpts object at 0x7faddebbff10>,), kwargs: {}
plugin: <class 'ceilometer.ipmi.pollsters.node.InletTemperaturePollster'>, args: (<oslo_config.cfg.ConfigOpts object at 0x7faddebbff10>,), kwargs: {}
plugin: <class 'ceilometer.agent.discovery.localnode.LocalNodeDiscovery'>, args: (<oslo_config.cfg.ConfigOpts object at 0x7faddebbff10>,), kwargs: {}
plugin: <class 'oslo_messaging._drivers.impl_rabbit.Rab...


Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Actually, I realized that my try/except was slightly wrong. With that fixed, I see all the required objects in stevedore:

object: <ceilometer.ipmi.pollsters.sensor.VoltageSensorPollster object at 0x7f0c7eb24990>
object: <ceilometer.ipmi.pollsters.sensor.CurrentSensorPollster object at 0x7f0c7e272250>
object: <ceilometer.ipmi.pollsters.sensor.TemperatureSensorPollster object at 0x7f0c7e27d2d0>
object: <ceilometer.ipmi.pollsters.sensor.FanSensorPollster object at 0x7f0c7e27d410>
object: <ceilometer.agent.discovery.localnode.LocalNodeDiscovery object at 0x7f0c7e27d5d0>
object: <oslo_messaging._drivers.impl_rabbit.RabbitDriver object at 0x7f0c7e27d790>
object: <oslo_messaging.notify.messaging.MessagingV2Driver object at 0x7f0c7eb24e50>

So you were right that these errors in the error log shouldn't have this effect.

Revision history for this message
gordon chung (chungg) wrote :

do you see info level logs starting with "Polling pollster"? specifically for ipmi?

you can also set up logs here: https://github.com/openstack/ceilometer/blob/stable/pike/ceilometer/agent/manager.py#L371-L382

this is where the polling tasks are created.

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

No, there are no such messages in the agent-ipmi log, while I do see them regularly for agent-compute.

Here are the results of debugging the setup_polling_tasks() method.
First, I logged the polling_tasks value at agent-ipmi start. It is an empty dict, while on agent-compute it is {interval: ceilometer.agent.manager.PollingTask}.

The pollsters are the extensions retrieved via stevedore; their pollster.name values are:
['hardware.ipmi.voltage', 'hardware.ipmi.current', 'hardware.ipmi.temperature', 'hardware.ipmi.fan']

However, source.support_meter() returns False for every one of them, so the if condition never passes and no polling task is created.
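A simplified model of that logic, written only to illustrate the behaviour described above (it is not ceilometer's `setup_polling_tasks`, and it ignores `!` exclusions): each loaded pollster name is checked against every source's `meters`, and matches are grouped by the source's interval. With the four names listed above and `hardware.ipmi.node.temperature` configured, nothing matches, so the result is empty.

```python
from collections import defaultdict
from fnmatch import fnmatch


def setup_polling_tasks(sources, pollster_names):
    """Group (source, pollster) pairs by polling interval.

    sources: list of dicts shaped like polling.yaml entries,
             e.g. {"name": "ipmi", "interval": 600, "meters": [...]}.
    """
    tasks = defaultdict(list)
    for source in sources:
        for name in pollster_names:
            # Simplified stand-in for source.support_meter(name).
            if any(fnmatch(name, pattern) for pattern in source["meters"]):
                tasks[source["interval"]].append((source["name"], name))
    return dict(tasks)


loaded = ["hardware.ipmi.voltage", "hardware.ipmi.current",
          "hardware.ipmi.temperature", "hardware.ipmi.fan"]

print(setup_polling_tasks(
    [{"name": "ipmi", "interval": 600,
      "meters": ["hardware.ipmi.node.temperature"]}], loaded))  # {}
print(setup_polling_tasks(
    [{"name": "ipmi", "interval": 600,
      "meters": ["hardware.ipmi.temperature"]}], loaded))
# {600: [('ipmi', 'hardware.ipmi.temperature')]}
```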

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

So, judging by the pollster.name list (and the documentation), these extensions seem to be meant for the Bare Metal service, which we don't use.

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Sorry for posting small updates, but I thought this might be important.

It seems I've managed to get the pollster running: I replaced hardware.ipmi.node.temperature with hardware.ipmi.temperature in my /etc/ceilometer/polling.yaml.

This is what I logged with hardware.ipmi.node.temperature:
======================================================
pollster.name: hardware.ipmi.voltage, source.support_meter(pollster.name): False, source: instance
pollster.name: hardware.ipmi.current, source.support_meter(pollster.name): False, source: instance
pollster.name: hardware.ipmi.temperature, source.support_meter(pollster.name): False, source: instance
pollster.name: hardware.ipmi.fan, source.support_meter(pollster.name): False, source: instance
pollster.name: hardware.ipmi.voltage, source.support_meter(pollster.name): False, source: ipmi
pollster.name: hardware.ipmi.current, source.support_meter(pollster.name): False, source: ipmi
pollster.name: hardware.ipmi.temperature, source.support_meter(pollster.name): False, source: ipmi
pollster.name: hardware.ipmi.fan, source.support_meter(pollster.name): False, source: ipmi
polling_tasks: {}
======================================================

This is what I logged with hardware.ipmi.temperature:

======================================================
pollster.name: hardware.ipmi.voltage, source.support_meter(pollster.name): False, source: instance
pollster.name: hardware.ipmi.current, source.support_meter(pollster.name): False, source: instance
pollster.name: hardware.ipmi.temperature, source.support_meter(pollster.name): False, source: instance
pollster.name: hardware.ipmi.fan, source.support_meter(pollster.name): False, source: instance
pollster.name: hardware.ipmi.voltage, source.support_meter(pollster.name): False, source: ipmi
pollster.name: hardware.ipmi.current, source.support_meter(pollster.name): False, source: ipmi
pollster.name: hardware.ipmi.temperature, source.support_meter(pollster.name): True, source: ipmi
pollster.name: hardware.ipmi.fan, source.support_meter(pollster.name): False, source: ipmi
polling_tasks: {600: <ceilometer.agent.manager.PollingTask object at 0x7f1c10029050>}
======================================================

And 20 minutes later I got the following in the log:
2018-02-08 18:04:21.558 23534 INFO ceilometer.agent.manager [-] Polling pollster hardware.ipmi.temperature in the context of ipmi
2018-02-08 18:04:21.559 23534 DEBUG oslo_concurrency.processutils [-] Running cmd (subprocess): sudo ceilometer-rootwrap /etc/ceilometer/rootwrap.conf ipmitool sdr -v type Temperature execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:355
2018-02-08 18:04:23.930 23534 DEBUG oslo_concurrency.processutils [-] CMD "sudo ceilometer-rootwrap /etc/ceilometer/rootwrap.conf ipmitool sdr -v type Temperature" returned: 0 in 2.372s execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:385

This is a bit strange, as the documentation states that hardware.ipmi.temperature comes from ironic via notifications, not polling.

Revision history for this message
gordon chung (chungg) wrote :

sigh. i'm so sorry i didn't notice it. hardware.ipmi.temperature (without node) is correct.

the pollsters that actually get loaded are defined in: https://github.com/openstack/ceilometer/blob/ffc9ee99c10ede988769907fdb0594a512c890cd/setup.cfg#L129-L140 and you'll notice they don't use 'node' for sensor data

it seems the value being returned is wrong? or is 0 the expected value of Temperature?

can you identify the places where we need to improve docs?
- how to set up ipmi with rootwrap
- measurement names are wrong?

Revision history for this message
gordon chung (chungg) wrote :

ok it's wrong in the gnocchi_resources.yaml mappings too.

Revision history for this message
gordon chung (chungg) wrote :

oh wait. there's both nodemanager and sensor metrics that capture temperature.
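Because the sensor pollsters and the NodeManager pollsters use such similar names, a quick sanity check is to compare the configured meters against the pollster names that actually loaded on the node (the four logged earlier in this thread) and suggest the closest match with `difflib.get_close_matches`. `check_meters` is a hypothetical helper; it handles exact names only, not wildcards, and a name such as hardware.ipmi.node.temperature is only "wrong" here because NodeManager is unavailable on this node.

```python
import difflib

# Pollster names that loaded on this node, as logged earlier in the thread.
LOADED = ["hardware.ipmi.voltage", "hardware.ipmi.current",
          "hardware.ipmi.temperature", "hardware.ipmi.fan"]


def check_meters(configured, loaded=LOADED):
    """Map each configured meter with no loaded pollster to its closest match."""
    problems = {}
    for meter in configured:
        if meter not in loaded:
            guess = difflib.get_close_matches(meter, loaded, n=1)
            problems[meter] = guess[0] if guess else None
    return problems


print(check_meters(["hardware.ipmi.node.temperature", "hardware.ipmi.fan"]))
# {'hardware.ipmi.node.temperature': 'hardware.ipmi.temperature'}
```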

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

There is no installation guide for ceilometer-agent-ipmi at all, so it would be nice to create one and specify info about rootwrap configuration in it.

The 0 is not retrieved data; it is the logged exit code of the command. Since the command exits without error, it is 0. It disappointed me as well :)

Info about the valid measurements is given here: https://docs.openstack.org/ceilometer/pike/admin/telemetry-measurements.html#ipmi-based-meters

And yes, I didn't receive any data in gnocchi. I suspected a wrong config in meters.d, but I haven't had time to check that.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to ceilometer (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/542949

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

I meant that getting the exit code instead of the output confused me, not disappointed me. Sorry for this "typo" :)

Revision history for this message
gordon chung (chungg) wrote :

you didn't get data because we missed telling gnocchi to add it. i've added it in https://review.openstack.org/542949 but you'll need to run ceilometer-upgrade

could you help by adding instructions on how to set up the ipmi agent with rootwrap?

i also see that the measurements page needs to be updated to say that certain ipmi meters come from pollsters, not just from notifications.

now we need to see why it's not retrieving data.

Revision history for this message
gordon chung (chungg) wrote :

oh. ok. so it 'works'; the 0 is the exit code and not the value. i agree, that's a dumb log.

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

It is retrieving data.

2018-02-08 18:04:23.930 23534 DEBUG oslo_concurrency.processutils [-] CMD "sudo ceilometer-rootwrap /etc/ceilometer/rootwrap.conf ipmitool sdr -v type Temperature" returned: 0 in 2.372s execute /usr/lib/python2.7/dist-packages/oslo_concurrency/processutils.py:385

"returned: 0" means that the command exited with code 0. The DEBUG line logs the exit code, not the command's output (the retrieved data), which is confusing. This is described here: https://github.com/openstack/oslo.concurrency/blob/stable/pike/oslo_concurrency/processutils.py#L401-L404
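To make the distinction concrete, here is a minimal Python sketch. It uses the standard-library subprocess module as a stand-in. It is not oslo.concurrency's processutils.execute, which returns a (stdout, stderr) tuple. The exit code is what a line like "returned: 0" reports, and the sensor readings are in stdout. The child command is a hypothetical placeholder that prints a fake reading instead of calling ipmitool.

```python
import subprocess
import sys


def run_and_log(cmd):
    """Run cmd, log it the way the DEBUG line does, and return (exit_code, stdout)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    # Only the exit code is logged, mirroring 'CMD "..." returned: 0'.
    print(f'CMD "{" ".join(cmd)}" returned: {result.returncode}')
    # The data a pollster actually parses is in stdout, which is not logged here.
    return result.returncode, result.stdout


if __name__ == "__main__":
    # Hypothetical stand-in for "ipmitool sdr -v type Temperature".
    fake_ipmitool = [sys.executable, "-c", "print('Sensor Reading : 17 degrees C')"]
    code, output = run_and_log(fake_ipmitool)
    print("exit code:", code)
    print("data:", output.strip())
```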

So the data should be there. I will try to apply the patch and let you know the result.

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Installation turns out to be pretty simple:

1. Install ceilometer-agent-ipmi
apt install ceilometer-agent-ipmi

2. Add the following line to /etc/sudoers, so that the ceilometer user can run ceilometer-rootwrap as root without a password:
ceilometer ALL = (root) NOPASSWD: /usr/bin/ceilometer-rootwrap /etc/ceilometer/rootwrap.conf *

3. Add the meters to /etc/ceilometer/polling.yaml, as an entry under sources:
- name: ipmi
  interval: 300
  meters:
    - hardware.ipmi.temperature
    - ...
4. Start the service:
systemctl start ceilometer-agent-ipmi
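The polling part of step 3 can be sanity-checked before starting the agent. This is a minimal sketch with a hypothetical helper, build_ipmi_source. It builds the polling.yaml source shown above and flags hardware.ipmi.node.* meters. Per this thread, those come from node manager rather than from ipmitool sensor data, and they are the meters that failed to load in the original report. The helper emits JSON, which is a subset of YAML 1.2, so the output should be usable as polling.yaml content. The source name and interval are the values from step 3.

```python
import json

# Per the discussion above: hardware.ipmi.node.* meters are node manager
# pollsters; plain hardware.ipmi.* meters come from ipmitool sensor data.
NODE_MANAGER_PREFIX = "hardware.ipmi.node."


def build_ipmi_source(meters, name="ipmi", interval=300):
    """Hypothetical helper: return (polling config dict, node manager meters)."""
    node_meters = [m for m in meters if m.startswith(NODE_MANAGER_PREFIX)]
    config = {"sources": [{"name": name, "interval": interval, "meters": list(meters)}]}
    return config, node_meters


if __name__ == "__main__":
    config, node_meters = build_ipmi_source(
        ["hardware.ipmi.temperature", "hardware.ipmi.node.temperature"]
    )
    for meter in node_meters:
        print(f"warning: {meter} is a node manager meter; "
              "for ipmitool sensor data use the name without 'node'")
    # JSON is valid YAML 1.2, so this output can go into polling.yaml.
    print(json.dumps(config, indent=2))
```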

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Sorry for the stupid question:
Should this patch be applied on the central agent side or on the ipmi agent side?

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Never mind. I applied the changes on the controller node, ran ceilometer-upgrade, and restarted ceilometer-agent-central and ceilometer-agent-notification.

An ipmi_sensor resource now shows up in gnocchi, and I get measures for hardware.ipmi.temperature:
# gnocchi measures show 84a27198-3cee-478a-a85c-e55310939c69
+---------------------------+-------------+-------+
| timestamp                 | granularity | value |
+---------------------------+-------------+-------+
| 2018-02-09T20:00:00+02:00 | 3600.0      | 17.0  |
+---------------------------+-------------+-------+

So thank you very much for your help and for resolving the bug so quickly.

Revision history for this message
Dmitriy Rabotyagov (noonedeadpunk) wrote :

Missed output, the ipmi_sensor resource in gnocchi:
+-----------------------+-------------------------------------------------------------------+
| Field                 | Value                                                             |
+-----------------------+-------------------------------------------------------------------+
| created_by_project_id | 82daa8818ff34f9eb9b1db71235b3733                                  |
| created_by_user_id    | d28a5b46a2624ea7b50297dc5c98b126                                  |
| creator               | d28a5b46a2624ea7b50297dc5c98b126:82daa8818ff34f9eb9b1db71235b3733 |
| ended_at              | None                                                              |
| id                    | cdd52934-c99f-5e5c-90df-8bf8761f6046                              |
| metrics               | hardware.ipmi.current: f7a08d9a-b1d3-44e3-b72f-1ecfd9bd504a       |
|                       | hardware.ipmi.power: 734fda4b-5461-4669-b255-538fb907eae5         |
|                       | hardware.ipmi.temperature: 84a27198-3cee-478a-a85c-e55310939c69   |
|                       | hardware.ipmi.voltage: 065aec41-014f-4dd3-b139-25dfe0c4dedb       |
| original_resource_id  | cl-comp-node01-ambient_temp_(0xe)                                 |
| project_id            | None                                                              |
| revision_end          | None                                                              |
| revision_start        | 2018-02-09T18:34:23.320098+00:00                                  |
| started_at            | 2018-02-09T18:34:23.320079+00:00                                  |
| type                  | ipmi_sensor                                                       |
| user_id               | None                                                              |
+-----------------------+-------------------------------------------------------------------+

Revision history for this message
gordon chung (chungg) wrote :

yay! sorry for the delayed response, i was tracking another item. glad to see it working now.

now the docs just need to be updated to fix this.

thanks again for all the help debugging this.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/543036

Changed in ceilometer:
assignee: nobody → gordon chung (chungg)
status: Triaged → In Progress
gordon chung (chungg)
Changed in ceilometer:
importance: High → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to ceilometer (master)

Reviewed: https://review.openstack.org/542949
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=663c523328690dfcc30c1ad986ba57e566bd194c
Submitter: Zuul
Branch: master

commit 663c523328690dfcc30c1ad986ba57e566bd194c
Author: gord chung <email address hidden>
Date: Fri Feb 9 12:05:55 2018 -0500

    add ipmi sensor data to gnocchi

    we've been missing this data for a while.

    Change-Id: I0df15c3e2f4ce98a41320a711e1f18d2c5d7c34d
    Related-Bug: #1746736

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (master)

Reviewed: https://review.openstack.org/543036
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=8e06ebcecfcf7245dbbfe693e0cae36ed7a92952
Submitter: Zuul
Branch: master

commit 8e06ebcecfcf7245dbbfe693e0cae36ed7a92952
Author: gord chung <email address hidden>
Date: Fri Feb 9 21:56:32 2018 +0000

    update ipmi docs

    - add install instructions
    - fix docs to better show ipmi meters from notifications(ironic)
    and pollsters(ipmitool/node manager)

    Closes-Bug: #1746736
    Change-Id: Ia83b56006e201bb0f8681ac1299387fb2ee6bdb6

Changed in ceilometer:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/548348

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/548354

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (stable/pike)

Reviewed: https://review.openstack.org/548354
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=d91c480b5735aeefb1519fbf3c015e63f916fb61
Submitter: Zuul
Branch: stable/pike

commit d91c480b5735aeefb1519fbf3c015e63f916fb61
Author: gord chung <email address hidden>
Date: Fri Feb 9 21:56:32 2018 +0000

    update ipmi docs

    - add install instructions
    - fix docs to better show ipmi meters from notifications(ironic)
    and pollsters(ipmitool/node manager)

    Closes-Bug: #1746736
    Change-Id: Ia83b56006e201bb0f8681ac1299387fb2ee6bdb6
    (cherry picked from commit 8e06ebcecfcf7245dbbfe693e0cae36ed7a92952)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ceilometer (stable/queens)

Reviewed: https://review.openstack.org/548348
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=4ee12fbb5d1a283b97dff48149f90442b9c7bb07
Submitter: Zuul
Branch: stable/queens

commit 4ee12fbb5d1a283b97dff48149f90442b9c7bb07
Author: gord chung <email address hidden>
Date: Fri Feb 9 21:56:32 2018 +0000

    update ipmi docs

    - add install instructions
    - fix docs to better show ipmi meters from notifications(ironic)
    and pollsters(ipmitool/node manager)

    Closes-Bug: #1746736
    Change-Id: Ia83b56006e201bb0f8681ac1299387fb2ee6bdb6
    (cherry picked from commit 8e06ebcecfcf7245dbbfe693e0cae36ed7a92952)

tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ceilometer 10.0.1

This issue was fixed in the openstack/ceilometer 10.0.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ceilometer 9.0.6

This issue was fixed in the openstack/ceilometer 9.0.6 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/ceilometer 11.0.0

This issue was fixed in the openstack/ceilometer 11.0.0 release.
