NetApp Cinder driver performance collector is fetching diag level commands via API/ZAPI

Bug #1670879 reported by Bishoy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Cinder | New | Undecided | Unassigned | -

Bug Description

Hello Team,

We have been seeing a lot of permission errors on Mitaka for the openstack user that Cinder uses. This user has Vserver admin privileges, and the errors relate to performance archives:

This happens on both Red Hat and Fuel OpenStack.

2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode [req-adsfsd1-b754-4efed5-bf91-sdfsdfcca - - - - -] Could not get utilization counters from node EMEA-LAB1
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode Traceback (most recent call last):
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/netapp/dataontap/performance/perf_cmode.py", line 144, in _get_node_utilization_counters
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode self._get_node_utilization_wafl_counters(node_name) +
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/netapp/dataontap/performance/perf_cmode.py", line 156, in _get_node_utilization_system_counters
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode self.system_object_name, node_name))
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 853, in trace_method_logging_wrapper
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode return f(*args, **kwargs)
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 853, in trace_method_logging_wrapper
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode return f(*args, **kwargs)
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/netapp/dataontap/client/client_cmode.py", line 813, in get_performance_instance_uuids
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode enable_tunneling=False)
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode File "/usr/lib/python2.7/dist-packages/cinder/utils.py", line 853, in trace_method_logging_wrapper
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode return f(*args, **kwargs)
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/netapp/dataontap/client/client_base.py", line 89, in send_request
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode return self.connection.invoke_successfully(request, enable_tunneling)
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode File "/usr/lib/python2.7/dist-packages/cinder/volume/drivers/netapp/dataontap/client/api.py", line 253, in invoke_successfully
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode raise NaApiError(code, msg)
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode NaApiError: NetApp API failed. Reason - 13003:Insufficient privileges: user 'openstack' does not have read access to this resource
2017-02-1 01:22:57.716 7271 ERROR cinder.volume.drivers.netapp.dataontap.performance.perf_cmode

After looking at the code:

    def _get_node_utilization_wafl_counters(self, node_name):
        """Get the WAFL counters for calculating node utilization."""

        wafl_instance_uuids = self.zapi_client.get_performance_instance_uuids(
            'wafl', node_name)

        wafl_counter_names = ['total_cp_msecs', 'cp_phase_times']
        wafl_counters = self.zapi_client.get_performance_counters(
            'wafl', wafl_instance_uuids, wafl_counter_names)

        # Expand array data so we can use wafl:cp_phase_times[P2_FLUSH]
        for counter in wafl_counters:
            if 'cp_phase_times' in counter:
                self._expand_performance_array(
                    'wafl', 'cp_phase_times', counter)

        return wafl_counters

It looks like the driver calls "_get_node_utilization_wafl_counters" against the cDOT cluster with a user that only has ontapi permission, which means it can't retrieve that data, not even from the data stored at the SPI the way OnCommand does.

For example, total_cp_msecs and cp_phase_times can only be obtained by an admin-role user in diag mode at the node level!

E.g.:
filer*> wafl_susp -z
filer*> wafl_susp -r

I don't see the code even trying to run the requests at diag level:

    def get_performance_counters(self, object_name, instance_uuids,
                                 counter_names):
        """Gets one or more cDOT performance counters."""

        api_args = {
            'objectname': object_name,
            'instance-uuids': [
                {'instance-uuid': instance_uuid}
                for instance_uuid in instance_uuids
            ],
            'counters': [
                {'counter': counter} for counter in counter_names
            ],
        }

        result = self.send_request('perf-object-get-instances',
                                   api_args,
                                   enable_tunneling=False)

        counter_data = []

        timestamp = result.get_child_content('timestamp')

        instances = result.get_child_by_name(
            'instances') or netapp_api.NaElement('None')
        for instance in instances.get_children():

            instance_name = instance.get_child_content('name')
            instance_uuid = instance.get_child_content('uuid')
            node_name = instance_uuid.split(':')[0]

            counters = instance.get_child_by_name(
                'counters') or netapp_api.NaElement('None')
            for counter in counters.get_children():

                counter_name = counter.get_child_content('name')
                counter_value = counter.get_child_content('value')

                counter_data.append({
                    'instance-name': instance_name,
                    'instance-uuid': instance_uuid,
                    'node-name': node_name,
                    'timestamp': timestamp,
                    counter_name: counter_value,
                })

        return counter_data

Cinder shouldn't need information about aggregates and the like, the way Manila does. Or at least, if these privileges are needed to enable some features, we should avoid logging so many errors that concern admins.

Thanks in advance!

Bishoy (bishoysamy)
tags: added: cinder manila
Goutham Pacha Ravi (gouthamr) wrote :

AFAICS this is a duplicate of https://bugs.launchpad.net/cinder/+bug/1660870.
The 'fix' for that is to lower the log level, or to log the error only once.

https://review.openstack.org/#/c/433251/
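
For illustration, the "log it once" idea could look roughly like the sketch below. This is not the change in the linked review. QuietUtilizationCollector, its fetch_counters callback, and the logger name are hypothetical, and NaApiError here is a simplified stand-in for the driver's exception class.

```python
import logging

LOG = logging.getLogger("perf_collector_sketch")  # hypothetical logger name


class NaApiError(Exception):
    """Simplified stand-in for the driver's NaApiError (assumption)."""

    def __init__(self, code, message):
        super().__init__("NetApp API failed. Reason - %s:%s" % (code, message))
        self.code = code
        self.message = message


class QuietUtilizationCollector:
    """Hypothetical wrapper that reports a counter failure once per node."""

    def __init__(self, fetch_counters):
        # fetch_counters: callable that takes a node name and returns counters.
        self._fetch_counters = fetch_counters
        self._warned_nodes = set()

    def get_counters(self, node_name):
        try:
            return self._fetch_counters(node_name)
        except NaApiError as exc:
            if node_name not in self._warned_nodes:
                # First failure for this node: one warning, no traceback.
                self._warned_nodes.add(node_name)
                LOG.warning(
                    "Could not get utilization counters from node %s: %s",
                    node_name, exc)
            else:
                # Repeated failures are only recorded at debug level.
                LOG.debug(
                    "Utilization counters still unavailable for node %s: %s",
                    node_name, exc)
            return None
```

With a wrapper like this, a Vserver-scoped account that lacks read access would produce one warning per node instead of a full traceback on every collection cycle, and callers would treat a None result as "utilization unknown".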

Bishoy (bishoysamy) wrote :

Can the WAFL counters be gathered without diag mode?

Goutham Pacha Ravi (gouthamr) wrote :

Yes, and as you mentioned, you do need the cluster-scoped administrator account for this.

Here's a snapshot from one of our CI runs:

Snapshot of the response: http://paste.openstack.org/show/601852/
The cinder-volume log: http://54.153.118.32/ci-logs/logs/69/442769/2/upstream-check/cinder-cDOT-FCP/ce653d4/logs/screen-c-vol.txt.gz
All logs: http://54.153.118.32/ci-logs/logs/69/442769/2/upstream-check/cinder-cDOT-FCP/ce653d4/
Patch: https://review.openstack.org/#/c/442769/
