[RFE] Granular metering data in neutron-metering-agent

Bug #1886949 reported by Rafael Weingartner on 2020-07-09
This bug affects 3 people
Affects: neutron
Importance: Undecided
Assigned to: Rafael Weingartner

Bug Description

Problem Description
=================
Currently, when shared labels are created in Neutron metering, we receive only a total amount of accounting data per label. This makes it hard for operators to identify which tenants and routers, among those where the label was applied, are generating the traffic. Moreover, because shared labels apply to all tenants, Neutron metering reports them with a random tenant_id, which can cause confusion in production.

Proposed Change
===============
We propose to extend the neutron metering agent to enable/use a granular
message format. To maintain backward compatibility, this feature is enabled/disabled by the following neutron-metering-agent parameter:

* ``granular_traffic_data``: Defines whether the metering agent driver should
present traffic data in a granular fashion, instead of grouping all of the
traffic data for all tenants and routers to which the labels were assigned. The default value is ``False`` for backward compatibility.
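
As a minimal sketch of how the option and its default behave, the snippet below parses an INI fragment with Python's standard ``configparser``. It assumes the option is placed in the ``[DEFAULT]`` section of the agent configuration file, and the helper name ``granular_enabled`` is hypothetical. ``configparser`` is used here only as a stand-in so the example stays self-contained; it is not how the agent itself loads its configuration.

```python
import configparser

# Assumption: the option lives in the [DEFAULT] section of the agent config.
SAMPLE_CONFIG = """
[DEFAULT]
granular_traffic_data = True
"""


def granular_enabled(ini_text):
    """Return the value of granular_traffic_data, defaulting to False."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback=False mirrors the backward-compatible default described above.
    return parser.getboolean("DEFAULT", "granular_traffic_data", fallback=False)


enabled = granular_enabled(SAMPLE_CONFIG)   # option set explicitly
legacy = granular_enabled("[DEFAULT]\n")    # option absent: legacy mode
```

When the option is absent, the lookup falls back to ``False``, so existing deployments keep the legacy message format.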

When the ``granular_traffic_data`` config is set to ``True``, we have the
following granularities:
* label -- all of the traffic counters for a given label. One must bear in
mind that a label (shared) can be assigned to multiple routers.
* router -- all of the traffic counters for all labels that are assigned to
the router.
* tenant -- all of the traffic counters for all labels of all routers that a tenant has.
* router-label -- all of the traffic counters for a router and the given label.
* tenant-label -- all of the traffic counters for all routers of a tenant that have a given label.
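
The five granularities can be illustrated by summing per-router, per-label counters under each grouping key. This is only a sketch under stated assumptions: the input sample layout (one dict per router and label pair carrying ``tenant_id``, ``router_id``, ``label_id``, ``pkts``, and ``bytes``) and the function name ``aggregate`` are hypothetical and are not taken from the agent code.

```python
from collections import defaultdict


def aggregate(samples):
    """Sum counters into the label, router, tenant, router-label and
    tenant-label granularities. Keys are (granularity, identifier) tuples."""
    totals = defaultdict(lambda: {"pkts": 0, "bytes": 0})
    for sample in samples:
        keys = [
            ("label", sample["label_id"]),
            ("router", sample["router_id"]),
            ("tenant", sample["tenant_id"]),
            ("router-label", (sample["router_id"], sample["label_id"])),
            ("tenant-label", (sample["tenant_id"], sample["label_id"])),
        ]
        # Every sample contributes to exactly one bucket per granularity.
        for key in keys:
            totals[key]["pkts"] += sample["pkts"]
            totals[key]["bytes"] += sample["bytes"]
    return dict(totals)


# Hypothetical data: one shared label on two routers of different tenants,
# plus one private label on the first router.
samples = [
    {"tenant_id": "t1", "router_id": "r1", "label_id": "shared",
     "pkts": 10, "bytes": 1000},
    {"tenant_id": "t2", "router_id": "r2", "label_id": "shared",
     "pkts": 5, "bytes": 400},
    {"tenant_id": "t1", "router_id": "r1", "label_id": "private",
     "pkts": 1, "bytes": 60},
]
totals = aggregate(samples)
```

In this example the ``label`` bucket for the shared label combines traffic from both tenants, while the ``tenant-label`` buckets keep each tenant's share of it separate.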

Each granularity presented here is sent to the message bus with a different
event type. The mapping between granularity and event type is as follows.

* ``label`` -- event type ``l3.meter.label``.
* ``router`` -- event type ``l3.meter.router``.
* ``tenant`` -- event type ``l3.meter.tenant``.
* ``router-label`` -- event type ``l3.meter.label_router``.
* ``tenant-label`` -- event type ``l3.meter.label_tenant``.
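
A consumer of these notifications could use the mapping above as a lookup table. The sketch below only restates the listed event types; the names ``EVENT_TYPES`` and ``granularity_for`` are hypothetical, and returning ``None`` for unrecognized event types is an assumption about how a consumer might choose to behave.

```python
# Granularity -> event type, exactly as listed above.
EVENT_TYPES = {
    "label": "l3.meter.label",
    "router": "l3.meter.router",
    "tenant": "l3.meter.tenant",
    "router-label": "l3.meter.label_router",
    "tenant-label": "l3.meter.label_tenant",
}

# Reverse lookup for the receiving side.
GRANULARITIES = {event: name for name, event in EVENT_TYPES.items()}


def granularity_for(event_type):
    """Return the granularity of a granular event type, or None if the
    event type is not one of the five granular ones."""
    return GRANULARITIES.get(event_type)
```

Note that the combined granularities put the label first in the event type (``label_router``, ``label_tenant``) even though their names start with the router or tenant.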

Therefore, we will change the non-granular (``granular_traffic_data = False``) traffic messages (also called legacy here), which have the following format.

     {"pkts": "<the number of packets that matched the rules of the labels>",
      "bytes": "<the number of bytes that matched the rules of the labels>",
      "time": "<seconds between the first data collection and the last one>",
      "first_update": "<timeutils.utcnow_ts() of the first collection>",
      "last_update": "<timeutils.utcnow_ts() of the last collection>",
      "host": "<neutron metering agent host name>",
      "label_id": "<the label id>",
      "tenant_id": "<the tenant id>"
      }

The new event message format is shown below. It retains some attributes found in legacy mode, such as ``bytes``, ``pkts``, ``time``, ``first_update``, ``last_update``, and ``host``. The following is an example JSON message with all of the
possible attributes.

     {"resource_id": "router-f0f745d9a59c47fdbbdd187d718f9e41-label-00c714f1-49c8-462c-8f5d-f05f21e035c7",
      "tenant_id": "f0f745d9a59c47fdbbdd187d718f9e41",
      "first_update": 1591058790,
      "bytes": 0,
      "label_id": "00c714f1-49c8-462c-8f5d-f05f21e035c7",
      "label_name": "test1",
      "last_update": 1591059037,
      "host": "<hostname>",
      "time": 247,
      "pkts": 0,
      "label_shared": true
      }

The ``resource_id`` is a unique identifier for the "resource" being monitored. Here we consider a resource to be any of the granularities that we handle. When legacy mode is disabled, the resource ID follows this convention for each
granularity.

* label -- label-<label_id>
* router -- router-<router_id>
* tenant -- tenant-<tenant_id>
* router-label -- router-<router_id>-label-<label_id>
* tenant-label -- tenant-<tenant_id>-label-<label_id>
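
The naming convention above can be expressed as a set of format templates. This is a sketch only: ``RESOURCE_ID_TEMPLATES`` and ``build_resource_id`` are hypothetical names, and the IDs in the usage line are the ones from the example message earlier in this description.

```python
# One template per granularity, following the convention listed above.
RESOURCE_ID_TEMPLATES = {
    "label": "label-{label_id}",
    "router": "router-{router_id}",
    "tenant": "tenant-{tenant_id}",
    "router-label": "router-{router_id}-label-{label_id}",
    "tenant-label": "tenant-{tenant_id}-label-{label_id}",
}


def build_resource_id(granularity, **ids):
    """Build the resource ID for a granularity from keyword IDs
    (label_id, router_id, tenant_id). Unused IDs are ignored."""
    template = RESOURCE_ID_TEMPLATES[granularity]
    try:
        return template.format(**ids)
    except KeyError as exc:
        # str.format raises KeyError when a required ID is not supplied.
        raise ValueError(
            "missing %s for granularity %r" % (exc, granularity)) from exc


example = build_resource_id(
    "router-label",
    router_id="f0f745d9a59c47fdbbdd187d718f9e41",
    label_id="00c714f1-49c8-462c-8f5d-f05f21e035c7",
)
```

Here ``example`` reproduces the ``resource_id`` value of the sample JSON message.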

Changed in neutron:
assignee: nobody → Rafael Weingartner (rafaelweingartner)
status: New → In Progress
description: updated
Brian Haley (brian-haley) wrote :

Just an FYI that there was a discussion recently at the PTG and drivers meeting about a "new" way to do metering in the l3-agent, with the potential to deprecate the metering agent, https://bugs.launchpad.net/neutron/+bug/1817881

So we should have a wider discussion on this topic. I've asked Slawek to take a look so we can get this on the drivers team agenda.

Rafael Weingartner (rafaelweingartner) wrote :

Thanks for the heads up, Brian. However, that proposal is to change the method used to gather data. This proposal does not touch the method that we use to gather data. I imagine that the proposal https://bugs.launchpad.net/neutron/+bug/1817881 will only change how data is retrieved, and then report back to Ceilometer (metering queue) with the legacy message format. Therefore, the proposals are not mutually exclusive; they complement each other and enhance the Neutron metering agents.

Just ping me if you guys want to have a wider discussion.

Brian Haley (brian-haley) wrote :

Ok, from a quick read this looked like it was updating the metering agent code, which we probably don't want to do if we're going to be deprecating it. After Slawek looks, he'll most likely add it to our agenda; we have a meeting every Friday at 14:00 UTC.

tags: added: rfe
Slawek Kaplonski (slaweq) wrote :

For now, https://bugs.launchpad.net/neutron/+bug/1817881 isn't really implemented yet. What we discussed was that if it gets done and we see that it is a real alternative to the metering agent, we may think about deprecating the agent.
But IMO, if there are users of the metering agent now and you want to improve it and implement new features there, we shouldn't block that in any way. It definitely isn't deprecated currently.
So I will add this RFE to today's drivers meeting agenda to discuss that.

tags: added: rfe-triaged
removed: rfe
Slawek Kaplonski (slaweq) wrote :

We discussed this RFE during our last drivers meeting. Discussion can be found at http://eavesdrop.openstack.org/meetings/neutron_drivers/2020/neutron_drivers.2020-07-10-14.00.log.html#l-93

There are a couple of interesting questions there. Please reply to those questions in a comment here so we can continue discussing this RFE at the next meeting.

Rafael Weingartner (rafaelweingartner) wrote :

Hello Slawek,
I was not able to make the meeting. Sorry about that. My answers to the questions that you guys raised at the 2020-07-10 Neutron meeting are the following.

For the inquiry:
14:49:23 <slaweq> looking at the proposed change of the message format I'm not sure if we need another config knob for that
14:49:51 <slaweq> we could maybe simply add some new fields to the existing message so it would be backward compatible always
14:49:54 <slaweq> wdyt?

I created the configuration to make it backward compatible. I was not sure by whom and how the current implementation is being used. Therefore, I created a parameter to enable/disable the implementation. It seemed more natural, and we can deprecate the “legacy mode” with time, and just use the new format proposed.

14:50:53 <mlavalle> would the receiving end be confused by the "new fields"?

Yes, it would (in my opinion). It is not just about new fields; it is about separating the different aggregation methods into different metering messages. We need that to be able to process them properly in Ceilometer.

14:51:55 <njohnston> Does this granularity imply an increase in the number of emssages sent to the notification queue?

Yes, it does. With the new format, instead of one message per label per report time frame, we would have in the worst case (1 message per label + 1 message per router + 1 message per tenant + 1 message per label-tenant + 1 message per label-router) per report time frame. However, one must bear in mind that these report time frames are configurable. Therefore, one can set it to 30-60 minutes. The guys using this right now are using 10 minutes for the report time frame in a reasonably sized environment without any problems.

14:52:38 <njohnston> If message volume will increase that is something that an operator might want to be able to turn off, if their RabbitMQ is stressed.

That is another reason why I created the parameter. In the worst case, the operator could just use the legacy mode. However, there is always the possibility of using larger report intervals and/or larger data-gathering intervals. All of these configurations are described in detail in the documentation I wrote for the Neutron metering implementation.

I guess those were all of the questions I found in the meeting. If you guys have any other questions, suggestions, or comments concerning this implementation, please do not hesitate to ping me.

And, by the way, I am working on another extension for the Neutron metering system to allow remote and source IP filtering within label rules. I just mention that to show you guys that there are indeed people using it (the metering agent) and working to make it better.

Miguel Lavalle (minsel) wrote :

@Rafael

Thanks for the responses. This RFE was reviewed again by the drivers team and was approved. There was a suggestion of investigating whether versioning the messages might be feasible, like Nova does.

tags: added: rfe-approved
removed: rfe-triaged
Rafael Weingartner (rafaelweingartner) wrote :

Thanks for the update @Miguel.

To answer the inquiry regarding versioning:
```
14:14:36 <njohnston> Yes. Ideal would be for the message to incorporate some kind of versioning like an OVO but I don't think we can depend on the other side evolving to understand that necessarily.
```
I agree it would be interesting. I have been contributing quite a great deal already to Ceilometer as well. Therefore, it would not be hard to design a method for it to handle versioned messages as well. I will add this idea to my backlog.

Thank you all for the review and the support on the proposal. The pull request is already open and ready for your reviews.

Slawek Kaplonski (slaweq) wrote :
Revision history for this message
Rafael Weingartner (rafaelweingartner) wrote :

Thank you very much!
We did not know that it was a requirement to post a Blueprint as well.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/735605
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=bd1467b47c3c55f098722fca79a12dfc57ea6e18
Submitter: Zuul
Branch: master

commit bd1467b47c3c55f098722fca79a12dfc57ea6e18
Author: Rafael Weingärtner <email address hidden>
Date: Fri Jun 12 09:54:47 2020 -0300

    Granular metering data in neutron-metering-agent

    Extend neutron metering agent to generate Granular metering data.
    The rationale here is to have data (bytes and packets) not just in
    a label basis, but also in tenant, router, and router-label, and tenant-label
    basis. This allows operators to develop more complex network monitoring
    solutions.

    Moreover, I added documentation to explain what is the neutron metering agent,
    its configs, and different message formats.

    Change-Id: I7b6172f88efd4df89d7bed9a0af52f80c61acbe0
    Implements: https://blueprints.launchpad.net/neutron/+spec/granular-metering-data
    Closes-Bug: #1886949

Changed in neutron:
status: In Progress → Fix Released