file dispatcher does not write data cleanly

Bug #1437506 reported by gordon chung
This bug affects 2 people
Affects     Status   Importance  Assigned to  Milestone
Ceilometer  Invalid  Undecided   Unassigned

Bug Description

the data captured by ceilometer maps to json pretty cleanly. it'd be nice to have file dispatcher write in a way that can easily be changed to json so that it can be read in by a processing tool such as spark for efficient post processing.

currently, the file dispatcher writes metering data as a list:

[{u'counter_name': u'vcpus', u'user_id': u'd510479b90ab4c9b9d6069bb2c42998d', u'message_signature': u'b867f6b0994171393ff9eddfb037cdedc381694191c273282f3ec75083af3c30', u'timestamp': u'2015-03-27 22:00:49.204839', u'resource_id': u'30474703-7afb-4045-8626-5e7ce97bfa18', u'message_id': u'ba543da4-d4cc-11e4-927b-28b2bd01ed52', u'source': u'openstack', u'counter_unit': u'vcpu', u'counter_volume': 1, u'project_id': u'fb1b20664781417d8270ffc4d9b0df03', u'resource_metadata': {u'state_description': u'', u'event_type': u'compute.instance.exists', u'availability_zone': u'nova', u'terminated_at': u'', u'ephemeral_gb': 0, u'instance_type_id': 6, u'bandwidth': {}, u'deleted_at': u'', u'reservation_id': u'r-j3y63nnp', u'instance_id': u'30474703-7afb-4045-8626-5e7ce97bfa18', u'display_name': u'inst', u'hostname': u'inst', u'state': u'active', u'progress': u'', u'launched_at': u'2015-03-27T13:11:48.000000', u'node': u'reverie', u'ramdisk_id': u'8cc6bcc9-dce1-4187-bca5-d4179edff630', u'access_ip_v6': None, u'disk_gb': 0, u'access_ip_v4': None, u'kernel_id': u'2555347b-91a4-4f8b-afb6-091cd8a7572e', u'host': u'compute.reverie', u'user_id': u'd510479b90ab4c9b9d6069bb2c42998d', u'image_ref_url': u'http://10.162.32.175:9292/images/283de560-a2e1-4e9d-b7fd-6b08f5008460', u'cell_name': u'', u'audit_period_beginning': u'2015-03-27 21:00:00', u'root_gb': 0, u'tenant_id': u'fb1b20664781417d8270ffc4d9b0df03', u'created_at': u'2015-03-27 13:11:41+00:00', u'memory_mb': 64, u'instance_type': u'm1.nano', u'vcpus': 1, u'image_meta': {u'kernel_id': u'2555347b-91a4-4f8b-afb6-091cd8a7572e', u'container_format': u'ami', u'min_ram': u'0', u'ramdisk_id': u'8cc6bcc9-dce1-4187-bca5-d4179edff630', u'disk_format': u'ami', u'min_disk': u'0', u'base_image_ref': u'283de560-a2e1-4e9d-b7fd-6b08f5008460'}, u'architecture': None, u'audit_period_ending': u'2015-03-27 22:00:00', u'os_type': None, u'instance_flavor_id': u'42'}, u'counter_type': u'gauge'}]

and events as an object:

[<Event: a950f983-b771-4080-bc86-8d7b14851e95, compute.instance.exists, 2015-03-27 22:00:49.223657, <Trait: state 1 active> <Trait: audit_period_beginning 4 2015-03-27 21:00:00> <Trait: root_gb 2 0> <Trait: user_id 1 d510479b90ab4c9b9d6069bb2c42998d> <Trait: service 1 compute> <Trait: disk_gb 2 0> <Trait: tenant_id 1 fb1b20664781417d8270ffc4d9b0df03> <Trait: ephemeral_gb 2 0> <Trait: instance_type_id 2 6> <Trait: vcpus 2 1> <Trait: memory_mb 2 64> <Trait: instance_id 1 c29d959a-3e5d-49e2-a996-51d223cc60d4> <Trait: host 1 reverie> <Trait: request_id 1 req-7da46874-26c6-4ced-a68d-bbc7d87aa39e> <Trait: audit_period_ending 4 2015-03-27 22:00:00> <Trait: instance_type 1 m1.nano> <Trait: launched_at 4 2015-03-27 20:44:17>>]
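To illustrate the difference described above, here is a minimal sketch (Python 3, standard library only). The sample dict is a trimmed, hand-copied subset of the metering record above, and `is_valid_json` is a hypothetical helper; none of this is dispatcher code. Writing a list in its default string form produces a Python literal that a JSON parser rejects, whereas `json.dumps` produces text any JSON reader accepts.

```python
import json

# Trimmed subset of the metering sample shown above (illustrative only).
sample = {
    "counter_name": "vcpus",
    "counter_volume": 1,
    "resource_metadata": {"access_ip_v4": None, "state": "active"},
}

# What ends up in the file if the list is written as-is: a Python literal
# with single quotes and None, which is not JSON.
as_repr = str([sample])

# The same data encoded as JSON text (double quotes, null).
as_json = json.dumps([sample])


def is_valid_json(text):
    """Return True if text parses as JSON (hypothetical helper)."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False
```

Here `is_valid_json(as_repr)` is false while `is_valid_json(as_json)` is true, which is the gap a post-processing tool runs into.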

Changed in ceilometer:
assignee: nobody → Rohit Jaiswal (rohit-jaiswal-3)
assignee: Rohit Jaiswal (rohit-jaiswal-3) → nobody
Rohit Jaiswal (rohit-jaiswal-3) wrote :

Is it mostly to do with the event signature?

Rohit Jaiswal (rohit-jaiswal-3) wrote :

I can see that events are being stored as objects: in the collector, the event payload is cast to an Event object and dispatched as a list of Event objects.

For samples, I think it's plugin_base, the base class of the notification plugins, that is publishing a single sample as a list.

The collector can be changed to use json encoding on payloads before forwarding to dispatchers. Does that sound like a reasonable approach?

Changed in ceilometer:
assignee: nobody → Rohit Jaiswal (rohit-jaiswal-3)
Rohit Jaiswal (rohit-jaiswal-3) wrote :

I meant encoding the data in the file dispatcher itself. For events, it would make more sense for the collector to emit the data in raw form. The event database driver can handle the cast to the Event model, so that all dispatchers get the event data in the same format and can change it to whatever form they want.

gordon chung (chungg) wrote :

so events already have a serialise function that will output json: https://github.com/openstack/ceilometer/blob/master/ceilometer/event/storage/models.py#L62-L67

i haven't really looked into it but i was hoping to output it so it was proper json and not just lines of individual json entries... but now that i think about it, it might be better to output lines of individual json entries and if someone wants to use a tool like spark, they can easily format it to json.
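A rough sketch of the "lines of individual json entries" idea, and of how such a file can later be folded into a single JSON array. The function names and the two records are hypothetical and only for illustration; an in-memory buffer stands in for the dispatcher's output file.

```python
import io
import json


def write_json_lines(stream, records):
    """Write each record as one compact JSON object per line."""
    for record in records:
        stream.write(json.dumps(record, sort_keys=True) + "\n")


def lines_to_json_array(stream):
    """Combine line-delimited JSON entries into one JSON array string."""
    records = [json.loads(line) for line in stream if line.strip()]
    return json.dumps(records)


# In-memory stand-in for the output file (illustrative records only).
buf = io.StringIO()
write_json_lines(buf, [
    {"event_type": "compute.instance.exists", "message_id": "example-1"},
    {"event_type": "compute.instance.exists", "message_id": "example-2"},
])
buf.seek(0)
array_text = lines_to_json_array(buf)
```

One entry per line keeps appends cheap and lets a reader process the file incrementally; as far as I know, line-delimited input is also what Spark's JSON reader expects by default.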

Rohit Jaiswal (rohit-jaiswal-3) wrote :

I think if we use the serialize function in models.Event, we will have to incur a deserialize-serialize round trip in the case of FileDispatcher, which will have considerable cost when dispatching events at scale. Can the collector just emit a plain Python list of event dicts? This is similar to how samples are dispatched by the collector currently. It makes the event format json-ready, but changes the contract for the event storage layer, which would now get a list of plain dicts rather than Event objects.
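A minimal sketch of what "a plain list of dicts" could look like for events. `Event` and `Trait` below are hypothetical namedtuple stand-ins, not ceilometer's model classes, and the `[name, type, value]` layout for traits is an assumption made for illustration. The field values are copied from the event in the bug description.

```python
import json
from collections import namedtuple

# Hypothetical stand-ins for the event models; not ceilometer's classes.
Trait = namedtuple("Trait", ["name", "dtype", "value"])
Event = namedtuple("Event", ["message_id", "event_type", "generated", "traits"])


def event_to_dict(event):
    """Flatten an event object into plain, JSON-ready Python types."""
    return {
        "message_id": event.message_id,
        "event_type": event.event_type,
        "generated": event.generated,
        # Assumed layout: each trait becomes a [name, type, value] triple.
        "traits": [[t.name, t.dtype, t.value] for t in event.traits],
    }


event = Event(
    message_id="a950f983-b771-4080-bc86-8d7b14851e95",
    event_type="compute.instance.exists",
    generated="2015-03-27 22:00:49.223657",
    traits=[Trait("state", 1, "active"), Trait("vcpus", 2, 1)],
)

plain = event_to_dict(event)
encoded = json.dumps([plain])  # works directly, no custom encoder needed
```

With this shape each dispatcher can decide for itself whether to build model objects or to write the dicts out unchanged.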

gordon chung (chungg) wrote :

ok, so i got it. i think the issue for events is that in collector we take the json/dict based event we receive and translate it back to the event object unnecessarily.

i've opened bug 1438285 to track this

Rohit Jaiswal (rohit-jaiswal-3) wrote :

sounds good. I think when 1438285 is fixed, the intended fix for this bug will be to use the serialize function of the event model in FileDispatcher to properly serialize each event model and log it. For samples, there is no such serialize function, so we can probably use json.dumps
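For the samples half of that suggestion, a sketch along these lines might work, assuming the dispatcher writes through a standard `logging` handler. That is an assumption here, and `record_metering_data` and the logger name are hypothetical, not the actual FileDispatcher code.

```python
import io
import json
import logging


def record_metering_data(log, data):
    """Log each sample as its own JSON line (hypothetical helper)."""
    if not isinstance(data, list):
        data = [data]
    for sample in data:
        log.info(json.dumps(sample))


# In-memory handler standing in for the dispatcher's log file.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(message)s"))  # message text only

log = logging.getLogger("file_dispatcher_sketch")
log.setLevel(logging.INFO)
log.propagate = False  # keep the sketch's output out of the root logger
log.addHandler(handler)

record_metering_data(log, [{"counter_name": "vcpus", "counter_volume": 1}])
```

Each line in `stream` is then a standalone JSON object, so the file never contains the Python `u'...'` or `None` forms seen in the description.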

Rohit Jaiswal (rohit-jaiswal-3) wrote :

Correction to my previous comment: we should not use the serialize function, since the input would no longer be a list of model objects (assuming 1438285 is fixed). In that case, the data being logged is json-ready but not pure json. Can this bug then be closed, or do we want FileDispatcher to log pure json?

gordon chung (chungg) wrote :

just an fyi, i pushed my cleanup item since the scope was quite large as you mentioned.

Changed in ceilometer:
importance: Undecided → Medium
gordon chung (chungg)
Changed in ceilometer:
assignee: Rohit Jaiswal (rohit-jaiswal-3) → nobody
Changed in ceilometer:
assignee: nobody → khushbu (khushbuparakh)
Changed in ceilometer:
assignee: khushbu (khushbuparakh) → nobody
Changed in ceilometer:
status: Triaged → Confirmed
gordon chung (chungg)
Changed in ceilometer:
status: Confirmed → Triaged
ZhiQiang Fan (aji-zqfan) wrote :

master branch verified to be:

[{u'event_type': u'compute.instance.exists', u'traits': [[u'ephemeral_gb', 2, 0], [u'instance_type_id', 2, 6], [u'user_id', 1, u'c417eb1ad2534df483e9065c03ab8da8'], [u'service', 1, u'compute'], [u'state', 1, u'active'], [u'project_id', 1, u'74a0f0d8734d46a18773139c0c331e5e'], [u'launched_at', 4, u'2016-04-23T04:50:13'], [u'disk_gb', 2, 1], [u'instance_id', 1, u'0d6a8180-d93d-4a13-b8a2-62299c8bb1b0'], [u'host', 1, u'BJWS'], [u'audit_period_beginning', 4, u'2016-04-23T22:00:00'], [u'root_gb', 2, 1], [u'tenant_id', 1, u'74a0f0d8734d46a18773139c0c331e5e'], [u'memory_mb', 2, 512], [u'instance_type', 1, u'm1.tiny'], [u'vcpus', 2, 1], [u'request_id', 1, u'req-3e6b9d0a-ca64-46f1-b0bb-3368761b6ea1'], [u'audit_period_ending', 4, u'2016-04-23T23:00:00']], u'message_signature': u'6145916adf8144f85ebd7eb20b1725cb6e305d003e91d16e9e5d81f3a4f8f45b', u'raw': {}, u'generated': u'2016-04-23T23:00:42.491759', u'message_id': u'edd4a19b-79a7-400e-bb6c-d5684cb25d68'}]

seems good

Changed in ceilometer:
status: Triaged → Invalid
importance: Medium → Undecided