Collector is slow on storing samples to backend database

Bug #1291923 reported by Mitsuru Kanabuchi
This bug affects 1 person
Affects: Ceilometer
Status: Won't Fix
Importance: Medium
Assigned to: Unassigned
Milestone: none

Bug Description

It takes about 18 minutes to store 20,000 samples (floating IPs in our case) in the backend database.
We are using stable/havana with a MySQL backend database.

The default polling interval is 10 minutes, so the next polling cycle starts before storing completes.
As a result, messages (samples) may pile up in the queue, and the collector's resource usage (CPU, memory) may keep growing.
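A back-of-the-envelope check of these figures, assuming the collector writes at a constant rate and a batch of 20,000 samples arrives every 10-minute polling cycle (a simplification of the real workload):

```python
# Figures from the report above.
SAMPLES_PER_CYCLE = 20_000       # floating-IP samples produced per poll
STORE_SECONDS = 18 * 60          # observed time to store them (~18 minutes)
POLL_INTERVAL_SECONDS = 10 * 60  # default polling interval (10 minutes)

# Assumption: the collector writes at a constant rate and nothing else
# competes for it.
write_rate = SAMPLES_PER_CYCLE / STORE_SECONDS             # samples/s achieved
required_rate = SAMPLES_PER_CYCLE / POLL_INTERVAL_SECONDS  # samples/s needed

# Samples the collector can store within one polling interval, and the
# surplus that stays queued each cycle.
stored_per_cycle = write_rate * POLL_INTERVAL_SECONDS
backlog_growth = SAMPLES_PER_CYCLE - stored_per_cycle


def backlog_after(cycles: int) -> float:
    """Samples still queued after `cycles` polling cycles (linear model)."""
    return max(0.0, backlog_growth) * cycles


if __name__ == "__main__":
    print(f"write rate:    {write_rate:.1f} samples/s")
    print(f"required rate: {required_rate:.1f} samples/s")
    print(f"queue grows by about {backlog_growth:.0f} samples per cycle")
    print(f"after 6 cycles (1 hour): about {backlog_after(6):.0f} samples queued")
```

With these figures the collector stores roughly 18.5 samples/s while about 33.3 samples/s would be needed, so roughly 8,900 samples are left in the queue after every cycle and the backlog never drains.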

The following may be causes of the problem:
- The collector stores samples one at a time (i.e., no bulk insert).
- There may be schema-level or code-level problems, as pointed out in the following article:
  http://lists.openstack.org/pipermail/openstack-dev/2013-December/023134.html
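To illustrate the first point, here is a small comparison using Python's built-in `sqlite3` module as a stand-in for MySQL. The schema and function names are hypothetical and much simpler than Ceilometer's; the sketch only contrasts committing each sample separately with writing all samples through one `executemany()` call inside a single transaction.

```python
import os
import sqlite3
import tempfile
import time

# Hypothetical, heavily simplified schema used only for this comparison.
SCHEMA = (
    "CREATE TABLE sample ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " meter TEXT NOT NULL,"
    " resource_id TEXT NOT NULL,"
    " volume REAL NOT NULL)"
)
INSERT = "INSERT INTO sample (meter, resource_id, volume) VALUES (?, ?, ?)"


def make_samples(n):
    """Fake floating-IP samples as (meter, resource_id, volume) tuples."""
    return [("ip.floating", f"fip-{i}", 1.0) for i in range(n)]


def store_one_at_a_time(conn, samples):
    """Slow pattern: one INSERT and one COMMIT per sample."""
    for s in samples:
        conn.execute(INSERT, s)
        conn.commit()


def store_bulk(conn, samples):
    """Bulk pattern: one executemany() call inside a single transaction."""
    with conn:  # commits once on success, rolls back on error
        conn.executemany(INSERT, samples)


def timed(store, samples, path):
    """Create a fresh table at `path`, run `store`, return (seconds, row count)."""
    conn = sqlite3.connect(path)
    try:
        conn.execute(SCHEMA)
        start = time.perf_counter()
        store(conn, samples)
        elapsed = time.perf_counter() - start
        count = conn.execute("SELECT COUNT(*) FROM sample").fetchone()[0]
    finally:
        conn.close()
    return elapsed, count


if __name__ == "__main__":
    samples = make_samples(500)
    with tempfile.TemporaryDirectory() as d:
        for fn in (store_one_at_a_time, store_bulk):
            elapsed, count = timed(fn, samples, os.path.join(d, fn.__name__ + ".db"))
            print(f"{fn.__name__}: {count} rows in {elapsed:.3f}s")
```

On a disk-backed database the per-sample commits usually dominate the run time; the exact ratio depends on the storage and the database engine, so the numbers should not be read as a prediction for MySQL.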

Mitsuru Kanabuchi (kanabuchi) wrote :

I don't think this is a duplicate of Bug #1291054 ("ceilometer service is working on a single thread").

#1291054 seems to aim at enabling multiple workers for the notification agent and the collector. That fix should greatly speed up storing samples to the backend database.

However, #1291054 seems to solve our problem only partly. There are cases where a single central agent needs to handle a large number of samples. For example, 20,000 floating IP samples acquired from a single poll of Neutron are packed into one message and put into the queue. Only one collector can consume that message, so it has to store all 20,000 samples by itself. Therefore, we believe we also need to consider the DB store speed of a single collector.

Ideas for solutions:
1. Implement a bulk insert feature in the central agent.
2. Avoid packing a large number of samples into a single message: split it into several smaller messages so that multiple workers can process them in parallel.
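Idea 2 amounts to a chunking step before publishing. A minimal sketch follows; the batch size and the `publish` callback are hypothetical placeholders, not Ceilometer's actual publisher API.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(samples: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` samples."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(samples)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def publish_in_chunks(
    samples: Iterable[T], size: int, publish: Callable[[List[T]], None]
) -> int:
    """Send each chunk as its own message; `publish` is a hypothetical callable.

    Returns the number of messages sent.
    """
    count = 0
    for batch in chunked(samples, size):
        publish(batch)
        count += 1
    return count


if __name__ == "__main__":
    sent: List[List[int]] = []
    n = publish_in_chunks(range(20_000), 1_000, sent.append)
    print(f"{n} messages, sizes {sorted({len(b) for b in sent})}")
```

Splitting 20,000 samples into messages of 1,000 yields 20 messages that any idle collector worker can pick up, instead of one message that pins a single worker.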

Changed in ceilometer:
assignee: nobody → Mitsuru Kanabuchi (kanabuchi)
Julien Danjou (jdanjou)
Changed in ceilometer:
importance: Undecided → Low
status: New → Triaged
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ceilometer (master)

Fix proposed to branch: master
Review: https://review.openstack.org/94155

Changed in ceilometer:
status: Triaged → In Progress
Mitsuru Kanabuchi (kanabuchi) wrote :

To make effective use of multiple workers [1],
we modified the code to split a large message into smaller ones and send them separately.

[1]:https://bugs.launchpad.net/ceilometer/+bug/1291054
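Why splitting helps can be shown with a toy model: idle workers take the next message from the queue in FIFO order, and a message cannot be shared between workers. This is an idealized simulation (one time unit per stored sample, no database contention), not a measurement; in practice, contention on the database will reduce the gain.

```python
import heapq
from typing import List


def makespan(message_sizes: List[int], workers: int) -> int:
    """Time (in sample-writes) until every message has been stored.

    Model: each idle worker takes the next message from the queue in FIFO
    order and writes one sample per time unit; a message is handled by
    exactly one worker.
    """
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # Min-heap holding the time at which each worker becomes free.
    free_at = [0] * workers
    for size in message_sizes:
        start = heapq.heappop(free_at)         # earliest idle worker
        heapq.heappush(free_at, start + size)  # busy until the message is stored
    return max(free_at)


if __name__ == "__main__":
    # One 20,000-sample message vs. twenty 1,000-sample messages, 4 workers.
    print(makespan([20_000], 4))
    print(makespan([1_000] * 20, 4))
```

With four workers, a single 20,000-sample message still takes 20,000 time units because only one worker can handle it, while twenty 1,000-sample messages finish in 5,000.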

gordon chung (chungg) wrote :

hi Mitsuru, we've made changes to the collector's SQL backend, so the collector should write quite a bit faster now. you can also enable multiple collector workers now.

Mitsuru Kanabuchi (kanabuchi) wrote :

Hi Gordon, thank you for the great work!
We plan to re-evaluate the collector's performance after Juno-1 is released.
We'll post the results of that measurement.

Changed in ceilometer:
assignee: Mitsuru Kanabuchi (kanabuchi) → Rikimaru Honjo (honjo-rikimaru-c6)
Rikimaru Honjo (honjo-rikimaru-c6) wrote :

Hi Gordon,

I'm Rikimaru Honjo. I have taken over this task from Mitsuru. Please keep in touch with me about this topic.

We have re-evaluated the collector's performance using Juno-1.

Due to a change in our test environment, this time we used glance image and image.size samples, 10,400 of each (20,800 total). The result shows that Juno-1 is about 2.7 times faster than Icehouse (50 / 18.5 ≈ 2.7):
Using Icehouse: 50 minutes to write the 20,800 samples to the DB.
Using Juno-1: 18.5 minutes.
Please note that the absolute times are not meaningful, since we used a very old machine.
Thank you for your effort in refining the model.

Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: none → juno-3
gordon chung (chungg) wrote :

hi Rikimaru Honjo.

that is great news. more refinements are coming (less related to write speed but some improvement is expected)

Thierry Carrez (ttx)
Changed in ceilometer:
milestone: juno-3 → juno-rc1
Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: juno-rc1 → none
gordon chung (chungg) wrote :

the SQL backend can still be improved with batch inserts
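A batch-insert write path might look roughly like the sketch below. It again uses `sqlite3` as a stand-in with a hypothetical `sample` table; Ceilometer's SQL driver is built on SQLAlchemy, so a real patch would look different. Instead of one statement per sample, it sends multi-row `INSERT ... VALUES (...), (...)` statements of bounded size inside one transaction. The 300-rows-per-statement limit is an assumption chosen to stay under SQLite's historical default of 999 bound parameters.

```python
import sqlite3
from itertools import islice
from typing import Iterable, List, Tuple

Row = Tuple[str, str, float]

# Assumption: 3 columns x 300 rows = 900 bound parameters per statement,
# below SQLite's historical 999-parameter default limit.
COLUMNS = ("meter", "resource_id", "volume")
ROWS_PER_STATEMENT = 300


def insert_batch_sql(n_rows: int) -> str:
    """Build one INSERT statement with placeholders for n_rows rows."""
    row = "(" + ", ".join("?" * len(COLUMNS)) + ")"
    return (
        f"INSERT INTO sample ({', '.join(COLUMNS)}) VALUES "
        + ", ".join([row] * n_rows)
    )


def store_batched(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Insert rows with multi-row INSERTs in a single transaction; return count."""
    it = iter(rows)
    total = 0
    with conn:  # one commit for the whole load, rollback on error
        while True:
            batch: List[Row] = list(islice(it, ROWS_PER_STATEMENT))
            if not batch:
                break
            # Flatten [(a, b, c), (d, e, f)] into [a, b, c, d, e, f].
            params = [value for r in batch for value in r]
            conn.execute(insert_batch_sql(len(batch)), params)
            total += len(batch)
    return total


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sample (meter TEXT, resource_id TEXT, volume REAL)")
    rows = [("image.size", f"img-{i}", 1024.0) for i in range(10_400)]
    print(store_batched(conn, rows))
    conn.close()
```

Bounding the batch size keeps each statement within the driver's parameter limits while still cutting the number of round trips by a factor of roughly `ROWS_PER_STATEMENT`.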

Changed in ceilometer:
status: In Progress → Triaged
importance: Low → Medium
assignee: Rikimaru Honjo (honjo-rikimaru-c6) → nobody
OpenStack Infra (hudson-openstack) wrote : Change abandoned on ceilometer (master)

Change abandoned by gordon chung (<email address hidden>) on branch: master
Review: https://review.openstack.org/94155
Reason: inactive, please reopen if important

gordon chung (chungg) wrote :

will close this for now. as we transition to the new model (gnocchi), we can re-open it.

Changed in ceilometer:
status: Triaged → Won't Fix