ceilometer statistics timeout

Bug #1597679 reported by magicboiz
This bug affects 3 people
Affects              Status     Importance  Assigned to      Milestone
Fuel for OpenStack   Won't Fix  Medium      MOS Ceilometer
7.0.x                Won't Fix  Medium      MOS Maintenance
8.0.x                Won't Fix  Medium      MOS Maintenance
Mitaka               Won't Fix  Medium      MOS Ceilometer

Bug Description

Hi

There is some strange ceilometer-api behavior. I can query ceilometer-api for "meter-list"/"sample-list" (or even "openstack usage show") very fast, which should rule out general slowness, yet ceilometer-api struggles with "ceilometer statistics", always hitting a timeout:

[user1@ubuntu ~]$ ceilometer statistics -q "project=6294315f9df1491fa9345701d2642c96;timestamp>2016-06-20T14:30" -m vcpus --period 60
b'Gateway Timeout (HTTP 504)'

Environment:
MOS 8.0 + Maintenance Update 1
ubuntu 14.04 LTS

I have applied https://review.openstack.org/#/c/287993/ to my environment and restarted all ceilometer services, but I am still getting the same timeout.

Some ceilometer-api tests (with the time it takes to be completed):

[user1@ubuntu ~]$ time openstack usage show
Usage from 2016-06-02 to 2016-07-01 on project 6294315f9df1491fa9345701d2642c96:
+---------------+-----------+
| Field | Value |
+---------------+-----------+
| CPU Hours | 164.21 |
| Disk GB-Hours | 16420.51 |
| RAM MB-Hours | 168146.07 |
| Servers | 1 |
+---------------+-----------+
real 0m2.204s
user 0m1.490s
sys 0m0.113s

[user1@ubuntu ~]$ time ceilometer meter-list
+---------------------------------+------------+-----------+-----------------------------------------------------------------------+----------------------------------+----------------------------------+
| Name | Type | Unit | Resource ID | User ID | Project ID |
+---------------------------------+------------+-----------+-----------------------------------------------------------------------+----------------------------------+----------------------------------+
| cpu | cumulative | ns | 5c305833-fbd2-44cb-b517-789641c2a5db | eb12445c73ae4093b7c7105ebe3f1216 | 6294315f9df1491fa9345701d2642c96 |
| cpu.delta | delta | ns | 5c305833-fbd2-44cb-b517-789641c2a5db | eb12445c73ae4093b7c7105ebe3f1216 | 6294315f9df1491fa9345701d2642c96 |
.............
| vcpus | gauge | vcpu | 5c305833-fbd2-44cb-b517-789641c2a5db | eb12445c73ae4093b7c7105ebe3f1216 | 6294315f9df1491fa9345701d2642c96 |
+---------------------------------+------------+-----------+-----------------------------------------------------------------------+----------------------------------+----------------------------------+

real 0m2.088s
user 0m0.897s
sys 0m0.057s

[user1@ubuntu ~]$ time ceilometer sample-list -q "timestamp>2016-06-1T00:00"
+--------------------------------------+-----------------------------------------------------------------------+---------------------------------+------------+------------------------+-----------+----------------------------+
| ID | Resource ID | Name | Type | Volume | Unit | Timestamp |
+--------------------------------------+-----------------------------------------------------------------------+---------------------------------+------------+------------------------+-----------+----------------------------+
| e81ed112-3e99-11e6-b12c-52540086048d | 6294315f9df1491fa9345701d2642c96 | storage.objects.containers | gauge | 0.0 | container | 2016-06-30T08:09:02.663000 |
......
| 9b50bb52-3e99-11e6-90d4-52540086048d | instance-000002d6-5c305833-fbd2-44cb-b517-789641c2a5db-tapdefa1d28-b5 | network.outgoing.packets.gb | cumulative | 2.654269337654114e-07 | gb | 2016-06-30T08:06:53.700000 |
+--------------------------------------+-----------------------------------------------------------------------+---------------------------------+------------+------------------------+-----------+----------------------------+

real 0m2.373s
user 0m0.977s
sys 0m0.073s

Revision history for this message
magicboiz (magicboiz) wrote :

Hi

new update: after rebooting all three of my controllers, I can now get the expected result, but it takes a long time, too much time actually:

time ceilometer statistics -q "project=6294315f9df1491fa9345701d2642c96;timestamp>2016-06-01T00:00" -m vcpus --period 60
+--------+---------------------+---------------------+-----+-----+-----+-----+-------+----------+----------------------------+----------------------------+
| Period | Period Start | Period End | Max | Min | Avg | Sum | Count | Duration | Duration Start | Duration End |
+--------+---------------------+---------------------+-----+-----+-----+-----+-------+----------+----------------------------+----------------------------+
| 60 | 2016-06-30T11:00:00 | 2016-06-30T11:01:00 | 1.0 | 1.0 | 1.0 | 1.0 | 1 | 0.0 | 2016-06-30T11:00:11.167000 | 2016-06-30T11:00:11.167000 |
| 60 | 2016-06-30T12:00:00 | 2016-06-30T12:01:00 | 1.0 | 1.0 | 1.0 | 1.0 | 1 | 0.0 | 2016-06-30T12:00:05.296000 | 2016-06-30T12:00:05.296000 |
| 60 | 2016-06-30T13:00:00 | 2016-06-30T13:01:00 | 1.0 | 1.0 | 1.0 | 1.0 | 1 | 0.0 | 2016-06-30T13:00:56.159000 | 2016-06-30T13:00:56.159000 |
+--------+---------------------+---------------------+-----+-----+-----+-----+-------+----------+----------------------------+----------------------------+

real 0m22.567s
user 0m0.883s
sys 0m0.090s

22 seconds with only a few instances (fewer than 20 VMs) running in the cloud.

So this is probably not a bug, but I think the whole ceilometer/MongoDB configuration has a performance problem, because my entire environment runs on new hardware (controllers with >24 GB RAM and Samsung Pro SSDs).

Revision history for this message
Dmitry Klenov (dklenov) wrote :

@magicboiz - can you please attach a diagnostic snapshot to this issue? It would help to analyze the performance-related items.

Changed in fuel:
milestone: none → 10.0
assignee: nobody → MOS Ceilometer (mos-ceilometer)
importance: Undecided → Medium
tags: added: area-mos
Changed in fuel:
status: New → Incomplete
Revision history for this message
magicboiz (magicboiz) wrote :

Hi

I have a 2GB diagnostic snapshot file... what is the correct way to upload such a big file?

Revision history for this message
magicboiz (magicboiz) wrote :

Hi

Is it possible to upload a 2GB diagnostic file? Thanks.

Besides that, is it possible to configure haproxy/ceilometer-api/mongod not to time out after 1 minute of working on a ceilometer query?

Revision history for this message
magicboiz (magicboiz) wrote :

hi

As a workaround, after increasing the haproxy timeouts (queue, server, and client) to 2 minutes, ceilometer queries no longer time out after 1 minute.
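For reference, the workaround described above corresponds to a haproxy change along these lines (a sketch only: the listener name, bind address, and file location are illustrative and will differ per MOS deployment; 8777 is the standard ceilometer-api port):

```
# Illustrative fragment; real section names vary per deployment.
listen ceilometer-api
  bind <VIP>:8777        # ceilometer-api endpoint behind the VIP
  timeout queue  2m      # raised from the 1m default that caused HTTP 504
  timeout client 2m
  timeout server 2m
```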

Revision history for this message
Nadya Privalova (nprivalova) wrote :

@magicboiz, you're right that this request should not take so long. The reason is that it is notification-based (the cpu metric is collected only when notifications are received, i.e. at the moment a VM starts and once an hour when Nova sends heartbeats). We usually don't recommend running Mongo on controllers because it creates high disk utilisation. Anyway, in your env everything should work fine (20 VMs is a very small scale). Could you please check two things for us:
1. Make sure that disk utilisation is not too high
2. Check the list of indexes in Mongo. We've changed this in MOS 9.0. Could you please try to apply this change: https://review.openstack.org/#/c/276262/ . Reach out to me on IRC if you need any assistance with this.

Changed in fuel:
milestone: 10.0 → 9.1
Changed in fuel:
milestone: 9.1 → 8.0-updates
Revision history for this message
magicboiz (magicboiz) wrote :

Hi

IOWAIT is not the problem: our servers run on Samsung SSD disks, and iostat shows no problem there. But the mongod process hits ~100% CPU every time there is a query like this:

"ceilometer statistics -q "timestamp>2016-07-01T00:00;timestamp<2016-08-01T00:00" -m memory"

This query takes about 100 seconds to complete on my system. During this time, the mongo node working on the query pushes its CPU core to ~100%.

I'll apply this patch and let you know.

Revision history for this message
magicboiz (magicboiz) wrote :

Hi again

How can I check the mongodb ceilometer index list?

Revision history for this message
Nadya Privalova (nprivalova) wrote :

To check the list of indexes, you should go to a primary node and do the following:

ceilometer:PRIMARY> use ceilometer
switched to db ceilometer
ceilometer:PRIMARY> db.system.indexes.find()

Here is the output from my env:

{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "ceilometer.dummyData" }
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "ceilometer.event" }
{ "v" : 1, "key" : { "event_type" : 1 }, "name" : "default_type_idx", "ns" : "ceilometer.event" }
{ "v" : 1, "key" : { "timestamp" : -1 }, "name" : "event_ttl", "ns" : "ceilometer.event", "expireAfterSeconds" : 604800 }
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "ceilometer.resource" }
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "ceilometer.meter" }
{ "v" : 1, "key" : { "resource_id" : 1 }, "name" : "default_meter_resource_idx", "ns" : "ceilometer.meter" }
{ "v" : 1, "key" : { "user_id" : 1 }, "name" : "default_meter_user_idx", "ns" : "ceilometer.meter" }
{ "v" : 1, "key" : { "project_id" : 1 }, "name" : "default_meter_project_idx", "ns" : "ceilometer.meter" }
{ "v" : 1, "key" : { "counter_name" : 1 }, "name" : "default_meter_idx", "ns" : "ceilometer.meter" }
{ "v" : 1, "key" : { "user_id" : -1 }, "name" : "default_resource_user_idx", "ns" : "ceilometer.resource" }
{ "v" : 1, "key" : { "project_id" : -1 }, "name" : "default_resource_project_idx", "ns" : "ceilometer.resource" }
{ "v" : 1, "key" : { "first_sample_timestamp" : -1 }, "name" : "default_resource_fst_idx", "ns" : "ceilometer.resource" }
{ "v" : 1, "key" : { "timestamp" : -1 }, "name" : "meter_ttl", "ns" : "ceilometer.meter", "expireAfterSeconds" : 604800 }
{ "v" : 1, "key" : { "last_sample_timestamp" : -1 }, "name" : "resource_ttl", "ns" : "ceilometer.resource", "expireAfterSeconds" : 604800 }

After you apply the patch, you will have the same output. Don't forget to run dbsync after the patch is applied.

Revision history for this message
magicboiz (magicboiz) wrote :

Hi,

this is my output:

1. ceilometer-dbsync:
root@node-5:~# source openrc
root@node-5:~# ceilometer-dbsync
No handlers could be found for logger "oslo_config.cfg"

2. mongodb index list:

MongoDB shell version: 2.6.10
connecting to: test
ceilometer:PRIMARY> use ceilometer
switched to db ceilometer
ceilometer:PRIMARY> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "ceilometer.dummyData" }
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "ceilometer.resource" }
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "ceilometer.meter" }
{ "v" : 1, "key" : { "timestamp" : -1 }, "name" : "timestamp_idx", "ns" : "ceilometer.meter" }
{ "v" : 1, "key" : { "last_sample_timestamp" : -1 }, "name" : "last_sample_timestamp_idx", "ns" : "ceilometer.resource" }
{ "v" : 1, "key" : { "timestamp" : 1 }, "name" : "meter_ttl", "ns" : "ceilometer.meter", "expireAfterSeconds" : 604800 }
{ "v" : 1, "key" : { "last_sample_timestamp" : 1 }, "name" : "resource_ttl", "ns" : "ceilometer.resource", "expireAfterSeconds" : 604800 }
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "ceilometer.event" }
{ "v" : 1, "key" : { "timestamp" : 1 }, "name" : "event_ttl", "ns" : "ceilometer.event", "expireAfterSeconds" : 604800 }
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "ceilometer.alarm" }
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "ceilometer.alarm_history" }
{ "v" : 1, "key" : { "timestamp" : 1 }, "name" : "alarm_history_ttl", "ns" : "ceilometer.alarm_history", "expireAfterSeconds" : 604800 }
{ "v" : 1, "key" : { "resource_id" : 1 }, "name" : "default_meter_resource_idx", "ns" : "ceilometer.meter" }
{ "v" : 1, "key" : { "user_id" : 1 }, "name" : "default_meter_user_idx", "ns" : "ceilometer.meter" }
{ "v" : 1, "key" : { "project_id" : 1 }, "name" : "default_meter_project_idx", "ns" : "ceilometer.meter" }
{ "v" : 1, "key" : { "counter_name" : 1 }, "name" : "default_meter_idx", "ns" : "ceilometer.meter" }
{ "v" : 1, "key" : { "user_id" : -1 }, "name" : "default_resource_user_idx", "ns" : "ceilometer.resource" }
{ "v" : 1, "key" : { "project_id" : -1 }, "name" : "default_resource_project_idx", "ns" : "ceilometer.resource" }
{ "v" : 1, "key" : { "event_type" : 1 }, "name" : "default_type_idx", "ns" : "ceilometer.event" }
ceilometer:PRIMARY>

After applying this patch, I'm getting the same times:
time ceilometer statistics -q "timestamp>2016-07-01T00:00;timestamp<2016-08-01T00:00" -m memory.gb
+--------+---------------------+----------------------------+-----+-----+-----+--------+-------+------------+----------------------------+----------------------------+
| Period | Period Start | Period End | Max | Min | Avg | Sum | Count | Duration | Duration Start | Duration End |
+--------+---------------------+----------------------------+-----+-----+-----+--------+-------+------------+----------------------------+----------------------------+
| 0 | 2016-07-01T00:00:00 | 2016-07-12T09:00:40.784000 | 8.0 | 1.0 | 2.6 | 4368.0 | 1680 | 601204.119 | 2016-07-05T10:00:36.665000 | 2016-07-12T09:00:40.784000 |
+--------+---------------------+-------...


Revision history for this message
Nadya Privalova (nprivalova) wrote :

@magicboiz, this definitely should be much faster. To understand what's going on I need the following data from you:
1. Mongo logs
2. Current operations and server status from mongo (db.currentOp() and db.runCommand( { serverStatus: 1, workingSet: 1, metrics: 0, locks: 0 } ))
3. atop logs

You can put everything in some shared folder in google drive. And a quick note: if you're a Mirantis customer, it's better to open a support request.

Revision history for this message
magicboiz (magicboiz) wrote :

Hi

In this pastebin you'll find almost everything; atop is disabled on my system right now.

http://pastebin.com/jrkWJwyW

Could this behaviour be related to my specific pipeline.yaml? I have lowered the interval to just 60 seconds in order to get more precise usage data:

-
  name: meter_pipeline
  interval: 60
  meters:
    - "network.incoming.bytes"
    - "network.outgoing.bytes"
  transformers:
  publishers:
    - rpc://

-
  name: network_stats_outgoing
  interval: 60
  meters:
    - "network.outgoing.bytes"
  transformers:
    - name: "rate_of_change"
      parameters:
        target:
          name: "network.outgoing.bytes.change"
          type: "gauge"
          scale: "100.0"
          unit: "MB"
  publishers:
    - rpc://

-

Revision history for this message
magicboiz (magicboiz) wrote :

Hi again

I've been digging into the aggregate queries that ceilometer executes against MongoDB. Shouldn't we have a compound index incorporating _all_ fields involved in the $match stage of the aggregation?

Something like this:

use ceilometer;
db.meter.createIndex({timestamp:-1, counter_name:1,project_id:1, resource_id:1},{background:true})

Changed in fuel:
status: Incomplete → Confirmed
tags: added: area-ceilometer
removed: area-mos ceilometer
Revision history for this message
Igor Degtiarov (idegtiarov) wrote :

@magicboiz

Hi! Let me explain why we proposed the patch that moved from compound indexes to a set of single-field ones. The main reason was to avoid full database scans in requests that use only one or a few fields without the compound index's "prefix". This happened in almost all sample-list queries. We had much the same situation in statistics queries whenever the "prefix" part of the compound index wasn't used in the query. That is why we create single-field indexes instead of compound ones by default.

On the other hand, if we know that the main use case is getting statistics and we have fields that will be used in every request, the indexes should be changed to a compound one. Your proposal is not quite right; I need to emphasize that timestamp as the "prefix" field is a bad choice. A field that is range-bounded in the request should come last in the index key list. So in your case, where project_id and resource_id may be absent from the request, the best choice seems to be:

db.meter.createIndex({counter_name:1, resource_id:1, project_id:1, timestamp:-1}, {name: "statistic_idx", background: true})

If statistics is the main and exclusive use case, the other meter indexes (except the TTL ones) should be dropped to release the memory they occupy.
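The point about range-bounded fields can be illustrated with a toy model (plain Python, not ceilometer code; all names and numbers below are made up for illustration): a B-tree index behaves like a sorted list of key tuples, and an index scan can only use one contiguous slice of it, so putting the range-bounded timestamp first forces the scan to examine every sample in the time window regardless of counter or resource.

```python
import bisect

# Toy model: an index is a sorted list of key tuples; a query can only
# read a contiguous slice of it, so the range-bounded field (timestamp)
# must come last, after all equality-matched fields.

counters = ["cpu", "memory.gb", "vcpus"]
resources = ["res-%d" % i for i in range(20)]
rows = [(c, r, t) for c in counters for r in resources for t in range(1000)]

# Variant A (range field last):  (counter_name, resource_id, timestamp)
idx_a = sorted(rows)
# Variant B (range field first): (timestamp, counter_name, resource_id)
idx_b = sorted((t, c, r) for (c, r, t) in rows)

# Query: counter_name == "vcpus", resource_id == "res-3",
#        100 <= timestamp < 200

def examined_a():
    # One contiguous slice containing exactly the matching entries.
    lo = bisect.bisect_left(idx_a, ("vcpus", "res-3", 100))
    hi = bisect.bisect_left(idx_a, ("vcpus", "res-3", 200))
    return hi - lo

def examined_b():
    # Every sample in the time window, for all counters and resources,
    # must be examined before filtering.
    lo = bisect.bisect_left(idx_b, (100,))
    hi = bisect.bisect_left(idx_b, (200,))
    return hi - lo

print(examined_a())  # entries examined with the range field last
print(examined_b())  # entries examined with the range field first
```

With the range field last, the scan touches only the 100 matching entries; with it first, the scan touches every entry in the window across all counters and resources, which is the whole-window scan described above.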

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/ceilometer (9.0/mitaka)

Fix proposed to branch: 9.0/mitaka
Change author: Igor Degtiarov <email address hidden>
Review: https://review.fuel-infra.org/23859

Changed in fuel:
status: Confirmed → In Progress
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/ceilometer (9.0/mitaka)

Reviewed: https://review.fuel-infra.org/23859
Submitter: Pkgs Jenkins <email address hidden>
Branch: 9.0/mitaka

Commit: 33fd3ac67dba9decdc89ca094a376cfefaf07eb4
Author: Igor Degtiarov <email address hidden>
Date: Wed Aug 3 09:52:26 2016

[MongoDB] add compound index for statistic request

MongoDB cannot use index intersection if at least one of the indexes is used with a boundary query. In ceilometer statistics we always have a query with an interval on the timestamp field. That is why only a compound index can improve the performance of the search stage in the aggregation pipeline.

Change-Id: Id452f22e9b9ea1198b3ccee0ba5cd292b45bdc03
Closes-Bug: #1594947
Closes-Bug: #1597679

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
milestone: 8.0-updates → 9.2
Revision history for this message
Sergii Rizvan (srizvan) wrote :

The status for 8.0-updates was wrongly set to 'Fix Committed', but the patch has actually been merged into the 9.0/mitaka branch. The 8.0 status should therefore be 'Confirmed', and QA verification is needed for 9.2 in order to set 'Fix Released' for 9.x.

Revision history for this message
Dmitry Belyaninov (dbelyaninov) wrote :

For 9.2
https://product-ci.infra.mirantis.net/view/9.x_acceptance/job/9.x.acceptance.ubuntu.failover_group_mongo/17/console

- Ceilometer test to list meters, alarms, resources and events (failure): "Time limit exceeded while waiting for getting list of meters to finish. Please refer to OpenStack logs for more details."

Changed in fuel:
status: Fix Committed → Confirmed
Revision history for this message
Nadya Privalova (nprivalova) wrote :

We no longer recommend using the Telemetry roles or Mongo in MOS. The recommendation is to use the OpenStack Telemetry plugin instead. More info: http://github.com/openstack/fuel-plugin-openstack-telemetry

Roman Vyalov (r0mikiam)
Changed in fuel:
status: Confirmed → Won't Fix
Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/ceilometer (mcp/newton)

Fix proposed to branch: mcp/newton
Change author: Igor Degtiarov <email address hidden>
Review: https://review.fuel-infra.org/33376

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/ceilometer (11.0/ocata)

Fix proposed to branch: 11.0/ocata
Change author: Igor Degtiarov <email address hidden>
Review: https://review.fuel-infra.org/33755

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/ceilometer (mcp/ocata)

Fix proposed to branch: mcp/ocata
Change author: Igor Degtiarov <email address hidden>
Review: https://review.fuel-infra.org/34480

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/ceilometer (mcp/newton)

Change abandoned by Ilya Tyaptin <email address hidden> on branch: mcp/newton
Review: https://review.fuel-infra.org/33376

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/ceilometer (mcp/ocata)

Change abandoned by Ilya Tyaptin <email address hidden> on branch: mcp/ocata
Review: https://review.fuel-infra.org/34480

Revision history for this message
Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/ceilometer (11.0/ocata)

Change abandoned by Roman Podoliaka <email address hidden> on branch: 11.0/ocata
Review: https://review.fuel-infra.org/33755
Reason: we don't use 11.0/ocata anymore - mcp/ocata is the correct branch name

Revision history for this message
Alexey Stupnikov (astupnikov) wrote :

We no longer support MOS5.1, MOS6.0, MOS6.1
We deliver only Critical/Security fixes to MOS7.0, MOS8.0.
We deliver only High/Critical/Security fixes to MOS9.2.
