query-samples timeout in middle scale data set

Bug #1506738 reported by ZhiQiang Fan on 2015-10-16
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Ceilometer
Fix Released
Undecided
ZhiQiang Fan
Kilo
Fix Released
Undecided
ZhiQiang Fan
Liberty
Fix Released
Undecided
ZhiQiang Fan

Bug Description

currently, when we call query-samples API, for sql backend, we do a inner join to get a temp table, then apply filters, orderby and limit options on it, which will drain disk space or timeout exception when data set is growing.

my test shows that timeout exception raises even when sample table row is 500K, even limit=1 is specified, and there is only one request is processing.

for i.e., the sql statement generated for "ceilometer query-samples --limit 1" is:

SELECT anon_1.id AS anon_1_id, anon_1.counter_name AS anon_1_counter_name, anon_1.counter_type AS anon_1_counter_type, anon_1.counter_unit AS anon_1_counter_unit, anon_1.counter_volume AS anon_1_counter_volume, anon_1.resource_id AS anon_1_resource_id, anon_1.source_id AS anon_1_source_id, anon_1.user_id AS anon_1_user_id, anon_1.project_id AS anon_1_project_id, anon_1.resource_metadata AS anon_1_resource_metadata, anon_1.internal_id AS anon_1_internal_id, anon_1.timestamp AS anon_1_timestamp, anon_1.message_id AS anon_1_message_id, anon_1.message_signature AS anon_1_message_signature, anon_1.recorded_at AS anon_1_recorded_at
FROM (SELECT sample.id AS id, meter.name AS counter_name, meter.type AS counter_type, meter.unit AS counter_unit, sample.volume AS counter_volume, resource.resource_id AS resource_id, resource.source_id AS source_id, resource.user_id AS user_id, resource.project_id AS project_id, resource.resource_metadata AS resource_metadata, resource.internal_id AS internal_id, sample.timestamp AS timestamp, sample.message_id AS message_id, sample.message_signature AS message_signature, sample.recorded_at AS recorded_at
FROM sample INNER JOIN meter ON sample.meter_id = meter.id INNER JOIN resource ON sample.resource_id = resource.internal_id) AS anon_1 ORDER BY anon_1.timestamp DESC
LIMIT 1

actually, it should be:

SELECT sample.id AS id, meter.name AS counter_name, meter.type AS counter_type, meter.unit AS counter_unit, sample.volume AS counter_volume, resource.resource_id AS resource_id, resource.source_id AS source_id, resource.user_id AS user_id, resource.project_id AS project_id, resource.resource_metadata AS resource_metadata, resource.internal_id AS internal_id, sample.timestamp AS timestamp, sample.message_id AS message_id, sample.message_signature AS message_signature, sample.recorded_at AS recorded_at
FROM sample INNER JOIN meter ON sample.meter_id = meter.id INNER JOIN resource ON sample.resource_id = resource.internal_id ORDER BY sample.timestamp DESC
LIMIT 1

the second one can response in less than 10 seconds instead of timeout (10 minutes)

query-alarms doesn't have such issue because its doesn't generate a temp table

ZhiQiang Fan (aji-zqfan) on 2015-10-16
Changed in ceilometer:
assignee: nobody → ZhiQiang Fan (aji-zqfan)

Fix proposed to branch: master
Review: https://review.openstack.org/235826

Changed in ceilometer:
status: New → In Progress
gordon chung (chungg) wrote :

is this mysql?

ZhiQiang Fan (aji-zqfan) wrote :

@gordc
currently, I only test with mysql backend

ZhiQiang Fan (aji-zqfan) wrote :
Download full text (3.2 KiB)

my test for postgresql shows that, postgresql (9.3) is not affected because the command query-samples can response very quick instead of timeout, which mysql couldn't

with my patch applied, the postgresql runs well as before, no significant performance impact observed (only test query-samples because the patch only affects that API):

PostgreSQL:

with log_min_duration_statement = 0 in /etc/postgresql/9.3/main/postgresql.conf
5M samples and 100K resources

before change:
$ time ceilometer query-samples -l 1000000 -f '{"and":[{"<=":{"timestamp":"2015-10-26T04:46:48"}},{">=":{"timestamp":"2015-10-26T03:46:48"}}]}'

real 0m11.183s
user 0m0.537s
sys 0m0.087s

2015-10-25 22:56:04 MDT LOG: duration: 4.177 ms statement: SELECT anon_1.id AS anon_1_id, anon_1.counter_name AS anon_1_counter_name, anon_1.counter_type AS anon_1_counter_type, anon_1.counter_unit AS anon_1_counter_unit, anon_1.counter_volume AS anon_1_counter_volume, anon_1.resource_id AS anon_1_resource_id, anon_1.source_id AS anon_1_source_id, anon_1.user_id AS anon_1_user_id, anon_1.project_id AS anon_1_project_id, anon_1.resource_metadata AS anon_1_resource_metadata, anon_1.internal_id AS anon_1_internal_id, anon_1.timestamp AS anon_1_timestamp, anon_1.message_id AS anon_1_message_id, anon_1.message_signature AS anon_1_message_signature, anon_1.recorded_at AS anon_1_recorded_at
    FROM (SELECT sample.id AS id, meter.name AS counter_name, meter.type AS counter_type, meter.unit AS counter_unit, sample.volume AS counter_volume, resource.resource_id AS resource_id, resource.source_id AS source_id, resource.user_id AS user_id, resource.project_id AS project_id, resource.resource_metadata AS resource_metadata, resource.internal_id AS internal_id, sample.timestamp AS timestamp, sample.message_id AS message_id, sample.message_signature AS message_signature, sample.recorded_at AS recorded_at
    FROM sample JOIN meter ON sample.meter_id = meter.id JOIN resource ON sample.resource_id = resource.internal_id) AS anon_1
    WHERE anon_1.timestamp <= '2015-10-26T04:46:48'::timestamp AND anon_1.timestamp >= '2015-10-26T03:46:48'::timestamp ORDER BY anon_1.timestamp DESC
     LIMIT 1000000

after change:

$ time ceilometer query-samples -l 1000000 -f '{"and":[{"<=":{"timestamp":"2015-10-26T04:46:48"}},{">=":{"timestamp":"2015-10-26T03:46:48"}}]}'

real 0m1.009s
user 0m0.500s
sys 0m0.081s

2015-10-25 22:53:08 MDT LOG: duration: 3.658 ms statement: SELECT sample.timestamp AS sample_timestamp, sample.recorded_at AS sample_recorded_at, sample.message_id AS sample_message_id, sample.message_signature AS sample_message_signature, sample.volume AS counter_volume, meter.name AS counter_name, meter.type AS counter_type, meter.unit AS counter_unit, resource.source_id AS resource_source_id, resource.user_id AS resource_user_id, resource.project_id AS resource_project_id, resource.resource_metadata AS resource_resource_metadata, resource.resource_id AS resource_resource_id
    FROM sample JOIN meter ON meter.id = sample.meter_id JOIN resource ON resource.internal_id = sample.resource_id
    WHERE sample.timestamp <= '2015-10-26T04:46:48'::timestamp AND sample.timestamp >= '2015...

Read more...

Reviewed: https://review.openstack.org/235826
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=b9bf5f1f571f21aee4347c80a5a7ac8a1c456e45
Submitter: Jenkins
Branch: master

commit b9bf5f1f571f21aee4347c80a5a7ac8a1c456e45
Author: ZhiQiang Fan <email address hidden>
Date: Fri Oct 16 02:45:45 2015 -0700

    avoid generate temporary table when query samples

    Currently, when we call query-samples API, for SQL backend, we do
    an inner join to get a temporary table, then apply filters, orderby
    and limit options on it. Such implementation will drain disk space
    or cause timeout exception when data set is growing, even with specified
    limit and filters.

    This patch applies filters and limit on the inner join itself, and
    uses a fake mapper object instead of temporary table to suit current
    query transformer.

    Change-Id: I261a2dd362ed51c16a6fa191dadcce1b45fce2e4
    Closes-Bug: #1506738

Changed in ceilometer:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/239429
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=16af61a0de1ae1d6165babac0f026c4d4c3b5cb9
Submitter: Jenkins
Branch: stable/liberty

commit 16af61a0de1ae1d6165babac0f026c4d4c3b5cb9
Author: ZhiQiang Fan <email address hidden>
Date: Fri Oct 16 02:45:45 2015 -0700

    avoid generate temporary table when query samples

    Currently, when we call query-samples API, for SQL backend, we do
    an inner join to get a temporary table, then apply filters, orderby
    and limit options on it. Such implementation will drain disk space
    or cause timeout exception when data set is growing, even with specified
    limit and filters.

    This patch applies filters and limit on the inner join itself, and
    uses a fake mapper object instead of temporary table to suit current
    query transformer.

    Change-Id: I261a2dd362ed51c16a6fa191dadcce1b45fce2e4
    Closes-Bug: #1506738
    (cherry picked from commit b9bf5f1f571f21aee4347c80a5a7ac8a1c456e45)

Reviewed: https://review.openstack.org/239431
Committed: https://git.openstack.org/cgit/openstack/ceilometer/commit/?id=1cbe702e846d0dd0ffd53b67e257452c799c9e1d
Submitter: Jenkins
Branch: stable/kilo

commit 1cbe702e846d0dd0ffd53b67e257452c799c9e1d
Author: ZhiQiang Fan <email address hidden>
Date: Fri Oct 16 02:45:45 2015 -0700

    avoid generate temporary table when query samples

    Currently, when we call query-samples API, for SQL backend, we do
    an inner join to get a temporary table, then apply filters, orderby
    and limit options on it. Such implementation will drain disk space
    or cause timeout exception when data set is growing, even with specified
    limit and filters.

    This patch applies filters and limit on the inner join itself, and
    uses a fake mapper object instead of temporary table to suit current
    query transformer.

    Change-Id: I261a2dd362ed51c16a6fa191dadcce1b45fce2e4
    Closes-Bug: #1506738
    (cherry picked from commit b9bf5f1f571f21aee4347c80a5a7ac8a1c456e45)

This issue was fixed in the openstack/ceilometer 6.0.0.0b1 development milestone.

Thierry Carrez (ttx) on 2015-12-03
Changed in ceilometer:
status: Fix Committed → Fix Released

This issue was fixed in the openstack/ceilometer 5.0.1 release.

Liusheng (liusheng) on 2015-12-14
Changed in ceilometer:
milestone: none → mitaka-1
ZhiQiang Fan (aji-zqfan) wrote :

fixed in 5.0.1

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers