get_events inefficient in SQLAlchemy

Bug #1267865 reported by Thomas Maddox
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Ceilometer
Invalid
Medium
Unassigned
python-ceilometerclient
Won't Fix
Medium
Unassigned

Bug Description

The latest updates to impl_sqlalchemy.py introduced a more complex version of the get_events method in which we can filter on multiple traits. Unfortunately it appears that an inefficiency was introduced that causes the query to have to do a full join between Trait and TraitType before going any further. This was taking well over 3 hours to run in my testing on a 70M event MySQL database with ~20 traits per event (~1.4B traits). The produced query looks like this (from innotop):

SELECT trait.id AS trait_id,
       trait.trait_type_id AS trait_trait_type_id,
       trait.t_string AS trait_t_string,
       trait.t_float AS trait_t_float,
       trait.t_int AS trait_t_int,
       trait.t_datetime AS trait_t_datetime,
       trait.event_id AS trait_event_id
FROM trait
INNER JOIN trait_type ON trait.trait_type_id = trait_type.id
INNER JOIN
  (SELECT event.id AS id,
          event.message_id AS message_id,
          event.generated AS GENERATED,
          event.event_type_id AS event_type_id
   FROM event
   INNER JOIN event_type ON event_type.id = event.event_type_id
   AND event_type.`desc` = 'compute.instance.resize.prep.end'
   INNER JOIN
     (SELECT trait.event_id AS event_id
      FROM trait
      INNER JOIN trait_type ON trait.trait_type_id = trait_type.id
      AND trait_type.`desc` = 'progress') AS anon_2 ON event.id = anon_2.event_id
   WHERE event.GENERATED >= '1393319531'
     AND event.GENERATED <= '1393320519') AS anon_1 ON trait.event_id = anon_1.id

The following subquery appears to be the main culprit and was taking hours (~3) to run. It had to find all 'progress' traits, which was designed to be an attribute on most every event in the load test. This could have just as easily been 'tenant_id'.

(SELECT trait.event_id AS event_id
FROM trait
INNER JOIN trait_type ON trait.trait_type_id = trait_type.id
AND trait_type.`desc` = 'progress')

Revision history for this message
Thomas Maddox (thomas-maddox) wrote :

A couple query options that still support multi-trait filtering:

SELECT trait.trait_type_id,
       trait.t_string,
       trait.t_int,
       trait.t_float,
       trait.t_datetime,
       trait.event_id,
       event.generated,
       event.message_id,
       trait_type.`desc`,
       trait_type.data_type,
       event_type.`desc`
FROM trait
INNER JOIN event ON trait.event_id = event.id
INNER JOIN
  (SELECT t.event_id AS event_id
   FROM trait t
   INNER JOIN
     (SELECT e.id AS id
      FROM event e
      INNER JOIN event_type et ON e.event_type_id = et.id
      WHERE e.generated BETWEEN 1392061319 AND 1392320519) AS events ON events.id = t.event_id
   INNER JOIN trait_type tt ON tt.id = t.trait_type_id
   AND
      ((tt.`desc` = 'progress' AND t.t_int = 12)
        OR
      (tt.`desc` = 'state' AND t.t_string = 'state_12')
        OR
      (tt.`desc` = 'old_task' AND t.t_string = 'task_0'))
   GROUP BY (t.event_id) HAVING COUNT(t.event_id) = 3) AS filtered_events ON event.id = filtered_events.event_id
INNER JOIN event_type ON event_type.id = event.event_type_id
INNER JOIN trait_type ON trait_type.id = trait.trait_type_id;

-------------------------------------------------------------------------------------------------------------

SELECT *
FROM trait
INNER JOIN
  ( SELECT e.id AS id,
           e.message_id AS message_id,
           e.generated AS GENERATED,
           e.event_type_id AS event_type_id,
           count(*) as count
   FROM event e
   INNER JOIN trait t ON e.id = t.event_id
   INNER JOIN event_type et ON e.event_type_id = et.id
   AND et.`desc` = 'compute.instance.resize.prep.end'
   INNER JOIN trait_type tt ON t.trait_type_id = tt.id
   AND ((tt.`desc` = 'progress' AND t.t_int = 12) OR (tt.`desc` = 'uuid' AND t.t_string = '5b899a3b-dfe9-4790-bfe5-2b511d252351'))
   WHERE e.generated BETWEEN 1393061319 AND 1393320519
   GROUP BY e.id, e.message_id, e.generated, e.event_type_id) AS anon_1 ON anon_1.id = trait.event_id
WHERE anon_1.count = 2;

Revision history for this message
Thomas Maddox (thomas-maddox) wrote :

Looking closely, those are the same query, the second is just the inner query for the first.

Revision history for this message
gordon chung (chungg) wrote :

hey thomas, is this still a issue? asking before i try to build a 70M record table. :)

Julien Danjou (jdanjou)
Changed in ceilometer:
status: New → Triaged
importance: Undecided → Medium
gordon chung (chungg)
Changed in ceilometer:
assignee: nobody → gordon chung (chungg)
Revision history for this message
gordon chung (chungg) wrote :

adding ceilometerclient as affected. testing against 5000 db, it seems a majority of the time is spent outputting the data

timing checks in code (~1min on server):

2014-05-01 14:54:12.545 18576 INFO urllib3.connectionpool [-] Starting new HTTP connection (1):
2014-05-01 14:54:13.657 18576 WARNING ceilometer.storage.impl_sqlalchemy [-] ---------------RUN TIME TO FETCH EVENT: 0:00:00.954279
2014-05-01 14:54:34.680 18576 WARNING ceilometer.storage.impl_sqlalchemy [-] ---------------RUN TIME TO FETCH TRAITS: 0:00:21.022390
2014-05-01 14:55:13.218 18576 WARNING ceilometer.storage.impl_sqlalchemy [-] --------------- TOTAL Events: 5000, RUNTIME TO YIELD RESULTS: 0:00:38.537616
2014-05-01 14:55:13.226 18576 WARNING ceilometer.api.controllers.v2 [-] ---------------------- RUNTIME FROM API REQ to RESP: 0:01:00.523782

running ceilometer event-list (~3.5min to run command):

chungg@vienna:~/devstack$ time ceilometer event-list > temp_out.txt
real 3m24.613s
user 2m3.195s

Changed in ceilometer:
status: Triaged → In Progress
gordon chung (chungg)
Changed in python-ceilometerclient:
importance: Undecided → Medium
Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: none → juno-2
Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: juno-2 → juno-3
gordon chung (chungg)
Changed in ceilometer:
assignee: gordon chung (chungg) → nobody
Eoghan Glynn (eglynn)
Changed in ceilometer:
milestone: juno-3 → none
Revision history for this message
Dina Belova (dbelova) wrote :

Ceilometer bug still in progress?

Revision history for this message
Thomas Maddox (thomas-maddox) wrote :

Something tells me this isn't actively being worked on at the moment.

Revision history for this message
gordon chung (chungg) wrote :

whoops forgot to switch back.... to triaged.

i'm not actively tackling this bug specifically. nealph said hp would test out: https://review.openstack.org/#/c/91501/

i honestly don't know if it'll help this bug specifically. Dina, if you'd like to tackle this issue for sql, feel free to take it.

Changed in ceilometer:
status: In Progress → Triaged
Changed in python-ceilometerclient:
status: New → Triaged
Revision history for this message
gordon chung (chungg) wrote :

this is mildly improved by this: https://review.openstack.org/#/c/130869/

Revision history for this message
gordon chung (chungg) wrote :

closing ceilometer related item. we can reopen new bug if applicable

Changed in ceilometer:
status: Triaged → Invalid
gordon chung (chungg)
Changed in python-ceilometerclient:
status: Triaged → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.