The latest updates to impl_sqlalchemy.py introduced a more complex version of the get_events method in which we can filter on multiple traits. Unfortunately it appears that an inefficiency was introduced that causes the query to have to do a full join between Trait and TraitType before going any further. This was taking well over 3 hours to run in my testing on a 70M event MySQL database with ~20 traits per event (~1.4B traits). The produced query looks like this (from innotop):
SELECT trait.id AS trait_id,
trait.trait_type_id AS trait_trait_type_id,
trait.t_string AS trait_t_string,
trait.t_float AS trait_t_float,
trait.t_int AS trait_t_int,
trait.t_datetime AS trait_t_datetime,
trait.event_id AS trait_event_id
FROM trait
INNER JOIN trait_type ON trait.trait_type_id = trait_type.id
INNER JOIN
(SELECT event.id AS id,
event.message_id AS message_id,
event.generated AS GENERATED,
event.event_type_id AS event_type_id
FROM event
INNER JOIN event_type ON event_type.id = event.event_type_id
AND event_type.`desc` = 'compute.instance.resize.prep.end'
INNER JOIN
(SELECT trait.event_id AS event_id
FROM trait
INNER JOIN trait_type ON trait.trait_type_id = trait_type.id
AND trait_type.`desc` = 'progress') AS anon_2 ON event.id = anon_2.event_id
WHERE event.GENERATED >= '1393319531'
AND event.GENERATED <= '1393320519') AS anon_1 ON trait.event_id = anon_1.id
The following subquery appears to be the main culprit and was taking hours (~3) to run. It had to find all 'progress' traits, which was designed to be an attribute on most every event in the load test. This could have just as easily been 'tenant_id'.
(SELECT trait.event_id AS event_id
FROM trait
INNER JOIN trait_type ON trait.trait_type_id = trait_type.id
AND trait_type.`desc` = 'progress')
A couple query options that still support multi-trait filtering:
SELECT trait.trait_ type_id,
trait.t_ string,
trait.t_ float,
trait.t_ datetime,
trait.event_ id,
event.generated ,
event.message_ id,
trait_type. `desc`,
trait_type. data_type,
event_type. `desc` events. event_id type_id;
trait.t_int,
FROM trait
INNER JOIN event ON trait.event_id = event.id
INNER JOIN
(SELECT t.event_id AS event_id
FROM trait t
INNER JOIN
(SELECT e.id AS id
FROM event e
INNER JOIN event_type et ON e.event_type_id = et.id
WHERE e.generated BETWEEN 1392061319 AND 1392320519) AS events ON events.id = t.event_id
INNER JOIN trait_type tt ON tt.id = t.trait_type_id
AND
((tt.`desc` = 'progress' AND t.t_int = 12)
OR
(tt.`desc` = 'state' AND t.t_string = 'state_12')
OR
(tt.`desc` = 'old_task' AND t.t_string = 'task_0'))
GROUP BY (t.event_id) HAVING COUNT(t.event_id) = 3) AS filtered_events ON event.id = filtered_
INNER JOIN event_type ON event_type.id = event.event_type_id
INNER JOIN trait_type ON trait_type.id = trait.trait_
------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ------- ----
SELECT *
e.message_ id AS message_id,
e.generated AS GENERATED,
e.event_ type_id AS event_type_id, instance. resize. prep.end' dfe9-4790- bfe5-2b511d2523 51'))
FROM trait
INNER JOIN
( SELECT e.id AS id,
count(*) as count
FROM event e
INNER JOIN trait t ON e.id = t.event_id
INNER JOIN event_type et ON e.event_type_id = et.id
AND et.`desc` = 'compute.
INNER JOIN trait_type tt ON t.trait_type_id = tt.id
AND ((tt.`desc` = 'progress' AND t.t_int = 12) OR (tt.`desc` = 'uuid' AND t.t_string = '5b899a3b-
WHERE e.generated BETWEEN 1393061319 AND 1393320519
GROUP BY e.id, e.message_id, e.generated, e.event_type_id) AS anon_1 ON anon_1.id = trait.event_id
WHERE anon_1.count = 2;