Bad performance when get traits for event with mysql backend

Bug #1825477 reported by leehom
12
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Panko
New
Undecided
Unassigned

Bug Description

I'm using Queens version panko. And as it's deployed by kolla, panko_api is running in container.

# rpm -qa | grep panko
python-panko-4.0.2-86.el7.noarch
openstack-panko-common-4.0.2-86.el7.noarch
openstack-panko-api-4.0.2-86.el7.noarch

I found get events from Panko with rest API is very slow.
And after research, the issue is narrow down to below part of source code.

https://github.com/openstack/panko/blob/7fe341001cb3132389d62c99d5bc8a790c366958/panko/storage/impl_sqlalchemy.py#L345
Almost all time is consumed by trait_q.order_by(models.TraitDatetime.key).all().
```
            for id_, key, t_date, t_int, t_float, t_text in (
                    trait_q.order_by(models.TraitDatetime.key)).all():
                if t_int is not None:
                    dtype = api_models.Trait.INT_TYPE
                    val = t_int
                elif t_float is not None:
                    dtype = api_models.Trait.FLOAT_TYPE
                    val = t_float
                elif t_date is not None:
                    dtype = api_models.Trait.DATETIME_TYPE
```

Below is the Request I used.
```
{
    "q": [
        {
            "field": "all_tenants",
            "type": "string",
            "value": "True"
        },
        {
            "field": "end_timestamp",
            "value": "2019-04-19T23:59:59",
            "type": "string",
            "op": "le"
        },
        {
            "field": "start_timestamp",
            "value": "2019-04-18T00:00:00",
            "type": "string",
            "op": "ge"
        }
    ]
}
```

When I call get API with above request, it returned in around 7s.

I added some logs to analyze this problem, as you can see from below log.
As you can see, almost all time is consumed when get traits for events.

2019-04-19 07:23:11.368 17 INFO panko.storage.impl_sqlalchemy [-] Session begin <EventFilter(start_timestamp: 2019-04-18 00:00:00, end_timestamp: 2019-04-19 23:59:59, event_type: None, traits: [])>
2019-04-19 07:23:11.370 17 INFO panko.storage.impl_sqlalchemy [-] Start get event
2019-04-19 07:23:11.540 17 INFO panko.storage.impl_sqlalchemy [-] Get event complete
**2019-04-19 07:23:11.550 17 INFO panko.storage.impl_sqlalchemy [-] Start get trait**
**2019-04-19 07:23:17.240 17 INFO panko.storage.impl_sqlalchemy [-] Get trait complete**
2019-04-19 07:23:17.241 17 INFO panko.storage.impl_sqlalchemy [-] Session complete

I'm not sure if there's ant methods can be used the improve the performance.
Or it's just a bug, because the time is almost same no matter I get ten events or hundred events at once.

leehom (feli5)
description: updated
Revision history for this message
Mike Bayer (zzzeek) wrote :
Download full text (6.5 KiB)

Hi there -

SQLAlchemy maintainer here. We are seeing extremely slow performance in this same area, such that the MySQL query times out and the service fails. Additionally, it pushed the CPU use of MySQL to 100% per cpu which is something I've never seen before. If the MySQL backend is recommended for use then I would think this should be high priority for this backend as it is a blocking issue when it causes MySQL itself to lose its ability to respond to other queries.

The reason seems to be that the query is too complicated and needs to be rewritten, most likely as a series of smaller and simpler queries. It has lots of deep nesting, subquery correlation, and UNIONs, enough so that it can cause MySQL to grind to a halt, making this more of a blocker. The code at https://github.com/openstack/panko/blob/7fe341001cb3132389d62c99d5bc8a790c366958/panko/storage/impl_sqlalchemy.py#L345 is explicit in it's request for this level of nesting and complexity so this is the area that would need to be reworked somehow.

Here's a full dump of the SQL we've observed, which I've formatted for readability:

SELECT
 anon_1.trait_datetime_event_id AS anon_1_trait_datetime_event_id, anon_1.trait_datetime_key AS anon_1_trait_datetime_key,
 anon_1.trait_datetime_value AS anon_1_trait_datetime_value, anon_1.anon_2 AS anon_1_anon_2,
 anon_1.anon_3 AS anon_1_anon_3, anon_1.anon_4 AS anon_1_anon_4
FROM (
 SELECT trait_datetime.event_id AS trait_datetime_event_id, trait_datetime.`key` AS trait_datetime_key,
 trait_datetime.value AS trait_datetime_value,
 CAST(NULL AS SIGNED INTEGER) AS anon_2, NULL AS anon_3, CAST(NULL AS CHAR(255)) AS anon_4 FROM trait_datetime
 WHERE EXISTS (
  SELECT * FROM (
   SELECT event.id AS id FROM event INNER JOIN event_type ON event_type.id = event.event_type_id
   WHERE (
    EXISTS (
     SELECT * FROM (
      SELECT anon_7.trait_text_event_id AS trait_text_event_id
      FROM (
       SELECT trait_text.event_id AS trait_text_event_id FROM trait_text
       WHERE NOT (
        EXISTS (
         SELECT * FROM (
          SELECT trait_text.event_id AS event_id FROM trait_text
          WHERE trait_text.`key` = %(key_1)s
         ) AS anon_8
        WHERE trait_text.event_id = anon_8.event_id
       )
      )
      UNION
      SELECT trait_text.event_id AS trait_text_event_id
      FROM trait_text, event
      WHERE trait_text.`key` = %(key_2)s AND trait_text.value = %(value_1)s AND event.id = trait_text.event_id
     ) AS anon_7
    ) AS anon_6
    WHERE event.id = anon_6.trait_text_event_id)
   ) AND event.`generated` >= %(generated_1)s
   ORDER BY event.`generated` ASC, event.message_id ASC
    LIMIT %(param_1)s
   ) AS anon_5
  WHERE trait_datetime.event_id = anon_5.id
 )
 UNION ALL
 SELECT
  trait_int.event_id AS trait_int_event_id, trait_int.`key` AS trait_int_key,
  NULL AS anon_9, trait_int.value AS trait_int_value,
  NULL AS anon_10, NULL AS anon_11 FROM trait_int
  WHERE EXISTS (
   SELECT * FROM (
    SELECT event.id AS id FROM event
    INNER JOIN event_type ON event_type.id = event.event_type_id
    WHERE (
     EXISTS (
      SELECT * FROM (
       SELECT anon_7.trait_text_event_id AS tra...

Read more...

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.