Gnocchi metrics database seems to increase without bound for heavily used cloud

Bug #1848049 reported by Steven Parker
This bug affects 1 person
Affects: Gnocchi Charm
Status: Triaged
Importance: Wishlist
Assigned to: Unassigned
Milestone: none

Bug Description

On a cloud that is heavily used for Heat deployments, the Gnocchi metrics database has grown to about 20 GB.

Steven Parker (sbparke) wrote :

We found the following on a production cloud.
This causes gnocchi to alarm on our cloud and, it seems, on similar clouds that have this charm deployed.

The last dataset we collected covered ~20:30-11:30. At 21:55:48, all designate servers got MySQL errors. From the MySQL slow query log file, it's possible to see that from 21:04 to 21:16 MySQL received 40 queries from gnocchi, each query [1] doing joins, returning ~500MB, and taking as much as 16min to finish, for a total of 20GB of data. The next thing logged in this file is at 21:54, with a lot of slow queries from other services, especially designate.
Also around this time, MySQL logged several warning messages saying "InnoDB: Warning: difficult to find free blocks in the buffer pool (338 search iterations)!". So I believe this is saturating disk I/O, consuming almost all CPUs on the server, and causing other queries to time out.
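
As a quick sanity check of the figures quoted above, the following sketch redoes the arithmetic. The inputs are the approximate values from the comment (40 queries, ~500MB each, 21:04 to 21:16); the variable names are made up for illustration, and the average rate is only a rough indication because the queries overlapped and ran for different lengths of time.

```python
# Approximate figures taken from the slow query log summary above.
queries = 40
mb_per_query = 500            # "~500MB each"
window_minutes = 16 - 4       # queries arrived between 21:04 and 21:16

# Decimal units (1 GB = 1000 MB), matching the rounded "20GB" in the comment.
total_gb = queries * mb_per_query / 1000
avg_mb_per_s = queries * mb_per_query / (window_minutes * 60)

print(f"total result data: ~{total_gb:.0f} GB")
print(f"average over the arrival window: ~{avg_mb_per_s:.0f} MB/s")
```

The total agrees with the 20GB quoted in the comment.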

Steven Parker (sbparke) wrote :

gnocchi archive-policy list
+--------+-------------+-----------------------------------------------------------------------+---------------------------------+
| name   | back_window | definition                                                            | aggregation_methods             |
+--------+-------------+-----------------------------------------------------------------------+---------------------------------+
| bool   | 3600        | - timespan: 365 days, 0:00:00, points: 31536000, granularity: 0:00:01 | last                            |
| high   | 0           | - timespan: 1:00:00, points: 3600, granularity: 0:00:01               | max, mean, count, sum, std, min |
|        |             | - timespan: 7 days, 0:00:00, points: 10080, granularity: 0:01:00      |                                 |
|        |             | - timespan: 365 days, 0:00:00, points: 8760, granularity: 1:00:00     |                                 |
| low    | 0           | - timespan: 30 days, 0:00:00, points: 8640, granularity: 0:05:00      | max, mean, count, sum, std, min |
| medium | 0           | - timespan: 7 days, 0:00:00, points: 10080, granularity: 0:01:00      | max, mean, count, sum, std, min |
|        |             | - timespan: 365 days, 0:00:00, points: 8760, granularity: 1:00:00     |                                 |
+--------+-------------+-----------------------------------------------------------------------+---------------------------------+
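
For readers comparing these policies, the `points` column is simply timespan divided by granularity. The sketch below recomputes it from the table and derives a rough upper bound on stored points per metric. It assumes that each aggregation method keeps its own full series for every definition, which is an assumption made here for illustration rather than something stated in this report; the names `POLICIES`, `points` and `max_points_per_metric` are hypothetical.

```python
from datetime import timedelta

# Copied from the `gnocchi archive-policy list` output above:
# name -> (number of aggregation methods, [(timespan, granularity), ...])
POLICIES = {
    "bool": (1, [(timedelta(days=365), timedelta(seconds=1))]),
    "high": (6, [(timedelta(hours=1), timedelta(seconds=1)),
                 (timedelta(days=7), timedelta(minutes=1)),
                 (timedelta(days=365), timedelta(hours=1))]),
    "low": (6, [(timedelta(days=30), timedelta(minutes=5))]),
    "medium": (6, [(timedelta(days=7), timedelta(minutes=1)),
                   (timedelta(days=365), timedelta(hours=1))]),
}


def points(timespan, granularity):
    """Number of points one definition retains (timespan / granularity)."""
    return timespan // granularity  # timedelta // timedelta yields an int


def max_points_per_metric(policy_name):
    """Upper bound on retained points for one metric under a policy.

    Assumes one full series per aggregation method (illustrative assumption).
    """
    n_methods, definitions = POLICIES[policy_name]
    return n_methods * sum(points(t, g) for t, g in definitions)


for name in POLICIES:
    print(name, max_points_per_metric(name))
```

Under that assumption a single `low` metric can hold up to 6 x 8640 = 51840 points, which gives a rough sense of how retention settings multiply across a large number of metrics.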

Steven Parker (sbparke) wrote :

Executing the following:

SELECT metric.id AS metric_id, metric.archive_policy_name AS metric_archive_policy_name, metric.creator AS metric_creator, metric.resource_id AS metric_resource_id, metric.name AS metric_name, metric.unit AS metric_unit, metric.status AS metric_status, archive_policy_1.name AS archive_policy_1_name, archive_policy_1.back_window AS archive_policy_1_back_window, archive_policy_1.definition AS archive_policy_1_definition, archive_policy_1.aggregation_methods AS archive_policy_1_aggregation_methods
FROM metric LEFT OUTER JOIN archive_policy AS archive_policy_1 ON archive_policy_1.name = metric.archive_policy_name
WHERE metric.status = 'active' ORDER BY metric.id ASC;

Generated:
2482004 lines
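
A cheaper diagnostic than fetching every row would be to count active metrics per archive policy. The sketch below shows the idea against a toy in-memory SQLite table. It uses only the `metric` columns that appear in the query above (`id`, `archive_policy_name`, `status`); the sample rows, the `delete` status value and the function name are hypothetical, and the production database is MySQL, so this is illustrative rather than something run against the affected cloud.

```python
import sqlite3


def active_metrics_per_policy(conn):
    """Return (archive_policy_name, count) for active metrics, largest first."""
    cur = conn.execute(
        "SELECT archive_policy_name, COUNT(*) FROM metric "
        "WHERE status = 'active' "
        "GROUP BY archive_policy_name "
        "ORDER BY COUNT(*) DESC"
    )
    return cur.fetchall()


# Toy stand-in for the metric table (hypothetical data, not from this report).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metric (id TEXT PRIMARY KEY, archive_policy_name TEXT, status TEXT)"
)
conn.executemany(
    "INSERT INTO metric VALUES (?, ?, ?)",
    [
        ("m1", "low", "active"),
        ("m2", "low", "active"),
        ("m3", "high", "active"),
        ("m4", "low", "delete"),  # not active, so excluded from the counts
    ],
)

print(active_metrics_per_policy(conn))
```

The aggregate returns one row per policy instead of one row per metric, so it avoids the very large result set described above.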

Steven Parker (sbparke) wrote :

This occurred on Xenial/Queens with gnocchi charm version 17.

juju config gnocchi
application: gnocchi
application-config:
  trust:
    default: false
    description: Does this application have access to trusted credentials
    source: default
    type: bool
    value: false
charm: gnocchi
settings:
  action-managed-upgrade:
    default: false
    description: |
      If True enables openstack upgrades for this charm via juju actions.
      You will still need to set openstack-origin to the new repository but
      instead of an upgrade running automatically across all units, it will
      wait for you to execute the openstack-upgrade action for this charm on
      each unit. If False it will revert to existing behavior of upgrading
      all units on config change.
    source: default
    type: boolean
    value: false
  debug:
    default: false
    description: Enable debug logging
    source: default
    type: boolean
    value: false
  dns-ha:
    default: false
    description: |
      Use DNS HA with MAAS 2.0. Note if this is set do not set vip settings
      below.
    source: default
    type: boolean
    value: false
  haproxy-client-timeout:
    description: |
      Client timeout configuration in ms for haproxy, used in HA
      configurations. If not provided, default value of 90000ms is used.
    source: unset
    type: int
  haproxy-connect-timeout:
    description: |
      Connect timeout configuration in ms for haproxy, used in HA
      configurations. If not provided, default value of 9000ms is used.
    source: unset
    type: int
  haproxy-queue-timeout:
    description: |
      Queue timeout configuration in ms for haproxy, used in HA
      configurations. If not provided, default value of 9000ms is used.
    source: unset
    type: int
  haproxy-server-timeout:
    description: |
      Server timeout configuration in ms for haproxy, used in HA
      configurations. If not provided, default value of 90000ms is used.
    source: unset
    type: int
  openstack-origin:
    default: distro
    description: |
      Repository from which to install OpenStack.

      May be one of the following:

        distro (default)
        ppa:somecustom/ppa (PPA name must include OpenStack Release)
        deb url sources entry|key id
        or a supported Ubuntu Cloud Archive pocket.

      Supported Ubuntu Cloud Archive pockets include:

        cloud:trusty-liberty
        cloud:trusty-juno
        cloud:trusty-kilo
        cloud:trusty-liberty
        cloud:trusty-mitaka

      Note that updating this setting to a source that is known to
      provide a later version of OpenStack will trigger a software
      upgrade.
    source: user
    type: string
    value: cloud:xenial-queens
  os-admin-hostname:
    description: |
      The hostname or address of the admin endpoints created in the keystone
      identity provider.
      .
      This value will be used for admin endpoints. For example, an
      os-admin-hostname set to 'api-admin.example.com' with ssl enabled
      will create the following endpoint for neutron-api:
      .
      https://api-admin.example.com:9696/
    source: user
    type: string
    value: gnocchi-inter...


Changed in charm-gnocchi:
milestone: none → 19.10

Erlon R. Cruz (sombrafam) wrote :

This is a bug in the gnocchi code. I have reported it on the gnocchi GitHub:
https://github.com/gnocchixyz/gnocchi/issues/1050

David Ames (thedac)
Changed in charm-gnocchi:
milestone: 19.10 → 20.01

Steven Parker (sbparke) wrote :

Is there a short-term solution that we can provide in user space?
For example, reducing metric retention times, etc.

Thanks,
   Steven

Alex Kavanagh (ajkavanagh) wrote :

It's an upstream issue where the consuming applications (OpenStack services) don't use Gnocchi 'properly' (according to the upstream bug https://github.com/gnocchixyz/gnocchi/issues/1050).

Changed in charm-gnocchi:
importance: Undecided → Wishlist
milestone: 20.01 → none
status: New → Triaged

Chris Sanders (chris.sanders) wrote :

I'm marking this field-high: upstream hasn't made any improvements, and it's being seen in multiple environments. I believe it's time we re-evaluate solutions that the charm can provide.

tags: added: gnocchi-support sts

Billy Olsen (billy-olsen) wrote :

Removed field-high, as resolving this will require roadmap work and effort.
