service_statuses table running full in Designate database

Bug #1768824 reported by Dmitry Galkin on 2018-05-03
10
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Designate
High
Erik Olof Gunnar Andersson

Bug Description

Hi,

The service_statuses table in Designate database is running full of records in our deployment:

MariaDB [designate]> select count(*) from service_statuses;

+----------+
| count(*) |
+----------+
| 24474342 |
+----------+
1 row in set (7 min 19.09 sec)

We got millions of rows in just couple of month. The problem is that the same services running on the same hosts create new record (instead of updating existing) during status report to Designate.
This is how it looks in DB:

MariaDB [designate]> select * from service_statuses;
+----------------------------------+---------------------+---------------------+--------------+------------------------------------+---------------------+--------+-------+--------------+
| id | created_at | updated_at | service_name | hostname | heartbeated_at | status | stats | capabilities |
+----------------------------------+---------------------+---------------------+--------------+------------------------------------+---------------------+--------+-------+--------------+
| 0dde2b5f228549d5995cb0338841bd50 | 2018-05-02 12:06:03 | NULL | producer | designate-producer-855855776-cr8d9 | 2018-05-02 12:06:03 | UP | {} | {} |
| 0e311d3000d8403d97066eba619490a3 | 2018-05-02 12:05:14 | NULL | api | designate-api-2042646259-6090v | 2018-05-02 12:05:13 | UP | {} | {} |
| 168448cd97cd428ea19318243570482c | 2018-05-02 12:05:48 | NULL | producer | designate-producer-855855776-cr8d9 | 2018-05-02 12:05:48 | UP | {} | {} |
| 1685d7f80d8c4f75b052680e5e2f40ae | 2018-05-02 12:05:59 | NULL | api | designate-api-2042646259-6090v | 2018-05-02 12:05:58 | UP | {} | {} |
| 192275eb33854b4091b981b0c32d04f7 | 2018-05-02 12:05:41 | NULL | worker | designate-worker-3446544-7fzqx | 2018-05-02 12:05:35 | UP | {} | {} |
| 1e465011f21f47f096b54005675e8011 | 2018-05-02 12:05:25 | NULL | mdns | designate-mdns-4198843580-lw6s2 | 2018-05-02 12:05:25 | UP | {} | {} |
| 22e0ab87b3cd4228bc191e49923d13ba | 2018-05-02 12:05:58 | NULL | producer | designate-producer-855855776-cr8d9 | 2018-05-02 12:05:58 | UP | {} | {} |
| 284d6b6bf53f47358fa14750d0c2a181 | 2018-05-02 12:05:18 | NULL | producer | designate-producer-855855776-cr8d9 | 2018-05-02 12:05:18 | UP | {} | {} |
| 3617ac740b9945c09f65f4b8cf0a72b5 | 2018-05-02 12:05:25 | NULL | api | designate-api-2042646259-drzws | 2018-05-02 12:05:25 | UP | {} | {} |
| 3880157829f946398159f104b13a066b | 2018-05-02 12:06:00 | NULL | mdns | designate-mdns-4198843580-lw6s2 | 2018-05-02 12:06:00 | UP | {} | {} |

The respective part of the code is:
https://github.com/openstack/designate/blob/master/designate/central/service.py#L2879-L2898

It is not exactly clear to me why the id is needed to identify the service. The 'hostname' and 'service_name' seem to be enough - it is rather unlikely that there are two api services are running on the same host..

So, I've removed the id part:

        if service_status.obj_attr_is_set('id'):
            criterion["id"] = service_status.id

and cleaned up the service_statuses table, now it looks fine:

MariaDB [designate]> select * from service_statuses;
+----------------------------------+---------------------+---------------------+--------------+-------------------------------------+---------------------+--------+-------+--------------+
| id | created_at | updated_at | service_name | hostname | heartbeated_at | status | stats | capabilities |
+----------------------------------+---------------------+---------------------+--------------+-------------------------------------+---------------------+--------+-------+--------------+
| 1e89d10b0ddb459f8a9522ebd62fa629 | 2018-05-02 13:03:21 | 2018-05-02 13:03:37 | worker | designate-worker-2851226431-9274c | 2018-05-02 13:03:36 | UP | {} | {} |
| 8d29d50270ee48999b78e90c1b88acf8 | 2018-05-02 13:03:23 | 2018-05-02 13:03:38 | api | designate-api-1699569558-9rvg6 | 2018-05-02 13:03:38 | UP | {} | {} |
| 9602ccf55d204c7b86277e21af3d739f | 2018-05-02 13:03:18 | 2018-05-02 13:03:38 | producer | designate-producer-3892623035-6vwt0 | 2018-05-02 13:03:38 | UP | {} | {} |
| cec705e082c542cdbfa83b9ed0c2b438 | 2018-05-02 13:03:19 | 2018-05-02 13:03:39 | api | designate-api-1699569558-r018m | 2018-05-02 13:03:39 | UP | {} | {} |
| ddd48f13b67448d2949ced39d76e2958 | 2018-05-02 13:03:20 | 2018-05-02 13:03:40 | central | designate-central-2632925199-wwj7j | 2018-05-02 13:03:40 | UP | {} | {} |
| ef0ae921a78d4ac090dc74a1dc686921 | 2018-05-02 13:03:21 | 2018-05-02 13:03:41 | mdns | designate-mdns-4162010419-pvclv | 2018-05-02 13:03:41 | UP | {} | {} |
+----------------------------------+---------------------+---------------------+--------------+-------------------------------------+---------------------+--------+-------+--------------+
6 rows in set (0.00 sec)

Our Designate version is Pike, but the code seems to be the same in master branch.

Dmitry Galkin (galkindmitrii) wrote :

Here are the examples of service_statuses table in a more readable way:

http://paste.openstack.org/show/720282/

The issue here is that the hostname and service_name under service_statuses are not unique. If for any reason a duplicate entry for a service gets created, the call to update will always fail, as it's excepting only a single database entry.

https://github.com/openstack/designate/blob/fb5c1a97797c66292151549b82519a682b6300b6/designate/storage/impl_sqlalchemy/migrate_repo/versions/097_add_services.py
https://github.com/openstack/designate/blob/fb5c1a97797c66292151549b82519a682b6300b6/designate/storage/impl_sqlalchemy/__init__.py#L1771

We need to update the schema to only allow a unique entry per hostname/service_name.

Changed in designate:
status: New → Confirmed
importance: Undecided → High
Changed in designate:
assignee: nobody → Erik Olof Gunnar Andersson (eandersson)
status: Confirmed → In Progress

Reviewed: https://review.openstack.org/568372
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=1924abff40341337bd00659ff01a14ea91701475
Submitter: Zuul
Branch: master

commit 1924abff40341337bd00659ff01a14ea91701475
Author: Erik Olof Gunnar Andersson <email address hidden>
Date: Mon May 14 13:50:03 2018 -0700

    Add unique constraint to service_statuses

    In the current implementation, if for any reason a duplicate
    service entry gets created, the call to update that service
    will fail endlessly, and instead cause the service to create
    new entries everytime update_service_status gets called. Causing
    it to fill the database with duplicate entries.

    This patch adds a unique constraint to the service_statuses
    table based on the service_name and hostname, to ensure that
    this cannot happen.

    In addition we add a new test to the storage driver and further
    expanded the central service test coverage.

    Change-Id: I307a8f7dd8b8a83effa447a846db3288efa32dba
    Closes-Bug: #1768824

Changed in designate:
status: In Progress → Fix Released

This issue was fixed in the openstack/designate 7.0.0.0b3 development milestone.

Reviewed: https://review.openstack.org/604430
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=1842802de4a05a057eb7f3c94bd01a4b1f0b9143
Submitter: Zuul
Branch: master

commit 1842802de4a05a057eb7f3c94bd01a4b1f0b9143
Author: Ben Nemec <email address hidden>
Date: Wed Sep 19 20:34:26 2018 +0000

    Add designate-status command for upgrade checks

    This checks for duplicate entries in the service_statuses table
    which will be an error after change
    I307a8f7dd8b8a83effa447a846db3288efa32dba.

    Related-Bug: 1768824

    Story: 2003657
    Task: 26127

    Change-Id: Ie0350b034f0eb03749138aadd0951d30073214c0
    Co-authored-by: Doug Hellmann <email address hidden>

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers