Unclear 'dns service' statuses and states; need to upgrade configuration

Bug #1761503 reported by Annie Melen
This bug affects 4 people
Affects    Status   Importance  Assigned to  Milestone
Designate  Triaged  High        Unassigned

Bug Description

Hello!

Recently I deployed Designate on two nodes and started testing it via the CLI (python-designateclient). Here is what I noticed:

root@control01:~# openstack dns service list
+--------------------------------------+---------------+--------------+--------+-------+--------------+
| id                                   | hostname      | service_name | status | stats | capabilities |
+--------------------------------------+---------------+--------------+--------+-------+--------------+
| 3871fdbf-4901-4eaf-9cfb-5505a5870ad8 | control01-api | api          | UP     | -     | -            |
| aafee96e-e0c2-4d01-b393-17f726b97955 | control02-api | api          | UP     | -     | -            |
| 001140fc-dd2e-42b9-a63d-cf7702a5d22f | control01-api | central      | UP     | -     | -            |
| bb105334-f202-48f4-9ab7-715ed9e8d3b1 | control02-api | central      | UP     | -     | -            |
| dc0fe0fb-1a98-4adf-93e8-ff6b533a6737 | control01-api | worker       | UP     | -     | -            |
| 7237e35d-bf25-4d17-9b83-20494f8523c7 | control02-api | worker       | UP     | -     | -            |
| 8aeb091e-171d-49cb-8a1f-e2e261a0d85a | control01-api | mdns         | UP     | -     | -            |
| f126bb73-02cb-4aad-9445-923b4e994632 | control02-api | mdns         | UP     | -     | -            |
+--------------------------------------+---------------+--------------+--------+-------+--------------+

[1] The output does not show all of the services actually installed on the nodes. In my case, I also have the designate-producer service, which is missing from the table.
[2] The 'status' column does not reflect the actual service status ('active', 'stopped', 'failed', etc.). It looks more like an 'enabled/disabled' flag, because even after I stopped all Designate services on both nodes, the command still shows the status 'UP'.
For comparison, this output is much clearer:

root@control01:~# openstack compute service list
+----+------------------+---------------+----------+---------+-------+----------------------------+
| ID | Binary           | Host          | Zone     | Status  | State | Updated At                 |
+----+------------------+---------------+----------+---------+-------+----------------------------+
| 2  | nova-scheduler   | control02-api | internal | enabled | up    | 2018-04-05T12:27:03.000000 |
| 4  | nova-scheduler   | control01-api | internal | enabled | up    | 2018-04-05T12:27:04.000000 |
| 5  | nova-consoleauth | control01-api | internal | enabled | up    | 2018-04-05T12:27:08.000000 |
| 6  | nova-consoleauth | control02-api | internal | enabled | down  | 2018-04-05T11:45:28.000000 |
| 17 | nova-conductor   | control01-api | internal | enabled | up    | 2018-04-05T12:27:07.000000 |
| 19 | nova-conductor   | control02-api | internal | enabled | up    | 2018-04-05T12:27:06.000000 |
| 20 | nova-compute     | compute02-api | nova     | enabled | up    | 2018-04-05T12:27:07.000000 |
| 23 | nova-compute     | compute01-api | nova     | enabled | up    | 2018-04-05T12:27:07.000000 |
+----+------------------+---------------+----------+---------+-------+----------------------------+

It follows from [2] that:
[a] we need to add a 'service_down_time' option (maximum time in seconds since the last check-in for a service to be considered up) and a related 'report_interval' option (how often, in seconds, services report their state) to the [DEFAULT] section of the configuration
[b] we need to extend the 'service_statuses' database table with a 'state' column whose value depends on 'service_down_time'
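
To make the proposal in [a] and [b] concrete, here is a minimal sketch of how a 'state' value could be derived from the last check-in time. It is only an illustration: the constants stand in for the proposed 'report_interval' and 'service_down_time' options, their values are assumptions, and derive_state is a hypothetical helper, not existing Designate code. The two sample check-in times are taken from the nova-consoleauth rows above; the value of "now" is assumed.

```python
from datetime import datetime, timedelta

# Hypothetical values for the two proposed [DEFAULT] options; the numbers
# are illustrative and are not taken from any Designate release.
REPORT_INTERVAL = 10      # seconds between service check-ins
SERVICE_DOWN_TIME = 60    # maximum seconds since last check-in for an "up" service


def derive_state(last_check_in, now, service_down_time=SERVICE_DOWN_TIME):
    """Return 'up' if the last check-in is recent enough, otherwise 'down'."""
    elapsed = now - last_check_in
    return "up" if elapsed <= timedelta(seconds=service_down_time) else "down"


# Assumed current time, shortly after the most recent check-ins listed above.
now = datetime(2018, 4, 5, 12, 27, 30)
print(derive_state(datetime(2018, 4, 5, 12, 27, 8), now))   # checked in 22 seconds ago
print(derive_state(datetime(2018, 4, 5, 11, 45, 28), now))  # checked in about 42 minutes ago
```

For such a rule to be stable, 'service_down_time' presumably has to be noticeably larger than 'report_interval', so that a single late report does not flip a healthy service to 'down'.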

Annie Melen (anniemelen)
summary: - Unclear 'dns service' statuses and states; need to upgraded
- configuration
+ Unclear 'dns service' statuses and states; need to upgrade configuration
Changed in designate:
status: New → Triaged
importance: Undecided → High
Alvaro Uria (aluria)
tags: added: canonical-bootstack
Giuseppe Petralia (peppepetra) wrote :

This unresolved bug is blocking the implementation of a smart check for the Designate services in the openstack-service-checks charm:

https://bugs.launchpad.net/charm-openstack-service-checks/+bug/1845293

We can still reproduce the bug in cloud:bionic-rocky:

All services are up:

$ date
Tue Sep 24 11:01:10 UTC 2019

$ openstack dns service list
+--------------------------------------+---------------+--------------+--------+-------+--------------+
| id                                   | hostname      | service_name | status | stats | capabilities |
+--------------------------------------+---------------+--------------+--------+-------+--------------+
| bac409d2-6be5-4f0a-bed6-14ace02f91ef | juju-e719f5-9 | api          | UP     | -     | -            |
| f1b39c47-23a1-430f-a5b2-0bd5f9a8f3e8 | juju-e719f5-9 | producer     | UP     | -     | -            |
| 6103d503-6ecb-4a31-82a9-814948c5f667 | juju-e719f5-9 | mdns         | UP     | -     | -            |
+--------------------------------------+---------------+--------------+--------+-------+--------------+

We stopped the mdns service, and after 30 minutes it is still reported as UP even though its last heartbeat is 30 minutes old:

$ date
Tue Sep 24 11:36:12 UTC 2019

$ openstack dns service show 6103d503-6ecb-4a31-82a9-814948c5f667
+----------------+--------------------------------------+
| Field          | Value                                |
+----------------+--------------------------------------+
| capabilities   | -                                    |
| created_at     | 2019-09-23T16:24:48.000000           |
| heartbeated_at | 2019-09-24T11:06:03.000000           |
| hostname       | juju-e719f5-9                        |
| id             | 6103d503-6ecb-4a31-82a9-814948c5f667 |
| service_name   | mdns                                 |
| stats          | -                                    |
| status         | UP                                   |
| updated_at     | 2019-09-24T11:06:03.000000           |
+----------------+--------------------------------------+

Is there any news on a resolution?

Erik Olof Gunnar Andersson (eandersson) wrote :

Most likely the status was intended to be based on the heartbeat timestamp, e.g. if more than 5 minutes have passed since the last heartbeat, mark the service as DOWN. This could probably be done either on the CLI side (e.g. as part of python-designateclient) or within designate-central.
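
As a rough sketch of such a timestamp-based check, the snippet below applies a 5-minute threshold to the mdns record shown earlier in this thread. The effective_status helper and the MAX_HEARTBEAT_AGE constant are hypothetical names used for illustration only; neither python-designateclient nor designate-central is known to expose them. Timestamps are treated as naive UTC values, matching the 'date' output above.

```python
from datetime import datetime, timedelta

# Assumed threshold, following the "5 minutes" example; not an existing option.
MAX_HEARTBEAT_AGE = timedelta(minutes=5)

# Timestamp layout used in the 'openstack dns service show' output above.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def effective_status(record, now, max_age=MAX_HEARTBEAT_AGE):
    """Return 'DOWN' when the heartbeat is stale, else the reported status."""
    heartbeat = datetime.strptime(record["heartbeated_at"], TIMESTAMP_FORMAT)
    if now - heartbeat > max_age:
        return "DOWN"
    return record["status"]


# Fields copied from the mdns record reported in this thread.
mdns = {
    "service_name": "mdns",
    "status": "UP",
    "heartbeated_at": "2019-09-24T11:06:03.000000",
}

# Time of the second 'date' call: the heartbeat is about 30 minutes old.
now = datetime(2019, 9, 24, 11, 36, 12)
print(effective_status(mdns, now))
```

With the values from the report, the heartbeat is 1809 seconds old, so the check returns DOWN even though the stored status is still UP.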

Pierre Riteau (priteau) wrote :

Is there any plan to introduce this functionality? I would suggest implementing it as part of designate-central, since the API includes the status field in the response to /v2/service_statuses. The status field would be useless if every API client (not just python-designateclient, but clients in any other language too) had to parse the heartbeated_at timestamp and do its own timeout check.
