senlin-health-manager does not work correctly and keeps connections to Keystone open, causing high load

Bug #2055275 reported by Bo Tran
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| senlin | New | Undecided | Unassigned | |

Bug Description

I am running senlin-health-manager built from the latest commit.

I have about 200 clusters, each with a health check policy attached (about 200 policies in total).
After about 15 minutes of running, I see the following log:

```
2024-02-28 09:08:46.468 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 4ba4bd6e-b1f3-4ada-b18f-8a4bb544ec80 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 09:08:46.469 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 17857f00-f8e8-4ebb-a3b2-1ea5e940093c from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 09:08:46.469 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 5664e3df-0db0-425c-935e-2f0b5122cda5 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 09:08:46.469 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: b231926f-2714-4806-a3c4-9554fb904c5c from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 09:08:46.469 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 32f08798-c31c-486c-8f46-0e8a55b64cf1 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 09:08:46.469 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 63d59722-8021-4aab-b43b-a8ca8f10e6d7 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 09:08:46.470 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 32310664-de2a-4875-bf4d-f7ee26c651c5 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 09:08:46.470 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 333cc5c3-bafd-4c36-aa5b-e45385cfb130 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 09:08:46.470 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 925ec982-a409-487c-8f92-c6ee89ffbf71 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 09:08:46.470 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 260513ac-b83d-44cd-b730-d0322100ad7f from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 09:08:46.470 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 3ad3b3d9-0ece-4f0c-8fb8-433381436793 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
```

I tried adding a debug log at this line: https://github.com/openstack/senlin/blob/master/senlin/engine/health_manager.py#L782 as follows:

```
# Pass the values as logging arguments so formatting is deferred to the logger.
LOG.info("self.registries: %s, db_registries: %s", self.registries, db_registries)
```

and restarted the service.

After senlin-health-manager started, I got the following log:

```
2024-02-28 10:23:46.476 46 INFO senlin.engine.health_manager [-] self.registries: {}, db_registries: []
2024-02-28 10:23:46.484 41 INFO senlin.engine.health_manager [-] self.registries: {}, db_registries: []
2024-02-28 10:23:46.493 44 INFO senlin.engine.health_manager [-] self.registries: {'4ba4bd6e-b1f3-4ada-b18f-8a4bb544ec80': <senlin.engine.health_manager.HealthCheck object at 0x7fa8a4908700>, '17857f00-f8e8-4ebb-a3b2-1ea5e940093c': <senlin.engine.health_manager.HealthCheck object at 0x7fa8a48fe5e0>, '5664e3df-0db0-425c-935e-2f0b5122cda5': <senlin.engine.health_manager.HealthCheck object at 0x7fa8a48fe9a0>, 'b231926f-2714-4806-a3c4-9554fb904c5c': <senlin.engine.health_manager.HealthCheck object at 0x7fa8a48fec70>, '32f08798-c31c-486c-8f46-0e8a55b64cf1': <senlin.engine.health_manager.HealthCheck object at 0x7fa8a48fefa0>, '63d59722-8021-4aab-b43b-a8ca8f10e6d7': <senlin.engine.health_manager.HealthCheck object at 0x7fa8a61bd130>, '32310664-de2a-4875-bf4d-f7ee26c651c5': <senlin.engine.health_manager.HealthCheck object at 0x7fa8a48f7c40>, '333cc5c3-bafd-4c36-aa5b-e45385cfb130': <senlin.engine.health_manager.HealthCheck object at 0x7fa8a48f76d0>, '925ec982-a409-487c-8f92-c6ee89ffbf71': <senlin.engine.health_manager.HealthCheck object at 0x7fa8a48f7cd0>, '260513ac-b83d-44cd-b730-d0322100ad7f': <senlin.engine.health_manager.HealthCheck object at 0x7fa8a49407c0>, '3ad3b3d9-0ece-4f0c-8fb8-433381436793': <senlin.engine.health_manager.HealthCheck object at 0x7fa8a4916d30>}, db_registries: [('4ba4bd6e-b1f3-4ada-b18f-8a4bb544ec80',), ('17857f00-f8e8-4ebb-a3b2-1ea5e940093c',), ('5664e3df-0db0-425c-935e-2f0b5122cda5',), ('b231926f-2714-4806-a3c4-9554fb904c5c',), ('32f08798-c31c-486c-8f46-0e8a55b64cf1',), ('63d59722-8021-4aab-b43b-a8ca8f10e6d7',), ('32310664-de2a-4875-bf4d-f7ee26c651c5',), ('333cc5c3-bafd-4c36-aa5b-e45385cfb130',), ('925ec982-a409-487c-8f92-c6ee89ffbf71',), ('260513ac-b83d-44cd-b730-d0322100ad7f',), ('3ad3b3d9-0ece-4f0c-8fb8-433381436793',)]
2024-02-28 10:23:46.493 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 4ba4bd6e-b1f3-4ada-b18f-8a4bb544ec80 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 10:23:46.494 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 17857f00-f8e8-4ebb-a3b2-1ea5e940093c from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 10:23:46.494 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 5664e3df-0db0-425c-935e-2f0b5122cda5 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 10:23:46.494 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: b231926f-2714-4806-a3c4-9554fb904c5c from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 10:23:46.494 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 32f08798-c31c-486c-8f46-0e8a55b64cf1 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 10:23:46.494 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 63d59722-8021-4aab-b43b-a8ca8f10e6d7 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 10:23:46.495 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 32310664-de2a-4875-bf4d-f7ee26c651c5 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 10:23:46.495 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 333cc5c3-bafd-4c36-aa5b-e45385cfb130 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 10:23:46.495 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 925ec982-a409-487c-8f92-c6ee89ffbf71 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 10:23:46.495 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 260513ac-b83d-44cd-b730-d0322100ad7f from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 10:23:46.495 44 INFO senlin.engine.health_manager [-] Removing orphaned health check: 3ad3b3d9-0ece-4f0c-8fb8-433381436793 from 7547b7cf-692e-4894-a82d-e31528e1b0a6
2024-02-28 10:23:46.511 42 INFO senlin.engine.health_manager [-] self.registries: {}, db_registries: []
2024-02-28 10:23:46.516 47 INFO senlin.engine.health_manager [-] self.registries: {}, db_registries: []
2024-02-28 10:23:46.517 57 INFO senlin.engine.health_manager [-] self.registries: {}, db_registries: []
2024-02-28 10:23:46.519 50 INFO senlin.engine.health_manager [-] self.registries: {}, db_registries: []
2024-02-28 10:23:46.519 49 INFO senlin.engine.health_manager [-] self.registries: {}, db_registries: []
2024-02-28 10:23:46.525 48 INFO senlin.engine.health_manager [-] self.registries: {}, db_registries: []

```
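
One possible reading of the log above, offered as an assumption rather than a confirmed diagnosis: the keys of `self.registries` are plain ID strings, while `db_registries` prints as a list of 1-tuples such as `('4ba4bd6e-...',)`. If the orphan check is a membership test of the form `cluster_id not in db_registries` (this is a guess about the code, not verified against `health_manager.py`), a string is never equal to a 1-tuple, so every registered cluster would be reported as orphaned. The helper names `find_orphans` and `flatten_ids` below are hypothetical and only illustrate the comparison.

```python
def find_orphans(runtime_ids, db_rows):
    """Return the runtime IDs that a plain membership test does not find in db_rows."""
    return [cid for cid in runtime_ids if cid not in db_rows]


def flatten_ids(db_rows):
    """Turn row-like values such as ('id',) into plain ID strings."""
    return [row if isinstance(row, str) else row[0] for row in db_rows]


# Two IDs taken from the log above.
runtime_ids = [
    "4ba4bd6e-b1f3-4ada-b18f-8a4bb544ec80",
    "17857f00-f8e8-4ebb-a3b2-1ea5e940093c",
]
# Same IDs in the shape that db_registries has in the log: a list of 1-tuples.
db_rows = [(cid,) for cid in runtime_ids]

# Comparing strings against tuples: every ID looks orphaned.
naive = find_orphans(runtime_ids, db_rows)
# Comparing strings against strings: nothing is orphaned.
fixed = find_orphans(runtime_ids, flatten_ids(db_rows))

print("naive orphans:", naive)
print("after flattening:", fixed)
```

If this assumption holds, extracting the first element of each row before the membership test would keep the health checks registered.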

As a result, senlin-health-manager never performs the expected health checks on the Nova instances.

When I run `ss -atnpe | grep 5000 | grep senlin-health-manager`, I see that the number of connections increases with every health check loop.
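
To track that growth over time, the `ss` output can be summarized per TCP state. The following is a minimal sketch, assuming the usual `ss -atnpe` column order (State, Recv-Q, Send-Q, local address, peer address, process info). The function name `count_by_state` and the sample text are made up for illustration and are not real output from this deployment.

```python
from collections import Counter


def count_by_state(ss_output, port=5000):
    """Count connections whose peer port equals `port`, grouped by TCP state."""
    counts = Counter()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header line and anything too short to hold a peer address.
        if len(fields) < 5 or fields[0] == "State":
            continue
        peer = fields[4]
        if peer.rsplit(":", 1)[-1] == str(port):
            counts[fields[0]] += 1
    return counts


# Hypothetical sample in the assumed column layout.
SAMPLE = """\
State      Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB      0      0      10.0.0.5:41234     10.0.0.10:5000    users:(("senlin-health-m",pid=44,fd=12))
ESTAB      0      0      10.0.0.5:41236     10.0.0.10:5000    users:(("senlin-health-m",pid=44,fd=13))
CLOSE-WAIT 1      0      10.0.0.5:41100     10.0.0.10:5000    users:(("senlin-health-m",pid=44,fd=9))
ESTAB      0      0      10.0.0.5:50222     10.0.0.20:3306    users:(("senlin-health-m",pid=44,fd=7))
"""

counts = count_by_state(SAMPLE)
print(dict(counts))
```

Feeding the function real `ss -atnpe` output once per health check interval and comparing the totals would show whether connections to port 5000 are accumulating or being reused.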

Pham Le Gia Dai (daiplg) wrote :

Could you provide more details about the health policy you created?

Bo Tran (ministry.nd) wrote :

This is the file I used to create the health check policy:

```
| updated_at | 2024-04-06T01:37:17Z |
# Sample health policy based on node health checking
type: senlin.policy.health
version: 1.0
description: A policy for maintaining node health from a cluster.
properties:
  detection:
    # Type for health checking, valid values include:
    # NODE_STATUS_POLLING, LB_STATUS_POLLING, LIFECYCLE_EVENTS
    detection_modes:
      - type: NODE_STATUS_POLLING

    # Number of seconds between two adjacent checking
    interval: 60

  recovery:
    # Action that can be retried on a failed node, will improve to
    # support multiple actions in the future. Valid values include:
    # REBOOT, REBUILD, RECREATE
    actions:
      - name: RECREATE
```

I have 2 problems:
1. Connections to Keystone are kept open, so the number of connections keeps growing and causes high load.
2. The clusters are never health-checked before Senlin flags their health checks as orphaned.
