In large DC, open-ldap could reach FD limit

Bug #1952126 reported by Bin Qian
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Bin Qian
Milestone: (none)

Bug Description

In a large distributed cloud (DC) environment, the open-ldap service can reach its open file descriptor (FD) limit. When that happens, it can no longer provide authentication service.

An enhancement is needed to alert the admin user when usage is approaching the limit, and to degrade the node when the limit is reached.

New behavior:
When open FDs exceed 95% of the open-ldap service's FD limit, an alarm is raised.
When open FDs reach the open-ldap service's FD limit, the controller is degraded.
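
As a rough illustration of what "open FDs versus FD limit" means on Linux, the sketch below reads a process's soft limit from /proc/<pid>/limits and counts the entries in /proc/<pid>/fd. The function names are hypothetical and this is not the StarlingX implementation; it assumes a Linux /proc filesystem and permission to read the target process's entries.

```python
import os


def parse_soft_fd_limit(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text.

    Returns None when the limit is reported as 'unlimited'.
    """
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: 'Max', 'open', 'files', soft limit, hard limit, units
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    raise ValueError("no 'Max open files' entry found")


def count_open_fds(pid):
    """Count the file descriptors currently open in process `pid`."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def fd_usage_percent(pid):
    """Return open FDs as a percentage of the soft limit for `pid`."""
    with open(f"/proc/{pid}/limits") as f:
        limit = parse_soft_fd_limit(f.read())
    if limit is None:
        # Assumption: an unlimited process is never treated as near its limit.
        return 0.0
    return 100.0 * count_open_fds(pid) / limit
```

For example, `fd_usage_percent(os.getpid())` reports the usage of the calling process; monitoring another service's process typically requires root.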

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config-files (master)
Changed in starlingx:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ha (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/ha/+/819130

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fault (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/fault/+/819132

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to monitoring (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/monitoring/+/819137

Revision history for this message
Ghada Khalil (gkhalil) wrote :

screening: not gating; more of a robustness / enhancement

Changed in starlingx:
assignee: nobody → Bin Qian (bqian20)
importance: Undecided → Low
tags: added: stx.config stx.distcloud
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to fault (master)

Reviewed: https://review.opendev.org/c/starlingx/fault/+/819132
Committed: https://opendev.org/starlingx/fault/commit/6105f83a85a8ca2e5ed8f33e0f5ed5455c8f0e17
Submitter: "Zuul (22348)"
Branch: master

commit 6105f83a85a8ca2e5ed8f33e0f5ed5455c8f0e17
Author: Bin Qian <email address hidden>
Date: Tue Nov 23 18:08:50 2021 -0500

    Add new alarm for FD limit reached

    Add a new alarm raised when open FDs approach the limit (major) or
    the limit is reached (critical).

    Partial-bug: 1952126
    Change-Id: Ifaece0e1d7a335f980cfebc3a591a90edbc35742
    Signed-off-by: Bin Qian <email address hidden>

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to monitoring (master)

Reviewed: https://review.opendev.org/c/starlingx/monitoring/+/819137
Committed: https://opendev.org/starlingx/monitoring/commit/a9f84e13b13b0f20f1511d0503bbcf2df9f0fced
Submitter: "Zuul (22348)"
Branch: master

commit a9f84e13b13b0f20f1511d0503bbcf2df9f0fced
Author: Bin Qian <email address hidden>
Date: Thu Nov 18 15:05:51 2021 -0500

    Add new collectd plugin to monitor a service status

    When the openldap service status returns 160, raise a major alarm
    because the service is approaching its FD limit. When 161 is returned,
    raise a critical alarm because the limit has been reached.

    SM will degrade the node when the FD reaches the limit.
    Ref SM changes:
    https://review.opendev.org/c/starlingx/ha/+/819130

    TC passed:
    Alarm is raised when FD limit is reached, or above 95% (approaching).
    Alarm is cleared when FD usage is below 95% threshold.
    Upgrade test. New alarm raised on controller-1 (N+1).
    Alarm is cleared when collectd restarts or the node reboots (alarm will
    be re-raised if the alarming situation is detected again).
    SM detects the 161 status code and degrades the node with a service
    degraded alarm.
    Alarm is raised after fm comes back up after being unavailable.
    Alarm is cleared after fm comes back up after being unavailable.

    Closes-bug: 1952126
    Depends-on: https://review.opendev.org/c/starlingx/fault/+/819132

    Change-Id: I78bb6ed6f24570d68f62818e1242286d638fd835
    Signed-off-by: Bin Qian <email address hidden>
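
The alarm handling described in this commit message can be pictured as a small mapping from the service status code to an alarm severity, plus a decision about whether to raise or clear. The following is only a sketch of that idea, not the actual collectd plugin: the names `SEVERITY_BY_STATUS` and `alarm_action` are hypothetical, and it assumes that any status other than 160 or 161 clears the FD alarm.

```python
# Status codes named in the commit message, mapped to alarm severity.
SEVERITY_BY_STATUS = {160: "major", 161: "critical"}


def alarm_action(previous_severity, status_code):
    """Decide what to do with the FD alarm for a new status code.

    Returns (action, severity), where action is 'raise', 'clear', or 'none'.
    `previous_severity` is the severity currently raised, or None.
    """
    # Assumption: codes other than 160/161 mean no FD alarm is warranted.
    severity = SEVERITY_BY_STATUS.get(status_code)
    if severity == previous_severity:
        return ("none", severity)  # nothing changed since the last sample
    if severity is None:
        return ("clear", None)  # usage dropped back below the threshold
    return ("raise", severity)  # new alarm, or a severity change
```

Tracking the previous severity avoids re-raising the same alarm on every sample and lets a major alarm escalate to critical when 161 follows 160.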

Changed in starlingx:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ha (master)

Reviewed: https://review.opendev.org/c/starlingx/ha/+/819130
Committed: https://opendev.org/starlingx/ha/commit/4d6a0534f99b08d1d6cb302efbc8f24b08722ce2
Submitter: "Zuul (22348)"
Branch: master

commit 4d6a0534f99b08d1d6cb302efbc8f24b08722ce2
Author: Bin Qian <email address hidden>
Date: Mon Nov 15 10:53:23 2021 -0500

    Add fd-limit-reached degraded condition for open-ldap service

    Added a new degrade action result code (161) to degrade the open-ldap
    service if the FD limit is reached. Result code 160 indicates that open
    FDs are approaching the limit; like the normal result code 0, it resets
    the degraded state.

    SM was not designed to raise service-level warnings/alarms. A major
    alarm will be raised by collectd when open-ldap open file descriptors
    are approaching the limit (above 95%), and a critical alarm will be
    raised when the limit is reached (100%).
    Ref collectd changes:
    https://review.opendev.org/c/starlingx/monitoring/+/819137

    Partial-bug: 1952126
    Change-Id: I893c137ab81fcf01e949c9ca13cedcbe8fe5d86d
    Signed-off-by: Bin Qian <email address hidden>
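
The degrade rule in this commit message amounts to a small state transition: 161 sets the degraded state, while 0 and 160 reset it. A minimal sketch follows; the function name is hypothetical, SM itself is not written this way, and leaving the state unchanged for any other result code is an assumption made only for illustration.

```python
def next_degraded_state(currently_degraded, result_code):
    """Return the degraded state after observing `result_code`."""
    if result_code == 161:
        return True  # FD limit reached: degrade
    if result_code in (0, 160):
        return False  # normal or merely approaching: reset degraded state
    # Assumption: other result codes are handled elsewhere and leave the
    # FD-related degraded state untouched.
    return currently_degraded
```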

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config-files (master)

Reviewed: https://review.opendev.org/c/starlingx/config-files/+/819129
Committed: https://opendev.org/starlingx/config-files/commit/940eb07423c856cc2a40f0a5c0161f46c34e7136
Submitter: "Zuul (22348)"
Branch: master

commit 940eb07423c856cc2a40f0a5c0161f46c34e7136
Author: Bin Qian <email address hidden>
Date: Tue Nov 2 20:54:19 2021 -0400

    report open-ldap service is approaching or reaching FD limit

    openldap status returns 160 when open FDs are approaching the limit
    (above 95%), and 161 when open FDs reach the limit.

    Depends-on: https://review.opendev.org/c/starlingx/ha/+/819130
    Partial-Bug: 1952126
    Change-Id: I27bc46709c74c592a8a6b9505f4f2edac742e1a9
    Signed-off-by: Bin Qian <email address hidden>
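
The status codes described here can be summarized as a threshold check. The sketch below is in Python for illustration only and is not the actual openldap status script; the function and constant names are hypothetical, and it assumes "above 95%" is a strict comparison.

```python
STATUS_OK = 0
STATUS_FD_APPROACHING = 160  # open FDs above 95% of the limit
STATUS_FD_LIMIT_REACHED = 161  # open FDs at the limit


def fd_status_code(open_fds, fd_limit, threshold_percent=95):
    """Map an open-FD count and its limit to a service status code."""
    if open_fds >= fd_limit:
        return STATUS_FD_LIMIT_REACHED
    # Integer arithmetic avoids floating-point rounding at the boundary.
    if open_fds * 100 > fd_limit * threshold_percent:
        return STATUS_FD_APPROACHING
    return STATUS_OK
```

With a limit of 1000, for instance, 950 open FDs is still normal, 951 is approaching, and 1000 is at the limit.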

Revision history for this message
Ghada Khalil (gkhalil) wrote :

screening: Adding stx.6.0 since the fix will be available for that release

tags: added: stx.6.0