kubernetes api-server event time to live needs to be increased

Bug #1830899 reported by Allain Legacy
Affects    Status        Importance  Assigned to     Milestone
StarlingX  Fix Released  Medium      David Sullivan

Bug Description

Brief Description
-----------------
The Kubernetes API server event time to live (TTL) defaults to 1 hour. This setting controls how long an event is retained before it is deleted. With the default value, the end user cannot query events older than 1 hour. For debugging purposes, 1 hour is too short to get a clear picture of which system transitions may have occurred.

The retention period can be controlled with a kube-apiserver option (--event-ttl).
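
As a rough illustration of the option above, the sketch below rewrites a kube-apiserver argument list so that it carries a single --event-ttl flag. The helper name with_event_ttl and the sample argument list are hypothetical; only the --event-ttl flag itself comes from this report. It handles the "--event-ttl=value" form only, and it does not reflect how StarlingX actually applies the setting.

```python
def with_event_ttl(args, ttl="24h0m0s"):
    """Return a copy of args with exactly one --event-ttl=<ttl> flag.

    Hypothetical helper: only the "--event-ttl=value" form is recognised.
    """
    # Drop any existing --event-ttl setting so the flag is not duplicated.
    kept = [arg for arg in args if not arg.startswith("--event-ttl=")]
    # Append the requested retention period as a Go-style duration string.
    kept.append(f"--event-ttl={ttl}")
    return kept


if __name__ == "__main__":
    sample = ["kube-apiserver", "--event-ttl=1h0m0s", "--secure-port=6443"]
    print(with_event_ttl(sample))
```

The input list is left unmodified, so the caller can compare the old and new argument lists before restarting anything.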

Severity
--------
Minor, but impacts system debuggability.

Steps to Reproduce
------------------
Run "kubectl get events" on a system that has been up for more than 1 hour and observe that no events older than 1 hour are listed.
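
The check above can also be scripted. The sketch below assumes the event list was saved as JSON (for example with "kubectl get events -o json") and computes the age of the oldest event still retained. The function name oldest_event_age is hypothetical, and using lastTimestamp with a fallback to metadata.creationTimestamp is an assumption about which timestamp best reflects retention.

```python
from datetime import datetime, timezone


def _parse(ts):
    # Kubernetes timestamps of this form look like 2019-05-27T23:30:00Z.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def oldest_event_age(event_list, now):
    """Return now minus the oldest event timestamp, or None if no events."""
    stamps = []
    for item in event_list.get("items", []):
        # Assumption: prefer lastTimestamp, fall back to creationTimestamp.
        ts = item.get("lastTimestamp") or item["metadata"]["creationTimestamp"]
        stamps.append(_parse(ts))
    if not stamps:
        return None
    return now - min(stamps)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = {
        "items": [
            {"lastTimestamp": "2019-05-27T23:45:00Z",
             "metadata": {"creationTimestamp": "2019-05-27T23:45:00Z"}},
            {"lastTimestamp": None,
             "metadata": {"creationTimestamp": "2019-05-28T00:10:00Z"}},
        ]
    }
    now = datetime(2019, 5, 28, 0, 30, tzinfo=timezone.utc)
    print(oldest_event_age(sample, now))
```

With the default TTL the reported age should stay close to or below 1 hour, however long the system has been running.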

Expected Behavior
-----------------
We should provide a longer retention period to allow time to gather system information following a critical issue. I suggest a 24 hour retention as a better alternative, but that decision is subject to testing to determine the storage impact of persisting events for 24 hours in a large system.
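
To frame the storage question, a back-of-the-envelope sketch follows. It assumes events arrive at a steady rate and that each event is stored once and deleted exactly when its TTL expires, so the number retained at steady state is roughly rate times TTL. The function name and the rate used in the example are hypothetical, not measurements from any system.

```python
def retained_events(events_per_hour, ttl_hours):
    """Approximate steady-state event count: arrival rate times TTL."""
    return events_per_hour * ttl_hours


if __name__ == "__main__":
    rate = 500  # hypothetical events per hour, for illustration only
    at_1h = retained_events(rate, 1)
    at_24h = retained_events(rate, 24)
    # Under these assumptions the retained count scales linearly with the TTL.
    print(at_1h, at_24h, at_24h // at_1h)
```

Under these assumptions, moving from 1 hour to 24 hours retains about 24 times as many events, which is why the impact is worth measuring on a large system.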

Actual Behavior
---------------
Events older than 1hr are deleted.

Reproducibility
---------------
100%

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
20190527T233000Z

Last Pass
---------
Never

Timestamp/Logs
--------------
N/A

Test Activity
-------------
Developer Testing

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking as release gating; this should be evaluated as part of stx.2.0 system engineering activities when planned.

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Tee Ngo (teewrs)
tags: added: stx.2.0 stx.config stx.containers
Revision history for this message
Dariush Eslimi (deslimi) wrote :

After review with the technical lead, we have decided to defer this change to stx-3.0.

tags: added: stx.3.0
removed: stx.2.0
Revision history for this message
Tee Ngo (teewrs) wrote :

No apparent impact to either storage or performance when TTL is set to 24hr.

Dariush Eslimi (deslimi)
Changed in starlingx:
assignee: Tee Ngo (teewrs) → David Sullivan (dsullivanwr)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/686806

Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/686806
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=6434f8a068df93be942aae4869f2e337efe9373c
Submitter: Zuul
Branch: master

commit 6434f8a068df93be942aae4869f2e337efe9373c
Author: David Sullivan <email address hidden>
Date: Fri Oct 4 14:19:59 2019 -0400

    Increase kubernetes apiserver event ttl

    Increase kubernetes apiserver event-ttl to 24h

    Change-Id: If846c99c9d9a5318a7b90b49649fc62d5e214274
    Closes-Bug: 1830899
    Signed-off-by: David Sullivan <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released