FluxCD pods have a history of only 2 logs

Bug #2009784 reported by Leonardo Fagundes Luz Serrano
Affects: StarlingX
Status: Fix Released
Importance: Low
Assigned to: Leonardo Fagundes Luz Serrano

Bug Description

Brief Description
-----------------
Kubernetes only keeps logs for the current running instance of FluxCD pods
and the previous instance.

Some issues cause a cluster network outage, which restarts the FluxCD pods.
When this happens, older logs are erased and information about the original issue is lost.
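
The retention limit is also visible through kubectl; a minimal sketch, assuming the flux-helm namespace used on StarlingX:

# Current instance
kubectl -n flux-helm logs deployment/helm-controller
# Previous instance; anything older than that is no longer available
kubectl -n flux-helm logs deployment/helm-controller --previous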

Severity
--------
Minor: System/Feature is usable with a minor issue

Steps to Reproduce
------------------
Delete the FluxCD pods, wait for them to be recreated, and repeat this twice (see the check sketched below).
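
One way to check the result on the controller node; the namespace, pod name pattern, and container name are taken from the log listing further below:

# Each container instance writes a numbered file; only the newest two survive
sudo ls /var/log/pods/flux-helm_helm-controller-*/manager/
# e.g. 9.log 10.log -- the oldest instance's file is already gone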

Expected Behavior
------------------
Three log files, one for each previous instance and one for the current instance.

Actual Behavior
----------------
Only two log files; the oldest instance's log has been erased.

Reproducibility
---------------
Reproducible 100%

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
Any load since FluxCD was introduced.

Last Pass
---------
Likely never.

Timestamp/Logs
--------------
~/CGTS-42214/subcloud220_20221221.122614$ ls subcloud220_20221221.122614/controller-0_20221221.122614/var/log/pods/flux-helm_helm-controller-f59cdb448-9tjkq_2d283b6b-4eb7-4ffe-b4a6-9626d062b544/manager/
10.log 9.log

~/CGTS-42214/subcloud220_20221221.122614$ ls subcloud220_20221221.122614/controller-0_20221221.122614/var/log/pods/flux-helm_source-controller-84dc897b4b-ftml4_572cef55-fdaf-49f4-926c-fd525e09c9da/manager/
8.log 9.log

Test Activity
-------------
Triage logs.

Workaround
----------
Deleted logs cannot be recovered, but current logs can be protected from loss
by running some backup mechanism, such as a cron job that periodically copies the files (sketched below).
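
A minimal sketch of such a cron job; the schedule and the backup path /var/backup/flux are chosen only for illustration:

# /etc/cron.d/flux-log-backup (hypothetical): copy flux pod logs every 10 minutes
*/10 * * * * root mkdir -p /var/backup/flux && cp -r /var/log/pods/flux-helm_* /var/backup/flux/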

Changed in starlingx:
assignee: nobody → Leonardo Fagundes Luz Serrano (lfagunde)
Changed in starlingx:
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config-files (master)

Reviewed: https://review.opendev.org/c/starlingx/config-files/+/877669
Committed: https://opendev.org/starlingx/config-files/commit/f1e378fe5c949421cfd3b0c08ba00af843e2f1dd
Submitter: "Zuul (22348)"
Branch: master

commit f1e378fe5c949421cfd3b0c08ba00af843e2f1dd
Author: Leonardo Fagundes Luz Serrano <email address hidden>
Date: Thu Mar 16 12:13:57 2023 -0300

    Setup fluxcd's log dir and logrotate

    - Armada has been replaced by Fluxcd, so the logrotate config can
    be adapted.

    - An entry was added to /etc/tmpfiles.d to create /var/log/flux
    during boot. Some more context in [1].

    - About the owner:group:
    The flux container processes are associated with the user:group
    'nobody:nogroup' as defined in their Dockerfiles [2,3], which is
    a default user with very restricted privileges [4].
    Since /var/log is owned by root, it does not allow flux to write files.
    To circumvent that, /var/log/flux has its ownership set to match
    the container processes.

    [1] https://review.opendev.org/c/starlingx/config-files/+/859666
    [2] https://github.com/fluxcd/source-controller/blob/v0.32.1/Dockerfile#L87
    [3] https://github.com/fluxcd/helm-controller/blob/v0.27.0/Dockerfile#L44
    [4] https://wiki.debian.org/SystemGroups

    Test Plan:
    PASS build custom iso and install. Flux log dir exists
         and has right owner:group.
    PASS logs rotate

    Partial-Bug: 2009784

    Signed-off-by: Leonardo Fagundes Luz Serrano <email address hidden>
    Change-Id: I8bf8bf5f42c78d6ddab8f0d65e6ffaff9a8ec555
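
For illustration, the two pieces described in this commit would look roughly as follows; the file names, mode, and rotation policy here are assumptions, not the merged content:

# /etc/tmpfiles.d entry: create /var/log/flux at boot, owned by the
# unprivileged user:group the flux containers run as
d /var/log/flux 0755 nobody nogroup -

# logrotate stanza for the redirected flux logs
/var/log/flux/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}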

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/c/starlingx/ansible-playbooks/+/876895
Committed: https://opendev.org/starlingx/ansible-playbooks/commit/23561c8fe31ba3cb99612f418c9cbc2da2a7ca7f
Submitter: "Zuul (22348)"
Branch: master

commit 23561c8fe31ba3cb99612f418c9cbc2da2a7ca7f
Author: Leonardo Fagundes Luz Serrano <email address hidden>
Date: Wed Mar 8 18:18:19 2023 -0300

    Backup flux pod logs

    Kubernetes only keeps logs for the current instance and the previous
    instance of a running pod. At this time, a kubectl configuration to
    customize this behavior has not been found.

    To prevent losing flux logs every time flux is restarted, the command
    and arguments passed to the containers have been adapted to forward
    stdout and stderr to a log file in a volume hosted at /var/log/flux.

    The container commands are adapted from the Entrypoints in their
    Dockerfiles [1] and [2]. Note that helm-controller uses tini to create
    its process while source-controller does not.

    [1] https://github.com/fluxcd/source-controller/blob/v0.32.1/Dockerfile
    [2] https://github.com/fluxcd/helm-controller/blob/v0.27.0/Dockerfile

    Test Plan:
    pass: Run bootstrap. Confirm logs are being saved to /var/log/flux
    pass: Run unlock to restart the node. New flux logs still saved to
          same files, despite kubectl rotating log files at /var/log/pods
    pass: Restart node. New flux logs preserve old entries, despite kubectl
          erasing old log files at /var/log/pods
    pass: Restart flux. flux log files preserved and logging still normal
    pass: logrotate rotating /var/log/flux logs

    Depends-On: https://review.opendev.org/c/starlingx/config-files/+/877669

    Closes-Bug: 2009784

    Signed-off-by: Leonardo Fagundes Luz Serrano <email address hidden>
    Change-Id: I2863e7e76a432412cb45706c9f49b2b43a888877
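
A rough sketch of the resulting in-container command shape; the binary names, placeholder arguments, log file names, and the use of tee (so output still reaches the container's stdout) are assumptions, not the exact merged manifests:

# source-controller (no tini): wrap the entrypoint in a shell so output
# is also appended to a file in the /var/log/flux volume
/bin/sh -c 'source-controller <original-args> 2>&1 | tee -a /var/log/flux/source-controller.log'

# helm-controller: keep tini as PID 1 in front of the same kind of wrapper
/sbin/tini -- /bin/sh -c 'helm-controller <original-args> 2>&1 | tee -a /var/log/flux/helm-controller.log'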

Changed in starlingx:
status: In Progress → Fix Released
Frank Miller (sensfan22)
tags: added: stx.9.0 stx.apps
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Low