daemon.log rotating too quickly due to containerd logs

Bug #1882087 reported by Bart Wensley
12
This bug affects 1 person
Affects Status Importance Assigned to Milestone
StarlingX
Fix Released
Medium
Suvro Ghosh

Bug Description

Brief Description
-----------------
The /var/log/daemon.log file (on controller hosts) is rotating out after approximately 2 days when an application (e.g. stx-openstack or stx-monitor) is installed. In order to debug user issues, we should aim for about 2 weeks of log retention.

More than 85% of the logs are coming from containerd. It looks like they are due to liveness checks - for example:
2020-06-04T13:11:48.275 controller-0 containerd[83852]: info time="2020-06-04T13:11:48.274377085Z" level=info msg="ExecSync for \"a54b9c9a03a3facd072bc2801b7a39158057639a767ee03fde2e15dd5f746abb\" with command [/bin/calico-node -felix-live -bird-live] and timeout 1 (s)"
2020-06-04T13:11:48.477 controller-0 containerd[83852]: info time="2020-06-04T13:11:48.477078924Z" level=info msg="Finish piping \"stderr\" of container exec \"2bc88f80d552d5e9eaf773b03d244800e3dedd173f087bbd602ff5c2d8845bd7\""
2020-06-04T13:11:48.477 controller-0 containerd[83852]: info time="2020-06-04T13:11:48.477081900Z" level=info msg="Exec process \"2bc88f80d552d5e9eaf773b03d244800e3dedd173f087bbd602ff5c2d8845bd7\" exits with exit code 0 and error <nil>"
2020-06-04T13:11:48.477 controller-0 containerd[83852]: info time="2020-06-04T13:11:48.477125704Z" level=info msg="Finish piping \"stdout\" of container exec \"2bc88f80d552d5e9eaf773b03d244800e3dedd173f087bbd602ff5c2d8845bd7\""
2020-06-04T13:11:48.479 controller-0 containerd[83852]: info time="2020-06-04T13:11:48.479812220Z" level=info msg="ExecSync for \"a54b9c9a03a3facd072bc2801b7a39158057639a767ee03fde2e15dd5f746abb\" returns with exit code 0"

Severity
--------
Major: debugging issues is difficult as the daemon.log file may have rotated out before the problem is detected and/or the logs are collected

Steps to Reproduce
------------------
Install the system and create some additional pods (e.g. by installing one of the starlingx applications mentioned above)

Expected Behavior
------------------
The daemon.log files keep about 2 weeks of logs.

Actual Behavior
----------------
The daemon.log is being rotated out after about 2 days.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
Designer load from starlingx master: BUILD_DATE="2020-06-02 15:01:59 -0500"

Last Pass
---------
Unknown

Timestamp/Logs
--------------
See logs above

Test Activity
-------------
Developer Testing

Workaround
----------
None

Ghada Khalil (gkhalil)
tags: added: stx.containers
Revision history for this message
Ghada Khalil (gkhalil) wrote :

Marking for stx.5.0 as it's not clear how we can address this in starlingx. This may require follow up with the containerd community

tags: added: stx.5.0
Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Suvro Ghosh (suvr0)
Revision history for this message
Chris Friesen (cbf123) wrote :

Seems like we basically have two options...make the containerd logs less verbose (maybe only log liveness check failures?), or else keep more daemon.log logs (either by keeping more files, or rotating less often).

Revision history for this message
Bart Wensley (bartwensley) wrote :

In the short term, we need more logs, so I would vote to just update the logrotate configuration to keep more daemon.log logs. If we find a way to reduce the containerd logging we can always undo that change.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config-files (master)

Fix proposed to branch: master
Review: https://review.opendev.org/737635

Changed in starlingx:
status: Triaged → In Progress
Revision history for this message
Don Penney (dpenney) wrote :

My recommendation would be to redirect these logs out of daemon.log, so they're not flooding it and drowning out the rest of the important logs. We have a number of cases where we define filters and destinations in syslog-ng.conf:
https://opendev.org/starlingx/config-files/src/branch/master/syslog-ng-config/files/syslog-ng.conf

Revision history for this message
Chris Friesen (cbf123) wrote :

Yeah, Don's idea is good. Could do something like sysinv where we filter based on the process name.

Revision history for this message
Lin Shuicheng (shuicheng) wrote :

+1 for Don's idea.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to config-files (master)

Reviewed: https://review.opendev.org/737635
Committed: https://git.openstack.org/cgit/starlingx/config-files/commit/?id=1806d74c85a80a3d4494a55e3263f544cd91ac7e
Submitter: Zuul
Branch: master

commit 1806d74c85a80a3d4494a55e3263f544cd91ac7e
Author: Suvro Ghosh <email address hidden>
Date: Tue Jun 23 17:12:37 2020 -0400

    Updated logrotate config for daemon.log

    Increased the file size to 70M which is seven times the previous(10M).
    The LP description mentioned that with the current settings
    "file size 10M" the logs got rotated every 2 days but the desired
    behavior was to rotate every 2 weeks (14 days).
    Therefore,
    2 days = 10M
    14 days (2 * 7 times) = 70M (10 * 7)

    Partial-bug: 1882087

    Change-Id: Ib30ecbc297356225b902598d341fb7ab44c3a8c2
    Signed-off-by: Suvro Ghosh <email address hidden>

Revision history for this message
Suvro Ghosh (suvr0) wrote :

Following Don's idea I am working on separating containerd logs from daemon logs. And since the culprit containerd will be gone from daemon logs which was the reason for the last partial/temporary fix, therefore, I would like to revert the logrotate settings for daemon log to earlier, thoughts?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/758614
Committed: https://git.openstack.org/cgit/starlingx/config-files/commit/?id=987f73124ab32db01cd01cc05bd8f90052aa993d
Submitter: Zuul
Branch: master

commit 987f73124ab32db01cd01cc05bd8f90052aa993d
Author: Suvro Ghosh <email address hidden>
Date: Fri Oct 16 14:46:59 2020 -0400

    Separated containerd logs from daemon logs

    Created a new destination, filter and therefore a log path for
    containerd. Also filtered containerd logs from daemon filter.

    Reverted temporary fix of increased log file size for daemon log and
    added containerd log file to logrotate config

    Change-Id: Ic6e53a5f2a72d96476b79615666dd19dccab8af6
    Closes-Bug: 1882087
    Signed-off-by: Suvro Ghosh <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.