Set SyslogIdentifier for container healthchecks

Bug #1856573 reported by Cédric Jeanneret
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Medium
Assigned to: Cédric Jeanneret

Bug Description

Hello,

To make it easy to collect the logs of the container healthchecks (managed via systemd timers), it would be really nice to set "SyslogIdentifier" in the healthcheck units, maybe to something like "container-healthcheck".

This would allow filtering those logs at the (r)syslog level and pushing them to a dedicated file, which enables better monitoring via collectd/sensu without calling journald from within a container (doing so creates "some" SELinux issues because it needs dbus and some other things).
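As a rough illustration of the request, here is a minimal sketch that renders a healthcheck service unit carrying a dedicated identifier. Only `SyslogIdentifier=` is the systemd directive under discussion; the unit layout, the `podman exec` command line, and the `render_healthcheck_unit` helper are assumptions made for this example, not the actual tripleo template. The per-container `healthcheck_<name>` form is used because that is what the log excerpts in this report show.

```python
# Hypothetical helper: renders a healthcheck service unit whose output is
# tagged with a dedicated syslog identifier. The unit layout and the
# ExecStart command are assumptions, not the real tripleo template.
UNIT_TEMPLATE = """\
[Unit]
Description={name} healthcheck

[Service]
Type=oneshot
ExecStart=/usr/bin/podman exec {name} /openstack/healthcheck
SyslogIdentifier=healthcheck_{name}
"""


def render_healthcheck_unit(name: str) -> str:
    """Return the unit file text for the container called `name`."""
    return UNIT_TEMPLATE.format(name=name)


if __name__ == "__main__":
    print(render_healthcheck_unit("nova_compute"))
```

With an identifier like this in place, a syslog daemon can route on the program name instead of querying journald from inside a container.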

Cheers,

C.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/699215

Changed in tripleo:
status: Triaged → In Progress

OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (master)

Fix proposed to branch: master
Review: https://review.opendev.org/699216

Cédric Jeanneret (cjeanner) wrote :

After some poking, it seems to work. We can get logs like these:

Dec 17 15:03:13 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Compiled against library: libvirt 4.5.0
Dec 17 15:03:13 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Using library: libvirt 4.5.0
Dec 17 15:03:13 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Using API: QEMU 4.5.0
Dec 17 15:03:13 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Running hypervisor: QEMU 2.12.0
Dec 17 15:03:13 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Running against daemon: 4.5.0
Dec 17 15:03:13 overcloud-0-novacompute-0 healthcheck_nova_compute: 8 ? 00:00:32 nova-compute
Dec 17 15:04:33 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Compiled against library: libvirt 4.5.0
Dec 17 15:04:33 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Using library: libvirt 4.5.0
Dec 17 15:04:33 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Using API: QEMU 4.5.0
Dec 17 15:04:33 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Running hypervisor: QEMU 2.12.0
Dec 17 15:04:33 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Running against daemon: 4.5.0
Dec 17 15:05:33 overcloud-0-novacompute-0 healthcheck_nova_compute: 8 ? 00:00:36 nova-compute
Dec 17 15:06:07 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Compiled against library: libvirt 4.5.0
Dec 17 15:06:07 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Using library: libvirt 4.5.0
Dec 17 15:06:07 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Using API: QEMU 4.5.0
Dec 17 15:06:07 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Running hypervisor: QEMU 2.12.0
Dec 17 15:06:07 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Running against daemon: 4.5.0
Dec 17 15:07:17 overcloud-0-novacompute-0 healthcheck_nova_compute: Error: exec failed: container_linux.go:345: starting container process caused "exec: \"/openstack/healthcheck\": stat /openstack/healthcheck: no such file or directory": OCI runtime error
Dec 17 15:08:33 overcloud-0-novacompute-0 healthcheck_nova_compute: 8 ? 00:00:40 nova-compute
Dec 17 15:08:33 overcloud-0-novacompute-0 healthcheck_nova_libvirt: FAILED
Dec 17 15:08:33 overcloud-0-novacompute-0 healthcheck_nova_libvirt: Error: non zero exit code: 1: OCI runtime error

The "FAILED" comes from a tweaked healthcheck that just echoes "FAILED" and exits with 1, meaning we are able to catch failed healthchecks.
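To show how a monitoring script might consume lines like the ones above, here is a minimal sketch that splits a traditional syslog line into its fields and flags failures. The regular expression, the `Entry` type, and the `is_failure` heuristic (a message equal to "FAILED" or starting with "Error:") are assumptions derived from this excerpt only, not an agreed format.

```python
import re
from typing import NamedTuple, Optional

# Traditional syslog layout: "Mon DD HH:MM:SS host tag: message".
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<tag>[^:\s]+):\s(?P<msg>.*)$"
)


class Entry(NamedTuple):
    timestamp: str
    host: str
    tag: str
    message: str


def parse_line(line: str) -> Optional[Entry]:
    """Split one syslog line into its fields, or return None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return Entry(m["ts"], m["host"], m["tag"], m["msg"])


def is_failure(entry: Entry) -> bool:
    """Assumed heuristic: a healthcheck entry saying FAILED or starting with Error:."""
    return entry.tag.startswith("healthcheck_") and (
        entry.message == "FAILED" or entry.message.startswith("Error:")
    )
```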

I don't know if this would be enough?

Martin Mágr (mmagr) wrote :

So I see the error message on the next line. It is not ideal, but I think we can manage to fetch the failure _and_ its respective error message. Thanks a lot for the effort. ... but we still have to do something with https://review.opendev.org/#/c/695859
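Fetching the failure together with the error message on the following line could be sketched as below. The `pair_failures` helper and its matching rule (the next message from the same identifier, if it starts with "Error:") are assumptions for illustration; the input is a list of (identifier, message) pairs in log order.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def pair_failures(
    entries: Iterable[Tuple[str, str]]
) -> List[Tuple[str, Optional[str]]]:
    """Return one (tag, error_message) pair per FAILED marker.

    error_message is the next message logged under the same tag when it
    starts with "Error:", otherwise None. This rule is an assumption based
    on the sample logs in this report.
    """
    results: List[Tuple[str, Optional[str]]] = []
    pending: Dict[str, int] = {}  # tag -> index in results awaiting its error line
    for tag, message in entries:
        if message == "FAILED":
            results.append((tag, None))
            pending[tag] = len(results) - 1
        elif tag in pending:
            index = pending.pop(tag)
            if message.startswith("Error:"):
                results[index] = (tag, message)
    return results
```

Tracking the pending marker per identifier keeps the pairing correct when lines from several healthchecks are interleaved, as they are in the excerpt above.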

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/699215
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=49858c5265310c8cdd1694ae389965370fd97abb
Submitter: Zuul
Branch: master

commit 49858c5265310c8cdd1694ae389965370fd97abb
Author: Cédric Jeanneret <email address hidden>
Date: Mon Dec 16 15:27:57 2019 +0100

    Add SyslogIdenfier to healthcheck systemd unit

    Adding this new field will allow to filter all healthcheck logs using
    the Idenfier value.

    For instance, using journalctl, you would be able to run this:
    `journalctl -t healthcheck_collectd'

    It will also allow to get a dedicated file out of (r)syslog if needed.

    Change-Id: Icdc5caf4cedc46291a807c39c0a31c74955a4a74
    Closes-Bug: #1856573
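The `journalctl -t` filter from the commit message can also be driven from a script. The following is a minimal sketch, assuming `journalctl` is on the PATH and that its JSON output mode (`-o json`, one object per line) is acceptable; the helper names are hypothetical.

```python
import json
import subprocess
from typing import Iterable, Iterator, List, Tuple


def journalctl_argv(identifier: str) -> List[str]:
    """Build the command line that selects entries by syslog identifier."""
    return ["journalctl", "-t", identifier, "-o", "json", "--no-pager"]


def parse_journal_json(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, message) from journalctl JSON lines.

    Assumes MESSAGE is plain text; journald may encode non-UTF-8 payloads
    differently, which this sketch does not handle.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        yield record.get("SYSLOG_IDENTIFIER", ""), record.get("MESSAGE", "")


def read_healthcheck_log(identifier: str) -> List[Tuple[str, str]]:
    """Run journalctl and return the parsed entries for one identifier."""
    proc = subprocess.run(
        journalctl_argv(identifier), check=True, capture_output=True, text=True
    )
    return list(parse_journal_json(proc.stdout.splitlines()))
```

For example, `read_healthcheck_log("healthcheck_collectd")` would return the entries that the command in the commit message prints, as (identifier, message) pairs.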

Changed in tripleo:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote : Fix merged to paunch (master)

Reviewed: https://review.opendev.org/699216
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=505dc92a63292c763b274e8dd02d852bbe0ace8b
Submitter: Zuul
Branch: master

commit 505dc92a63292c763b274e8dd02d852bbe0ace8b
Author: Cédric Jeanneret <email address hidden>
Date: Mon Dec 16 15:37:57 2019 +0100

    Add SyslogIdenfier to healthcheck systemd unit

    Adding this new field will allow to filter all healthcheck logs using
    the Idenfier value.

    For instance, using journalctl, you would be able to run this:
    `journalctl -t healthcheck_collectd'

    It will also allow to get a dedicated file out of (r)syslog if needed.

    This is the reflection of Icdc5caf4cedc46291a807c39c0a31c74955a4a74

    Change-Id: I6861baa287f2a8288b87be26aacecbcc061cd96f
    Closes-Bug: #1856573

OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/701186

OpenStack Infra (hudson-openstack) wrote : Fix merged to paunch (stable/train)

Reviewed: https://review.opendev.org/701186
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=e0566017a52f52f0c41f62ade60cb0bdf1ee1197
Submitter: Zuul
Branch: stable/train

commit e0566017a52f52f0c41f62ade60cb0bdf1ee1197
Author: Cédric Jeanneret <email address hidden>
Date: Mon Dec 16 15:37:57 2019 +0100

    Add SyslogIdenfier to healthcheck systemd unit

    Adding this new field will allow to filter all healthcheck logs using
    the Idenfier value.

    For instance, using journalctl, you would be able to run this:
    `journalctl -t healthcheck_collectd'

    It will also allow to get a dedicated file out of (r)syslog if needed.

    This is the reflection of Icdc5caf4cedc46291a807c39c0a31c74955a4a74

    Change-Id: I6861baa287f2a8288b87be26aacecbcc061cd96f
    Closes-Bug: #1856573
    (cherry picked from commit 505dc92a63292c763b274e8dd02d852bbe0ace8b)

tags: added: in-stable-train
tags: added: stein-backport-potential

OpenStack Infra (hudson-openstack) wrote : Fix proposed to paunch (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/701523

OpenStack Infra (hudson-openstack) wrote : Fix merged to paunch (stable/stein)

Reviewed: https://review.opendev.org/701523
Committed: https://git.openstack.org/cgit/openstack/paunch/commit/?id=3c38fe60011d9156ec71ec4dae78ebc1f1b88a99
Submitter: Zuul
Branch: stable/stein

commit 3c38fe60011d9156ec71ec4dae78ebc1f1b88a99
Author: Cédric Jeanneret <email address hidden>
Date: Mon Dec 16 15:37:57 2019 +0100

    Add SyslogIdenfier to healthcheck systemd unit

    Adding this new field will allow to filter all healthcheck logs using
    the Idenfier value.

    For instance, using journalctl, you would be able to run this:
    `journalctl -t healthcheck_collectd'

    It will also allow to get a dedicated file out of (r)syslog if needed.

    This is the reflection of Icdc5caf4cedc46291a807c39c0a31c74955a4a74

    Change-Id: I6861baa287f2a8288b87be26aacecbcc061cd96f
    Closes-Bug: #1856573
    (cherry picked from commit 505dc92a63292c763b274e8dd02d852bbe0ace8b)
    (cherry picked from commit e0566017a52f52f0c41f62ade60cb0bdf1ee1197)

tags: added: in-stable-stein

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/701826

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/train
Review: https://review.opendev.org/701849

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ansible (stable/train)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/701826
Reason: https://review.opendev.org/701849

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 1.1.0

This issue was fixed in the openstack/tripleo-ansible 1.1.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ansible (stable/train)

Reviewed: https://review.opendev.org/701849
Committed: https://git.openstack.org/cgit/openstack/tripleo-ansible/commit/?id=ec4351c56646122f295a4fbdcba65abecc28df26
Submitter: Zuul
Branch: stable/train

commit ec4351c56646122f295a4fbdcba65abecc28df26
Author: Emilien Macchi <email address hidden>
Date: Wed Oct 2 12:01:28 2019 -0400

    [SQUASH] backport tripleo-container-manage to stable/train

    This is a squash of 32 commits to facilitate the backport of
    tripleo-container-manage and its dependencies.

    Introduce tripleo-container-manage role

    This is a first ieration of the role, but there is still a long TODO,
    that will come later in separated patches:
    - Add molecule testing
    - In podman.yaml, add cpuset_cpus with parity of what is in paunch
    - Remove containers that are:
      - managed by tripleo-ansible (using the container_label flag)
      - not in the container-startup-config
    - Print stdout when containers start as it was done with paunch

    Story: 2006732
    Task: 37165

    Co-Authored-By: Kevin Carter <email address hidden>
    Co-Authored-By: Alex Schultz <email address hidden>

    Depends-On: https://review.opendev.org/#/c/702144/
    Change-Id: I2f88caa8e1c230dfe846a8a0dd9f939b98992cd5
    (cherry picked from commit a191a2d6001068c77fa6e4a97c12574c59341864)

    tripleo-container-manage: set some defaults

    Set defaults that are needed to use the role outside of THT more easily.

    Change-Id: Id67cf06c85a2a6b50e6494b1a66f534ccb06c4a7
    (cherry picked from commit 609d7895a1ff4a216abffb7c46c16c49e8abaf4e)

    Move the filters plugin to the core plugins location

    This change is a workaround for a zuul issue which moves the
    nested ansible role plugin to the core plugins directory so
    that it is not creating a gate conflict.

    Change-Id: I9f959803381063502b4d15980b14c3416ffa153f
    Signed-off-by: Kevin Carter <email address hidden>
    (cherry picked from commit e2719131dbdff4fc5fdee32e84ccb4c7655be29b)

    Revert "Workaround for ansible-lint installation failure"

    Backport note: this is a second backport of the same patch
    since now it includes the change in tripleo-container-manage
    role that is being backported to stable/train.

    This reverts the disabling of the ansible-lint test from
    commit cffd4fc9d41b15c31610dee4abd8786916f5933b and updates
    ansible-lint to the fixed version.

    Included are fixes for ansible-lint test failures which
    got merged as part of I2f88caa8e1c230dfe846a8a0dd9f939b98992cd5
    while the lint test was disabled.

    Change-Id: I37100f5e1764a5cd2cb8df82ae963e673ca0a8da
    (cherry picked from commit 28e105c05689f3b8fc046758f9218e9ffab51360)

    tripleo-container-manage: few improvements

    - Add and use variables to make the role more flexible:
      tripleo_container_manage_config,
      tripleo_container_manage_config_id
      tripleo_container_manage_debug
      tripleo_container_manage_config_pattern (and rename hashed_files var)

      With these vars, the role can pretty much be used outside of TripleO.

    - Show logs of config...

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/paunch 6.0.1

This issue was fixed in the openstack/paunch 6.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-ansible 0.5.0

This issue was fixed in the openstack/tripleo-ansible 0.5.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/paunch stein-eol

This issue was fixed in the openstack/paunch stein-eol release.
