Some HA containers logging got lost with the move to podman

Bug #1872734 reported by Michele Baldessari
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Michele Baldessari

Bug Description

When podman dropped the journald log-driver we rushed to move to the supported k8s-file driver. This had the side effect of losing the stdout logs of the HA containers.

Previously we could troubleshoot haproxy startup failures simply by looking in the journal. Now, if haproxy fails to start, there are no traces in the logs at all: when a container fails it is stopped by pacemaker (and consequently removed), so no logs remain anywhere on the system.
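For context, the k8s-file driver writes each captured stdout/stderr line to a plain file in the Kubernetes log format: an RFC3339 timestamp, the stream name, a partial/full flag, and the message. A minimal sketch of parsing one such line, assuming that format:

```python
# Sketch: parse one line of podman's k8s-file log format.
# Assumed format: "<RFC3339 timestamp> <stdout|stderr> <P|F> <message>"
# (P = partial line that was split, F = full line).
def parse_k8s_file_line(line: str) -> dict:
    timestamp, stream, flag, message = line.rstrip("\n").split(" ", 3)
    return {
        "timestamp": timestamp,
        "stream": stream,        # "stdout" or "stderr"
        "partial": flag == "P",  # True when the runtime split a long line
        "message": message,
    }

entry = parse_k8s_file_line(
    "2020-04-14T14:00:01.000000000+02:00 stdout F haproxy started"
)
print(entry["stream"], entry["message"])
```

With the fix below, lines in this format end up under /var/log/containers/stdouts/ and survive the container being removed by pacemaker.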

summary: - HA containers logging got lost with the move to podman
+ Some HA containers logging got lost with the move to podman
Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
milestone: none → ussuri-rc1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (master)

Reviewed: https://review.opendev.org/719773
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=06c4aa7446073022b86c1f034a0c5406f2675ddb
Submitter: Zuul
Branch: master

commit 06c4aa7446073022b86c1f034a0c5406f2675ddb
Author: Michele Baldessari <email address hidden>
Date: Tue Apr 14 11:14:22 2020 +0200

    Log stdout of HA containers

    When podman dropped the journald log-driver we rushed to move to the supported
    k8s-file driver. This had the side effect of us losing the stdout logs of the
    HA containers.

    In fact previously we were easily able to troubleshoot haproxy startup failures
    just by looking in the journal. These days instead if haproxy fails to start we
    have no traces whatsoever in the logs, because when a container fails it gets
    stopped by pacemaker (and consequently removed) and no logs on the system are
    available any longer.

    Tested as follows:
    1) Redeploy a previously deployed overcloud that did not have the patch
    and observe that we now log the startup of HA bundles in /var/log/containers/stdouts/*bundle.log

    [root@controller-0 stdouts]# ls -l *bundle.log |grep -v -e init -e restart
    -rw-------. 1 root root 16032 Apr 14 14:13 openstack-cinder-volume.log
    -rw-------. 1 root root 19515 Apr 14 14:00 haproxy-bundle.log
    -rw-------. 1 root root 10509 Apr 14 14:03 ovn-dbs-bundle.log
    -rw-------. 1 root root 6451 Apr 14 14:00 redis-bundle.log

    2) Deploy a composable HA overcloud from scratch with the patch above
    and observe that we obtain the stdout on disk.

    Note that most HA containers log to their usual on-host files just
    fine, we are mainly missing haproxy logs and/or the kolla startup only
    of the HA containers.

    Closes-Bug: #1872734

    Change-Id: I4270b398366e90206adffe32f812632b50df615b
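The exact puppet-tripleo parameters are in the review above; as an illustrative fragment only, the kind of podman logging options the fix wires into the pacemaker-managed bundles looks like this (--log-driver and --log-opt path are standard podman run options; the path shown matches the test output below):

```shell
# Illustrative fragment, not the literal command the bundles run:
# persist container stdout to a file that outlives the container.
podman run \
  --log-driver k8s-file \
  --log-opt path=/var/log/containers/stdouts/haproxy-bundle.log \
  ...   # remaining bundle arguments elided
```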

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to puppet-tripleo (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/720657

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to puppet-tripleo (stable/train)

Reviewed: https://review.opendev.org/720657
Committed: https://git.openstack.org/cgit/openstack/puppet-tripleo/commit/?id=7e4aca45fa8fa5e7868b8f387242eb895cf975eb
Submitter: Zuul
Branch: stable/train

commit 7e4aca45fa8fa5e7868b8f387242eb895cf975eb
Author: Michele Baldessari <email address hidden>
Date: Tue Apr 14 11:14:22 2020 +0200

    Log stdout of HA containers

    When podman dropped the journald log-driver we rushed to move to the supported
    k8s-file driver. This had the side effect of us losing the stdout logs of the
    HA containers.

    In fact previously we were easily able to troubleshoot haproxy startup failures
    just by looking in the journal. These days instead if haproxy fails to start we
    have no traces whatsoever in the logs, because when a container fails it gets
    stopped by pacemaker (and consequently removed) and no logs on the system are
    available any longer.

    Tested as follows:
    1) Redeploy a previously deployed overcloud that did not have the patch
    and observe that we now log the startup of HA bundles in /var/log/containers/stdouts/*bundle.log

    [root@controller-0 stdouts]# ls -l *bundle.log |grep -v -e init -e restart
    -rw-------. 1 root root 16032 Apr 14 14:13 openstack-cinder-volume.log
    -rw-------. 1 root root 19515 Apr 14 14:00 haproxy-bundle.log
    -rw-------. 1 root root 10509 Apr 14 14:03 ovn-dbs-bundle.log
    -rw-------. 1 root root 6451 Apr 14 14:00 redis-bundle.log

    2) Deploy a composable HA overcloud from scratch with the patch above
    and observe that we obtain the stdout on disk.

    Note that most HA containers log to their usual on-host files just
    fine, we are mainly missing haproxy logs and/or the kolla startup only
    of the HA containers.

    Closes-Bug: #1872734

    NB: Cherry-picks had some context change in
        manifests/profile/pacemaker/cinder/volume_bundle.pp
        manifests/profile/pacemaker/rabbitmq_bundle.pp
        manifests/profile/pacemaker/manila/share_bundle.pp

    Change-Id: I4270b398366e90206adffe32f812632b50df615b
    (cherry picked from commit 06c4aa7446073022b86c1f034a0c5406f2675ddb)

tags: added: in-stable-train
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/puppet-tripleo 11.5.0

This issue was fixed in the openstack/puppet-tripleo 11.5.0 release.
