Monitor App Filebeat readiness probe failing

Bug #1874328 reported by Kevin Smith
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Simon Cousineau

Bug Description

Brief Description
-----------------
After upgrading to the 7.6 helm charts from helm-charts/elasticsearch, filebeat pods may fail to become ready, causing application-apply to fail. Even when application-apply succeeds, occasional readiness probe failures for filebeat pods may appear in /var/log/daemon.log.

Severity
--------
<Minor: System/Feature is usable with minor issue>

Steps to Reproduce
------------------
Apply stx-monitor application.

Expected Behavior
-----------------
stx-monitor applies successfully and filebeat pod readiness probes do not fail.

Actual Behavior
---------------
stx-monitor may fail to apply due to filebeat pod readiness probe failures.

Reproducibility
---------------
Intermittent.

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
Master April 22

Last Pass
---------
Occurs since upgrade to 7.6 helm charts.

Timestamp/Logs
--------------
/var/log/daemon.log:
2020-04-22T19:08:30.320 controller-0 containerd[2017]: info time="2020-04-22T19:08:30.320070442Z" level=error msg="ExecSync for \"18f92fbacab9a8c9090a4ef502a4ec4efd03f8ed5df0cb25bf363afa91d59493\" failed" error="rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 5s exceeded: context deadline exceeded"
2020-04-22T19:08:30.320 controller-0 kubelet[10103]: info E0422 19:08:30.320332 10103 remote_runtime.go:351] ExecSync 18f92fbacab9a8c9090a4ef502a4ec4efd03f8ed5df0cb25bf363afa91d59493 'sh -c #!/usr/bin/env bash -e
2020-04-22T19:08:30.320 controller-0 kubelet[10103]: info filebeat test output
2020-04-22T19:08:30.320 controller-0 kubelet[10103]: info ' from runtime service failed: rpc error: code = DeadlineExceeded desc = failed to exec in container: timeout 5s exceeded: context deadline exceeded
2020-04-22T19:08:30.320 controller-0 containerd[2017]: info time="2020-04-22T19:08:30.320720033Z" level=info msg="ExecSync for \"18f92fbacab9a8c9090a4ef502a4ec4efd03f8ed5df0cb25bf363afa91d59493\" with command [sh -c #!/usr/bin/env bash -e\nfilebeat test output\n] and timeout 5 (s)"
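To gauge how often the probe times out, the containerd error lines above can be counted per container. The following is a minimal sketch, assuming the daemon.log line format matches the excerpt in this report; the function name `count_exec_timeouts` and the constant names are hypothetical. It counts every timed-out ExecSync, not only filebeat probes, so the container ID still has to be mapped back to a filebeat pod separately.

```python
import re
from collections import Counter

# Matches a containerd error line for an exec call that hit its deadline.
# The backslash before each quote is optional so that both escaped and
# unescaped renderings of the message are accepted.
EXEC_TIMEOUT = re.compile(
    r'ExecSync for \\?"(?P<cid>[0-9a-f]{64})\\?" failed.*DeadlineExceeded'
)

# Sample line copied from the log excerpt in this report.
SAMPLE = (
    r'2020-04-22T19:08:30.320 controller-0 containerd[2017]: info '
    r'time="2020-04-22T19:08:30.320070442Z" level=error '
    r'msg="ExecSync for \"18f92fbacab9a8c9090a4ef502a4ec4efd03f8ed5df0cb25bf363afa91d59493\" failed" '
    r'error="rpc error: code = DeadlineExceeded desc = failed to exec in container: '
    r'timeout 5s exceeded: context deadline exceeded"'
)


def count_exec_timeouts(lines):
    """Return a Counter mapping container ID to the number of timed-out ExecSync calls."""
    counts = Counter()
    for line in lines:
        match = EXEC_TIMEOUT.search(line)
        if match:
            counts[match.group("cid")] += 1
    return counts


# Any iterable of lines works, including an open file handle.
timeouts = count_exec_timeouts([SAMPLE, "unrelated log line"])
for container_id, count in timeouts.most_common():
    print(container_id[:12], count)
```

Passing an open handle on /var/log/daemon.log instead of the sample list gives per-container totals for the whole log.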

Test Activity
-------------
Feature Testing

Workaround
----------
Apply a helm override to the filebeat chart that raises the readiness probe timeout from the default 5s to 20s.
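As an illustration of the workaround, the override values could be generated as below. This is a sketch only: the key path `readinessProbe.timeoutSeconds` is an assumption about the filebeat chart's values layout and should be checked against the chart in use, and the function names are hypothetical. How the resulting file is passed to the application is not covered by this report and is not shown here.

```python
import json


def readiness_timeout_override(timeout_seconds=20):
    """Build a values override that changes only the readiness probe timeout.

    The key names are an assumption about the chart's values layout.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    return {"readinessProbe": {"timeoutSeconds": timeout_seconds}}


def render(values):
    """Serialize the override; this JSON output is also valid YAML."""
    return json.dumps(values, indent=2)


override_text = render(readiness_timeout_override(20))
print(override_text)
```

Keeping the override limited to the single timeout field leaves the chart's other probe settings at their defaults.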

Kevin Smith (kevin.smith.wrs) wrote :

Using the same resource limits for the filebeat pods as for the metricbeat pods resolves the problem.

Changed in starlingx:
assignee: nobody → Simon Cousineau (scousineau)
OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/722325
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=0333ccbb4216300eb451004790ce8b4c7e492e6f
Submitter: Zuul
Branch: master

commit 0333ccbb4216300eb451004790ce8b4c7e492e6f
Author: Simon Cousineau <email address hidden>
Date: Thu Apr 23 09:56:38 2020 -0400

    Fix Filebeat readiness probe exceeding timeout

    The 7.6.0 chart upgrade added a readiness probe to the beats. The
    Filebeat readiness probe will occasionally fail, causing
    application-apply to fail. This fix addresses this issue by increasing
    Filebeat's resource limits to match those allotted to Metricbeat.

    Closes-Bug: 1874328

    Change-Id: Ie2e23bbe063fd837999ceb48cc97071034526f35
    Signed-off-by: Simon Cousineau <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil) wrote :

stx.4.0 - issue introduced recently by elasticsearch upversion to 7.6

tags: added: stx.monitor
tags: added: stx.4.0
Changed in starlingx:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to config (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/729812

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (f/centos8)

Reviewed: https://review.opendev.org/729812
Committed: https://git.openstack.org/cgit/starlingx/config/commit/?id=539d476456277c22d0dcbc3cbbc832e623242264
Submitter: Zuul
Branch: f/centos8

commit 320cc40de8518787c2be234d7fdf88ec0a462df2
Author: Don Penney <email address hidden>
Date: Wed May 13 13:06:11 2020 -0400

    Add auto-versioning to starlingx/config packages

    This update makes use of the PKG_GITREVCOUNT variable to auto-version
    the packages in this repo.

    Change-Id: I3a2c8caeb4b4647608978b1f2ccfcf0661508803
    Depends-On: https://review.opendev.org/727837
    Story: 2006166
    Task: 39766
    Signed-off-by: Don Penney <email address hidden>

commit d9f2aea0fb228ed69eb9c9262e29041eedabc15d
Author: Sharath Kumar K <email address hidden>
Date: Wed Apr 22 16:22:22 2020 +0200

    De-branding in starlingx/config: CGCS -> StarlingX

    1. Rename CGCS to StarlingX for .spec files

    Test:
    After the de-brand change, bootimage.iso has been built in the flock
    Layer and installed on the dev machine to validate the changes.

    Please note, doing de-brand changes in batches, this is batch9 changes.

    Story: 2006387
    Task: 39524

    Change-Id: Ia1fe0f2baafb78c974551100f16e6a7d99882f15
    Signed-off-by: Sharath Kumar K <email address hidden>

    De-branding in starlingx/config: CGCS -> StarlingX

    1. Rename CGCS to StarlingX for .spec file
    2. Rename TIS to StarlingX for .service files

    Test:
    After the de-brand change, bootimage.iso has been built in the flock
    Layer and installed on the dev machine to validate the changes.

    Please note, doing de-brand changes in batches, this is batch10 changes.

    Story: 2006387
    Task: 36202

    Change-Id: I404ce0da2621495175ad31489e9ad6f7b0211e26
    Signed-off-by: Sharath Kumar K <email address hidden>

commit d141e954fa6bbf688929ec90d1b6604a97792c43
Author: Teresa Ho <email address hidden>
Date: Tue Mar 31 10:08:57 2020 -0400

    Sysinv extensions for FPGA support

    This update adds cli and restapi to support FPGA device
    programming.

    CLI commands:
    system device-image-apply
    system device-image-create
    system device-image-delete
    system device-image-list
    system device-image-remove
    system device-image-show
    system device-image-state-list
    system device-label-list
    system host-device-image-update
    system host-device-image-update-abort
    system host-device-label-assign
    system host-device-label-list
    system host-device-label-remove

    Story: 2006740
    Task: 39498

    Change-Id: I556c2e7a51b3931b5a66ab27b67f51e3a8aebd9f
    Signed-off-by: Teresa Ho <email address hidden>

commit 491cca42ed854d2cb3ee3646b93c56a4f45f563c
Author: Elena Taivan <email address hidden>
Date: Wed Apr 29 11:25:26 2020 +0000

    Qcow2 conversion to raw can be done using 'image-conversion' filesystem

    1. Conversion filesystem can be added before/after
       stx-openstack is applied
    2. If conversion filesystem is added after stx-openstack
       is applied, changes to stx-openstack will only take effec...

tags: added: in-f-centos8