excessive errors in upgrade jobs causing logstash issues

Bug #1817602 reported by Alex Schultz on 2019-02-25
This bug affects 1 person
Affects: tripleo
Importance: Critical
Assigned to: wes hayutin

Bug Description

#openstack-infra has reported that the errors log on the upgrade jobs is excessively large and is causing logstash to back up (and to hit OOM as well). We need to fix the upgrade jobs, or disable them if they are going to keep generating excessive errors (200+ MB of error logs).

http://eavesdrop.openstack.org/irclogs/%23openstack-infra/%23openstack-infra.2019-02-25.log.html#t2019-02-25T16:07:14

Hi,

Even a successful upgrade generates a lot of errors, because the services are stopped for some time (~120 MB for a successful standalone upgrade, for instance).

I have rarely needed those logs to debug an upgrade, so maybe we could just disable generating them for upgrade jobs. We could add a conditional there [1] that ensures the errors file is either not created or at least not pushed to logstash for upgrade jobs.

[1] https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/collect-logs/tasks/collect.yml#L327-L337
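A minimal sketch of what such a conditional might look like, to make the suggestion concrete. This is not the actual content of collect.yml: the task name, the grep command, the paths, and the collect_errors_log variable are all hypothetical placeholders. The idea is only that the task building the consolidated errors file is guarded by a boolean that upgrade jobs set to false.

```yaml
# Hypothetical sketch, NOT the real task from collect.yml.
# collect_errors_log is an assumed variable name; it defaults to true so
# that non-upgrade jobs keep their current behaviour.
- name: Build the consolidated errors file (skipped when disabled)
  shell: >
    grep -rh 'ERROR' /var/log/containers
    > /var/log/extra/errors.txt || true
  when: collect_errors_log | default(true) | bool
```

An upgrade job would then opt out by setting collect_errors_log: false in its job variables, so the file is never created and therefore never pushed to logstash.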

Fix proposed to branch: master
Review: https://review.openstack.org/639165

Changed in tripleo:
assignee: nobody → wes hayutin (weshayutin)
status: Triaged → In Progress

Change abandoned by wes hayutin on branch: master
Review: https://review.openstack.org/639165

wes hayutin (weshayutin) wrote :

We'll be revisiting collect-logs, so reopen this bug if you think we should track that here.

Changed in tripleo:
status: In Progress → Fix Released