Multiple post_failure on master periodic pipeline

Bug #1861378 reported by chandan kumar
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: wes hayutin

Bug Description

We are seeing multiple POST_FAILURE results in the master periodic pipeline.
Below is the list of impacted jobs:
* periodic-tripleo-ci-centos-7-multinode-1ctlr-featureset010-master
* periodic-tripleo-ci-centos-7-singlenode-featureset050-upgrades-master
* periodic-tripleo-ci-centos-7-standalone-upgrade-master
* periodic-tripleo-ci-rhel-8-scenario001-standalone-master
* periodic-tripleo-ci-rhel-8-scenario002-standalone-master
* periodic-tripleo-ci-rhel-8-scenario003-standalone-master
* periodic-tripleo-ci-rhel-8-scenario004-standalone-master

The failures are also visible in check jobs:
https://review.rdoproject.org/zuul/builds?result=POST_FAILURE
As of 2020-01-30T00:42:19, we have the following failures:
* tripleo-ci-rhel-8-scenario001-standalone-rdo
* tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001
* tripleo-ci-rhel-8-scenario004-standalone-rdo
* tripleo-ci-rhel-8-scenario003-standalone-rdo
* tripleo-ci-rhel-8-scenario001-standalone-rdo
* tripleo-ci-centos-7-ovb-3ctlr_1comp_1supp-featureset039
* tripleo-ci-rhel-8-scenario002-standalone-rdo

The post playbook failed. There could be multiple reasons for this.

Maybe the disk on the RDO log server filled up again, given the recent change in ansible-role-collect-logs to gzip the collected logs.

It needs to be investigated.
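
A quick way to test the disk-full hypothesis, assuming access to logs.rdoproject.org (the mount point below is an assumption, not the actual RDO layout):

    # Print free space on the volume holding the logs.
    import shutil

    usage = shutil.disk_usage("/var/www/logs")  # assumed log location
    print(f"free: {usage.free / 1024**3:.1f} GiB "
          f"of {usage.total / 1024**3:.1f} GiB total")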

Revision history for this message
Javier Peña (jpena-c) wrote :

The disk in logs.rdoproject.org is full again. I'm freeing up some space, and running the pruner script.

Could you check if all tripleo jobs have gone back to compressing logs? Yesterday we had more than 100GB available, and then usage went up again. You can check https://review.rdoproject.org/grafana/?panelId=52240&fullscreen&orgId=1&var-datasource=default&var-server=logs.rdoproject.org.rdocloud&var-inter=$__auto_interval_inter&from=now-24h&to=now for details.
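
A rough sketch of one way to spot jobs that stopped compressing their logs (the log root path and the 10 MB threshold are assumptions):

    # Report recently modified log files that are not gzip-compressed
    # and are larger than 10 MB; these point at jobs uploading raw logs.
    import time
    from pathlib import Path

    LOG_ROOT = Path("/var/www/logs")   # assumed location of uploaded logs
    DAY = 24 * 3600
    now = time.time()

    for path in LOG_ROOT.rglob("*"):
        if not path.is_file() or path.suffix == ".gz":
            continue
        st = path.stat()
        if now - st.st_mtime < DAY and st.st_size > 10 * 1024**2:
            print(f"{st.st_size / 1024**2:8.1f} MB  {path}")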

Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

The patch https://review.opendev.org/#/c/704738/ broke log archiving on OVB; reverting it here: https://review.opendev.org/#/c/704933/

collect-logs.yml is applied last and overrides all variables set before it, so it shouldn't be used for setting variables. Use tripleo-ci/toci-quickstart/config/testenv/multinode.yml and tripleo-ci/toci-quickstart/config/testenv/ovb.yml for that instead.
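
A toy Python illustration of the precedence problem (the variable name and values are hypothetical; this is not tripleo-ci code): whichever file is applied last wins, so per-environment intent set earlier gets clobbered.

    # Hypothetical settings; in tripleo-ci these come from YAML files.
    testenv_ovb = {"compress_logs": True}     # per-environment intent
    collect_logs = {"compress_logs": False}   # value in the file applied last

    effective = {}
    for layer in (testenv_ovb, collect_logs): # applied in order, last wins
        effective.update(layer)

    print(effective)  # {'compress_logs': False} -- per-env setting was overridden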

Revision history for this message
Sorin Sbarnea (ssbarnea) wrote :

Why don't we have a cron job, running every hour, that removes the oldest log folders by age until we have something like 20GB of free disk space?

This should prevent accidents: by always scrapping old folders we keep a safe margin of free disk space, and such an approach would not depend on manual intervention.
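
A minimal sketch of such a pruner (the paths and the 20GB threshold are assumptions, not the actual RDO pruner script):

    #!/usr/bin/env python3
    # Delete the oldest top-level log directories under LOG_ROOT until
    # at least MIN_FREE bytes are available on the volume.
    import shutil
    from pathlib import Path

    LOG_ROOT = Path("/var/www/logs")   # assumed log location
    MIN_FREE = 20 * 1024**3            # keep at least 20 GB free

    def free_bytes() -> int:
        return shutil.disk_usage(LOG_ROOT).free

    def prune() -> None:
        # Oldest directories first, by modification time.
        candidates = sorted(
            (p for p in LOG_ROOT.iterdir() if p.is_dir()),
            key=lambda p: p.stat().st_mtime,
        )
        for directory in candidates:
            if free_bytes() >= MIN_FREE:
                break
            shutil.rmtree(directory, ignore_errors=True)

    if __name__ == "__main__":
        prune()

Running it hourly would be a one-line crontab entry, e.g. 0 * * * * /usr/local/bin/prune_logs.py (the script path is hypothetical).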

tags: added: promotion-blocker
Revision history for this message
daniel.pawlik (daniel-pawlik) wrote :

I guess the bug disappeared after switching to the new log server.

Changed in tripleo:
assignee: nobody → wes hayutin (weshayutin)
status: Confirmed → In Progress
Revision history for this message
wes hayutin (weshayutin) wrote :
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-role-collect-logs (master)

Reviewed: https://review.opendev.org/705446
Committed: https://git.openstack.org/cgit/openstack/ansible-role-collect-logs/commit/?id=d784c1efa1eea3174ae721db965d6a4bd53616eb
Submitter: Zuul
Branch: master

commit d784c1efa1eea3174ae721db965d6a4bd53616eb
Author: Sagi Shnaidman <email address hidden>
Date: Mon Feb 3 16:04:40 2020 +0200

    Fix logs collection with new docker

    With the new version of docker it's stuck on "stats" collection, let's
    remove this task until it's fixed.

    Partial-Bug: #1861378
    Change-Id: I737d0d5ac7c7c178b304ce480b1d60e345ade120
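
For context, a minimal sketch of how a stats-collection step could be made hang-proof instead of removed outright (an illustration under assumptions, not the role's actual task):

    # Run `docker stats --no-stream` with a timeout so a wedged docker
    # daemon cannot block the rest of the log collection.
    import subprocess

    def collect_docker_stats(timeout_s: int = 30) -> str:
        try:
            result = subprocess.run(
                ["docker", "stats", "--no-stream", "--all"],
                capture_output=True, text=True, timeout=timeout_s,
            )
            return result.stdout
        except (subprocess.TimeoutExpired, FileNotFoundError):
            return ""  # skip stats rather than hang the job

    if __name__ == "__main__":
        print(collect_docker_stats())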

Revision history for this message
wes hayutin (weshayutin) wrote :

looks fixed to me :)

Thanks all

Changed in tripleo:
status: In Progress → Fix Released