CI: promoter server freeze

Bug #1725230 reported by Gabriele Cerami
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | Critical | Gabriele Cerami |

Bug Description

The promoter server stopped working because there is no space left on /.
The container cleanup tasks did not work properly in the latest runs, and containers have filled the entire root partition.

Revision history for this message
Gabriele Cerami (gcerami) wrote :

Deleting images with the docker command did not work.
Stopping the daemon did not work.
Manual deletion did not work either, because the daemon was still holding the files open.
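
For anyone wondering why manual deletion frees nothing while a process still has the files open, here is a minimal Python sketch of the effect. It assumes a POSIX system; the function name `deleted_file_still_held` is made up for illustration and has nothing to do with the promoter code.

```python
import os
import tempfile


def deleted_file_still_held(size: int = 1024) -> dict:
    """Write `size` bytes to a temp file, unlink it while the descriptor
    is still open, and report what the kernel still knows about it."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * size)
        # Remove the directory entry; the open descriptor keeps the data alive.
        os.unlink(path)
        info = os.fstat(fd)
        return {
            "path_exists": os.path.exists(path),  # name is gone
            "bytes_still_held": info.st_size,     # data is still allocated
        }
    finally:
        # Only when the last descriptor is closed is the space released.
        os.close(fd)


if __name__ == "__main__":
    print(deleted_file_still_held(4096))
```

The same applies to a daemon: until it closes the files or exits, removing them from the directory tree does not give the space back.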

Shutting down the server, attaching an additional volume, and mounting it at /var/lib/docker.
At least Docker containers will be confined to their own volume and the system will be able to work properly.
Then I'll start investigating the cleanup issue.
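
Once a dedicated volume is mounted, a quick sanity check could look like the sketch below. It only compares device IDs, and the helper name `on_separate_filesystem` is hypothetical, not something from the promoter scripts.

```python
import os


def on_separate_filesystem(path: str, parent: str = "/") -> bool:
    """Return True if `path` lives on a different device than `parent`.

    Compares st_dev of the two paths, so it detects a separate mounted
    volume but not a bind mount from the same filesystem.
    """
    return os.stat(path).st_dev != os.stat(parent).st_dev


if __name__ == "__main__":
    # The path is taken from the comment above; adjust as needed.
    target = "/var/lib/docker"
    if os.path.exists(target):
        print(target, "on its own filesystem:", on_separate_filesystem(target))
    else:
        print(target, "does not exist on this machine")
```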

Revision history for this message
Gabriele Cerami (gcerami) wrote :

Rebooted the server, but space was still low.
Rebooted with / read-only, ran xfs_repair, and removed everything in /lost+found (it contained all the files from /var/lib/docker).
Reactivated the crontab.
Status should be green again.

Investigated the issue: as usual, cleanup is part of the main list of tasks, so if something goes wrong at some point in the main playbook, cleanup is not run.
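
The general shape of a fix is to run cleanup unconditionally, whether or not the main tasks succeed. A minimal Python sketch of that pattern follows; `run_with_cleanup` and its arguments are hypothetical names for illustration, not the actual promoter code or the committed change.

```python
from typing import Callable, Iterable, List


def run_with_cleanup(tasks: Iterable[Callable[[], None]],
                     cleanup: Callable[[], None]) -> List[str]:
    """Run tasks in order and stop at the first failure,
    but always run cleanup afterwards."""
    errors: List[str] = []
    try:
        for task in tasks:
            task()
    except Exception as exc:
        # Record the failure instead of letting it skip the cleanup step.
        errors.append(f"{type(exc).__name__}: {exc}")
    finally:
        # Runs on success and on failure alike.
        cleanup()
    return errors


if __name__ == "__main__":
    log: List[str] = []
    failures = run_with_cleanup(
        [lambda: log.append("pull"), lambda: 1 / 0, lambda: log.append("push")],
        cleanup=lambda: log.append("cleanup"),
    )
    print(log, failures)
```

In an Ansible playbook the analogous construct would be a `block` with an `always` section, so the cleanup tasks run even when an earlier task in the block fails.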

Creating a short-term fix, then going for the separate volume solution, which will also ease any rebuild of the server we may need to do.

Revision history for this message
Gabriele Cerami (gcerami) wrote :
Changed in tripleo:
status: Triaged → In Progress
Changed in tripleo:
milestone: queens-1 → queens-2
Revision history for this message
Attila Darazs (adarazs) wrote :

Change committed. This should be fixed, closing.

Changed in tripleo:
status: In Progress → Fix Released