Deployment Fails when /var is Full.

Bug #1371757 reported by Michael Petersen on 2014-09-19
This bug affects 4 people
Affects: Fuel for OpenStack | Importance: High | Assigned to: Przemyslaw Kaminski
Affects: 6.0.x | Importance: High | Assigned to: Przemyslaw Kaminski

Bug Description

When additional nodes are deployed while /var is full, Fuel goes into a broken state. Disk space should be verified before deployments are allowed. Manual intervention was the only way to restore Fuel to a working state.

Changed in fuel:
milestone: none → 5.1
milestone: 5.1 → 6.0
assignee: nobody → Fuel Python Team (fuel-python)
Changed in fuel:
status: New → Triaged
importance: Undecided → High
tags: added: customer-found
Dima Shulyak (dshulyak) wrote :

A diagnostic snapshot would actually be helpful; I need to see exactly what failed.
It won't be hard to reproduce, but if you have the opportunity, please attach one.

Przemyslaw Kaminski (pkaminski) wrote :

Because Docker storage is also in /var, a full /var means that various Fuel components can malfunction, not only Nailgun (for example, PostgreSQL: http://www.postgresql.org/docs/9.1/static/disk-full.html). When /var is already full, it may be too late to react automatically; I got HTTP 502 errors from Nailgun while testing this bug.

Thus I'd rather suggest adding monitoring to the master node and alerting the user via the UI when disk usage is high (or when any other problem occurs).

Przemyslaw Kaminski (pkaminski) wrote :

It seems that the developers have reached a consensus. I suggest adding a 'warning' notification when disk space is low. If the user ignores it and /var becomes full, any resulting failures are the user's responsibility. Additionally, a warning alert box could be shown when deploying nodes (as stated in the bug description), although I think the notification is enough.
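A minimal sketch of such a low-disk check, in Python (the language of Nailgun and the Fuel client). The 90% threshold is an assumption taken from the verification steps at the end of this report, and disk_usage_percent and is_disk_low are hypothetical helper names, not Fuel code.

```python
import shutil

# Assumed warning threshold: percent of total capacity in use.
DEFAULT_THRESHOLD_PERCENT = 90.0


def disk_usage_percent(total_bytes: int, used_bytes: int) -> float:
    """Return used space as a percentage of total capacity."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def is_disk_low(path: str = "/var",
                threshold_percent: float = DEFAULT_THRESHOLD_PERCENT) -> bool:
    """Return True if the filesystem holding `path` is at or above the threshold."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return disk_usage_percent(usage.total, usage.used) >= threshold_percent
```

shutil.disk_usage reports "used" relative to total capacity, so the result can differ slightly from the Use% column of df, which divides by used plus available space (excluding blocks reserved for root).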

Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Przemyslaw Kaminski (pkaminski)
status: Triaged → In Progress
Baboune (seyvet) wrote :

It is not just during deployment that the components malfunction; it happens in any situation where /var becomes full. This can happen because log rotation is not working (as in https://bugs.launchpad.net/fuel/+bug/1378327), because a patch is being deployed in /var, or because a "backup" is triggered via dockerctl backup.

Baboune (seyvet) wrote :

IMO log rotation should be based on the available disk space, and it should ensure that enough space remains free so as not to corrupt the Docker containers. I don't know if it is possible, but maybe rsyslog should have lower priority than the containers, or the containers should reserve sufficient disk space in advance.
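One way to read "log rotation based on the available disk space" is to delete the oldest log files until a free-space reserve is restored. The sketch below only selects files; LogFile, select_logs_to_delete and the reserve size are illustrative assumptions, not how Fuel configures logrotate or rsyslog.

```python
from typing import List, NamedTuple


class LogFile(NamedTuple):
    path: str
    size_bytes: int
    mtime: float  # modification time, seconds since the epoch


def select_logs_to_delete(logs: List[LogFile], free_bytes: int,
                          reserve_bytes: int) -> List[str]:
    """Pick the oldest logs whose removal brings free space up to the reserve.

    Returns an empty list when enough space is already free. If deleting
    every log is still not enough, all of them are returned.
    """
    selected: List[str] = []
    for log in sorted(logs, key=lambda f: f.mtime):  # oldest first
        if free_bytes >= reserve_bytes:
            break
        selected.append(log.path)
        free_bytes += log.size_bytes
    return selected
```

In practice free_bytes could come from shutil.disk_usage and the file list from os.scandir; leaving the actual deletion to the caller keeps the policy testable in isolation.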

Mike Scherbakov (mihgen) wrote :

> IMO log rotation should be based on the size of available disk.
+1 on this.

Folks, considering that we are two days before a release, I suggest the simplest possible notification mechanism for 6.0, and addressing this issue properly in 6.0.1.
Why can't we just run a cron job every X hours that checks disk space and makes a REST API call to Nailgun's /notifications to create a notification? Then we won't need to bring in monit or other complications for now. If the REST API call is complicated for any reason, why not simply run a Python app that imports the required Nailgun module and calls notification.create, which would create a record directly in the DB?
These would be hacks, but in general I do not think that core Nailgun should also be the monitoring service for the master node. Let's use other services for that; we might want to use Nailgun only as the UI notification service. Please note that this doesn't fully solve the issue, as I probably don't visit the UI very often, but at least it is something for now...
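A rough sketch of the REST variant: a script run from cron every few hours that POSTs a notification to Nailgun. The URL (host, port 8000 and the /api/notifications path), the payload fields topic and message, and the X-Auth-Token header are assumptions, not a confirmed description of the Nailgun API.

```python
import json
import urllib.request

# Assumption: Nailgun is reachable here and serves notifications
# under /api/notifications.
NAILGUN_NOTIFICATIONS_URL = "http://127.0.0.1:8000/api/notifications"


def build_notification_request(message: str, topic: str = "warning",
                               url: str = NAILGUN_NOTIFICATIONS_URL,
                               token: str = "") -> urllib.request.Request:
    """Prepare (but do not send) a POST that creates a UI notification."""
    payload = {"topic": topic, "message": message}
    headers = {"Content-Type": "application/json"}
    if token:
        # Keystone token; X-Auth-Token is the usual OpenStack header (assumed).
        headers["X-Auth-Token"] = token
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending would then be urllib.request.urlopen(req, timeout=10); building the request separately keeps the payload easy to inspect before anything touches the network.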

Przemyslaw Kaminski (pkaminski) wrote :

Mike, do we want to create a notification every hour? Or just once, when free disk space drops below some threshold? I don't want to spam the user with alerts. This requires keeping state somewhere, though. Additionally, with state we could run the job more often without worrying about sending many alerts.
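The state mentioned above can be as small as one boolean persisted between runs. This sketch, using a hypothetical JSON state file, notifies only on the transition from "ok" to "low", so the job could run frequently without repeating alerts.

```python
import json
import os


def should_notify(previously_low: bool, currently_low: bool) -> bool:
    """Notify only when the disk goes from 'ok' to 'low'."""
    return currently_low and not previously_low


def check_and_update_state(state_path: str, currently_low: bool) -> bool:
    """Load the last state, decide whether to notify, then persist the new state."""
    previously_low = False
    if os.path.exists(state_path):
        with open(state_path) as f:
            previously_low = json.load(f).get("low", False)
    notify = should_notify(previously_low, currently_low)
    with open(state_path, "w") as f:
        json.dump({"low": currently_low}, f)
    return notify
```

The state file location is an assumption; if it lives on the monitored filesystem, writing it may fail once /var is completely full.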
I'd opt for calling the Nailgun API, though, since we want to do this monitoring on the host system, not inside a Docker container (which could report unreliable information about free disk space). We could probably reuse the Fuel CLI config file (/etc/fuel/client/config.yaml) to connect to the API. AFAIK, the Nailgun code is reliably available only from inside a Docker container.
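If the monitoring script reuses /etc/fuel/client/config.yaml, it mainly needs the server address and port. To avoid a PyYAML dependency, this sketch parses flat "KEY: value" lines only; the key names SERVER_ADDRESS and LISTEN_PORT are assumptions about that file and should be checked against the installed client.

```python
from typing import Dict


def parse_flat_yaml(text: str) -> Dict[str, str]:
    """Parse simple 'KEY: value' lines; not a general YAML parser.

    Limitation: anything after '#' is treated as a comment, so values
    containing '#' would be truncated.
    """
    result: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        result[key.strip()] = value.strip().strip('"').strip("'")
    return result


def nailgun_base_url(config: Dict[str, str]) -> str:
    # Key names are assumptions; verify them in the real config file.
    return "http://{}:{}".format(config["SERVER_ADDRESS"], config["LISTEN_PORT"])
```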

I'll implement a POC for cron runner.

Fix proposed to branch: master
Review: https://review.openstack.org/137785

Change abandoned by Przemyslaw Kaminski (<email address hidden>) on branch: master
Review: https://review.openstack.org/135314
Reason: As suggested in https://bugs.launchpad.net/fuel/+bug/1371757, I added a simple cron job to monitor disk space on the host system. The new review is here: https://review.openstack.org/#/c/137785/. This one is abandoned, possibly pending a monit implementation.

Change abandoned by Przemyslaw Kaminski (<email address hidden>) on branch: master
Review: https://review.openstack.org/135893
Reason: As suggested in https://bugs.launchpad.net/fuel/+bug/1371757, I added a simple cron job to monitor disk space on the host system. The new review is here: https://review.openstack.org/#/c/137785/. This one is abandoned, possibly pending a monit implementation.

Mike Scherbakov (mihgen) wrote :

This looks like a whole feature, which is neither trivial nor small. I'm with Dmitry P. here and do not think we can allow ourselves to merge it in; it seems too risky to me.

If you think it is Critical rather than High, or that the patch is not that risky, let's discuss.

Let's move it to 6.1 then. With some work we can reactivate monit and get rid of some of the Python monitoring code. Still, the work is not lost: more things got clarified (like adding the new 'monitord' user to the 'services' tenant).

Changed in fuel:
milestone: 6.0 → 6.1
tags: added: release-notes
Changed in fuel:
milestone: 6.0 → 6.1
no longer affects: fuel/6.1.x

Fix proposed to branch: master
Review: https://review.openstack.org/150425

Reviewed: https://review.openstack.org/150425
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=52d78a3c4d911e0f6f7809ca5b2f7f183328563b
Submitter: Jenkins
Branch: master

commit 52d78a3c4d911e0f6f7809ca5b2f7f183328563b
Author: Przemyslaw Kaminski <email address hidden>
Date: Tue Jan 27 14:40:43 2015 +0100

    Add update_astute_config function to fuel_upgrade

    This factors out common astute file update functionality

    Change-Id: If8ca93e7030bb9d95f31fb240c55b9b840c3bd3c
    Partial-Bug: #1371757
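The commit does not show the function body. As a rough illustration of what "common astute file update functionality" might involve during an upgrade, here is a hypothetical recursive merge that adds missing keys (for example, settings for a newly introduced user) while preserving values already present. The name merge_missing and the merge policy are assumptions, not the fuel_upgrade implementation.

```python
import copy
from typing import Any, Dict


def merge_missing(current: Dict[str, Any], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `current` with keys from `defaults` added where absent.

    Existing values always win; nested dicts are merged recursively,
    and the input dicts are never modified.
    """
    merged = copy.deepcopy(current)
    for key, value in defaults.items():
        if key not in merged:
            merged[key] = copy.deepcopy(value)
        elif isinstance(merged[key], dict) and isinstance(value, dict):
            merged[key] = merge_missing(merged[key], value)
    return merged
```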

Reviewed: https://review.openstack.org/150744
Committed: https://git.openstack.org/cgit/stackforge/python-fuelclient/commit/?id=3355e188eec37b9736282683538b4aa51edc6897
Submitter: Jenkins
Branch: master

commit 3355e188eec37b9736282683538b4aa51edc6897
Author: Przemyslaw Kaminski <email address hidden>
Date: Wed Jan 28 11:21:49 2015 +0100

    Add 'notifications' argument

    This can be used for displaying and sending notifications to Fuel.

    Send an error message:
    fuel notify -m This is wrong --topic error
    fuel notifications --send Hello world

    List all unread notifications:
    fuel notifications

    List all notifications:
    fuel notifications -a

    Mark messages as read:
    fuel notifications -r 1 2

    Mark all messages as read:
    fuel notifications -r '*'

    DocImpact
    Related-Bug: #1371757
    Change-Id: I6a5f05febf8f5a01a7b9415546ef56c11aedefce
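A monitoring script could simply shell out to the CLI added by this commit. This sketch mirrors the commit's own "fuel notify -m ... --topic ..." example; the function names and the default "warning" topic are assumptions.

```python
import subprocess
from typing import List


def build_notify_command(message: str, topic: str = "warning") -> List[str]:
    """Build argv for 'fuel notify', following the commit message example."""
    return ["fuel", "notify", "-m", message, "--topic", topic]


def send_notification(message: str, topic: str = "warning") -> None:
    # A list argv (no shell) keeps a message with spaces as one -m argument.
    subprocess.run(build_notify_command(message, topic), check=True)
```

check=True makes a failed CLI call raise CalledProcessError, so a cron wrapper would surface the error instead of silently dropping the alert.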

Reviewed: https://review.openstack.org/156561
Committed: https://git.openstack.org/cgit/stackforge/python-fuelclient/commit/?id=5657dbf06fddb74adb61e9668eb579a1c57d8af8
Submitter: Jenkins
Branch: master

commit 5657dbf06fddb74adb61e9668eb579a1c57d8af8
Author: Przemyslaw Kaminski <email address hidden>
Date: Tue Feb 17 13:01:24 2015 +0100

    Added --tenant option for keystone authentication

    We had 'admin' tenant hardcoded, while for notifications we want other
    tenants to be allowed to do notifications via the CLI.

    DocImpact
    Related-Bug: #1371757

    Change-Id: I26399a62440710b63d8fec94213700fb03ab66e8
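Assuming the client authenticates against the Keystone v2.0 API, the tenant chosen with --tenant would end up as tenantName in the token request. The helper name below is hypothetical; the body follows the shape of a Keystone v2.0 POST /v2.0/tokens request, and the example values mirror the monitord user and services tenant discussed in this report.

```python
from typing import Any, Dict


def keystone_v2_auth_body(username: str, password: str, tenant: str) -> Dict[str, Any]:
    """Build a tenant-scoped password authentication body for Keystone v2.0."""
    return {
        "auth": {
            "tenantName": tenant,
            "passwordCredentials": {"username": username, "password": password},
        }
    }
```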

Reviewed: https://review.openstack.org/138718
Committed: https://git.openstack.org/cgit/stackforge/fuel-web/commit/?id=d46db29c63ac6d4295d9c43c7246523ed4d5bbed
Submitter: Jenkins
Branch: master

commit d46db29c63ac6d4295d9c43c7246523ed4d5bbed
Author: Przemyslaw Kaminski <email address hidden>
Date: Tue Jan 27 14:56:37 2015 +0100

    monitord user for safe API notification creation

    Added monitord user with password to astute.yaml file. This is used
    by Puppet manifests to create the 'monitord' user with 'monitord' role.
    This allows to safely access API's /notification resource.
    Usage of Keystone admin_token is not recommended nor even possible
    (Keystone doesn't treat admin_token as representing any user and so
    /notifications returns 401).

    Change-Id: I5b4fea9e6811c2d995f058b4a0a11025e04f33fb
    Partial-Bug: #1371757
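The commit stores a password for the monitord user in astute.yaml. As an illustration only, a random password for such an entry could be generated with the standard secrets module; the dictionary layout and the 24-character length are hypothetical, not the real astute.yaml schema.

```python
import secrets
import string
from typing import Dict


def monitord_credentials(length: int = 24) -> Dict[str, Dict[str, str]]:
    """Return a hypothetical astute.yaml-style entry with a random password."""
    alphabet = string.ascii_letters + string.digits
    # secrets.choice uses a cryptographically secure random source.
    password = "".join(secrets.choice(alphabet) for _ in range(length))
    return {"monitord": {"user": "monitord", "password": password}}
```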

Reviewed: https://review.openstack.org/137785
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=549b9cbe72f45c4148bf8c4be7311047bc45b1c7
Submitter: Jenkins
Branch: master

commit 549b9cbe72f45c4148bf8c4be7311047bc45b1c7
Author: Przemyslaw Kaminski <email address hidden>
Date: Fri Nov 28 14:21:07 2014 +0100

    Free disk check cron job for host monitoring

    * Added script for creating a notification for the UI about disk space
      running low
    * Added hourly cron job to run this script
    * Added user 'monitord' being a tenant of 'services' with role 'monitoring'.
      This user is necessary to create notifications via Nailgun API.

    This commit depends on python-fuelclient change:
    https://review.openstack.org/#/c/150744/

    Change-Id: I449e2d536330b8ad81bea508bb2f88907b7bf15d
    Closes-Bug: #1371757

Changed in fuel:
status: In Progress → Fix Committed
Changed in fuel:
status: Fix Committed → In Progress
Changed in fuel:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/157692
Committed: https://git.openstack.org/cgit/stackforge/fuel-library/commit/?id=b1eed9b3053653088c0c86bf082e1db132e31397
Submitter: Jenkins
Branch: master

commit b1eed9b3053653088c0c86bf082e1db132e31397
Author: Vladimir Kuklin <email address hidden>
Date: Sat May 16 13:53:54 2015 +0300

    Move free disk space checker to package

    Adds fuel upgrade config for monitd by putting
    Puppet resources into nailgun::host.

    New package fuel-notify adds configuration for
    monitd and a default config for the application.

    Change-Id: I4a5e63b64e7a6d662ebd1e0387c6a84d601b0ca5
    Related-Bug: #1371757

tags: added: release-notes-done
tags: added: on-verification
Sergey Novikov (snovikov) wrote :

Verified on fuel-6.1-478-2015-05-28_20-55-26.iso.

Steps to verify:
    1. Deploy Fuel Master node
    2. SSH to Fuel master node
    3. Check that the user "monitord" exists as a member of the 'services' tenant with the 'monitoring' role
    4. Fill /var until used space exceeds 90% of the total disk size
    5. Check for the notification message with "fuel notifications -a"
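For step 4, the amount of data needed to push /var past the threshold can be computed from the filesystem totals. This helper is a hypothetical convenience, not part of the verification procedure.

```python
import math


def bytes_to_fill(total_bytes: int, used_bytes: int, target_percent: float = 90.0) -> int:
    """Bytes to write so that used/total strictly exceeds target_percent."""
    target_used = math.floor(total_bytes * target_percent / 100.0) + 1
    return max(0, target_used - used_bytes)
```

The resulting size could be allocated with, for example, fallocate -l <bytes> /var/fillfile, and the file removed after the notification has been checked.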

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification

Reviewed: https://review.openstack.org/194961
Committed: https://git.openstack.org/cgit/stackforge/fuel-docs/commit/?id=0e26e7d7cc153d179ec34985645dd23cdd239ddb
Submitter: Jenkins
Branch: stable/6.1

commit 5cc5f0c643aebecaf3bf4580535a3ea7c3334a6c
Author: Mike Scherbakov <email address hidden>
Date: Tue Jun 23 13:43:35 2015 -0700

    Removed streamlined patching backend pieces

    Change-Id: I955e76ccdbd12a9145f4e9b689f80bdf9fcaf929

commit 563c4b5c78ebfcb1f4f91047c2919f6270f9a1d4
Author: Mike Scherbakov <email address hidden>
Date: Tue Jun 23 13:30:30 2015 -0700

    Removed outdated patching guide

    Change-Id: I76180c277789ade9c5ebedd19fe2092847c0b7d9

commit 8d120c14bec1ab41d448683ad146a3053a57c4ee
Author: Irina Povolotskaya <email address hidden>
Date: Tue Jun 23 19:59:11 2015 +0300

    Add dual hypervisor ref arch into 6.1 docs

    Change-Id: I900c24c9de878eafadbfc995aa879b7f55737fac

commit feebd1592d3305b64bbdfd0bc5fe108190aef120
Author: OlgaGusarenko <email address hidden>
Date: Tue Jun 23 18:38:17 2015 +0300

    [OPs guide] Running Ceilometer section edits

    1. conf file extract is updated
    2. note is updated

    Closes-bug: 1467817
    Change-Id: I0217e164108e0ba6c1397045a5e57d13ff429223

commit 44a93f9dead7511a3461ec35248dbb689c81eafd
Author: OlgaGusarenko <email address hidden>
Date: Tue Jun 23 18:04:40 2015 +0300

    [RN6_1] Final changes

    1. capitalization
    2. 2014.2 to 2014.2.2
    3. general improvements

    Change-Id: I45057e90c90550559f66bc67ccdf97a559fd9000

commit bb41389cae58084285688853281516b659686422
Author: evkonstantinov <email address hidden>
Date: Tue Jun 23 16:45:35 2015 +0300

    Update patching description

    Update patching description with
    the standard Linux commands.

    Change-Id: Ia1a8346639c468fdfce15a11d2430bf3a4731244

commit bf3018fae3f2e564413d33aba6cdebf8868f0b4e
Author: OlgaGusarenko <email address hidden>
Date: Tue Jun 23 15:55:49 2015 +0300

    [RN6_1] Clean up

    1. Rearranges sections
    2. Improves RST
    3. Changes titles order

    Change-Id: I6110bf515667d3d6ba08ad35ff5d593dbc96641e

commit 1c7e4457808e8f2d6c56fdf31252170972e444b9
Author: Maria Zlatkova <email address hidden>
Date: Tue Jun 23 15:26:28 2015 +0300

    Replaces VBOX screenshots

    This patch:
    - replaces VBOX screenshots
    - changes the link for Download Mirantis VirtualBox scripts
     to https://docs.mirantis.com/openstack/fuel/fuel-master/#downloads

    Change-Id: I58dede960c5c3355d39b07ff44b757403f6af02c
    Closes-Bug: #1467872

commit 0a568bf53fc0e25d1d692d5d74b4a7b4d983bbcc
Author: evkonstantinov <email address hidden>
Date: Tue Jun 23 14:01:55 2015 +0300

    6.1 --separate repos

    change wording and add links to the
    separate repos feature.

    Change-Id: Ib5d0778a0d8f1534f79ed2f553574cb69a3150b0

commit 95a188b21cbdd064d92696b7920e6a0105fe0c56
Author: Maria Zlatkova <email address hidden>
Date: Tue Jun 23 12:07:28 2015 +0300

    Corrects the output 'pcs status'

    Changes the example outputs to appropriate ones.

    Change-Id: Ib6d83...
