[customer-bp] Shotgun should ensure enough disk space for diagnostic snapshot

Bug #1328879 reported by Tomasz Adam Jaroszewski on 2014-06-11
This bug affects 2 people
Affects: Fuel for OpenStack
Importance: High
Assigned to: Bogdan Dobrelya

Bug Description

Based on a customer's support ticket.

The customer noticed that the diagnostic snapshot generator does not check free disk space before or while generating snapshots. This can fill all available space and prevent snapshots from being generated.
When the space is exhausted the customer does not receive any notification, only a never-ending "Loading..." from Fuel (shotgun).

This should be fixed in two parts:
1 - If any part of the shotgun task fails, mcollective needs to be notified of the failure so that it can report it back to the user.
2 - Ensure a minimum of 500 MB of free disk space wherever the dump directory is located. Ideally 5 GB free, but this will require a little math to ensure we can accommodate large environments.

summary: - [Fuel] Diagnostic snapshot generator (shotgun) is not checking
- free/available space for snaps
+ [Fuel] Shotgun should ensure enough disk space for diagnostic snapshot
Changed in fuel:
milestone: none → 5.1
status: New → Confirmed
importance: Undecided → High
assignee: nobody → Fuel Python Team (fuel-python)
tags: added: customer-found
Tomasz Bartos (tbartos) wrote :

2. Since the diagnostic snapshot contains /var/log/remote, the check shouldn't use a fixed value.

It should be at least the size of /var/log/remote plus some space for configs.
So either the size of /var/log/remote + 500 MB, or something like
the size of /var/log/remote + 20-30 MB * the number of nodes in the cluster.
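
The sizing rule above could be sketched in Python roughly as follows. This is an illustration only, not shotgun code: the function names `dir_size_bytes` and `required_snapshot_bytes` are hypothetical, and the 30 MB default is simply the upper end of the 20-30 MB per-node estimate.

```python
import os

MB = 1024 * 1024


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path (symlinks are skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                total += os.path.getsize(full)
            except OSError:
                # The file vanished or is unreadable (logs rotate); skip it.
                pass
    return total


def required_snapshot_bytes(log_dir_bytes, node_count, per_node_bytes=30 * MB):
    """Estimate: size of the remote logs plus a per-node allowance for configs."""
    return log_dir_bytes + per_node_bytes * node_count
```

For a 10-node cluster the estimate would be `required_snapshot_bytes(dir_size_bytes("/var/log/remote"), node_count=10)`.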

Dima Shulyak (dshulyak) on 2014-06-19
tags: added: shotgun
Changed in fuel:
assignee: Fuel Python Team (fuel-python) → Vladimir Kozhukalov (kozhukalov)

Checking whether enough free space is available is not a shotgun issue. Shotgun is a tool that gathers files from a bunch of nodes. Requiring it to check free space is like requiring MySQL to check free space before writing data into a table. It is rather a monitoring issue. We really need master node monitoring that checks free space, among other things, and notifies the user.

About failing: lots of shotgun tasks can fail during a snapshot, and one of the most important requirements was that the snapshot as a whole must not fail if one of its tasks fails. That is why we have try-except wrappers wherever possible. In any case, the user can look in the shotgun log to figure out what is going on.
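
The try-except pattern described here can be combined with failure reporting. The sketch below is not shotgun's actual code; `run_tasks` and the `(name, callable)` task shape are assumptions made for illustration. Every task still runs, but failures are collected so that a caller could report them instead of leaving them only in the log.

```python
import logging

logger = logging.getLogger("snapshot_sketch")


def run_tasks(tasks):
    """Run every task; log and record failures instead of aborting.

    tasks is a list of (name, callable) pairs. Returns a list of
    (name, error message) pairs for the tasks that raised.
    """
    failures = []
    for name, task in tasks:
        try:
            task()
        except Exception as exc:  # deliberately broad: one bad task must not stop the rest
            logger.exception("task %s failed", name)
            failures.append((name, str(exc)))
    return failures
```

An empty return value means every task succeeded; a non-empty one could be passed back to the user alongside the partial snapshot.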

As a quick workaround we can:
1) expose the shotgun log in the web UI
2) add an additional parameter to the shotgun config, something like minspace=1G, and run the df command before making a snapshot.

Another major point here is to configure logrotate on the master node to make sure old log directories are removed.
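
A minimal sketch of the second workaround, assuming a size string such as `1G` taken from a `minspace` setting. The helper names `parse_size` and `has_min_space` are hypothetical, and `shutil.disk_usage` from the Python 3 standard library is used here in place of shelling out to df.

```python
import shutil

_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text):
    """Convert a string such as '1G' or '500M' to bytes; bare numbers are bytes."""
    text = text.strip().upper()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)


def has_min_space(path, minspace):
    """Return True if the filesystem holding path has at least minspace free."""
    return shutil.disk_usage(path).free >= parse_size(minspace)
```

The snapshot would then be started only if `has_min_space(dump_dir, "1G")` is true, where `dump_dir` stands for wherever the dump directory is located.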

Changed in fuel:
status: Confirmed → Invalid
Dmitry Pyzhov (dpyzhov) on 2014-08-14
summary: - [Fuel] Shotgun should ensure enough disk space for diagnostic snapshot
+ [customer-bp] Shotgun should ensure enough disk space for diagnostic
+ snapshot
Changed in fuel:
milestone: 5.1 → 6.0
assignee: Vladimir Kozhukalov (kozhukalov) → Bogdan Dobrelya (bogdando)
status: Invalid → Confirmed
Bogdan Dobrelya (bogdando) wrote :

That is actually a feature request, and it is superseded by https://blueprints.launchpad.net/fuel/+spec/manage-logs-with-free-space-consideration

Changed in fuel:
status: Confirmed → Won't Fix
tags: added: release-notes
tags: added: module-shotgun