atop floods /run partition

Bug #1530167 reported by Alexey Lebedeff
This bug affects 2 people
Affects             Status        Importance  Assigned to        Milestone
Fuel for OpenStack  Fix Released  High        Michael Polenchuk
7.0.x               Won't Fix     High        Michael Polenchuk
8.0.x               Fix Released  High        Michael Polenchuk

Bug Description

I'm running Mirantis OpenStack 7.0 in VirtualBox using the launch_8gb.sh script.
After a few days the /run partition became 100% full, and the controller node became non-functional as a result.
The root cause is atop, which actively writes data to /run/atop/atop.acct. That is a bad idea, given that /run is on tmpfs.
There is an unfixed upstream bug about this issue: https://bugs.launchpad.net/ubuntu/+source/atop/+bug/1393175

Changed in fuel:
milestone: none → 7.0-updates
importance: Undecided → High
status: New → Confirmed
tags: added: area-linux
Dmitry Teselkin (teselkin-d) wrote :

This looks like a common problem.

atop uses /var/run/atop.acct (by default) to record accounting data about processes. Depending on system behaviour, the file may grow relatively slowly or quite fast [1] (see screenshot [2]).

Although some argue that the accounting file should be rewritten daily, I don't think that's a good idea, since you would then lose data that might be needed for debugging.

Anyway, the problem doesn't seem to be related to atop itself, but to the way it is started.

Reassigning to the fuel-library team so they can fix the Puppet manifests.

[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=650222
[2] https://bugs.debian.org/cgi-bin/bugreport.cgi?att=1;bug=650222;msg=26;filename=650222-screenshot.png

Dmitry Teselkin (teselkin-d) wrote :

It is possible to use a custom accounting file by passing it to atop via the ATOPACCT environment variable (from the man page [1]):

---
With the environment variable ATOPACCT the name of a specific process accounting file can be specified (accounting should have been activated on beforehand). When this environment variable is present but its contents is empty, process accounting will not be used at all.
---

However, it's not that simple. It works as expected only when ATOPACCT is set to an empty value, in which case accounting is disabled. To collect accounting data in a custom file, additional actions are required:
* the file must be created first
* accounting must be turned on manually
* the path to the custom accounting file must be passed to atop via the ATOPACCT environment variable
* when atop is stopped, accounting must be turned off manually

On Ubuntu, the 'accton' command (from the 'acct' package) is used to enable/disable accounting.

I'm attaching a simple wrapper that automates all the steps above.
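
For illustration only (this is not the attached wrapper), the four steps above could be sketched in Python roughly as follows. The function names, the injectable `run` parameter, and the use of `accton off` to stop accounting are assumptions of this sketch; check how the installed 'acct' package expects accounting to be switched off. Running it for real requires root.

```python
import os
import subprocess
from pathlib import Path


def atop_environment(acct_file, base_env=None):
    """Copy the environment and point ATOPACCT at the custom file."""
    env = dict(os.environ if base_env is None else base_env)
    env["ATOPACCT"] = str(acct_file)
    return env


def run_atop_with_accounting(acct_file, atop_args=(), run=subprocess.run):
    """Hypothetical wrapper: create file, enable acct, run atop, disable acct."""
    acct_file = Path(acct_file)
    acct_file.parent.mkdir(parents=True, exist_ok=True)
    acct_file.touch()                              # 1. file must exist first
    run(["accton", str(acct_file)], check=True)    # 2. turn accounting on
    try:
        # 3. atop learns the file location from ATOPACCT
        run(["atop", *atop_args],
            env=atop_environment(acct_file), check=True)
    finally:
        # 4. turn accounting off again, even if atop failed
        #    (assumption: this accton version accepts "off")
        run(["accton", "off"], check=True)
```

The `run` parameter exists only so the sequence can be exercised without root by passing a stand-in that records the commands instead of executing them.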

[1] http://linux.die.net/man/1/atop

tags: added: area-library
removed: area-linux
tags: added: team-bugfix
Michael Polenchuk (mpolenchuk) wrote :

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=686329
<quote>
note that with 1.27, upstream changed the default location of the accounting file to /tmp/atop.d/atop.acct
</quote>

Dmitry Bilunov (dbilunov) wrote :

Resolving this issue in the straightforward way, by simply moving the acct file from tmpfs to an HDD, may cause other problems: on systems with a high process churn rate, writing the acct file causes high I/O usage. I have observed systems on which this degraded production, although in those cases thousands of processes were being spawned each minute.

Another option is to disable the acct mechanism. If you set the ATOPACCT environment variable to something like /dev/null, atop will run with accounting disabled ("no procacct" mode). It will still be able to collect valuable statistics; only events that happen too fast, or entirely within one sampling interval, could possibly be dropped.

We should consider how much data would flow through the acct interface (and be written to the acct file, if enabled).
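
To put rough numbers on that question, here is a back-of-the-envelope sketch. It assumes one fixed-size record is written per exited process and takes 64 bytes as the record size (my understanding of the Linux v3 acct record; treat it as a parameter, not a measured value). The function names are made up for this example.

```python
ACCT_RECORD_BYTES = 64  # assumption: one fixed-size record per exited process


def acct_bytes_per_day(processes_per_minute, record_bytes=ACCT_RECORD_BYTES):
    """Bytes appended to the accounting file in 24 hours at a steady rate."""
    return processes_per_minute * record_bytes * 60 * 24


def days_until_full(free_bytes, processes_per_minute,
                    record_bytes=ACCT_RECORD_BYTES):
    """Days until `free_bytes` is consumed, if nothing else writes there."""
    per_day = acct_bytes_per_day(processes_per_minute, record_bytes)
    return float("inf") if per_day == 0 else free_bytes / per_day
```

With a hypothetical rate of 1000 processes per minute, acct_bytes_per_day(1000) is 92,160,000 bytes (about 88 MiB) per day, so a tmpfs with 200 MiB free would fill in a little over two days. These rates are illustrative, not measurements from this bug.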

Dmitry Pyzhov (dpyzhov)
no longer affects: fuel/mitaka
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (master)

Fix proposed to branch: master
Review: https://review.openstack.org/267643

Changed in fuel:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (master)

Reviewed: https://review.openstack.org/267643
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=57911011587fbf24b9e68b2a67fc3f6d1bef967e
Submitter: Jenkins
Branch: master

commit 57911011587fbf24b9e68b2a67fc3f6d1bef967e
Author: Michael Polenchuk <email address hidden>
Date: Thu Jan 14 17:57:57 2016 +0300

    Disable accounting procacct mode

    Turn off process accounting ('no procacct' mode).
    But atop still has an ability to collect valuable stats.

    Bring in internal "custom_acct_file" option:
    * false - use atop default accounting file
    * /path_to/atop.acct - custom one
    * undef - disable accounting procacct mode

    DocImpact: 'custom_accounting_file' is system wide process
    accounting file (valid values is above).

    Change-Id: Ida00dc663dd8c6494c479de2ae2f0f7ab6014a84
    Closes-Bug: #1530167
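
As a reading aid, the three values listed in the commit message can be thought of as a mapping onto the ATOPACCT environment variable. The sketch below is Python, not the Puppet code from the change; the function name is hypothetical, and Puppet's false/undef are represented here by Python's False/None. The empty-value behaviour follows the atop man page quoted earlier in this report.

```python
def atopacct_env(custom_acct_file):
    """Map a custom_acct_file-style value to environment entries for atop."""
    if custom_acct_file is False:
        # false: leave ATOPACCT unset so atop uses its default accounting file
        return {}
    if custom_acct_file is None:
        # undef: ATOPACCT present but empty, so process accounting is not used
        return {"ATOPACCT": ""}
    # any other value is treated as the path of a custom accounting file
    return {"ATOPACCT": str(custom_acct_file)}
```

Note that a custom path only helps if the file exists and accounting has been activated beforehand, as described in the earlier comment.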

Changed in fuel:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/8.0)

Fix proposed to branch: stable/8.0
Review: https://review.openstack.org/270707

OpenStack Infra (hudson-openstack) wrote : Fix merged to fuel-library (stable/8.0)

Reviewed: https://review.openstack.org/270707
Committed: https://git.openstack.org/cgit/openstack/fuel-library/commit/?id=45c261069386de07160d7416cdeceac09dc2f1c1
Submitter: Jenkins
Branch: stable/8.0

commit 45c261069386de07160d7416cdeceac09dc2f1c1
Author: Michael Polenchuk <email address hidden>
Date: Thu Jan 14 17:57:57 2016 +0300

    Disable accounting procacct mode

    Turn off process accounting ('no procacct' mode).
    But atop still has an ability to collect valuable stats.

    Bring in internal "custom_acct_file" option:
    * false - use atop default accounting file
    * /path_to/atop.acct - custom one
    * undef - disable accounting procacct mode

    DocImpact: 'custom_accounting_file' is system wide process
    accounting file (valid values is above).

    Change-Id: Ida00dc663dd8c6494c479de2ae2f0f7ab6014a84
    Closes-Bug: #1530167
    (cherry picked from commit 57911011587fbf24b9e68b2a67fc3f6d1bef967e)

OpenStack Infra (hudson-openstack) wrote : Fix proposed to fuel-library (stable/7.0)

Fix proposed to branch: stable/7.0
Review: https://review.openstack.org/273025

tags: added: on-verification
Alexander Kurenyshev (akurenyshev) wrote :

Verified on iso 478.
atop writes its logs to the /var/log/atop directory. The directory size increased only insignificantly over 4 days.

tags: removed: on-verification
OpenStack Infra (hudson-openstack) wrote : Change abandoned on fuel-library (stable/7.0)

Change abandoned by Michael Polenchuk (<email address hidden>) on branch: stable/7.0
Review: https://review.openstack.org/273025
Reason: outdated

Serhii Ovsianikov (sovsianikov) wrote :

Why was this fix abandoned for 7.0? I consistently hit this bug in my test lab: Fuel 7.0, 1 controller, 3 ceph nodes, 1 LMA, 1 Elasticsearch, 1 Influxdb.

Please fix it and include the patch in the next MU for 7.0.

Thank you

root@node-15:~# df -h
Filesystem                 Size  Used Avail Use% Mounted on
udev                       990M   12K  990M   1% /dev
tmpfs                      201M  2.9M  198M  99% /run
/dev/dm-3                   15G  3.0G   11G  22% /
none                       4.0K     0  4.0K   0% /sys/fs/cgroup
none                       5.0M     0  5.0M   0% /run/lock
none                      1001M   60M  942M   6% /run/shm
none                       100M     0  100M   0% /run/user
/dev/sda3                  196M   39M  148M  21% /boot
/dev/mapper/logs-log       9.8G  3.1G  6.2G  34% /var/log
/dev/mapper/mysql-root      20G  2.9G   16G  16% /var/lib/mysql
/dev/mapper/mongo-mongodb  139G   11G  121G   9% /var/lib/mongo
You have new mail in /var/mail/root

tags: added: support
tags: added: on-verification
Alexander Petrov (apetrov-n) wrote :

Verified on MOS 9.0 ISO 479

Changed in fuel:
status: Fix Committed → Fix Released
tags: removed: on-verification