atop floods /run partition (causes "no DNS server" problem)

Bug #1393175 reported by Alexander Sashnov on 2014-11-16
This bug affects 11 people
Affects: atop (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Laptop: HP-635. Uptime: 3 days 22 hours, with many suspend/resume cycles in between.
After one wake-up I noticed that I could still ping the gateway but DNS no longer worked.

In /var/log/syslog:
<pre>
Nov 16 17:33:09 hp-635 dnsmasq[1498]: using nameserver 192.168.1.1#53
Nov 16 17:33:09 hp-635 NetworkManager[775]: <warn> could not commit DNS changes: (0) Could not close /run/resolvconf/resolv.conf.tmp: No space left on device

$ df /run
Filesystem 1K-blocks Used Available Use% Mounted on
tmpfs 165380 165380 0 100% /run

$ sudo ls -lh /run/atop
total 160M
-rw------- 1 root root 160M Nov. 16 17:57 atop.acct

And it's not empty (21 MB compressed):
$ sudo cat /run/atop/atop.acct | gzip -1 | wc -c
21246654

$ ps aux | grep atop
root 1104 0.0 0.2 4312 4300 ? S<L Nov.12 1:26 /usr/bin/atop -a -w /var/log/atop/atop_20141112 600

$ ls -l /var/log/atop/atop_20141112
-rw-r--r-- 1 root root 32406566 Nov. 16 18:03 /var/log/atop/atop_20141112

$ sudo atop
ATOP - hp-635 2014/11/16 18:10:58 ------ 3d22h41m24s elapsed
PRC | sys 208m36s | user 540m27s | #proc 201 | #tslpu 0 | #zombie 0 | #exit 0 |
CPU | sys 14% | user 32% | irq 1% | idle 52% | wait 2% | avgscal 70% |
CPL | avg1 1.35 | avg5 1.71 | avg15 1.75 | csw 397021e3 | intr 13514e4 | numcpu 1 |
MEM | tot 1.6G | free 119.4M | cache 380.9M | dirty 0.5M | buff 20.0M | slab 47.2M |
SWP | tot 2.0G | free 1.6G | | | vmcom 5.0G | vmlim 2.8G |
PAG | scan 5637e3 | stall 0 | | | swin 155401 | swout 213990 |
DSK | sda | busy 5% | read 526441 | write 516347 | MBw/s 0.04 | avio 5.96 ms |
NET | transport | tcpi 3523380 | tcpo 2778206 | udpi 5285591 | udpo 5122297 | tcpao 117696 |
NET | network | ipi 13943896 | ipo 7881203 | ipfrw 0 | deliv 8922e3 | icmpo 3367 |
NET | eth0 1% | pcki 5759242 | pcko 616485 | si 177 Kbps | so 3 Kbps | erro 0 |
NET | teredo 0% | pcki 10021 | pcko 13922 | si 0 Kbps | so 0 Kbps | erro 0 |
NET | wlan0 ---- | pcki 7035683 | pcko 6327205 | si 58 Kbps | so 53 Kbps | erro 0 |
NET | lo ---- | pcki 108979 | pcko 108979 | si 1 Kbps | so 1 Kbps | erro 0 |
                             *** system and process activity since boot ***
  PID RUID THR SYSCPU USRCPU VGROW RGROW RDDSK WRDSK ST EXC S CPUNR CPU CMD 1/29
 4324 alex 93 23m53s 258m17s 1.8G 419.6M 1.3G 3.9G N- - S 0 13% firefox
 1288 root 1 95m55s 86m55s 561.8M 267.1M 146.6M 14556K N- - S 0 8% Xorg
25856 alex 6 50m35s 125m14s 278.9M 28088K 3.3G 3.5G N- - S 0 8% transmission-g
 3230 alex 22 14m53s 33m03s 705.3M 78964K 271.2M 485.2M N- - S 0 2% skype
 2389 alex 5 9m43s 9m13s 98.0M 9348K 87760K 1072K N- - S 0 1% rescuetime
 2640 alex 4 17.47s 3m39s 254.8M 27364K 995.3M 733.7M N- - S 0 0% gnome-panel
 2573 alex 4 95.98s 2m13s 171.2M 4116K 17732K 104K N- - S 0 0% pulseaudio

</pre>

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: atop 1.26-2
ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
Uname: Linux 3.13.0-24-generic i686
NonfreeKernelModules: fglrx
ApportVersion: 2.14.1-0ubuntu3.5
Architecture: i386
CurrentDesktop: Unity
Date: Sun Nov 16 17:48:09 2014
InstallationDate: Installed on 2014-03-22 (239 days ago)
InstallationMedia: Ubuntu-GNOME 14.04 "Trusty Tahr" - Alpha i386 (20140226)
SourcePackage: atop
UpgradeStatus: No upgrade log present (probably fresh install)

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in atop (Ubuntu):
status: New → Confirmed
Bogdan Ilisei (znuff) wrote :

Can confirm this happening on multiple systems.

On a default Ubuntu (or Ubuntu Server) install, /run is usually mounted as a tmpfs with size=10% (of RAM), so it fills up very quickly on virtual machines, which usually have little memory allocated.

Please move the log file to a different directory.
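
To put a number on the size=10% limit mentioned above, here is a minimal sketch that estimates how much space /run gets for a given amount of RAM. The helper names `run_limit_kib` and `mem_total_kib` are made up for illustration, and the sketch assumes /run really is mounted with size=10%. The figures in the original report are consistent with that: /run had 165380 1K-blocks on a machine where atop shows 1.6G of total memory.

```shell
# Hypothetical helper: estimate the /run tmpfs limit in KiB,
# assuming the mount option size=10%. Argument: total RAM in KiB.
run_limit_kib() {
    echo $(( $1 / 10 ))
}

# Hypothetical helper: read MemTotal (in KiB) from a meminfo-style file.
# Defaults to /proc/meminfo when no file is given.
mem_total_kib() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# Example: a VM with 512 MiB of RAM leaves roughly 51 MiB for everything in /run.
run_limit_kib $(( 512 * 1024 ))
```

On a live system, `run_limit_kib "$(mem_total_kib)"` can be compared with the 1K-blocks column of `df /run` to see whether the 10% assumption holds there.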

Viktor Szathmáry (phraktle) wrote :

The growth of this file depends on system activity. If there are a lot of processes launched, it can fill up the /run tmpfs relatively quickly.

Any plans on addressing this?
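
Because the growth rate depends on how many processes are started, one way to catch the problem before /run fills up is to check the size of the accounting file periodically. The following is only a sketch of such a check, not a fix: `file_exceeds` is a hypothetical helper, and the 50 MiB threshold is an arbitrary value chosen for illustration.

```shell
# Hypothetical helper: succeed (exit 0) if FILE exists and is larger than LIMIT bytes.
file_exceeds() {
    file=$1
    limit=$2
    [ -f "$file" ] || return 1
    # Arithmetic expansion strips any padding that wc may print around the number.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -gt "$limit" ]
}

# Example use (run as root, since /run/atop is readable by root only):
# print a warning once the accounting file is larger than 50 MiB.
if file_exceeds /run/atop/atop.acct $(( 50 * 1024 * 1024 )); then
    echo "warning: /run/atop/atop.acct is larger than 50 MiB" >&2
fi
```

A check like this only reports the condition; the file keeps growing until atop is restarted or the accounting file is placed elsewhere.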

Yarden Bar (ayash-jorden) wrote :

Hi,
Is there any progress here?

Yarden Bar (ayash-jorden) wrote :

Hi again,
Installing atop from source seems to solve the issue; the source release is version 2.2 (http://www.atoptool.nl/downloadatop.php)

Hope this helps :)

urraca (urraca) wrote :

This affects us as well.

The issue is caused by this patch (AFAICT):
https://anonscm.debian.org/cgit/collab-maint/atop.git/tree/debian/patches/var-run?h=wheezy
or
https://anonscm.debian.org/cgit/collab-maint/atop.git/tree/debian/patches/var-run?h=master
respectively (look for ACCTDIR).

Note that I also question the original decision to put the accounting file in /tmp (we ran out of /tmp space a few times in the past on very busy servers), but a RAM disk is certainly a bad place to keep it!

Why not make it /var/tmp, which usually has plenty of space and is reviewed by admins on a regular basis anyway?

urraca (urraca) wrote :

Additionally, the problem does not seem to appear when the cron job actually runs. The package should trigger a crond reload after installation!

Cedders (cedric-gn) wrote :

This affected me on a laptop running atop:i386 1.26-2 on Ubuntu 14.04.5 (upgraded from 12.04, I think). I'd woken the laptop shortly before midnight, and at about 3am the wireless connection was up but there was no DNS.

$ sudo dnsmasq

dnsmasq: failed to open pidfile /var/run/dnsmasq.pid: No space left on device

There was 300M in /run/atop/atop.acct

root 4999 1 0 981 1948 0 00:00 ? 00:01:35 /usr/bin/atop -a -w /var/log/atop/atop_20171029 600

This was in /etc/cron.d/atop
# start atop daily at midnight
0 0 * * * root invoke-rc.d atop _cron

So is this a packaging problem? I'd agree with urraca that /var/run (symlink to tmpfs /run) is the wrong place. How about /tmp?

Vladimir Smolensky (arizal) wrote :

Having the same issue. Affected systems are Ubuntu 14.04 and probably 16.04 (not sure about that). The problem occurs occasionally, probably triggered by some kind of activity on the machine.
