retracer-app FS / full

Bug #1676289 reported by Laurent Sesquès
This bug affects 2 people
Affects: Daisy
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Hi,

We're getting recurring "FS / full" alerts on the retracers, despite having 50G on / and an extra 20G on /mnt.
Running retracers-cache-restart.sh -f clears them.
/ is monitored and prompt-critical, so this should be avoided.

Thanks,
Laurent

Changed in daisy:
assignee: nobody → Brian Murray (brian-murray)
Brian Murray (brian-murray) wrote :

Is this in the production retracer environment? retracers-cache-restart.sh runs every 2 minutes with a threshold of 80%, so that alone should solve it, but there is also a cronjob that runs the script with the "-f" switch every 6 hours. Looking at the retracers-cache-restart.log file, I see the filesystem being cleaned up regularly.

2017-03-27 06:40:01 [11031]: Not running clean up
2017-03-27 06:40:01 [11031]: Current disk usage (46%) less than threshold (80%)
2017-03-27 06:42:01 [12763]: Not running clean up
2017-03-27 06:42:01 [12763]: Current disk usage (46%) less than threshold (80%)
2017-03-27 06:44:01 [16475]: Not running clean up
2017-03-27 06:44:01 [16475]: Current disk usage (44%) less than threshold (80%)
2017-03-27 06:45:01 [17208]: Running clean up because '-f' was specified
2017-03-27 06:45:01 [17208]: Shutting down the retracers
2017-03-27 06:45:01 [17208]: Stopping retracer-amd64
2017-03-27 06:46:01 [17907]: Lockfile found, another process may already be running
2017-03-27 06:48:01 [19605]: Lockfile found, another process may already be running
retracer-amd64 stop/waiting
2017-03-27 06:48:02 [17208]: Stopping retracer-armhf
retracer-armhf stop/waiting
2017-03-27 06:48:02 [17208]: Stopping retracer-i386
retracer-i386 stop/waiting
2017-03-27 06:48:02 [17208]: Killing any remaining processes
2017-03-27 06:48:02 [17208]: Cleaning up cache directories
2017-03-27 06:48:03 [17208]: Cleaning out tmp directories
2017-03-27 06:48:04 [17208]: Cleaning out tmp directories more
2017-03-27 06:48:04 [17208]: Starting retracers back up
2017-03-27 06:48:04 [17208]: Starting retracer-amd64
retracer-amd64 start/running, process 19674
2017-03-27 06:48:04 [17208]: Started retracer-amd64
2017-03-27 06:48:04 [17208]: Starting retracer-armhf
retracer-armhf start/running, process 19688
2017-03-27 06:48:04 [17208]: Started retracer-armhf
2017-03-27 06:48:04 [17208]: Starting retracer-i386
retracer-i386 start/running, process 19701
2017-03-27 06:48:04 [17208]: Started retracer-i386
2017-03-27 06:48:04 [17208]: Done - back down to (10%)
2017-03-27 06:50:01 [23386]: Not running clean up
2017-03-27 06:50:01 [23386]: Current disk usage (15%) less than threshold (80%)
2017-03-27 06:52:01 [28003]: Not running clean up
2017-03-27 06:52:01 [28003]: Current disk usage (19%) less than threshold (80%)
2017-03-27 06:54:01 [30944]: Not running clean up
2017-03-27 06:54:01 [30944]: Current disk usage (19%) less than threshold (80%)
2017-03-27 06:56:01 [1780]: Not running clean up

Is there something I'm missing?
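
For readers following the log above, the decision it records can be sketched in Python. This is not the actual retracers-cache-restart.sh, only an illustration of the behaviour described: skip the clean up while usage is below the threshold, and always run it when "-f" is given. The function names are hypothetical, the behaviour at exactly 80% is an assumption, and so is the wording of the over-threshold message, since the log excerpt never shows that branch. The percentage computed here may also differ slightly from what df reports, because df excludes reserved blocks.

```python
import shutil


def used_percent(path="/"):
    """Return the percentage of the filesystem at `path` that is in use."""
    usage = shutil.disk_usage(path)
    return round(100 * usage.used / usage.total)


def cleanup_decision(percent, threshold=80, force=False):
    """Return (run_cleanup, reason) for a given usage percentage."""
    if force:
        # Mirrors the log line emitted when the script is run with -f.
        return True, "Running clean up because '-f' was specified"
    if percent < threshold:
        return False, (
            f"Current disk usage ({percent}%) less than threshold ({threshold}%)"
        )
    # Assumption: clean up runs at or above the threshold; message text is made up.
    return True, (
        f"Current disk usage ({percent}%) at or above threshold ({threshold}%)"
    )


if __name__ == "__main__":
    run, reason = cleanup_decision(used_percent("/"))
    print(reason)
```

With the values from the log, cleanup_decision(46) declines to clean up, while cleanup_decision(46, force=True) runs regardless of usage.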

Changed in daisy:
status: New → Incomplete
Laurent Sesquès (sajoupa) wrote :

Hi,

Thanks Brian for having taken a look.
This is the production retracer environment, retracer-app/47.
I hadn't seen that this was cron'ed.
We've had recurring alerts on this unit since March 23rd.
Since that date, retracers-cache-restart.log is empty.
My guess was that the FS was full when it tried to run, hence not logging the attempt. But we would have been alerted and would have run retracers-cache-restart manually before the FS got full, so there must be something else. And it runs without -f every 2 minutes so we should definitely have logs.
/var/log/syslog has regular CRON entries, but none for the retracers restarts.

Anyway, I looked in Landscape for the FS usage history. It had been empty since March 26th.
I restarted the Landscape daemon and it's now populating the metrics history.
I also ended up restarting cron, and the following */2 run generated logs.

So something was wrong on this machine, but I haven't figured out what yet.

But this doesn't look like a daisy bug.

Thanks,
Laurent
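
Since the symptom here was a log that silently stopped being written, a freshness check on retracers-cache-restart.log could catch this earlier. The following is a minimal Python sketch, assuming the timestamp format seen in the log excerpt earlier in this report. The function names and the 10-minute tolerance are hypothetical, chosen only because the script is expected to log every 2 minutes.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as "2017-03-27 06:56:01 [1780]: Not running clean up".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[\d+\]:")


def last_logged(lines):
    """Return the newest timestamp found in the log lines, or None."""
    latest = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            # Lines like "retracer-i386 stop/waiting" carry no timestamp.
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if latest is None or stamp > latest:
            latest = stamp
    return latest


def is_stale(lines, now, max_age=timedelta(minutes=10)):
    """True if the log is empty or its newest entry is older than max_age."""
    latest = last_logged(lines)
    return latest is None or now - latest > max_age
```

An empty log, as seen on this unit since March 23rd, is reported as stale, which is the case a disk-usage alert alone would not reveal.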

Brian Murray (brian-murray) wrote :

Cool, thanks for the additional information.

Changed in daisy:
assignee: Brian Murray (brian-murray) → nobody
status: Incomplete → Invalid