Daisy

retracer requires a lot of storage for instance sandboxes

Bug #1295400 reported by Brian Murray on 2014-03-20

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Daisy	New	High	Unassigned

Bug Description

I'm fairly certain that the cleanup of caches used by the retracers is never done. In setup_cache of retracer.py we can see the following:

instance_sandbox = tempfile.mkdtemp(prefix='cache-', dir=sandbox_release)
atexit.register(shutil.rmtree, instance_sandbox)

So when retracer.py exits, the instance_sandbox directory should be deleted. However, this isn't happening and the sandbox directories are growing quite large. When running the retracer manually and pressing Ctrl-C to interrupt it, we see "Shutting down." (also logged via atexit.register) in the retracer log file for the corresponding architecture.

However, when the retracer is run via the upstart job there is no such "Shutting down." log message, which leads me to believe that the atexit handlers never run and that this is why the caches are not being cleaned up regularly.
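The difference between the two cases is consistent with how Python's atexit module is documented to behave: registered functions are not called when the process is killed by a signal that Python does not handle, whereas Ctrl-C arrives as a KeyboardInterrupt and leads to a normal interpreter shutdown. A minimal sketch of one way around this, assuming the upstart job stops the retracer with SIGTERM (upstart's default); the function names are hypothetical and not taken from retracer.py:

```python
import signal
import sys


def exit_on_sigterm(signum, frame):
    # Hypothetical handler: raising SystemExit triggers a normal interpreter
    # shutdown, so callbacks registered with atexit.register() get to run.
    sys.exit(0)


def install_sigterm_handler():
    # Must be called from the main thread, early in startup.
    signal.signal(signal.SIGTERM, exit_on_sigterm)
```

Even with such a handler the cleanup has to finish before the process is sent SIGKILL, which cannot be caught.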

Related branches

lp:~brian-murray/daisy/remove-sandbox-dir-after-retrace

Merged into lp:daisy

Evan (community): Approve on 2014-04-04

Brian Murray (brian-murray) on 2014-03-20

Changed in daisy:
importance:	Undecided → High

Brian Murray (brian-murray) wrote on 2014-03-20:

Actually, it seems more likely that the retracers are taking too long to exit and so they get killed. From the cookbook:

Upstart waits for up to kill timeout seconds (default 5 seconds) for the process to end.

If the process is still running after the timeout, a SIGKILL signal is sent to the process which cannot be ignored and will forcibly stop the processes in the process group.

So perhaps we should increase the kill timeout for the upstart jobs.

Brian Murray (brian-murray) wrote on 2014-03-21:

In addition to increasing the kill timeout, I also think the instance_sandbox directories should be removed whenever the retracing process exits, so that disk space needs are kept to a minimum.

Brian Murray (brian-murray) wrote on 2014-03-21:

I think I misread a bit of code or did not notice that setup_cache will return if a sandbox has already been set up for the release. So any one retracer process will have one sandbox per release; it's just that those grow and grow as newer and different packages are downloaded.

Brian Murray (brian-murray) on 2014-03-21

summary:

- cleanup of instance sandboxes is never done
+ retracer requires a lot of storage or instance sandboxes

Brian Murray (brian-murray) wrote on 2014-03-21:

Speaking to David Ames about this, he thinks the sandbox dir in $release-name/cache-$tmpdir/ is consuming the most space, so I've submitted a merge proposal that removes it after every retrace. We'll keep the cache dir in there, though, as it contains downloaded deb files which are useful for other retrace attempts.
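A rough sketch of that idea, not the code from the merge proposal: it assumes the unpacked tree lives in a subdirectory named "sandbox" next to the deb cache inside the instance sandbox. Both the helper name and the subdirectory name are assumptions for illustration.

```python
import os
import shutil


def remove_retrace_sandbox(instance_sandbox, sandbox_name='sandbox'):
    """Delete only the unpacked sandbox tree; return True if it was removed.

    Sibling directories (for example the cache of downloaded debs) are
    left untouched. The name 'sandbox' is an assumed layout.
    """
    sandbox_dir = os.path.join(instance_sandbox, sandbox_name)
    if os.path.isdir(sandbox_dir):
        shutil.rmtree(sandbox_dir)
        return True
    return False
```

Calling this after each retrace would keep the downloaded debs available for later retraces while dropping the unpacked files.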

However, for Trusty it'd make sense to clean those up regularly as new packages are created frequently.
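One possible shape for such a periodic cleanup, sketched under assumptions: it treats a deb's modification time as a proxy for how stale it is and removes anything older than a chosen number of days. The function name, the age threshold and the use of mtime are all hypothetical choices, not something daisy is known to do.

```python
import os
import time


def prune_old_debs(cache_dir, max_age_days, now=None):
    """Remove .deb files under cache_dir older than max_age_days.

    Age is judged by file modification time (an assumption). Returns the
    list of paths that were removed.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 86400
    removed = []
    for dirpath, _dirnames, filenames in os.walk(cache_dir):
        for name in filenames:
            if not name.endswith('.deb'):
                continue  # leave anything that is not a package file
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                os.remove(path)
                removed.append(path)
    return removed
```

For a development release such as Trusty, a short threshold would suit the frequent package turnover; stable releases could use a longer one.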

summary:

- retracer requires a lot of storage or instance sandboxes
+ retracer requires a lot of storage for instance sandboxes
