juju state server mongod uses too much disk space

Bug #1492237 reported by JuanJo Ciarlante
This bug affects 14 people
Affects: Canonical Juju | Status: Won't Fix | Importance: High | Assigned to: Unassigned
Affects: juju-core | Status: Won't Fix | Importance: Critical | Assigned to: Unassigned

Bug Description

E.g. in our non-HA OpenStack deployment with 49 principal units
and 232 subordinates, /var/lib/juju/db uses 4.2G, which seems
rather excessive for such a small deployment.

JuanJo Ciarlante (jjo)
tags: added: canonical-bootstack
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

Which version of Juju is this environment running? There were problems in 1.23 and early 1.24 releases that would cause unreasonable mongodb disk usage. Upgrading to 1.24 should automatically free up the unnecessary disk space over time.

Upgrading to 1.24.5 or later is highly recommended.

See this thread for more information: https://<email address hidden>/msg02783.html
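
As a quick sanity check before and after the upgrade, something like the following shows the running agent version and the current db size. This is only a minimal sketch; it assumes the default 1.x state server layout, so adjust paths as needed.

# From a client with the environment configured, show the agent versions in use:
juju status --format=yaml | grep -i agent-version | sort -u
# On the state server itself, show the current db disk usage:
sudo du -sh /var/lib/juju/db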

Revision history for this message
JuanJo Ciarlante (jjo) wrote :

Apologies, I forgot to provide the Juju version - it was 1.23.3, which got upgraded
to 1.24.5 (we found out about the disk space usage while waiting for 'juju backups'
to complete). Unfortunately, upgrading to 1.24.5 didn't fix the issue:

* 1.23.3 (2015-09-03)
# du -sch /var/lib/juju/db
4.2G /var/lib/juju/db
4.2G total

* 1.24.5 (2015-09-07)
# du -sch /var/lib/juju/db/
5.3G /var/lib/juju/db/
5.3G total

Revision history for this message
JuanJo Ciarlante (jjo) wrote :

hm ... now I actually read your 'over time'.
FTR, detailed per-file disk usage atm:
http://paste.ubuntu.com/12303015/

Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

I've just reviewed this part of the code. The txn pruning functionality runs every 2 hours, so there should be a reduction in disk space 2 hours after the state servers were upgraded to 1.24.

If the environment is set to log at INFO or DEBUG you should see logs related to the pruning activity. Search for "prun" in the logs (to catch both "prune" and "pruning"). If logging is set to ERROR, any serious problems with the pruning functionality should still show up in the logs.

Revision history for this message
JuanJo Ciarlante (jjo) wrote :

grep prun /var/log/juju/machine-0.log -> http://paste.ubuntu.com/12307044/
shows 'starting worker "txnpruner"', but no 'pruning' message, fwiw.

Curtis Hovey (sinzui)
Changed in juju-core:
status: New → Triaged
importance: Undecided → High
milestone: none → 1.26.0
tags: added: mongodb
Revision history for this message
Michael Foley (foli) wrote :

I was able to reduce the size of the juju DB on a bootstrap node by following the instructions from http://stackoverflow.com/questions/2966687/reducing-mongodb-database-file-size

This was with juju version 1.24.5 with a 4GB ephemeral disk mounted on /mnt.

Example of commands I used:
ps aux | grep mongo
stop juju-db
# Copy the mongod command line from the ps output and modify it to add --nojournal --repair --repairpath /mnt
/usr/lib/juju/bin/mongod --auth --dbpath /var/lib/juju/db --sslOnNormalPorts --sslPEMKeyFile /var/lib/juju/server.pem --sslPEMKeyPassword xxxxxxx --port 37017 --noprealloc --syslog --smallfiles --nojournal --keyFile /var/lib/juju/shared-secret --replSet juju --ipv6 --oplogSize 512 --repair --repairpath /mnt
start juju-db

I then restarted all jujud processes for good measure.
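
For convenience, here is a minimal sketch that wraps the steps above into a single script. It is only a sketch: it assumes the same mongod flags shown in the comment above (copy the real ones from `ps aux | grep mongo`, including the actual --sslPEMKeyPassword value) and that /mnt has enough free space for the repaired copy of the database. Run it as root on the state server.

#!/bin/bash
# Hypothetical wrapper for the manual repair procedure described above (Juju 1.x, upstart).
set -e

du -sh /var/lib/juju/db      # record the size before the repair

stop juju-db                 # mongod must be stopped so --repair can take the datafiles

# Re-run mongod with the flags it normally uses, plus the repair options.
# The flags below are placeholders copied from the comment above; use your own.
/usr/lib/juju/bin/mongod --auth --dbpath /var/lib/juju/db \
  --sslOnNormalPorts --sslPEMKeyFile /var/lib/juju/server.pem \
  --sslPEMKeyPassword xxxxxxx --port 37017 --noprealloc --syslog \
  --smallfiles --nojournal --keyFile /var/lib/juju/shared-secret \
  --replSet juju --ipv6 --oplogSize 512 \
  --repair --repairpath /mnt

start juju-db                # bring the regular mongod back up

du -sh /var/lib/juju/db      # confirm the space was reclaimed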

Changed in juju-core:
milestone: 1.26.0 → 2.0-beta5
Changed in juju-core:
milestone: 2.0-beta5 → 2.0-beta4
Changed in juju-core:
milestone: 2.0-beta4 → 2.1.0
affects: juju-core → juju
Changed in juju:
milestone: 2.1.0 → none
milestone: none → 2.1.0
Revision history for this message
Greg Lutostanski (lutostag) wrote :

Hit this. We have 5.2 GB in /var/lib/juju/db and another 1.6 GB in /var/lib/juju/charm-get-cache. We bootstrapped to a virtual node with a 10GB disk, so it often runs out of space and we need to find random folders to delete. Either reducing disk usage overall or providing a way to vacuum unused disk space would be necessary for long-term deployments.
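
For anyone hitting the same space pressure, a quick way to see which Juju directories are the big consumers (a minimal sketch, assuming the default 1.x paths on the bootstrap node):

# Show the usual large Juju directories and the remaining free space.
sudo du -sh /var/lib/juju/db /var/lib/juju/charm-get-cache /var/log/juju
df -h /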

Revision history for this message
Ryan Beisner (1chb1n) wrote :

+1 Ran into exactly lutostag's situation in OpenStack Charm CI.

tags: added: uosci
tags: added: oil-2.0
Revision history for this message
Jacek Nykis (jacekn) wrote :

I've just hit this problem in a production environment.

Revision history for this message
Ryan Beisner (1chb1n) wrote :

FYI, I've recorded my experience in working around this more permanently in: https://bugs.launchpad.net/juju-core/+bug/1634390

Revision history for this message
Ryan Beisner (1chb1n) wrote :

Added juju-core as this impacts long-running production workloads, which at this point in time are predominantly on 1.25.x, with no upgrade path to 2.x.

Changed in juju-core:
status: New → Triaged
importance: Undecided → Critical
milestone: none → 1.25.10
Curtis Hovey (sinzui)
Changed in juju-core:
milestone: 1.25.10 → none
Changed in juju-core:
milestone: none → 1.25.11
Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Ryan, @Jacek,
Since this bug was originally filed, we have addressed many performance-related issues in both 1.25 (1.25.10 most recently) and in 2.0 (2.0.3). Are you still seeing this in later versions of Juju?

Since bug #1634390 has a workaround for this specific issue, can this one be considered its duplicate?

If the answers to the above questions make this a standalone current bug, could you please clarify your Juju versions?

Changed in juju:
status: Triaged → Incomplete
Changed in juju-core:
status: Triaged → Incomplete
Changed in juju:
milestone: 2.1.0 → none
Changed in juju-core:
milestone: 1.25.11 → none
Revision history for this message
Ryan Beisner (1chb1n) wrote :

1634390 is not a duplicate of this. 1634390 is specifically about Juju not working when /var is on a separate partition, and that should be considered a common and valid enterprise use case.

The one long-running Juju 1.x deployment I have isn't on 1.25.10 yet, so I cannot confirm whether or not the juju db size is reasonable for such a small deployment. The best I can offer is a Juju upgrade at some point, on a deployment where the juju db is already huge.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Ryan,
I'll take this offer :)
Please re-test with 1.25.10 when you can.
Feel free to re-open the bug if mongod still misbehaves for you.

Revision history for this message
Paul Gear (paulgear) wrote :

Are we to expect that existing long-running juju 1.25.x installations will have their disk usage automatically reduced?

Revision history for this message
Ryan Beisner (1chb1n) wrote :

@anastasia - apologies, afaik there isn't really a test, other than using Juju for the long haul (perhaps a soak test, if juju core has that?).

I don't and likely won't have a new long-running 1.25.10 model from which to give you a clean behaviour report. The long-running 1.25.x models I have will be upgraded to 1.25.10 at some point. However, they've already been worked around by giving the controller a ton of disk space.

Haw Loeung (hloeung)
Changed in juju:
status: Incomplete → Confirmed
Changed in juju-core:
status: Incomplete → Confirmed
Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Haw Loeung (hloeung),
We have addressed a lot of mongo-related issues in 2.2, including moving to a more efficient and robust version of Mongo, as well as updated algorithms for pruning and cleaning transactions and caches.

We have also updated the mgopurge tool to work with installations that run previous versions of Juju. Please take advantage of the new mgopurge functionality to resolve the db space growth you are currently observing:

A. `mgopurge -password "...." -yes -stages prune`

 - This can be run on a live system. In fact, we'd recommend running it periodically as a cron job (see the sketch below). From 2.2, similar functionality is run every couple of hours by Juju itself.
 - In an HA setup, this can be run on any node, preferably the primary.
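
A minimal sketch of such a cron entry, assuming mgopurge is installed at /usr/local/bin/mgopurge; the install path, schedule, and log file are illustrative only, and the password placeholder must be replaced with the controller's mongo password:

# /etc/cron.d/mgopurge-prune (hypothetical): prune leftover transactions nightly.
30 2 * * * root /usr/local/bin/mgopurge -password "...." -yes -stages prune >> /var/log/mgopurge-prune.log 2>&1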

B. `mgopurge -password "...." -yes -stages compact`

This will give space back to the operating system. However:
 - It does lock the database and takes time proportional to the database size.
 - It is best to shut down the controllers prior to running this command.
 - It requires free space equal to the current size of the database plus another 2GB.
 - In an HA setup, you may need to run this on each node (a sketch of this workflow is shown below).
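
Putting B together, a minimal sketch of the compact workflow on a single (non-HA) Juju 1.25.x controller. It assumes the upstart service name jujud-machine-0 for the controller agent and that juju-db itself stays running so mgopurge can connect; check the free-space requirement above before starting.

# Stop the controller agent before compacting; leave juju-db running.
sudo stop jujud-machine-0

# Check there is enough free space: current db size plus roughly another 2GB.
sudo du -sh /var/lib/juju/db
df -h /var/lib/juju

# Compact; this locks the database and takes time proportional to its size.
mgopurge -password "...." -yes -stages compact

# Bring the controller agent back up once compaction has finished.
sudo start jujud-machine-0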

Note that this functionality is compatible with Mongo 2.4 (for Juju 1.25.x) and 3.2 (for Juju 2.0.x and 2.1.x).

I am marking this report as Won't Fix - we are not planning any further releases for Juju series prior to 2.2.

Changed in juju:
status: Confirmed → Won't Fix
Changed in juju-core:
status: Confirmed → Won't Fix
Revision history for this message
Haw Loeung (hloeung) wrote :

mgopurge 1.4 does indeed help.

Before (du output):

| 3.9G ./db

Before (mongo show databases etc.):

| juju 1.999267578125GB
| juju.txns, storage = 1404 MB, index = 0 MB, total = 1404 MB

mgopurge output snippet:

| 2017-04-13 05:52:22 0 txns are still referenced and will be kept
| 2017-04-13 05:52:22 pruning completed: removed 1637 txns
| 2017-04-13 05:52:22 Running stage "compact": Compact database to release disk space (does not compact replicas)
| 2017-04-13 05:52:22 detected mmapv1 storage engine: using repairDatabase to compact

Now (du output):

| 1.4G ./db

Now (mongo show databases etc.):

| juju 0.0625GB
| juju.txns.log, storage = 10 MB, index = 1 MB, total = 11 MB

Thanks.

Revision history for this message
Anastasia (anastasia-macmood) wrote :

@Haw Loeung (hloeung),

Thank you for the update \o/ It's great to know, and we appreciate the feedback!
