juju-core

Comment 10 for bug 1344940

Mark Shuttleworth (sabdfl) wrote on 2014-07-21: Re: [Bug 1344940] Re: Juju state server database is overly large

On 21/07/14 01:00, Ian Booth wrote:
> The initial op log size is calculated according to the Mongo recommendations:
> http://docs.mongodb.org/manual/core/replica-set-oplog/
>
> On 64-bit Linux systems, it is 5% of the free disk space, but never less than 1GB and never more than 50GB.
> So as Kapil says, on AWS deployments with an 8GB root disk, the op log size will be 1GB.
>
> But for systems with a very big root disk, the size will indeed max out to 50GB. This is as per the documented Mongo behaviour.
> What Juju does is pre-allocate the oplog prior to starting Mongo to reduce the startup delay and subsequent timing issues which were causing machine agent flakiness. But the pre-allocation is done according to the Mongo algorithm used for the file size.
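
For reference, the sizing rule quoted above can be sketched in Go roughly as follows. This is only an illustration built from the figures in the quote (5% of free disk space, a 1GB floor, a 50GB ceiling); the function and constant names are hypothetical, and it is not taken from Juju's or Mongo's actual code.

```go
package main

import "fmt"

// Hypothetical constants, taken from the figures quoted above.
const (
	minOplogSizeMB = 1024      // never less than 1GB
	maxOplogSizeMB = 50 * 1024 // never more than 50GB
)

// defaultOplogSizeMB returns 5% of the free disk space (in MB),
// clamped to the range [minOplogSizeMB, maxOplogSizeMB].
func defaultOplogSizeMB(freeDiskMB int) int {
	size := freeDiskMB / 20 // 5% of free space
	if size < minOplogSizeMB {
		return minOplogSizeMB
	}
	if size > maxOplogSizeMB {
		return maxOplogSizeMB
	}
	return size
}

func main() {
	// Free disk space in GB: a small cloud root disk, a mid-sized
	// disk, and a very large disk that hits the ceiling.
	for _, freeGB := range []int{8, 100, 2000} {
		fmt.Printf("free=%dGB oplog=%dMB\n", freeGB, defaultOplogSizeMB(freeGB*1024))
	}
}
```

Under this rule any disk with more than about 1TB free hits the 50GB ceiling, which is consistent with the very large database described in this bug.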

That Mongo algorithm may make sense for "generic Mongo likely to be big
data" but isn't it true that we know the likely behaviour of the
database much more accurately?

In a world without multi-tenanted multi-environment state servers the
"natural" size of the database is likely to be relatively small. If one
is bootstrapping a next-gen, multi-tenant multi-environment state
service then one could say so and we might apply a different algorithm.

Even 1GB seems big to me.

> On local provider, we do actually reduce the initial oplog size to
> improve startup times, but didn't feel the same decision to move away
> from the Mongo defaults would be pertinent for real cloud deployments.

Let's please take a more opinionated view of what a single-environment
database size should look like, anywhere!

The consequence of this decision is a 51GB db on a reasonably sized
server on which I'd just like to run the state service for an OpenStack,
which is obviously unacceptable.

> The real issue is the fact that database writes appear to be occurring
> without an actual model change occurring. I've raised a separate bug for
> this, bug 1345832

Thank you! And I see Andrew is right on it :)

> The other 3 issues mentioned in this bug:
>
> 1. Mongo oplog size
> I'd like to leave this the way it is since we're following Mongo recommendations (unless there's a technical reason not to and we get agreement from Kapil and William and whoever else that we should change).

I think Mongo is pitching their database as a repo for streamed, low
value data. That's not how we are using it. So I don't think their
recommendations make sense. Let's go the other way: go for the smallest
(and hence lightest footprint) size we think is reasonable. If needed,
put in place some sort of reporting so we can see how actual databases
in the wild are behaving and come up with an appropriate sizing
algorithm for our application. Regardless, the current position is
unacceptable.

Thanks!
Mark
