presence collection grows without bound

Bug #1454661 reported by John A Meinel
This bug affects 3 people
Affects         Status         Importance   Assigned to      Milestone
Canonical Juju  Fix Released   High         John A Meinel
juju-core       Won't Fix      Low          Unassigned

Bug Description

presence.pings and presence.beings both grow without cleaning up old data.

beings maps from an Entity to a unique sequence id for a live Pinger, and pings tracks the actual "what sequence was active at what time".

For "liveness" checking we only ever look at "slot" and "slot-period", where our period is 30s. (was it alive in the previous slot, or the current slot).
Certainly we don't need to keep the liveness ping from 20 days ago, perhaps we could expire them after 1 day? (we don't seem to *need* the data from more than 1 minute ago.)

For "beings" we only actually need to keep the set of active Pingers, and there *should* only be one Pinger for a given entity. Potentially if there was some confusion and a net split so the agent reconnected before we noticed the disconnect we could have a couple. Even so, we don't need to keep 500 old possible sequence numbers for a given Entity.
Again, for "liveness" we only trust the latest sequence, we just allow the old ones because we don't ever want 2 pingers on the same Sequence (overflow would cause corruption).
However it seems straightforward to just limit the maximum number of sequence numbers that we track for a given entity, and if we get a Ping request for a sequence that we don't know about it, we could just ignore it. (log a warning?)
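
To make the arithmetic concrete, here is a rough sketch (not the actual juju code) of the slot check described above, assuming the 30s period and an in-memory stand-in for the pings data:

package main

import (
    "fmt"
    "time"
)

const slotPeriod = 30 // seconds; the 30s period mentioned above

// timeSlot truncates a timestamp to the start of its 30s slot; liveness only
// ever cares about this granularity.
func timeSlot(t time.Time) int64 {
    s := t.Unix()
    return s - (s % slotPeriod)
}

// aliveIn reports whether a sequence pinged in the current or previous slot,
// i.e. the "slot" / "slot - period" check described above.
func aliveIn(pings map[int64]map[int64]bool, seq int64, now time.Time) bool {
    slot := timeSlot(now)
    return pings[slot][seq] || pings[slot-slotPeriod][seq]
}

func main() {
    now := time.Now()
    pings := map[int64]map[int64]bool{
        timeSlot(now): {42: true}, // sequence 42 pinged in the current slot
    }
    fmt.Println(aliveIn(pings, 42, now)) // true
    fmt.Println(aliveIn(pings, 7, now))  // false: not seen in the last two slots
}

Anything older than those two slots is never consulted, which is why expiring it aggressively should be safe.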

Changed in juju-core:
status: Triaged → Won't Fix
Junien F (axino)
Changed in juju-core:
status: Won't Fix → New
importance: Low → High
tags: added: canonical-is
Revision history for this message
Junien F (axino) wrote :

Hi,

Not purging the presence collections leads to mongodb performance problems that I believe are one of the causes of the high CPU usage by jujud and mongod, and hence of the overall performance issues.

We have a juju 2.1.1 controller running about 40 models, and a lot of time is spent reading the presence.beings collection: https://paste.ubuntu.com/24145949/

Some queries are returning over 20k items, and over 3MB of data! https://paste.ubuntu.com/24145956/

The distribution of items per model is the following: https://paste.ubuntu.com/24145961/

I believe fixing this will make juju2 more stable (and I don't really understand why this was triaged as "Won't fix").

Also, the presence.beings collection doesn't have any index by default (on our database, we have added one as recommended by axw).
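
For reference, a rough sketch of adding such an index by hand with the mgo driver juju uses; the "model-uuid" key and the connection details are my assumptions, not necessarily the exact index axw recommended:

package main

import (
    "log"

    mgo "gopkg.in/mgo.v2"
)

func main() {
    // Assumed connection details; a real controller's mongod also requires
    // credentials and TLS, which are omitted here.
    session, err := mgo.Dial("127.0.0.1:37017")
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()

    beings := session.DB("presence").C("presence.beings")
    // "model-uuid" is an assumed key, based on the field discussed later in
    // this bug.
    err = beings.EnsureIndex(mgo.Index{
        Key:        []string{"model-uuid"},
        Background: true,
    })
    if err != nil {
        log.Fatal(err)
    }
    log.Println("index ensured on presence.beings")
}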

Thanks!

Revision history for this message
Junien F (axino) wrote :

Retargeting to juju (for juju2) - I guess I now understand why this was marked as won't fix :)

Changed in juju-core:
status: New → Won't Fix
importance: High → Low
Revision history for this message
Junien F (axino) wrote :

I forgot to say, the "beings" collection has over 250k items, and the "pings" collection, nearly 5,5M.

juju:PRIMARY> db.presence.beings.count()
251048
juju:PRIMARY> db.presence.pings.count()
5410691

Revision history for this message
John A Meinel (jameinel) wrote :

At this point we are discussing changing how we manage presence entirely, to avoid the database, which would prevent this from being an issue. We might consider doing a quick fix here until we're ready to move. This bug was originally a hypothetical based on the structure of the code, but it appears to be an active issue. (It should be less of an issue if controllers are stable, as then sequences at least don't keep incrementing.)

Changed in juju:
importance: Undecided → High
status: New → Triaged
Revision history for this message
Junien F (axino) wrote :

A quick fix would be very welcome; our mongodb instance is spending quite some time in these collections:

2017-04-06T07:52:41.671+0000 connected to: 127.0.0.1:37017

                      ns total read write 2017-04-06T08:02:41Z
          local.oplog.rs 704910ms 84191ms 620719ms
    juju.statuseshistory 296379ms 15636ms 280742ms
               juju.txns 278067ms 182099ms 95967ms
presence.presence.beings 184857ms 184857ms 0ms
           juju.txns.log 125199ms 83496ms 41703ms
             juju.leases 85285ms 36969ms 48316ms
 presence.presence.pings 71197ms 5527ms 65669ms
        juju.controllers 42361ms 42351ms 9ms
           juju.statuses 38212ms 29837ms 8374ms
              juju.units 35639ms 35639ms 0ms

                      ns total read write 2017-04-06T08:12:41Z
          local.oplog.rs 610708ms 82364ms 528344ms
               juju.txns 247024ms 165533ms 81491ms
    juju.statuseshistory 240834ms 14157ms 226676ms
presence.presence.beings 186170ms 186170ms 0ms
           juju.txns.log 95149ms 78652ms 16496ms
             juju.leases 79514ms 35188ms 44325ms
           juju.statuses 32915ms 24744ms 8170ms
 presence.presence.pings 24804ms 5203ms 19600ms
        juju.controllers 21349ms 21340ms 8ms
              juju.units 21204ms 21204ms 0ms

                      ns total read write 2017-04-06T08:22:41Z
          local.oplog.rs 703049ms 81894ms 621154ms
               juju.txns 213787ms 142622ms 71165ms
presence.presence.beings 163508ms 163508ms 0ms
    juju.statuseshistory 144047ms 14350ms 129697ms
           juju.txns.log 89178ms 75184ms 13994ms
             juju.leases 67510ms 30105ms 37404ms
 presence.presence.pings 29974ms 5673ms 24300ms
           juju.statuses 26888ms 19873ms 7015ms
        juju.controllers 24915ms 24907ms 7ms
              juju.units 18825ms 18825ms 0ms

                      ns total read write 2017-04-06T08:32:41Z
          local.oplog.rs 703089ms 80166ms 622923ms
    juju.statuseshistory 226335ms 14032ms ...


Revision history for this message
John A Meinel (jameinel) wrote :

Some thoughts on this:
1) If you can afford the downtime, bringing down the controller and just dropping presence.presence and presence.beings should be fine. The design of the system is such that it would be catastrophic if 2 running pingers used the same identifier (they would increment the same field which would cause overflow potentially changing the live status of lots of other agents). As such, everything always asks for a new runtime index whenever they connect to a controller. That's why presence.beings grows to be so large, but it should mean that as long as you go offline, you can drop everything, and when they reconnect they'll start back at the beginning.

2) We should be able to have a live process that cleans out presence.pings, as we should only ever look at a very narrow window of the data. We're considering how to integrate that into our existing pruning logic now (a rough sketch of the idea follows this list).

3) Pruning presence.beings is a lot trickier because of that 'nobody can ping to the same slot' rule. If we could be confident across all HA machines that nobody connected is using a value, then we should be able to drop that value. The problem in HA is that you have to coordinate who is connected, and if we're doing that, we've already solved the problem of getting rid of the presence database entirely.

4) It also seems presence.beings is the much bigger issue here. There was an optimization put in at one point to load all of the beings on startup, because loading them one-by-one on demand had shown up as very slow. However:
 a) given long running controllers and unbounded growth, likely the size of the active set is now << the total size, such that one-by-one ends up faster.
 b) I have a sneaking suspicion that it is no longer loaded just 1 time at startup. The presence table is now partitioned by model-uuid, which means there isn't 1 singleton responsible for tracking all of the units that will ever connect. That means we likely re-read all of the beings for a model every time we create a State object for that model, which probably triggers on every new connection.

5) We have a lot of repeats on beings mappings. The production DB has 300k entries for 103 models, and machine 0 (which I'm assuming is the controller) has 991 entries.

6) The actual docs themselves got a lot bigger. While they used to be 2 small values (an integer and a short key) they now embed 2 copies of the model-uuid (one in the _id field and one in the model-uuid field).

7) It feels like we could do something with "if they aren't active in the last N pings, and they aren't the highest value for a key, remove the being entry". We could even be conservative and save the last N values. I don't think we have to, though, because the actual "is this alive" logic only respects whoever was the last person to connect as 'key'. So we might lose track of a given being and not be able to map back to who we thought was pinging, but we wouldn't treat that agent as alive anyway.
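
To make point 2 concrete, a rough sketch of what live pruning of presence.pings could look like; the "slot" field name and the connection details are assumptions for illustration, not the actual schema or the eventual implementation:

package main

import (
    "log"
    "time"

    mgo "gopkg.in/mgo.v2"
    "gopkg.in/mgo.v2/bson"
)

const (
    slotPeriod = 30 * time.Second
    retention  = 10 * time.Minute // far more than the two slots liveness needs
)

// prunePings deletes ping documents for slots older than the retention window.
// The "slot" field name is an assumption about the schema.
func prunePings(pings *mgo.Collection) error {
    cutoff := time.Now().Add(-retention).Unix()
    cutoff -= cutoff % int64(slotPeriod/time.Second) // align to a slot boundary
    info, err := pings.RemoveAll(bson.M{"slot": bson.M{"$lt": cutoff}})
    if err != nil {
        return err
    }
    log.Printf("pruned %d old ping docs", info.Removed)
    return nil
}

func main() {
    // Assumed connection details; a real controller requires auth and TLS.
    session, err := mgo.Dial("127.0.0.1:37017")
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()

    if err := prunePings(session.DB("presence").C("presence.pings")); err != nil {
        log.Fatal(err)
    }
}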

Changed in juju:
status: Triaged → In Progress
assignee: nobody → Menno Smits (menno.smits)
Revision history for this message
Menno Finlay-Smits (menno.smits) wrote :

https://github.com/juju/juju/pull/7230 takes care of #6 above (low hanging fruit and also speeds up loading of beings docs by using an index).

Revision history for this message
Junien F (axino) wrote :

FWIW, I was able to afford the downtime and drop the "presence" database. This helped a lot, and is an OK workaround for some environments.
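
For anyone else considering this route, a rough sketch of the workaround with the mgo driver; it assumes the controller (jujud) is stopped first, and the connection details (no auth or TLS shown) are placeholders:

package main

import (
    "log"

    mgo "gopkg.in/mgo.v2"
)

func main() {
    // Run only with the controller stopped, so no pingers are active.
    session, err := mgo.Dial("127.0.0.1:37017")
    if err != nil {
        log.Fatal(err)
    }
    defer session.Close()

    // Drop the whole presence database; it is rebuilt as agents reconnect.
    if err := session.DB("presence").DropDatabase(); err != nil {
        log.Fatal(err)
    }
}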

John A Meinel (jameinel)
Changed in juju:
assignee: Menno Smits (menno.smits) → John A Meinel (jameinel)
Revision history for this message
John A Meinel (jameinel) wrote :

https://github.com/juju/juju/pull/7248 creates pruning functionality and runs it live.

Changed in juju:
milestone: none → 2.2-beta3
John A Meinel (jameinel)
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released