Java version of monasca-persister appears to have a memory leak
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Monasca | Triaged | Undecided | Unassigned |
Bug Description
I've seen this a couple of times, and this time I think I've learned more. We have 3 nodes in our cluster, and all three monasca-persister processes had stopped writing to the db -- Vertica in our case. The systems were running out of memory and swapping heavily to disk. I wish I'd looked at how much memory the 3 persister processes were consuming, but restarting them freed up about 10G of memory on each system, and writing to the db started again.
When I ran jstack <pid>, there were no threads with vertica in their info (jstacks of a healthy JVM do show them). Unfortunately, Kafka consumer lag wasn't piling up -- which is a bummer, because it means you can't detect that this is happening before actually losing data. So messages were being consumed from Kafka but never made it to the db.
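For what it's worth, the manual jstack check could be automated as an in-process health check. Below is a minimal sketch, assuming the Vertica writer threads carry "vertica" in their thread names (as the healthy jstack output suggests); the class name and wiring are hypothetical, not part of monasca-persister, and it would need to run inside the persister JVM (or attach via JMX) to see its threads.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/**
 * Hypothetical health check: alarm when no live thread with "vertica" in
 * its name exists, mirroring the manual jstack inspection described above.
 */
public class VerticaWriterHealthCheck {

    /** Returns true if at least one vertica-related thread is alive. */
    public static boolean verticaThreadsPresent() {
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        // Dump all live threads without lock info (cheaper than a full jstack).
        for (ThreadInfo info : threadBean.dumpAllThreads(false, false)) {
            if (info.getThreadName().toLowerCase().contains("vertica")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        if (verticaThreadsPresent()) {
            System.out.println("OK: vertica writer threads are present");
        } else {
            System.err.println("ALARM: no vertica threads found; persister may be stuck");
            System.exit(1);
        }
    }
}
```

Wired into the persister's health or metrics endpoint, something like this would fire in exactly the state described above, where consumer lag alone stays quiet.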
All 3 persister processes were in this state, so it's as though they couldn't decide which one should persist, and none of them did.
If/when this happens again I can gather more info while the processes are in this state -- if we can think of what to look at.
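One candidate for what to capture next time is per-process memory over time, which was the missing data point here. A minimal sketch of a periodic heap log is below (the class name and interval are hypothetical); externally, `jstat -gcutil <pid> 60000` would record similar GC/heap figures without touching the process.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/**
 * Hypothetical periodic heap logger: records how much memory the JVM is
 * holding so growth over time is visible before the box starts swapping.
 * Note this only sees the Java heap; a native leak would need OS-level
 * data (e.g. RSS from ps or /proc/<pid>/status) as well.
 */
public class HeapUsageLogger {
    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memoryBean = ManagementFactory.getMemoryMXBean();
        while (true) {
            MemoryUsage heap = memoryBean.getHeapMemoryUsage();
            // getMax() can return -1 if no maximum is defined; guard for that.
            long maxMb = heap.getMax() < 0 ? -1 : heap.getMax() / (1024 * 1024);
            System.out.printf("heap used: %d MB of %d MB max%n",
                    heap.getUsed() / (1024 * 1024), maxMb);
            Thread.sleep(60_000); // log once a minute
        }
    }
}
```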
Triaged - more data is needed if/when this can be reproduced.