We now specify multiple hosts for the retracers to talk to and Tom has confirmed that we're seeing much more balanced traffic:
https://pastebin.canonical.com/73229/
However, we're still seeing problems. I'm beginning to wonder if we're trying to insert too much data into a single row at once and hitting limits in Thrift. It would be extremely helpful to see a number of recent exceptions. If they all say "connection reset by peer" and are triggered from stack_fam.insert, then I'm pretty sure we need to split these large inserts apart.
Unfortunately I cannot check this myself as the log syncing to snakefruit does not appear to be working.
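
For reference, here is a rough sketch of what splitting the large inserts could look like. It assumes stack_fam.insert takes a row key and a dict of columns (as pycassa's ColumnFamily.insert does). The helper name insert_in_chunks and the chunk size of 1000 are made up for illustration and would need tuning against whatever limit we are actually hitting:

```python
from itertools import islice


def insert_in_chunks(insert, key, columns, chunk_size=1000):
    """Write `columns` to row `key` as several smaller inserts.

    `insert` is any callable taking (key, dict_of_columns); the intent
    is to pass stack_fam.insert. Returns the number of inserts issued.
    chunk_size is a guess, not a measured Thrift limit.
    """
    items = iter(columns.items())
    calls = 0
    while True:
        # Pull at most chunk_size columns off the iterator.
        chunk = dict(islice(items, chunk_size))
        if not chunk:
            return calls
        insert(key, chunk)
        calls += 1
```

The call site would then become something like insert_in_chunks(stack_fam.insert, key, columns). Note that the row is no longer written in a single call, so a failure part way through leaves a partially written row that would need to be retried.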