Comment 3 for bug 1036927

Evan (ev) wrote: Re: [Bug 1036927] Re: i386 retracers crashed

On Fri, Aug 24, 2012 at 12:33 PM, JuanJo Ciarlante
<email address hidden> wrote:
> FTR happened again (amd64)

I've been looking into this for most of the day, and so far I'm at a loss
as to what could be causing it, other than the fact that the retracers
have been successfully running against 12.04 since the 15th (they were
broken between 6/18 and 8/15).

We're reading and writing everywhere at ConsistencyLevel ONE, so it
seems really odd that we're getting timeouts when talking to three
separate nodes. We're not running any back-population queries, and we
haven't introduced any changes that would dramatically increase the
load on Cassandra (like counting columns, which was the cause the last
time the retracers started timing out).
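Why ConsistencyLevel ONE makes this odd can be sketched with a simplified model. It assumes a replication factor of 3, which the comment does not state, and it ignores coordinator and read-repair details. Under those assumptions, a request at ONE only needs one replica to answer in time. A timeout at ONE therefore suggests every replica (or the coordinator itself) was slow, rather than a single bad node.

```python
# Simplified, illustrative model of Cassandra consistency levels.
# Assumption: replication factor equals the number of replicas passed in.
def required_acks(level: str, replication_factor: int) -> int:
    """Replica acknowledgements needed for a request to succeed."""
    level = level.upper()
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return replication_factor // 2 + 1
    if level == "ALL":
        return replication_factor
    raise ValueError(f"unsupported consistency level: {level}")


def times_out(level: str, replica_ok: list) -> bool:
    """True if fewer replicas answered in time than the level requires."""
    answered = sum(1 for ok in replica_ok if ok)
    return answered < required_acks(level, len(replica_ok))


# Assumed: three replicas, one per node in a three-node ring.
print(times_out("ONE", [False, False, True]))     # False: one reply is enough
print(times_out("ONE", [False, False, False]))    # True: only if every replica is slow
print(times_out("QUORUM", [False, False, True]))  # True: needs 2 of 3
```

With ONE, two slow nodes out of three still shouldn't produce a timeout in this model. That is why timeouts across all three nodes point toward something shared, such as the client path or overall ring load.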

Can someone on webops tell me what
/srv/daisy.ubuntu.com/production/local_config/local_config.py on
gremlin or cherufe has for cassandra_host? Are we using round-robin,
or are the Apache frontends communicating with the Cassandra ring
solely through a single node?
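The difference behind the question about cassandra_host can be sketched as follows. The host names here are hypothetical, and this is not how daisy's client library actually selects hosts. With a single configured node, that node coordinates every request. With round-robin, the coordinator load is spread across the ring.

```python
import itertools

# Hypothetical host names; the real cassandra_host value is unknown here.
RING = ["cassandra-a", "cassandra-b", "cassandra-c"]


def single_node(hosts):
    """Every request is coordinated by the same node."""
    while True:
        yield hosts[0]


def round_robin(hosts):
    """Requests rotate across all nodes in the ring."""
    return itertools.cycle(hosts)


def distribution(chooser, requests):
    """Count how many of `requests` requests each host would coordinate."""
    counts = {}
    for host in itertools.islice(chooser, requests):
        counts[host] = counts.get(host, 0) + 1
    return counts


print(distribution(single_node(RING), 9))  # {'cassandra-a': 9}
print(distribution(round_robin(RING), 9))  # 3 requests per host
```

If the frontends turn out to use a single node, that node becomes a coordinator hotspot. Under load, it could time out even at ConsistencyLevel ONE.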

As mentioned on IRC, I'm definitely seeing the same kind of weird
behaviour we had before we stopped doing large column counts on the
front page of errors.ubuntu.com: my ssh tunnel to jumbee keeps dying or
dropping requests until I restart it.