On Monday, 2014-03-31, Paul reported:
> This morning I decided to visit KARL. I first did a ping on
> <production> to make sure there were no DNS issues. My KARL home
> page is the communities listing.
>
> We have an improved "timeit" at the bottom of every screen. Above
> the copyright statement is a blank line. Mouse over it and you see:
>
> - The execution time in the middleware stack
> - The name of the app server that served that request
>
> From this I could see <appserver1> took 28.87 sec to render the
> results. That, obviously, is a crazy number. The communities listing
> is a very popular request, so the cache should be well primed.
>
> I decided to look a little more into it. The
> var/log/connection_stats.csv is Tres's log of how many ZODB objects
> were loaded for slow requests. Here is the line for that request,
> showing 5,119 objects were loaded (not a crazy amount):
>
> "2014-03-31 08:00:04", "GET",
> "<hostname>/communities/active_communities.html",
> 28.865574, 5119, 0
>
> Interestingly (perhaps), that is the only entry in the last 39
> minutes, meaning we haven't had many slow requests on that app
> server.
>
> Now, over to DataDog. At 8AM I see a little spike in CPU, up to 20%
> system. I see a big spike in memcache bytes read at that instant:
> 96 KB, which appears to be the peak amount for all the spikes in the
> last month. The 28.8s shows up in permetrics.tween.t.95percentile.
>
> Ok, getting a little more interesting. Shane has a permetric
> CS.karl.models.catalog.CachingCatalog.search.t.95percentile.
> Basically, time spent executing a catalog operation. At 8AM I see a
> huge jump to... 28.8 seconds. So the time spent on my long request
> was (not surprisingly) in the catalog.
>
> None of the relstorage metrics show anything significant at the 8AM
> timeframe. It would appear that the relevant objects were fetched
> from memcache. I need to get the DataDog agent running again on the
> other app servers, the frontend, and the db server. (They were
> running previously, but I never wired them up to restart on VM
> reboot.)
In September I will ask ChrisR if there is a one-hour science experiment that will confirm or deny whether we have a weird config. If it's just a fishing expedition, I'll close this.
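For readers unfamiliar with the "timeit" footer Paul mentions, here is a minimal sketch of how such a per-request timing wrapper might look, assuming a Pyramid-style tween interface (a factory that wraps the downstream handler). Every name below is hypothetical, not KARL's actual code:

```python
import socket
import time

def timing_tween_factory(handler, registry=None):
    """Hypothetical tween: time each request and record which app
    server handled it, the two facts shown in the footer mouseover."""
    def timing_tween(request):
        start = time.time()
        response = handler(request)
        elapsed = time.time() - start
        # A real implementation would render these into the page footer;
        # here we just attach them to a dict-like response for illustration.
        response["X-Render-Time"] = "%.3f" % elapsed
        response["X-App-Server"] = socket.gethostname()
        return response
    return timing_tween
```

The same elapsed value is presumably what feeds the permetrics.tween.t.95percentile metric, which is why the 28.8s outlier shows up there.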