Investigate relstorage + memcached configuration / performance

Bug #1302750 reported by Tres Seaver
Affects: KARL3
Status: Fix Released
Importance: Low
Assigned to: Shane Hathaway

Bug Description

On Monday, 2014-03-31, Paul reported:

> This morning I decided to visit KARL. I first did a ping on
> <production> to make sure there were no DNS issues. My KARL home
> page is the communities listing.
>
> We have an improved "timeit" at the bottom of every screen. Above
> the copyright statement is a blank line. Mouseover it and you get:
>
> - The execution time in the middleware stack
> - The name of the app server that served that request
>
> From this I could see <appserver1> took 28.87 sec to render the
> results. That, obviously, is a crazy number. The communities listing
> is a very popular request. The cache should be very primed.
>
> I decided to look a little more into it. The
> var/log/connection_stats.csv is Tres's log of how many ZODB objects
> were loaded for slow requests. Here is the line for that request,
> showing 5,119 objects were loaded (not a crazy amount):
>
> "2014-03-31 08:00:04", "GET",
> "<hostname>/communities/active_communities.html",
> 28.865574, 5119, 0
>
> Interestingly (perhaps), that is the only entry in the last 39
> minutes, meaning we haven't had many slow requests on that app
> server.
>
> Now, over to DataDog. At 8AM I see a little spike in CPU, up to 20%
> system. I see a big spike in memcache bytes read at that instant:
> 96 KB, which appears to be the peak amount for all the spikes in the
> last month. The 28.8s shows up in permetrics.tween.t.95percentile.
>
> Ok, getting a little more interesting. Shane has a permetric
> CS.karl.models.catalog.CachingCatalog.search.t.95percentile.
> Basically, time spent executing a catalog operation. At 8AM I see a
> huge jump to... 28.8 seconds. So the time spent on my long request
> was (not surprisingly) in the catalog.
>
> None of the relstorage metrics show anything significant at the 8AM
> timeframe. It would appear that the relevant objects were gotten out
> of memcache. I need to get the DataDog agent running again on the
> other app servers, the frontend, and on the db server. (They were
> running previously but I never wired them up for VM reboot.)
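
The connection_stats.csv line quoted above can be unpacked with a short script. The field names here (timestamp, method, URL, duration in seconds, objects loaded, plus one unidentified trailing counter) are inferred from the sample line, not taken from the logger's actual schema, so treat them as assumptions:

```python
import csv
import io

# One line from var/log/connection_stats.csv, as quoted in the report.
sample = ('"2014-03-31 08:00:04", "GET", '
          '"<hostname>/communities/active_communities.html", '
          '28.865574, 5119, 0')

# Field names are guesses from context; the last column's meaning is unknown.
fields = ["timestamp", "method", "url", "duration_s", "objects_loaded", "extra"]

row = next(csv.reader(io.StringIO(sample), skipinitialspace=True))
record = dict(zip(fields, row))
record["duration_s"] = float(record["duration_s"])
record["objects_loaded"] = int(record["objects_loaded"])

print(record["duration_s"], record["objects_loaded"])
```

Reading the file this way makes it easy to filter for requests over a threshold, which is essentially what Paul did by eye.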

Changed in karl3:
milestone: m136 → m138
Changed in karl3:
importance: Medium → Low
Changed in karl3:
assignee: Tres Seaver (tseaver) → nobody
Changed in karl3:
milestone: m138 → m139
Revision history for this message
Paul Everitt (paul-agendaless) wrote:

In September I will ask ChrisR if there is a 1 hour science experiment that will confirm/deny whether we have a weird config. If it's just a fishing expedition, I'll close this.
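
One low-cost check along these lines: memcached itself reports cumulative get_hits and get_misses counters via its stats command, and the hit rate can be computed from them. A minimal sketch of parsing that output, assuming the standard "STAT <name> <value>" reply format (the sample numbers are made up):

```python
def memcached_hit_rate(stats_text):
    """Compute the get-request hit rate from raw `stats` command output."""
    stats = {}
    for line in stats_text.splitlines():
        parts = line.split()
        # memcached replies with lines like: STAT get_hits 1234
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    hits = int(stats["get_hits"])
    misses = int(stats["get_misses"])
    return hits / float(hits + misses) if hits + misses else 0.0

# Example with made-up numbers:
sample = "STAT get_hits 9000\nSTAT get_misses 1000\nEND"
print(memcached_hit_rate(sample))  # 0.9
```

A consistently low hit rate would be one concrete signal that the cache is not pulling its weight.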

Changed in karl3:
assignee: nobody → Chris Rossi (chris-archimedeanco)
milestone: m139 → m140
Changed in karl3:
assignee: Chris Rossi (chris-archimedeanco) → Shane Hathaway (shane-hathawaymix)
milestone: m140 → m139
Paul Everitt (paul-agendaless) wrote:

Shane, do you think this one is worth doing? I don't think we have good visibility into how memcached is or isn't helping us. We certainly know it is causing upgrade pain, plus introducing some mystery into how we reason about performance.
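
For reference, memcached integration in RelStorage of that era was controlled by a couple of ZConfig options on the storage section. A sketch of the relevant fragment, in which the server address, cache module, and DSN are placeholders rather than KARL's actual values:

```
%import relstorage
<zodb main>
  <relstorage>
    # Placeholder memcached endpoint; multiple servers may be listed.
    cache-servers localhost:11211
    # Client library wrapper; 'relstorage.pylibmc_wrapper' is another option.
    cache-module-name memcache
    <postgresql>
      dsn dbname='karl_placeholder'
    </postgresql>
  </relstorage>
</zodb>
```

Removing or changing the cache-servers line is the knob any "science experiment" here would be turning.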

Changed in karl3:
milestone: m139 → m140
Paul Everitt (paul-agendaless) wrote:

I'm going to close this. I don't think there's anything we can do about memcached except guess and hope.

Changed in karl3:
status: New → Fix Released