Content Cache Charm

High traffic can result in HTTP 503 errors

Bug #1835900 reported by Barry Price on 2019-07-09

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
Content Cache Charm	Fix Released	Critical	Barry Price

Bug Description

In a deploy with a (default) 1GB maximum cache size, the cache filled up.

For some reason, nginx wasn't able to garbage-collect in order to make more room.

This left the service responding with HTTP 503 errors instead of the requested content.

Related branches

~barryprice/content-cache-charm/+git/content-cache-charm:configurable-maxconn

Merged into content-cache-charm:master at revision af1512e877f6c0210b4e7616da763b4c75948e7f

Joel Sing (community): Approve (+1) on 2019-07-12

Stuart Bishop (community): Approve on 2019-07-11

Tom Haddon (mthaddon) on 2019-07-09

Changed in content-cache-charm:
status:	New → Confirmed
importance:	Undecided → Critical

Barry Price (barryprice) wrote on 2019-07-10:

Working theory: nginx was caching sufficiently large objects, with enough slow clients connected at once, that every object in the cache was still in use when the cache filled up. That left no room for new objects and no candidates for deletion.

Lots of "ignore long locked inactive cache entry" in the nginx logs, but nothing else of interest - and no sign of nginx process restarts or segfaults.
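To gauge how often that message appears, a log can be scanned for it. The following is a minimal sketch, not something taken from the charm: the function name and the log path in the usage note are assumptions, and only the quoted message text comes from the report above.

```python
# Message text as quoted in the comment above.
MARKER = "ignore long locked inactive cache entry"


def count_marker(lines, marker=MARKER):
    """Count how many of the given log lines contain the marker text."""
    return sum(1 for line in lines if marker in line)
```

For example, assuming the nginx error log lives at /var/log/nginx/error.log (a common location, not confirmed for this deployment), `count_marker(open("/var/log/nginx/error.log", errors="replace"))` returns the number of matching lines.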

Barry Price (barryprice) wrote on 2019-07-10:

I've reproduced this on staging via parallel requests for a large object combined with a small cache limit.
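A reproduction along these lines could be scripted roughly as follows. This is a sketch under stated assumptions, not the exact method used on staging: the function names, request counts, and worker counts are all illustrative, and the target URL is left to the caller.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request


def fetch_status(url, timeout=30):
    """GET url and return its HTTP status code (0 if the connection fails)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # drain the body so the whole object is transferred
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # error responses such as 503 arrive as exceptions
    except (urllib.error.URLError, OSError):
        return 0  # DNS failure, refused connection, timeout, etc.


def hammer(url, requests=200, workers=50, fetch=fetch_status):
    """Issue `requests` GETs with up to `workers` in flight; tally the statuses."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(fetch, [url] * requests))
```

Calling `hammer()` with the URL of a large object served through the cache returns a `Counter` keyed by status code, so any 503 responses show up as a non-zero count under the key 503.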

Having read through the docs pretty thoroughly, there appears to be no way, AFAICS, to tell nginx "If the on-disk cache is full, just pass-through the request without attempting to cache it" (coupled with an alert to inform the operator that the cache has problems).

We ought to be able to avoid this scenario with sensibly-large cache size limits, but I wonder whether another layer might make sense here, if we can't configure a single nginx to behave in an appropriate way.

Barry Price (barryprice) wrote on 2019-07-11:

The cache size issue may be a red herring here.

The charm configures each haproxy backend (including the local nginx service) with "maxconn 16".

Removing this snippet solves the 503 issue.

The charm needs to be amended so that this is configurable per-backend, with a sensible default if it's omitted.
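As background, haproxy queues connections beyond a server's maxconn and answers a queued request with a 503 once it has waited longer than the queue timeout, which would fit the symptom seen here. A per-backend setting with a fallback might look roughly like the sketch below. It is illustrative only and is not the merged change: the function name is hypothetical, and the default of 2048 is a placeholder rather than the charm's actual value.

```python
# Placeholder fallback, NOT the charm's real default.
DEFAULT_BACKEND_MAXCONN = 2048


def render_server_line(name, address, port, maxconn=None):
    """Build a haproxy 'server' line, using the default when maxconn is omitted."""
    if maxconn is None:
        maxconn = DEFAULT_BACKEND_MAXCONN
    maxconn = int(maxconn)
    if maxconn < 0:
        raise ValueError("maxconn must not be negative")
    return f"server {name} {address}:{port} maxconn {maxconn}"
```

With this shape, a backend that needs a tight limit can still pass `maxconn=16` explicitly, while backends that say nothing get the larger fallback.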

summary:

- Full cache can result in HTTP 503 errors
+ High traffic can result in HTTP 503 errors

Barry Price (barryprice) on 2019-07-11

Changed in content-cache-charm:
assignee:	nobody → Barry Price (barryprice)
status:	Confirmed → In Progress

Barry Price (barryprice) on 2019-07-12

Changed in content-cache-charm:
status:	In Progress → Fix Committed

Barry Price (barryprice) on 2019-07-12

Changed in content-cache-charm:
status:	Fix Committed → Fix Released
