High traffic can result in HTTP 503 errors

Bug #1835900 reported by Barry Price
Affects: Content Cache Charm
Status: Fix Released
Importance: Critical
Assigned to: Barry Price

Bug Description

In a deployment with the default 1 GB maximum cache size, the cache filled up.

For some reason, nginx wasn't able to garbage-collect in order to make more room.

This left the service responding with HTTP 503 errors instead of the requested content.

Tom Haddon (mthaddon)
Changed in content-cache-charm:
status: New → Confirmed
importance: Undecided → Critical
Barry Price (barryprice) wrote :

Working theory: the cache held sufficiently large objects, and enough slow clients were connected simultaneously, that every object in the cache was still in use when it filled up. That left no room for new objects and no candidates for deletion.

Lots of "ignore long locked inactive cache entry" messages in the nginx logs, but nothing else of interest, and no sign of nginx process restarts or segfaults.

Barry Price (barryprice) wrote :

I've reproduced this on staging via parallel requests for a large object combined with a small cache limit.
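For anyone wanting to repeat that kind of test, here is a rough sketch of a parallel-request driver in Python. It is not the script used on staging; the function names (`fetch_status`, `hammer`) and the request and worker counts are assumptions chosen for illustration. It issues many concurrent GETs for one URL and tallies the HTTP status codes that come back, so a burst of 503s is easy to spot.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request


def fetch_status(url, timeout=60):
    """Download `url` completely and return the HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Drain the body in 1 MiB chunks so the connection stays
            # busy for the whole transfer, as a real client's would.
            while resp.read(1 << 20):
                pass
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 503 are raised as HTTPError.
        return err.code


def hammer(url, requests=64, workers=32, fetch=fetch_status):
    """Issue `requests` GETs with up to `workers` in flight; tally statuses."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(fetch, [url] * requests))
```

Calling `hammer()` with the URL of a large object on the staging service returns a `Counter` keyed by status code. Connection-level failures (refused, timed out) are not caught here and will propagate as exceptions.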

Having read through the docs pretty thoroughly, there appears to be no way, AFAICS, to tell nginx "If the on-disk cache is full, just pass-through the request without attempting to cache it" (coupled with an alert to inform the operator that the cache has problems).

We ought to be able to avoid this scenario with sensibly-large cache size limits, but I wonder whether another layer might make sense here, if we can't configure a single nginx to behave in an appropriate way.

Barry Price (barryprice) wrote :

The cache size issue may be a red herring here.

The charm configures each haproxy backend (including the local nginx service) with "maxconn 16".

Removing this snippet solves the 503 issue.

The charm needs to be amended so that this is configurable per-backend, with a sensible default if it's omitted.
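As a rough illustration of what "configurable per-backend with a default" could look like, here is a minimal Python sketch that renders an haproxy `server` line. It is not the charm's actual code: the function name `server_line`, the `options` dict, the address, and the `DEFAULT_MAXCONN` value are all hypothetical. Only the `server <name> <address> maxconn <n>` syntax comes from haproxy itself.

```python
# Hypothetical fallback; not the value the charm actually ships with.
DEFAULT_MAXCONN = 2048


def server_line(name, address, options=None):
    """Render an haproxy `server` line with a per-backend maxconn.

    `options` is an assumed per-backend config mapping; when it has no
    "maxconn" key, DEFAULT_MAXCONN is used instead.
    """
    options = options or {}
    maxconn = int(options.get("maxconn", DEFAULT_MAXCONN))
    if maxconn < 0:
        raise ValueError("maxconn must be >= 0")
    return f"server {name} {address} maxconn {maxconn}"
```

With this shape, `server_line("cache", "127.0.0.1:6080", {"maxconn": 16})` reproduces the old hard-coded limit, while omitting the key falls back to the default.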

summary: - Full cache can result in HTTP 503 errors
+ High traffic can result in HTTP 503 errors
Barry Price (barryprice)
Changed in content-cache-charm:
assignee: nobody → Barry Price (barryprice)
status: Confirmed → In Progress
Barry Price (barryprice)
Changed in content-cache-charm:
status: In Progress → Fix Committed
Barry Price (barryprice)
Changed in content-cache-charm:
status: Fix Committed → Fix Released