Outage blip during site config changes

Bug #1881315 reported by John Losito
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
Content Cache Charm
Fix Released
Critical
Haw Loeung

Bug Description

I've been noticing quick outage blips during deployments against the content-cache charm. When rolling out a new site against the charm, all of the sites which are being served from content-cache become unresponsive for a second or two while the new configs get picked up. I've seen reports of others caching 4** errors while this is happening and they try to access the site.

This could be related to the haproxy bug. It might worth draining all current connections somehow before making the switch over.

Related branches

Junien F (axino)
Changed in content-cache-charm:
importance: Undecided → Critical
status: New → Triaged
Revision history for this message
Haw Loeung (hloeung) wrote :

Draining works for backends configured in content-cache, not when clients are hitting the content-cache units themselves.

Changed in content-cache-charm:
status: Triaged → Invalid
Revision history for this message
Haw Loeung (hloeung) wrote :

FWIW, this is an issue within HAProxy itself, which we'll need to debug. It also happens with TLS/SSL certificates are refreshed and the autocert charm issues a HAProxy reload which is nothing content-cache related.

Revision history for this message
Junien F (axino) wrote :

Even though the root cause is within haproxy, this is still a charm bug IMO.
Is there an haproxy bug somewhere tracking this ?

Thanks

Changed in content-cache-charm:
status: Invalid → Confirmed
Revision history for this message
Haw Loeung (hloeung) wrote : Re: [Bug 1881315] Re: Outage blip during site config changes

On Fri, Jul 24, 2020 at 12:54:49PM -0000, Junien Fridrick wrote:
> Even though the root cause is within haproxy, this is still a charm bug IMO.

I disagree.

> Is there an haproxy bug somewhere tracking this ?
>

Not yet, will have to try reproduce it with the latest HAProxy (2.2)
before filing a bug upstream.

Haw Loeung (hloeung)
Changed in content-cache-charm:
assignee: nobody → Haw Loeung (hloeung)
status: Confirmed → In Progress
Revision history for this message
Haw Loeung (hloeung) wrote :

The issue was to do with sites config having high check interval (60 seconds). This was to workaround nbproc DoS'ing backends. Things have changed since then with content-cache switching HAProxy to using nbthread.

The sites configs have been changed reducing check interval and this no longer happens. As a safe guard, and to reduce this from happening in the future, the charm now implements a constant 2s interval for the caching layer. This obviously won't help for sites configured with redirects 301s and 302s which aren't cached and needs to be passed through to the backends.

Changed in content-cache-charm:
status: In Progress → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.