memcached is a spof

Bug #1869797 reported by Andrea Ieri
This bug affects 3 people
Affects                     Status    Importance  Assigned to  Milestone
Gnocchi Charm               Triaged   Wishlist    Unassigned
OpenStack Designate Charm   Triaged   Wishlist    Unassigned
charms.openstack            Invalid   Undecided   Unassigned
memcached-charm             Triaged   Wishlist    Unassigned

Bug Description

We have a clustered 2-unit memcached application related to three gnocchi units.
In gnocchi.conf coordination_url points to a specific memcached unit (the leader in my case, not sure if that's just a coincidence).
If that unit goes offline, or if the memcached service dies, the gnocchi config obviously remains as is, and since memcached does not use VIPs, gnocchi itself becomes inoperable.
Since there's no failover mechanism, having a second memcached unit is effectively irrelevant.

I've skimmed through the code but could not determine whether the issue lies with memcached or gnocchi.
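For reference, the rendered setting looks something like the fragment below (the section name and address are illustrative, not taken from an actual deployment). The memcached tooz backend names exactly one host:port, which is why whichever unit gets rendered becomes a single point of failure:

```ini
[storage]
# tooz coordination backend; only one memcached host can be named here,
# so if 10.5.0.12 goes offline, gnocchi loses coordination entirely,
# regardless of how many other memcached units are related.
coordination_url = memcached://10.5.0.12:11211
```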

Revision history for this message
Andrea Ieri (aieri) wrote :

In case it's relevant:

os-release: bionic
openstack release: queens
gnocchi rev 30
memcached rev 26

Revision history for this message
Junien F (axino) wrote :

Also, last time I looked (and it was a while ago), the memcached charm didn't actually set up a cluster when deploying multiple units of the same app. It just did nothing special, so all units were separate. Worth checking if you want to unspof it.

Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

In the OpenStack charms, this is a result of the following code: https://github.com/openstack/charms.openstack/blob/94d446433b415a31afba8c540e55772877476930/charms_openstack/adapters.py#L186

As Junien says, the memcached charm doesn't do any sort of HA by default, but this can be configured via a charm option: repcached
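The pattern in question can be sketched as follows (a hypothetical illustration from memory, not the actual charms.openstack code; the function name and port are assumptions). Given the addresses of all related memcached units, only one ends up in the rendered URL:

```python
# Hypothetical sketch of the adapter pattern: even with several memcached
# units related, the memcached:// tooz backend takes a single host:port,
# so only one address can be rendered into the config.
def coordinator_memcached_url(unit_addresses):
    """Build a tooz coordination URL from related memcached unit addresses."""
    if not unit_addresses:
        return None
    # Picking the first (or leader) address makes that unit a SPOF.
    return "memcached://{}:11211".format(unit_addresses[0])

print(coordinator_memcached_url(["10.5.0.12", "10.5.0.13"]))
# Only 10.5.0.12 appears in the URL; the second unit is never used.
```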

Changed in charm-gnocchi:
status: New → Confirmed
Changed in charms.openstack:
status: New → Confirmed
Revision history for this message
Andrea Ieri (aieri) wrote :

True, but repcached only supports trusty, so it's not really an option anymore

Revision history for this message
Chris Sanders (chris.sanders) wrote :

Subscribing field-high. We have recently confirmed that the replication workaround for this bug is only available on clouds <= trusty; on clouds > trusty the loss of a single memcached unit causes cloud API outages.

Revision history for this message
Alex Kavanagh (ajkavanagh) wrote :

It's not really a charms.openstack issue. The comment capturing where it is referenced in charms.openstack is useful. Ultimately it's an architecture issue that affects gnocchi (and potentially designate?)

Changed in charms.openstack:
status: Confirmed → Invalid
Revision history for this message
Andrea Ieri (aieri) wrote :

I've added charm-designate as I've confirmed it is affected as well.

description: updated
Changed in charm-memcached:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Peter Sabaini (peter-sabaini) wrote :

Discussed this with the team; the work for memcached-charm will need more scheduling. Marking as wishlist for now.

Changed in charm-memcached:
importance: High → Wishlist
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

There is some analysis here as to what might be an alternative:

https://bugs.launchpad.net/charm-designate/+bug/1759597

Revision history for this message
Trent Lloyd (lathiat) wrote :

I have done some research in the following related charm-nova-cloud-controller bug, specifically around fixing many of the scenarios that cause an outage when memcached is down. Ideally an outage shouldn't happen at all, and we can make some fixes around that outside of HA:

https://bugs.launchpad.net/charm-nova-cloud-controller/+bug/1827397

I will follow up once I determine a path of action in this and/or other bugs to generally improve the situation.

Changed in charm-designate:
status: New → Triaged
Changed in charm-gnocchi:
status: Confirmed → Triaged
importance: Undecided → Wishlist
Changed in charm-designate:
importance: Undecided → Wishlist
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Triaged charm-designate and charm-gnocchi bugs in the same way as the memcached charm.

One possibility of addressing this is switching the tooz backend to something else (like etcd).

This will not help with charm-nova-cloud-controller, because Nova does not rely on tooz for talking to memcached, whereas designate and gnocchi use tooz for coordination.
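To illustrate why a backend switch is attractive (a minimal sketch; the helper name and endpoints are made up, not part of tooz or any charm): tooz selects its driver purely from the URL scheme, so moving gnocchi/designate from memcached to etcd would be a configuration change to coordination_url, not a code change in the services themselves:

```python
from urllib.parse import urlparse

# Illustrative helper: tooz picks its coordination driver from the URL
# scheme, so swapping backends only means rendering a different URL.
def backend_of(coordination_url):
    """Return the tooz driver name implied by a coordination URL."""
    return urlparse(coordination_url).scheme

print(backend_of("memcached://10.5.0.12:11211"))  # single host, hence the SPOF
print(backend_of("etcd3+http://10.5.0.20:2379"))  # can front a clustered etcd
```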

There are multiple ways Nova can use memcached:

* for oslo_cache;
* for service groups.

The service group part seems to be unaffected as the database driver is the default one and we do not override it in charm-nova-cloud-controller:
https://docs.openstack.org/nova/latest/admin/service-groups.html
https://github.com/openstack/nova/blob/stable/stein/nova/conf/servicegroup.py#L19-L23 (the default driver is "db", not "mc")
https://github.com/openstack/nova/blob/stable/stein/nova/cache_utils.py#L69-L95
https://github.com/openstack/nova/blob/stable/stein/nova/servicegroup/drivers/mc.py#L37
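In nova.conf terms, the default the links above point to looks like this (an illustrative fragment; the value shown is the upstream default, which charm-nova-cloud-controller does not override):

```ini
[servicegroup]
# Default driver. Setting this to "mc" would move service liveness
# tracking onto memcached and make it sensitive to a memcached outage;
# with "db" it stays on the database and is unaffected.
driver = db
```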
