oslo.cache's pymemcache backend doesn't recover from socket disconnection

Bug #1934130 reported by Damien Ciabrini
This bug affects 1 person
Affects   Status      Importance   Assigned to       Milestone
tripleo   Confirmed   Undecided    Damien Ciabrini

Bug Description

When oslo.cache is enabled and configured to use the pymemcache backend (e.g. memcached + TLS-e),
pymemcache manages the sockets that connect to memcached.

With this configuration, pymemcache does not automatically retry on socket error
or socket disconnection. Instead, it closes the invalid socket and raises
an exception up the stack. This makes the oslo.cache call fail, and subsequent
calls keep failing until every bad socket has been hit and closed.
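To illustrate what an automatic retry would change, here is a minimal, self-contained sketch. It does not use pymemcache or oslo.cache: FakeConnectionError, FlakyCache and call_with_retry are hypothetical names made up for this example, and the "dead socket" is only simulated with a counter.

```python
import time


class FakeConnectionError(Exception):
    """Stand-in for the socket error a cache client would raise (hypothetical)."""


def call_with_retry(func, attempts=2, delay=0.0, retry_for=(FakeConnectionError,)):
    """Call func(), retrying on the listed exceptions, for `attempts` tries in total."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_for:
            if attempt == attempts:
                # Out of tries: let the caller see the error.
                raise
            time.sleep(delay)


class FlakyCache:
    """Hypothetical cache whose first `bad_sockets` calls each hit a dead socket."""

    def __init__(self, bad_sockets):
        self.bad_sockets = bad_sockets

    def get(self, key):
        if self.bad_sockets > 0:
            # The dead socket is closed and discarded, then the error is raised.
            self.bad_sockets -= 1
            raise FakeConnectionError("socket closed by peer")
        return f"value-for-{key}"
```

With one bad socket, call_with_retry(lambda: cache.get("k")) succeeds on the second try, whereas a bare cache.get("k") would raise. With more bad sockets than attempts the error still reaches the caller, so the number of attempts would have to be chosen with the pool size in mind.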

This can consistently be triggered by:
  1. running "openstack service list" on the overcloud to create connections to memcached

  2. restarting memcached with "systemctl restart tripleo_memcached" to
     force memcached to close its side of each connected socket.
     This will leave <x> half-closed sockets on the controller:
     the keystone service will still have its side of each socket
     open.

  3. the next call to "openstack service list" will fail because
     pymemcache will hit a half-closed socket, close its side, and
     raise an exception

  4. the keystone service will recover only once the remaining <x>-1 half-closed sockets
     have been hit and closed.
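The steps above can be modelled with a small simulation. This is only a sketch under stated assumptions: FakePool, HalfClosedSocket and failures_until_recovery are hypothetical names, connections are plain booleans rather than real sockets, and the pool is assumed to hand out connections in round-robin order, which may differ from the real pool.

```python
from collections import deque


class HalfClosedSocket(Exception):
    """Stand-in for the error raised when a half-closed socket is used."""


class FakePool:
    """Hypothetical pool of <x> connections; True means the connection is usable."""

    def __init__(self, size):
        self.conns = deque([True] * size)

    def server_restart(self):
        # Step 2: every pooled connection becomes half-closed,
        # but the client side does not notice yet.
        self.conns = deque([False] * len(self.conns))

    def request(self):
        healthy = self.conns.popleft()
        if not healthy:
            # Steps 3 and 4: the bad connection is closed, a fresh one
            # takes its place for later calls, and this call fails.
            self.conns.append(True)
            raise HalfClosedSocket("peer closed the connection")
        self.conns.append(healthy)
        return "ok"


def failures_until_recovery(pool, max_calls=100):
    """Count failed requests before the first successful one."""
    failures = 0
    for _ in range(max_calls):
        try:
            pool.request()
            return failures
        except HalfClosedSocket:
            failures += 1
    raise RuntimeError("pool never recovered")
```

In this model, a pool of <x> connections produces <x> failed requests after a restart (the first failure from step 3 plus the remaining <x>-1 from step 4) before requests succeed again.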

Changed in tripleo:
milestone: xena-2 → xena-3