tempest intermittently fails waiting for volume delete because etcd is unavailable during volume delete

Bug #1770714 reported by Matt Riedemann
This bug affects 1 person

Affects: Cinder
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Seen here:

http://logs.openstack.org/25/567625/2/gate/tempest-full/8f0fba4/controller/logs/screen-c-vol.txt.gz#_May_11_16_09_09_717256

May 11 16:09:09.717256 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server [None req-c61329b5-7af7-4905-b93c-21907aca9257 tempest-VolumesListAdminTestJSON-1432869514 None] Exception during message handling: ConnectionFailedError: Service Unavailable
May 11 16:09:09.717547 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
May 11 16:09:09.717981 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 160, in _process_incoming
May 11 16:09:09.718212 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
May 11 16:09:09.718442 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 213, in dispatch
May 11 16:09:09.718662 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
May 11 16:09:09.718882 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 183, in _do_dispatch
May 11 16:09:09.719108 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
May 11 16:09:09.719325 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server File "<decorator-gen-240>", line 2, in delete_volume
May 11 16:09:09.719542 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/coordination.py", line 165, in _synchronized
May 11 16:09:09.719801 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server lock = coordinator.get_lock(lock_name.format(**call_args))
May 11 16:09:09.720020 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server File "/opt/stack/cinder/cinder/coordination.py", line 111, in get_lock
May 11 16:09:09.720241 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server return self.coordinator.get_lock(lock_name)
May 11 16:09:09.720464 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/tooz/drivers/etcd3gw.py", line 186, in get_lock
May 11 16:09:09.720683 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server return Etcd3Lock(self, name, self.lock_timeout)
May 11 16:09:09.720901 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/tooz/drivers/etcd3gw.py", line 71, in __init__
May 11 16:09:09.721179 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server self._lease = self._coord.client.lease(self._timeout)
May 11 16:09:09.721392 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/etcd3gw/client.py", line 113, in lease
May 11 16:09:09.721693 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server json={"TTL": ttl, "ID": 0})
May 11 16:09:09.721906 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/etcd3gw/client.py", line 80, in post
May 11 16:09:09.722125 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server raise _EXCEPTIONS_BY_CODE[resp.status_code](resp.reason)
May 11 16:09:09.722344 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server ConnectionFailedError: Service Unavailable
May 11 16:09:09.722566 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server
May 11 16:09:09.772626 ubuntu-xenial-rax-ord-0003979398 cinder-volume[11787]: ERROR oslo_messaging.rpc.server [None req-c98a22d5-5d57-4b35-ba99-1107df92df43 tempest-VolumesListAdminTestJSON-2118054397 None] Exception during message handling: ConnectionFailedError: Service Unavailable
(The traceback for this second failure, req-c98a22d5, is identical to the one above.)

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ConnectionFailedError%3A%20Service%20Unavailable%5C%22%20AND%20tags%3A%5C%22screen-c-vol.txt%5C%22&from=7d

Doesn't look like this is happening much.

Tags: etcd
Matt Riedemann (mriedem) wrote:

I see this in the etcd logs around the time of the error, but it doesn't tell us anything:

http://logs.openstack.org/25/567625/2/gate/tempest-full/8f0fba4/controller/logs/screen-etcd.txt.gz

May 11 16:09:09.672196 ubuntu-xenial-rax-ord-0003979398 etcd[14958]: proto: no coders for int
May 11 16:09:09.770247 ubuntu-xenial-rax-ord-0003979398 etcd[14958]: proto: no coders for int

Because of this issue:

https://github.com/grpc-ecosystem/grpc-gateway/pull/256

Matt Riedemann (mriedem) wrote:

I don't know if there is anything Cinder can do here, except maybe put a retry wrapper on anything that hits etcd, and on a service-unavailable error retry until some timeout expires.
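A minimal sketch of what such a wrapper could look like. This is illustrative only, not Cinder code: the `ConnectionFailedError` stand-in class, the decorator name, and the timeout/interval values are all assumptions, and a real patch would catch the actual `etcd3gw` exception (or tooz's `ToozConnectionError`) instead.

```python
import functools
import time


class ConnectionFailedError(Exception):
    """Stand-in for the etcd3gw 'Service Unavailable' exception."""


def retry_on_unavailable(timeout=30, interval=1):
    """Retry the wrapped call when etcd reports itself unavailable,
    giving up (and re-raising) once `timeout` seconds have elapsed."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            deadline = time.monotonic() + timeout
            while True:
                try:
                    return func(*args, **kwargs)
                except ConnectionFailedError:
                    if time.monotonic() >= deadline:
                        raise  # etcd stayed down past the timeout
                    time.sleep(interval)
        return wrapper
    return decorator


@retry_on_unavailable(timeout=30, interval=1)
def get_volume_lock(coordinator, lock_name):
    # In the failing code path this is coordinator.get_lock(), which
    # creates an etcd lease immediately and so fails if etcd is down.
    return coordinator.get_lock(lock_name)
```

Since the traceback shows the failure happening inside the `@coordination.synchronized` decorator (lock construction creates an etcd lease before any volume work starts), the retry would need to go around the `get_lock()` call itself, not just around the critical section.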
