Refreshing locks crashes

Bug #1895952 reported by Michael Still
This bug affects 2 people
Affects   Status         Importance   Assigned to     Milestone
etcd3gw   Fix Released   Undecided    Mitya Eremeev

Bug Description

It looks to me like the lease doesn't have the expected TTL field?

Sep 17 07:56:09 cbr-sf-2 INFO sf-queues-1600328820.1042686-000[2587413] Refreshing lock /sf/image/sf-2/1b01f4bcb02f3a060610a4f73b34012d59197a12c2794b495dd583e43d0f65e8; method=etcd.py:120:refresh_lock()
Sep 17 07:56:09 cbr-sf-2 ERROR sf-queues-1600328820.1042686-000[2587413] [Exception] Ignored error in sf-queues: 'TTL'
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/shakenfist/daemons/queues.py", line 53, in handle
    image_fetch(task.get('url'), instance_uuid)
  File "/usr/local/lib/python3.8/dist-packages/shakenfist/daemons/queues.py", line 105, in image_fetch
    img.get([], instance)
  File "/usr/local/lib/python3.8/dist-packages/shakenfist/images.py", line 94, in get
    db.refresh_lock(lock)
  File "/usr/local/lib/python3.8/dist-packages/shakenfist/db.py", line 42, in refresh_lock
    etcd.refresh_lock(lock, relatedobjects=relatedobjects)
  File "/usr/local/lib/python3.8/dist-packages/shakenfist/etcd.py", line 121, in refresh_lock
    lock.refresh()
  File "/usr/local/lib/python3.8/dist-packages/etcd3gw/lock.py", line 101, in refresh
    return self.lease.refresh()
  File "/usr/local/lib/python3.8/dist-packages/etcd3gw/lease.py", line 64, in refresh
    return int(result['result']['TTL'])
KeyError: 'TTL'
; method=util.py:227:ignore_exception()

Michael Still (mikal) wrote :

Looks like https://github.com/dims/etcd3-gateway/issues/1 is the same thing, but I am able to reliably make it happen.

Michael Still (mikal) wrote :

(Oh, and the FAQ link on the GitHub bug report now points to an unrelated page.)

Michael Still (mikal) wrote :

This appears to happen when you try to refresh an expired lock. Regardless, etcd3gw should not throw an exception in this case. A worked example:

import time

from etcd3gw.client import Etcd3Client
from etcd3gw.lock import Lock

def main():
    client = Etcd3Client()
    lock = Lock('foo-%s' % time.time(), ttl=30, client=client)

    print('Acquire: %s' % lock.acquire())

    print('Acquired: %s' % lock.is_acquired())
    print('Refresh quickly: %s' % lock.refresh())
    time.sleep(60)

    # The lease has expired by now, so the refresh below raises this traceback:
    #
    # Traceback (most recent call last):
    # File "ttl.py", line 18, in <module>
    # main()
    # File "ttl.py", line 14, in main
    # print('Refresh slowly: %s' % lock.refresh())
    # File "/usr/local/lib/python3.8/dist-packages/etcd3gw/lock.py", line 101, in refresh
    # return self.lease.refresh()
    # File "/usr/local/lib/python3.8/dist-packages/etcd3gw/lease.py", line 64, in refresh
    # return int(result['result']['TTL'])
    # KeyError: 'TTL'

    print('Acquired: %s' % lock.is_acquired())
    print('Refresh slowly: %s' % lock.refresh())

if __name__ == '__main__':
    main()

Which will produce this output:

$ python3 ttl.py
Acquire: True
Acquired: True
Refresh quickly: 30
Acquired: False
Traceback (most recent call last):
  File "ttl.py", line 35, in <module>
    main()
  File "ttl.py", line 31, in main
    print('Refresh slowly: %s' % lock.refresh())
  File "/usr/local/lib/python3.8/dist-packages/etcd3gw/lock.py", line 101, in refresh
    return self.lease.refresh()
  File "/usr/local/lib/python3.8/dist-packages/etcd3gw/lease.py", line 64, in refresh
    return int(result['result']['TTL'])
KeyError: 'TTL'
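
The failing expression is `int(result['result']['TTL'])`, which assumes the keepalive response always carries a `TTL` key. A minimal sketch of a more tolerant lookup follows. The function name `ttl_from_keepalive`, the `-1` sentinel for an expired lease, and the sample response shapes are assumptions made for illustration; they are inferred from the traceback above, not captured from a real etcd server, and this is not necessarily how the merged patch handles it.

```python
def ttl_from_keepalive(response, expired=-1):
    """Return the TTL from a lease keepalive response dict.

    Hypothetical helper: falls back to `expired` when the 'TTL' key is
    absent, which is what the traceback suggests happens once the lease
    has expired.
    """
    result = response.get('result', {})
    return int(result.get('TTL', expired))


# Illustrative response shapes (assumed, not real server output).
live = {'result': {'ID': '1234', 'TTL': '30'}}
gone = {'result': {'ID': '1234'}}

print(ttl_from_keepalive(live))
print(ttl_from_keepalive(gone))
```

Returning a sentinel rather than raising lets a caller distinguish "lease still alive" from "lease gone" with a simple comparison, at the cost of needing to check the return value.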

Mitya Eremeev (mitos)
Changed in python-etcd3gw:
assignee: nobody → Mitya Eremeev (mitos)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to etcd3gw (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/etcd3gw/+/843003

OpenStack Infra (hudson-openstack) wrote : Fix merged to etcd3gw (master)

Reviewed: https://review.opendev.org/c/openstack/etcd3gw/+/843003
Committed: https://opendev.org/openstack/etcd3gw/commit/e35c7aa1f617278c17649206f98cceb8800bda98
Submitter: "Zuul (22348)"
Branch: master

commit e35c7aa1f617278c17649206f98cceb8800bda98
Author: Mitya_Eremeev <email address hidden>
Date: Mon May 23 19:45:04 2022 +0300

    Handle refreshing of expired lease.

    Refreshing of expired lease causes non-obvious error:
    Retrying tooz.drivers.etcd3gw.Etcd3Driver.heartbeat
    in 1.0 seconds as it raised KeyError: 'TTL'.
    The patch handles the error.

    Closes-Bug: 1895952
    Change-Id: I440cedb711149a5f12eb2311e78181b01666d274

Changed in python-etcd3gw:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/etcd3gw 2.0.0

This issue was fixed in the openstack/etcd3gw 2.0.0 release.
