Watcher cannot be started after start-stop sequence

Bug #1790912 reported by Michal Dulko
This bug affects 1 person
Affects: kuryr-kubernetes
Status: Fix Released
Importance: High
Assigned to: Michal Dulko

Bug Description

Our watcher code is pretty much spaghetti, so it's no surprise this happened. In a controller HA configuration, leadership transitions may cause the Watcher to go through a sequence like this:

watcher.start()
...
watcher.stop()
...
watcher.start()

It turns out the watcher cannot be started again. This is because we assumed the watcher threads would clean up after themselves. Meanwhile, stop() simply kills those threads with the thread.stop() method, so a thread never gets the chance to do its cleanup.
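
The failure mode can be illustrated with a toy model. This is only a sketch under assumptions, not the actual kuryr-kubernetes code: the class ToyWatcher, the `started` list, and the example path are hypothetical, and no real threads are used. The only name borrowed from this report is the `_watching` dict, which the fix commit mentions. The sketch assumes start() skips any path that is already present in `_watching`, and that a killed worker leaves its entry behind.

```python
class ToyWatcher:
    """Toy model of the failure; NOT the kuryr-kubernetes Watcher."""

    def __init__(self, paths):
        self._resources = set(paths)
        self._watching = {}   # path -> handle of the worker watching it
        self.started = []     # hypothetical log of every worker spawned

    def start(self):
        # Only spawn workers for paths not already marked as watched.
        for path in sorted(self._resources - set(self._watching)):
            self._watching[path] = object()  # stand-in for a thread handle
            self.started.append(path)

    def stop(self):
        # Models a hard kill: each worker dies before it can remove its
        # own path from _watching, so the stale entries stay behind.
        for path in list(self._watching):
            self._kill(path)

    def _kill(self, path):
        # The worker is gone, but self._watching[path] is left untouched.
        pass


watcher = ToyWatcher(["/api/v1/pods"])
watcher.start()
watcher.stop()
watcher.start()  # sees the stale entry and spawns nothing
print(watcher.started)
```

In this model the second start() is a no-op, because the stale entry makes the path look as if it were still being watched.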

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kuryr-kubernetes (master)

Fix proposed to branch: master
Review: https://review.openstack.org/600142

OpenStack Infra (hudson-openstack) wrote : Fix proposed to kuryr-kubernetes (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/601284

OpenStack Infra (hudson-openstack) wrote : Fix merged to kuryr-kubernetes (master)

Reviewed: https://review.openstack.org/600142
Committed: https://git.openstack.org/cgit/openstack/kuryr-kubernetes/commit/?id=346f76292af3ccb1abd59b66c52b09ef480f4717
Submitter: Zuul
Branch: master

commit 346f76292af3ccb1abd59b66c52b09ef480f4717
Author: Michał Dulko <email address hidden>
Date: Wed Sep 5 18:37:10 2018 +0200

    Clean up watch resources after watcher.stop()

    We were assuming that watcher threads will be cleaning up after
    themselves - i.e. will remove paths from Watcher._watching dict on
    Watcher._stop_watch(). Turns out _stop_watch() is killing the threads in
    a hard way using thread.stop(). This means that paths are never removed
    from Watcher._watching dict and on restart (i.e. Watcher.start()), the
    method considers that there is no path that we're not already
    processing and does nothing.

    This commit fixes that by cleaning up Watcher._watching dict in
    Watcher._stop_watch() method.

    Closes-Bug: 1790912
    Change-Id: I17baaab1769ca5882f0b8edf496f92ac39507969
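
The approach described in this commit message can be sketched roughly as follows. This is an illustrative toy, not the merged change: ToyWatcher, the `started` list, and the example path are hypothetical, and the real method kills a thread where this sketch only updates bookkeeping. The names `_watching` and `_stop_watch()` come from the commit message. The idea is that the side doing the stopping removes the path from `_watching`, since the killed worker cannot.

```python
class ToyWatcher:
    """Toy model of the fix; NOT the kuryr-kubernetes Watcher."""

    def __init__(self, paths):
        self._resources = set(paths)
        self._watching = {}   # path -> handle of the worker watching it
        self.started = []     # hypothetical log of every worker spawned

    def start(self):
        # Only spawn workers for paths not already marked as watched.
        for path in sorted(self._resources - set(self._watching)):
            self._watching[path] = object()  # stand-in for a thread handle
            self.started.append(path)

    def stop(self):
        for path in list(self._watching):
            self._stop_watch(path)

    def _stop_watch(self, path):
        # The worker would be killed here and cannot clean up, so the
        # stopping side drops the bookkeeping entry on its behalf.
        self._watching.pop(path, None)


watcher = ToyWatcher(["/api/v1/pods"])
watcher.start()
watcher.stop()
watcher.start()  # _watching is empty again, so the path is restarted
print(watcher.started)
```

With the entry removed in `_stop_watch()`, a start-stop-start sequence spawns a worker for the path on both calls to start().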

Changed in kuryr-kubernetes:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to kuryr-kubernetes (stable/rocky)

Reviewed: https://review.openstack.org/601284
Committed: https://git.openstack.org/cgit/openstack/kuryr-kubernetes/commit/?id=84ad28ef65b3ad07c83795887df20282fcf7036f
Submitter: Zuul
Branch: stable/rocky

commit 84ad28ef65b3ad07c83795887df20282fcf7036f
Author: Michał Dulko <email address hidden>
Date: Wed Sep 5 18:37:10 2018 +0200

    Clean up watch resources after watcher.stop()

    We were assuming that watcher threads will be cleaning up after
    themselves - i.e. will remove paths from Watcher._watching dict on
    Watcher._stop_watch(). Turns out _stop_watch() is killing the threads in
    a hard way using thread.stop(). This means that paths are never removed
    from Watcher._watching dict and on restart (i.e. Watcher.start()), the
    method considers that there is no path that we're not already
    processing and does nothing.

    This commit fixes that by cleaning up Watcher._watching dict in
    Watcher._stop_watch() method.

    Closes-Bug: 1790912
    Change-Id: I17baaab1769ca5882f0b8edf496f92ac39507969
    (cherry picked from commit 346f76292af3ccb1abd59b66c52b09ef480f4717)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kuryr-kubernetes 0.5.1

This issue was fixed in the openstack/kuryr-kubernetes 0.5.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kuryr-kubernetes 0.6.0

This issue was fixed in the openstack/kuryr-kubernetes 0.6.0 release.
