kuryr-k8s does not reconnect to API in case of API restart

Bug #1705429 reported by Kirill Zaitsev
This bug affects 2 people
Affects: kuryr-kubernetes
Status: Fix Released
Importance: High
Assigned to: Maysa de Macedo Souza

Bug Description

To reproduce:
1) launch kuryr-k8s-controller
2) restart kubernetes-api

Expected behaviour:
kuryr-k8s would reconnect to k8s-api

Observed:

kuryr-k8s stops watching for any events

2017-07-20 09:28:13.481 7 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/endpoints'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/poll.py", line 115, in wait
    listener.cb(fileno)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/kuryr_kubernetes/watcher.py", line 138, in _watch
    for event in self._client.watch(path):
  File "/usr/local/lib/python2.7/dist-packages/kuryr_kubernetes/k8s_client.py", line 141, in watch
    for line in response.iter_lines(delimiter='\n'):
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 791, in iter_lines
    for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 750, in generate
    raise ChunkedEncodingError(e)
ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
Removing descriptor: 5
2017-07-20 09:28:14.120 7 DEBUG kuryr_kubernetes.handlers.dispatch [-] 1 handler(s) available __call__ /usr/local/lib/python2.7/dist-packages/kuryr_kubernetes/handlers/dispatch.py:62
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/poll.py", line 115, in wait
    listener.cb(fileno)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/kuryr_kubernetes/watcher.py", line 138, in _watch
    for event in self._client.watch(path):
  File "/usr/local/lib/python2.7/dist-packages/kuryr_kubernetes/k8s_client.py", line 141, in watch
    for line in response.iter_lines(delimiter='\n'):
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 791, in iter_lines
    for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 750, in generate
    raise ChunkedEncodingError(e)
ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
Removing descriptor: 3
2017-07-20 09:28:14.121 7 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/pods'
2017-07-20 09:28:14.122 7 INFO kuryr_kubernetes.watcher [-] Stopped watching '/api/v1/services'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/poll.py", line 115, in wait
    listener.cb(fileno)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/kuryr_kubernetes/watcher.py", line 138, in _watch
    for event in self._client.watch(path):
  File "/usr/local/lib/python2.7/dist-packages/kuryr_kubernetes/k8s_client.py", line 141, in watch
    for line in response.iter_lines(delimiter='\n'):
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 791, in iter_lines
    for chunk in self.iter_content(chunk_size=chunk_size, decode_unicode=decode_unicode):
  File "/usr/local/lib/python2.7/dist-packages/requests/models.py", line 750, in generate
    raise ChunkedEncodingError(e)
ChunkedEncodingError: ('Connection broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
Removing descriptor: 4

summary: - kuryr-k8s does not reconnect to API in case of failure
+ kuryr-k8s does not reconnect to API in case of API restart
Antoni Segura Puimedon (celebdor) wrote :

There are two ways to address this:

a) The kuryr controller service manager restarts exited watches, up to a maximum number of reconnect attempts.
b) Implement a /healthz/k8s API endpoint so that a liveness probe can decide to restart the controller.

We probably need both approaches.
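
As a rough illustration of approach (a), the sketch below wraps a watch generator in a bounded retry loop. It is not the code from any kuryr-kubernetes patch: `watch_with_retries`, its parameters, and the linear backoff are assumptions made for this example. The default `retry_on=(IOError,)` relies on the fact that exceptions from `requests`, including the `ChunkedEncodingError` in the traceback above, derive from `IOError`.

```python
import time


def watch_with_retries(watch, path, handle, max_attempts=5, delay=1.0,
                       retry_on=(IOError,), sleep=time.sleep):
    """Consume watch(path) and reconnect when the stream breaks.

    All names here are hypothetical. `watch` is any callable that returns
    an iterator of events for `path`; `handle` is called once per event.
    Re-raises the last error after `max_attempts` consecutive failures.
    """
    failures = 0
    while True:
        try:
            for event in watch(path):
                failures = 0  # an event arrived, so the stream is healthy
                handle(event)
            return  # the stream ended without an error
        except retry_on:
            failures += 1
            if failures >= max_attempts:
                raise  # give up and let the caller (or a probe) react
            sleep(delay * failures)  # linear backoff before reconnecting
```

This sketch simplifies two things a real watcher would have to handle: it treats a cleanly closed stream as final, and it does not track which events were already seen, so a reconnect may deliver some events again.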

Changed in kuryr-kubernetes:
importance: Undecided → High
Changed in kuryr-kubernetes:
milestone: none → pike-3
Antoni Segura Puimedon (celebdor) wrote :

Maysa is working on approach (b)

Changed in kuryr-kubernetes:
assignee: nobody → Maysa de Macedo Souza (maysa)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kuryr-kubernetes (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/533292

OpenStack Infra (hudson-openstack) wrote : Change abandoned on kuryr-kubernetes (master)

Change abandoned by Maysa de Macedo Souza (<email address hidden>) on branch: master
Review: https://review.openstack.org/533292
Reason: Another patch [535548] is already addressing this.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to kuryr-kubernetes (master)

Reviewed: https://review.openstack.org/535548
Committed: https://git.openstack.org/cgit/openstack/kuryr-kubernetes/commit/?id=c00897c02e02e5e0a859d45027bf838b1fdff778
Submitter: Zuul
Branch: master

commit c00897c02e02e5e0a859d45027bf838b1fdff778
Author: Maysa Macedo <email address hidden>
Date: Fri Jan 19 00:31:25 2018 +0000

    Add liveness checks to Kuryr Controller

    This patch adds liveness checks for watcher and handlers, without passing the
    manager's reference to modules that probably should not be aware of it.

    Related-Bug: #1705429
    Change-Id: I0192756c556b13f98302a57acedce269c278e260
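
For readers unfamiliar with liveness checks, the idea in the commit message above can be sketched as a small registry that a health endpoint consults. This is only an illustration under assumed names (`Liveness`, `register`, `status`); it is not the code merged in the patch, and the 200/500 status codes are an assumption about how a probe would read the result.

```python
class Liveness(object):
    """Collects named checks; each check is a zero-argument callable
    that returns True while its component is alive (hypothetical API)."""

    def __init__(self):
        self._checks = {}

    def register(self, name, check):
        # Components hand over a callable instead of a reference to
        # themselves or to the manager.
        self._checks[name] = check

    def status(self):
        """Return (http_status, failed_names) for a liveness probe."""
        failed = sorted(name for name, check in self._checks.items()
                        if not check())
        return (500 if failed else 200), failed
```

A watcher could register something like `lambda: thread_is_running`, and an HTTP handler for the health endpoint would return the status code from `status()`, letting the orchestrator restart the controller when a watch has died.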

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/529832
Committed: https://git.openstack.org/cgit/openstack/kuryr-kubernetes/commit/?id=58e6b1914c516ec87e04cfc935f33352887d1926
Submitter: Zuul
Branch: master

commit 58e6b1914c516ec87e04cfc935f33352887d1926
Author: Eunsoo Park <email address hidden>
Date: Thu Feb 22 16:12:34 2018 +0900

    Watcher restarts watching resources in failure

    kuryr-kubernetes watcher watches k8s resources and trigger registered
    pipeline.

    This patch handles restarting watching when watch thread has failed.

    Change-Id: I27a719a326dc37f97c46b88d0c171d0f12ded605
    Closes-Bug: 1739776
    Related-Bug: 1705429
    Signed-off-by: Eunsoo Park <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kuryr-kubernetes (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/554826

Changed in kuryr-kubernetes:
status: New → Triaged
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kuryr-kubernetes (stable/queens)

Reviewed: https://review.openstack.org/554826
Committed: https://git.openstack.org/cgit/openstack/kuryr-kubernetes/commit/?id=138c25338bb704cd659718a667b462e86769aafb
Submitter: Zuul
Branch: stable/queens

commit 138c25338bb704cd659718a667b462e86769aafb
Author: Eunsoo Park <email address hidden>
Date: Thu Feb 22 16:12:34 2018 +0900

    Watcher restarts watching resources in failure

    kuryr-kubernetes watcher watches k8s resources and trigger registered
    pipeline.

    This patch handles restarting watching when watch thread has failed.

    Change-Id: I27a719a326dc37f97c46b88d0c171d0f12ded605
    Closes-Bug: 1739776
    Related-Bug: 1705429
    Signed-off-by: Eunsoo Park <email address hidden>
    (cherry picked from commit 58e6b1914c516ec87e04cfc935f33352887d1926)

tags: added: in-stable-queens
Michal Dulko (michal-dulko-f) wrote :

I think this one is fixed now.

Changed in kuryr-kubernetes:
status: In Progress → Fix Released