Neutron loadbalancer is not deleted after endpoint has been deleted already.

Bug #1748890 reported by Eunsoo Park
This bug affects 1 person
Affects           Status        Importance  Assigned to   Milestone
kuryr-kubernetes  Fix Released  High        Yossi Boaron

Bug Description

In my testbed, I hit this bug while repeatedly creating and deleting an rc with 40 pods and 1 svc whose endpoint points to those pods.

Logs are as below.
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.k8s_client [-] Exception response, headers: {'Date': 'Wed, 10 Jan 2018 04:51:51 GMT', 'Content-Length': '200', 'Content-Type': 'application/json', 'Cache-Control': 'no-store'}, content: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \"webserver-1\" not found","reason":"NotFound","details":{"name":"webserver-1","kind":"endpoints"},"code":404}
, text: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \"webserver-1\" not found","reason":"NotFound","details":{"name":"webserver-1","kind":"endpoints"},"code":404}
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging [-] Failed to handle event {u'object': {u'kind': u'Endpoints', u'subsets': [{u'addresses': [{u'ip': u'10.0.6.1', u'targetRef': {u'kind': u'Pod', u'resourceVersion': u'35481', u'namespace': u'default', u'name': u'webserver-1-zq5bf', u'uid': u'e44aab3b-f5c1-11e7-ae32-fa163e7d22e2'}, u'nodeName': u'es2-vm-10-0-4-11.novalocal'}], u'ports': [{u'protocol': u'TCP', u'port': 8080}]}], u'apiVersion': u'v1', u'metadata': {u'name': u'webserver-1', u'namespace': u'default', u'resourceVersion': u'35482', u'creationTimestamp': u'2018-01-10T04:51:16Z', u'annotations': {u'openstack.org/kuryr-lbaas-spec': u'{"versioned_object.data": {"ip": "10.0.5.120", "lb_ip": null, "ports": [{"versioned_object.data": {"name": null, "port": 80, "protocol": "TCP"}, "versioned_object.name": "LBaaSPortSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}], "project_id": "9243b6fce8704943805121f4992b7f5e", "security_groups_ids": ["3df3c214-2d29-468b-9c39-3adb645dcb88"], "subnet_id": "edc0fa91-e5c5-4e08-9b47-5dfa1bde709d", "type": "ClusterIP"}, "versioned_object.name": "LBaaSServiceSpec", "versioned_object.namespace": "kuryr_kubernetes", "versioned_object.version": "1.0"}'}, u'selfLink': u'/api/v1/namespaces/default/endpoints/webserver-1', u'uid': u'e44d32b1-f5c1-11e7-ae32-fa163e7d22e2'}}, u'type': u'MODIFIED'}: K8sClientException: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \"webserver-1\" not found","reason":"NotFound","details":{"name":"webserver-1","kind":"endpoints"},"code":404}
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging Traceback (most recent call last):
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/logging.py", line 37, in __call__
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging self._handler(event)
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/retry.py", line 61, in __call__
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging self._handler(event)
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/handlers/k8s_base.py", line 60, in __call__
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging self.on_present(obj)
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 247, in on_present
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging self._set_lbaas_state(endpoints, lbaas_state)
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/handlers/lbaas.py", line 569, in _set_lbaas_state
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging resource_version=endpoints['metadata']['resourceVersion'])
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging File "/usr/lib/python2.7/site-packages/kuryr_kubernetes/k8s_client.py", line 148, in annotate
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging raise exc.K8sClientException(response.text)
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging K8sClientException: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \"webserver-1\" not found","reason":"NotFound","details":{"name":"webserver-1","kind":"endpoints"},"code":404}
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging
2018-01-10 04:51:51.665 1 ERROR kuryr_kubernetes.handlers.logging

The situation that triggers the bug is as follows.
1. The first pod is created
2. The endpoint changes
3. Neutron LBaaS is synced with the LBaaSSpec (this is a high-latency job)
 -> LoadBalancer, Listener, Pool, and Member are created sequentially.

Step [3] takes a long time on the neutron server. If the k8s endpoint is deleted before the LBaaS resources have been fully created, setting the annotation fails because the k8s object is already gone, as shown in the log above.

As a result, the LBaaS resources that were already created are left behind, because there is no rollback logic anywhere in the code.

I have seen this bug many times, and I patched rollback code into the lbaas handler (it runs after the annotation fails).
Has anyone else experienced this, and does anyone know a better solution?
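
The sequence above, and the kind of rollback mentioned, can be sketched in a self-contained way. This is only an illustration under stated assumptions, not the kuryr-kubernetes code: FakeK8s, FakeNeutron, ResourceNotFound and sync_lbaas are hypothetical names, and only the order of operations (create the LB resources, then annotate) is taken from the report.

```python
class ResourceNotFound(Exception):
    """Hypothetical stand-in for the 404 raised when annotating a deleted object."""


class FakeNeutron:
    """In-memory stand-in for the LBaaS API; tracks which resources exist."""

    def __init__(self):
        self.resources = []

    def create(self, kind, name):
        self.resources.append((kind, name))
        return (kind, name)

    def delete(self, resource):
        self.resources.remove(resource)


class FakeK8s:
    """In-memory stand-in for the Kubernetes API holding Endpoints objects."""

    def __init__(self):
        self.endpoints = {}

    def annotate(self, name, state):
        if name not in self.endpoints:
            raise ResourceNotFound(name)
        self.endpoints[name]["annotation"] = state


def sync_lbaas(k8s, neutron, name, rollback, on_created=None):
    """Create LB resources in order, then store them in an annotation.

    If the Endpoints object vanished in the meantime and rollback is True,
    release the resources in reverse creation order.
    """
    created = [neutron.create(kind, name)
               for kind in ("loadbalancer", "listener", "pool", "member")]
    if on_created:
        on_created()  # hook used below to simulate a concurrent deletion
    try:
        k8s.annotate(name, list(created))
    except ResourceNotFound:
        if rollback:
            for resource in reversed(created):
                neutron.delete(resource)
        return False
    return True


def run(rollback):
    k8s, neutron = FakeK8s(), FakeNeutron()
    k8s.endpoints["webserver-1"] = {}
    # The Endpoints object is deleted while the slow LBaaS calls are running.
    sync_lbaas(k8s, neutron, "webserver-1", rollback,
               on_created=lambda: k8s.endpoints.pop("webserver-1"))
    return neutron.resources


leaked = run(rollback=False)   # all four resources stay behind
cleaned = run(rollback=True)   # nothing is left after the rollback
print(len(leaked), len(cleaned))
```

Without the rollback branch the four fake resources remain after the failed annotation, which mirrors the leftover load balancers described above. Whether a real handler should swallow the error or re-raise it after cleanup is a separate design choice that this sketch does not settle.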

Eunsoo Park (esevan.park) wrote :

If this is confirmed as a bug in kuryr, I'd like to contribute my patch.

Yossi Boaron (yossi-boaron-1234) wrote :

Hi Eunsoo,

Could you please share more information about your test/bug?

1. Your environment (devstack or other, LBaaS provider, etc.)
2. Detailed reproduction steps/script

Eunsoo Park (esevan.park) wrote :

Hello,
1. Environment
 OpenStack: Packstack (OpenStack Ocata)
   LBaaS Provider: Neutron lbaasv2 (neutron-lbaasv2-agent)
   1 OpenStack Controller Node (all core OpenStack services run here)
   1 Network Node
   N Compute Nodes
 OpenShift: v3.7.0
   k8s: v1.7.6
   3 Masters
   3 Minions

2. Repeatedly create and delete the following rc and svc (delete before all pods are ready)
apiVersion: v1
kind: ReplicationController
metadata:
  name: webserver-1
spec:
  replicas: 40
  selector:
    app: webserver-1
  template:
    metadata:
      labels:
        app: webserver-1
        version: v1
    spec:
      containers:
      - image: manjae31/webserver:latest
        name: web-server-http
        ports:
        - containerPort: 8080
          protocol: TCP

---
kind: Service
apiVersion: v1
metadata:
  name: webserver-1
spec:
  selector:
    app: webserver-1
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080

I saw the following message many times.
ERROR kuryr_kubernetes.handlers.logging K8sClientException: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"endpoints \"webserver-1\" not found","reason":"NotFound","details":{"name":"webserver-1","kind":"endpoints"},"code":404}
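
The exception text in this message is a Kubernetes Status object in JSON, so a handler could tell "object already deleted" apart from other failures by inspecting it. The following is a minimal sketch assuming the error text is available as a string; is_not_found is a hypothetical helper, not a kuryr-kubernetes function.

```python
import json


def is_not_found(error_text):
    """Return True if error_text is a Kubernetes Status body with code 404."""
    try:
        status = json.loads(error_text)
    except ValueError:
        # Not JSON at all, so it cannot be a Status object.
        return False
    return (isinstance(status, dict)
            and status.get("kind") == "Status"
            and status.get("code") == 404)


# Body copied from the log message above.
sample = ('{"kind":"Status","apiVersion":"v1","metadata":{},'
          '"status":"Failure","message":"endpoints \\"webserver-1\\" not found",'
          '"reason":"NotFound","details":{"name":"webserver-1","kind":"endpoints"},'
          '"code":404}')

print(is_not_found(sample))
```

A check like this would let cleanup run only when the object is really gone, while other errors can still go through the normal retry path.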

Changed in kuryr-kubernetes:
assignee: nobody → Yossi Boaron (yossi-boaron-1234)
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kuryr-kubernetes (master)

Fix proposed to branch: master
Review: https://review.openstack.org/546784

Changed in kuryr-kubernetes:
status: New → In Progress
Yossi Boaron (yossi-boaron-1234) wrote :

Hi Eunsoo,

Could you please verify the solution with https://review.openstack.org/546784 ?

Many thanks in advance
Yossi

Eunsoo Park (esevan.park) wrote :

Hello,

This code is implemented in the same way as the patch I'm using.
No more errors have happened so far.

But to be safe, let me try this code and report back soon ;)

Thank you for the commit.

Yossi Boaron (yossi-boaron-1234) wrote :

Hi Eunsoo,

Thank you very much.

Yossi

OpenStack Infra (hudson-openstack) wrote : Fix merged to kuryr-kubernetes (master)

Reviewed: https://review.openstack.org/546784
Committed: https://git.openstack.org/cgit/openstack/kuryr-kubernetes/commit/?id=2e6c7eaae7a198cc169f060351b54e9c0586de11
Submitter: Zuul
Branch: master

commit 2e6c7eaae7a198cc169f060351b54e9c0586de11
Author: Yossi Boaron <email address hidden>
Date: Wed Feb 21 23:54:52 2018 +0200

    Services: Rollback openstack resources in case of annotation failure

    Upon K8S service creation the LBaaS handler creates all LB resources
    at neutron (LB,Listener,Pool,etc) and store them at K8S resource
     using annotation.
    When K8S service is deleted, the LBaaS handler retrieves LB
    resources details from annotation and release them at neutron.

    This patch handles the case in which K8S service resource was deleted
    before LBaaS handler stored openstack resource details.

    Closes-Bug: 1748890

    Change-Id: Iea806d32c99cd3cf51a832b576ff4054fc522bd3

Changed in kuryr-kubernetes:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kuryr-kubernetes (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/553207

OpenStack Infra (hudson-openstack) wrote : Fix merged to kuryr-kubernetes (stable/queens)

Reviewed: https://review.openstack.org/553207
Committed: https://git.openstack.org/cgit/openstack/kuryr-kubernetes/commit/?id=2b1b53f49a0cbe2212d62df58ed66afe259ca042
Submitter: Zuul
Branch: stable/queens

commit 2b1b53f49a0cbe2212d62df58ed66afe259ca042
Author: Yossi Boaron <email address hidden>
Date: Wed Feb 21 23:54:52 2018 +0200

    Services: Rollback openstack resources in case of annotation failure

    Upon K8S service creation the LBaaS handler creates all LB resources
    at neutron (LB,Listener,Pool,etc) and store them at K8S resource
     using annotation.
    When K8S service is deleted, the LBaaS handler retrieves LB
    resources details from annotation and release them at neutron.

    This patch handles the case in which K8S service resource was deleted
    before LBaaS handler stored openstack resource details.

    Closes-Bug: 1748890

    Change-Id: Iea806d32c99cd3cf51a832b576ff4054fc522bd3
    (cherry picked from commit 2e6c7eaae7a198cc169f060351b54e9c0586de11)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kuryr-kubernetes 0.4.2

This issue was fixed in the openstack/kuryr-kubernetes 0.4.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/kuryr-kubernetes 0.5.0

This issue was fixed in the openstack/kuryr-kubernetes 0.5.0 release.
