Comment 0 for bug 1753653

Eunsoo Park (esevan.park) wrote :

Dear all,

K8s endpoints are expected to be created and updated frequently in production.
kuryr-kubernetes currently supports endpoints through Neutron LBaaS (or Octavia).
However, I have seen many LBaaS load balancers in the PENDING_UPDATE state when multiple endpoints are updated at the same time
(in both LBaaSv2 and Octavia environments).

Log example:
2018-02-05 19:19:56.376 1 DEBUG kuryr_kubernetes.controller.drivers.lbaasv2 [-] Provisioning status PENDING_UPDATE for LBaaSLoadBalancer(id=1e539dac-1123-4fe5-8b16-2a2c78b41563,ip=172.16.69.117,name='e2e-tests-kubectl-6n98c/rm2',port_id=62a3f317-378a-4ad7-99e6-f99e8659ff27,project_id='9243b6fce8704943805121f4992b7f5e',subnet_id=45a46ea1-6e05-4ce1-8530-9826bb49ef30), 1.8e+03s remaining until timeout _wait_for_provisioning /usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py:323
2018-02-05 19:19:56.594 1 DEBUG kuryr_kubernetes.controller.drivers.lbaasv2 [-] Provisioning status PENDING_UPDATE for LBaaSLoadBalancer(id=a228a89c-fad8-464a-a196-5def8f8f0a96,ip=172.16.69.56,name='e2e-tests-kubectl-6n98c/rm3',port_id=a02b52f7-9ba8-438b-b03d-ed7ac82e0ef0,project_id='9243b6fce8704943805121f4992b7f5e',subnet_id=45a46ea1-6e05-4ce1-8530-9826bb49ef30), 1.79e+03s remaining until timeout _wait_for_provisioning /usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py:323

kuryr-kubernetes currently waits for LBaaS activation with a fixed, constant timeout.
-> https://github.com/openstack/kuryr-kubernetes/blob/master/kuryr_kubernetes/controller/drivers/lbaasv2.py#L33

If the LBaaS is still not active after the timeout, kuryr-controller raises an exception and takes no further action on the Neutron side.
-> https://github.com/openstack/kuryr-kubernetes/blob/master/kuryr_kubernetes/controller/drivers/lbaasv2.py#L411
-> It does not clean up the resources it created, does not check for failures on the k8s resource, and has no deferred failsafe mechanism either.
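
To make the behaviour described above concrete, here is a minimal sketch of a wait loop with a fixed deadline. It is not the actual kuryr-kubernetes code: `get_status`, `ProvisioningTimeout`, and the polling interval are hypothetical stand-ins, and the 1800 s value is only inferred from the "1.8e+03s remaining" in the log. Only the `_ACTIVATION_TIMEOUT` name and the `PENDING_UPDATE` status come from the driver and the log.

```python
import time

# Assumed value, inferred from "1.8e+03s remaining until timeout" in the log.
_ACTIVATION_TIMEOUT = 1800


class ProvisioningTimeout(Exception):
    """Hypothetical exception: the load balancer never became ACTIVE."""


def wait_for_provisioning(get_status, timeout=_ACTIVATION_TIMEOUT,
                          interval=1.0, clock=time.monotonic,
                          sleep=time.sleep):
    """Poll get_status() until it returns 'ACTIVE' or the timeout elapses.

    get_status is a hypothetical callable that returns the current
    provisioning status string of one load balancer.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == 'ACTIVE':
            return
        remaining = deadline - clock()
        if remaining <= 0:
            # Nothing is cleaned up here, which mirrors the problem
            # described in this report.
            raise ProvisioningTimeout(
                'status %s after %ss' % (status, timeout))
        sleep(min(interval, remaining))
```

In this sketch the caller is left with whatever was created before the exception, so any cleanup or retry has to happen in the caller.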

I tried cleaning up the resources after the timeout, but releasing them fails as well because of the LBaaS delay.

The only solution I have come up with is to configure the LBaaS _ACTIVATION_TIMEOUT and wait for a longer delay.
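
One possible reading of this idea is that the timeout becomes a configuration option with the current constant as its default. The sketch below shows that shape only. The `[lbaas]` section and the `activation_timeout` option are hypothetical names, the 1800 s default is inferred from the log, and plain `configparser` is used just to keep the example standalone (kuryr-kubernetes itself uses oslo.config).

```python
import configparser

# Assumed default, inferred from "1.8e+03s remaining until timeout" in the log.
DEFAULT_ACTIVATION_TIMEOUT = 1800


def load_activation_timeout(conf_text):
    """Return the activation timeout in seconds from INI-style text.

    Falls back to DEFAULT_ACTIVATION_TIMEOUT when the hypothetical
    [lbaas] activation_timeout option is absent.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    timeout = parser.getint('lbaas', 'activation_timeout',
                            fallback=DEFAULT_ACTIVATION_TIMEOUT)
    if timeout <= 0:
        raise ValueError('activation_timeout must be positive')
    return timeout
```

A deployment where many endpoints change at once could then raise the value without patching the driver, although a longer wait only hides the delay and does not address the missing cleanup.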

I'm not sure this is the right way to fix it.
Please review the change or suggest another change.

Thanks.
Eunsoo Park (Evan)