Neutron LBaaS delay raises kuryr exception

Bug #1753653 reported by Eunsoo Park on 2018-03-06
This bug affects 1 person
Affects: kuryr-kubernetes
Importance: High
Assigned to: Unassigned

Bug Description

Dear all,

K8s endpoints are expected to be created and updated frequently in production.
kuryr-kubernetes currently backs K8s endpoints with Neutron LBaaS (or Octavia).
However, I've often seen load balancers sitting in the PENDING_UPDATE state when multiple endpoints are updated at the same time
(in both LBaaSv2 and Octavia environments).

Log example:
2018-02-05 19:19:56.376 1 DEBUG kuryr_kubernetes.controller.drivers.lbaasv2 [-] Provisioning status PENDING_UPDATE for LBaaSLoadBalancer(id=1e539dac-1123-4fe5-8b16-2a2c78b41563,ip=172.16.69.117,name='e2e-tests-kubectl-6n98c/rm2',port_id=62a3f317-378a-4ad7-99e6-f99e8659ff27,project_id='9243b6fce8704943805121f4992b7f5e',subnet_id=45a46ea1-6e05-4ce1-8530-9826bb49ef30), 1.8e+03s remaining until timeout _wait_for_provisioning /usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py:323
2018-02-05 19:19:56.594 1 DEBUG kuryr_kubernetes.controller.drivers.lbaasv2 [-] Provisioning status PENDING_UPDATE for LBaaSLoadBalancer(id=a228a89c-fad8-464a-a196-5def8f8f0a96,ip=172.16.69.56,name='e2e-tests-kubectl-6n98c/rm3',port_id=a02b52f7-9ba8-438b-b03d-ed7ac82e0ef0,project_id='9243b6fce8704943805121f4992b7f5e',subnet_id=45a46ea1-6e05-4ce1-8530-9826bb49ef30), 1.79e+03s remaining until timeout _wait_for_provisioning /usr/lib/python2.7/site-packages/kuryr_kubernetes/controller/drivers/lbaasv2.py:323

kuryr-kubernetes currently waits for the LBaaS to become ACTIVE using a FIXED CONSTANT TIMEOUT.
-> https://github.com/openstack/kuryr-kubernetes/blob/master/kuryr_kubernetes/controller/drivers/lbaasv2.py#L33

If the LBaaS is still not active after the timeout, kuryr-controller raises an exception and takes no further action (no mercy on Neutron).
-> https://github.com/openstack/kuryr-kubernetes/blob/master/kuryr_kubernetes/controller/drivers/lbaasv2.py#L411
-> It does not clean up the resources it created, does not check for failures on the k8s resource, and has no deferred failsafe mechanism either.
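To make the wait-then-raise behavior concrete, here is a minimal sketch of a polling loop with a fixed deadline. It is not kuryr's actual `_wait_for_provisioning` (the function named in the log above); `wait_for_provisioning`, `get_status` and `ProvisioningTimeout` are hypothetical names, and the clock and sleep functions are injectable only so the loop can be exercised without real delays.

```python
import time


class ProvisioningTimeout(Exception):
    """Hypothetical error: the LB never left a PENDING_* state in time."""


def wait_for_provisioning(get_status, timeout, interval=1.0,
                          clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns 'ACTIVE' or `timeout` seconds pass.

    get_status is a hypothetical callable returning the load balancer's
    provisioning_status string, e.g. 'PENDING_UPDATE' or 'ACTIVE'.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status == 'ACTIVE':
            return
        remaining = deadline - clock()
        if remaining <= 0:
            # Give up and raise; nothing created so far is rolled back.
            raise ProvisioningTimeout(
                'provisioning status %s after %.0fs' % (status, timeout))
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

Nothing in this loop undoes what was already created, which is the gap described above: once the exception propagates, the half-provisioned load balancer is left behind.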

I've tried to clean up the resources after the timeout, but releasing them fails again because of the same LBaaS delay.

The only solution I've come up with is to make the LBaaS _ACTIVATION_TIMEOUT configurable and wait for a longer delay.
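A minimal sketch of reading the timeout from configuration instead of a hard-coded constant, using Python's standard `configparser`. The section and option names (`neutron_defaults` / `lbaas_activation_timeout`) and the 300-second fallback are assumptions for illustration; check `kuryr_kubernetes/config.py` in your release for the real option, since kuryr registers its options through oslo.config rather than parsing INI text like this.

```python
import configparser

# ASSUMED section/option names and fallback; verify against your release.
SECTION = 'neutron_defaults'
OPTION = 'lbaas_activation_timeout'
DEFAULT_TIMEOUT = 300  # seconds, placeholder


def load_activation_timeout(conf_text):
    """Return the LBaaS activation timeout (seconds) from INI-style text.

    Falls back to DEFAULT_TIMEOUT when the section or option is missing.
    """
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    timeout = parser.getint(SECTION, OPTION, fallback=DEFAULT_TIMEOUT)
    if timeout <= 0:
        raise ValueError('activation timeout must be positive, got %d' % timeout)
    return timeout
```

With the option unset the fallback applies, so existing deployments behave as before, while an operator whose LBaaS is slow under load can raise the value without patching the driver.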

I'm not sure this solution is the right way to fix it.
Please review the change or suggest another approach.

Thanks.
Eunsoo Park (Evan)


Hi Eunsoo,

Out of curiosity, did you also face the timeout issue with LBaaS and provider=ha-proxy?

Eunsoo Park (esevan.park) wrote :

Hi Yossi,

If this configuration means provider=ha-proxy, yes.
service_provider=LOADBALANCERV2:Haproxy:neutron_lbaas.drivers.haproxy.plugin_driver.HaproxyOnHostPluginDriver:default

I'm currently digging into Neutron LBaaSv2 to find the root cause of this issue.
If you have experience with this, any advice would be much appreciated.

Thanks

Changed in kuryr-kubernetes:
status: New → Triaged
importance: Undecided → High
Michal Dulko (michal-dulko-f) wrote :

The correct way to fix it is to make sure the LBaaS handler restarts waiting for the LB after a timeout instead of trying to create it again.
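A minimal sketch of that suggestion, assuming the handler can look up an existing load balancer by name before creating one: on a retry after a timeout it finds the LB it already created and resumes waiting on it rather than issuing a second create. `ensure_loadbalancer`, `handle_with_retries`, and the dict standing in for a Neutron/Octavia lookup are all hypothetical; this is not kuryr's actual handler code.

```python
class ProvisioningTimeout(Exception):
    """Hypothetical error raised by `wait` when the LB is still PENDING_*."""


def ensure_loadbalancer(lbs, name, create, wait):
    """Return the id of an ACTIVE load balancer named `name`.

    lbs    -- dict of name -> LB id, standing in for a Neutron/Octavia lookup
    create -- hypothetical callable(name) -> new LB id
    wait   -- hypothetical callable(lb_id); returns once ACTIVE, otherwise
              raises ProvisioningTimeout
    """
    lb_id = lbs.get(name)
    if lb_id is None:
        # Only create when no LB with this name exists yet.
        lb_id = create(name)
        lbs[name] = lb_id
    # On a retry after a timeout, the lookup above finds the existing LB,
    # so we wait on it again instead of creating a duplicate.
    wait(lb_id)
    return lb_id


def handle_with_retries(handler, attempts):
    """Re-run `handler` after a ProvisioningTimeout, up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return handler()
        except ProvisioningTimeout:
            if attempt == attempts:
                raise  # out of retries: surface the timeout to the caller
```

Because creation is guarded by the lookup, a slow LBaaS only costs extra waiting, and each retry keeps tracking the same load balancer instead of piling up new ones.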

Reviewed: https://review.openstack.org/549945
Committed: https://git.openstack.org/cgit/openstack/kuryr-kubernetes/commit/?id=32cd153791d938a94d93eb1843bcd3c0da16c0de
Submitter: Zuul
Branch: master

commit 32cd153791d938a94d93eb1843bcd3c0da16c0de
Author: Eunsoo Park <email address hidden>
Date: Tue Mar 6 15:02:48 2018 +0900

    Make Neutron LBaaS Activation Timeout configurable

    This changes _ACTIVATION_TIMEOUT of LBaaS driver from constant to
    configurable value in order to make it flexible to production
    environment.

    This commit also increases the timeout value in DevStack plugin to make
    sure Octavia has time to run Amphorae in the gate.

    Co-Authored-By: Michał Dulko <email address hidden>
    Change-Id: I895d3e5af71ccc7219be422b9ca9e9f8833bad8f
    Related-Bug: 1753653
    Signed-off-by: Eunsoo Park <email address hidden>

Reviewed: https://review.openstack.org/563006
Committed: https://git.openstack.org/cgit/openstack/kuryr-kubernetes/commit/?id=c31680e8f91723e57d7bb488dc12837251220a05
Submitter: Zuul
Branch: stable/queens

commit c31680e8f91723e57d7bb488dc12837251220a05
Author: Eunsoo Park <email address hidden>
Date: Tue Mar 6 15:02:48 2018 +0900

    Make Neutron LBaaS Activation Timeout configurable

    This changes _ACTIVATION_TIMEOUT of LBaaS driver from constant to
    configurable value in order to make it flexible to production
    environment.

    This commit also increases the timeout value in DevStack plugin to make
    sure Octavia has time to run Amphorae in the gate.

    Co-Authored-By: Michał Dulko <email address hidden>
    Change-Id: I895d3e5af71ccc7219be422b9ca9e9f8833bad8f
    Related-Bug: 1753653
    Signed-off-by: Eunsoo Park <email address hidden>

tags: added: in-stable-queens
Michal Dulko (michal-dulko-f) wrote :

Seems like it's fixed.

Changed in kuryr-kubernetes:
status: Triaged → Fix Released