Too many pools created from heat template when both listeners and pools depend on an item

Bug #1635449 reported by Daniel Russell
This bug affects 2 people
Affects:      octavia
Status:       Fix Released
Importance:   Critical
Assigned to:  Unassigned
Milestone:    (none)

Bug Description

When you deploy a heat template in which both listeners and pools depend on the same item, the order of locking can cause additional pools to be created erroneously.

Excerpt of heat template showing the issue:

##### LOADBALANCERS #####

  test-loadbalancer:
    type: OS::Neutron::LBaaS::LoadBalancer
    properties:
      name: test
      description: test
      vip_subnet: { get_param: subnet }

##### LISTENERS #####

  http-listener:
    type: OS::Neutron::LBaaS::Listener
    depends_on: test-loadbalancer
    properties:
      name: listener1
      description: listener1
      protocol_port: 80
      loadbalancer: { get_resource: test-loadbalancer }
      protocol: HTTP

  https-listener:
    type: OS::Neutron::LBaaS::Listener
    depends_on: http-listener
    properties:
      name: listener2
      description: listener2
      protocol_port: 443
      loadbalancer: { get_resource: test-loadbalancer }
      protocol: TERMINATED_HTTPS
      default_tls_container_ref: '<tls container>'

##### POOLS #####

  http-pool:
    type: OS::Neutron::LBaaS::Pool
    depends_on: http-listener
    properties:
      name: pool1
      description: pool1
      lb_algorithm: 'ROUND_ROBIN'
      listener: { get_resource: http-listener }
      protocol: HTTP

  https-pool:
    type: OS::Neutron::LBaaS::Pool
    depends_on: https-listener
    properties:
      name: pool2
      description: pool2
      lb_algorithm: 'ROUND_ROBIN'
      listener: { get_resource: https-listener }
      protocol: HTTP

After the http-listener is created, both a pool and another listener attempt to create concurrently, and we end up with extra pools (not always the same number).

Revision history for this message
Daniel Russell (danielr-2) wrote :

We believe this may be because https://github.com/openstack/neutron-lbaas/blob/master/neutron_lbaas/services/loadbalancer/plugin.py#L537-L543 performs test_and_set_status first and then creates the listener, while https://github.com/openstack/neutron-lbaas/blob/master/neutron_lbaas/services/loadbalancer/plugin.py#L668-L671 creates the pool first and then performs test_and_set_status.

We suspect that creating the listener puts the load balancer into PENDING_UPDATE while it is being set up, so pool records keep being created until the pool creation finally completes successfully (after the status goes back to ACTIVE).
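
To make the suspected ordering concrete, here is a minimal, self-contained Python sketch; every name in it (LoadBalancer, Conflict, create_pool_buggy) is a simplified stand-in rather than the real neutron-lbaas code, and only the create-before-lock ordering mirrors the pool path linked above:

class Conflict(Exception):
    """Stands in for the 409 error returned while the LB is locked."""

class LoadBalancer(object):
    def __init__(self):
        self.provisioning_status = 'ACTIVE'
        self.pools = []  # stands in for pool rows in the database

    def test_and_set_status(self):
        # Atomic check-and-lock: refuse if another operation holds the LB.
        if self.provisioning_status != 'ACTIVE':
            raise Conflict()
        self.provisioning_status = 'PENDING_UPDATE'

def create_pool_buggy(lb, name):
    lb.pools.append(name)     # DB row written BEFORE the lock check...
    lb.test_and_set_status()  # ...so a Conflict leaves the row behind

lb = LoadBalancer()
lb.provisioning_status = 'PENDING_UPDATE'  # e.g. https-listener still creating

for _ in range(3):  # the client retries while the LB stays locked
    try:
        create_pool_buggy(lb, 'pool2')
    except Conflict:
        pass

print(lb.pools)  # ['pool2', 'pool2', 'pool2'] -- three ghost pools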

Revision history for this message
Michael Johnson (johnsom) wrote :

Confirmed, the pool is being created in the database before the lock status of the load balancer is being checked. This will lead to ghost pools in the database.

Changed in neutron:
importance: Undecided → Critical
status: New → Triaged
tags: added: mitaka-backport-potential newton-backport-potential
Revision history for this message
fengbeihong (fengbeihong) wrote :

I used the 'depends_on' keyword of the heat template to work around this bug.

Excerpt of heat template showing the workaround:

##### LOADBALANCERS #####

  test-loadbalancer:
    type: OS::Neutron::LBaaS::LoadBalancer
    properties:
      name: test
      description: test
      vip_subnet: { get_param: subnet }

##### LISTENERS #####

  http-listener:
    type: OS::Neutron::LBaaS::Listener
    properties:
      name: listener1
      description: listener1
      protocol_port: 80
      loadbalancer: { get_resource: test-loadbalancer }
      protocol: HTTP

  http-pool:
    type: OS::Neutron::LBaaS::Pool
    properties:
      name: pool1
      description: pool1
      lb_algorithm: 'ROUND_ROBIN'
      listener: { get_resource: http-listener }
      protocol: HTTP

  https-listener:
    type: OS::Neutron::LBaaS::Listener
    depends_on: http-pool  # WORKAROUND: wait for the HTTP resources to finish first
    properties:
      name: listener2
      description: listener2
      protocol_port: 443
      loadbalancer: { get_resource: test-loadbalancer }
      protocol: TERMINATED_HTTPS
      default_tls_container_ref: '<tls container>'

  https-pool:
    type: OS::Neutron::LBaaS::Pool
    properties:
      name: pool2
      description: pool2
      lb_algorithm: 'ROUND_ROBIN'
      listener: { get_resource: https-listener }
      protocol: HTTP
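
With the explicit depends_on, Heat collapses the resources into a single chain (test-loadbalancer → http-listener → http-pool → https-listener → https-pool), so the load balancer only ever handles one operation at a time. After the stack settles, the result can be checked for ghost pools; the commands below assume the neutron LBaaS v2 CLI is available, and 'lb.yaml' / 'lb-test' are hypothetical names:

openstack stack create -t lb.yaml lb-test   # deploy the template above
neutron lbaas-pool-list                     # expect exactly pool1 and pool2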

affects: neutron → octavia
Revision history for this message
Michael Johnson (johnsom) wrote :
Changed in octavia:
status: Triaged → Fix Released
Revision history for this message
Mathias Ewald (mewald) wrote :

@fengbeihong: How can this proposed workaround work? Making the listener depend on the pool while the pool references the listener creates a cyclic dependency.

Is there any other way around this?

Revision history for this message
Phil Kunze (p2k) wrote :

@mewald: I used this method while awaiting the canonical fix. The workaround explicitly controls the order in which Heat performs the operations.

Normally you would just make http-pool depend on http-listener (it already does implicitly, since get_resource requires the listener to exist) and https-pool depend on https-listener.

If the LB is in PENDING_UPDATE due to another operation, the new operation should wait until it can make its change. The bug is that the pool entry was created in the DB before it was known whether the creation could actually proceed. This left unexpected entries behind, and they prevent the load balancer from being deleted until they are manually removed from the DB.

By making https-listener (TLS) explicitly depend on http-pool, Heat finishes the clear-text HTTP operations before starting the TLS ones. This prevents a creation attempt while the LB is not ready, so no 'ghost' pools are created in the DB.

It's not cyclic: before this explicit dependency is set, https-listener is related to http-pool only in that they reside on the same load balancer instance (which is why they can conflict in the first place).

The patch resolves it by checking the load balancer state ahead of the DB operations, which is the better long-term fix, but adding this dependency remains valid even once the fix is in place (the stack may just take slightly longer to create).
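
In the same simplified terms as the earlier sketch (create_pool_fixed is hypothetical and reuses the LoadBalancer stand-in from above), the patched ordering looks like:

def create_pool_fixed(lb, name):
    lb.test_and_set_status()  # lock first: a Conflict raises here...
    lb.pools.append(name)     # ...so no ghost row is written on failure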
