LBaas V2: operating_status of 'dead' member is always online with Healthmonitor

Bug #1548774 reported by Cindia-blue
This bug affects 6 people
Affects   Status      Importance   Assigned to   Milestone
neutron   Won't Fix   Undecided    Cindia-blue
octavia   Invalid     Undecided    Unassigned
senlin    New         Undecided    Unassigned

Bug Description

Expectation:
The LBaaS v2 healthmonitor should update the status of a "bad" member, just as v1 does. However, the operating_status of pool members does not change, whether the member is healthy or not.

ENV:
My devstack runs on a single Ubuntu 14.04 node and uses master branch code, MySQL and RabbitMQ. The tenant name is 'demo' and the username is 'demo'. I am using private-subnet for the load balancer and the member VM, with the octavia provider.

Steps to reproduce:
Create a VM from the cirros-0.3.4-x86_64-uec image and add it as a member of a load balancer pool that has a healthmonitor. Then use curl to get the statuses of the load balancer; the member status is online. Then stop the VM mapped to the member with nova stop and run curl again repeatedly. The operating_status of the member stays 'online' instead of changing to 'error'.

The curl response is below. It is identical before and after the pool member VM goes to SHUTOFF, since no status change ever happens.

{"statuses": {"loadbalancer": {"name": "", "listeners": [{"pools": [{"name": "", "provisioning_status": "ACTIVE", "healthmonitor": {"type": "PING", "id": "cb41b4e4-7008-479f-a6d9-4751ac7a1ee4", "name": "", "provisioning_status": "ACTIVE"}, "members": [{"name": "", "provisioning_status": "ACTIVE", "address": "10.0.0.13", "protocol_port": 80, "id": "6d682536-e9fe-4456-ad24-df8521857ee0", "operating_status": "ONLINE"}], "id": "eaef79a9-d5e0-4582-b45b-cd460beea4fc", "operating_status": "ONLINE"}], "name": "", "id": "4e3a7d98-3ab9-4a39-b915-a9651fcada65", "operating_status": "ONLINE", "provisioning_status": "ACTIVE"}], "id": "ef45be96-15e0-42d9-af34-34608dafdb6c", "operating_status": "ONLINE", "provisioning_status": "ACTIVE"}}}
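When repeating the curl call, it can help to reduce the status tree to just the member entries. The following is a minimal Python sketch that walks a response shaped like the one above; the helper name member_statuses is hypothetical, and the embedded JSON is a trimmed copy of the response from this report.

```python
import json


def member_statuses(status_tree):
    """Yield (member id, address, operating_status) for every pool member."""
    lb = status_tree["statuses"]["loadbalancer"]
    for listener in lb.get("listeners", []):
        for pool in listener.get("pools", []):
            for member in pool.get("members", []):
                yield (member["id"], member["address"],
                       member["operating_status"])


# Trimmed copy of the curl response shown in this report.
response = json.loads("""
{"statuses": {"loadbalancer": {
  "id": "ef45be96-15e0-42d9-af34-34608dafdb6c",
  "operating_status": "ONLINE",
  "listeners": [{
    "id": "4e3a7d98-3ab9-4a39-b915-a9651fcada65",
    "pools": [{
      "id": "eaef79a9-d5e0-4582-b45b-cd460beea4fc",
      "members": [{
        "id": "6d682536-e9fe-4456-ad24-df8521857ee0",
        "address": "10.0.0.13",
        "protocol_port": 80,
        "operating_status": "ONLINE"
      }]
    }]
  }]
}}}
""")

for member_id, address, status in member_statuses(response):
    print(member_id, address, status)
```

Running this before and after stopping the member VM makes it easy to see whether the member's operating_status has changed.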

Revision history for this message
Dariusz Smigiel (smigiel-dariusz) wrote :

Cindia-blue, thank you for this bug report.
Could you add more information about it, based on this template?
http://docs.openstack.org/developer/neutron/policies/bugs.html#bug-report-template

Changed in neutron:
status: New → Incomplete
description: updated
Yang Yu (yuyangbj)
Changed in neutron:
assignee: nobody → Yang Yu (yuyangbj)
Revision history for this message
Elena Ezhova (eezhova) wrote :

Reproduced for both Octavia and HaproxyOnHostPluginDriver service providers.

Changed in neutron:
status: Incomplete → Confirmed
tags: added: lbaas
Revision history for this message
cloudbuilders (operations-8) wrote :

We've come across the same problem. Is there any update on this bug? We've tried with Neutron + the A10 lbaas driver and with HAProxy. The status is not updated in either case. The troubleshooting we did indicates that the Neutron agent times out waiting for a response on a rabbit queue. It seems that Neutron Server is either using another rabbit queue or not responding to the message posted by the agent at all.

Revision history for this message
Cindia-blue (miaoxinhuili) wrote :

I am assigning the bug to myself again and will try to fix it in the next few days.

Changed in neutron:
assignee: Yang Yu (yuyangbj) → Cindia-blue (miaoxinhuili)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-lbaas (master)

Fix proposed to branch: master
Review: https://review.openstack.org/325624

Changed in neutron:
status: Confirmed → In Progress
Changed in neutron:
assignee: Cindia-blue (miaoxinhuili) → KaiLi (damonl1)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-lbaas (master)

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/323645
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Looks like it lost momentum.

Changed in neutron:
status: In Progress → Incomplete
assignee: KaiLi (damonl1) → nobody
tags: added: low-hanging-fruit
Revision history for this message
Cindia-blue (miaoxinhuili) wrote :

For the bug fix, please review this:
https://review.openstack.org/#/c/325624/

Changed in neutron:
assignee: nobody → Cindia-blue (miaoxinhuili)
Changed in neutron:
assignee: Cindia-blue (miaoxinhuili) → nobody
Revision history for this message
Cindia-blue (miaoxinhuili) wrote :

I am releasing the assignee and need help reviewing the related fixes below:
https://review.openstack.org/#/c/325624/
https://review.openstack.org/#/c/324197

Changed in neutron:
assignee: nobody → Cindia-blue (miaoxinhuili)
Damon Li (damonl1)
Changed in neutron:
status: Incomplete → Opinion
status: Opinion → Incomplete
Changed in neutron:
status: Incomplete → In Progress
Changed in neutron:
assignee: Cindia-blue (miaoxinhuili) → KaiLi (damonl1)
Changed in neutron:
assignee: KaiLi (damonl1) → Cindia-blue (miaoxinhuili)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/325624
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron:
status: In Progress → Won't Fix
Revision history for this message
Qiming Teng (tengqim) wrote :

What does this "Won't Fix" mean? Is the bug invalid? Are there other alternatives or workarounds?

Revision history for this message
Cindia-blue (miaoxinhuili) wrote :
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

We should reassess whether or not a neutron-lbaas fix is worth addressing.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Michael Johnson (<email address hidden>) on branch: master
Review: https://review.openstack.org/324197
Reason: This is already handled by enabling the event streamer in octavia.
event_streamer_driver = queue_event_streamer

Revision history for this message
Michael Johnson (johnsom) wrote :

Can you confirm you enabled the existing event streamer in octavia?
event_streamer_driver = queue_event_streamer
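
To answer this question quickly, one could scan the Octavia configuration for the option. The following is a minimal Python sketch; the helper name find_option is hypothetical, and the [health_manager] section in the sample text is an assumption, so check the configuration reference for your release. The helper searches every section, so it does not depend on where the option actually lives.

```python
import configparser


def find_option(conf_text, option):
    """Return {section: value} for every section that sets `option`."""
    parser = configparser.ConfigParser(
        strict=False,               # tolerate repeated options
        interpolation=None,         # do not treat '%' specially
        default_section="__none__"  # treat [DEFAULT] as an ordinary section
    )
    parser.read_string(conf_text)
    return {
        section: parser[section][option]
        for section in parser.sections()
        if parser.has_option(section, option)
    }


# Sample configuration text; the section name is an assumption.
sample_conf = """
[DEFAULT]
debug = False

[health_manager]
event_streamer_driver = queue_event_streamer
"""

print(find_option(sample_conf, "event_streamer_driver"))
```

An empty result means the option is not set anywhere in the file, so the service is running with whatever its default is.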

Changed in octavia:
status: New → Incomplete
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-lbaas (master)

Fix proposed to branch: master
Review: https://review.openstack.org/480933

Changed in octavia:
assignee: nobody → Gary Kotton (garyk)
status: Incomplete → In Progress
Changed in octavia:
assignee: Gary Kotton (garyk) → Nir Magnezi (nmagnezi)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.openstack.org/521250

Changed in octavia:
assignee: Nir Magnezi (nmagnezi) → Carlos Goncalves (cgoncalves)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-lbaas (master)

Reviewed: https://review.openstack.org/521250
Committed: https://git.openstack.org/cgit/openstack/neutron-lbaas/commit/?id=80a086695d11aa65251889671a3d729f62779c80
Submitter: Zuul
Branch: master

commit 80a086695d11aa65251889671a3d729f62779c80
Author: Carlos Goncalves <email address hidden>
Date: Sat Nov 18 10:21:43 2017 +0000

    Update pool member operating status for haproxy

    Collection of stats is a period task for which the HAProxy driver also
    returns member statuses, including 'status' field which maps to
    MemberV2.operating_status. Updating such field in neutron-lbaas DB
    ensures the stored operational status is up-to-date. Subsequently, the
    operating status of pools, listeners and load balancers can also be
    correctly reported to users upon querying in for the load balancer
    status tree API.

    This fixes cases where a pool member becomes offline in the dataplane
    but is not observed in neutron-lbaas. If a member turns back online
    again, its operating status will be updated too.

    Partial-Bug: #1548774

    Change-Id: Ief872b0463002b4e339f8eab71b37a4225d461cc
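
The idea in this commit message can be illustrated with a small, self-contained Python sketch. This is not the neutron-lbaas code: the function name, the dictionary shapes, and the status names in the mapping are all assumptions made for illustration. It only shows the general pattern of copying member statuses reported by a periodic stats collection into stored operating_status values.

```python
# Hypothetical mapping from a driver-reported status to an operating status.
STATUS_TO_OPERATING = {"ACTIVE": "ONLINE", "INACTIVE": "OFFLINE"}


def apply_member_stats(stored_members, stats):
    """Update stored operating_status values from reported stats.

    stored_members: {member_id: {"operating_status": str}} (stand-in for a DB)
    stats: {"members": {member_id: {"status": str}}} (assumed stats shape)
    Returns the ids of members whose stored status changed.
    """
    changed = []
    for member_id, info in stats.get("members", {}).items():
        new_status = STATUS_TO_OPERATING.get(info.get("status"))
        member = stored_members.get(member_id)
        if member is None or new_status is None:
            continue  # unknown member or unrecognized status: leave as-is
        if member["operating_status"] != new_status:
            member["operating_status"] = new_status
            changed.append(member_id)
    return changed


stored = {"6d682536-e9fe-4456-ad24-df8521857ee0": {"operating_status": "ONLINE"}}
reported = {"members": {"6d682536-e9fe-4456-ad24-df8521857ee0": {"status": "INACTIVE"}}}
print(apply_member_stats(stored, reported), stored)
```

Because the update runs on every collection cycle, a member that comes back online would be flipped back by the same code path.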

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron-lbaas (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/525355

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron-lbaas (master)

Change abandoned by Damon Li (<email address hidden>) on branch: master
Review: https://review.openstack.org/323645
Reason: Do not need now

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron-lbaas (stable/pike)

Reviewed: https://review.openstack.org/525355
Committed: https://git.openstack.org/cgit/openstack/neutron-lbaas/commit/?id=ebc3f735e50aab5cf23eaa67b907687a4ce49b43
Submitter: Zuul
Branch: stable/pike

commit ebc3f735e50aab5cf23eaa67b907687a4ce49b43
Author: Carlos Goncalves <email address hidden>
Date: Sat Nov 18 10:21:43 2017 +0000

    Update pool member operating status for haproxy

    Collection of stats is a period task for which the HAProxy driver also
    returns member statuses, including 'status' field which maps to
    MemberV2.operating_status. Updating such field in neutron-lbaas DB
    ensures the stored operational status is up-to-date. Subsequently, the
    operating status of pools, listeners and load balancers can also be
    correctly reported to users upon querying in for the load balancer
    status tree API.

    This fixes cases where a pool member becomes offline in the dataplane
    but is not observed in neutron-lbaas. If a member turns back online
    again, its operating status will be updated too.

    Partial-Bug: #1548774

    Change-Id: Ief872b0463002b4e339f8eab71b37a4225d461cc
    (cherry picked from commit 80a086695d11aa65251889671a3d729f62779c80)

tags: added: in-stable-pike
Revision history for this message
yanpuqing (ycx) wrote :

Hello everyone, I found an error in the operating_status of a 'dead' member.
When I update the "admin_state_up" of a member, the operating_status of pool members does not change.
In fact, it should become "DISABLED".
Maybe it's a new bug?

Revision history for this message
yanpuqing (ycx) wrote :

I'm sorry, I mean that when I update the "admin_state_up" of a member to "False", the operating_status of pool members does not change.

Revision history for this message
Gregory Thiemonge (gthiemonge) wrote : auto-abandon-script

Abandoned after re-enabling the Octavia launchpad.

Changed in octavia:
assignee: Carlos Goncalves (cgoncalves) → nobody
status: In Progress → Invalid
tags: added: auto-abandon