Failover to an amphora from a spares pool fails

Bug #1597451 reported by Elena Ezhova
This bug affects 1 person
Affects: octavia
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

spare_amphora_pool_size = 1

Steps to reproduce:
1. Create a load balancer
2. Update its amphora's port in lb-mgmt-net with admin_state_up=False
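Step 2 can also be scripted. The following is a minimal sketch, assuming a python-neutronclient style client whose update_port(port_id, body) call is used to set admin_state_up; the helper name set_port_admin_state and the RecordingClient stand-in are hypothetical and only exist so the sketch runs without a Neutron endpoint.

```python
def set_port_admin_state(neutron_client, port_id, up):
    """Set admin_state_up on a port (hypothetical helper for step 2)."""
    body = {'port': {'admin_state_up': bool(up)}}
    neutron_client.update_port(port_id, body)
    return body


class RecordingClient(object):
    """Stand-in client that records update_port calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def update_port(self, port_id, body):
        self.calls.append((port_id, body))
        return body


# Example: take the amphora's lb-mgmt-net port down to provoke a failover.
# 'amphora-port-id' is a placeholder, not a real port ID.
client = RecordingClient()
set_port_admin_state(client, 'amphora-port-id', False)
```

With a real deployment, the RecordingClient would be replaced by an authenticated Neutron client and the placeholder by the ID of the amphora's port on lb-mgmt-net.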

Expected result: The load balancer fails over to an amphora from the spares pool.
Actual result: Failover fails with the following error:

branch: master - http://paste.openstack.org/show/524109/
branch: stable/mitaka - http://paste.openstack.org/show/523795/

On the master branch, the issue should be fixable by checking whether the dns-integration extension is enabled before calling port_update [1].

Related issue: https://bugs.launchpad.net/octavia/+bug/1558934

[1] https://github.com/openstack/octavia/blob/master/octavia/network/drivers/neutron/allowed_address_pairs.py#L436-L438
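The check proposed in the description could look roughly like the following. This is a minimal sketch, not the patch that was eventually merged: it assumes a python-neutronclient style client exposing list_extensions() and update_port(), and the helper names, the FakeNeutronClient stand-in, and the exact request body (clearing device_id, and dns_name only when the extension is present) are illustrative assumptions.

```python
DNS_INTEGRATION_ALIAS = 'dns-integration'


def dns_integration_enabled(neutron_client):
    """Return True if Neutron lists the dns-integration extension."""
    extensions = neutron_client.list_extensions().get('extensions', [])
    return any(ext.get('alias') == DNS_INTEGRATION_ALIAS for ext in extensions)


def build_failover_port_update(dns_enabled):
    """Build the port update body (hypothetical helper).

    dns_name is only included when the extension is enabled, since
    Neutron may reject attributes that belong to a disabled extension.
    """
    port = {'device_id': ''}
    if dns_enabled:
        port['dns_name'] = ''
    return {'port': port}


def prepare_port_for_failover(neutron_client, port_id):
    """Check the extension list first, then send the matching update."""
    body = build_failover_port_update(dns_integration_enabled(neutron_client))
    neutron_client.update_port(port_id, body)
    return body


class FakeNeutronClient(object):
    """Stand-in for a real client so the sketch runs without Neutron."""

    def __init__(self, aliases):
        self._aliases = aliases
        self.calls = []

    def list_extensions(self):
        return {'extensions': [{'alias': a} for a in self._aliases]}

    def update_port(self, port_id, body):
        self.calls.append((port_id, body))
        return body
```

The point of the design is that the request body is decided from the extension list at call time, so the same code path works whether or not a given Neutron deployment has dns-integration enabled.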

Elena Ezhova (eezhova)
description: updated
Elena Ezhova (eezhova)
Changed in octavia:
assignee: nobody → Elena Ezhova (eezhova)
Changed in octavia:
importance: Undecided → High
Michael Johnson (johnsom) wrote :

I am seeing this on normal failover too now. Obviously something has changed upstream.
Bumping to critical.

Changed in octavia:
importance: High → Critical
status: New → Confirmed
status: Confirmed → Triaged
Michael Johnson (johnsom) wrote :

It looks like this change triggered this:
https://review.openstack.org/#/c/311640/

Michael Johnson (johnsom) wrote :

Elena, this issue is blocking my failover work. I hope you don't mind if I propose a patch. I looked for you in IRC today, but didn't see you log in.

Elena Ezhova (eezhova) wrote :

@Michael, it's great that you are working on a fix, but the problem with the 'dns-integration' extension is apparently not the only issue we are facing. With your patch, failover fails on the listeners update with the following 500 error in the health manager logs: http://paste.openstack.org/show/526520/ (I only included a small part of the log; I can provide the full snippet if needed).

And on the failover amphora side there are the following errors:

amphora-agent.log
2016-07-06 08:26:54.871 1339 INFO werkzeug [-] 192.168.0.9 - - [06/Jul/2016 08:26:54] "PUT /0.5/listeners/b8ba451c-3ff9-40a9-ab2c-86ecba8cdfeb/1047ff69-86dc-46fc-b295-8222c89c0d2b/haproxy HTTP/1.1" 202 -
2016-07-06 08:26:54.922 1339 INFO werkzeug [-] 192.168.0.9 - - [06/Jul/2016 08:26:54] "GET /0.5/listeners/1047ff69-86dc-46fc-b295-8222c89c0d2b HTTP/1.1" 200 -
2016-07-06 08:26:55.203 1339 DEBUG octavia.amphorae.backends.agent.api_server.listener [-] Failed to start HAProxy service: Command '['/usr/sbin/service', 'haproxy-1047ff69-86dc-46fc-b295-8222c89c0d2b', 'start']' returned non-zero exit status 1 start_stop_listener /usr/local/lib/python2.7/dist-packages/octavia/amphorae/backends/agent/api_server/listener.py:187
2016-07-06 08:26:55.239 1339 INFO werkzeug [-] 192.168.0.9 - - [06/Jul/2016 08:26:55] "PUT /0.5/listeners/1047ff69-86dc-46fc-b295-8222c89c0d2b/start HTTP/1.1" 500 -

ubuntu@amphora-b8ba451c-3ff9-40a9-ab2c-86ecba8cdfeb:~$ sudo cat /var/log/upstart/haproxy-1047ff69-86dc-46fc-b295-8222c89c0d2b.log
sort: cannot read: /var/lib/octavia/plugged_interfaces: No such file or directory

If I trigger a failover when the affected load balancer has no listener, errors spring up all over octavia, nova-compute and neutron-server.
I probably should have devoted more time to finding the root cause, but I didn't want to upload an incomplete fix.

Elena Ezhova (eezhova) wrote :
Changed in octavia:
assignee: Elena Ezhova (eezhova) → nobody
OpenStack Infra (hudson-openstack) wrote : Fix merged to octavia (master)

Reviewed: https://review.openstack.org/337939
Committed: https://git.openstack.org/cgit/openstack/octavia/commit/?id=4f208c3dc2c7a1b3d6e4ad14e17ac8ce5abcb940
Submitter: Jenkins
Branch: master

commit 4f208c3dc2c7a1b3d6e4ad14e17ac8ce5abcb940
Author: Michael Johnson <email address hidden>
Date: Wed Jul 6 00:49:54 2016 +0000

    Fixes failover issue with neutron dns integration

    Neutron has added and removed [1] the dns integration extension
    as enabled by default. This patch tells Octavia to check the
    neutron extension list to see if the dns integration extension
    is enabled before preparing a neutron port for failover and formatting
    the request appropriately.

    Closes-Bug: #1597451
    Change-Id: I15f7cb50642616f87fc8fb5bcb2af1c2e849264d

Changed in octavia:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to octavia (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/338757

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/octavia 0.9.0

This issue was fixed in the openstack/octavia 0.9.0 release.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on octavia (stable/mitaka)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: stable/mitaka
Review: https://review.openstack.org/338757
Reason: Mitaka is CVE only at this point; and it's not a CVE.
