Bug #1690812 “Octavia active/standby config+ pool with sourceip ...” : Bugs : octavia

Alex Stafeyev (astafeye) wrote on 2017-05-15:

#1

bug (10.9 KiB, text/plain)

Ubuntu amphora

Alex Stafeyev (astafeye) wrote on 2017-05-15:

#2

bug2rhelamphora.txt (18.2 KiB, text/plain)

More investigation logs

Nir Magnezi (nmagnezi) wrote on 2017-07-04:

#3

Debugging shows that this is indeed an issue, probably in how Octavia configures HAProxy.

Looking at the logs inside the amphora instance, I noticed the following:

host-192-168-199-59 haproxy: [ALERT] 134/082747 (11416) : Proxy 'cce34da1-7b4d-4659-bb0b-6cf01ffbcd68': unable to find local peer 'amphora-b8928a25-ca71-4389-8753-6ab3b2fb3d2c.localdomain' in peers section '9c530de5653d474181b73fe70c398ad5_peers'.
host-192-168-199-59 haproxy: [ALERT] 134/082747 (11416) : Fatal errors found in configuration.

Michael Johnson (johnsom) wrote on 2017-07-05:

#4

The first issue I see is a Python 3 bug:

File "/usr/local/lib/python3.5/dist-packages/octavia/amphorae/backends/agent/api_server/listener.py", line 230, in start_stop_listener
if 'Job is already running' not in e.output:
TypeError: a bytes-like object is required, not 'str'

That comparison, "if 'Job is already running' not in e.output:", should probably be "if b'Job is already running' not in e.output:", because e.output is bytes on Python 3.

That should fix the first issue. I confirmed that it has not yet been fixed.
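
For illustration, here is a minimal sketch of the str-versus-bytes mismatch described above. It is not the Octavia code: the helper name is_already_running, the command name, and the sample output text are hypothetical, and the exception is constructed by hand rather than produced by a real service call.

```python
import subprocess


def is_already_running(exc):
    """Return True if the captured output mentions 'Job is already running'.

    Hypothetical helper: on Python 3, CalledProcessError.output is bytes
    unless text mode was requested, so the needle is bytes as well.
    """
    output = exc.output or b""
    if isinstance(output, str):
        # Tolerate callers that captured text instead of bytes.
        output = output.encode("utf-8")
    return b"Job is already running" in output


# Hand-built exception standing in for a failed service start (assumed data).
err = subprocess.CalledProcessError(
    returncode=1,
    cmd=["hypothetical-init-command", "start"],
    output=b"start: Job is already running: haproxy\n",
)

try:
    # A str needle against bytes output reproduces the TypeError in the traceback.
    _ = "Job is already running" in err.output
except TypeError as type_err:
    print("str needle fails:", type_err)

print("bytes needle matches:", is_already_running(err))
```

Normalizing to bytes inside one helper keeps the check valid whether the caller captured bytes or text.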

Changed in octavia:
status:	New → Triaged
importance:	Undecided → Critical

Michael Johnson (johnsom) wrote on 2017-07-05:

#5

Issue two is the incorrect peers. I think you hit the bug reported here: https://launchpad.net/bugs/1681623

There is a patch here: https://review.openstack.org/#/c/455569
I have been struggling with that patch as I could not reproduce, but I think you found the magic combination.

We will move forward with that bug/patch for your second issue.

OpenStack Infra (hudson-openstack) wrote on 2017-07-06: Related fix proposed to octavia (master)

#6

Related fix proposed to branch: master
Review: https://review.openstack.org/480919

Nir Magnezi (nmagnezi) wrote on 2017-07-06:

#7

Michael, thanks for looking into this.
I submitted a tiny patch based on what you wrote in comment#4.
I'll test both this and the patch submitted against bug 1681623 and report my findings.

Nir Magnezi (nmagnezi) wrote on 2017-07-11:

#8

Looks like the fix for bug 1681623 did not resolve this bug.

I used loadbalancer_topology = ACTIVE_STANDBY and tested with the following pool configuration:
neutron lbaas-pool-create --lb-algorithm ROUND_ROBIN --listener listener1 --protocol HTTP --session-persistence type=SOURCE_IP
Created a new pool:
+---------------------+------------------------------------------------+
| Field               | Value                                          |
+---------------------+------------------------------------------------+
| admin_state_up      | True                                           |
| description         |                                                |
| healthmonitor_id    |                                                |
| id                  | b084ed49-038b-45dc-9b4b-8a277f60ba5b           |
| lb_algorithm        | ROUND_ROBIN                                    |
| listeners           | {"id": "8ac3b6b3-680e-4a58-b51d-883283a3caf1"} |
| loadbalancers       | {"id": "d9800fd6-f010-4540-8d41-ac24ae325cc2"} |
| members             |                                                |
| name                |                                                |
| protocol            | HTTP                                           |
| session_persistence | {"cookie_name": null, "type": "SOURCE_IP"}     |
| tenant_id           | d67bee545d534850aedfbe77da709c68               |
+---------------------+------------------------------------------------+

The Octavia Worker shows the following exception:
https://paste.fedoraproject.org/paste/vpzv0KnOjaqYdbtf5K9Bng

Digging inside the amphora VMs, I noticed the following:

One amphora managed to spawn haproxy with no errors; it looks like the standby amphora.

haproxy.cfg:
============
root@amphora-e643829b-4de3-4ebf-a96c-0bef10389f6f:~# cat /var/lib/octavia/8ac3b6b3-680e-4a58-b51d-883283a3caf1/haproxy.cfg 
# Configuration for lb_nir
global
    daemon
    user nobody
    log /dev/log local0
    log /dev/log local1 notice
    stats socket /var/lib/octavia/8ac3b6b3-680e-4a58-b51d-883283a3caf1.sock mode 0666 level user

defaults
    log global
    retries 3
    option redispatch
    timeout connect 5000
    timeout client 50000
    timeout server 50000

peers 8ac3b6b3680e4a58b51d883283a3caf1_peers
    peer yZB0PtEhlFqOwQHLLY3Zj9U_QAg 10.0.0.6:1025
    peer C2BTWaiZ2FW-oF4uOy7c2LeC0mU 10.0.0.14:1025

frontend 8ac3b6b3-680e-4a58-b51d-883283a3caf1
    option httplog
    bind 10.0.0.9:80
    mode http

haproxy log:
============
cat /var/log/haproxy.log 
Jul 11 11:11:06 amphora-e643829b-4de3-4ebf-a96c-0bef10389f6f haproxy[1723]: Proxy 8ac3b6b3-680e-4a58-b51d-883283a3caf1 started.
Jul 11 11:11:06 amphora-e643829b-4de3-4ebf-a96c-0bef10389f6f haproxy[1723]: Proxy 8ac3b6b3-680e-4a58-b51d-883283a3caf1 started.

The second amphora fails to spawn haproxy; it looks like the active amphora:

haproxy.cfg:
============
root@amphora-6ca8b5ae-3185-48b0-9615-9bf8d5df30f0:~# cat /var/lib/octavia/8ac3b6b3-680e-4a58-b51d-883283a3caf1/haproxy.cfg 
# Configuration for lb_nir
global
    daemon
    user nobody
    log /dev/log local0
    log /dev/log local1 notice
    stats socket /var/lib/octavia/8ac3b6b3-680e-4a58-b51d-883283a3caf1.sock mode 0666 level user

defaults
    log global
    retries 3
    option redispatch
    timeout connect 5000
    timeout client 50000
    timeout server 50000

peers 8ac3b6b3680e4a58b51d883283a3caf1_peers
    peer yZB0PtEhlFqOwQHLLY3Zj9U_QAg 10.0.0.6:1025
    peer C2BTWaiZ2FW-oF4uOy7c2LeC0mU 10.0.0.14:1025

frontend 8ac3b6b3-680e-4a58-b51d-883283a3caf1
    option httplog
    bind 10.0.0.9:80
    mode http
    default_backend b084ed49-038b-45dc-9b4b-8a277f60ba5b

backend b084ed49-038b-45dc-9b4b-8a277f60ba5b
    mode http
    balance roundrobin
    stick-table type ip size 10k peers 8ac3b6b3680e4a58b51d883283a3caf1_peers
    stick on src

haproxy log:
============
root@amphora-6ca8b5ae-3185-48b0-9615-9bf8d5df30f0:~# cat /var/log/haproxy.log
Jul 11 11:11:05 amphora-6ca8b5ae-3185-48b0-9615-9bf8d5df30f0 haproxy[1724]: Proxy 8ac3b6b3-680e-4a58-b51d-883283a3caf1 started.
Jul 11 11:11:05 amphora-6ca8b5ae-3185-48b0-9615-9bf8d5df30f0 haproxy[1724]: Proxy 8ac3b6b3-680e-4a58-b51d-883283a3caf1 started.
Jul 11 11:30:03 amphora-6ca8b5ae-3185-48b0-9615-9bf8d5df30f0 haproxy[3079]: [ALERT] 191/113003 (3079) : Proxy 'b084ed49-038b-45dc-9b4b-8a277f60ba5b': unable to find local peer 'amphora-6ca8b5ae-3185-48b0-9615-9bf8d5df30f0' in peers section '8ac3b6b3680e4a58b51d883283a3caf1_peers'.
Jul 11 11:30:03 amphora-6ca8b5ae-3185-48b0-9615-9bf8d5df30f0 haproxy[3079]: [WARNING] 191/113003 (3079) : Removing incomplete section 'peers 8ac3b6b3680e4a58b51d883283a3caf1_peers' (no peer named 'amphora-6ca8b5ae-3185-48b0-9615-9bf8d5df30f0').
Jul 11 11:30:03 amphora-6ca8b5ae-3185-48b0-9615-9bf8d5df30f0 haproxy[3079]: [ALERT] 191/113003 (3079) : Fatal errors found in configuration.

[ALERT] 191/115358 (4580) : Proxy 'b084ed49-038b-45dc-9b4b-8a277f60ba5b': unable to find local peer 'amphora-6ca8b5ae-3185-48b0-9615-9bf8d5df30f0' in peers section '8ac3b6b3680e4a58b51d883283a3caf1_peers'.

It seems like something is not right with the way we configure peers; I just can't put my finger on what exactly is wrong with that haproxy.cfg file.
Attempts to manually invoke haproxy resulted in the same error:

root@amphora-6ca8b5ae-3185-48b0-9615-9bf8d5df30f0:/etc/systemd# /usr/sbin/haproxy -f /var/lib/octavia/8ac3b6b3-680e-4a58-b51d-883283a3caf1/haproxy.cfg -f /var/lib/octavia/haproxy-default-user-group.conf -c -q
[ALERT] 191/115358 (4580) : Proxy 'b084ed49-038b-45dc-9b4b-8a277f60ba5b': unable to find local peer 'amphora-6ca8b5ae-3185-48b0-9615-9bf8d5df30f0' in peers section '8ac3b6b3680e4a58b51d883283a3caf1_peers'.
[ALERT] 191/115358 (4580) : Fatal errors found in configuration.
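
As a reading aid, the alert can be reproduced on paper: HAProxy looks for its local peer name in the peers section, and that name defaults to the host name unless it is overridden on the command line. The sketch below is a simplified illustration, not HAProxy's own parser; the function name peer_names and the small keyword set are assumptions, and the sample text is copied from the haproxy.cfg above.

```python
# Section keywords recognized by this simplified scanner (not exhaustive).
SECTION_KEYWORDS = {"global", "defaults", "frontend", "backend", "listen", "peers"}


def peer_names(cfg_text, section):
    """Collect the peer names declared in the named 'peers' section."""
    names = []
    in_section = False
    for line in cfg_text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # skip blank lines and comments
        if words[0] in SECTION_KEYWORDS:
            # A new section starts; remember whether it is the one we want.
            in_section = words[0] == "peers" and words[1:2] == [section]
        elif in_section and words[0] == "peer" and len(words) >= 3:
            names.append(words[1])
    return names


CFG = """\
peers 8ac3b6b3680e4a58b51d883283a3caf1_peers
    peer yZB0PtEhlFqOwQHLLY3Zj9U_QAg 10.0.0.6:1025
    peer C2BTWaiZ2FW-oF4uOy7c2LeC0mU 10.0.0.14:1025

frontend 8ac3b6b3-680e-4a58-b51d-883283a3caf1
    bind 10.0.0.9:80
"""
SECTION = "8ac3b6b3680e4a58b51d883283a3caf1_peers"
HOSTNAME = "amphora-6ca8b5ae-3185-48b0-9615-9bf8d5df30f0"

names = peer_names(CFG, SECTION)
print("declared peers:", names)
print("host name declared as a peer:", HOSTNAME in names)
```

Neither declared peer name equals the amphora host name, which matches the "unable to find local peer" message in the log.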

P.S.
====
1. Used the default Ubuntu based amphora.
2. None of this happens with loadbalancer_topology = SINGLE, which is why CI is not failing.

Michael Johnson (johnsom) wrote on 2017-07-12:

#9

Can you confirm that you have a newly built image with the fix from https://review.openstack.org/#/c/455569 ?

It is behaving as if the "-L" command line flag is missing, which is what the fix addressed.

The manual run below is also missing the -L flag.

/usr/sbin/haproxy -f /var/lib/octavia/8ac3b6b3-680e-4a58-b51d-883283a3caf1/haproxy.cfg -f /var/lib/octavia/haproxy-default-user-group.conf -c -q

See these lines in the source:
https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/agent/api_server/templates/systemd.conf.j2#L27
https://github.com/openstack/octavia/blob/master/octavia/amphorae/backends/agent/api_server/templates/systemd.conf.j2#L28
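
To make the point about the missing flag concrete, here is a minimal sketch that assembles the same validation command with -L added. It only builds and prints the argument list; it does not run haproxy. The helper name haproxy_check_command is hypothetical, and the peer name passed to -L is an assumption: the thread does not say which of the two configured peer names belongs to which amphora.

```python
import shlex


def haproxy_check_command(cfg_paths, local_peer):
    """Build a 'haproxy -c' validation command that sets the local peer name."""
    cmd = ["/usr/sbin/haproxy"]
    for path in cfg_paths:
        cmd += ["-f", path]
    # -L overrides the local peer name, which otherwise defaults to the host name.
    cmd += ["-L", local_peer, "-c", "-q"]
    return cmd


cfg_paths = [
    "/var/lib/octavia/8ac3b6b3-680e-4a58-b51d-883283a3caf1/haproxy.cfg",
    "/var/lib/octavia/haproxy-default-user-group.conf",
]
# Assumed value: one of the two peer names from the peers section above.
local_peer = "yZB0PtEhlFqOwQHLLY3Zj9U_QAg"

cmd = haproxy_check_command(cfg_paths, local_peer)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

With a peer name that is declared in the peers section, the local peer lookup that failed in the manual run has something to match.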

Nir Magnezi (nmagnezi) wrote on 2017-07-13:

#10

@Michael, you were right.
Apparently, my devstack used a cached amphora image without this fix.
As soon as I generated a new one, test_session_persistence passed with loadbalancer_topology = ACTIVE_STANDBY:

{0} neutron_lbaas.tests.tempest.v2.scenario.test_session_persistence.TestSessionPersistence.test_session_persistence [246.013035s] ... ok

OpenStack Infra (hudson-openstack) wrote on 2018-07-04: Change abandoned on octavia (master)

#11

Change abandoned by Nir Magnezi (<email address hidden>) on branch: master
Review: https://review.openstack.org/480919

Gregory Thiemonge (gthiemonge) wrote on 2023-03-31: auto-abandon-script

#12

Abandoned after re-enabling the Octavia launchpad.

Changed in octavia:
status:	Triaged → Invalid
tags:	added: auto-abandon

Octavia active/standby config+ pool with sourceip session persistence configuration- Service is not available and LB is not deleted after test