TripleO 16.1 Beta Train Octavia service returns 503

Bug #1887801 reported by Courtney Oakley on 2020-07-16
This bug affects 2 people
Affects: tripleo
Importance: High
Assigned to: Unassigned

Bug Description

Description
===========
TripleO 16.1 Beta (CentOS7, Train/current) deploys Octavia on the overcloud when the supplied template is added to the deployment command:

-e /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml

The configuration is 3 Controllers, 3 Compute, 3 Ceph-Storage with SSL public endpoints and OVN (ovn-dvr-ha).

However, once deployed, every loadbalancer command returns an HTTP 503 Service Unavailable response. Here are octavia.log and driver-agent.log as seen on any of the three controllers.

/var/log/containers/octavia/octavia.log
2020-07-15 11:00:47.730 24 INFO octavia.common.config [-] Logging enabled!
2020-07-15 11:00:47.730 24 INFO octavia.common.config [-] mod_wsgi version 5.0.2
2020-07-15 11:00:47.774 25 INFO octavia.common.config [-] Logging enabled!
2020-07-15 11:00:47.774 25 INFO octavia.common.config [-] mod_wsgi version 5.0.2
2020-07-15 11:03:12.962 25 ERROR octavia.api.drivers.driver_factory [-] Unable to load provider driver ovn due to: Unable to open the driver agent socket: /var/run/octavia/status.sock: DriverAgentNotFound: Unable to open the driver agent socket: /var/run/octavia/status.sock
2020-07-15 11:03:12.977 24 ERROR octavia.api.drivers.driver_factory [-] Unable to load provider driver ovn due to: Unable to open the driver agent socket: /var/run/octavia/status.sock: DriverAgentNotFound: Unable to open the driver agent socket: /var/run/octavia/status.sock
2020-07-15 11:03:13.100 24 INFO octavia.common.config [-] Logging enabled!
2020-07-15 11:03:13.100 24 INFO octavia.common.config [-] mod_wsgi version 5.0.2
2020-07-15 11:03:13.106 25 INFO octavia.common.config [-] Logging enabled!
2020-07-15 11:03:13.107 25 INFO octavia.common.config [-] mod_wsgi version 5.0.2
2020-07-15 11:05:30.297 24 ERROR octavia.api.drivers.driver_factory [-] Unable to load provider driver ovn due to: Unable to open the driver agent socket: /var/run/octavia/status.sock: DriverAgentNotFound: Unable to open the driver agent socket: /var/run/octavia/status.sock
2020-07-15 11:05:30.298 25 ERROR octavia.api.drivers.driver_factory [-] Unable to load provider driver ovn due to: Unable to open the driver agent socket: /var/run/octavia/status.sock: DriverAgentNotFound: Unable to open the driver agent socket: /var/run/octavia/status.sock

/var/log/containers/octavia/driver-agent.log
2020-07-15 11:59:34.673 32 ERROR octavia.cmd.driver_agent [-] stats_listener raised exception: [Errno 13] Permission denied. Restarting stats_listener.: error: [Errno 13] Permission denied
2020-07-15 11:59:34.673 32 ERROR octavia.cmd.driver_agent Traceback (most recent call last):
2020-07-15 11:59:34.673 32 ERROR octavia.cmd.driver_agent File "/usr/lib/python2.7/site-packages/octavia/cmd/driver_agent.py", line 65, in _process_wrapper
2020-07-15 11:59:34.673 32 ERROR octavia.cmd.driver_agent function(exit_event)
2020-07-15 11:59:34.673 32 ERROR octavia.cmd.driver_agent File "/usr/lib/python2.7/site-packages/octavia/api/drivers/driver_agent/driver_listener.py", line 138, in stats_listener
2020-07-15 11:59:34.673 32 ERROR octavia.cmd.driver_agent StatsRequestHandler)
2020-07-15 11:59:34.673 32 ERROR octavia.cmd.driver_agent File "/usr/lib64/python2.7/SocketServer.py", line 419, in __init__
2020-07-15 11:59:34.673 32 ERROR octavia.cmd.driver_agent self.server_bind()
2020-07-15 11:59:34.673 32 ERROR octavia.cmd.driver_agent File "/usr/lib64/python2.7/SocketServer.py", line 430, in server_bind
2020-07-15 11:59:34.673 32 ERROR octavia.cmd.driver_agent self.socket.bind(self.server_address)
2020-07-15 11:59:34.673 32 ERROR octavia.cmd.driver_agent File "/usr/lib64/python2.7/socket.py", line 224, in meth
2020-07-15 11:59:34.673 32 ERROR octavia.cmd.driver_agent return getattr(self._sock,name)(*args)

Steps to reproduce
==================
1) Deploy the Undercloud configuration of TripleO 16.1 Beta (upstream current) for Train on CentOS7.
2) Deploy the Overcloud with Octavia by adding

-e /usr/share/openstack-tripleo-heat-templates/environments/services/octavia.yaml

3) Deployment has 3 controllers, 3 compute, 3 ceph-storage nodes; manila, barbican, octavia, ironic and sahara projects.

THT=/usr/share/openstack-tripleo-heat-templates
openstack overcloud deploy --templates $THT \
-e $THT/environments/network-environment.yaml \
-e /home/stack/templates/network-environment.yaml -e /home/stack/templates_generated/environments/network-isolation.yaml \
-e $THT/environments/services/octavia.yaml \
-e $THT/environments/docker.yaml \
-e $THT/environments/docker-ha.yaml \
-e $THT/environments/ceph-ansible/ceph-ansible.yaml --ntp-server 10.2.2.14 \
-e $THT/environments/ceph-ansible/ceph-mds.yaml \
-e /home/stack/templates/my-ceph-settings.yaml \
-e $THT/environments/docker.yaml \
-r /home/stack/templates/roles_data.yaml \
-n /home/stack/templates/network_data.yaml \
-e /home/stack/ssl-heat-templates/environments/ssl/enable-tls.yaml \
-e /home/stack/ssl-heat-templates/environments/ssl/tls-endpoints-public-ip.yaml \
-e /home/stack/ssl-heat-templates/environments/ssl/inject-trust-anchor.yaml \
-e /home/stack/ssl-heat-templates/environments/ssl/inject-trust-anchor-hiera.yaml \
-e /home/stack/templates/public_vip.yaml \
-e /home/stack/containers-prepare-parameter.yaml \
-e $THT/environments/services/sahara.yaml \
-e $THT/environments/manila-cephfsnative-config.yaml \
-e $THT/environments/services/ironic.yaml \
-e $THT/environments/services/ironic-overcloud.yaml \
-e $THT/environments/services/neutron-ovn-dvr-ha.yaml \
-e /home/stack/ironic-config.yaml \
-e /home/stack/templates/node-info.yaml \
-e $THT/environments/services/barbican.yaml \
-e $THT/environments/barbican-backend-simple-crypto.yaml \
-e /home/stack/barbican-configure.yaml \
-e /home/stack/templates/fencing.yaml \
-e $THT/environments/services/mistral.yaml \
--libvirt-type kvm \
--timeout 210

4) From the Undercloud, source overcloudrc, then run "openstack loadbalancer create lb1 ..."
5) HTTP 503 Service Unavailable is returned.

Expected result
===============
The command should create the load balancer lb1 with default settings.

Actual result
=============
Service Unavailable HTTP 503.

ADDITIONAL INFORMATION
- I have tracked the bug to the octavia_driver_agent docker container, where /var/run/octavia (the directory in which the Octavia socket is created) is owned by root.root.

- The docker container USER is "octavia", hence the "[Errno 13] Permission denied".

- The docker image is centos-binary-octavia-api, which probably needs updating.
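
The ownership mismatch described above can be confirmed before changing anything. The following is a minimal sketch, assuming GNU stat (as shipped on CentOS); the helper names owner_of and check_owner are hypothetical and not part of TripleO or Octavia.

```shell
#!/bin/sh
# Print "owner:group" for a path. Assumes GNU stat ("-c" format option).
owner_of() {
    stat -c '%U:%G' "$1"
}

# Compare a path's ownership with an expected "owner:group" string.
# Returns 0 on match, 1 on mismatch, 2 if stat itself fails.
check_owner() {
    path=$1
    expected=$2
    actual=$(owner_of "$path") || return 2
    if [ "$actual" = "$expected" ]; then
        echo "OK: $path is $actual"
    else
        echo "MISMATCH: $path is $actual, expected $expected"
        return 1
    fi
}

# Possible use on a controller (container name and path from this report):
#   sudo docker exec octavia_driver_agent stat -c '%U:%G' /var/run/octavia
# or, from a shell inside the container:
#   check_owner /var/run/octavia octavia:octavia
```

On an affected controller the check would be expected to report root:root, which matches the Permission denied error in driver-agent.log.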

Courtney Oakley (courts3003) wrote :

Okay, I have workarounds for all the bugs that stopped Octavia (OpenStack Load Balancer) from coming up on a production-capable installation of TripleO 16.1 Beta (Train) on CentOS7. I hate having to answer all my own questions.

OCTAVIA BUGS OR MISCONFIGURATIONS

1. /var/run/octavia has incorrect ownership of root.root. It needs to be octavia.octavia; otherwise the new driver_agent cannot run, and then nothing else runs.

WORKAROUND
Go to the Docker container octavia_driver_agent and change the ownership to octavia.octavia, then restart all Octavia Docker containers. Note that everything here is Docker because Podman does not work with Pacemaker HA on CentOS7.

You have to execute these on all controllers.

-----------------COMMANDS
sudo docker exec -ti --user root octavia_driver_agent /bin/bash # Root on the container
chown octavia:octavia /var/run/octavia
exit

sudo docker restart octavia_driver_agent
sudo docker restart octavia_worker
sudo docker restart octavia_housekeeping
sudo docker restart octavia_health_manager
------------------
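
Because the commands above must be repeated on every controller, they can be wrapped in a small loop. This is only a sketch under stated assumptions: the controller host names passed as arguments are placeholders, the heat-admin SSH user is assumed to be the overcloud login, and running chown non-interactively through "docker exec --user root" replaces the interactive shell used above. By default it only prints what it would run.

```shell
#!/bin/sh
# Emit the per-controller workaround commands from this report.
# Container names and the path are taken from the report.
octavia_socket_fix_cmds() {
    echo "sudo docker exec --user root octavia_driver_agent chown octavia:octavia /var/run/octavia"
    for c in octavia_driver_agent octavia_worker octavia_housekeeping octavia_health_manager; do
        echo "sudo docker restart $c"
    done
}

# Run the commands on each host given as an argument.
# DRY_RUN defaults to 1, so nothing is executed unless DRY_RUN=0 is set.
apply_fix() {
    for host in "$@"; do
        octavia_socket_fix_cmds | while IFS= read -r cmd; do
            if [ "${DRY_RUN:-1}" = "1" ]; then
                echo "[$host] $cmd"
            else
                # -n keeps ssh from consuming the loop's stdin.
                # "heat-admin" is an assumed login user.
                ssh -n "heat-admin@$host" "$cmd"
            fi
        done
    done
}

# Example with placeholder host names:
#   apply_fix controller-0 controller-1 controller-2
#   DRY_RUN=0 apply_fix controller-0 controller-1 controller-2
```

As noted below, a redeployment resets the ownership to root.root, so this would have to be rerun after each stack update until the template fix is in place.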

I did try changing the docker image in the container-image-prepare command, but somewhere in the YAML deployment scripts /var/run/octavia gets reset to root.root ownership.

2. No Octavia certs get generated automatically.

The logic within the deployment Jinja2 script /usr/share/openstack-tripleo-heat-templates/deployment/octavia/octavia-deployment-config.j2.yaml means that Octavia certs are only generated if OctaviaGenerateCerts is true and, importantly, the stack action is CREATE.

WORKAROUND
So cert generation only works if your TripleO deployment completes the first time round. With multiple Controllers and Computes, the sheer number of moving parts makes this highly unlikely. I think the logic should be changed so that generate_certs is true when OctaviaGenerateCerts is true and the stack action is CREATE or UPDATE.

Is there a problem with regenerating the certs each time you deploy??? (it's better than not creating them at all)

My TripleO never deploys completely the first time; I always have at least one restart, even if only because of docker.io pull bandwidth issues. So I changed the StackAction value from CREATE to UPDATE. I'll change it to CREATE or UPDATE (the best option) on my next redeployment.

-------------------------------- /usr/share/openstack-tripleo-heat-templates/deployment/octavia/octavia-deployment-config.j2.yaml

  generate_certs:
      and:
      - get_param: OctaviaGenerateCerts
      - equals:
        - get_param: StackAction
        # - CREATE
        - UPDATE

3. No Octavia roles are generated by deployment scripts. This is a strange one. The Policy file is generated and in place but no roles are set. Octavia cannot work without the 'load-balancer_member' and 'load-balancer_admin' roles being set for users. So we might as well generate the roles then.

This was an abandoned update (a possible fix is shown below).

WORKAROUND
------------------------------COMMANDS
source overcloudrc
openstack role create load-balancer_member
openstack role create load-balancer_admin
openstack role create load-balancer_observer
openstack role create load-balancer_global_observer
openstack role create load-balancer_quota_admin

For USER and...
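
The role-creation commands above can also be written as a loop that is safe to rerun. This is a minimal sketch: the role names are the ones listed in this comment, "--or-show" is the openstack client option that returns an existing role instead of failing, and the OPENSTACK variable is a hypothetical override added here so the loop can be dry-run.

```shell
#!/bin/sh
# Role names as listed in this report.
OCTAVIA_ROLES="load-balancer_member load-balancer_admin load-balancer_observer load-balancer_global_observer load-balancer_quota_admin"

# Create each role, or show it if it already exists.
# OPENSTACK is a hypothetical override; it defaults to the real client.
create_octavia_roles() {
    for role in $OCTAVIA_ROLES; do
        ${OPENSTACK:-openstack} role create --or-show "$role" || return 1
    done
}

# Example:
#   source overcloudrc
#   create_octavia_roles
# Dry run that only prints the arguments:
#   OPENSTACK=echo create_octavia_roles
```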


Wojciech (suzumushi) wrote :

>> Is there a problem with regenerating the certs each time you deploy???

Yes, because the next time you update the stack with load balancers already running, it will break everything.

please see https://bugzilla.redhat.com/show_bug.cgi?id=1645536

regards
W

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/750778

Changed in tripleo:
milestone: none → victoria-3
status: New → Triaged
importance: Undecided → High
tags: added: queens-backport-potential train-backport-potential ussuri-backport-potential
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/750778
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=1cd151e9dc2e403e6b844fa87af7d3cafe0d1cb8
Submitter: Zuul
Branch: master

commit 1cd151e9dc2e403e6b844fa87af7d3cafe0d1cb8
Author: Brent Eagles <email address hidden>
Date: Wed Sep 9 15:46:28 2020 -0230

    Change permissions on /run/octavia to octavia

    The driver agent and the API processes need permission to manage the
    contents of /run/octavia.

    Change-Id: I103d88a1acdc9843fc419746779bdaa132ca569f
    Related-Bug: #1887801

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/753702

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/753703

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/753704

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/ussuri)

Reviewed: https://review.opendev.org/753702
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=3e4745e92105730b18470362a328f82ceb4fe4fc
Submitter: Zuul
Branch: stable/ussuri

commit 3e4745e92105730b18470362a328f82ceb4fe4fc
Author: Brent Eagles <email address hidden>
Date: Wed Sep 9 15:46:28 2020 -0230

    Change permissions on /run/octavia to octavia

    The driver agent and the API processes need permission to manage the
    contents of /run/octavia.

    Change-Id: I103d88a1acdc9843fc419746779bdaa132ca569f
    Related-Bug: #1887801
    (cherry picked from commit 1cd151e9dc2e403e6b844fa87af7d3cafe0d1cb8)

tags: added: in-stable-ussuri
tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/753703
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=d4ad97028e5f4a80bc16e75c148603c727ddfe5a
Submitter: Zuul
Branch: stable/train

commit d4ad97028e5f4a80bc16e75c148603c727ddfe5a
Author: Brent Eagles <email address hidden>
Date: Wed Sep 9 15:46:28 2020 -0230

    Change permissions on /run/octavia to octavia

    The driver agent and the API processes need permission to manage the
    contents of /run/octavia.

    Change-Id: I103d88a1acdc9843fc419746779bdaa132ca569f
    Related-Bug: #1887801
    (cherry picked from commit 1cd151e9dc2e403e6b844fa87af7d3cafe0d1cb8)

Changed in tripleo:
milestone: victoria-3 → wallaby-1
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/stein)

Reviewed: https://review.opendev.org/753704
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=042380e0001e4ea71f83e266414d32f9941e3689
Submitter: Zuul
Branch: stable/stein

commit 042380e0001e4ea71f83e266414d32f9941e3689
Author: Brent Eagles <email address hidden>
Date: Wed Sep 9 15:46:28 2020 -0230

    Change permissions on /run/octavia to octavia

    The driver agent and the API processes need permission to manage the
    contents of /run/octavia.

    Conflicts:
            deployment/octavia/octavia-api-container-puppet.yaml

    Change-Id: I103d88a1acdc9843fc419746779bdaa132ca569f
    Related-Bug: #1887801
    (cherry picked from commit 1cd151e9dc2e403e6b844fa87af7d3cafe0d1cb8)

tags: added: in-stable-stein
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
milestone: wallaby-3 → wallaby-rc1
Brent Eagles (beagles) on 2021-03-24
Changed in tripleo:
status: Triaged → Fix Released