LB instance is in pending create state then in error

Bug #1911029 reported by Narinder Gupta
Affects: OpenStack Octavia Charm
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Openstack: Ussuri / focal with OVN
Charm:
octavia 6.1.0 active 3 octavia jujucharms 30 ubuntu
octavia-mysql-router 8.0.22 active 3 mysql-router jujucharms 4 ubuntu
octavia-ovn-chassis 20.03.1 active 3 ovn-chassis jujucharms 7 ubuntu

When we create a load balancer, the LB sits in the pending create state and then moves to an error state after a timeout.

The octavia-api service is in a dead state. I tried to start the API, but it fails with:

Failed to start octavia-api.service: Unit octavia-api.service is masked.

sudo service octavia-api status
● octavia-api.service
     Loaded: masked (Reason: Unit octavia-api.service is masked.)
     Active: inactive (dead)

Dec 18 07:50:45 juju-2ffe71-12-lxd-3 systemd[1]: Started OpenStack Octavia API (octavia-api).
Dec 18 07:50:48 juju-2ffe71-12-lxd-3 systemd[1]: Stopping OpenStack Octavia API (octavia-api)...
Dec 18 07:50:48 juju-2ffe71-12-lxd-3 systemd[1]: octavia-api.service: Succeeded.
Dec 18 07:50:48 juju-2ffe71-12-lxd-3 systemd[1]: Stopped OpenStack Octavia API (octavia-api).

ubuntu@juju-2ffe71-12-lxd-3:~$ sudo service octavia-api start
Failed to start octavia-api.service: Unit octavia-api.service is masked.
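
For anyone triaging a load balancer stuck in pending create and then error, a minimal client-side sketch of the checks (the load balancer and amphora IDs below are placeholders, not values from this report):

# Show the provisioning/operating status of the failed LB
openstack loadbalancer show <lb-id>
# List the amphorae built for it and their statuses
openstack loadbalancer amphora list --loadbalancer <lb-id>
# Inspect a specific amphora (compute id, lb_network_ip, role, status)
openstack loadbalancer amphora show <amphora-id>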

Tags: field-high
tags: added: field-high
description: updated
Revision history for this message
Billy Olsen (billy-olsen) wrote :

Do you have some logs and bundle configuration information to diagnose and debug this? A juju crashdump would be quite helpful to see what's going on.

Revision history for this message
Frode Nordahl (fnordahl) wrote :

The Octavia API is a WSGI service run through Apache, so the octavia-api systemd service being masked is perfectly normal.

It is impossible to say anything definite about the issue with so little information, but I'd suspect it could be rooted in the ongoing problem of LXD containers coming up in such a way that services are unable to determine their FQDN on deploy/boot; see bug 1896630 for details.
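
For reference, a rough way to confirm the API is actually being served by Apache on an octavia unit (a sketch; the exact vhost file name under sites-enabled varies, so it is located with grep rather than assumed):

sudo systemctl status apache2
# Find the WSGI vhost rendered for the Octavia API
grep -ril octavia /etc/apache2/sites-enabled/
# Confirm something is listening on the Octavia API port (9876 by default)
sudo ss -tlnp | grep 9876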

Changed in charm-octavia:
status: New → Incomplete
Revision history for this message
Narinder Gupta (narindergupta) wrote :

I think the configure-resources action was not successful and failed earlier.

2020-12-18 16:22:15 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:88:run_default_update_status
2020-12-18 16:22:15 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:121:default_request_certificates
2020-12-18 16:22:15 INFO juju-log Invoking reactive handler: hooks/relations/tls-certificates/requires.py:79:joined:certificates
2020-12-18 16:22:15 INFO juju-log Invoking reactive handler: hooks/relations/ovsdb-subordinate/requires.py:129:joined:ovsdb-subordinate
2020-12-18 16:22:16 INFO juju.worker.uniter.operation runhook.go:142 ran "update-status" hook (via explicit, bespoke hook script)
2020-12-18 16:26:14 INFO juju-log Created router 3014e166-b9d5-4d84-84bc-38b34b842bac
2020-12-18 16:26:14 ERROR juju-log action "configure-resources" failed: "'NoneType' object is not iterable" "Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-octavia-1/charm/actions/configure-resources", line 116, in main
    action(args)
  File "/var/lib/juju/agents/unit-octavia-1/charm/actions/configure-resources", line 56, in configure_resources
    (network, secgrp) = api_crud.get_mgmt_network(
  File "/var/lib/juju/agents/unit-octavia-1/charm/lib/charm/openstack/api_crud.py", line 603, in get_mgmt_network
    for subnet in subnets:
TypeError: 'NoneType' object is not iterable
"
2020-12-18 16:26:14 INFO juju-log DEPRECATION WARNING: Function action_fail is being removed : moved to function_fail()
2020-12-18 16:27:26 INFO juju-log Reactive main running for hook update-status
2020-12-18 16:27:26 INFO juju-log Initializing Leadership Layer (is leader)
2020-12-18 16:27:27 INFO juju-log Invoking reactive handler: reactive/layer_openstack.py:59:default_update_status
2020-12-18 16:27:27 INFO juju-log Invoking reactive handler: reactive/layer_openstack_api.py:6:default_amqp_connection
2020-12-18 16:27:27 INFO juju-log Invoking reactive handler: reactive/layer_openstack_api.py:20:default_setup_database
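
The TypeError above comes from get_mgmt_network() iterating over a subnet listing that was returned as None, i.e. the charm could not find (or create) its management subnet in Neutron. A rough sketch of the follow-up checks (the lb-mgmt network/subnet names and the octavia/0 unit name are assumptions, and the action invocation assumes Juju 2.x syntax):

# Does the Octavia management network/subnet exist in Neutron?
openstack network list | grep -i lb-mgmt
openstack subnet list --network lb-mgmt
# Re-run the action and capture its full output
juju run-action octavia/0 configure-resources --wait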

Revision history for this message
Narinder Gupta (narindergupta) wrote :

I have posted the bundle in chat. The crashdump is huge (around 18 GB), but I can post any specific log files if needed.

Revision history for this message
Narinder Gupta (narindergupta) wrote :

Today I redeployed the environment and am not seeing bug 1896630, but I am facing an issue getting a response from Octavia:

REQ: curl -g -i --cacert "/home/ubuntu/tls/root.pem" -X GET https://octavia.hou-01.cloud.prod.cpanel.net:9876/v2.0/lbaas/loadbalancers -H "Accept: application/json" -H "User-Agent: openstacksdk/0.36.0 keystoneauth1/3.17.1 python-requests/2.18.4 CPython/3.6.9" -H "X-Auth-Token: {SHA256}aa2e2a64f33974aca28d4675de38f778632546d731436e44013d11cbee95b4df"
Starting new HTTPS connection (1): octavia.hou-01.cloud.prod.cpanel.net

    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 367, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/lib/python3/dist-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 387, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 383, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.6/http/client.py", line 1373, in getresponse
    response.begin()
  File "/usr/lib/python3.6/http/client.py", line 311, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.6/http/client.py", line 280, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/keystoneauth1/session.py", line 979, in _send_request
    resp = self.session.request(method, url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 520, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 630, in send
    r = adapter.send(request, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/adapters.py", line 490, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cliff/app.py", line 401, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python3/dist-packages/cliff/display.py", line 116, in run
    column_nam...
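
RemoteDisconnected means the server side closed the TCP connection before returning an HTTP response, which usually points at the Apache/WSGI side rather than the client. A sketch of the next checks (the endpoint and CA path are taken from the request above; the log path is the standard Apache location on Ubuntu):

# Reproduce the failing request with verbose output
curl -gv --cacert /home/ubuntu/tls/root.pem https://octavia.hou-01.cloud.prod.cpanel.net:9876/
# On each octavia unit: is Apache still listening, and what did it log?
sudo ss -tlnp | grep 9876
sudo tail -n 50 /var/log/apache2/error.log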


Changed in charm-octavia:
status: Incomplete → New
Revision history for this message
Narinder Gupta (narindergupta) wrote :

I logged into the amphora instance and found the following errors on the agent side.

Jan 21 23:47:44 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 systemd[1751]: Startup finished in 97ms.
Jan 21 23:47:52 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 amphora-agent: 2021-01-21 23:47:52.942 1418 ERROR octavia.amphorae.backends.health_daemon.health_sender [-] No controller address found. Unable to send heartbeat.

Jan 21 23:48:02 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 amphora-agent: 2021-01-21 23:48:02.955 1418 ERROR octavia.amphorae.backends.health_daemon.health_sender [-] No controller address found. Unable to send heartbeat.
Jan 21 23:48:12 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 amphora-agent: 2021-01-21 23:48:12.968 1418 ERROR octavia.amphorae.backends.health_daemon.health_sender [-] No controller address found. Unable to send heartbeat.
Jan 21 23:48:22 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 amphora-agent: 2021-01-21 23:48:22.980 1418 ERROR octavia.amphorae.backends.health_daemon.health_sender [-] No controller address found. Unable to send heartbeat.
Jan 21 23:48:32 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 amphora-agent: 2021-01-21 23:48:32.993 1418 ERROR octavia.amphorae.backends.health_daemon.health_sender [-] No controller address found. Unable to send heartbeat.
Jan 21 23:48:42 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 amphora-agent: 2021-01-21 23:48:42.996 1418 ERROR octavia.amphorae.backends.health_daemon.health_sender [-] No controller address found. Unable to send heartbeat.
Jan 21 23:48:53 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 amphora-agent: 2021-01-21 23:48:53.010 1418 ERROR octavia.amphorae.backends.health_daemon.health_sender [-] No controller address found. Unable to send heartbeat.
Jan 21 23:49:03 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 amphora-agent: 2021-01-21 23:49:03.024 1418 ERROR octavia.amphorae.backends.health_daemon.health_sender [-] No controller address found. Unable to send heartbeat.
Jan 21 23:49:13 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 amphora-agent: 2021-01-21 23:49:13.037 1418 ERROR octavia.amphorae.backends.health_daemon.health_sender [-] No controller address found. Unable to send heartbeat.
Jan 21 23:49:23 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 amphora-agent: 2021-01-21 23:49:23.050 1418 ERROR octavia.amphorae.backends.health_daemon.health_sender [-] No controller address found. Unable to send heartbeat.
Jan 21 23:49:33 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 amphora-agent: 2021-01-21 23:49:33.062 1418 ERROR octavia.amphorae.backends.health_daemon.health_sender [-] No controller address found. Unable to send heartbeat.
Jan 21 23:49:33 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 systemd[1]: Starting Cleanup of Temporary Directories...
Jan 21 23:49:33 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Jan 21 23:49:33 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 systemd[1]: Finished Cleanup of Temporary Directories.
Jan 21 23:49:43 amphora-737e8ddc-db17-4a98-95cf-c9a3d6d4c161 amphora-agent: 2021-01-21 23:49:43.075 1418 ERROR octavia.amphorae.backends.health_daemon.health_sender [-] No controller address found....
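
The "No controller address found" error means the amphora agent's health sender has no health-manager endpoints to report to. On the amphora, that list comes from the agent configuration; a sketch of where to look (the file path is the standard Octavia amphora-agent config, and 5555 is Octavia's default health-manager port):

# On the amphora: which health-manager endpoints was the agent given?
sudo grep -A3 '\[health_manager\]' /etc/octavia/amphora-agent.conf
# Expect something like:
#   [health_manager]
#   controller_ip_port_list = <health-manager-ip>:5555
# An empty controller_ip_port_list produces exactly this heartbeat error.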


Revision history for this message
Liam Young (gnuoy) wrote :

Hi Narinder, sorry you are having issues getting Octavia working. I'm struggling to follow this bug as it seems to cover a number of different issues. I've tried to summarise them below:

1) When we create the load balancer we get the state of LB as pending create then to error state after timeout.
This is quite a generic error message. What does `openstack loadbalancer amphora show <amphora-id>` return? Can you provide the logs from the octavia units please (/var/log/octavia and /var/log/apache2)?

2) octavia-api service is in dead state.
As Frode mentioned, this is expected.

3) Charm configure-resources failed
It looks like there was an issue creating lb-mgmt-subnet. Could you please provide the charm log and action output from the unit you ran configure-resources against? Also a list of networks and subnets in the deployment.

4) Error from octavia api
Please can you provide the logs from /var/log/octavia and /var/log/apache2 on the octavia units; these should contain any errors raised by the octavia API service (a log-collection sketch follows this list).

5) Amphora instance cannot send heartbeat
I will have a look at the code and see if the heartbeat does more than a ping.
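
To help with items 1 and 4, a rough log-collection sketch (the octavia/0 unit name is an assumption; repeat for each unit):

openstack loadbalancer amphora show <amphora-id>
# Bundle the Octavia and Apache logs from a unit and copy them off
juju ssh octavia/0 'sudo tar czf /tmp/octavia-logs.tgz /var/log/octavia /var/log/apache2'
juju scp octavia/0:/tmp/octavia-logs.tgz .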

Revision history for this message
Narinder Gupta (narindergupta) wrote : Re: [Bug 1911029] Re: LB instance is in pending create state then in error

Liam,
Yes, I found a few issues where the certs generated by Vault were not in sync with the OAM interface on the Octavia unit, so I changed the communication to the internal network. The current issue is that the amphora instances are getting terminated: Nova is getting a request to delete the instance.

1) When we create the load balancer, the LB goes to the pending create state and then to an error state. After the amphora instance is up, nova-compute gets a message to delete the instance and it gets deleted.

The crashdump is available on my jump box and it is not possible to attach it to the bug. Please message me and I will send it to you separately.
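
If the suspicion is a certificate/endpoint mismatch, a quick way to see which names the served certificate actually covers (a sketch; substitute the hostname or VIP of the Octavia endpoint in use):

# Inspect the SANs on the certificate presented by the Octavia API endpoint
echo | openssl s_client -connect <octavia-endpoint>:9876 2>/dev/null | openssl x509 -noout -text | grep -A1 'Subject Alternative Name'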


Revision history for this message
Billy Olsen (billy-olsen) wrote :

Marked as incomplete, as I believe this was related to the certificates that were in place. Narinder, please correct me if I'm wrong.

Changed in charm-octavia:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Octavia Charm because there has been no activity for 60 days.]

Changed in charm-octavia:
status: Incomplete → Expired