unable to reach nova compute api

Bug #1808555 reported by Jason Hobbs
This bug affects 1 person
Affects: OpenStack Nova Cloud Controller Charm
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

During a rally run, we were unable to reach the nova compute api.

shade.exc.OpenStackCloudException: Error fetching keypair list (Inner Exception: Unable to establish connection to http://10.244.40.91:8774/v2.1/os-keypairs: HTTPConnectionPool(host='10.244.40.91', port=8774): Max retries exceeded with url: /v2.1/os-keypairs (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8b20218a50>: Failed to establish a new connection: [Errno 113] No route to host',)))
Unhandled exception in thread started by <bound method Transport.__bootstrap of <paramiko.Transport at 0x233478d0L (cipher aes128-ctr, 128 bits) (active; 0 open channel(s))>>

I'm not sure why.

Opening this to track it to see if we hit it often.

https://solutions.qa.canonical.com/#/qa/testRun/ff356bc1-7429-49a6-a398-bc10f077c3e2

Jason Hobbs (jason-hobbs) wrote :

We need to move this to an n-c-c (nova-cloud-controller) charm bug.

affects: cdoqa-system-tests → charm-nova-cloud-controller
tags: added: cdo-qa foundations-engine
Jason Hobbs (jason-hobbs) wrote :

Subscribed Field High: we have been hitting this since December, twice in the last 10 days.

Corey Bryant (corey.bryant) wrote :

Hi Jason, I'm taking a look at the crashdump.

Useful for crashdump navigation:
./18/lxd/5/var/log/juju/unit-nova-cloud-controller-0.log
./20/lxd/5/var/log/juju/unit-nova-cloud-controller-1.log
./21/lxd/5/var/log/juju/unit-nova-cloud-controller-2.log

Corey Bryant (corey.bryant) wrote :

Looking at the hacluster and nova-cc logs, nothing is jumping out:

No obvious issues appear in:
./18/lxd/5/var/log/juju/unit-hacluster-nova-2.log
./20/lxd/5/var/log/juju/unit-hacluster-nova-1.log
./21/lxd/5/var/log/juju/unit-hacluster-nova-0.log **
** there are some errors in DEBUG msgs such as "ha-relation-changed ERROR: could not replace cib (rc=203)" and "ha-relation-changed Resource 'cl_nova_haproxy' not found" but things finally settle to "Pacemaker is ready" so I assume those are normal messages as a deployment settles.

No obvious issues appear in:
./18/lxd/5/var/log/juju/unit-nova-cloud-controller-0.log
./20/lxd/5/var/log/juju/unit-nova-cloud-controller-1.log
./21/lxd/5/var/log/juju/unit-nova-cloud-controller-2.log

nova api port status looks good:
$ grep -r -E '8764|8774' ./18/lxd/5/listening.txt
tcp 0 0 0.0.0.0:8774 0.0.0.0:* LISTEN 331500/haproxy
tcp6 0 0 :::8774 :::* LISTEN 331500/haproxy
tcp6 0 0 :::8764 :::* LISTEN 331275/apache2
$ grep -r -E '8764|8774' ./20/lxd/5/listening.txt
tcp 0 0 0.0.0.0:8774 0.0.0.0:* LISTEN 344845/haproxy
tcp6 0 0 :::8774 :::* LISTEN 344845/haproxy
tcp6 0 0 :::8764 :::* LISTEN 345073/apache2
$ grep -r -E '8764|8774' ./21/lxd/5/listening.txt
tcp 0 0 0.0.0.0:8774 0.0.0.0:* LISTEN 294541/haproxy
tcp6 0 0 :::8774 :::* LISTEN 294541/haproxy
tcp6 0 0 :::8764 :::* LISTEN 333501/apache2
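
The same port check can be scripted when comparing several units. This is only a minimal sketch, assuming each listening.txt in the crashdump holds netstat-style lines like the ones above; the helper name `listeners_on_ports` is hypothetical and not part of juju-crashdump.

```python
import re

# Matches netstat-style lines such as:
#   tcp 0 0 0.0.0.0:8774 0.0.0.0:* LISTEN 331500/haproxy
LISTEN_RE = re.compile(
    r"^(?P<proto>tcp6?)\s+\d+\s+\d+\s+(?P<local>\S+):(?P<port>\d+)"
    r"\s+\S+\s+LISTEN\s+(?P<pid>\d+)/(?P<prog>\S+)"
)


def listeners_on_ports(lines, ports):
    """Return {port: sorted program names} for sockets listening on the given ports."""
    found = {port: set() for port in ports}
    for line in lines:
        match = LISTEN_RE.match(line.strip())
        if match and int(match.group("port")) in found:
            found[int(match.group("port"))].add(match.group("prog"))
    return {port: sorted(progs) for port, progs in found.items()}


# Sample lines copied from the ./18/lxd/5/listening.txt output above.
sample = """\
tcp 0 0 0.0.0.0:8774 0.0.0.0:* LISTEN 331500/haproxy
tcp6 0 0 :::8774 :::* LISTEN 331500/haproxy
tcp6 0 0 :::8764 :::* LISTEN 331275/apache2
""".splitlines()

print(listeners_on_ports(sample, [8764, 8774]))
# → {8764: ['apache2'], 8774: ['haproxy']}
```

A port that maps to an empty list on any unit would be worth a closer look; here all three units show haproxy on 8774 and apache2 on 8764.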

Corey Bryant (corey.bryant) wrote :

There are also no obvious errors in the nova API logs (other than the typical early-deployment sqlite errors that occur before percona is introduced to the deployment):
./21/lxd/5/var/log/nova/nova-api-os-compute.log
./20/lxd/5/var/log/nova/nova-api-os-compute.log
./18/lxd/5/var/log/nova/nova-api-os-compute.log

Corey Bryant (corey.bryant) wrote :

It might be useful to run with 'debug: true'. I see your bundle only has debug on for keystone-ldap. That said, I don't think this is pointing at a nova-cc issue.

Corey Bryant (corey.bryant) wrote :

I was going to take a look at SSL termination config in the crashdump but there are no /etc/apache2 directories in the crashdump. I'm not sure why.

Nonetheless, the "No route to host" error seems to be the issue. I'm not sure what causes it, but it typically means something failed before the request reached the API port.

Jason Hobbs (jason-hobbs) wrote :

The 10.244.40.91 address that returns "No route to host" is a VIP.
I suspect there is a problem with the VIP assignment.
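
One way to tell this failure apart from a service-level problem is to probe the VIP directly and look at the errno. This is a rough sketch, not something taken from the test run: the function names `classify_connect_error` and `probe` are hypothetical, and the interpretation assumes a Linux client, where errno 113 is EHOSTUNREACH.

```python
import errno
import socket


def classify_connect_error(exc):
    """Map an OSError from a TCP connect attempt to a rough failure category."""
    if isinstance(exc, socket.timeout):
        return "timeout"
    if exc.errno == errno.EHOSTUNREACH:
        # Typically: nothing answered for the address (for example, the VIP
        # is not currently held by any unit) or a router reported it unreachable.
        return "no-route-to-host"
    if exc.errno == errno.ECONNREFUSED:
        # The address is up, but nothing is listening on the port.
        return "connection-refused"
    return "other"


def probe(host, port, timeout=3.0):
    """Attempt one TCP connection and return 'ok' or a failure category."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify_connect_error(exc)
```

Calling `probe("10.244.40.91", 8774)` from the host running rally while the failure is occurring would show whether the VIP itself is unreachable ("no-route-to-host") or whether haproxy is simply not accepting connections ("connection-refused").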

On Fri, Apr 5, 2019 at 9:51 AM Corey Bryant <email address hidden>
wrote:

> I was going to take a look at SSL termination config in the crashdump
> but there are no /etc/apache2 directories in the crashdump. I'm not sure
> why.
>
> Nonetheless it is a "No route to host" error that seems to be the issue.
> What that is caused by, I'm not sure but typically it means there was an
> issue prior to the request reaching the API port.

Changed in charm-nova-cloud-controller:
status: New → Triaged
Ryan Beisner (1chb1n) wrote :

If this issue persists, please provide new logs, juju-crashdump, and sanitized bundle artifacts -- with debug logging turned on for nova-cloud-controller, keystone and keystone-ldap. Thank you.

Changed in charm-nova-cloud-controller:
status: Triaged → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack nova-cloud-controller charm because there has been no activity for 60 days.]

Changed in charm-nova-cloud-controller:
status: Incomplete → Expired