nova-cloud-controller services are only running on 1 of 3 nodes
Bug #1584951 reported by Francis Ginther
This bug report is a duplicate of:
Bug #1581171: pause/resume failing (workload status races).
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Landscape Server | New | High | Andreas Hasenack |
16.06 | New | High | Andreas Hasenack |
nova-cloud-controller (Juju Charms Collection) | New | Undecided | Unassigned |
Bug Description
This was found with a Landscape Autopilot deployment using Swift object storage, Ceph block storage, and an internal Nagios service. Nagios is reporting:
CRITICAL: nova-cloud-
The nagios check goes through each server in haproxy.cfg and tries to reach each endpoint. This fails for all but one of the nova-cloud-
Logs have been attached.
description: updated
description: updated
tags: added: kanban-cross-team
tags: removed: kanban-cross-team
Changed in nova-cloud-controller (Juju Charms Collection):
assignee: nobody → Liam Young (gnuoy)
Changed in landscape:
importance: Undecided → High
Changed in nova-cloud-controller (Juju Charms Collection):
assignee: Liam Young (gnuoy) → nobody
Changed in landscape:
assignee: nobody → Andreas Hasenack (ahasenack)
Here's what I get from poking at the nagios check manually:
[From /etc/haproxy/haproxy.cfg]
...
backend nova-api-ec2_10.245.201.243
    balance leastconn
    server nova-cloud-controller-2 10.245.201.243:8763 check
    server nova-cloud-controller-1 10.245.201.60:8763 check
    server nova-cloud-controller-0 10.245.201.231:8763 check
...
backend nova-objectstore_10.245.201.243
    balance leastconn
    server nova-cloud-controller-2 10.245.201.243:3323 check
    server nova-cloud-controller-1 10.245.201.60:3323 check
    server nova-cloud-controller-0 10.245.201.231:3323 check
...
backend nova-api-os-compute_10.245.201.243
    balance leastconn
    server nova-cloud-controller-2 10.245.201.243:8764 check
    server nova-cloud-controller-1 10.245.201.60:8764 check
    server nova-cloud-controller-0 10.245.201.231:8764 check
...
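For anyone who wants to reproduce this without reading haproxy.cfg by eye, here is a minimal Python sketch of how the `server` entries could be pulled out of the backend sections. It is not the code of the actual Nagios check; the names `parse_backend_servers`, `SERVER_RE`, and `SAMPLE` are made up for illustration, and it assumes IPv4 `ip:port` addresses as in the excerpt above.

```python
import re

# Matches haproxy lines of the form "server <name> <ipv4>:<port> ...".
SERVER_RE = re.compile(r"^\s*server\s+(\S+)\s+(\d{1,3}(?:\.\d{1,3}){3}):(\d+)\b")

# Keywords that start a new top-level section in haproxy.cfg.
SECTION_KEYWORDS = ("global", "defaults", "frontend", "listen", "backend")


def parse_backend_servers(cfg_text):
    """Return (backend, server_name, ip, port) tuples for every server
    line found inside a "backend" section. Servers in other sections
    (for example "listen") are ignored in this sketch."""
    servers = []
    backend = None
    for line in cfg_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        words = stripped.split()
        if words[0] in SECTION_KEYWORDS:
            # Remember the backend name, or reset when leaving a backend.
            is_backend = words[0] == "backend" and len(words) > 1
            backend = words[1] if is_backend else None
            continue
        match = SERVER_RE.match(line)
        if match and backend is not None:
            name, ip, port = match.groups()
            servers.append((backend, name, ip, int(port)))
    return servers


# One backend copied from the haproxy.cfg excerpt in this comment.
SAMPLE = """\
backend nova-api-ec2_10.245.201.243
    balance leastconn
    server nova-cloud-controller-2 10.245.201.243:8763 check
    server nova-cloud-controller-1 10.245.201.60:8763 check
    server nova-cloud-controller-0 10.245.201.231:8763 check
"""

if __name__ == "__main__":
    for backend, name, ip, port in parse_backend_servers(SAMPLE):
        print(backend, name, ip, port)
```

On a real unit, the text of /etc/haproxy/haproxy.cfg would be passed in place of `SAMPLE`.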
Now the nagios check tries to access each of those IPs and ports. Doing this manually I get:
ubuntu@juju-machine-1-lxc-2:/etc/nagios/nrpe.d$ nc -vz 10.245.201.243 8763
nc: connect to 10.245.201.243 port 8763 (tcp) failed: Connection refused
ubuntu@juju-machine-1-lxc-2:/etc/nagios/nrpe.d$ nc -vz 10.245.201.243 8764
Connection to 10.245.201.243 8764 port [tcp/*] succeeded!
ubuntu@juju-machine-1-lxc-2:/etc/nagios/nrpe.d$ nc -vz 10.245.201.243 3323
nc: connect to 10.245.201.243 port 3323 (tcp) failed: Connection refused
ubuntu@juju-machine-1-lxc-2:/etc/nagios/nrpe.d$ nc -vz 10.245.201.60 3323
nc: connect to 10.245.201.60 port 3323 (tcp) failed: Connection refused
ubuntu@juju-machine-1-lxc-2:/etc/nagios/nrpe.d$ nc -vz 10.245.201.60 8763
nc: connect to 10.245.201.60 port 8763 (tcp) failed: Connection refused
ubuntu@juju-machine-1-lxc-2:/etc/nagios/nrpe.d$ nc -vz 10.245.201.60 8764
nc: connect to 10.245.201.60 port 8764 (tcp) failed: Connection refused
ubuntu@juju-machine-1-lxc-2:/etc/nagios/nrpe.d$ nc -vz 10.245.201.231 8764
nc: connect to 10.245.201.231 port 8764 (tcp) failed: Connection refused
ubuntu@juju-machine-1-lxc-2:/etc/nagios/nrpe.d$ nc -vz 10.245.201.231 8763
nc: connect to 10.245.201.231 port 8763 (tcp) failed: Connection refused
ubuntu@juju-machine-1-lxc-2:/etc/nagios/nrpe.d$ nc -vz 10.245.201.231 3323
nc: connect to 10.245.201.231 port 3323 (tcp) failed: Connection refused
Only one of the nine endpoints is reachable: 10.245.201.243:8764. Every other IP and port combination refuses the connection.
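The manual `nc -vz` runs can be repeated in one go with a small script. The following is a rough Python sketch, assuming a plain TCP connect is a fair stand-in for what `nc -vz` tests; `probe`, `summarize`, `main`, and `ENDPOINTS` are illustrative names, not part of the Nagios check or the charm.

```python
import socket

# The three nova-cloud-controller IPs and three ports from this report.
IPS = ("10.245.201.243", "10.245.201.60", "10.245.201.231")
PORTS = (8763, 8764, 3323)
ENDPOINTS = [(ip, port) for ip in IPS for port in PORTS]


def probe(ip, port, timeout=3.0):
    """Attempt a TCP connection; return (ok, detail)."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True, "succeeded"
    except OSError as exc:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False, str(exc)


def summarize(endpoints, probe_fn=probe):
    """Probe every endpoint and group the results as {ip: {port: ok}}."""
    results = {}
    for ip, port in endpoints:
        ok, _detail = probe_fn(ip, port)
        results.setdefault(ip, {})[port] = ok
    return results


def main():
    """Print one line per endpoint; run this from a host that can reach
    the nova-cloud-controller units."""
    for ip, ports in summarize(ENDPOINTS).items():
        for port, ok in ports.items():
            print(f"{ip}:{port} {'ok' if ok else 'FAILED'}")
```

Calling `main()` from the unit that runs the check should show the same picture as the `nc` output above if the services are still down. `probe_fn` is injectable so the grouping logic can be exercised without network access.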