Requesting nrpe check for no active routers in HA neutron
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
OpenStack Neutron Gateway Charm | Confirmed | Wishlist | Unassigned | | |
Bug Description
I ran into an issue where I had 5 neutron-gateways in my model, yet none of them had an active l3-agent. This occurred because we were working around an issue where neutron-l3-agents crash when they are told to join a rabbitmq-service that has yet to be clustered. (https:/
juju run -a neutron-gateway -- iptables -A OUTPUT -p tcp --dport 5672 -j DROP
While this was effective at preventing the l3-agents from crashing, the workaround carried some risks.
1) The openstack service checks showed 'neutron agents dropped' while the rabbit port was blocked, but the agents were still routing traffic, so we ignored this check.
2) When the l3-agents were restarted (possibly by relation changes from the new rabbitmq unit), none of the HA qrouters would go active, and none had IP addresses on their ha interfaces.
After the new rabbitmq-server was active in the cluster, we used
juju run -a neutron-gateway -- iptables -D OUTPUT -p tcp --dport 5672 -j DROP
and L3 services were quickly restored.
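We could have spotted the L3 service outage earlier if there had been an NRPE check indicating that no active l3 services were available. This sounds similar to the openstack service checks, but it is different in essence: those checks report on whether agents are alive, while this one would report on whether any HA router actually has an active instance.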
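To illustrate the kind of check being requested, here is a minimal sketch of the decision logic only. It is not an existing charm check. The function name `check_ha_routers` and its input format (a mapping of router ID to the HA state reported by each hosting l3-agent) are assumptions made for this example; how those states are collected is left open. The exit codes follow the standard Nagios plugin convention.

```python
# Standard Nagios/NRPE plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_ha_routers(ha_states):
    """Return (exit_code, message) for a set of HA routers.

    ha_states is a hypothetical input format: a dict mapping each
    router ID to a list of HA states, one per l3-agent hosting it,
    e.g. {"router-a": ["active", "standby"]}.
    """
    if not ha_states:
        # Nothing to evaluate; report UNKNOWN rather than a false OK.
        return UNKNOWN, "UNKNOWN: no HA routers found"

    # A router is healthy only if at least one hosting agent is active.
    no_active = sorted(
        router for router, states in ha_states.items()
        if "active" not in states
    )
    if no_active:
        return CRITICAL, "CRITICAL: %d of %d HA routers have no active l3-agent: %s" % (
            len(no_active), len(ha_states), ", ".join(no_active))

    return OK, "OK: all %d HA routers have an active l3-agent" % len(ha_states)


# Example: router-a is standby on every agent, as in the outage above.
sample = {
    "router-a": ["standby", "standby"],
    "router-b": ["active", "standby"],
}
code, message = check_ha_routers(sample)
print(message)
```

A real check would exit with `code` so that NRPE raises the alert. In the outage described here, every router would have landed in the CRITICAL branch even though the agents themselves were still running.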
Changed in charm-neutron-gateway: | |
status: | New → Confirmed |
importance: | Undecided → Wishlist |