Add Monitoring for amphora failing to build due to quota/IP exhaustion

Bug #1875961 reported by Drew Freiberger
This bug affects 1 person
Affects: OpenStack Octavia Charm
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

There are several failure scenarios when a site has multiple load balancers deployed on OpenStack (for instance, a cloud with k8s on top of OpenStack) where load balancers fail to create due to either IP exhaustion on the chosen VIP network or quota limitations on the services project (see lp#1850985).

The Octavia API doesn't expose these reasons for LB creation failure, but they're easily found within the Octavia logs and should be scraped by Nagios for operator notification.

I'll include some sample errors in the comments of this bug for inclusion in these checks.
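
As a rough illustration of the log scraping proposed here, a check could scan log lines for the two failure messages and map the result to the usual Nagios plugin exit codes (0 for OK, 2 for CRITICAL). This is only a sketch: the function name `check_lines` and the output wording are made up for this example, and the marker strings are taken from the sample errors in this report, not from any existing charm check.

```python
# Substrings copied from the sample errors in this bug report.
FAILURE_MARKERS = (
    "Quota exceeded for",
    "No more IP addresses available on network",
)

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2


def check_lines(lines):
    """Return (exit_code, message) for an iterable of log lines.

    Hypothetical helper: any line containing one of the markers is
    counted as an amphora build failure.
    """
    hits = [
        line.rstrip("\n")
        for line in lines
        if any(marker in line for marker in FAILURE_MARKERS)
    ]
    if hits:
        message = "CRITICAL: %d amphora build failure(s) found, last: %s" % (
            len(hits),
            hits[-1],
        )
        return CRITICAL, message
    return OK, "OK: no quota or IP exhaustion errors found"


sample_lines = [
    "INFO octavia.controller.worker: nothing to report\n",
    "ERROR ... Quota exceeded for ram: Requested 4096 (HTTP 403)\n",
    "ERROR ... No more IP addresses available on network abc.\n",
]
code, message = check_lines(sample_lines)
print(code, message)
```

A real check would read the lines from the worker log (for example by passing an open file object), print the message, and exit with the returned code. It would also need some way to ignore lines that have already been reported.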

Drew Freiberger (afreiberger) wrote :

In the octavia-worker.log:

Quota exceeded message for RAM:

|__Flow 'octavia-create-loadbalancer-flow': octavia.common.exceptions.ComputeBuildException: Failed to build compute instance due to: Quota exceeded for ram: Requested 4096, but already used 49152 of 51200 ram (HTTP 403) (Request-ID: req-xyz)

Quota Exhaustion for vCPU and Instances:

|__Flow 'octavia-create-loadbalancer-flow': octavia.common.exceptions.ComputeBuildException: Failed to build compute instance due to: Quota exceeded for cores, instances: Requested 2, 1, but already used 20, 10 of 20, 10 cores, instances (HTTP 403) (Request-ID: req-55ed3f13-03ad-4312-8d19-7cf087afcc05)

Searching for "Quota exceeded for (.*):" and reporting the (.*) quota buckets in the check would be quite useful.
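
A minimal sketch of that extraction in Python follows. One caveat: a greedy `(.*)` followed by `:` would run on to the last colon in the line (the one in `Request-ID:`), so this sketch uses `[^:]+` to stop at the first colon after the bucket list. The name `quota_buckets` is invented for the example.

```python
import re

# Stops at the first colon after the bucket names, unlike a greedy (.*).
QUOTA_RE = re.compile(r"Quota exceeded for ([^:]+):")


def quota_buckets(line):
    """Return the quota buckets named in a log line, or [] if none."""
    match = QUOTA_RE.search(line)
    if match is None:
        return []
    return [name.strip() for name in match.group(1).split(",")]


sample = (
    "octavia.common.exceptions.ComputeBuildException: Failed to build "
    "compute instance due to: Quota exceeded for cores, instances: "
    "Requested 2, 1, but already used 20, 10 of 20, 10 cores, instances "
    "(HTTP 403) (Request-ID: req-55ed3f13-03ad-4312-8d19-7cf087afcc05)"
)
print(quota_buckets(sample))  # ['cores', 'instances']
```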

Re: VIP Exhaustion:

2020-04-29 19:55:11.946 12954 ERROR octavia.network.drivers.neutron.allowed_address_pairs [req-f4c079ef-a110-45f4-8445-0363d98a7d37 - d4605f6a55ad47fabc406331ac0d94b2 - - -] Error creating the base (VRRP) port for the VIP with port details: {'port': {'name': 'octavia-lb-vrrp-7c9b12a6-3245-482c-87e0-62e1036ba0c1', 'network_id': '6f05a9de-4fc9-41f5-9c51-d5f43cd244b9', 'fixed_ips': [{'subnet_id': '4ab2ec1c-50bf-4d13-9fcc-dede97e5178c'}], 'admin_state_up': True, 'device_owner': 'Octavia'}}: neutronclient.common.exceptions.IpAddressGenerationFailureClient: No more IP addresses available on network 6f05a9de-4fc9-41f5-9c51-d5f43cd244b9.

Searching for "No more IP addresses available on network (.*)\." and reporting the matching network ID that has IP exhaustion would be helpful for the operator.
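
Something along these lines could pull out the affected network IDs. It is a sketch only: it assumes the network ID is always a 36-character UUID as in the sample above, and `exhausted_networks` is a made-up name.

```python
import re

# Assumes the network ID is a 36-character UUID followed by a period.
NETWORK_RE = re.compile(
    r"No more IP addresses available on network ([0-9a-fA-F-]{36})\."
)


def exhausted_networks(lines):
    """Return the sorted, de-duplicated network IDs reported as exhausted."""
    found = set()
    for line in lines:
        match = NETWORK_RE.search(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


sample = (
    "neutronclient.common.exceptions.IpAddressGenerationFailureClient: "
    "No more IP addresses available on network "
    "6f05a9de-4fc9-41f5-9c51-d5f43cd244b9."
)
print(exhausted_networks([sample, sample]))
```

De-duplicating matters here because each failed load balancer creation logs the same network again, and the operator only needs each network reported once.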

Drew Freiberger (afreiberger) wrote :

This may be something to be addressed by a grok-exporter or graylog feed into alertmanager.

James Page (james-page)
Changed in charm-octavia:
status: New → Triaged
importance: Undecided → Medium