Index on maasserver_routable_pairs would improve performance

Bug #1702669 reported by Mark Shuttleworth on 2017-07-06
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
MAAS
High
Mike Pontillo
2.2
Medium
Mike Pontillo

Bug Description

I am watching postgres logs for slow queries and noticed a lot of these:

2017-07-06 12:21:08 BST [67989-13] maas@maasdb LOG: duration: 51.371 ms statement: SELECT left_node_id, left_ip,
        right_node_id, right_ip
   FROM maasserver_routable_pairs
  WHERE left_node_id IN (236)
    AND right_node_id IN (1)
  ORDER BY metric ASC

That's a very simple query on a single flat table, it should be fast. Adding an index on (left_node_id, right_node_id) would address that.

Related branches

Changed in maas:
importance: Undecided → High
status: New → Triaged
milestone: none → 2.3.0
tags: added: performance
Mark Shuttleworth (sabdfl) wrote :

Turns out maasserver_routable_pairs is a view not a table. Have we explored a materialized view for this? That would be indexable.

Mike Pontillo (mpontillo) wrote :

Can you quantify "a lot"? It's not clear to me from the bug description if optimizing this path is going to be worth the effort.

We might consider moving away from using this view altogether, in favor of more explicit ways to determine mutual routes between nodes and racks. I think switching to a materialized view would have a similar effort level, though.

That said, a quick win might be to optimize the underlying query the view uses. I've done a quick EXPLAIN ANALYZE of the view's query; we should take a closer look:

http://paste.ubuntu.com/25034508/

Mike Pontillo (mpontillo) wrote :

Mark, if you feel like running that EXPLAIN ANALYZE in your setup, it might provide better results. Mine was done in a development sandbox. (I'm waiting at the court to serve jury duty, so I don't have easy access to a real MAAS.) =)

Mike Pontillo (mpontillo) wrote :

I took a closer look at the EXPLAIN ANALYZE command. I think you can indeed speed up that query by adding an index.

In summary, I think if you add the following index, you'll get the most improvement:

    CREATE INDEX maasserver_staticipaddress__ip__family__idx ON maasserver_staticipaddress (family(ip));

---

Long explanation:

First, you can run EXPLAIN ANALYZE as follows on the view (no need to paste in the giant SQL snippet that describes the view):

    EXPLAIN ANALYZE SELECT * from maasserver_routable_pairs;

On my production MAAS at home, that query originally executed in ~10ms on the average. After adding that particular index, query execution time dropped to ~3.5ms.

I looked at the join conditions in the EXPLAIN ANALYZE output and tested adding the following indexes:

CREATE INDEX maasserver_staticipaddress__ip__family__idx ON maasserver_staticipaddress (family(ip));
CREATE INDEX maasserver_vlan__space_id__idx ON maasserver_vlan (space_id);
CREATE INDEX maasserver_staticipaddress__subnet_id__idx ON maasserver_staticipaddress (subnet_id);
CREATE INDEX maasserver_staticipaddress__ip__is_not_null__idx ON maasserver_staticipaddress (ip) where (ip IS NOT NULL);

Then I dropped the indexes, on the off chance that we decide to create them [with the same name] from MAAS code at a later date:

DROP INDEX maasserver_staticipaddress__ip__family__idx;
DROP INDEX maasserver_vlan__space_id__idx;
DROP INDEX maasserver_staticipaddress__subnet_id__idx;
DROP INDEX maasserver_staticipaddress__ip__is_not_null__idx;

All of my tests indicated that the index on "family(ip)" made the most significant impact.

The other two indexes may speed up the query as well, but the benefit seems marginal compared to indexing the address family of each IP address (which affects the decision about every row in that view, and is a derived value that must otherwise be computed every time).

Changed in maas:
assignee: nobody → Mike Pontillo (mpontillo)
status: Triaged → In Progress

Thanks Mike. Adding that index manually has definitely dropped the query
time. I was seeing the entry in the log every 2 minutes on average, even
with an idle MAAS.

What I am doing is logging postgres queries that take longer than a
threshold, currently 50ms. I will file bugs on those, then drop the
threshold to 40ms, and repeat.

My MAAS is small (20 nodes, 50 devices, 4 controllers) but I think as
part of the next cycle we should repeat this exercise using data from a
much larger MAAS.

Mark

Changed in maas:
status: In Progress → Fix Committed
Mike Pontillo (mpontillo) wrote :

This cannot be easily fixed in MAAS 2.2.x, since a migration has already landed for MAAS 2.3, and migrations must always happen in the same order.

Changed in maas:
milestone: 2.3.0 → 2.3.0alpha1
Changed in maas:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers