Blazar

Bug #1838025
Comment #2

Comment 2 for bug 1838025

Revision history for this message

Jacob Colleran (jcolleran) wrote on 2019-09-10:

Had a chance now to look at this a little closer. We got some pretty big gains from breaking up the 3-table join into two queries with sqlaclhemy in Blazar. With the original code it was taking 40+ seconds for us to get a return. When I was testing this out with raw SQL, I was getting similar long 40+ second results. However, I noticed Tetsuro was specifying what ids to join the tables on, and when I test this out in raw sql it is more efficient then breaking up the queries into two. However, this is not what our code in Blazar is currently doing. I think any of the optimization updates need to add what fields to join on, example below:

    query = (session.query(models.Reservation, models.ComputeHostAllocation)
             .join(models.Lease, models.Lease.id == models.Reservation.lease_id)
             .join(models.ComputeHostAllocation,
                   models.ComputeHostAllocation.reservation_id == models.Reservation.id)
             .filter(models.ComputeHostAllocation.compute_host_id
                     .in_(host_ids))
             .filter(sa.and_(border0, border1)))

I'm not totally sure what the ORM is doing when we do not specify what fields to join the tables on, but it's so inefficient that I wouldn't be surprised if it were essentially running a cross join instead of an inner join. Below is a raw sql representation of what I believe sqlalchemy is actually running in blazar/db/sqlalchemy/utils/get_reservation_allocations_by_host_ids():

    SELECT reservations.*, computehost_allocations.* FROM reservations
    JOIN leases
    JOIN computehost_allocations
    WHERE computehost_allocations.compute_host_id IN (?,?)
    AND leases.end_date <= ?
    AND leases.start_date >= ?;

Run on Chameleon's UC baremetal cluster (155 nodes):

Two Queries: 0.050 seconds
example 1 Tetsuro: 0.024 seconds
example 2 Tetsuro: 0.047 seconds
original (raw above)*: 74.433 seconds

* this was run with a limit of 1000000 because I got impatient waiting for it to finish.

I'll leave comments in the patches Tetsuro opened up.

Two Queries:               0.050 seconds
example 1 Tetsuro:         0.024 seconds
example 2 Tetsuro:         0.047 seconds
original (raw above)*:    74.433 seconds

* this was run with a limit of 1000000 because I got impatient waiting for it to finish.

I'll leave comments in the patches Tetsuro opened up.