Had a chance now to look at this a little closer. We got some pretty big gains from breaking up the 3-table join into two queries with sqlaclhemy in Blazar. With the original code it was taking 40+ seconds for us to get a return. When I was testing this out with raw SQL, I was getting similar long 40+ second results. However, I noticed Tetsuro was specifying what ids to join the tables on, and when I test this out in raw sql it is more efficient then breaking up the queries into two. However, this is not what our code in Blazar is currently doing. I think any of the optimization updates need to add what fields to join on, example below:
I'm not totally sure what the ORM is doing when we do not specify what fields to join the tables on, but it's so inefficient that I wouldn't be surprised if it were essentially running a cross join instead of an inner join. Below is a raw sql representation of what I believe sqlalchemy is actually running in blazar/db/sqlalchemy/utils/get_reservation_allocations_by_host_ids():
SELECT reservations.*, computehost_allocations.* FROM reservations
JOIN leases
JOIN computehost_allocations
WHERE computehost_allocations.compute_host_id IN (?,?)
AND leases.end_date <= ?
AND leases.start_date >= ?;
Run on Chameleon's UC baremetal cluster (155 nodes):
Two Queries: 0.050 seconds
example 1 Tetsuro: 0.024 seconds
example 2 Tetsuro: 0.047 seconds
original (raw above)*: 74.433 seconds
* this was run with a limit of 1000000 because I got impatient waiting for it to finish.
I'll leave comments in the patches Tetsuro opened up.
Had a chance now to look at this a little closer. We got some pretty big gains from breaking up the 3-table join into two queries with sqlaclhemy in Blazar. With the original code it was taking 40+ seconds for us to get a return. When I was testing this out with raw SQL, I was getting similar long 40+ second results. However, I noticed Tetsuro was specifying what ids to join the tables on, and when I test this out in raw sql it is more efficient then breaking up the queries into two. However, this is not what our code in Blazar is currently doing. I think any of the optimization updates need to add what fields to join on, example below:
query = (session. query(models. Reservation, models. ComputeHostAllo cation)
. join(models. Lease, models.Lease.id == models. Reservation. lease_id)
. join(models. ComputeHostAllo cation,
models. ComputeHostAllo cation. reservation_ id == models. Reservation. id)
. filter( models. ComputeHostAllo cation. compute_ host_id
.in_(host_ ids))
. filter( sa.and_ (border0, border1)))
I'm not totally sure what the ORM is doing when we do not specify what fields to join the tables on, but it's so inefficient that I wouldn't be surprised if it were essentially running a cross join instead of an inner join. Below is a raw sql representation of what I believe sqlalchemy is actually running in blazar/ db/sqlalchemy/ utils/get_ reservation_ allocations_ by_host_ ids():
SELECT reservations.*, computehost_ allocations. * FROM reservations allocations allocations. compute_ host_id IN (?,?)
JOIN leases
JOIN computehost_
WHERE computehost_
AND leases.end_date <= ?
AND leases.start_date >= ?;
Run on Chameleon's UC baremetal cluster (155 nodes):
Two Queries: 0.050 seconds
example 1 Tetsuro: 0.024 seconds
example 2 Tetsuro: 0.047 seconds
original (raw above)*: 74.433 seconds
* this was run with a limit of 1000000 because I got impatient waiting for it to finish.
I'll leave comments in the patches Tetsuro opened up.