Comment 4 for bug 1400376

Revision history for this message
Ben Shum (bshum) wrote :

Looking back at old bug 1374091 and working through some EXPLAIN ANALYZE we used at the time, this is what I saw in our database when trying with LEFT JOIN vs. (INNER) JOIN:

=== Query ===

EXPLAIN ANALYZE SELECT XMLAGG(foo.y)
          FROM (
            SELECT XMLELEMENT(
                        name field,
                        XMLATTRIBUTES(
                            mra.attr AS name,
                            cvm.value AS "coded-value",
                            cvm.id AS "cvmid",
                            rad.composite,
                            rad.multi,
                            rad.filter,
                            rad.sorter
                        ),
                        mra.value
                    )
              FROM metabib.record_attr_flat mra
                    JOIN config.record_attr_definition rad ON (mra.attr = rad.name)
                    LEFT JOIN config.coded_value_map cvm ON (cvm.ctype = mra.attr AND code = mra.value)
              WHERE mra.id = 93978
            )foo(y);

=== With LEFT JOIN ===

Aggregate (cost=65.30..65.31 rows=1 width=83) (actual time=0.024..0.025 rows=1 loops=1)
  -> Hash Right Join (cost=45.54..65.25 rows=20 width=83) (actual time=0.024..0.024 rows=0 loops=1)
        Hash Cond: ((cvm.ctype = mra.attr) AND (cvm.code = mra.value))
        -> Seq Scan on coded_value_map cvm (cost=0.00..14.26 rows=726 width=29) (never executed)
        -> Hash (cost=45.24..45.24 rows=20 width=68) (actual time=0.018..0.018 rows=0 loops=1)
              Buckets: 1024 Batches: 1 Memory Usage: 0kB
              -> Hash Join (cost=43.51..45.24 rows=20 width=68) (actual time=0.018..0.018 rows=0 loops=1)
                    Hash Cond: (rad.name = mra.attr)
                    -> Seq Scan on record_attr_definition rad (cost=0.00..1.38 rows=38 width=12) (actual time=0.004..0.004 rows=1 loops=1)
                    -> Hash (cost=43.26..43.26 rows=20 width=64) (actual time=0.010..0.010 rows=0 loops=1)
                          Buckets: 1024 Batches: 1 Memory Usage: 0kB
                          -> Subquery Scan on mra (cost=42.86..43.26 rows=20 width=64) (actual time=0.010..0.010 rows=0 loops=1)
                                -> HashAggregate (cost=42.86..43.06 rows=20 width=24) (actual time=0.010..0.010 rows=0 loops=1)
                                      -> Append (cost=0.86..42.71 rows=20 width=24) (actual time=0.010..0.010 rows=0 loops=1)
                                            -> Nested Loop Left Join (cost=0.86..24.57 rows=10 width=27) (actual time=0.008..0.008 rows=0 loops=1)
                                                  -> Index Scan using record_attr_vector_list_pkey on record_attr_vector_list v (cost=0.43..3.45 rows=1 width=90) (actual time=0.008..0.008 rows=0 loops=1)
                                                        Index Cond: (source = 93978)
                                                  -> Index Scan using uncontrolled_record_attr_value_pkey on uncontrolled_record_attr_value m (cost=0.43..21.02 rows=10 width=27) (never executed)
                                                        Index Cond: (id = ANY (v.vlist))
                                            -> Nested Loop Left Join (cost=0.71..17.95 rows=10 width=22) (actual time=0.002..0.002 rows=0 loops=1)
                                                  -> Index Scan using record_attr_vector_list_pkey on record_attr_vector_list v_1 (cost=0.43..3.45 rows=1 width=90) (actual time=0.002..0.002 rows=0 loops=1)
                                                        Index Cond: (source = 93978)
                                                  -> Index Scan using coded_value_map_pkey on coded_value_map c (cost=0.29..14.40 rows=10 width=18) (never executed)
                                                        Index Cond: (id = ANY (v_1.vlist))
Total runtime: 0.110 ms

=== With INNER JOIN ===

Aggregate (cost=65.30..65.31 rows=1 width=83) (actual time=0.058..0.058 rows=1 loops=1)
  -> Hash Right Join (cost=45.54..65.25 rows=20 width=83) (actual time=0.056..0.056 rows=0 loops=1)
        Hash Cond: ((cvm.ctype = mra.attr) AND (cvm.code = mra.value))
        -> Seq Scan on coded_value_map cvm (cost=0.00..14.26 rows=726 width=29) (never executed)
        -> Hash (cost=45.24..45.24 rows=20 width=68) (actual time=0.046..0.046 rows=0 loops=1)
              Buckets: 1024 Batches: 1 Memory Usage: 0kB
              -> Hash Join (cost=43.51..45.24 rows=20 width=68) (actual time=0.046..0.046 rows=0 loops=1)
                    Hash Cond: (rad.name = mra.attr)
                    -> Seq Scan on record_attr_definition rad (cost=0.00..1.38 rows=38 width=12) (actual time=0.008..0.008 rows=1 loops=1)
                    -> Hash (cost=43.26..43.26 rows=20 width=64) (actual time=0.023..0.023 rows=0 loops=1)
                          Buckets: 1024 Batches: 1 Memory Usage: 0kB
                          -> Subquery Scan on mra (cost=42.86..43.26 rows=20 width=64) (actual time=0.023..0.023 rows=0 loops=1)
                                -> HashAggregate (cost=42.86..43.06 rows=20 width=24) (actual time=0.023..0.023 rows=0 loops=1)
                                      -> Append (cost=0.86..42.71 rows=20 width=24) (actual time=0.022..0.022 rows=0 loops=1)
                                            -> Nested Loop (cost=0.86..24.57 rows=10 width=27) (actual time=0.020..0.020 rows=0 loops=1)
                                                  -> Index Scan using record_attr_vector_list_pkey on record_attr_vector_list v (cost=0.43..3.45 rows=1 width=90) (actual time=0.020..0.020 rows=0 loops=1)
                                                        Index Cond: (source = 93978)
                                                  -> Index Scan using uncontrolled_record_attr_value_pkey on uncontrolled_record_attr_value m (cost=0.43..21.02 rows=10 width=27) (never executed)
                                                        Index Cond: (id = ANY (v.vlist))
                                            -> Nested Loop (cost=0.71..17.95 rows=10 width=22) (actual time=0.002..0.002 rows=0 loops=1)
                                                  -> Index Scan using record_attr_vector_list_pkey on record_attr_vector_list v_1 (cost=0.43..3.45 rows=1 width=90) (actual time=0.002..0.002 rows=0 loops=1)
                                                        Index Cond: (source = 93978)
                                                  -> Index Scan using coded_value_map_pkey on coded_value_map c (cost=0.29..14.40 rows=10 width=18) (never executed)
                                                        Index Cond: (id = ANY (v_1.vlist))
Total runtime: 0.191 ms

The overall shape of the two explains looked the same to me, other than the "Nested Loop Left Join" vs. "Nested Loop" and otherwise the runtime was comparable and not significantly affected. I dropped the change to JOIN into our database and have been conducting tests on our database and so far, no significant loss of speed either for searching, etc.

It's possible that this may affect other areas of Evergreen outside the catalog, so I'd like to see further tests by others on the potential differences, but otherwise, my current conclusion is that tsbere's fix does not break anything for our PostgreSQL 9.3 production dataset. My inclination is to pick both his change and Mike's to attack the problem on both fronts. Will make a final decision before the next maintenance releases.