Regression in HPIT workload due NJ costing checkin
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
Trafodion |
Fix Released
|
High
|
Ravisha Neelakanthappa |
Bug Description
From: Hall, Gary
We’ve experienced a very large regression in our BENCH_HPIT test set as of the 141209 build, and the “NJ costing changes” is suspect, particularly since the five queries in the test set that have increased in response time by severalfold are also showing plan changes that have switched to using nested joins as of this build. What can we do to help you with this item?
Here’s a copy of one of the queries that is experiencing nearly over triple the response times. I’ve attached the corresponding full explain plans.
SQL>explain options 'f' select /* HPIT Query 05 using template Q5 */ totals.Customer, totals.Prodline, totals.K_Dol K_Dollars, totals.Prior_K_Dol Prior_K_Dollars, totals.Total_K_Dol Total_K_Dollars from ( select cust.CUST_
OLD PLAN:
LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------
15 . 16 root 7.17E+003
14 . 15 esp_exchange 1:20(hash2) (m) 7.17E+003
13 . 14 sort 7.17E+003
12 . 13 hash_partial_
11 . 12 esp_exchange 20(hash2):20(hash2) 7.17E+003
10 . 11 hash_partial_
9 8 10 hybrid_hash_join 5.63E+005
. . 9 trafodion_scan HPIT_CUST_
7 2 8 hybrid_hash_join u 5.63E+005
6 4 7 hybrid_hash_join u 5.63E+005
5 . 6 esp_exchange 20(hash2):10(hash2) 2.25E+006
. . 5 trafodion_scan HPIT_TOT_
3 . 4 esp_exchange 20(rep-b):1 (m) 1.20E+001
. . 3 trafodion_scan HPIT_DT_MTH_D 1.20E+001
1 . 2 esp_exchange 20(rep-b):1 (m) 3.40E+002
. . 1 trafodion_scan HPIT_PROD_LN_D 3.40E+002
NEW PLAN:
LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------
15 . 16 root 7.17E+003
14 . 15 esp_exchange 1:20(hash2) (m) 7.17E+003
13 . 14 sort 7.17E+003
12 . 13 hash_partial_
11 . 12 esp_exchange 20(hash2):20(hash2) 7.17E+003
10 . 11 hash_partial_
8 9 10 nested_join 5.63E+005
. . 9 trafodion_scan HPIT_CUST_
7 2 8 hybrid_hash_join u 5.63E+005
6 4 7 hybrid_hash_join u 5.63E+005
5 . 6 esp_exchange 20(hash2):10(hash2) 2.25E+006
. . 5 trafodion_scan HPIT_TOT_
3 . 4 esp_exchange 20(rep-b):1 (m) 1.20E+001
. . 3 trafodion_scan HPIT_DT_MTH_D 1.20E+001
1 . 2 esp_exchange 20(rep-b):1 (m) 3.40E+002
. . 1 trafodion_scan HPIT_PROD_LN_D 3.40E+002
Changed in trafodion: | |
milestone: | r0.9 → r1.1 |
Five queries switched plan from HJ to NJ. When I looked the NJ plan, it looked like a good plan, but runtime showed we were doing more random IOs. Investigated why NJ is costed cheap, found two issues:
1. NJ is costed cheap because of discount given due to probechache benefit, but codeGen didn’t insert probeCache operator, so executor really didn’t cache NJ result. The query is a good candidate for probe cache because out of 563,167 probes, only 5561 are unique probes, remaining are all duplicate probes. I have added code to NJ:: isProbeCacheApp licable( ) to recognize it’s a unique access scan and enable probeCache.
2. Also noticed one bug which got introduced as a result of moving part of code from getEstNumActive PartitionsAtRun time() to PartitionsAtRun timeForHbaseReg ions(). There were other places still calling old method getEstNumActive Partitions( ), I have modified code to call the new method.
getEstNumActive
With this change, we still get NJ, but they perform as good as HJ because runtime we use probe cache.
Good to know that optimizer can produce good NJ plans for OLTP queries as well as HPIT queries.
Changes files: RelExpr. cpp ScanOptimizer. cpp SimpleScanOptim izer.cpp costmethod. cpp
# modified: sql/optimizer/
# modified: sql/optimizer/
# modified: sql/optimizer/
# modified: sql/optimizer/
Gary tested UTT and confirmed that new plans use probeCache operator and the regression has been recovered
From Gary: probe_cache” . For example, here is the plan for query 05:
Looking at the explain plans now…it did NOT revert back to hybrid_hash_join. Instead there is now an additional step seen in the explain plans…”
LC RC OP OPERATOR OPT DESCRIPTION CARD ------- ------ -------- ------- ------- ------ ---------
---- ---- ---- -------
16 . 17 root 7.17E+003 groupby 7.17E+003 groupby 7.17E+003 ACCT_HIER_ 1.00E+000 END_CUST_ SL 2.25E+006
15 . 16 esp_exchange 1:20(hash2) (m) 7.17E+003
14 . 15 sort 7.17E+003
13 . 14 hash_partial_
12 . 13 esp_exchange 20(hash2):20(hash2) 7.17E+003
11 . 12 hash_partial_
8 10 11 nested_join 5.63E+005
9 . 10 probe_cache 1.00E+000
. . 9 trafodion_scan HPIT_CUST_
7 2 8 hybrid_hash_join u 5.63E+005
6 4 7 hybrid_hash_join u 5.63E+005
5 . 6 esp_exchange 20(hash2):10(hash2) 2.25E+006
. . 5 trafodion_scan HPIT_TOT_
3 . 4 esp_exchange 20(rep-b):1 (m) 1.20E+001
. . 3 trafodion_scan HPIT_DT_MTH_D 1.20E+001
1 . 2 esp_exchange 20(rep-b):1 (m) 3.40E+002
. . 1 trafodion_scan HPIT_PROD_LN_D 3.40E+002
Comparing the average r...