Regression in HPIT workload due NJ costing checkin

Bug #1404271 reported by Ravisha Neelakanthappa
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Trafodion
Fix Released
High
Ravisha Neelakanthappa

Bug Description

From: Hall, Gary

We’ve experienced a very large regression in our BENCH_HPIT test set as of the 141209 build, and the “NJ costing changes” is suspect, particularly since the five queries in the test set that have increased in response time by severalfold are also showing plan changes that have switched to using nested joins as of this build. What can we do to help you with this item?

Here’s a copy of one of the queries that is experiencing nearly over triple the response times. I’ve attached the corresponding full explain plans.

SQL>explain options 'f' select /* HPIT Query 05 using template Q5 */ totals.Customer, totals.Prodline, totals.K_Dol K_Dollars, totals.Prior_K_Dol Prior_K_Dollars, totals.Total_K_Dol Total_K_Dollars from ( select cust.CUST_AMID_LVL_4_NM as Customer, prodln.PROD_LN_ID as Prodline, SUM(case when dmth.FISC_YR_QTR_CD = '2007Q2' then SLS_NET_US_DLR_AM/1000 else 0 end) K_Dol, SUM(case when dmth.FISC_YR_QTR_CD = '2007Q1' then SLS_NET_US_DLR_AM/1000 else 0 end) Prior_K_Dol, SUM(case when dmth.FISC_YR_NM = 'FY2007' then SLS_NET_US_DLR_AM/1000 else 0 end) Total_K_Dol from HPIT_TOT_END_CUST_SLS_F fact, HPIT_DT_MTH_D dmth, HPIT_CUST_ACCT_HIER_D cust, HPIT_PROD_LN_D prodln where dmth.DT_MTH_KY = fact.DT_MTH_KY AND fact.END_CUST_ACCT_HIER_KY = cust.CUST_ACCT_HIER_KY AND fact.PROD_LN_KY = prodln.PROD_LN_KY AND dmth.fisc_yr_nm = 'FY2007' GROUP BY cust.CUST_AMID_LVL_4_NM, prodln.PROD_LN_ID ) totals ORDER BY Total_K_Dol desc;

OLD PLAN:

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

15 . 16 root 7.17E+003
14 . 15 esp_exchange 1:20(hash2) (m) 7.17E+003
13 . 14 sort 7.17E+003
12 . 13 hash_partial_groupby 7.17E+003
11 . 12 esp_exchange 20(hash2):20(hash2) 7.17E+003
10 . 11 hash_partial_groupby 7.17E+003
9 8 10 hybrid_hash_join 5.63E+005
. . 9 trafodion_scan HPIT_CUST_ACCT_HIER_ 1.85E+006
7 2 8 hybrid_hash_join u 5.63E+005
6 4 7 hybrid_hash_join u 5.63E+005
5 . 6 esp_exchange 20(hash2):10(hash2) 2.25E+006
. . 5 trafodion_scan HPIT_TOT_END_CUST_SL 2.25E+006
3 . 4 esp_exchange 20(rep-b):1 (m) 1.20E+001
. . 3 trafodion_scan HPIT_DT_MTH_D 1.20E+001
1 . 2 esp_exchange 20(rep-b):1 (m) 3.40E+002
. . 1 trafodion_scan HPIT_PROD_LN_D 3.40E+002

NEW PLAN:

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

15 . 16 root 7.17E+003
14 . 15 esp_exchange 1:20(hash2) (m) 7.17E+003
13 . 14 sort 7.17E+003
12 . 13 hash_partial_groupby 7.17E+003
11 . 12 esp_exchange 20(hash2):20(hash2) 7.17E+003
10 . 11 hash_partial_groupby 7.17E+003
8 9 10 nested_join 5.63E+005
. . 9 trafodion_scan HPIT_CUST_ACCT_HIER_ 1.00E+000
7 2 8 hybrid_hash_join u 5.63E+005
6 4 7 hybrid_hash_join u 5.63E+005
5 . 6 esp_exchange 20(hash2):10(hash2) 2.25E+006
. . 5 trafodion_scan HPIT_TOT_END_CUST_SL 2.25E+006
3 . 4 esp_exchange 20(rep-b):1 (m) 1.20E+001
. . 3 trafodion_scan HPIT_DT_MTH_D 1.20E+001
1 . 2 esp_exchange 20(rep-b):1 (m) 3.40E+002
. . 1 trafodion_scan HPIT_PROD_LN_D 3.40E+002

Tags: sql-cmp
Revision history for this message
Ravisha Neelakanthappa (ravisha-neelakanthappa) wrote :
Download full text (4.2 KiB)

Five queries switched plan from HJ to NJ. When I looked the NJ plan, it looked like a good plan, but runtime showed we were doing more random IOs. Investigated why NJ is costed cheap, found two issues:

1. NJ is costed cheap because of discount given due to probechache benefit, but codeGen didn’t insert probeCache operator, so executor really didn’t cache NJ result. The query is a good candidate for probe cache because out of 563,167 probes, only 5561 are unique probes, remaining are all duplicate probes. I have added code to NJ:: isProbeCacheApplicable() to recognize it’s a unique access scan and enable probeCache.

2. Also noticed one bug which got introduced as a result of moving part of code from getEstNumActivePartitionsAtRuntime() to
getEstNumActivePartitionsAtRuntimeForHbaseRegions(). There were other places still calling old method getEstNumActivePartitions(), I have modified code to call the new method.

With this change, we still get NJ, but they perform as good as HJ because runtime we use probe cache.

Good to know that optimizer can produce good NJ plans for OLTP queries as well as HPIT queries.

Changes files:
# modified: sql/optimizer/RelExpr.cpp
# modified: sql/optimizer/ScanOptimizer.cpp
# modified: sql/optimizer/SimpleScanOptimizer.cpp
# modified: sql/optimizer/costmethod.cpp

Gary tested UTT and confirmed that new plans use probeCache operator and the regression has been recovered

From Gary:
Looking at the explain plans now…it did NOT revert back to hybrid_hash_join. Instead there is now an additional step seen in the explain plans…”probe_cache”. For example, here is the plan for query 05:

LC RC OP OPERATOR OPT DESCRIPTION CARD
---- ---- ---- -------------------- -------- -------------------- ---------

16 . 17 root 7.17E+003
15 . 16 esp_exchange 1:20(hash2) (m) 7.17E+003
14 . 15 sort 7.17E+003
13 . 14 hash_partial_groupby 7.17E+003
12 . 13 esp_exchange 20(hash2):20(hash2) 7.17E+003
11 . 12 hash_partial_groupby 7.17E+003
8 10 11 nested_join 5.63E+005
9 . 10 probe_cache 1.00E+000
. . 9 trafodion_scan HPIT_CUST_ACCT_HIER_ 1.00E+000
7 2 8 hybrid_hash_join u 5.63E+005
6 4 7 hybrid_hash_join u 5.63E+005
5 . 6 esp_exchange 20(hash2):10(hash2) 2.25E+006
. . 5 trafodion_scan HPIT_TOT_END_CUST_SL 2.25E+006
3 . 4 esp_exchange 20(rep-b):1 (m) 1.20E+001
. . 3 trafodion_scan HPIT_DT_MTH_D 1.20E+001
1 . 2 esp_exchange 20(rep-b):1 (m) 3.40E+002
. . 1 trafodion_scan HPIT_PROD_LN_D 3.40E+002

Comparing the average r...

Read more...

Changed in trafodion:
status: New → In Progress
Changed in trafodion:
milestone: r0.9 → r1.1
Revision history for this message
Ravisha Neelakanthappa (ravisha-neelakanthappa) wrote :

This bug has already been fixed in Traf1.0 release.
From: Trafodion Jenkins (Code Review) [mailto:<email address hidden>]
Sent: Monday, December 22, 2014 1:55 PM
To: Neelakanthappa, Ravisha
Subject: Change in trafodion/core[master]: fix for lp 1404271

Trafodion Jenkins has submitted this change and it was merged.

Change subject: fix for lp 1404271
......................................................................

Changed in trafodion:
status: In Progress → Fix Committed
status: Fix Committed → Fix Released
milestone: r1.1 → r1.0
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.