Incorrect Small Side and Large Side estimate in HashJoin
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
InfiniDB Community |
Fix Released
|
Medium
|
Unassigned |
Bug Description
The Schema is:
CREATE TABLE dim_a (
level_A1 varchar(8)
) engine=infinidb; -- 5rows
CREATE TABLE fact (
Dim_A varchar(8) ,
num1 int
) engine=infinidb; --23000000 rows
The Query is:
SELECT
dim_a.level_A1 as level_A1 ,
fact.dim_a as dim_a ,
SUM(fact.num1) as num1
FROM
dim_a dim_a,
fact fact
WHERE
fact.dim_a = dim_a.level_A1
GROUP BY
dim_a.level_A1,
fact.dim_a
The Trace information of ExeMgr is:
Locale is : C
Starting ExeMgr: st = 50, sq = 100, qs = 20, cf = /usr/local/
### UM wants tuples
### For session id 2, got a CSEP
Stack:
doProject Emit pCol/pGetSignature for SimpleColumn
doProject Emit pCol/pGetSignature for SimpleColumn
doProject Emit pCol for SimpleColumn 3005
query steps:
N7joblist12pCol
N7joblist12pCol
N7joblist17Tupl
project steps:
N7joblist8pColS
N7joblist15pDic
N7joblist8pColS
N7joblist15pDic
N7joblist8pColS
delivery steps:
N7joblist12Deli
N7joblist12Deli
TupleInfoMap: (tupleKey oid tableName)
0 3000 dim_a
1 3003 fact
6 3000 $exp
TupleKey vector: (tupleKey oid name alias)
0 3000 dim_a dim_a
1 3003 fact fact
2 3004 Dim_A fact
3 3001 level_A1 dim_a
4 3006 Dim_A[d] fact
5 3002 level_A1[d] dim_a
6 3000 $exp $exp
7 0 num1 $exp
8 3005 num1 fact
Table Connection:
0 : 1
1 : 0
1 setting project BPP for TupleBPS ses:2 txn:0 ver:2 st:0 tb/col:3000/3001 alias:dim_a not del (0x1d5423a0[19]) nf:0 in:
BatchPrimitiv
-- scan driven
1 filter steps:
ColumnCom
0 projection steps:
with PassThruStep ses:2 txn:0 ver:2 st:0 tb/col:3000/3001 alias:dim_a (0x1d558d60[19]) in:3001 (0x1d558d60[19]), and pDictionaryStep ses:2 txn:0 ver:2 st:0 tb/col:3000/3002 (0x1d558d60[19]) in:3001 (0x1d558d60[19]),
RowGroup for 0 : columncount = 1
oids: 3002
keys: 5
offsets: 2 11
types: 12
scales: 0
precisions: 10
1 setting project BPP for TupleBPS ses:2 txn:0 ver:2 st:0 tb/col:3003/3004 alias:fact not del (0x1d542090[19]) nf:0 in:
BatchPrimitiv
-- scan driven
1 filter steps:
ColumnCom
0 projection steps:
with PassThruStep ses:2 txn:0 ver:2 st:0 tb/col:3003/3004 alias:fact (0x1d584840[19]) in:3004 (0x1d584840[19]), and pDictionaryStep ses:2 txn:0 ver:2 st:0 tb/col:3003/3006 (0x1d584840[19]) in:3004 (0x1d584840[19]),
2 setting project BPP for TupleBPS ses:2 txn:0 ver:2 st:0 tb/col:3003/3004 alias:fact not del (0x1d542090[19]) nf:0 in:
BatchPrimitiv
-- scan driven
1 filter steps:
ColumnCom
1 projection steps:
RTSCommandJL: oid=3006
DictStepJL
with pColStep ses:2 txn:0 ver:2 st:0 tb/col:3003/3005 alias:fact (0x1d55b160[19]) nf:0 in: and NULL
RowGroup for 1 : columncount = 2
oids: 3006 3005
keys: 4 8
offsets: 2 11 15
types: 12 6
scales: 0 0
precisions: 10 10
[0;1m
====== join info ======
[0;39mfact-
smallSideIndex-
small side RG
columncount = 2
oids: 3006 3005
keys: 4 8
offsets: 2 11 15
types: 12 6
scales: 0 0
precisions: 10 10
large side RG
columncount = 1
oids: 3002
keys: 5
offsets: 2 11
types: 12
scales: 0
precisions: 10
RowGroup join result:
columncount = 3
oids: 3002 3006 3005
keys: 5 4 8
offsets: 2 11 20 24
types: 12 12 6
scales: 0 0 0
precisions: 10 10 10
Output RowGroup 0: columncount = 3
oids: 3002 3006 3005
keys: 5 4 8
offsets: 2 11 20 24
types: 12 12 6
scales: 0 0 0
precisions: 10 10 10
Output RowGroup 1: columncount = 3
oids: 3002 3006 3005
keys: 5 4 8
offsets: 2 11 20 24
types: 12 12 6
scales: 0 0 0
precisions: 10 10 10
====== Aggregation RowGroups ======
projected RG: columncount = 3
oids: 3002 3006 3005
keys: 5 4 8
offsets: 2 11 20 24
types: 12 12 6
scales: 0 0 0
precisions: 10 10 10
aggregated RG: columncount = 3
oids: 3002 3006 3005
keys: 5 4 8
offsets: 2 11 20 28
types: 12 12 9
scales: 0 0 0
precisions: 10 10 19
====== Aggregation RowGroups ======
projected RG: columncount = 3
oids: 3002 3006 3005
keys: 5 4 8
offsets: 2 11 20 24
types: 12 12 6
scales: 0 0 0
precisions: 10 10 10
aggregated1 RG: columncount = 3
oids: 3002 3006 3005
keys: 5 4 8
offsets: 2 11 20 28
types: 12 12 9
scales: 0 0 0
precisions: 10 10 19
aggregated2 RG: columncount = 3
oids: 3002 3006 3005
keys: 5 4 8
offsets: 2 11 20 28
types: 12 12 9
scales: 0 0 0
precisions: 10 10 19
delivered RG: columncount = 3
oids: 3002 3006 0
keys: 5 4 7
offsets: 2 11 20 28
types: 12 12 9
scales: 0 0 0
precisions: 10 10 19
Delivered RowGroup: columncount = 3
oids: 3002 3006 0
keys: 5 4 7
offsets: 2 11 20 28
types: 12 12 9
scales: 0 0 0
precisions: 10 10 19
job parms:
maxBuckets = 128, maxElems = 131072, flushInterval = 16384, fifoSize = 32, ScanLimit/Threshold = 10000/5000
job filter steps:
TupleBPS ses:2 txn:0 ver:2 st:0 tb/col:3003/3004 alias:fact not del (0x1d58f1f0[19]) nf:0 in:
BatchPrimitiv
-- scan driven
1 filter steps:
ColumnCom
2 projection steps:
RTSCommandJL: oid=3006
DictStepJL
ColumnCom
TupleBPS ses:2 txn:0 ver:2 st:1 tb/col:3000/3001 alias:dim_a not del (0x1d555600[19]) nf:0 in:
BatchPrimitiv
-- scan driven
1 filter steps:
ColumnCom
1 projection steps:
RTSCommandJL: oid=3002
DictStepJL
TupleHashJoinStep ses:2 st:2 in tb/col:3003/1 (0x1d58f1f0[19]) in *tb/col:3003/0 (0x1d555600[19])
out tb/col:3003/100 (0x1d5571c0[19])
AggregateStep ses:2 txn:0 st:3 in:(0x1d5571c0[19])
job project steps:
job delivery steps:
AggregateStep ses:2 txn:0 st:3 in:(0x1d5571c0[19])
runtime updates: start at Tue Feb 15 11:25:44 2011
### UM wanted tuples and we'll do our best;-)
Session: 2; delaying start of query step 1; waitStepCount-1
### For session id 2, got a command = 100
UM join (0) join too big
HJ stopping... status is 2001
Join is too big, raise the UM join limit for now
ses:2 st: 0 finished at 2011-02-15 11:26:12.692463; PhyI/O-8193; CacheI/O-7295; MsgsSent-4; MsgsRvcd-2048; BlocksTouched-7279; BlockedFifoIn/
PartitionBlock
1st read 2011-02-15 11:25:44.808160; EOI 2011-02-15 11:26:12.691700; runtime-27.883540s
Job completion status 2001
ses:2 st: 1 finished at 2011-02-15 11:26:12.712262; PhyI/O-0; CacheI/O-0; MsgsSent-0; MsgsRvcd-0; BlocksTouched-0; BlockedFifoIn/
PartitionBlock
1st read 2011-02-15 11:26:12.711731; EOI 2011-02-15 11:26:12.711733; runtime-0.000002s
Job completion status 2001
ses:2 st: 3 finished at Tue Feb 15 11:26:12 2011; total rows returned-0
1st read 1970-01-01 08:00:00.000000; EOI 2011-02-15 11:26:12.725778; runtime-
Job completion status 2001
### For session id 2, got a command = 3
### For session id 2, got a command = 0
ses:2 Query Totals: MaxMemPct-31; NumTempFiles-0; TempFileSpace-0MB; ApproxPhyI/O-8193; CacheI/O-7295; BlocksTouched-7279
Partition
For session 2: 67 bytes sent back at Tue Feb 15 11:26:13 2011
-------
In the "Join Info", You can see that ExeMgr treats fact table as the small side in HashJoin.
OS: CentOS 5.5 x86_64
INFINIDB: 2.0.3 CE