MC splits require cumbersome CQD involvement

Bug #1347930 reported by QF Chen
This bug affects 1 person
Affects    Status         Importance  Assigned to  Milestone
Trafodion  Fix Committed  High        QF Chen

Bug Description

The following two CQDs (control query defaults) are currently needed to make MC (multi-column) stats-based splits work. Even at best, they are cumbersome to apply.

cqd cache_histograms 'off'; -- use during DML
cqd USTAT_COLLECT_MC_SKEW_VALUES 'on'; -- use during UPDATE STATISTICS

Tags: sql-cmp
Changed in trafodion:
milestone: none → r1.0
QF Chen (qifan-chen)
Changed in trafodion:
assignee: taoufik ben abdellatif (taoufik-abdellatif) → QF Chen (qifan-chen)
status: New → In Progress
QF Chen (qifan-chen) wrote:

There are multiple contributing factors.

1. The MultiColumnHistogram class represents only a single-interval version of an MC histogram,
not a multi-interval one.

2. MultiColumnHistogram is how MC stats are represented in the histogram cache: each
HistogramsCacheEntry keeps them as a list of MultiColumnHistogram objects (*multiColumn_).

3. When CACHE_HISTOGRAMS is on, the statement-level NATable::colStats_ (aka StatsList)
is always fetched from the histogram cache; stats that are not cached yet are first read
from disk into the cache and then fetched from there. MC stats are fetched by
HistogramsCacheEntry::getMCStatsForColFromCacheIntoList(), which unfortunately inserts
only one interval.
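To make factor 3 concrete, here is a minimal C++ sketch of how collapsing an MC histogram to a single interval discards the boundaries a split needs. The types MCInterval and MCHistogram and both fetch functions are hypothetical simplifications, not Trafodion's MultiColumnHistogram or HistogramsCacheEntry API. Modelling the one-interval fetch as a collapse that keeps only the total row count and the overall upper bound is also an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for an MC (multi-column) histogram.
// These are NOT Trafodion's MultiColumnHistogram / HistogramsCacheEntry types.
struct MCInterval {
    std::vector<int> upperBound;  // upper boundary value, one entry per column
    std::int64_t rowCount;        // rows falling into this interval
};

using MCHistogram = std::vector<MCInterval>;

// Assumed model of the pre-fix behaviour: only one interval survives the
// fetch from the cache, so all intervals are collapsed into one that keeps
// the total row count and the overall (last) upper bound.
MCHistogram fetchAsSingleInterval(const MCHistogram& stored) {
    if (stored.empty()) return {};
    std::int64_t total = 0;
    for (const MCInterval& iv : stored) {
        assert(iv.rowCount >= 0);  // row counts are never negative
        total += iv.rowCount;
    }
    return {MCInterval{stored.back().upperBound, total}};
}

// Assumed model of the fixed behaviour: every interval is copied into the
// statement-level list, so the internal boundaries are preserved.
MCHistogram fetchAllIntervals(const MCHistogram& stored) {
    return stored;
}
```

Given three stored intervals, the collapsed result holds a single interval containing all the rows, while the full copy keeps all three boundaries.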

QF Chen (qifan-chen) wrote:

The fix is to store the multi-interval MC stats in the histogram cache, and to use the MC histogram in that multi-interval form during optimization.
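One way to see why the multi-interval form matters for splitting: a range-partitioned plan needs split keys that fall inside the key range, and those can only come from interval boundaries. The sketch below picks equal-frequency split keys from a hypothetical MC histogram. chooseSplitKeys and MCInterval are illustrative names, and equal-frequency boundary selection is an assumed technique, not necessarily what the Trafodion optimizer does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simplified MC histogram interval (not Trafodion's actual type).
struct MCInterval {
    std::vector<int> upperBound;  // multi-column upper boundary of the interval
    std::int64_t rowCount;        // rows in the interval
};

// Picks up to (partitions - 1) split keys for range partitioning so that each
// partition receives roughly totalRows / partitions rows. Split keys can only
// be placed at interval upper bounds, so a single-interval histogram yields
// no internal split keys at all.
std::vector<std::vector<int>> chooseSplitKeys(const std::vector<MCInterval>& hist,
                                              int partitions) {
    assert(partitions > 0);
    std::int64_t total = 0;
    for (const MCInterval& iv : hist) total += iv.rowCount;

    std::vector<std::vector<int>> keys;
    std::int64_t cumulative = 0;
    int next = 1;  // index of the next split key to place
    // Skip the last interval: its upper bound is the end of the key range.
    for (std::size_t i = 0; i + 1 < hist.size() && next < partitions; ++i) {
        cumulative += hist[i].rowCount;
        // Place split key 'next' once cumulative rows reach next/partitions of the total.
        if (cumulative * partitions >= total * next) {
            keys.push_back(hist[i].upperBound);
            // Advance past every target this interval already covers.
            while (next < partitions && cumulative * partitions >= total * next) ++next;
        }
    }
    return keys;
}
```

With five intervals of 200 rows each and five partitions, the sketch returns four split keys (the upper bounds of the first four intervals). With the same 1000 rows in a single interval it returns none, so no range split is possible.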

QF Chen (qifan-chen) wrote:

Plan with the fix (note that the cqd cache_histograms 'off' line is commented out in the script):

>>log dml.log clear;
>>obey dml;
>>set schema hcube;

--- SQL operation complete.
>>
>>--cqd cache_histograms 'off';
>>
>>cqd HBASE_RANGE_PARTITIONING_MC_SPLIT 'on';

--- SQL operation complete.
>>cqd HBASE_MIN_BYTES_PER_ESP_PARTITION '10';

--- SQL operation complete.

>>prepare xx from select * from cube1 <<+ cardinality 10e8 >> where e > 3;

--- SQL command prepared.
>>
>>explain options 'f' xx;

LC   RC   OP   OPERATOR             OPT      DESCRIPTION          CARD
---- ---- ---- -------------------- -------- -------------------- ---------

2    .    3    root                                               1.00E+009
1    .    2    esp_exchange                  1:5(range)           1.00E+009
.    .    1    trafodion_scan                CUBE1                1.00E+009

--- SQL operation complete.

Trafodion-Gerrit (neo-devtools) wrote: Fix proposed to core (master)

Fix proposed to branch: master
Review: https://review.trafodion.org/1028

Trafodion-Gerrit (neo-devtools) wrote: Fix merged to core (master)

Reviewed: https://review.trafodion.org/1028
Committed: https://github.com/trafodion/core/commit/4f43c199626ca5ceaee331457d9d64aacf772f69
Submitter: Trafodion Jenkins
Branch: master

commit 4f43c199626ca5ceaee331457d9d64aacf772f69
Author: qchen <email address hidden>
Date: Fri Jan 23 21:25:49 2015 +0000

    fix LP 1347930

    The main idea of the fix is as follows.
    1. MC stats used during compilation become multi-interval
    for the key columns of an HBase table.
    2. MC stats stored in the histogram cache now contain the
    multi-interval histogram.

    Change-Id: Ief2f0746d154b714a9b780cce55ff536406b8fa1

QF Chen (qifan-chen)
Changed in trafodion:
status: In Progress → Fix Committed