Problem with aggregates within subqueries

Bug #662980 reported by Lee Bieber
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Drizzle
Fix Released
High
Andrew Hutchings
7.0
Fix Released
High
Andrew Hutchings

Bug Description

The following drizzle-automation query gets different results when using MySQL vs Drizzle. This is currently a show stopper for using Drizzle as our database in drizzle-automation

% mysql -u drizzle --password=drizzle drizzle_stats < /tmp/file.sql
operation_name seconds avg_seconds
alter_table_add 12.00 14.71
alter_table_drop 12.00 14.65
connect 2.00 1.98
connect+select_1_row 3.00 2.57
connect+select_simple 2.00 2.23
count 13.00 15.94
count_distinct 5.00 6.51
count_distinct_2 12.00 13.07
count_distinct_big 21.00 21.81
count_distinct_group 11.00 12.55

% drizzle --port=9306 drizzle_stats < /tmp/file.sql
operation_name seconds avg_seconds
alter_table_add 12.00 10000.00
alter_table_drop 12.00 10000.00
connect 2.00 10000.00
connect+select_1_row 3.00 10000.00
connect+select_simple 2.00 10000.00
count 13.00 10000.00
count_distinct 5.00 10000.00
count_distinct_2 12.00 10000.00
count_distinct_big 21.00 10000.00
count_distinct_group 11.00 10000.00

% cat /tmp/file.sql
SELECT
   i.operation_name
, ROUND(AVG(i.seconds),2) AS seconds
, ROUND(agg.avg_seconds,2) AS avg_seconds
FROM bench_config c
NATURAL JOIN bench_runs r
NATURAL JOIN sqlbench_run_iterations i
INNER JOIN (
   SELECT
     iter.operation_name
   , AVG(seconds) as avg_seconds
   FROM bench_config conf
   NATURAL JOIN bench_runs runs
   NATURAL JOIN sqlbench_run_iterations iter
   WHERE conf.name = 'sqlbench'
   AND runs.server = 'drizzled'
   AND runs.version LIKE 'staging%' GROUP BY iter.operation_name
) AS agg
   ON i.operation_name= agg.operation_name WHERE r.run_id = 568
GROUP BY i.operation_name ORDER BY i.operation_name
limit 10

Related branches

Revision history for this message
Lee Bieber (kalebral-deactivatedaccount) wrote :

Adding data to reproduce the problem. The problem goes away if you try to reduce the data to any less than what is attached.

Changed in drizzle:
assignee: nobody → Andrew Hutchings (linuxjedi)
Revision history for this message
Andrew Hutchings (linuxjedi) wrote :

Appears to be sub-query problem, this spits out 9999.999999 with dozens of warnings:

select * from (SELECT iter.operation_name , AVG(seconds) as avg_seconds FROM bench_config conf NATURAL JOIN bench_runs runs NATURAL JOIN sqlbench_run_iterations iter WHERE conf.name = 'sqlbench' AND runs.server = 'drizzled' AND runs.version LIKE 'staging%' GROUP BY iter.operation_name) as agg;

Whereas this works:

SELECT iter.operation_name , AVG(seconds) as avg_seconds FROM bench_config conf NATURAL JOIN bench_runs runs NATURAL JOIN sqlbench_run_iterations iter WHERE conf.name = 'sqlbench' AND runs.server = 'drizzled' AND runs.version LIKE 'staging%' GROUP BY iter.operation_name;

Revision history for this message
Andrew Hutchings (linuxjedi) wrote :

Simplified further:

select * from (SELECT AVG(seconds) as avg_seconds FROM sqlbench_run_iterations iter ) as agg;

Revision history for this message
Andrew Hutchings (linuxjedi) wrote :

Simplified further:

drizzle> create table t2 (a decimal);

drizzle> insert into t2 (SELECT AVG(seconds) FROM stats.sqlbench_run_iterations iter );
ERROR 1265 (01000): Data truncated for column 'a' at row 6523

drizzle> create table t3 (a varchar(255));
Query OK, 0 rows affected (0.09 sec)

drizzle> insert into t3 (SELECT AVG(seconds) FROM stats.sqlbench_run_iterations iter );
Query OK, 1 row affected (3.7 sec)
Records: 1 Duplicates: 0 Warnings: 0

drizzle> select * from t3;
+--------------+
| a |
+--------------+
| 61.958301395 |
+--------------+
1 row in set (0 sec)

Something is going wrong with the decimal type output when using AVG().

Revision history for this message
Andrew Hutchings (linuxjedi) wrote :

OK, so. The problem is the default DECIMAL type creates a precision 10 scale 0 object. When the AVG result is passed to that decimal it is checked by my_decimal2binary (called by Field_decimal::store_value) to see if it fits. It doesn't fit so Field_decimal::store_value (claims to) spit out the maximum value that does fit (for some reason this is 10000 rounded to 9999.999999).

Revision history for this message
Andrew Hutchings (linuxjedi) wrote :

MySQL creates DECIMAL types in implicit temporary tables as prec 14 scale 4 (but defaults to 10,0 for normal tables). I think that is how we should handle it too.

Revision history for this message
Andrew Hutchings (linuxjedi) wrote :

ok, I was kinda wrong there, but kinda right. In the "select * from (SELECT AVG(seconds) as avg_seconds FROM sqlbench_run_iterations iter ) as agg;" example we end up with:

Breakpoint 1, drizzled::Field_decimal::store_value (this=0x7fffd00faad0,
    decimal_value=0x7fffe2ffbcc0) at drizzled/field/decimal.cc:127
127 int error= 0;
(gdb) print *decimal_value
$12 = {<drizzled::st_decimal_t> = {intg = 9, frac = 9, len = 9, sign = false,
    buf = 0x7fffe2ffbcd8}, buffer = {61, 958301395, 2, 3, 4, 5, 6, 7, 8}}
(gdb) print precision
$13 = 10
(gdb) print dec
$14 = 6 '\006'

This then causes truncation which converts it into the 10,6 value 9999.999999. It *looks* like if the source getting stored has too many decimal places we just truncate and spit out the max decimal.

Revision history for this message
Andrew Hutchings (linuxjedi) wrote :

problem is:
int Field::warn_if_overflow(int op_result)
returns a code on E_DEC_TRUNCATED. It should return 0 in this case.

Revision history for this message
Andrew Hutchings (linuxjedi) wrote :

Caused by fix of bug#337038

Changed in drizzle:
importance: Undecided → High
milestone: none → 2010-10-25
status: New → In Progress
Changed in drizzle:
status: In Progress → Fix Committed
Changed in drizzle:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.