Comment 11 for bug 1671152

Rick Pizzi (pizzi) wrote : Re: [Bug 1671152] Re: tokudb does not use index even if cardinality is good

Please see answers inline.

On 10 Mar 2017, at 21:48, George Ormond Lorch III <email address hidden> wrote:

> Ahh, OK, so you are running to 100K rows, then _continuing_ to run, so
> 100K is roughly the max and you are just then continuing to work within
> that range of keys. That means more deletes or actual hits on existing
> rows. My bash does the same, originally was limited to 32K rows but I
> changed that and went to 3 million rows.

Correct, this is to simulate the workload where we are bitten by the bug.
We have far more than 100k possible keys in that workload, of course; I narrowed
the key distribution to reproduce the problem quickly.

I would recommend trying sysbench (we use 0.5.1) with the LUA code I posted.
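
For reference, a rough sketch of that kind of script follows. This is not the exact two-line script I posted; the table and column names are just the stock sysbench oltp ones, and the key range is illustrative:

-- Hypothetical sysbench 0.5 event: confine all inserts and deletes to a ~100K
-- key range so rows are continuously created and deleted, which is what builds
-- up delete messages in the fractal tree. Assumes the stock sbtest1 table
-- (id, k, c, pad) from the standard oltp prepare step.
function event(thread_id)
   local id = sb_rand(1, 100000)
   db_query("INSERT IGNORE INTO sbtest1 (id, k, c, pad) VALUES (" ..
            id .. ", " .. sb_rand(1, 100000) .. ", 'xxx', 'yyy')")
   db_query("DELETE FROM sbtest1 WHERE id = " .. sb_rand(1, 100000))
end

Point it at a table prepared with the stock oltp.lua prepare step and run it with the usual sysbench 0.5 --test=<script>.lua options.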

>
> The logical row estimate with messages in flight will never be accurate,
> it can not be within a deferred work system without performing an actual
> recount each time it is asked for (a table scan).

Well, in the dumps I have taken with the tokuft_dump utility, the logical count was always
accurate for some reason. See the samples I posted above in this bug report: the estimate
was way off, but the logical row count was always approximately 100,000. Maybe a coincidence?

> So what is being
> returned now from ::records_in_range when queried for -+infinity and
> ::info is the logical row estimate for the PK. Prior to this fix, ::info
> was returning the logical row estimate for the PK and ::records_in_range
> -+infinity was returning the logical row estimate for that particular
> key, which could be very different from the PK.

I understood that, and I thought it would solve the issue, but unfortunately it didn’t,
although things are a bit better than before in our tests.

>
> There may be some other issue that I am trying to get in to if
> ::records_in_range is being provided actual key range values, this goes
> deep into the fractal tree and requires debugging as this is a code area
> that I have never been into before. It is entirely possible that this
> code is just dropping down to leaf nodes and making some wild
> assumptions about what the distance is between two keys without taking
> into account what messages are above it in the tree and is returning
> some insane number. If that is the case, it is possible that there is
> not an easy or practical fix that doesn't impact the performance of
> ::records_in_range and returns reasonably accurate estimates for all
> workloads and cases of the tree state. Meaning the 'fix' is to keep the
> trees optimized, which has been a historic problem with TokuDB and
> PerconaFT and is precisely what drove the switch from physical counts to
> logical counts as well as automatic table analysis.
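
If I am reading that right, the failure mode would look something like this toy sketch (purely my own illustration with made-up numbers, not PerconaFT code): a leaf-only walk still sees every entry physically present in the leaves, while the logical count already accounts for the delete messages buffered above them.

-- Toy model only: millions of inserts/deletes confined to ~100K keys leave the
-- leaves full of entries whose deletes are still buffered higher in the tree.
local leaf_entries    = 3000000   -- entries still physically present in the leaves
local pending_deletes = 2900000   -- delete messages not yet applied to the leaves

-- a leaf-only estimate for the whole key range ignores the buffered deletes
local function naive_records_in_range()
   return leaf_entries
end

-- the logical count accounts for the pending messages, so it stays close to reality
local function logical_row_count()
   return leaf_entries - pending_deletes
end

print("range estimate:", naive_records_in_range())   -- 3000000, wildly off
print("logical count :", logical_row_count())        -- 100000, roughly what the dumps show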

I understand this may not be an easy thing to fix; nonetheless, it is an important bug
and can prevent the use of TokuDB in workloads that involve many deletions.
I have been running a throttled OPTIMIZE TABLE during the benchmark, but it
seems to take too much time and the bug hits nonetheless.
I will have to try more aggressive values (less throttling) and hope that does not
create FT locking issues. But even if I succeed, that would be a workaround, not
a solution.
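
For completeness, the throttled optimize is being issued roughly along these lines (a sketch, not my exact commands; the throttle value is illustrative and the variable name is from memory):

-- Companion sysbench 0.5 script run next to the main workload: periodically
-- issue a throttled OPTIMIZE TABLE so delete messages get flushed out of the
-- tree. Run it with a single thread at a low event rate.
function event(thread_id)
   -- cap how fast the optimize may run (0 would mean unthrottled)
   db_query("SET SESSION tokudb_optimize_throttle = 100000")
   db_query("OPTIMIZE TABLE sbtest1")
end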

>
> Either way, I can't debug until I actually reproduce the issue and have
> a known key/query that I can target once I reach that state.

Try what I have suggested: run sysbench with the two-line LUA code.
The issue is very easy to reproduce.

Thanks
Rick