> Now that the internals lesson is over, there are a couple of things here that I think can be done.
>
> - First, I think that since the FT logic has narrowed us down to a basement node where we know that there is some cross section of the start key, it might make sense, when the basement is not in memory, to compare the end key to the start key for equality, and if so, just return 1 match, else go ahead and return the phony estimate. This will not fix the same issue for narrow range scans, but it will fix point operations such as the point deletion in this example.
> - Second, it can be argued that this optimization is incorrect and that PerconaFT should bring at least one of, if not all of, the needed basements into memory in order to obtain a more accurate answer. Then this is no longer an estimate, it is a real count. If the optimizer then chooses not to use this index, we just did possibly a whole lot of read I/O for nothing. This will add to the TokuDB over-read woes. Think of the case of a table with many indices where someone does a "SELECT * FROM table WHERE id BETWEEN reallysmall AND reallylarge". It is possible that the optimizer would call ::records_in_range for all matching indices with this huge range, scanning several indices, then just resorting to a table scan anyway. So you now have an index scan for each matching index, just to test the index, then whatever scan the optimizer chooses, and the final fetch. So a lot of potential for over-read.
>
> I think implementing the first idea would be nearly a no-op in terms of chances of breaking something. Going into the second is a possible rat hole of breaking established performance characteristics.

Thank you, the "internals lesson" was a very interesting read. I believe fixing the point deletions is a good idea and much needed; it will solve the problem in the current test case. I am not sure what would happen when we add a condition on EXPIRE_DATE (which we actually do, for partition pruning purposes), but considering that MySQL will use at most one index, perhaps it will just work fine, as HASH_ID_IX will be selected by the optimizer and the other condition used only for pruning. I see you have found more issues with the row estimates too. I am eager to test whatever fix you submit.

Thanks Rick

— Riccardo Pizzi
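
A minimal sketch of the first idea above, in C++, under stated assumptions: all names here (`Key`, `key_compare`, `estimate_rows_in_unloaded_basement`, `phony_rows_estimate`) are hypothetical placeholders, not PerconaFT's actual types or functions. It only illustrates where a start-key/end-key equality check would short-circuit the phony estimate once the search has narrowed to a basement node that is not resident in memory.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical key representation; PerconaFT's real key type differs.
struct Key {
    const void *data;
    uint32_t    size;
};

// Hypothetical stand-in for the FT key comparator: memcmp on the common
// prefix, shorter key sorts first on a tie.
static int key_compare(const Key &a, const Key &b) {
    uint32_t common = a.size < b.size ? a.size : b.size;
    int c = std::memcmp(a.data, b.data, common);
    if (c != 0) return c;
    return static_cast<int>(a.size) - static_cast<int>(b.size);
}

// Hypothetical estimator called on the path where the range estimate has
// narrowed to a single basement node that is NOT in memory.
static uint64_t estimate_rows_in_unloaded_basement(const Key &start_key,
                                                   const Key &end_key,
                                                   uint64_t phony_rows_estimate) {
    // Idea #1: a point range (start == end) can report a single match
    // without reading the basement from disk.
    if (key_compare(start_key, end_key) == 0) {
        return 1;
    }
    // Narrow-but-not-point ranges still fall back to the phony estimate;
    // fixing those would require actually reading the basement (idea #2).
    return phony_rows_estimate;
}
```

The check sits only on the not-in-memory path, so in-memory basements would keep returning their exact counts and nothing else about the estimator's behavior would change, which matches the claim that the first idea is nearly a no-op in terms of risk.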