Bug #677407 “Stale data in INFORMATION_SCHEMA.INNODB_LOCKS” : Bugs : Percona Server moved to https://jira.percona.com/projects/PS

Revision history for this message

Kristian Nielsen (knielsen) wrote on 2010-11-19:

#1

mbug677407.patch Edit (4.9 KiB, text/plain)

Patch for the problem, including BSD license for Percona for inclusion into Percona-Server.

Percona developers, please check the patch and consider it for inclusion (or let me
know if you see any problem with the patch).

I have committed the fix to XtraDB in MariaDB.

Changed in maria:
status:	New → Fix Committed
assignee:	nobody → Kristian Nielsen (knielsen)
importance:	Undecided → Medium
importance:	Medium → High

Alexey Kopytov (akopytov) on 2010-11-25

Changed in percona-server:
assignee:	nobody → Alexey Kopytov (akopytov)

Percona (percona-team) on 2010-11-25

Changed in percona-server:
status:	New → Confirmed
importance:	Undecided → High

Percona (percona-team) on 2010-11-25

Changed in percona-server:
milestone:	none → 5.1.53-11.7

Alexey Kopytov (akopytov) on 2010-11-29

Changed in percona-server:
status:	Confirmed → In Progress

Revision history for this message

Alexey Kopytov (akopytov) wrote on 2010-12-02:

#2

Pushed to release-5.1.53-11 and trunk.

Changed in percona-server:
status:	In Progress → Fix Committed

Kristian Nielsen (knielsen) on 2010-12-07

Changed in maria:
milestone:	none → 5.1

Revision history for this message

Vasil Dimov (vasild) wrote on 2011-02-17:

#3

Hello Kristian,

I am the author of the 3 InnoDB INFORMATION_SCHEMA tables INNODB_TRX, INNODB_LOCKS and INNODB_LOCK_WAITS. I am afraid your analysis is wrong and the patch breaks the logic that prevents inconsistent data being fed to MySQL during JOIN.

First you are talking about the INNODB_PLUGIN table. I assume this is a typo and you actually mean "any of INNODB_TRX, INNODB_LOCKS or INNODB_LOCK_WAITS" instead.

You say "First, a table cache is filled with the real information, under the
kernel_mutex. This is only done if at least 100 milliseconds has passed since
last filling the cache." this is not true, it should read "if at least 100 milliseconds have passed since last READING the cache."

"Second, the MySQL table data is populated from the cache, which may thus be up to 100 milliseconds old", this is also not true. The cache may be older than 100ms.

See http://dev.mysql.com/doc/refman/5.5/en/innodb-information-schema-notes.html

Let me outline the problem that this cache is trying to solve. When the user issues SELECT from any of those I_S tables then MySQL asks the storage engine "fill this table with data for me". When the user issues a JOIN on two or more tables then MySQL will ask the storage engine to fill the tables independently of each other. That is - the storage engine does not know it is a JOIN.

This combined with the fact that the contents of each of those 3 tables could change very fast means that it is possible to feed inconsistent contents of the tables to MySQL when a JOIN is executed. This is highly undesirable because the JOIN output will then be malformed.

The solution implemented is not to _update_ the cache if it has been _read_ recently.

If you query any of those I_S tables more often than once per 0.1 seconds you will see data from the past. But if you really do this on a production server (why?) then that server will be stalled so much that it could become unusable.

"2. If multiple queries against INNODB_PLUGIN start at around the same time,
they can _all_ cause the cache to be updated, as the "last updated" timestamp
is not reset immediately after updating the cache. This can cause extra
unnecessary contention on kernel_mutex."

It is "last READ" timestamp, not "last updated". This is just fine and is per designed. From the doc:
"Because InnoDB must temporarily stall while the transaction and locking data is collected, too frequent queries of these tables can negatively impact performance as seen by other users"

Can you come up with an example where your patch breaks the above logic and leads to inconsistent data (from different cache snapshots) being fed to MySQL resulting in bogus JOIN result?

Thanks!

Hello Kristian,

I am the author of the 3 InnoDB INFORMATION_SCHEMA tables INNODB_TRX, INNODB_LOCKS and INNODB_LOCK_WAITS. I am afraid your analysis is wrong and the patch breaks the logic that prevents inconsistent data being fed to MySQL during JOIN.

First you are talking about the INNODB_PLUGIN table. I assume this is a typo and you actually mean "any of INNODB_TRX, INNODB_LOCKS or INNODB_LOCK_WAITS" instead.

You say "First, a table cache is filled with the real information, under the
kernel_mutex. This is only done if at least 100 milliseconds has passed since
last filling the cache." this is not true, it should read "if at least 100 milliseconds have passed since last READING the cache."

"Second, the MySQL table data is populated from the cache, which may thus be up to 100 milliseconds old", this is also not true. The cache may be older than 100ms.

See http://dev.mysql.com/doc/refman/5.5/en/innodb-information-schema-notes.html

Let me outline the problem that this cache is trying to solve. When the user issues SELECT from any of those I_S tables then MySQL asks the storage engine "fill this table with data for me". When the user issues a JOIN on two or more tables then MySQL will ask the storage engine to fill the tables independently of each other. That is - the storage engine does not know it is a JOIN.

This combined with the fact that the contents of each of those 3 tables could change very fast means that it is possible to feed inconsistent contents of the tables to MySQL when a JOIN is executed. This is highly undesirable because the JOIN output will then be malformed.

The solution implemented is not to _update_ the cache if it has been _read_ recently.

If you query any of those I_S tables more often than once per 0.1 seconds you will see data from the past. But if you really do this on a production server (why?) then that server will be stalled so much that it could become unusable.

"2. If multiple queries against INNODB_PLUGIN start at around the same time,
they can _all_ cause the cache to be updated, as the "last updated" timestamp
is not reset immediately after updating the cache. This can cause extra
unnecessary contention on kernel_mutex."

It is "last READ" timestamp, not "last updated". This is just fine and is per designed. From the doc:
"Because InnoDB must temporarily stall while the transaction and locking data is collected, too frequent queries of these tables can negatively impact performance as seen by other users"

Can you come up with an example where your patch breaks the above logic and leads to inconsistent data (from different cache snapshots) being fed to MySQL resulting in bogus JOIN result?

Thanks!

Revision history for this message

Kristian Nielsen (knielsen) wrote on 2011-02-17: Re: [Bug 677407] Re: Stale data in INFORMATION_SCHEMA.INNODB_LOCKS

#4

Vasil Dimov <email address hidden> writes:

> Hello Kristian,

Hi Vasil, thanks for commenting on this issue!

> I am the author of the 3 InnoDB INFORMATION_SCHEMA tables INNODB_TRX,
> INNODB_LOCKS and INNODB_LOCK_WAITS. I am afraid your analysis is wrong
> and the patch breaks the logic that prevents inconsistent data being fed
> to MySQL during JOIN.

Yes, you are right, this is indeed documented behaviour:

http://dev.mysql.com/doc/refman/5.5/en/innodb-information-schema-notes.html

So clearly my patch breaks this :-(

So I think I will need to revert the patch, thanks for pointing out the
problem.

When I first read your explanation, I thought it was hardly any better to be
consistent within 100 milliseconds, but inconsistent for larger intervals.
Eg. if someone joined two INFORMATION_SCHEMA.INNODB_XXX tables with one large
table in the middle, presumably there could still be inconsistencies.

But clearly, there is a common use-case, which is a monitoring tool doing
basic joins against just these INNODB_XXX tables, and in practise the 100
milliseconds work well I suppose. And it is documented behaviour (shame on me
for missing that). And a full transactional snapshot isolation implementation
for INNODB_XXX does seem a bit overkill.

----

So, the original problem here is this MySQL bug:

http://bugs.mysql.com/bug.php?id=48883

The test case innodb.innodb_information_schema fails sporadically because of
this behaviour. I will find a better way to fix this failure.

> Can you come up with an example where your patch breaks the above logic
> and leads to inconsistent data (from different cache snapshots) being
> fed to MySQL resulting in bogus JOIN result?

Yes, I will try to put together a test case that fails with my patch. Though
it may be tricky due to the 100 milliseconds, a busy test server might get
inconsistent join anyway.

Thanks,

- Kristian.

Revision history for this message

Vasil Dimov (vasild) wrote on 2011-02-17:

#5

Hi,

Yes, the "write if only it has not been read for some time" solution is not 100% guarantee, but is a good enough approach.

Maybe the solution to the failing innodb_information_schema test is to inject "-- sleep 0.1" just before calling wait_until_rows_count.inc? Or actually the whole wait_until_rows_count.inc is useless because as soon as it starts it will query the I_S table very often and will thus "freeze" the cache. But that will also be non-deterministic.

Thanks!

Kristian Nielsen (knielsen) on 2011-02-17

Changed in maria:
status:	Fix Committed → In Progress

Revision history for this message

Vasil Dimov (vasild) wrote on 2011-02-17:

#6

http://lists.mysql.com/commits/131567

Revision history for this message

Kristian Nielsen (knielsen) wrote on 2011-02-17:

#7

Couldn't we make a similar solution without a magic "100msec" timeout?

For example, have something similar to a refcount on the cache, which counts the number of active statements that accessed the tables. The refcount would be incremented on first use of one of the tables in a statement, and decremented at statement end. And we would only refresh the cache if the refcount was zero. Or something like that, what do you think?

Revision history for this message

Vasil Dimov (vasild) wrote on 2011-02-21:

#8

Hi,

I need to look at the API again, but I think it is very limited - MySQL just calls a "fill" function for a given I_S table. I was thinking about finding out the SQL statement string and parsing it to see if it is a JOIN but then decided this would be an infinite source of bugs and abandoned the idea :)

Ignacio Nin (ignacio-nin) on 2011-03-08

Changed in percona-server:
status:	Fix Committed → Fix Released

Revision history for this message

Shahriyar Rzayev (rzayev-sehriyar) wrote on 2018-01-25:

#9

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-444

Percona Server moved to https://jira.percona.com/projects/PS

Stale data in INFORMATION_SCHEMA.INNODB_LOCKS

Bug Description

Related branches

Other bug subscribers

Patches

Remote bug watches

Affects		Status	Importance	Assigned to	Milestone
	MariaDB	In Progress	High	Kristian Nielsen	MariaDB 5.1
	Percona Server moved to https://jira.percona.com/projects/PS	Fix Released	High	Alexey Kopytov	Percona Server moved to https://jira.percona.com/projects/PS 5.1.53-11.7