set a hard limit for internal temporary tables

Bug #1870237 reported by Andrea Ieri
This bug affects 2 people
Affects: OpenStack Percona Cluster Charm
Status: Triaged
Importance: Medium
Assigned to: Unassigned

Bug Description

Today I handled an alert on a customer cloud (bionic-rocky) in which one of the mysql units was *very rapidly* running out of space.

It turned out the culprit was a >300GB ibtmp1 file, which according to [0] and [1]:
* grows indefinitely unless capped
* is never shrunk
* is only truncated at service restart

The recommendation is therefore to explicitly limit its size with:

innodb_temp_data_file_path=ibtmp1:12M:autoextend:max:<some reasonable number>

Capping the size does not avoid the need to restart the mysql service if ibtmp1 reaches the threshold (queries needing more temporary space will then fail), but it at least ensures that a growing temporary file won't bring down an entire node by filling up all available space.
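
For illustration, with a hypothetical 5G cap (an example value, not a recommendation) the directive would read:

innodb_temp_data_file_path=ibtmp1:12M:autoextend:max:5G

and the value in effect can be confirmed from a mysql client with:

SHOW VARIABLES LIKE 'innodb_temp_data_file_path';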

For the reason above, an nrpe check that alerts when the file reaches the limit should also be provided (to signal to an operator that a service restart should be performed when possible).
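
As a rough sketch of what such a check could monitor (not a proposed implementation for the charm), MySQL 5.7 reports the current size of the temporary tablespace in INFORMATION_SCHEMA.FILES:

SELECT FILE_NAME,
       ROUND((TOTAL_EXTENTS * EXTENT_SIZE) / 1024 / 1024) AS size_mb
FROM INFORMATION_SCHEMA.FILES
WHERE TABLESPACE_NAME = 'innodb_temporary';

A check could compare that value (or simply the on-disk size of ibtmp1) against the configured maximum and warn above some percentage.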

This seems to apply only to mysql 5.7.

[0] https://www.percona.com/blog/2017/12/04/internal-temporary-tables-mysql-5-7/
[1] https://bugs.mysql.com/bug.php?id=82556

Revision history for this message
Chris Sanders (chris.sanders) wrote :

Subscribing field medium. The only known workaround for this is monitoring and service restarts, which isn't a great solution and could still see a cluster go down before it's resolved.

Revision history for this message
Liam Young (gnuoy) wrote :

Although this does not fix the issue, it can sometimes be mitigated by archiving old data from the database. The two biggest culprits are the nova tables and the gnocchi tables. You can see the row counts with:

SELECT table_name, table_rows FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'nova'
ORDER BY table_rows;

and

SELECT table_name, table_rows
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'gnocchi'
ORDER BY table_rows;

The nova-cloud-controller charm exposes an action called 'archive-data' which runs "nova-manage db archive_deleted_rows" with a given batch size. This moves deleted rows from production tables to shadow tables. If there is a large backlog you will need to run it many times.
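
For example, from the juju client (assuming a charm revision where the batch size parameter is named 'batch-size'; the name may differ), the action can be run repeatedly until it reports nothing left to archive:

juju run-action nova-cloud-controller/0 archive-data batch-size=10000 --wait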

Unfortunately, things are a bit more tricky with gnocchi; as far as I know the only option is to purge it through the API, and that is slow. It took about three weeks for us to purge old data. I'll pastebin the script we use, but it will DELETE INDISCRIMINATELY: https://paste.ubuntu.com/p/gNN6rZbVDj/

James Page (james-page)
Changed in charm-percona-cluster:
status: New → Triaged
importance: Undecided → Medium