Setting audit_log_rotate_on_size audit_log_flush & audit_log_rotate_on_size hangs server

Bug #1649500 reported by Marek
16
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Percona Server moved to https://jira.percona.com/projects/PS
Status tracked in 5.7
5.5
Triaged
High
Sergei Glushchenko
5.6
Triaged
High
Sergei Glushchenko
5.7
Triaged
High
Sergei Glushchenko

Bug Description

I have problem with audit_log.so, and tested it on
Ubuntu 12.04 LTS:
 - Percona 5.6
 - Percona 5.7
Ubuntu 16.04 LTS:
 - Percona 5.6
 - Percona 5.7

The issue is around audit_log_flush variable, not sure yet if it's just when setting it dynamically or always, but it's making mysql server unresponsive.

How to reproduce problem
Install ubuntu (possible the same issue on other systems)
just for check, do 10 000 queries to mysql.user
for i in `seq 10000`; do mysql -e 'select * from mysql.user limit 1'; done
Everything should work now.
Now login to mysql, and do:
>INSTALL PLUGIN audit_log SONAME "audit_log.so";
>set global audit_log_rotate_on_size=524288000;SET GLOBAL audit_log_flush = ON;

And do the same
for i in `seq 10000`; do mysql -e 'select * from mysql.user limit 1'; done

This will hang, and you cannot open mysql connection anymore.

I tried to do strace on mysqld, and one pid reporting that:

set_robust_list(0x7fa5537fe9e0, 24) = 0
clock_gettime(CLOCK_REALTIME, {1481604011, 308691667}) = 0
futex(0x350fd1c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, {1481604012, 308691600}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x350fcb0, FUTEX_WAKE_PRIVATE, 1) = 0
write(21, "<AUDIT_RECORD\n NAME=\"Audit\"\n R"..., 194) = 194
clock_gettime(CLOCK_REALTIME, {1481604012, 308989921}) = 0
futex(0x350fd1c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 3, {1481604013, 308989900}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
clock_gettime(CLOCK_REALTIME, {1481604013, 309129167}) = 0
futex(0x350fcb0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x350fd1c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 5, {1481604014, 309129100}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
clock_gettime(CLOCK_REALTIME, {1481604014, 309353798}) = 0
futex(0x350fcb0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x350fd1c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 7, {1481604015, 309353700}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
clock_gettime(CLOCK_REALTIME, {1481604015, 309562835}) = 0
futex(0x350fcb0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x350fd1c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 9, {1481604016, 309562800}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
clock_gettime(CLOCK_REALTIME, {1481604016, 309786755}) = 0
futex(0x350fcb0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x350fd1c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 11, {1481604017, 309786700}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
clock_gettime(CLOCK_REALTIME, {1481604017, 310008118}) = 0
futex(0x350fcb0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x350fd1c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 13, {1481604018, 310008100}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x350fcb0, FUTEX_WAKE_PRIVATE, 1) = 0
write(21, "<AUDIT_RECORD\n NAME=\"Connect\"\n "..., 539) = 539
clock_gettime(CLOCK_REALTIME, {1481604018, 310255833}) = 0
futex(0x350fd1c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 15, {1481604019, 310255800}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
clock_gettime(CLOCK_REALTIME, {1481604019, 310497057}) = 0
futex(0x350fcb0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x350fd1c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 17, {1481604020, 310497000}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
clock_gettime(CLOCK_REALTIME, {1481604020, 310696128}) = 0
futex(0x350fcb0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x350fd1c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 19, {1481604021, 310696100}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
clock_gettime(CLOCK_REALTIME, {1481604021, 310873472}) = 0
futex(0x350fcb0, FUTEX_WAKE_PRIVATE, 1) = 0
futex(0x350fd1c, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 21, {1481604022, 310873400}, ffffffff) = -1 ETIMEDOUT (Connection timed out)
futex(0x350fcb0, FUTEX_WAKE_PRIVATE, 1) = 0
write(21, "<AUDIT_RECORD\n NAME=\"Query\"\n R"..., 943) = 943
lseek(21, 0, SEEK_CUR) = 8162
access("./audit.log.524287999", F_OK) = -1 ENOENT (No such file or directory)
access("./audit.log.524287998", F_OK) = -1 ENOENT (No such file or directory)
access("./audit.log.524287997", F_OK) = -1 ENOENT (No such file or directory)
access("./audit.log.524287996", F_OK) = -1 ENOENT (No such file or directory)
access("./audit.log.524287995", F_OK) = -1 ENOENT (No such file or directory)
access("./audit.log.524287994", F_OK) = -1 ENOENT (No such file or directory)
access("./audit.log.524287993", F_OK) = -1 ENOENT (No such file or directory)

...
and another 700 000 logs like that audit.log.timestamp = no such file or directory
It looks like some weird loop somewhere :)

# date ; grep 'audit.log' ./strace.5753 | grep 'No such file' | wc -l
Tue Dec 13 04:42:38 UTC 2016
8185527
# date ; grep 'audit.log' ./strace.5753 | grep 'No such file' | wc -l
Tue Dec 13 04:42:47 UTC 2016
8513283

Thanks for attantion, hopefully it can be resolved .
It looks like it could be related to: https://bugs.launchpad.net/percona-server/+bug/1382231
but there is not much informations in there :)

Tags: audit
Revision history for this message
Muhammad Irfan (muhammad-irfan) wrote :

I'm able to reproduce with mentioned steps.

mysql> set global audit_log_rotate_on_size=524288000;SET GLOBAL audit_log_flush = ON;
Query OK, 0 rows affected (0.04 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> show variables like '%audit%';
+-----------------------------+---------------+
| Variable_name | Value |
+-----------------------------+---------------+
| audit_log_buffer_size | 1048576 |
| audit_log_exclude_accounts | |
| audit_log_exclude_commands | |
| audit_log_exclude_databases | |
| audit_log_file | audit.log |
| audit_log_flush | OFF |
| audit_log_format | OLD |
| audit_log_handler | FILE |
| audit_log_include_accounts | |
| audit_log_include_commands | |
| audit_log_include_databases | |
| audit_log_policy | ALL |
| audit_log_rotate_on_size | 524288000 |
| audit_log_rotations | 0 |
| audit_log_strategy | ASYNCHRONOUS |
| audit_log_syslog_facility | LOG_USER |
| audit_log_syslog_ident | percona-audit |
| audit_log_syslog_priority | LOG_INFO |
+-----------------------------+---------------+
18 rows in set (0.00 sec)

$ for i in `seq 10000`; do mysql -e 'select * from mysql.user limit 1'; done

And it just hanged and I can't connect to MySQL server.

However, audit_log_flush shows OFF which is turned ON dynamically. This doesn't seems dynamic variable but doc mentions it's dynamic https://www.percona.com/doc/percona-server/5.6/management/audit_log_plugin.html#audit_log_flush

Interestingly, when I enabled audit_log_flush & audit_log_rotate_on_size from my.cnf after restarting instance I couldn't reproduce same problem.

$ for i in `seq 10000`; do mysql -e 'select * from mysql.user limit 1'; done

This completed absolutely fine. It could be related to audit_log_flush handling.

Changed in percona-server:
status: New → Confirmed
Changed in percona-server:
assignee: nobody → Muhammad Irfan (muhammad-irfan)
Revision history for this message
Muhammad Irfan (muhammad-irfan) wrote :

I'm able to replicate same issue on Percona Server 5.7.17 as described above.

summary: - audit_log.so bug
+ Setting audit_log_rotate_on_size audit_log_flush &
+ audit_log_rotate_on_size hangs server
Revision history for this message
Shahriyar Rzayev (rzayev-sehriyar) wrote :

Percona now uses JIRA for bug reports so this bug report is migrated to: https://jira.percona.com/browse/PS-1037

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.