pt-archiver --check-slave-lag agressiveness

Reported by Jonathan Nicol on 2012-09-25
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Percona Toolkit
Undecided
Unassigned

Bug Description

running pt-archiver 2.1.3 to delete old data from a table. It's going rather slowly, and the statistics show ~95% of the time is being spent on Action "other". Suspecting it might be too aggressive checking the slave lag I added --check-interval 10 --max-lag 10, which didn't give me any improvement. Still suspicious I ran tcpdump on the slave, and unless I'm misunderstanding something, it's being WAY too aggressive.

Command being used (tables/hosts obfuscated):
pt-archiver --purge --progress 50000 --statistics --bulk-delete --limit 10000 --commit-each --primary-key-only --check-slave-lag h=slave-host --check-interval 10 --max-lag 10 --source h=localhost,D=database,t=table --where "created_date < date_sub(now(), INTERVAL 6 month)"

While that was running on the master, I ran on the slave (using the port pt-archiver is connected to):

tcpdump -i bond0 port 49290 -s 65535 -x -n -q -tttt > tcpdump.txt
pt-query-digest tcpdump.txt --type tcpdump
...
# 2.1s user time, 80ms system time, 20.57M rss, 120.88M vsz
# Current date: Tue Sep 25 19:28:22 2012
# Hostname:
# Files: tcpdump.txt
# Overall: 1.16k total, 1 unique, 54.26 QPS, 0.01x concurrency ___________
# Time range: 2012-09-25 19:18:41.060553 to 19:19:02.511825
# Attribute total min max avg 95% stddev median
# ============ ======= ======= ======= ======= ======= ======= =======
# Exec time 113ms 28us 490us 96us 113us 15us 89us
# Rows affecte 0 0 0 0 0 0 0
# Query size 19.32k 17 17 17 17 0 17
# Warning coun 0 0 0 0 0 0 0

# Profile
# Rank Query ID Response time Calls R/Call Apdx V/M Item
# ==== ================== ============= ===== ====== ==== ===== ==========
# 1 0x791C5370A1021F19 0.1128 100.0% 1164 0.0001 1.00 0.00 SHOW SLAVE STATUS

# Query 1: 54.26 QPS, 0.01x concurrency, ID 0x791C5370A1021F19 at byte 3416555
# This item is included in the report because it matches --limit.
# Scores: Apdex = 1.00 [1.0], V/M = 0.00
# Query_time sparkline: | ^_ |
# Time range: 2012-09-25 19:18:41.060553 to 19:19:02.511825
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 100 1164
# Exec time 100 113ms 28us 490us 96us 113us 15us 89us
# Rows affecte 0 0 0 0 0 0 0 0
# Query size 100 19.32k 17 17 17 17 0 17
# Warning coun 0 0 0 0 0 0 0 0
# String:
# Errors none
# Hosts
# Query_time distribution
# 1us
# 10us ################################################################
# 100us ####################################
# 1ms
# 10ms
# 100ms
# 1s
# 10s+
SHOW SLAVE STATUS\G

I don't think it should be running "SHOW SLAVE STATUS" 55 times per second.

tags: added: pt-archiver replication-lag useless-work
Changed in percona-toolkit:
status: New → Triaged
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers