cat /sys/fs/cgroup/blkio/blkio.time_recursive took 1 second to complete

Bug #2021571 reported by norman shen
This bug affects 2 people
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| linux-meta (Ubuntu) | Confirmed | Undecided | Unassigned | |

Bug Description

Issue:

```console
# time cat /sys/fs/cgroup/blkio/blkio.time_recursive
8:16 354721435

real 0m1.297s
user 0m0.000s
sys 0m1.297s
```

As the output above shows, reading blkio.time_recursive with cat took about 1.3 seconds, all of it system time, which is much longer than normal.
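
To check whether the slow read is consistent rather than a one-off, the read can be timed several times in a row. This is a minimal sketch, assuming GNU `date` (for the `%N` nanosecond format); `time_reads` is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: read a file N times and print the wall-clock
# time of each read in milliseconds. Assumes GNU date (%N support).
time_reads() {
    file=$1
    count=${2:-5}
    i=1
    while [ "$i" -le "$count" ]; do
        start=$(date +%s%N)      # nanoseconds since the epoch
        cat "$file" > /dev/null
        end=$(date +%s%N)
        echo "read $i: $(( (end - start) / 1000000 )) ms"
        i=$((i + 1))
    done
}
```

For example, `time_reads /sys/fs/cgroup/blkio/blkio.time_recursive 10` prints one line per read. The figures are wall-clock only and include the cost of starting `cat`, so they are coarser than the `time` output above.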

Kernel Version:

Linux compute08 4.15.0-72-generic #81-Ubuntu SMP Tue Nov 26 12:20:02 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

I would appreciate any pointers on how to identify the cause. Thank you very much for the help.

norman shen (jshen28)
description: updated

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-meta (Ubuntu):
status: New → Confirmed

Kevin Brierly (kb1828) wrote :

Was the cause of this ever determined? I am seeing something similar on some of my systems. Times are usually around 1s, sometimes higher. A recent high was 1.7s

# uname -r
4.15.0-193-generic

# time cat /sys/fs/cgroup/blkio/blkio.time_recursive
8:144 69349171178307
8:128 4718621058
8:112 4604996397
8:96 4625378704
8:80 4873085108
8:64 3777190161
8:48 4749702443
8:32 8346248400
8:16 9260262419
8:0 154939957376379

real 0m1.03s
user 0m0.00s
sys 0m1.03s

norman shen (jshen28) wrote :

Hi Kevin, have you profiled it with perf and analyzed a flame graph?
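
For anyone wanting to repeat this, here is a minimal sketch of profiling a single read with perf. It assumes `perf` (from linux-tools) is installed and that it is run as root. `profile_read` and the default output path are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: record call graphs while reading the file once,
# then print the top of a plain-text report.
profile_read() {
    target=${1:-/sys/fs/cgroup/blkio/blkio.time_recursive}
    out=${2:-/tmp/blkio-read.perf.data}
    # -g records call graphs; the output of cat itself is discarded.
    perf record -g -o "$out" -- cat "$target" > /dev/null &&
    # --no-children sorts by self time, so hot leaf functions come first.
    perf report -i "$out" --stdio --no-children | head -n 40
}
```

The text report can stand in for a flame graph: the first few entries show which kernel functions account for the system time.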

Kevin Brierly (kb1828) wrote :

Yes. It just confirms that it takes a while.

norman shen (jshen28) wrote :

Thank you, it is the same for me. __percpu_counter_sum is very slow even though there is only one block device. I have not figured out why. Do you have any clue how to reproduce this behavior?

Kevin Brierly (kb1828) wrote :

Do you have a lot of blkio devices coming and going, for example iSCSI, USB, etc.? We currently believe it is tree bloat caused by devices being added and removed. We are still digging into the issue.
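
One rough way to look for bloat is to compare the number of blkio cgroups the kernel reports in `/proc/cgroups` (the `num_cgroups` column) with the number of cgroup directories actually visible. This is only a sketch under assumptions: it counts cgroups, not per-device entries, so it would only reveal bloat that shows up at the cgroup level. Both function names are hypothetical.

```shell
# Print the num_cgroups column (third field) for one controller
# from a /proc/cgroups-style table.
kernel_cgroup_count() {
    controller=$1
    table=${2:-/proc/cgroups}
    awk -v c="$controller" '$1 == c { print $3 }' "$table"
}

# Count the cgroup directories visible under a hierarchy mount point
# (the mount point itself is included in the count).
visible_cgroup_count() {
    root=${1:-/sys/fs/cgroup/blkio}
    find "$root" -type d | wc -l
}
```

Running `kernel_cgroup_count blkio` and `visible_cgroup_count` side by side gives two numbers. If the first is far larger than the second, that may point to cgroups that were removed but are still held by the kernel.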

norman shen (jshen28) wrote :

We might have around 50 multipath mappers per node, and the underlying devices are FC-based disks. Disks should not be added or removed frequently, though. What machines are you using? Some of my nodes are Intel 2650; I am not sure if that is relevant.

Kevin Brierly (kb1828) wrote :

The hardware does not seem to be relevant. Do you have anything calling "racadm lclog" on a fairly regular basis? It creates and destroys a USB device over and over.
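
A quick way to check for such a caller is to search the cron configuration for the command name. This is a minimal sketch; `find_cron_callers` is a hypothetical helper, and the paths in the usage note are typical cron locations that may differ per system.

```shell
# Hypothetical helper: list files under the given paths that mention
# a pattern. Unreadable or missing paths are silently skipped.
find_cron_callers() {
    pattern=$1
    shift
    grep -rl -- "$pattern" "$@" 2>/dev/null
}
```

For example, `find_cron_callers racadm /etc/crontab /etc/cron.d /var/spool/cron` prints every cron file that references racadm, and prints nothing if there is none.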

norman shen (jshen28) wrote :

I have no idea what racadm is.

norman shen (jshen28) wrote :

Do you know what __percpu_counter_sum does? I am wondering why it shows up in so many samples.

Kevin Brierly (kb1828) wrote :

As far as I know, it examines the blkio device tree and sums the data. Removed devices do not get deleted from the blkio statistics tree; they are just marked as offline somehow.

In our case, racadm lclog created and destroyed USB devices to pull data, and the blkio tree grew to over 65000 leaves. It takes time to process a tree that large. We believe our issue was caused by USB device leaves from racadm lclog accumulating over time. We had it running from cron, which has now been stopped.
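
To give a feel for the numbers: in the kernel source, `__percpu_counter_sum` takes the counter's lock and adds up the per-CPU values for every online CPU, so one call costs roughly one addition per CPU. Assuming it is called once per leaf for each read (an assumption that has not been verified for this kernel), the work per read can be estimated as leaves times CPUs. `estimate_additions` is a hypothetical helper for that arithmetic.

```shell
# Hypothetical helper: rough count of per-CPU additions for one read,
# assuming one __percpu_counter_sum call per leaf.
estimate_additions() {
    leaves=$1
    cpus=${2:-$(nproc)}          # defaults to the CPUs available here
    echo $(( leaves * cpus ))
}
```

For example, `estimate_additions 65000 32` gives 2080000 additions for a single statistic on a 32-CPU machine, each touching another CPU's memory, before any locking overhead is counted.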
