Since Trusty /proc/diskstats shows weird values

Bug #1297522 reported by Holger Mauermann on 2014-03-25
This bug affects 17 people
Affects          Status        Importance  Assigned to    Milestone
linux (Ubuntu)   Fix Released  Medium      Unassigned
Trusty           Fix Released  Medium      Chris J Arges

Bug Description

SRU Justification:

Impact: Tools that rely on diskstats may report incorrect data under certain conditions. In particular, diskstats in a VM may report incorrect statistics.

Fix: 0fec08b4ecfc36fd8a64432343b2964fb86d2675 (in v3.14-rc1)

Testcase:
  - Install a VM with the affected kernel
  - Run cat /proc/diskstats | awk '$3=="vda" { print $7/$4, $11/$8 }'
  - If the two values are much larger than with the v3.14-rc1 kernel in the same VM, the test has failed. For example, in a failing case I see "132.44 5458.34"; in a passing case I see "0.19334 5.90476".

--

After upgrading some virtual machines (KVM) to Trusty I noticed really high I/O wait times, e.g. Munin graphs now show up to 200 seconds(!) read I/O wait time. See attached image. Of course the real latency isn't higher than before; it's only /proc/diskstats that shows totally wrong numbers...

$ cat /proc/diskstats | awk '$3=="vda" { print $7/$4, $11/$8 }'
1375.44 13825.1

According to the documentation for /proc/diskstats, column 4 (as awk counts them, after major, minor and device name) is the total number of reads completed, column 7 is the total time spent reading in milliseconds, and columns 8 and 11 are the same for writes. So the numbers above are the average read and write latency in milliseconds since boot.
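As a minimal sketch of the calculation just described, the same ratios can be computed with named columns and a guard against division by zero on an idle device. The function name avg_latency and the optional file argument are assumptions made for illustration; they are not part of any existing tool.

```shell
# avg_latency DEVICE [FILE]
# Print the average read and write latency (ms per completed request)
# since boot, from the cumulative counters in /proc/diskstats.
# avg_latency is a hypothetical helper name; FILE defaults to /proc/diskstats.
avg_latency() {
    LC_ALL=C awk -v dev="$1" '
        $3 == dev {
            reads  = $4; read_ms  = $7     # reads completed, ms spent reading
            writes = $8; write_ms = $11    # writes completed, ms spent writing
            # Avoid dividing by zero when no request has completed yet.
            r = (reads  > 0) ? read_ms  / reads  : 0
            w = (writes > 0) ? write_ms / writes : 0
            printf "%.2f %.2f\n", r, w
        }
    ' "${2:-/proc/diskstats}"
}
```

For example, "avg_latency vda" reads the live counters, while "avg_latency vda saved.txt" works on a saved copy of the file.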

Same weird numbers with iostat. Note the column "await" (average time in milliseconds for I/O requests):

$ iostat -dx 1 60
Linux 3.13.0-19-generic (munin) 03/25/14 _x86_64_ (2 CPU)

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 2.30 16.75 72.45 24.52 572.79 778.37 27.87 1.57 620.00 450.20 1121.83 1.71 16.54

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 52.00 0.00 25.00 0.00 308.00 24.64 0.30 27813.92 0.00 27813.92 0.48 1.20
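Unlike the since-boot averages, iostat reports per-interval values. As a rough sketch of how a figure like "await" can be derived, assuming it is the change in total I/O time divided by the change in completed requests between two samples, the following compares two saved copies of /proc/diskstats. The function name await_between is hypothetical, and the real iostat implementation may differ in detail.

```shell
# await_between BEFORE AFTER DEVICE
# Print the average ms per completed I/O (reads + writes) between two
# saved snapshots of /proc/diskstats. Hypothetical helper for illustration.
await_between() {
    LC_ALL=C awk -v dev="$3" '
        # First file: remember the starting counters for the device.
        $3 == dev && FNR == NR { ios0 = $4 + $8; ms0 = $7 + $11; next }
        # Second file: compute the deltas and the per-request average.
        $3 == dev {
            ios = ($4 + $8) - ios0
            ms  = ($7 + $11) - ms0
            avg = (ios > 0) ? ms / ios : 0
            printf "%.2f\n", avg
        }
    ' "$1" "$2"
}
```

A typical use would be to copy /proc/diskstats to before.txt, wait one second, copy it again to after.txt, and then run "await_between before.txt after.txt vda".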

I upgraded the host system to Trusty too, but there the /proc/diskstats output is as normal as before.

$ uname -r
3.13.0-19-generic

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1297522

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Holger Mauermann (mauermann) wrote :
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.14 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14-rc8-trusty/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Holger Mauermann (mauermann) wrote :

No issues with 3.14 kernel

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-fixed-upstream
tobby (tobby88) wrote :

I can confirm this bug still exists in the current Trusty kernel (3.13.0-34-generic). On the host system the diskstats are OK, but on a virtual machine (KVM) they show crazy values.

affects: linux (Ubuntu) → linux-meta (Ubuntu)
Brad Figg (brad-figg) on 2014-08-26
affects: linux-meta (Ubuntu) → linux (Ubuntu)
Chris J Arges (arges) wrote :

Can you explain how you set up your guest VM? If you are using libvirt, could you post the output of 'virsh dumpxml <domain>' for the affected VM?

tobby (tobby88) wrote :

This is my affected VM.

Chris J Arges (arges) on 2014-08-27
Changed in linux (Ubuntu Trusty):
assignee: nobody → Chris J Arges (arges)
status: New → In Progress
importance: Undecided → Medium
Chris J Arges (arges) wrote :

OK, I can reproduce this issue. So far it looks like something changed between v3.13 and v3.14-rc1. I'll start bisecting.

Chris J Arges (arges) wrote :

Ok I found a patch that resolves the issue: 0fec08b4ecfc36fd8a64432343b2964fb86d2675

Chris J Arges (arges) wrote :

Can you please test this build with that patch backported:
http://people.canonical.com/~arges/lp1297522/

Thanks

tobby (tobby88) wrote :

Just tried it - yes, this works! Now diskstats look normal on virtual machines.

Chris J Arges (arges) on 2014-08-28
description: updated
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Chris J Arges (arges) wrote :

SRU submitted for this.

Tim Gardner (timg-tpi) on 2014-08-28
Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
tags: added: verification-done-trusty
removed: verification-needed-trusty
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-36.63

---------------
linux (3.13.0-36.63) trusty; urgency=low

  [ Joseph Salisbury ]

  * Release Tracking Bug
    - LP: #1365052

  [ Feng Kan ]

  * SAUCE: (no-up) irqchip:gic: change access of gicc_ctrl register to read
    modify write.
    - LP: #1357527
  * SAUCE: (no-up) arm64: optimized copy_to_user and copy_from_user
    assembly code
    - LP: #1358949

  [ Ming Lei ]

  * SAUCE: (no-up) Drop APM X-Gene SoC Ethernet driver
    - LP: #1360140
  * [Config] Drop XGENE entries
    - LP: #1360140
  * [Config] CONFIG_NET_XGENE=m for arm64
    - LP: #1360140

  [ Stefan Bader ]

  * SAUCE: Add compat macro for skb_get_hash
    - LP: #1358162
  * SAUCE: bcache: prevent crash on changing writeback_running
    - LP: #1357295

  [ Suman Tripathi ]

  * SAUCE: (no-up) arm64: Fix the csr-mask for APM X-Gene SoC AHCI SATA PHY
    clock DTS node.
    - LP: #1359489
  * SAUCE: (no-up) ahci_xgene: Skip the PHY and clock initialization if
    already configured by the firmware.
    - LP: #1359501
  * SAUCE: (no-up) ahci_xgene: Fix the link down in first attempt for the
    APM X-Gene SoC AHCI SATA host controller driver.
    - LP: #1359507

  [ Tuan Phan ]

  * SAUCE: (no-up) pci-xgene-msi: fixed deadlock in irq_set_affinity
    - LP: #1359514

  [ Upstream Kernel Changes ]

  * iwlwifi: mvm: Add a missed beacons threshold
    - LP: #1349572
  * mac80211: reset probe_send_count also in HW_CONNECTION_MONITOR case
    - LP: #1349572
  * genirq: Add an accessor for IRQ_PER_CPU flag
    - LP: #1357527
  * arm64: perf: add support for percpu pmu interrupt
    - LP: #1357527
  * cifs: sanity check length of data to send before sending
    - LP: #1283101
  * KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
    - LP: #1329434
  * KVM: nVMX: Rework interception of IRQs and NMIs
    - LP: #1329434
  * KVM: vmx: disable APIC virtualization in nested guests
    - LP: #1329434
  * HID: Add transport-driver functions to the USB HID interface.
    - LP: #1353021
  * ahci_xgene: Removing NCQ support from the APM X-Gene SoC AHCI SATA Host
    Controller driver.
    - LP: #1358498
  * fold d_kill() and d_free()
    - LP: #1354234
  * fold try_prune_one_dentry()
    - LP: #1354234
  * new helper: dentry_free()
    - LP: #1354234
  * expand the call of dentry_lru_del() in dentry_kill()
    - LP: #1354234
  * dentry_kill(): don't try to remove from shrink list
    - LP: #1354234
  * don't remove from shrink list in select_collect()
    - LP: #1354234
  * more graceful recovery in umount_collect()
    - LP: #1354234
  * dcache: don't need rcu in shrink_dentry_list()
    - LP: #1354234
  * lift the "already marked killed" case into shrink_dentry_list()
  * split dentry_kill()
    - LP: #1354234
  * expand dentry_kill(dentry, 0) in shrink_dentry_list()
    - LP: #1354234
  * shrink_dentry_list(): take parent's ->d_lock earlier
    - LP: #1354234
  * dealing with the rest of shrink_dentry_list() livelock
    - LP: #1354234
  * dentry_kill() doesn't need the second argument now
    - LP: #1354234
  * dcache: add missing lockdep annotation
    - LP: #1354234
  * fs: convert use of typedef ctl_table to struct ctl_table
 ...


Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released