Since Trusty /proc/diskstats shows weird values
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | | Medium | Unassigned | |
Trusty | | Medium | Chris J Arges | |
Bug Description
SRU Justification:
Impact: Tools that rely on diskstats may report incorrect data in certain conditions. In particular diskstats in a VM may report incorrect statistics.
Fix: 0fec08b4ecfc36f
Testcase:
- Install a VM with the affected kernel
- Run cat /proc/diskstats | awk '$3=="vda" { print $7/$4, $11/$8 }'
- If the two values are much larger than those from the v3.14-rc1 kernel in the same VM, the test has failed. For example, in a failing case I see "132.44 5458.34"; in a passing case I see "0.19334 5.90476".
--
After upgrading some virtual machines (KVM) to Trusty I noticed really high I/O wait times, e.g. Munin graphs now show up to 200 seconds(!) read I/O wait time. See attached image. Of course real latency isn't higher than before, it's only /proc/diskstats that shows totally wrong numbers...
$ cat /proc/diskstats | awk '$3=="vda" { print $7/$4, $11/$8 }'
1375.44 13825.1
According to the documentation for /proc/diskstats, field 4 is the total number of reads completed, field 7 is the total time spent reading in milliseconds, and fields 8 and 11 are the same for writes. So the numbers above are the average read and write latency in milliseconds.
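The same calculation as the awk one-liner can be sketched in Python, which makes the field positions explicit. This is only an illustration: the function name `average_latency_ms` and the sample line are made up for this example and are not taken from the affected VM.

```python
def average_latency_ms(diskstats_text, device):
    """Return (avg read ms, avg write ms) for one device, or None if absent.

    Column numbers follow awk's 1-based counting used in the report:
    $3 name, $4 reads completed, $7 ms spent reading,
    $8 writes completed, $11 ms spent writing.
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 11 or fields[2] != device:
            continue
        reads, read_ms = int(fields[3]), int(fields[6])
        writes, write_ms = int(fields[7]), int(fields[10])
        # Guard against a device that has not completed any I/O yet.
        read_avg = read_ms / reads if reads else 0.0
        write_avg = write_ms / writes if writes else 0.0
        return read_avg, write_avg
    return None


# Hypothetical sample line (not real output from the bug report).
SAMPLE = "253 0 vda 100 0 800 50 200 0 1600 400 0 300 450"
print(average_latency_ms(SAMPLE, "vda"))
```

To run it against a live system, pass the contents of /proc/diskstats instead of `SAMPLE`, for example `open("/proc/diskstats").read()`.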
Same weird numbers with iostat. Note the column "await" (average time in milliseconds for I/O requests):
$ iostat -dx 1 60
Linux 3.13.0-19-generic (munin) 03/25/14 _x86_64_ (2 CPU)
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 2.30 16.75 72.45 24.52 572.79 778.37 27.87 1.57 620.00 450.20 1121.83 1.71 16.54
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 52.00 0.00 25.00 0.00 308.00 24.64 0.30 27813.92 0.00 27813.92 0.48 1.20
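One rough way to see that these await values cannot be right is to compare them with the reported queue size. By Little's law, the mean number of requests in flight should be roughly the request rate multiplied by the mean time per request. The sketch below applies that to two rows of the iostat output above; it assumes avgqu-sz and await are meant to describe the same set of requests, and the helper name `implied_queue_size` is made up for this example.

```python
def implied_queue_size(r_per_s, w_per_s, await_ms):
    """Little's law: requests in flight = arrival rate * mean time in system."""
    return (r_per_s + w_per_s) * await_ms / 1000.0


# (r/s, w/s, await in ms, reported avgqu-sz), copied from the iostat rows above.
rows = [
    (72.45, 24.52, 620.00, 1.57),
    (0.00, 25.00, 27813.92, 0.30),
]

for r_per_s, w_per_s, await_ms, reported in rows:
    implied = implied_queue_size(r_per_s, w_per_s, await_ms)
    print(f"implied avgqu-sz {implied:.2f} vs reported {reported:.2f}")
```

In both rows the queue size implied by await is far larger than the reported avgqu-sz, which fits the observation that only the time counters are inflated and real latency is unchanged.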
I upgraded the host system to Trusty too; however, the /proc/diskstats output there is normal, as before.
$ uname -r
3.13.0-19-generic
Holger Mauermann (mauermann) wrote : | #1 |
Changed in linux (Ubuntu): | |
status: | New → Incomplete |
tags: | added: trusty |
Joseph Salisbury (jsalisbury) wrote : | #4 |
Would it be possible for you to test the latest upstream kernel? Refer to https:/
If this bug is fixed in the mainline kernel, please add the following tag 'kernel-
If the mainline kernel does not fix this bug, please add the tag: 'kernel-
If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".
Thanks in advance.
[0] http://
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
status: | Confirmed → Incomplete |
Holger Mauermann (mauermann) wrote : | #5 |
No issues with 3.14 kernel
Changed in linux (Ubuntu): | |
status: | Incomplete → Confirmed |
tags: | added: kernel-fixed-upstream |
tobby (tobby88) wrote : | #6 |
I can confirm this bug still exists in the current Trusty kernel (3.13.0-
affects: | linux (Ubuntu) → linux-meta (Ubuntu) |
affects: | linux-meta (Ubuntu) → linux (Ubuntu) |
Chris J Arges (arges) wrote : | #7 |
Can you explain how you set up your guest VM? If you are using libvirt, could you do a 'virsh dumpxml <domain>' of the affected VM?
tobby (tobby88) wrote : | #8 |
This is my affected VM.
Changed in linux (Ubuntu Trusty): | |
assignee: | nobody → Chris J Arges (arges) |
status: | New → In Progress |
importance: | Undecided → Medium |
Chris J Arges (arges) wrote : | #9 |
Ok I can reproduce this issue, so far looks like something between v3.13 and v3.14-rc1. I'll start bisecting.
Chris J Arges (arges) wrote : | #10 |
Ok I found a patch that resolves the issue: 0fec08b4ecfc36f
Chris J Arges (arges) wrote : | #11 |
Can you please test this build with that patch backported:
http://
Thanks
tobby (tobby88) wrote : | #12 |
Just tried it - yes, this works! Now diskstats look normal on virtual machines.
description: | updated |
Changed in linux (Ubuntu): | |
status: | Confirmed → Fix Released |
Chris J Arges (arges) wrote : | #13 |
SRU submitted for this.
Changed in linux (Ubuntu Trusty): | |
status: | In Progress → Fix Committed |
Brad Figg (brad-figg) wrote : | #14 |
This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-
If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.
See https:/
tags: | added: verification-needed-trusty |
tags: |
added: verification-done-trusty removed: verification-needed-trusty |
Launchpad Janitor (janitor) wrote : | #15 |
This bug was fixed in the package linux - 3.13.0-36.63
---------------
linux (3.13.0-36.63) trusty; urgency=low
[ Joseph Salisbury ]
* Release Tracking Bug
- LP: #1365052
[ Feng Kan ]
* SAUCE: (no-up) irqchip:gic: change access of gicc_ctrl register to read
modify write.
- LP: #1357527
* SAUCE: (no-up) arm64: optimized copy_to_user and copy_from_user
assembly code
- LP: #1358949
[ Ming Lei ]
* SAUCE: (no-up) Drop APM X-Gene SoC Ethernet driver
- LP: #1360140
* [Config] Drop XGENE entries
- LP: #1360140
* [Config] CONFIG_NET_XGENE=m for arm64
- LP: #1360140
[ Stefan Bader ]
* SAUCE: Add compat macro for skb_get_hash
- LP: #1358162
* SAUCE: bcache: prevent crash on changing writeback_running
- LP: #1357295
[ Suman Tripathi ]
* SAUCE: (no-up) arm64: Fix the csr-mask for APM X-Gene SoC AHCI SATA PHY
clock DTS node.
- LP: #1359489
* SAUCE: (no-up) ahci_xgene: Skip the PHY and clock initialization if
already configured by the firmware.
- LP: #1359501
* SAUCE: (no-up) ahci_xgene: Fix the link down in first attempt for the
APM X-Gene SoC AHCI SATA host controller driver.
- LP: #1359507
[ Tuan Phan ]
* SAUCE: (no-up) pci-xgene-msi: fixed deadlock in irq_set_affinity
- LP: #1359514
[ Upstream Kernel Changes ]
* iwlwifi: mvm: Add a missed beacons threshold
- LP: #1349572
* mac80211: reset probe_send_count also in HW_CONNECTION_
- LP: #1349572
* genirq: Add an accessor for IRQ_PER_CPU flag
- LP: #1357527
* arm64: perf: add support for percpu pmu interrupt
- LP: #1357527
* cifs: sanity check length of data to send before sending
- LP: #1283101
* KVM: nVMX: Pass vmexit parameters to nested_vmx_vmexit
- LP: #1329434
* KVM: nVMX: Rework interception of IRQs and NMIs
- LP: #1329434
* KVM: vmx: disable APIC virtualization in nested guests
- LP: #1329434
* HID: Add transport-driver functions to the USB HID interface.
- LP: #1353021
* ahci_xgene: Removing NCQ support from the APM X-Gene SoC AHCI SATA Host
Controller driver.
- LP: #1358498
* fold d_kill() and d_free()
- LP: #1354234
* fold try_prune_
- LP: #1354234
* new helper: dentry_free()
- LP: #1354234
* expand the call of dentry_lru_del() in dentry_kill()
- LP: #1354234
* dentry_kill(): don't try to remove from shrink list
- LP: #1354234
* don't remove from shrink list in select_collect()
- LP: #1354234
* more graceful recovery in umount_collect()
- LP: #1354234
* dcache: don't need rcu in shrink_
- LP: #1354234
* lift the "already marked killed" case into shrink_
* split dentry_kill()
- LP: #1354234
* expand dentry_kill(dentry, 0) in shrink_
- LP: #1354234
* shrink_
- LP: #1354234
* dealing with the rest of shrink_
- LP: #1354234
* dentry_kill() doesn't need the second argument now
- LP: #1354234
* dcache: add missing lockdep annotation
- LP: #1354234
* fs: convert use of typedef ctl_table to struct ctl_table
...
Changed in linux (Ubuntu Trusty): | |
status: | Fix Committed → Fix Released |
This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:
apport-collect 1297522
and then change the status of the bug to 'Confirmed'.
If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.
This change has been made by an automated script, maintained by the Ubuntu Kernel Team.