[linux-azure] Storage performance drop on RAID

Bug #1828248 reported by Adrian Suhov on 2019-05-08
This bug affects 1 person
Affects                 Status         Importance   Assigned to   Milestone
linux-azure (Ubuntu)    Fix Released   Undecided    Unassigned
  Xenial                Fix Released   Undecided    Unassigned

Bug Description

In Azure, FIO 4K tests show a performance drop on the latest proposed 4.15.0 linux-azure kernels: from 52K IOPS to ~38K IOPS (the maximum reached on both the sequential and random read tests).
The setup used is 12 disks in RAID0.
The affected kernels are:
Ubuntu 14.04 + 4.15.0-1043
Ubuntu 16.04 + 4.15.0-1044

Previous kernel versions reached the maximum of ~52K IOPS on both read tests (e.g. 16.04 + the 4.15.0-1043 kernel). Right now, the issue seems to be in the diff between the 1043 kernel on trusty and the 1044 kernel on xenial.
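To put the reported numbers in perspective, the short sketch below converts them into throughput and a relative drop. It assumes "52K" and "38K" mean 52,000 and 38,000 IOPS and uses the 4 KiB block size from the repro command (--bs=4K); the function names are illustrative only.

```python
# Worked arithmetic for the figures in the bug description.
# Assumption: "K" in the report means 1,000 IOPS; block size is 4 KiB (--bs=4K).
BLOCK_SIZE_BYTES = 4 * 1024


def throughput_mib_s(iops: float, block_size: int = BLOCK_SIZE_BYTES) -> float:
    """Convert IOPS at a fixed block size into MiB/s."""
    return iops * block_size / (1024 * 1024)


def regression_pct(baseline: float, measured: float) -> float:
    """Percentage drop of `measured` relative to `baseline`."""
    return (baseline - measured) / baseline * 100.0


baseline_iops = 52_000   # reported on the unaffected kernels
regressed_iops = 38_000  # reported on the affected proposed kernels

print(f"baseline:  {throughput_mib_s(baseline_iops):.1f} MiB/s")
print(f"regressed: {throughput_mib_s(regressed_iops):.1f} MiB/s")
print(f"drop:      {regression_pct(baseline_iops, regressed_iops):.1f}%")
```

At 4 KiB per I/O, 52,000 IOPS is about 203 MiB/s, so the drop to ~38K IOPS is a loss of roughly a quarter of the array's read throughput.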

The 52K IOPS figure is expected at an effective queue depth of 256 (iodepth=32 × numjobs=8). The repro command (as run by the automation) is:
fio --size=1023G --direct=1 --ioengine=libaio --filename=/dev/md0 --overwrite=1 --readwrite=randread --bs=4K --runtime=300 --iodepth=32 --numjob=8 --output-format=json --output=/root/FIOLog/jsonLog/fio-result-randread-4K-256td.json --name='repro'
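Because the command writes JSON output and runs 8 jobs without --group_reporting, fio should emit one entry per job in the "jobs" array, and the total read IOPS is the sum across them. The following is a minimal sketch for checking a result file against the expected 52K figure; the 5% tolerance and the helper names are hypothetical choices, not part of the original automation.

```python
import json


def effective_queue_depth(iodepth: int, numjobs: int) -> int:
    # Each fio job keeps up to `iodepth` I/Os in flight, so the array sees
    # roughly iodepth * numjobs outstanding requests (32 * 8 = 256 here).
    return iodepth * numjobs


def total_read_iops(fio_json: str) -> float:
    """Sum read IOPS across all job entries in fio's JSON output."""
    report = json.loads(fio_json)
    # Without --group_reporting, fio reports each job clone separately.
    return sum(job["read"]["iops"] for job in report["jobs"])


def meets_expectation(measured: float, expected: float, tolerance: float = 0.05) -> bool:
    """True if measured IOPS is no more than `tolerance` (a fraction) below expected."""
    return measured >= expected * (1.0 - tolerance)


def check_report(path: str, expected: float = 52_000) -> bool:
    """Load a fio JSON result file and compare its total read IOPS to `expected`."""
    with open(path) as f:
        iops = total_read_iops(f.read())
    print(f"qdepth={effective_queue_depth(32, 8)} total read IOPS={iops:.0f}")
    return meets_expectation(iops, expected)
```

For example, `check_report("/root/FIOLog/jsonLog/fio-result-randread-4K-256td.json")` would read the file named in the repro command and return False for a run near 38K IOPS.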

Joseph Salisbury (jsalisbury) wrote :

The following commit was added in bug fa55b5d226dd.

7ac257b862f2c (“blk-mq: remove the request_list usage”)

This commit cleans up unused code in blk-mq (blk-mq uses pre-allocated tags to allocate requests; request_list is used only by the legacy queue).

A test kernel with a revert of this commit would prove if it is the cause of this regression.
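To illustrate the distinction the commit message draws, here is a toy model, not kernel code: a blk-mq-style tag set allocates a fixed number of request slots up front, and getting a request means claiming a free integer tag rather than allocating a new request object from a per-queue request_list. The class and method names are hypothetical; the real kernel implementation lives in block/blk-mq-tag.c and uses a scalable bitmap rather than a linear scan.

```python
from typing import List, Optional


class ToyTagSet:
    """Toy model of blk-mq-style tag allocation (illustrative only)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        # All slots exist from the start; only their "in use" flag changes.
        self.in_use: List[bool] = [False] * depth
        self.requests: List[Optional[str]] = [None] * depth

    def get_tag(self, payload: str) -> Optional[int]:
        # Linear scan for a free slot; the tag is simply the slot index.
        for tag, busy in enumerate(self.in_use):
            if not busy:
                self.in_use[tag] = True
                self.requests[tag] = payload
                return tag
        return None  # all tags busy: a real submitter would have to wait

    def put_tag(self, tag: int) -> None:
        # Releasing a tag makes its pre-allocated slot reusable.
        if not self.in_use[tag]:
            raise ValueError(f"tag {tag} is not allocated")
        self.in_use[tag] = False
        self.requests[tag] = None
```

In this model, queue depth is bounded by the number of pre-allocated tags, which is why no separate request_list is needed on the blk-mq path.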

Joseph Salisbury (jsalisbury) wrote :

This commit was also added to the Ubuntu-azure-4.18.0-1017 kernel. Are we seeing the expected performance with that kernel?

Joseph Salisbury (jsalisbury) wrote :

The bug number in comment #1 should have been bug 1819689.

Joshua R. Poulson (jrp) wrote :

Adrian, can you add the VM size here? They are trying to reproduce the issue.

Adrian Suhov (asuhov) wrote :

VM size is Standard_DS14_v2

Marcelo Cerri (mhcerri) wrote :

I built a test kernel reverting the commit that Joe mentioned:

https://kernel.ubuntu.com/~mhcerri/azure/xenial-linux-azure-4.15.0-1044.48+1828248/

Adrian Suhov (asuhov) wrote :

The test kernel is reaching 52K IOPS as expected.

Marcelo Cerri (mhcerri) on 2019-05-13
Changed in linux-azure (Ubuntu Xenial):
status: New → Fix Committed

Joseph Salisbury (jsalisbury) wrote :

@Marcelo Cerri, just to confirm, the test kernel posted in comment #6 only had commit 7ac257b862f2c reverted?

Marcelo Cerri (mhcerri) wrote :

Exactly, Joseph. I'm attaching the exact patch I used for building the test kernel.

Marcelo Cerri (mhcerri) wrote :

Hi, Joe. Do you think the revert is enough for now, or should we discuss eventually including the changes that you mentioned in comment #5 of bug #1819689?

Joseph Salisbury (jsalisbury) wrote :

This patch was intended to increase performance, not reduce it. Is it too late to revert it?

tags: added: patch

Joshua R. Poulson (jrp) wrote :

@longli, I would like your assessment of the changes we've made here to get back to the expected performance. I know we are not getting the same results on NVMe devices with 16.04 and 18.04, and I'm wondering whether something substantially different in the 4.15 kernel used on 16.04 prevents us from making the same changes we made in 4.18 to get better performance.

Joseph Salisbury (jsalisbury) wrote :

@marcelo, we do want the changes requested in comment #5 in bug 1819689. We can wait for the results of Long Li's investigation before proceeding there.

Marcelo Cerri (mhcerri) wrote :

Thanks for the feedback, @jsalisbury.

So we are proceeding with the revert for now. Later when Long Li's investigation is ready, can you open a new bug listing the remaining commits that need to be included?

Morning Joe,

On 5/14/19 8:40 AM, Joseph Salisbury wrote:
> @marcelo, we do want the changes requested in comment #5 in bug 1819689.
> We can wait for the results of Long Li's investigation before proceeding
> there.
>
that is the revert in comment #6, right?

Terry

Joseph Salisbury (jsalisbury) wrote :

Yes, that is correct. Comment #6. Sorry for the typo.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 4.15.0-1045.49

---------------
linux-azure (4.15.0-1045.49) xenial; urgency=medium

  * [linux-azure] Storage performance drop on RAID (LP: #1828248)
    - Revert "blk-mq: remove the request_list usage"

  [ Ubuntu: 4.15.0-50.54 ]

  * CVE-2018-12126 // CVE-2018-12127 // CVE-2018-12130
    - Documentation/l1tf: Fix small spelling typo
    - x86/cpu: Sanitize FAM6_ATOM naming
    - kvm: x86: Report STIBP on GET_SUPPORTED_CPUID
    - locking/atomics, asm-generic: Move some macros from <linux/bitops.h> to a
      new <linux/bits.h> file
    - tools include: Adopt linux/bits.h
    - x86/msr-index: Cleanup bit defines
    - x86/speculation: Consolidate CPU whitelists
    - x86/speculation/mds: Add basic bug infrastructure for MDS
    - x86/speculation/mds: Add BUG_MSBDS_ONLY
    - x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
    - x86/speculation/mds: Add mds_clear_cpu_buffers()
    - x86/speculation/mds: Clear CPU buffers on exit to user
    - x86/kvm/vmx: Add MDS protection when L1D Flush is not active
    - x86/speculation/mds: Conditionally clear CPU buffers on idle entry
    - x86/speculation/mds: Add mitigation control for MDS
    - x86/speculation/mds: Add sysfs reporting for MDS
    - x86/speculation/mds: Add mitigation mode VMWERV
    - Documentation: Move L1TF to separate directory
    - Documentation: Add MDS vulnerability documentation
    - x86/speculation/mds: Add mds=full,nosmt cmdline option
    - x86/speculation: Move arch_smt_update() call to after mitigation decisions
    - x86/speculation/mds: Add SMT warning message
    - x86/speculation/mds: Fix comment
    - x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
    - x86/speculation/mds: Add 'mitigations=' support for MDS
  * CVE-2017-5715 // CVE-2017-5753
    - s390/speculation: Support 'mitigations=' cmdline option
  * CVE-2017-5715 // CVE-2017-5753 // CVE-2017-5754 // CVE-2018-3639
    - powerpc/speculation: Support 'mitigations=' cmdline option
  * CVE-2017-5715 // CVE-2017-5754 // CVE-2018-3620 // CVE-2018-3639 //
    CVE-2018-3646
    - cpu/speculation: Add 'mitigations=' cmdline option
    - x86/speculation: Support 'mitigations=' cmdline option
  * Packaging resync (LP: #1786013)
    - [Packaging] resync git-ubuntu-log

linux-azure (4.15.0-1044.48) xenial; urgency=medium

  * linux-azure: 4.15.0-1044.48 -proposed tracker (LP: #1826354)

  * [linux-azure] Include mainline commits fc96df16a1ce and ba50bf1ce9a5 in
    Azure kernel (LP: #1821378)
    - Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels
    - Drivers: hv: vmbus: Check for ring when getting debug info

  * [linux-azure] Commit To Improve NVMe Performance (LP: #1819689)
    - blk-mq: remove the request_list usage

  [ Ubuntu: 4.15.0-49.53 ]

  * linux: 4.15.0-49.53 -proposed tracker (LP: #1826358)
  * Backport support for software count cache flush Spectre v2 mitigation. (CVE)
    (required for POWER9 DD2.3) (LP: #1822870)
    - powerpc/64s: Add support for ori barrier_nospec patching
    - powerpc/64s: Patch barrier_nospec in modules
    - powerpc/64s: Enable barrier_nospec based on firmware settings
    - powerpc...

Changed in linux-azure (Ubuntu Xenial):
status: Fix Committed → Fix Released

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux-azure - 4.15.0-1045.49~14.04.1

---------------
linux-azure (4.15.0-1045.49~14.04.1) trusty; urgency=medium

  [ Ubuntu: 4.15.0-1045.49 ]

  * [linux-azure] Storage performance drop on RAID (LP: #1828248)
    - Revert "blk-mq: remove the request_list usage"
  * CVE-2018-12126 // CVE-2018-12127 // CVE-2018-12130
    - Documentation/l1tf: Fix small spelling typo
    - x86/cpu: Sanitize FAM6_ATOM naming
    - kvm: x86: Report STIBP on GET_SUPPORTED_CPUID
    - locking/atomics, asm-generic: Move some macros from <linux/bitops.h> to a
      new <linux/bits.h> file
    - tools include: Adopt linux/bits.h
    - x86/msr-index: Cleanup bit defines
    - x86/speculation: Consolidate CPU whitelists
    - x86/speculation/mds: Add basic bug infrastructure for MDS
    - x86/speculation/mds: Add BUG_MSBDS_ONLY
    - x86/kvm: Expose X86_FEATURE_MD_CLEAR to guests
    - x86/speculation/mds: Add mds_clear_cpu_buffers()
    - x86/speculation/mds: Clear CPU buffers on exit to user
    - x86/kvm/vmx: Add MDS protection when L1D Flush is not active
    - x86/speculation/mds: Conditionally clear CPU buffers on idle entry
    - x86/speculation/mds: Add mitigation control for MDS
    - x86/speculation/mds: Add sysfs reporting for MDS
    - x86/speculation/mds: Add mitigation mode VMWERV
    - Documentation: Move L1TF to separate directory
    - Documentation: Add MDS vulnerability documentation
    - x86/speculation/mds: Add mds=full,nosmt cmdline option
    - x86/speculation: Move arch_smt_update() call to after mitigation decisions
    - x86/speculation/mds: Add SMT warning message
    - x86/speculation/mds: Fix comment
    - x86/speculation/mds: Print SMT vulnerable on MSBDS with mitigations off
    - x86/speculation/mds: Add 'mitigations=' support for MDS
  * CVE-2017-5715 // CVE-2017-5753
    - s390/speculation: Support 'mitigations=' cmdline option
  * CVE-2017-5715 // CVE-2017-5753 // CVE-2017-5754 // CVE-2018-3639
    - powerpc/speculation: Support 'mitigations=' cmdline option
  * CVE-2017-5715 // CVE-2017-5754 // CVE-2018-3620 // CVE-2018-3639 //
    CVE-2018-3646
    - cpu/speculation: Add 'mitigations=' cmdline option
    - x86/speculation: Support 'mitigations=' cmdline option
  * Packaging resync (LP: #1786013)
    - [Packaging] resync git-ubuntu-log

linux-azure (4.15.0-1044.48~14.04.1) trusty; urgency=medium

  * linux-azure: 4.15.0-1044.48~14.04.1 -proposed tracker (LP: #1826352)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync git-ubuntu-log
    - [Packaging] update helper scripts

  [ Ubuntu: 4.15.0-1044.48 ]

  * linux-azure: 4.15.0-1044.48 -proposed tracker (LP: #1826354)
  * [linux-azure] Include mainline commits fc96df16a1ce and ba50bf1ce9a5 in
    Azure kernel (LP: #1821378)
    - Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channels
    - Drivers: hv: vmbus: Check for ring when getting debug info
  * [linux-azure] Commit To Improve NVMe Performance (LP: #1819689)
    - blk-mq: remove the request_list usage
  * linux: 4.15.0-49.53 -proposed tracker (LP: #1826358)
  * Backport support for software count cache flush Spectre v2 mitigation. (CVE)
    (required for POWER9 DD...

Changed in linux-azure (Ubuntu):
status: New → Fix Released