Storage performance regression when Xen backend lacks persistent-grants support

Bug #1319003 reported by Felipe Franciosi
This bug affects 1 person.

Affects: linux (Ubuntu)
  Status: Won't Fix, Importance: Medium, Assigned to: Joseph Salisbury
Affects: linux (Ubuntu Saucy)
  Status: Won't Fix, Importance: Medium, Assigned to: Joseph Salisbury

Bug Description

Description of problem:
When used as a Xen guest, Ubuntu 13.10 may be slower than older releases in terms of storage performance. This is due to the persistent-grants feature introduced in xen-blkfront on the Linux Kernel 3.8 series. From 3.8 to 3.12 (inclusive), xen-blkfront will add an extra set of memcpy() operations regardless of persistent-grants support in the backend (i.e. xen-blkback, qemu, tapdisk). Many Xen dom0s do not have backend persistent-grants support (such as Citrix XenServer and any Linux distro with Kernel prior to 3.8). This has been identified and fixed in the 3.13 kernel series [1], but was not backported to previous LTS kernels due to the nature of the bug (performance only).

While persistent grants reduce the stress on the Xen grant table and allow for much better aggregate throughput (at the cost of an extra set of memcpy() operations), adding the copy overhead when the feature is unsupported on the backend combines the worst of both worlds. This is particularly noticeable when intensive storage workloads are active from many guests.

The graphs attached show storage throughput numbers for Linux guests using kernel 3.12.9 (Graph 1) and 3.13.7 (Graph 2) running on a Citrix XenServer development build. The server had 4 storage repositories (SRs) with 1 Micron P320 SSD per SR (i.e. 10 VMs per SR means 40 VMs in total). When using 3.12.9 kernel, the regression is clearly visible for more than 2 VMs per SR and block sizes larger than 64 KiB. The workload consisted of sequential reads on pre-allocated raw LVM logical volumes.

[1] Commits by Roger Pau Monné:
    bfe11d6de1c416cea4f3f0f35f864162063ce3fa
    fbe363c476afe8ec992d3baf682670a4bd1b6ce6

Version-Release number of selected component (if applicable):
xen-blkfront of Linux kernel 3.11

How reproducible:
This is always reproducible when an Ubuntu 13.10 guest is running on Xen and the storage backend (i.e. xen-blkback, qemu, tapdisk) does not have support for persistent grants.

Steps to Reproduce:
1. Install a Xen dom0 running a kernel prior to 3.8 (without persistent-grants support).
2. Install a set of Ubuntu 13.10 guests (which uses kernel 3.11).
3. Measure aggregate storage throughput from all guests.

NOTE: The storage infrastructure (e.g. local SSDs, network-attached storage) should not be a bottleneck in itself. If tested on a single SATA disk, for example, the issue will probably be unnoticeable as the infrastructure will be limiting response time and throughput.

Actual results:
Aggregate storage throughput will be lower than with xen-blkfront versions prior to 3.8 or newer than 3.12.

Expected results:
Aggregate storage throughput should be at least as good as with previous (or newer) versions of Ubuntu in cases where the backend doesn't support persistent grants.

Additional info:
Given that this is fixed in newer kernels, we urge that a backport of the relevant patches to the 3.11 stable branch be requested. According to the rules in https://www.kernel.org/doc/Documentation/stable_kernel_rules.txt, the patches would be accepted on the following grounds:

- Serious issues as reported by a user of a distribution kernel may also
   be considered if they fix a notable performance or interactivity issue.
   As these fixes are not as obvious and have a higher risk of a subtle
   regression they should only be submitted by a distribution kernel
   maintainer and include an addendum linking to a bugzilla entry if it
   exists and additional information on the user-visible impact.

description: updated
Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1319003/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
description: updated
affects: ubuntu → linux-meta (Ubuntu)
Brad Figg (brad-figg)
affects: linux-meta (Ubuntu) → linux (Ubuntu)
Revision history for this message
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1319003

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Felipe Franciosi (felipe-1) wrote :

I am unable to collect logs with apport-collect as I do not have a set of Ubuntu guests set up for reproduction at the moment.
I also believe logs are not relevant for this particular case.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
tags: added: kernel-da-key saucy
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Triaged
Changed in linux (Ubuntu Saucy):
importance: Undecided → Medium
status: New → Triaged
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I built a Saucy test kernel with a cherry pick of commits:
  bfe11d6de1c416cea4f3f0f35f864162063ce3fa
  fbe363c476afe8ec992d3baf682670a4bd1b6ce6

The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1319003/

Can you test this kernel and confirm if it resolves this bug or not?

Changed in linux (Ubuntu Saucy):
status: Triaged → In Progress
Changed in linux (Ubuntu):
status: Triaged → In Progress
assignee: nobody → Joseph Salisbury (jsalisbury)
Changed in linux (Ubuntu Saucy):
assignee: nobody → Joseph Salisbury (jsalisbury)
Revision history for this message
Felipe Franciosi (felipe-1) wrote :

Thanks for the prompt attention to this matter. My environment for testing is currently busy, but I should be able to get access by the end of next week (23/May). I'll comment as soon as I have some results.

Revision history for this message
Felipe Franciosi (felipe-1) wrote :

Joseph, apologies for the delay on this. I have finally managed to retest the kernel you have provided.

For reference, I added the following disks to my server:
* 4 x Micron P320
* 2 x Intel 910s (presented as 4 separate SCSI devices)
* 1 x Fusion-io ioDrive2

I created 10 LVM Logical Volumes on each one of these 9 devices.
Next, I created 10 Saucy 64-bit VMs (each with 2 vCPUs and 512 MB RAM) and assigned one LV from each device to each VM.

Each data point in the graphs I am attaching now corresponds to the aggregate throughput of all LVs per VM.

Your kernel (labelled "Ubuntu 13.10 x86_64 + Backports" in the graphs) allows the VMs to reach the 7 GB/s mark, while the kernel without the backports will not go past the 5 GB/s mark.

I hope this confirms that these patches are essential and that you can include them as soon as possible.

Regards,
Felipe

Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

I sent an SRU request to the Ubuntu kernel team mailing list. However, since Saucy will soon be EOL, we are only accepting critical CVE patches.

It might be best to upgrade to Trusty, which should already have this fix.

Changed in linux (Ubuntu Saucy):
status: In Progress → Won't Fix
Changed in linux (Ubuntu):
status: In Progress → Won't Fix