zone_reclaim_mode=1 reduces performance

Bug #605773 reported by Andras Fabian
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: linux-image-server

--------------------------------------------------
Description: Ubuntu 10.04 LTS
Release: 10.04
--------------------------------------------------
linux-image-server version:
  Installed: 2.6.32.22.23
--------------------------------------------------

WORKAROUND: setting:
zone_reclaim_mode = 0
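
As a minimal sketch of the workaround, the current value can be read from /proc and changed at runtime with sysctl. The function names and the ZRM_PATH variable are illustrative only and are not part of the original report; the sysctl call needs root and does not survive a reboot.

```shell
#!/bin/sh
# Path to the kernel tunable. ZRM_PATH is a hypothetical override so the
# function can also be pointed at a copy of the file.
ZRM_PATH="${ZRM_PATH:-/proc/sys/vm/zone_reclaim_mode}"

# Print the current value, or "unknown" if the file cannot be read.
show_zone_reclaim_mode() {
    if [ -r "$ZRM_PATH" ]; then
        cat "$ZRM_PATH"
    else
        echo "unknown"
    fi
}

# Apply the workaround at runtime (needs root, lost on reboot).
disable_zone_reclaim() {
    sysctl -w vm.zone_reclaim_mode=0
}

echo "zone_reclaim_mode is currently: $(show_zone_reclaim_mode)"
```

Running disable_zone_reclaim as root and then show_zone_reclaim_mode again should report 0.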

I discovered this problem while migrating a PostgreSQL database server from old hardware and an old OS to new hardware and a new OS. The transition itself went smoothly, but once the server was in production we noticed a strange problem during nightly backups: their runtime went up from 2 1/2 hours to 6 1/2 hours, even though the new hardware was designed to be much more powerful (which did show in most other tasks).

A longer investigation, with the help of many knowledgeable people on the PostgreSQL mailing list, finally found the reason for the slowdown. It turned out to be a problem in the VM (virtual memory) part of the kernel. In situations where a lot of memory was consumed for caching (which easily happens while backing up 100 GByte DBs), congestion occurred in the VM that slowed the process down dramatically.

After in-depth analysis of many parts (via the /proc file system, ps, strace etc.) and comparison with the settings on the old machines, I finally found a kernel setting, vm.zone_reclaim_mode, that was solely responsible for the issue. Luckily I could construct a simple test scenario in which I could reproduce it: COPY to STDOUT, i.e. exporting the data from a database table via stdout and writing it through a pipe to the file system. Our new server had zone_reclaim_mode = 1 set, whereas our old servers used zone_reclaim_mode = 0. By switching this value back and forth via sysctl, I could easily slow the experimental export process to a crawl, or let it run normally again.
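
The test scenario described above could look roughly like the following sketch. The database name, table name and output path are placeholders and do not come from the report, and the helper names are made up for illustration. The exact pipeline used in the original test is documented in the mailing list thread, not here; this version simply redirects psql's stdout to a file.

```shell
#!/bin/sh
# Run a command and print how many whole seconds it took.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null
    end=$(date +%s)
    echo $((end - start))
}

# Export one table via COPY ... TO STDOUT into a file.
# $1 = database, $2 = table, $3 = output file (all placeholders).
export_table() {
    psql -d "$1" -c "COPY $2 TO STDOUT" > "$3"
}

# Example, run as root so the sysctl calls succeed:
#   sysctl -w vm.zone_reclaim_mode=1
#   elapsed_seconds export_table mydb big_table /tmp/export.dat
#   sysctl -w vm.zone_reclaim_mode=0
#   elapsed_seconds export_table mydb big_table /tmp/export.dat
```

The table needs to be large enough for the page cache to fill up, otherwise no difference between the two settings should be expected.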

The complete path of the analysis can be viewed at the PostgreSQL mailing list here:
(there is also a description of how the problem can be reproduced and what the many symptoms are)
http://archives.postgresql.org/pgsql-general/2010-07/msg00267.php

The conclusion to use "zone_reclaim_mode = 0" on our type of hardware was further strengthened by a very interesting thread on LKML, where kernel developers discussed potential issues with this setting. You can read it here:
http://lkml.org/lkml/2009/5/12/586

That discussion boils down to this: for reasons described there in detail, the Linux kernel believes that modern CPU architectures (our new servers use Core i7 generation CPUs, which are explicitly mentioned!) are NUMA architectures. For NUMA architectures it automatically enables "zone_reclaim_mode = 1", even though this is wrong and not even recommended under many circumstances. Interestingly, most posters in the LKML thread think it would be better to always(!) default this value to "zone_reclaim_mode = 0" instead of making an automatic decision.
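
To see whether the kernel treats a machine as NUMA, one rough check is to count the node directories it exposes in sysfs. This is only a sketch: the function name and the overridable directory argument are assumptions for illustration, and the node count alone does not show how the kernel chose its zone_reclaim_mode default.

```shell
#!/bin/sh
# Count the NUMA nodes the kernel exposes under sysfs.
# $1 = node directory (defaults to the usual sysfs location).
count_numa_nodes() {
    dir="${1:-/sys/devices/system/node}"
    ls -d "$dir"/node[0-9]* 2>/dev/null | wc -l | tr -d ' '
}

echo "NUMA nodes detected: $(count_numa_nodes)"
```

More than one node means the kernel sees a NUMA topology on that machine.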

Some more detail on what zone_reclaim_mode does can also be found here:
http://www.linuxinsight.com/proc_sys_vm_zone_reclaim_mode.html

I don't know why this "defaulting to 0" is still not in the mainline kernels. That discussion from May 2009 on LKML died down, and obviously no one felt responsible for committing the patches (even though one of the participants had obviously already prepared some!). BUT I would ask the Ubuntu team to perhaps act on their own and provide a way to fix this issue in Ubuntu 10.04 LTS (because some reports on the net suggest that "zone_reclaim_mode = 1" can harm performance in many ways)! And I believe that I will not be the only PostgreSQL admin affected by this issue!

Brad Figg (brad-figg)
affects: linux-meta (Ubuntu) → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 605773

and then change the status of the bug back to 'New'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Andras Fabian (fabian-atrada) wrote : Re: Wrong kernel setting zone_reclaim_mode leads to performance problems

The issue is a year old, and as such the requested apport-collect output can't be created anymore. Also, it doesn't need any extra diagnosis, as this has already been done on our side, and we are recommending a fix (which is backed by discussions in other communities, such as the linked PostgreSQL mailing list thread).

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch)
tags: added: lucid
penalvch (penalvch) wrote :

Andras Fabian, this bug was reported a while ago and there hasn't been any activity in it recently. We were wondering if this is still an issue? If so, could you please run the following command in the development release from a Terminal (Applications->Accessories->Terminal), as it will automatically gather and attach updated debug information to this report:

apport-collect -p linux <replace-with-bug-number>

When reporting bugs in the future please use apport by using 'ubuntu-bug' and the name of the package affected. You can learn more about this functionality at https://wiki.ubuntu.com/ReportingBugs.

Also, during your maintenance window, could you please test the latest upstream kernel available following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Please do not test the daily folder, but the one all the way at the bottom. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.11-rc5

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

If you are unable to test the mainline kernel, please comment as to why specifically you were unable to test it and add the following tags:
kernel-unable-to-test-upstream
kernel-unable-to-test-upstream-VERSION-NUMBER

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

tags: added: needs-kernel-logs needs-upstream-testing
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Andras Fabian (fabian-atrada) wrote :

Hi,

Well, the problem was "solved" with the described workaround (setting "zone_reclaim_mode = 0" manually). Since then, we have never had this issue again. To be honest, I was even surprised to see new activity on this issue (as I reported it over 3 years ago :-) ) and really had to read it before remembering what was going on at all.

The affected servers are still in use, but they usually have no real maintenance windows (in the classic sense), nor do we have the time and resources to experiment with different kernel versions on them.

Nor do I see a way to sanely run apport-collect, as the server has no X, and the text GUI is really hard to work with (for example, I didn't find a way to supply the necessary email address etc.).

Andras Fabian

penalvch (penalvch) wrote :

Andras Fabian, one could gather the apport via command line following https://help.ubuntu.com/community/ReportingBugs#Filing_bugs_when_off-line .

description: updated
Andras Fabian (fabian-atrada) wrote :

Well, I have tried apport-cli, but that tool crashed with an ominous "IOError: [Errno 2] No such file or directory: '/proc/asound/cards'" (which is in no way related to this bug). It also asks lots of questions which I could easily have answered 3 years ago, but which I would now need to research again ...

To be honest, I have little motivation to pick up the investigation of an issue which is 3 years old and which I have neither thought about nor worked on for almost the same amount of time. Especially as we have only a few Ubuntu 10.04 servers left, which are being phased out and mostly replaced with 12.04 systems.

Andras Fabian

penalvch (penalvch) wrote :

Andras Fabian, fair enough. Would this problem be reproducible (addressed by your previously mentioned WORKAROUND) in Precise?

Andras Fabian (fabian-atrada) wrote :

I think this mostly depends on what "cat /proc/sys/vm/zone_reclaim_mode" is set to by default on Ubuntu 12.04 ... If you have it at 1, then you should be able to reproduce it ... at least in theory ... because possible changes to the kernel might have already changed the behavior too.

Here we have added it to our standard installation procedure to always and explicitly set "zone_reclaim_mode=0".
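
One way such an installation step might look is a sysctl drop-in file, sketched below. The directory /etc/sysctl.d, the file name 60-zone-reclaim.conf and the function name are assumptions for illustration; the report does not say how the setting was made persistent.

```shell
#!/bin/sh
# Write a sysctl drop-in that pins vm.zone_reclaim_mode to 0.
# $1 = target directory (assumed default: /etc/sysctl.d).
# The file name is a hypothetical choice.
write_zone_reclaim_conf() {
    dir="${1:-/etc/sysctl.d}"
    printf 'vm.zone_reclaim_mode = 0\n' > "$dir/60-zone-reclaim.conf"
}

# As root:
#   write_zone_reclaim_conf
#   sysctl -p /etc/sysctl.d/60-zone-reclaim.conf
```

The sysctl -p call loads the file immediately, so no reboot is needed to apply it.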

penalvch (penalvch)
summary: - Wrong kernel setting zone_reclaim_mode leads to performance problems
+ zone_reclaim_mode=1 reduces performance
Changed in linux (Ubuntu):
status: Incomplete → Confirmed