[Lenovo ThinkPad W530] I/O slow down

Bug #1290337 reported by Michisteiner
This bug affects 2 people
Affects Status Importance Assigned to Milestone
linux (Ubuntu)

Bug Description

Since at least 12.10, I/O gets slower and slower the longer the machine runs (primarily, if not only, writes rather than reads). It can be temporarily cured via /sbin/sysctl -w vm.drop_caches=2 and/or /sbin/sysctl -w vm.drop_caches=3, but the problem re-appears sooner rather than later. The problem also seems roughly proportional to how much RAM is allocated. It does not seem to appear with less than 8GB (i.e., I first saw it when switching to a 16GB laptop, a ThinkPad W530) and is considerably worse when all memory is enabled than when using only ~14GB via the mem=14000M boot option.
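
For context on the workaround: per the kernel's sysctl documentation, vm.drop_caches=1 frees the page cache, 2 frees reclaimable slab objects (dentries and inodes), and 3 frees both. A minimal sketch for recording how much memory each of those categories holds before and after the sysctl call is shown here. The helper name meminfo_fields is hypothetical, and which fields are worth watching is an assumption, not something established in this report.

```shell
#!/bin/sh
# Print selected fields (values in kB) from a /proc/meminfo-style file.
# Usage: meminfo_fields FILE KEY [KEY...]
# The function name and the choice of keys are illustrative only.
meminfo_fields() {
    file=$1
    shift
    for key in "$@"; do
        # Match the key exactly, so "Cached" does not also match "SwapCached".
        awk -v k="$key:" '$1 == k { print k, $2 }' "$file"
    done
}
```

Possible usage: run `meminfo_fields /proc/meminfo Cached SReclaimable Dirty Writeback` once before and once after `/sbin/sysctl -w vm.drop_caches=2`, and compare the two outputs.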

ProblemType: Bug
DistroRelease: Ubuntu 13.10
Package: linux-image-3.11.0-18-generic 3.11.0-18.32
ProcVersionSignature: Ubuntu 3.11.0-18.32-generic
Uname: Linux 3.11.0-18-generic i686
NonfreeKernelModules: nvidia wl
ApportVersion: 2.12.5-0ubuntu2.2
Architecture: i386
 /dev/snd/controlC1: msteiner 4131 F.... pulseaudio
 /dev/snd/controlC0: msteiner 4131 F.... pulseaudio
 /dev/snd/pcmC0D0p: msteiner 4131 F...m pulseaudio
 /dev/snd/seq: timidity 3237 F.... timidity
 country IN:
  (2402 - 2482 @ 40), (N/A, 20)
  (5170 - 5250 @ 40), (N/A, 20)
  (5250 - 5330 @ 40), (N/A, 20), DFS
  (5735 - 5835 @ 40), (N/A, 20)
Date: Mon Mar 10 17:47:12 2014
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=4a17fffe-b487-4abf-9df8-1846a925cf4b
MachineType: LENOVO 2449A35
MarkForUpload: True
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.11.0-18-generic root=/dev/mapper/babbage2-root ro acpi_osi=Linux mem=14000M nox2apic rcutree.rcu_idle_gp_delay=1
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
 linux-restricted-modules-3.11.0-18-generic N/A
 linux-backports-modules-3.11.0-18-generic N/A
 linux-firmware 1.116.2
SourcePackage: linux
UpgradeStatus: Upgraded to saucy on 2013-10-25 (135 days ago)
dmi.bios.date: 05/24/2013
dmi.bios.vendor: LENOVO
dmi.bios.version: G5ET93WW (2.53 )
dmi.board.asset.tag: Not Available
dmi.board.name: 2449A35
dmi.board.vendor: LENOVO
dmi.board.version: Not Defined
dmi.chassis.asset.tag: No Asset Information
dmi.chassis.type: 10
dmi.chassis.vendor: LENOVO
dmi.chassis.version: Not Available
dmi.modalias: dmi:bvnLENOVO:bvrG5ET93WW(2.53):bd05/24/2013:svnLENOVO:pn2449A35:pvrThinkPadW530:rvnLENOVO:rn2449A35:rvrNotDefined:cvnLENOVO:ct10:cvrNotAvailable:
dmi.product.name: 2449A35
dmi.product.version: ThinkPad W530
dmi.sys.vendor: LENOVO

Revision history for this message
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Revision history for this message
Michisteiner (michisteiner) wrote : Re: I/O slow down on 32-bit PAE systems

btw: do not bother looking at the "nox2apic rcutree.rcu_idle_gp_delay=1" boot options. I added them recently while trying to overcome some nvidia-related problems. The problem described here also exists when these options are not specified ...

Brad Figg (brad-figg)
Changed in linux (Ubuntu):
status: New → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.14 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14-rc5-trusty/

Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

A bisect was performed in bug 1107150, but the bad commit was never identified. After testing the mainline kernel, we can also perform a bisect and may have better luck identifying the cause of this regression.

penalvch (penalvch)
summary: - I/O slow down on 32-bit PAE systems
+ [Lenovo ThinkPad W530] I/O slow down
description: updated
tags: added: bios-outdated-2.57 needs-upstream-testing quantal
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Michisteiner (michisteiner) wrote :

@joseph: as I replied to a similar request of yours in bug 1107150 (comment 66), I can't find the PAE kernels. E.g., the http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.14-rc5-trusty/ directory you point to does not seem to contain any PAE kernels (at least if the old naming conventions still apply).

penalvch (penalvch)
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Revision history for this message
Joseph Salisbury (jsalisbury) wrote :

There is no longer a non-pae kernel after Precise, so the pae kernel is the same as the -generic kernel now.

Revision history for this message
Michisteiner (michisteiner) wrote :

@Joseph, thanks for the info. I was able to install v3.14-rc7-trusty (although I had to switch from nvidia to integrated graphics, as nvidia no longer seemed to work).

The experience is somewhat mixed so far. In the half day I've been running it I haven't seen really bad behaviour (no multi-minute apt-get updates), but the machine still feels more sluggish than it does with the old kernel and reduced memory.

And there certainly still seem to be some anomalies: after booting, logging in to X and letting the machine's I/O system settle down to idle (according to htop and iotop), I repeatedly ran the following command on the otherwise idle machine:

   dd if=/dev/zero of=/tmp/disk-test bs=1M count=1k

As you can see ...

   1073741824 bytes (1.1 GB) copied, 13.5484 s, 79.3 MB/s
   1073741824 bytes (1.1 GB) copied, 14.7879 s, 72.6 MB/s
   1073741824 bytes (1.1 GB) copied, 16.2955 s, 65.9 MB/s
   1073741824 bytes (1.1 GB) copied, 17.5495 s, 61.2 MB/s
   1073741824 bytes (1.1 GB) copied, 19.267 s, 55.7 MB/s
   1073741824 bytes (1.1 GB) copied, 18.9594 s, 56.6 MB/s
   1073741824 bytes (1.1 GB) copied, 19.6092 s, 54.8 MB/s
   1073741824 bytes (1.1 GB) copied, 20.5099 s, 52.4 MB/s
   1073741824 bytes (1.1 GB) copied, 22.3197 s, 48.1 MB/s
   1073741824 bytes (1.1 GB) copied, 20.8351 s, 51.5 MB/s
   1073741824 bytes (1.1 GB) copied, 27.6185 s, 38.9 MB/s
   1073741824 bytes (1.1 GB) copied, 39.5405 s, 27.2 MB/s
   1073741824 bytes (1.1 GB) copied, 55.6986 s, 19.3 MB/s
   1073741824 bytes (1.1 GB) copied, 74.4439 s, 14.4 MB/s
   1073741824 bytes (1.1 GB) copied, 117.431 s, 9.1 MB/s
   1073741824 bytes (1.1 GB) copied, 142.33 s, 7.5 MB/s

... the performance got worse and worse. According to `iostat -zmx 3', I/O utilization went down to less than 10% in the last iterations.

However, as soon as I ran a single

  /sbin/sysctl -w vm.drop_caches=2

the dd went up again to 90 MB/s!!

So i would be tempted to add the 'kernel-bug-exists-upstream' tag. Thoughts?
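
For anyone repeating the dd measurement above, a small sketch for turning a series of dd summary lines into a run-by-run table may help. It recomputes the rate from the byte count and the elapsed seconds (with MB meaning 10^6 bytes) instead of trusting the printed rate. The function name dd_rates is hypothetical, and the parsing assumes the GNU dd "... copied, N s, R MB/s" summary format shown above.

```shell
#!/bin/sh
# Read dd summary lines on stdin and print, per run:
#   run-number  elapsed-seconds  MB/s
# Assumes GNU dd output of the form
#   "<bytes> bytes (...) copied, <seconds> s, <rate>"
# dd_rates is an illustrative name, not an existing tool.
dd_rates() {
    awk -F', ' '/copied/ {
        split($1, head, " ")        # head[1] = byte count
        split($(NF-1), dur, " ")    # dur[1]  = elapsed seconds
        if (dur[1] > 0) {
            n++
            printf "%d %.1f %.1f\n", n, dur[1], head[1] / dur[1] / 1000000
        }
    }'
}
```

Possible usage: collect the last output line of each dd run in a file (dd writes its summary to stderr) and run `dd_rates < that-file`. A steadily falling third column reproduces the pattern reported above.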

Revision history for this message
Michisteiner (michisteiner) wrote :

another observation: I also ran apt-get update a few times, as others have used it as a benchmark. While not as slow as others have reported in bug 1107150, I did notice slow package reads of multiple minutes, whereas a preceding /sbin/sysctl -w vm.drop_caches=3 could bring that down to <20 seconds. Interestingly, though, iostat reported 100% utilization during the package read in both cases!

Revision history for this message
Michisteiner (michisteiner) wrote :

yet another observation: yesterday I ran all day with the standard 3.11 kernel and all 16GB RAM. At the end of the day I also got into the state where apt-get update took minutes (tens of them). Interestingly, the vm.drop_caches trick did not seem to have any speed-up effect this time. Also note that I/O utilization according to iostat was 100% even though it was so ridiculously slow (and there was no other noticeable activity, e.g., according to iotop).

Having finally found an (ugly) way to get nvidia working again, I have been running 3.14 again since yesterday evening. And I now also see the apt-get update anomaly (9 minutes for the package read) on 3.14. During that time, according to iostat, avgrq-size seemed to be below 10, avgqu-size fluctuated between 5 and 20, and utilization was 20-60%. dd measurements as above were also slow (<15MB/s), and this time the vm.drop_caches trick did not have any positive impact, contrary to the experiment reported earlier where a /sbin/sysctl -w vm.drop_caches=2 sped it up to a "normal" 90MB/s.

BTW: just in case it is relevant, i use a LUKS encrypted LVM with ext4 FS on the encrypted partitions.

tags: added: kernel-bug-exists-upstream
removed: needs-upstream-testing
Revision history for this message
Michisteiner (michisteiner) wrote :

fyi: problem also still exists after upgrading to 14.04 (& 3.13.0-24)

penalvch (penalvch)
tags: added: trusty
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: bios-outdated-2.58
removed: bios-outdated-2.57
Revision history for this message
Michisteiner (michisteiner) wrote :

I have been running BIOS 2.58 ever since I upgraded to 14.04 four weeks ago. BTW: given that the problem also appeared on quite different hardware (bug 1107150), I doubt that it has anything to do with the BIOS.

Revision history for this message
Michisteiner (michisteiner) wrote :

# dmidecode -s bios-version ; dmidecode -s bios-release-date
G5ET98WW (2.58 )

penalvch (penalvch)
tags: added: latest-bios-2.58
removed: bios-outdated-2.58
Revision history for this message
Michisteiner (michisteiner) wrote :

# uname -a
Linux babbage2 3.15.0-031500rc5-generic #201405091635 SMP Fri May 9 20:57:05 UTC 2014 i686 i686 i686 GNU/Linux

unfortunately, problem still exists, e.g., after half a day apt-get update takes minutes to update package list .. :-(

Revision history for this message
penalvch (penalvch) wrote :

Michisteiner, just to clarify, did this problem not occur in a release prior to Quantal?

tags: added: kernel-bug-exists-upstream-3.15-rc5
removed: kernel-bug-exists-upstream
Revision history for this message
Michisteiner (michisteiner) wrote :

@Christopher. Sorry, I can't answer that question: I never ran anything earlier than quantal on this hardware. On the previous laptop (ThinkPad W500), where I ran many earlier versions of Ubuntu (including quantal for a few weeks), I did not notice the problem, but that laptop had only 8GB of RAM (compared to the 16GB of the W530). As mentioned in the problem description (and in bug 1107150), the problem seems to appear only as more RAM is allocated. E.g., after the initial test with 3.15-rc5 and the relatively quick onset of the problem, I booted with mem=15000M (resulting in 14362796 total mem instead of 16xxxxxx). So far I've seen the problem less, and I could (at least temporarily) resolve it using the sysctl drop_caches hack (something which no longer worked in the previous run with all RAM enabled).

Revision history for this message
penalvch (penalvch) wrote :

Michisteiner, for regression testing purposes, could you please test 12.04.0 via http://old-releases.ubuntu.com/releases/12.04.0/ and advise on the results?

Revision history for this message
Michisteiner (michisteiner) wrote :

@Christopher, sorry, this is my main computer, so I cannot re-install from scratch (I also wouldn't really have the time for this and lack a separate disk). I assume just running old kernels probably won't work, given that they are reasonably old? (Otherwise, can you point me to a kernel package?)

BTW: if there is any good tool for kernel-state diagnostics or the like, I would gladly investigate what exactly is causing the slow-down when it happens, but right now I don't really know where to look. I played around with the obvious sysctl parameters and looked at slabtop, iotop, iostat etc., but couldn't get any insights. Maybe something dtrace-like such as systemtap could give further insights, but I would still need some starting points on where in the kernel code/data structures to look.
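
One possible starting point, offered only as a sketch: since vm.drop_caches=2 targets reclaimable slab objects, it may be worth logging which slab caches are largest when the slow-down sets in and comparing that with a fresh boot. The helper below estimates each cache's size as num_objs * objsize from a /proc/slabinfo-style file. The name top_slabs is hypothetical, and the assumption that slab growth is related to this bug is untested.

```shell
#!/bin/sh
# List the N largest slab caches from a /proc/slabinfo-style file.
# Size is estimated as num_objs * objsize (columns 3 and 4), in kB.
# Usage: top_slabs FILE [N]      (N defaults to 10)
# top_slabs is an illustrative name, not an existing tool.
top_slabs() {
    file=$1
    n=${2:-10}
    # Skip the "slabinfo - version" line and the "# name ..." header line.
    awk '$1 != "slabinfo" && $1 != "#" {
        printf "%d %s\n", $3 * $4 / 1024, $1
    }' "$file" | sort -rn | head -n "$n"
}
```

Possible usage: as root (reading /proc/slabinfo normally requires it), run `top_slabs /proc/slabinfo 5` shortly after boot and again once apt-get update has become slow, then compare the two lists.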

Revision history for this message
penalvch (penalvch) wrote :

Michisteiner, testing a live environment would be fine.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired