Buffer I/O error on device zram0

Reported by Dan Muresan on 2013-08-27
This bug affects 20 people
Affects: linux (Ubuntu) · Importance: Undecided · Assigned to: Unassigned
Affects: linux-lts-raring (Ubuntu) · Importance: Undecided · Assigned to: Unassigned

Bug Description

This is on linux-image-generic-lts-raring (which I recently upgraded to 3.8.0-29; the problem didn't occur before upgrading AFAIR)

I see this junk in my dmesg when creating / formatting a zram swap, and then periodically. Then at some point after heavy swapping incidents, the machine locks up. "Correlation does not imply causation, but it does waggle its eyebrows suggestively" and all that.

---

zram: module is from the staging directory, the quality is unknown, you have been warned.
zram: num_devices not specified. Using default: 1
zram: Creating 1 devices
Adding 1031648k swap on /dev/zram0. Priority:100 extents:1 across:1031648k SS
...
Buffer I/O error on device zram0, logical block 257912 [repeated 10 times]
...
BUG: scheduling while atomic: rsyslogd/1439/0x00000001
Pid: 1439, comm: rsyslogd Tainted: PF WC O 3.8.0-29-generic #42~precise1-Ubuntu
[20931.796635] Pid: 1439, comm: rsyslogd Tainted: PF WC O 3.8.0-29-generic #42~precise1-Ubuntu
[20931.796638] Call Trace:
[20931.796649] [<c1619449>] __schedule_bug+0x52/0x5e
[20931.796653] [<c162c265>] __schedule+0x575/0x5f0
[20931.796660] [<f8439e06>] ? zram_make_request+0xe6/0x100 [zram]
[20931.796666] [<c111c38d>] ? release_pages+0x18d/0x1c0
[20931.796669] [<c162c573>] schedule+0x23/0x60
[20931.796673] [<c162d0fd>] rwsem_down_failed_common+0x9d/0xf0
[20931.796677] [<c162d162>] rwsem_down_write_failed+0x12/0x20
[20931.796681] [<c12f809a>] call_rwsem_down_write_failed+0x6/0x8
[20931.796685] [<c162ba44>] ? down_write+0x24/0x30
[20931.796689] [<f8439169>] zram_slot_free_notify+0x29/0x50 [zram]
[20931.796693] [<f8439140>] ? zram_stat64_inc+0x30/0x30 [zram]
[20931.796700] [<c11460bc>] swap_entry_free+0xdc/0x170
[20931.796703] [<c162d100>] ? rwsem_down_failed_common+0xa0/0xf0
[20931.796708] [<c1146458>] swap_free+0x28/0x40
[20931.796712] [<c1134ba0>] do_swap_page+0x390/0x6f0
[20931.796717] [<c10180f8>] ? sched_clock+0x8/0x10
[20931.796721] [<c11365aa>] handle_pte_fault+0x21a/0x2b0
[20931.796726] [<c10451e1>] ? kmap_atomic_prot+0xf1/0x120
[20931.796730] [<c113746a>] handle_mm_fault+0x1fa/0x2d0
[20931.796735] [<c1630ad0>] __do_page_fault+0x190/0x4f0
[20931.796740] [<c12f63bc>] ? sprintf+0x1c/0x20
[20931.796744] [<c104c132>] ? print_time.part.4+0x82/0xc0
[20931.796748] [<c1630e30>] ? __do_page_fault+0x4f0/0x4f0
[20931.796752] [<c1630e3d>] do_page_fault+0xd/0x10
[20931.796756] [<c162dc17>] error_code+0x67/0x6c
[20931.796759] [<c12f8716>] ? __copy_to_user_ll+0x46/0x70
[20931.796763] [<c12f8970>] copy_to_user+0x40/0x60
[20931.796767] [<c104e218>] syslog_print+0xc8/0x210
[20931.796770] [<c104eae6>] do_syslog+0x206/0x390
[20931.796775] [<c106d570>] ? add_wait_queue+0x50/0x50
[20931.796780] [<c11c3680>] ? kmsg_poll+0x50/0x50
[20931.796784] [<c11c36d0>] kmsg_read+0x50/0x60
[20931.796788] [<c11b6584>] proc_reg_read+0x64/0xa0
[20931.796793] [<c116469c>] vfs_read+0x8c/0x160
[20931.796797] [<c10a898d>] ? sys_futex+0xed/0x130
[20931.796801] [<c11b6520>] ? proc_reg_write+0xa0/0xa0
[20931.796805] [<c11647b7>] sys_read+0x47/0x80
...
[30855.181542] Write-error on swap-device (251:0:131064)
[43457.030155] Write-error on swap-device (251:0:131064)
[43705.090381] Buffer I/O error on device zram0, logical block 16383

Anyway, there are a couple of LKML messages related to this "Buffer I/O error on device zram0"; please investigate.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: linux-image-3.8.0-29-generic 3.8.0-29.42~precise1
ProcVersionSignature: Ubuntu 3.8.0-29.42~precise1-generic 3.8.13.5
Uname: Linux 3.8.0-29-generic i686
NonfreeKernelModules: nvidia
ApportVersion: 2.0.1-0ubuntu17.4
Architecture: i386
Date: Tue Aug 27 07:33:37 2013
EcryptfsInUse: Yes
MarkForUpload: True
ProcEnviron:
 TERM=rxvt
 LC_COLLATE=C
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: linux-lts-raring
UpgradeStatus: No upgrade log present (probably fresh install)
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version k3.8.0-29-lowlatency.
ApportVersion: 2.0.1-0ubuntu17.4
Architecture: i386
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: muresan 29305 F.... pulseaudio
 /dev/snd/pcmC0D0p: muresan 29305 F...m pulseaudio
 /dev/snd/controlC1: muresan 29305 F.... pulseaudio
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
Card0.Amixer.info:
 Card hw:0 'Intel'/'HDA Intel at 0xfebfc000 irq 44'
   Mixer name : 'SigmaTel STAC9205'
   Components : 'HDA:838476a0,102801f1,00100204 HDA:14f12c06,14f1000f,00100000'
   Controls : 26
   Simple ctrls : 12
Card1.Amixer.info:
 Card hw:1 'USB'/'E-MU Systems, Inc. E-MU 0202 | USB at usb-0000:00:1d.0-2, full speed'
   Mixer name : 'USB Mixer'
   Components : 'USB041e:3f02'
   Controls : 4
   Simple ctrls : 2
DistroRelease: Ubuntu 12.04
EcryptfsInUse: Yes
HibernationDevice: RESUME=UUID=88836c66-16ea-407f-8753-3190c09ca82f
IwConfig:
 lo no wireless extensions.

 eth0 no wireless extensions.
MachineType: Dell Inc. Inspiron 1520
MarkForUpload: True
NonfreeKernelModules: nvidia
Package: linux-lts-raring
ProcEnviron:
 TERM=rxvt
 LC_COLLATE=C
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.8.0-29-lowlatency root=UUID=a7047dd8-61aa-4cbe-b36d-697a5e7ee64b ro
ProcVersionSignature: Ubuntu 3.8.0-29.21-lowlatency 3.8.13.5
RelatedPackageVersions:
 linux-restricted-modules-3.8.0-29-lowlatency N/A
 linux-backports-modules-3.8.0-29-lowlatency N/A
 linux-firmware 1.79.6
RfKill:
 0: hci0: Bluetooth
  Soft blocked: no
  Hard blocked: no
StagingDrivers: zram
Tags: precise staging
Uname: Linux 3.8.0-29-lowlatency i686
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm admin audio cdrom davfs2 dialout dip floppy fuse lpadmin netdev plugdev powerdev sambashare scanner video
dmi.bios.date: 02/03/2008
dmi.bios.vendor: Dell Inc.
dmi.bios.version: A07
dmi.board.name: 0UW306
dmi.board.vendor: Dell Inc.
dmi.chassis.type: 8
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvrA07:bd02/03/2008:svnDellInc.:pnInspiron1520:pvr:rvnDellInc.:rn0UW306:rvr:cvnDellInc.:ct8:cvr:
dmi.product.name: Inspiron 1520
dmi.sys.vendor: Dell Inc.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux-lts-raring (Ubuntu):
status: New → Confirmed

variona (variona) wrote :

Affects me with 3.5.0-39-generic Kernel on an AMD64X2 machine.

variona (variona) wrote :

I also experienced the complete machine lock up.
dmesg | grep zram
[ 0.842160] zram: module is from the staging directory, the quality is unknown, you have been warned.
[ 0.842454] zram: num_devices not specified. Using default: 1
[ 0.842457] zram: Creating 1 devices ...
[ 1.015211] Adding 1032592k swap on /dev/zram0. Priority:100 extents:1 across:1032592k SS
[ 20.325220] Buffer I/O error on device zram0, logical block 258148
[ 20.325241] Buffer I/O error on device zram0, logical block 258148
[ 20.325339] Buffer I/O error on device zram0, logical block 258148
[ 20.325355] Buffer I/O error on device zram0, logical block 258148
[ 20.325370] Buffer I/O error on device zram0, logical block 258148
[ 20.325385] Buffer I/O error on device zram0, logical block 258148
[ 20.325400] Buffer I/O error on device zram0, logical block 258148
[ 20.325446] Buffer I/O error on device zram0, logical block 258148
[ 20.325461] Buffer I/O error on device zram0, logical block 258148
[ 20.325483] Buffer I/O error on device zram0, logical block 258148

variona (variona) wrote :

After multiple machine lock-ups (it did not even respond to ping), switching back to 3.5.0-37-generic results in:

dmesg | grep zram
[ 0.829615] zram: module is from the staging directory, the quality is unknown, you have been warned.
[ 0.829907] zram: num_devices not specified. Using default: 1
[ 0.829909] zram: Creating 1 devices ...
[ 0.966127] Adding 1032592k swap on /dev/zram0. Priority:100 extents:1 across:1032592k SS

Dan Muresan (danmbox) wrote :

@variona So, no lock-ups after downgrading? You should also test the NEWER kernel WITHOUT enabling zram and see if the crashes still occur. I've run 3.8.0-29 WITHOUT zram for almost a day (not quite enough to give it a clean pass)

Damian Sawicki (damian-sawicki) wrote :

The same here on Kernel Linux 3.2.0-53-generic-pae

dmesg | grep zram
[ 1.324402] zram: module is from the staging directory, the quality is unknown, you have been warned.
[ 1.324654] zram: num_devices not specified. Using default: 1
[ 1.324656] zram: Creating 1 devices ...
[ 1.443281] Adding 2060268k swap on /dev/zram0. Priority:100 extents:1 across:2060268k SS
[ 25.531955] Buffer I/O error on device zram0, logical block 515067
[ 25.531974] Buffer I/O error on device zram0, logical block 515067
[ 25.532060] Buffer I/O error on device zram0, logical block 515067
[ 25.532076] Buffer I/O error on device zram0, logical block 515067
[ 25.532091] Buffer I/O error on device zram0, logical block 515067
[ 25.532107] Buffer I/O error on device zram0, logical block 515067
[ 25.532122] Buffer I/O error on device zram0, logical block 515067
[ 25.532158] Buffer I/O error on device zram0, logical block 515067
[ 25.532174] Buffer I/O error on device zram0, logical block 515067
[ 25.532193] Buffer I/O error on device zram0, logical block 515067

Dan Muresan (danmbox) wrote :

@damian-sawicki: do you also get lock-ups?

Damian Sawicki (damian-sawicki) wrote :

@danmbox Yes. I don't know whether they are related to the above logs, which are often displayed on boot. (By the way, for some time I haven't had the Ubuntu logo and animation on boot, just the text "Ubuntu 12.04" and 4 sparkling dots.) But both the zram0 errors and the lock-ups started recently.

Dan Muresan (danmbox) wrote :

@damian-sawicki: a lot of people have zram due to the zram-config package, which you must remove with dpkg --purge zram-config (otherwise it won't go away completely). But in my case zram *still* starts up, and I can't track what sets it up. I think the easy solution is to break the zram modules (e.g. rename zram.ko to zram.ko.save in the appropriate /lib/modules directory)

Of course, it would be great to get working zram again.

Dan Muresan (danmbox) wrote :

Bug #1218278 might be a duplicate (though it doesn't talk about lock-ups). Apparently this is fixed in 3.11, though I don't see how this helps with raring and linux-lts-raring.

@danmbox: Still having the issue after dpkg --purge. I'll try your method with breaking the modules.

Dan Muresan (danmbox) wrote :

The way to list your swaps is swapon -s. If zram0 is listed, something still enables it.

@danmbox: It is. So another fix is just swapoff, right? It doesn't help with the error messages on boot, but should prevent lock-ups once the system is already up.

Dan Muresan (danmbox) wrote :

@damian-sawicki: yes, but once such memory errors DO occur, I think your system is basically compromised; you don't know when it will crash (*) and you don't know what else gets corrupted before the lock-up (why not filesystem errors, security etc?). That's why I would cut this at the root.

(*) In practice, of course, the system seems to actually crash when under heavy swapping the erroneous block gets used (possibly when the page is swapped back into memory?) But this is like overclocking, you never know what parts of the system might break.

We need to figure out why there aren't more people affected by this issue and how to get developers involved. Perhaps we should mark this as a security problem (because memory errors usually DO lead to security problems, even if writing an exploit could be hard)

@all:
after today's kernel update from 3.2.0.52 to 3.2.0.53, this message appears (about 5 times) in syslog:

kernel: [ 30.459194] Buffer I/O error on device zram0, logical block 480744
...

Booting the old kernel 3.2.0.52 shows no message like this.

@all: sorry, I forgot to include my

DistroRelease: Ubuntu 12.04 LTS

@all:
 dmesg | grep zram
[ 1.820314] zram: module is from the staging directory, the quality is unknown, you have been warned.
[ 1.820596] zram: num_devices not specified. Using default: 1
[ 1.820643] zram: Creating 1 devices ...
[ 1.864606] Adding 1922976k swap on /dev/zram0. Priority:100 extents:1 across:1922976k SS
[ 30.459194] Buffer I/O error on device zram0, logical block 480744
[ 30.459234] Buffer I/O error on device zram0, logical block 480744
[ 30.459313] Buffer I/O error on device zram0, logical block 480744
[ 30.459349] Buffer I/O error on device zram0, logical block 480744
[ 30.459384] Buffer I/O error on device zram0, logical block 480744

Steve Dodd (anarchetic) wrote :

Like Hans, I'm seeing this on 12.04 LTS. linux-image-3.2.0-53-generic-pae seems to enable zram by default, which results in lots of "scheduling while atomic" errors in syslog and occasional lockups (machine still responds to ping, but not ssh, desktop frozen.)

This seems like a pretty critical problem now that it's affecting the current LTS.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1217189

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: raring

apport information

tags: added: apport-collected staging
description: updated

Ah, I didn't know about apport-collect. I've added my data as Bug #1223273 and marked it as a duplicate.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed

Dan Muresan (danmbox) wrote :

This bug also occurs on linux-lts-quantal, I think (I remember booting to a lts-quantal kernel after seeing the raring bug). There was a backport to zram on both branches that is causing this.

Note that in Bug #1218278, Tim Gardner (timg-tpi) wrote at #19: "There will be a linux-lts-saucy shortly after 13.10 is released."; anyway, I'm marking that one as a duplicate.

Meanwhile, I am rolling back to kernel 3.2.0.52. I can't wait until 13.10 is released; this error is critical.

Luis Henriques (henrix) wrote :

This seems to be a duplicate of bug #1215513. Could the people running Raring try the test kernel here:

http://people.canonical.com/~henrix/lp1215513/

(there are 64-bit and 32-bit kernels)

The faulty commit has been reverted in all the kernels (including Precise and Quantal).

@henrix Is 3.2.0.54 already bug-free (for 12.04)?

Luis Henriques (henrix) wrote :

Damian, I believe you refer to 3.2.0-54.82, which is the 12.04 kernel currently in the -proposed pocket. This kernel should already contain the fix for this problem.

A simple workaround is to blacklist the zram module, and update your initramfs:

1. create a text file /etc/modprobe.d/blacklist-zram.conf:
blacklist zram

2. enter command
update-initramfs -c -k all

After reboot, the zram module will not be loaded anymore.

You can also temporarily unload the module:

1. list swap devices:
swapon -s

- it should produce output similar to:
Filename     Type        Size     Used   Priority
/dev/zram0   partition   4102348  55052  100
/dev/sda9    partition   4112604  0      -1

2. now turn off swapping on zram0:
sudo swapoff /dev/zram0

3. now that module is not used anymore, it can be unloaded:
sudo rmmod zram
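
The temporary unload above can be sketched as a small shell helper for machines with more than one zram device. This is only an illustrative sketch, not a tested fix: the function name list_zram_swaps is made up here, and it assumes the `swapon -s` column layout shown above (device path in the first column).

```shell
#!/bin/sh
# Hypothetical helper: read `swapon -s` output on stdin and print every
# zram-backed swap device, one per line.
list_zram_swaps() {
    # Skip the header line; keep rows whose first column is /dev/zramN.
    awk 'NR > 1 && $1 ~ /^\/dev\/zram[0-9]+$/ { print $1 }'
}

# Usage (needs root, so it is left commented out here):
#   for dev in $(swapon -s | list_zram_swaps); do sudo swapoff "$dev"; done
#   sudo rmmod zram
```

If the helper prints nothing, no zram device is currently used for swap and only the rmmod step is needed.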

The bug also seems to affect:

Kubuntu 13.04 - Kernel version 3.8.0-29
Kubuntu 13.04 - Kernel version 3.8.0-30

No problem with 3.8.0-28 kernel...

Oibaf (oibaf) wrote :

I am also having this issue on 3.2.0-53-powerpc-smp. Rather than reverting a commit, there is a patch on LKML: https://lkml.org/lkml/2013/8/12/399

Kenneth Parker (sea7kenp) wrote :

I got this, right after apt-get upgrade (plus direct installs of kernel). The message shows up on bootup. My system has not locked up yet, which is why I prefer this to bug 1215513.

In my case, the Logical Block is 314223.

I did (under root): fdisk /dev/ram0, followed by "p".

This shows up, as if it's a hard disk: 255 heads, 63 sectors/track, 13 cylinders, total 314224 sectors.

Note that total sectors is one higher than the logical block the error message is complaining about.

Is it as simple as to find the script under the ramdisk part of bootup that sets up zram0, and decreasing the number of sectors?

Thank you and best regards,

Kenneth Parker, Seattle, WA

Kenneth Parker (sea7kenp) wrote :

Something to add to my prior comment: this is Ubuntu 12.04 Server, running in text mode (at least for now, as I have not brought up FVWM yet, due to this error). I'm thinking the "hard stop" in 1215513 is because of attempts to run a desktop.

Thank you and best regards,

Kenneth Parker, Seattle, WA

Kenneth Parker (sea7kenp) wrote :

I also posted comments in Bug #1215513. I got a workaround there: "service zram-config stop".

I posted a question about what script (probably in the ramdisk portion of the boot) sets up zram0, so I can possibly edit it. Alternatively, the fix for #1215513 may fix this script, but I'm too much "in production" to try development kernels.

I've signed up for email updates for both this and issue #1215513, so hopefully I'm covered when the fix gets to the "stable" 12.04 LTS version.

Thank you and best regards,

Kenneth Parker, Seattle, WA

-- I'm a Troubleshooter. I look for Trouble and Shoot it!

Kenneth Parker (sea7kenp) wrote :

I STRONGLY suggest that this is NOT a duplicate of bug #1215513, as a fix was distributed that fixes the "System Locks Up" part, but leaves this error message.

In my first comment, both on this Bug and on Bug #1215513, I responded to the error message by entering "fdisk /dev/ram0", under root, followed by "p", which showed it to never be partitioned, or anything (like a SWAP partition being defined). So it LOOKS like I was never in danger of bug #1215513, but only experiencing the messages, noted in this Bug (#1217189). Even though no "actual harm" is done, through a "pure experience" of Bug # 1217189, the error messages still appear, after "fix" of #1215513, which can CERTAINLY alarm production Linux administrators anywhere, and cause loss of money in any case where the Server is anything but something assisting non-profit volunteer work, where they can be told they need to be patient and wait for their services. (Actually, due to an early message by dac922, I was able to disable zram services on a system that has no intention of using them).

Obviously, what happened was that the Kernel 3.8 work, IMPLEMENTING zram0 swap was "back-ported" to the 3.2 Kernel tree, somewhere <= vmlinuz-3.2.0-53-generic-pae!!! :-O

Remember, my system is Ubuntu 12.04 LTS Server, which, in my opinion, is not in any condition for "stray zram swap" creation! Fortunately, somebody ported the package zram-config to 12.04, allowing me to get it, which in effect PERMANENTLY DISABLES the zram0 processes, because they fail during the install of zram-config, allowing "apt-get upgrade" to state that it was not successful in the "install" of this package.

So I suggest that somebody on this forum, in either of these Bug reports (again, I'm too busy administering Linux to normally submit fixes!) put some of this text in some "FAQ file" somewhere.

Forgive my dry humor in this post. As a Linux admin, I'm trained to "roll with the punches" and to consider the humor of life on Planet Earth (and, forgive me for adding) the political system in the USA!!! :-)

Thank you and best regards,

Kenneth Parker, Seattle, WA

-- I'm a Troubleshooter. I look for Trouble and Shoot It! :-)

@sea7kenp I agree with you. @henrix says that kernel 3.2.0.54 for Ubuntu 12.04 should be bug free, but I'm still experiencing the error messages on boot. There have been no lock-ups since the update of the kernel (53->54), though.

After "# fdisk /dev/zram0" followed by "p" [note that I type "zram0" not "ram0"], I get 515068 as a total number of sectors and the "erroneous" block in my case is 515067 - the difference between both numbers being one, just as in your case.
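
The off-by-one noted in both comments is what one would expect if the error always hits the final block of the device: blocks are numbered from 0, so a device that reports N sectors has N - 1 as its last block. The same number falls out of the "Adding ...k swap" line, assuming 4 KiB blocks: dividing Damian's 2060268k by 4 gives 515067. A minimal sketch of both checks (the function names are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: index of the final block on a device whose
# fdisk output reports the given total number of sectors.
last_block() {
    echo $(( $1 - 1 ))
}

# Hypothetical helper: number of 4 KiB blocks covered by the size printed
# in the kernel's "Adding <size>k swap" line (assumes 4 KiB blocks).
block_from_swap_kib() {
    echo $(( $1 / 4 ))
}

last_block 314224            # prints 314223 (Kenneth's failing block)
last_block 515068            # prints 515067 (Damian's failing block)
block_from_swap_kib 2060268  # prints 515067 as well
```

If the failing block in dmesg matches these values, the error is on the last block of /dev/zram0 rather than at a random location.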

Kenneth Parker (sea7kenp) wrote :

Damian, the "fdisk /dev/ram0" was a typo, because I was NOT in any condition to do "copy/paste" from the system, undergoing the zram0 issue. My actual number is different : 314224, but my "situation" is the same, because the number in the "Buffer I/O Error message" was 314223.

Everyone,

This is NOT just an "aesthetic" issue under Ubuntu 12.04 LTS: when I examined my RAM situation, while "fdisk /dev/zram0" gave me "number of sectors", I have less RAM, causing a significant performance degradation and MUCH thrashing, as my system over-used the poor "Type 82 SWAP Partition" on the same hard drive as all the other files.

PLEASE re-open this, as an ACTUAL problem, not a duplicate, as I see the "steam roller" effect on issue #1215513, where they appear set to "close" that problem as "fixed, but with an 'ugly' message". If any of the technicians are also on that other issue, PLEASE pass along that ONLY fixing the freeze can adversely affect system performance, at least on some of the Ubuntu Releases.

I thought the LTS [12.04] release was supposed to be "stable". Like I said in a prior message, I'm "supposed" to be a "big picture manager" who just happens to administer a "semi-production" system. I "happily jump" into "detailed debug" mode, when needed, but am "spread thin".

When I was doing things like you do, the "state of the art" release was Red Hat 6! :-O

Now that I've "dated myself" [premature Senior Citizen], I sincerely hope this issue gets the "fantastic customer support" I used to get, when I could phone the developers and "brainstorm" solutions. I met Linus Torvalds at a San Francisco Linux conference once, but don't remember the year. He was very gracious back then. I hope you all still maintain social skills.

Thank you and best regards,

Kenneth Parker, Seattle, WA

-- I'm a troubleshooter. I look for trouble, and shoot it!

Kenneth Parker (sea7kenp) wrote :

I'm turning myself from a "verbose senior citizen", to an "efficient" system admin, with customer support experience.

Attached to this comment is a short extract of /var/log/kern.log, with only the lines relevant to this issue (and 1215513).

Reminder that I'm running Ubuntu 12.04 LTS Server, with Kernel vmlinuz-3.2.0-53-generic-pae.

Thank you and best regards,

Kenneth Parker, Seattle, WA

Eugene San (eugenesan) wrote :

Gathering all related bugs into this one (as the most active), and also de-duplicating them from the "hang" bug, which obviously doesn't provide a fix and is probably not related, since none of my systems hang but they still report zram block errors.

Eugene San (eugenesan) on 2013-09-28
summary: - Buffer I/O error on device zram0 (possibly causing complete machine
- lockup)
+ Buffer I/O error on device zram0

Oibaf (oibaf) wrote :

See comment #47.

IMO, this bug *is* a duplicate of #1215513. The real problem is that the zram kernel module creates a block device with a bad sector at the end. When you try to swap on it, anything can happen, from system lockdowns to program crashes to nothing (e.g., if you have so much RAM you never get to use the last sector of /dev/zram0).

I posted detailed findings at #55 on the other bug's page.

Kenneth (#51): 'mkswap' run on the entire block device /dev/zram0 DOES NOT create a visible swap partition for you to see with 'fdisk -l'. The fact that you don't see a swap partition on it DOES NOT mean /dev/zram0 is not being swapped on. Check this with 'swapon -s' (and disable it if it's there!) To see this, and also the bad 4KB sector, try this (first swapoff and rmmod):

$ sudo modprobe zram
$ echo '1024^3' | bc | sudo tee /sys/block/zram0/disksize
1073741824
$ sudo mkswap -c /dev/zram0
one bad page
Setting up swapspace version 1, size = 1048568 KiB
no label, UUID=487d15c0-07ce-46d1-b632-492611e2a13a
$ sudo mkswap /dev/zram0
Setting up swapspace version 1, size = 1048572 KiB
no label, UUID=9fa2c2df-d69c-4282-87ad-7c3219e50832
$ sudo fdisk -l /dev/zram0
Note: sector size is 4096 (not 512)

Disk /dev/zram0: 1073 MB, 1073741824 bytes
255 heads, 63 sectors/track, 16 cylinders, total 262144 sectors
Units = sectors of 1 * 4096 = 4096 bytes
Sector size (logical/physical): 4096 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/zram0 doesn't contain a valid partition table
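
The two sizes that mkswap prints in the session above differ by exactly one 4 KiB page, which matches the "one bad page" report. As a rough sketch of the arithmetic (the function name is hypothetical, and it assumes 4096-byte pages with one page reserved for the swap header):

```shell
#!/bin/sh
# Hypothetical helper: usable swap size in KiB for a zram disksize in bytes,
# assuming 4096-byte pages, one page reserved for the swap header, and an
# optional count of bad pages (default 0).
swap_kib() {
    disksize=$1
    bad=${2:-0}
    pages=$(( disksize / 4096 ))
    echo $(( (pages - 1 - bad) * 4 ))
}

swap_kib 1073741824      # prints 1048572, as with plain mkswap
swap_kib 1073741824 1    # prints 1048568, as with mkswap -c (one bad page)
```

Here 1073741824 bytes is the 1024^3 disksize written to /sys/block/zram0/disksize, which corresponds to the 262144 sectors that fdisk reports.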

It's been a long time since I looked at the relevant bits of the kernel
source, but errors when _writing_ to swap devices should be quite
manageable, I would have thought: simply mark the block/page as bad and
go looking for another free location. It'll be _read_ errors that cause
problems!

Some info here, though no discussion of error handling that I noticed.

https://www.kernel.org/doc/gorman/html/understand/understand014.html

I've had no crashes since installing the fix that went into -proposed
and then was released. Still getting the I/O errors though.

On Wed, Oct 02, 2013 at 02:37:33AM -0000, Mel Dee wrote:
> IMO, this bug *is* a duplicate of #1215513. The real problem is that the
> zram kernel module creates a block device with a bad sector at the end.
> When you try to swap on it, anything can happen, from system lockdowns
> to program crashes to nothing (e.g., if you have so much RAM you never
> get to use the last sector of /dev/zram0).
[..]

Kenneth Parker (sea7kenp) wrote :

Mel, the reason I do NOT believe this is a duplicate of # 1215513 is that many get error messages (including I/O errors in some cases) WITHOUT a hang. But any time I see I/O error on any disk, whether "hardware, ram or simulated [i.e. network]", there's a chance of data corruption, even without crashes. Please reconsider.

Steve, you also mention I/O error messages. Once again, that could corrupt data, the WORST TYPE of error, in my NOT SO humble opinion. (Think about it: If SWAP data comes back in, and your "bank balance" changes from 5000 to 500,000, you might THINK you have more in your account than you do, causing "mishaps" in International Finance. [Example used, due to how badly your life could be hurt].)

My personal solution is to come up with ONLY a Root Text console [yes, Ubuntu users aren't SUPPOSED to become "real Root", but I've been doing this sort of thing since the 1990's], "rmmod zram" [with prejudice, if possible], and I'm prepared to do a "permanent rename" of the correct name of the module, as soon as I "get around" to it. [Visions of Round Tuits...]

A point I made earlier: Even if a SWAP partition is NOT made out of /dev/zram0, part of RAM is being "reserved for nothing".

Thank you and best regards,

Kenneth Parker, Seattle, WA

Steve Dodd (anarchetic) wrote :

Hence why I said I/O errors on *write*. If the block dev layer reports
an error to the VM on page-out, it knows the data hasn't been correctly
written.

I'm not sure zram actually allocates the whole amount of memory it's
configured to use, initially:

steved@xubuntu:~$ free
                      total     used     free  shared  buffers  cached
Mem:                2050796  1473456   577340       0   254232  696144
-/+ buffers/cache:            523080  1527716
Swap:               2975088    20848  2954240
steved@xubuntu:~$ swapon -s
Filename                Type        Size     Used   Priority
/dev/zram0              partition   1025396  20848  100
/dev/mapper/lvg2-swap   partition   1949692  0      -1

At the very least, free seems to know it's cache memory. Don't know
whether that's merely a presentation thing, need to look at the code.

Anyway, the SNR here is getting awful, and I don't want to contribute to
it. Anyone who wants to can contact me privately via LP.

S.

Oibaf (oibaf) wrote :

There is a new raring kernel (3.8.0-32) with a zram fix (see bug 1233227). With it the "Buffer I/O error on device zram0" warning is no longer shown.

Oibaf (oibaf) wrote :

After some days with the updated 3.8 kernel, I no longer see this issue.

Upstream 3.2.52 reverted a patch already reverted on 3.2.0-54: https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.2.52

I'll close this if no one experiences other problems.

Oibaf (oibaf) on 2013-10-29
Changed in linux-lts-raring (Ubuntu):
status: Confirmed → Fix Released
Changed in linux (Ubuntu):
status: Confirmed → Fix Released

@oibaf I've just installed the newest updates and now have kernel 3.2.0-56-generic-pae (12.04 32-bit), but the error message still appears. I haven't used this computer recently, so it is hard to tell if there is a problem with lock-ups. But, as I said before, after updating to 3.2.0-54 I didn't experience lockups, so probably this part of the problem is fixed.

Oibaf (oibaf) wrote :

You may want to reopen "linux (Ubuntu)" if it's not fixed for you. I tried it myself on a friend's machine with 3.2.0-55 and the message didn't show up.

@oibaf I'm afraid I don't have necessary permissions to change the status. Could you do it for me?

Oibaf (oibaf) wrote :

For some reason I also cannot do it. Maybe you could open a new bug?

Oibaf (oibaf) wrote :

@Damian: is this a regression in -56? Does -55 work fine? If so, you may want to report it here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1242901

@oibaf I don't know, because I didn't use this computer for a while and upgraded directly from -54 to -56.

I reported the bug here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1246664

Oibaf (oibaf) wrote :

Made this a dup of bug 1246664 since there is some progress there.

kalehrl (kalehrl) wrote :

I dist-upgraded my Ubuntu 12.04 LTS this morning and after reboot I got:
[ 14.666280] zram: module is from the staging directory, the quality is unknown, you have been warned.
[ 14.667040] zram: Creating 1 devices ...
[ 14.830355] Buffer I/O error on device zram0, logical block 62411
[ 14.830375] Buffer I/O error on device zram0, logical block 62411
[ 14.830598] Buffer I/O error on device zram0, logical block 62411
[ 14.830621] Buffer I/O error on device zram0, logical block 62411
[ 14.878597] Adding 249644k swap on /dev/zram0. Priority:5 extents:1 across:249644k SS
I will see whether this is merely cosmetic or whether it causes problems.

JmAbuDabi (dambldor91) wrote :

[ 2.440692] zram: module is from the staging directory, the quality is unknown, you have been warned.
[ 2.441048] zram: num_devices not specified. Using default: 1
[ 2.441050] zram: Creating 1 devices ...
[ 2.495601] Adding 3056956k swap on /dev/zram0. Priority:100 extents:1 across:3056956k SS
[ 19.903691] Buffer I/O error on device zram0, logical block 764239

Linux jmabudabi-pc 3.2.0-55-generic #85-Ubuntu SMP Wed Oct 2 12:29:27 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
