System locks up, requires hard reset

Bug #1215513 reported by Federico Tello Gentile on 2013-08-22
This bug affects 12 people
Affects          Importance   Assigned to
linux (Ubuntu)   High         Unassigned
Precise          Undecided    Unassigned
Quantal          Undecided    Unassigned
Raring           Undecided    Unassigned
Saucy            High         Unassigned

Bug Description

Since I started using 3.8.0-29 I have experienced system lock-ups at random times. The first one happened while running VirtualBox (Windows guest), but later ones occurred without VirtualBox running, for example while shutting down or while reading a web page.

I could not find any relevant log, but I can post any log file considered useful.

This has happened on 2 different machines (the one I'm reporting from is Intel with Intel graphics and the other is AMD with Radeon graphics). Both are 64-bit.

Up until 3.8.0-27 inclusive I had not had a lock-up in years on either of those machines. I have been running 64-bit Ubuntu since at least 2008 on one of them.

ProblemType: Bug
DistroRelease: Ubuntu 13.04
Package: linux-image-3.8.0-29-generic 3.8.0-29.42
ProcVersionSignature: Ubuntu 3.8.0-29.42-generic 3.8.13.5
Uname: Linux 3.8.0-29-generic x86_64
ApportVersion: 2.9.2-0ubuntu8.3
Architecture: amd64
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: fede 1829 F.... pulseaudio
CRDA: Error: command ['iw', 'reg', 'get'] failed with exit code 1: nl80211 not found.
CurrentDmesg:

Date: Thu Aug 22 12:49:35 2013
HibernationDevice: RESUME=UUID=ee5b2280-eed3-41e2-8675-103fbdd703d1
InstallationDate: Installed on 2012-12-03 (261 days ago)
InstallationMedia: Ubuntu 12.10 "Quantal Quetzal" - Release amd64 (20121017.5)
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 003 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 004 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
 Bus 005 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MarkForUpload: True
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.8.0-29-generic root=UUID=0d34a8a5-2002-47e8-9128-a35623612ec1 ro quiet splash vt.handoff=7
RelatedPackageVersions:
 linux-restricted-modules-3.8.0-29-generic N/A
 linux-backports-modules-3.8.0-29-generic N/A
 linux-firmware 1.106
RfKill:

SourcePackage: linux
StagingDrivers: zram
UpgradeStatus: Upgraded to raring on 2013-04-25 (118 days ago)
dmi.bios.date: 09/11/2006
dmi.bios.vendor: Intel Corp.
dmi.bios.version: TS94610J.86A.0047.2006.0911.0110
dmi.board.asset.tag: Base Board Asset Tag
dmi.board.name: D946GZIS
dmi.board.vendor: Intel Corporation
dmi.board.version: AAD66165-501
dmi.chassis.type: 2
dmi.modalias: dmi:bvnIntelCorp.:bvrTS94610J.86A.0047.2006.0911.0110:bd09/11/2006:svn:pn:pvr:rvnIntelCorporation:rnD946GZIS:rvrAAD66165-501:cvn:ct2:cvr:

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

The 3.8.0-30.43 kernel is now available in the -proposed repository. Would it be possible for you to test this latest kernel and post back if it resolves this bug?

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed.

Thank you in advance!

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: regression-update
tags: added: performing-bisect
Changed in linux (Ubuntu):
status: Confirmed → Incomplete

Got another lock up on 3.8.0-29.
Now I'm running 3.8.0-30-generic #43-Ubuntu SMP Wed Aug 21 21:07:22 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux.
I'll report back.

It locked up again right after hitting Enter in the shutdown dialog. The screen stayed frozen for at least 20 minutes until I reset it.

I think zram is the cause. I had zram in both machines. I'm now running 3.8.0-29 without zram. No issues so far.

After first day working without zram, no lock ups. Still running 3.8.0-29.

dac922 (dac922) wrote :

I have the same bug with two computers, both running with zram. 3.8.0-27 works fine, but 3.8.0-29 freezes from time to time (especially on high RAM usage).

When doing "service zram-config stop", I see no further freezes. Also, I found a patch in linux upstream that could be related to this issue:
https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=94bcc7deb8fed472360ad72ef3717c5409057fca
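
To confirm that stopping zram-config really took the zram devices out of use, one option is to look at /proc/swaps afterwards. The following is only a sketch: the function name zram_swap_count is made up for illustration, and it assumes the usual /proc/swaps layout (one header line, device path in the first column).

```shell
# zram_swap_count [SWAPS_FILE]
# Print how many active swap areas live on a /dev/zramN device.
# SWAPS_FILE defaults to /proc/swaps; the name is illustrative only.
zram_swap_count() {
    awk 'NR > 1 && $1 ~ /^\/dev\/zram[0-9]+$/ { n++ } END { print n + 0 }' \
        "${1:-/proc/swaps}"
}
```

Running zram_swap_count before and after "service zram-config stop" should show the count dropping to 0 if the workaround took effect.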

dac922 (dac922) wrote :

I just tried rebuilding the zram driver with the upstream patch. Either I did something wrong, or the patch doesn't help. My second try was to copy the unmodified zram driver from 3.8.0-27 to 3.8.0-29. This seems to work.

dac922 (dac922) wrote :

Second try: I rebuilt the zram driver with the upstream patch and with this patch reverted:
http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=commitdiff;h=f56c0e44628257f97063089eb865d5eb2dfdd642

No freeze for an hour under heavy RAM usage.

Luis Henriques (henrix) wrote :

dac922, thanks a lot for your analysis. I've uploaded a test kernel here:

 http://people.canonical.com/~henrix/lp1215513/

(there are 2 directories, one for 32bits and another for 64bits)

Could the original bug reporter please also test it and report back?

This is a Raring kernel that reverts the commit referred to in comment #11 (upstream commit 57ab048 "zram: use zram->lock to protect zram_free_page() in swap free notify path"). This commit actually introduces a regression that is fixed upstream, but backporting that fix to Raring seems to require a few additional commits.

Thanks.

Tim Gardner (timg-tpi) on 2013-09-10
Changed in linux (Ubuntu Precise):
status: New → Fix Committed
Changed in linux (Ubuntu Quantal):
status: New → Fix Committed
Changed in linux (Ubuntu Raring):
status: New → Fix Committed
Changed in linux (Ubuntu Saucy):
status: Incomplete → Fix Released
Oibaf (oibaf) wrote :

According to LKML discussion this was a regression of "zram: avoid access beyond the zram device", not "zram: use zram->lock to protect zram_free_page() in swap free notify path", reverted in Ubuntu 3.2.0-54.82:
https://lkml.org/lkml/2013/8/13/270

Also the fix applied in other trees was this:
https://lkml.org/lkml/2013/8/12/399
and was confirmed working in LKML.

Oibaf (oibaf) wrote :
dac922 (dac922) wrote :

oibaf, please see my comment #9 and #11. The regression you mentioned is an additional one and was not the root cause for this bug report.

Oibaf (oibaf) wrote :

Indeed my problem was not the lock ups but just the warnings. But if they are different problems bug #1217189 should not be a dup of this one.

Luis Henriques (henrix) wrote :

Oibaf, dac922: thank you for your comments. Since bug #1217189 also refers to "possibly causing complete machine lockup" (see title) I decided to set it as a dup.

Anyway, I was actually convinced the fix pointed to in comment #13 was already in Raring, but it looks like I was wrong -- it's in the Quantal kernel only. I'll make sure it hits the 3.8.y stable kernel, which will then be picked up into Raring.

Oibaf (oibaf) wrote :

Thanks, I was experiencing the problem on precise 3.2.0-54.82 where it should also be applied. zram changes (including the one triggering the problem) got backported to 3.2 series in 3.2.49: https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.2.49

This fix should eventually be applied upstream in 3.2.x as well, as it was applied in 3.10.7 (the buggy commit appeared in 3.10.6): https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.10.7 .

Oibaf (oibaf) wrote :

I sent a mail to Greg Kroah-Hartman asking to apply it in 3.2 series also.

Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-precise-needed' to 'verification-precise-done'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-precise
tags: added: verification-needed-quantal
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-quantal-needed' to 'verification-quantal-done'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-raring
Brad Figg (brad-figg) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-raring-needed' to 'verification-raring-done'.

If verification is not done by one week from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

Dan Muresan (danmbox) wrote :

A week seems like an extremely short time, especially for ALL releases.

Dan Muresan (danmbox) wrote :

Since several bugs with much more details were merged into this one, the most likely early warning of zram problems are messages like

Buffer I/O error on device zram0, logical block 257912 [repeated X times]

You get those right after modprobing / formatting zram. If you don't see those with the new kernel, either things are in good shape or the bugfix didn't work.
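
A quick way to check a kernel log for this early warning is sketched below. The helper names are hypothetical, and the patterns only assume the message format quoted in this thread ("Buffer I/O error on device zramN, logical block M").

```shell
# count_zram_io_errors: read kernel log text on stdin and print how many
# lines report a buffer I/O error on a zram device.
count_zram_io_errors() {
    grep -c 'Buffer I/O error on device zram' || true
}

# zram_error_blocks: read kernel log text on stdin and print each distinct
# "device block" pair once, e.g. "zram0 256349".
zram_error_blocks() {
    sed -n 's/.*Buffer I\/O error on device \(zram[0-9]*\), logical block \([0-9]*\).*/\1 \2/p' | sort -u
}
```

Typical use would be "dmesg | zram_error_blocks" right after modprobing and formatting zram; an empty result means no such message was logged, which by itself does not prove the fix worked.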

Steve Dodd (anarchetic) wrote :

I've lost track of which bugs cause what here, but *not* looking good on precise so far:

steved@xubuntu:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.04.3 LTS
Release: 12.04
Codename: precise
steved@xubuntu:~$ uname -a
Linux xubuntu 3.2.0-54-generic-pae #82-Ubuntu SMP Tue Sep 10 20:29:22 UTC 2013 i686 i686 i386 GNU/Linux
steved@xubuntu:~$ dlocate -s linux-image-3.2.0-54-generic-pae | grep Version
Version: 3.2.0-54.82
steved@xubuntu:~$ dmesg | grep zram
[ 1.557412] zram: module is from the staging directory, the quality is unknown, you have been warned.
[ 1.557638] zram: num_devices not specified. Using default: 1
[ 1.557640] zram: Creating 1 devices ...
[ 1.659040] Adding 1025396k swap on /dev/zram0. Priority:100 extents:1 across:1025396k SS
[ 20.719906] Buffer I/O error on device zram0, logical block 256349
[ 20.719913] Buffer I/O error on device zram0, logical block 256349
[ 20.719963] Buffer I/O error on device zram0, logical block 256349
[ 20.719968] Buffer I/O error on device zram0, logical block 256349
[ 20.719973] Buffer I/O error on device zram0, logical block 256349
[ 20.719977] Buffer I/O error on device zram0, logical block 256349
[ 20.719981] Buffer I/O error on device zram0, logical block 256349
[ 20.720038] Buffer I/O error on device zram0, logical block 256349
[ 20.720043] Buffer I/O error on device zram0, logical block 256349
[ 20.720053] Buffer I/O error on device zram0, logical block 256349

I also see those log messages on Raring 64-bit, but no lock-ups so far since yesterday.

Sep 17 10:24:22 fede-desktop kernel: [ 24.899285] zram: module is from the staging directory, the quality is unknown, you have been warned.
Sep 17 10:24:22 fede-desktop kernel: [ 24.899922] zram: Creating 2 devices ...
Sep 17 10:24:22 fede-desktop kernel: [ 25.185351] quiet_error: 42 callbacks suppressed
Sep 17 10:24:22 fede-desktop kernel: [ 25.185358] Buffer I/O error on device zram0, logical block 192016
Sep 17 10:24:22 fede-desktop kernel: [ 25.185364] Buffer I/O error on device zram0, logical block 192016
Sep 17 10:24:22 fede-desktop kernel: [ 25.185448] Buffer I/O error on device zram0, logical block 192016
Sep 17 10:24:22 fede-desktop kernel: [ 25.185454] Buffer I/O error on device zram0, logical block 192016
Sep 17 10:24:22 fede-desktop kernel: [ 25.185460] Buffer I/O error on device zram0, logical block 192016
Sep 17 10:24:22 fede-desktop kernel: [ 25.185466] Buffer I/O error on device zram0, logical block 192016
Sep 17 10:24:22 fede-desktop kernel: [ 25.185472] Buffer I/O error on device zram0, logical block 192016
Sep 17 10:24:22 fede-desktop kernel: [ 25.185556] Buffer I/O error on device zram0, logical block 192016
Sep 17 10:24:22 fede-desktop kernel: [ 25.185563] Buffer I/O error on device zram0, logical block 192016
Sep 17 10:24:22 fede-desktop kernel: [ 25.185576] Buffer I/O error on device zram0, logical block 192016
Sep 17 10:24:23 fede-desktop kernel: [ 25.237044] Adding 768064k swap on /dev/zram0. Priority:5 extents:1 across:768064k SS
Sep 17 10:24:23 fede-desktop kernel: [ 25.247694] Adding 768064k swap on /dev/zram1. Priority:5 extents:1 across:768064k SS

Kenneth Parker (sea7kenp) wrote :

I'm running Ubuntu Server 12.04, rarely upgraded, due to sensitive work. I did apt-get upgrade (plus direct "apt-get install" of kernel packages, that are held back on "upgrade").

I get the message "buffer i/o error on device zram0, logical block 314223".

Due to this error, I'm leaving it in text mode, while diagnosing the problem. On a whim, I did "fdisk /dev/ram0" under root, and then "p", showing total 314224 sectors, one higher than the number in the error message.

Since I'm not running anything yet, I haven't encountered the System Lockup.

Could I go to the scripts in the ramdisk portion of the bootup and find the one that sets up zram0 and edit it? Perhaps put a lower number in for size of ramdisk, so it doesn't hit Logical Block 314223?

Thank you and best regards,

Kenneth Parker, Troubleshooter

-- I look for trouble and shoot it!

Kenneth Parker (sea7kenp) wrote :

Thank you to dac922 for the suggestion "service zram-config stop". I believe this will be a great interim work around for me, allowing me to use FVWM (which I like, due to the 9 desktops).

I still want to know about what script to edit, to lower the size of zram0, so that it doesn't try to allocate 314224 sectors.

Thank you and best regards,

Kenneth Parker, Seattle, WA

-- I'm a Troubleshooter. I look for Trouble, and Shoot it!

Kenneth Parker (sea7kenp) wrote :

When entering "service zram-config stop", ubuntu said it couldn't find zram-config.

I entered "apt-get install zram-config" under root. As part of its install, it tried to run itself and got the above error messages (10 of the "buffer i/o error on device zram0, logical block 314223") and then gracefully admitted defeat. "fdisk /dev/zram0" stated that /dev/zram0 doesn't exist.

Work-around successful. But it looks like zram-config isn't a default package, at least on Ubuntu 12.04 Server. Should it be?

Thank you and best regards,

Kenneth Parker, Seattle, WA

Kenneth Parker (sea7kenp) wrote :

Looks like a calculation error in "/etc/init/zram-config.conf". How do I find an earlier version of this (i.e. in kernel *.*.27 or before)?

Note that the /etc file will probably only be there if you apt-get install zram-config.

To the person who released a fix: Is a fix to /etc/init/zram-config.conf part of that fix? If so, could the text of that file be posted here (or emailed to me at <email address hidden> if "public disclosure" is a problem)? :-)

Thank you and best regards,

Kenneth Parker, Seattle, WA

dac922 (dac922) wrote :

Kenneth, please read the previous comments. The error you see is fixed upstream with this patch: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=75c7caf5a052ffd8db3312fa7864ee2d142890c4

Luis, Brad:
I tested the kernel 3.8.0-31.46 in raring-proposed and see no freezes (on high ram usage) anymore.
However, now I see the buffer I/O error messages too, so I'm not sure if I really should set the 'verification-raring-done' tag. IMHO it would be better to apply the patch above.

Kenneth Parker (sea7kenp) wrote :

dac922,

First off, my Ubuntu 12.04 LTS Server was just upgraded to Kernel vmlinuz-3.2.0-53-generic-pae. But I see in your post that you're working on kernel 3.8.0-31.46. Am I supposed to upgrade 12.04, running in production to kernel 3.8? Yikes!

In the last paragraph of your comment, you mention seeing the buffer I/O error messages. Shouldn't this mean that bug #1217189 should be re-opened to address that issue? (It's currently got a comment that "it's a duplicate of #1215513").

I still think /etc/init/zram-config.conf should be looked at, for a possible calculation error. (As mentioned in an earlier post by me, this file exists, if you "apt-get install zram-config" as root (or with sudo). To repeat my question, was that configuration file looked at, when you were doing the kernel patch?)

Thank you and best regards,

Kenneth Parker, Seattle, WA

P.S. I'm a Computer Consultant, specializing in Network and Security issues. I could be working on problems like this, except that, as a "general troubleshooter", I'm typically up to my eyeballs in alligators, and prefer to deal in the "big picture", unless there's a good reason for getting into "detailed troubleshooting mode", like having error messages when booting my server. Thanks!

Dan Muresan (danmbox) wrote :

Kenneth, zram-config is a red herring. Even if you purge it completely and set up zram manually,

modprobe zram
echo -n 67108864 > /sys/block/zram0/disksize
mkswap /dev/zram0
swapon /dev/zram0

You still get the error messages and eventually the lock-ups. This is not a script problem, it's a kernel problem.

Kenneth Parker (sea7kenp) wrote :

Hello Dan,

Thank you for your response. I find it somewhat amusing, since I do not use zram0, or the zram system AT ALL on my server. I manually partition the drive, (pretty much since I first started using Linux in the early to mid 1990's), and manually configure a SWAP partition on it (using file type 82). There never was a swap partition on zram0. My concern was finding the buffer I/O errors on my console for the first time (and then seeing them repeated on my text root console (the only thing I trusted, running on what appeared to be a badly broken Server).

In my first comment, I responded to the error message by entering "fdisk /dev/ram0", followed by "p", which showed it as never having been partitioned, or anything (like a SWAP partition) being defined. So it LOOKS like I was never in danger of bug #1215513, but only experiencing the messages, noted in Bug #1217189. I'll add a message on that Bug, saying that, some way or another, the error messages still appear after "fix" of #1215513, which can CERTAINLY alarm production Linux administrators anywhere, and cause loss of money in any case where the Server is anything but something assisting non-profit volunteer work, where they can be told they need to be patient and wait for their services. (Actually, due to an early message by dac922, I was able to disable zram services on a system that has no intention of using them).

Obviously, what happened was that the Kernel 3.8 work, IMPLEMENTING zram0 swap was "back-ported" to the 3.2 Kernel tree, somewhere <= vmlinuz-3.2.0-53-generic-pae!!! :-O

Remember, my system is Ubuntu 12.04 LTS Server, which, in my opinion is not in any condition for "stray zram swap" creation! Fortunately, somebody ported the package zram-config to 12.04, allowing me to get it, which, in effect PERMANENTLY DISABLES the zram0 processes, because they fail, during the install of zram-config, allowing "apt-get upgrade" to state that it was not successful in the "install" of this package.

So I suggest that somebody on this forum (again, I'm too busy administering Linux to normally submit fixes!) put some of this text in some "FAQ file" somewhere, IN CASE somebody ELSE with Ubuntu 12.04 complains about the Bug #1217189 Buffer I/O Error Message.

Forgive my dry humor in this post. As a Linux admin, I'm trained to "roll with the punches" and to consider the humor of life on Planet Earth (and, forgive me for adding) the political system in the USA!!! :-)

And this post will now be "cross posted" to Bug #1217189 with a STRONG suggestion that it NOT BE CLOSED as long as people are going to complain about the Buffer I/O Error messages.

Thank you and best regards,

Kenneth Parker, Seattle, WA

-- I'm a Troubleshooter. I look for Trouble and Shoot It! :-)

No locks after 3 days. Tagged as verification-raring-done.

tags: added: verification-raring-done
removed: verification-needed-raring
Brad Figg (brad-figg) on 2013-09-20
tags: added: verification-done-raring
removed: verification-raring-done
Dan Muresan (danmbox) wrote :

With 3.8.0-31-lowlatency, try this if you're willing to reboot:

rmmod zram; modprobe zram; echo -n 67109999 > /sys/block/zram0/disksize; mkswap /dev/zram0; dmesg -T | tail

It will bring you into a state where you can't rmmod zram (!).

# rmmod zram
ERROR: Removing 'zram': Device or resource busy

I see in dmesg:

BUG: unable to handle kernel paging request at 6364652d

I tried 67109999 (16k+) after determining that with 67108864 I still get "Buffer I/O error on device zram0" at exactly the last zram0 block.

This is a pretty serious problem. It looks like zram-related code is quite unstable. I'm not even sure if it's been fixed on the kernel mainline. A simple backport will probably not solve issues.

Dan Muresan (danmbox) wrote :

@Kenneth Parker: if your problem is that zram shows up EVEN AFTER dpkg --purge zram-config, please **file a separate bug**. Note that "service zram-config stop" stops zram after the system is up, which means it has already potentially damaged your system. What you want is for the zram module to never be modprobe'd at all. There is a blacklist solution in Bug #1217189, comment 45 that seems to work for me.

[ On an unrelated note, thank you for sharing so much relevant information about your job, your skills and so on with us. ]

Kenneth Parker (sea7kenp) wrote :

Dan,

My "issue" is with an old, but "supposedly supported" LTS release, 12.04 Server. SINCE it's a Server release, it "assumes" I know what I'm doing (even to only setting up in Text mode, and "letting" me only "apt-get install" what I ACTUALLY need to install, which means a working X Server and FVWM). (Be honest, folks! How many of you have run that in the last decade? LOL!)

My situation is different than yours, as I'm running a late, "security fix" for Kernel 3.2. As such, there IS NO software, to run zram0, just some sort of boot code, probably in the Kernel. So /dev/zram0 was "sort of" set up, as it was "defined", but not configured. So bug #1217189 MUST be re-opened, for the older Kernels, and because it adversely affects a LTS release!

I posted a message on 1217189, but here is a capsule summary: /dev/zram0 is set up, so I can do fdisk on it, but no partitions exist, because the bootup code is following MY instructions to only use the Disk SWAP I defined for it. HOWEVER, since the messages come up, it's defined, so the actual RAM to run the server is less. So, what I encountered, when I rebooted it (when I was able to get Stand-alone time on it), let the messages come up, bring up FVWM and some of my heavy resource programs, such as Hercules (IBM Mainframe emulator) and Turnkey MVS, which runs an older, "public domain" (even if IBM doesn't like it, they must "abide by it", because there is no copyright code, like there is in, say, MVS/ESA or z/OS, the heavily protected releases).

My experience was heavy use of the Disk SWAP partition, to the level of Thrashing. So, while these Emulators were running (but slowly), I went into Root (which I do, as a Linux Administrator, even though Ubuntu "frowns on it" and "prefers Sudo"! LOL!) And then, I brought down zram0, using instructions, earlier in this forum. GUESS WHAT? It went back to normal, with all RAM access being IN THE RAM, rather than in SWAP. This test went SO FAST, I was able to reboot again, and go back into production, with two hours to spare. [Pat on back. "At-a-boy", but no raise]. :-)

So, if zram0 is coming up, with the messages, but your systems are not freezing, you may still be getting Performance Hits, instead of only "ugly error messages".

I repeat: Please re-open #1217189, for older kernels, which are getting ZRAM kernel code, even if not trying to define the RamDisks.

Thank you and best regards,

Kenneth Parker, Seattle, WA
 -- I'm a Troubleshooter. I Look for Trouble and Shoot it.
   <Joke told to Seattle Police>Don't worry about the "Shoot it" part.
     I do not own a gun, mainly because I could not hit the
     broad side of a barn!</Joke told to Seattle Police>

Kenneth Parker (sea7kenp) wrote :

Sorry, Dan, I misunderstood one part of your response to me. I never had zram-config, until I did "apt-get install zram-config", after reading a work-around, earlier in the forum, by dac922.

zram-config did not fully configure (so is an "incomplete install" on the Apt system). But, it's a MOST convenient way to bring down zram0, on a system that has the Kernel code, but not the most convenient way, based on your post, before the one for me. In it, you mention rmmod zram. The next time I can get "standalone" on this production server (or if I upgrade my netbook that's my "test-bed" for Ubuntu 12.04 Desktop, and try it), I will try rmmod zram, followed by "fdisk /dev/zram0", to see if it exists any more.

Note that, even on the desktop, I'm a "control freak", such that I disabled Lightdm from coming up on boot. (And then, if I WANT the desktop, I type "start lightdm" on root). Does that show the "type" of computer nerd I am? (Coupled by the fact that, when I was happily "bit twiddling", and sharing the "fruits", it was Red Hat 6). :-)

Thanks for this idea, Dan.

Kenneth Parker, Seattle, WA

Dan Muresan (danmbox) wrote :

I've tested 3.8.0-31-generic (linux-image-3.8.0-31-generic = 3.8.0-31.46~precise1). To the admins: when asking us to test "the latest in -proposed", please tell us which package and which version. It makes it easier to interpret the record in the future, when the "latest in proposed" may be different (or when your proposed solution has been removed / replaced)

I'm using the following test now:

echo '4096 * 4096' | bc -l > /sys/block/zram0/disksize
mkfs.ext2 /dev/zram0

And I get this with dmesg -T:

[Sun Sep 22 23:05:40 2013] Buffer I/O error on device zram0, logical block 4095
[Sun Sep 22 23:05:40 2013] lost page write due to I/O error on zram0
[Sun Sep 22 23:05:40 2013] Buffer I/O error on device zram0, logical block 4095

I don't think this is a good kernel. In the terminology of https://lkml.org/lkml/2013/8/13/22, both [1] and [2] should be in the kernel, so that both bug A and bug B are fixed.

I'll check how to find the changelog for a kernel and whether the other -proposed (3.2, 3.5) kernels also have the same problem.

Dan Muresan (danmbox) wrote :

With linux-image-3.5.0-41-generic = 3.5.0-41.64~precise1 there are no Buffer I/O errors, either with mkswap or mkfs.ext2

With linux-image-3.2.0-54-generic = 3.2.0-54.82 the errors are the same as in linux-image-3.8.0-31-generic = 3.8.0-31.46~precise1

So the lts-quantal image is good, the others are not.

Brad Figg (brad-figg) on 2013-09-23
tags: added: verification-done-precise verification-done-quantal
removed: verification-needed-precise verification-needed-quantal
Dan Muresan (danmbox) wrote :

Brad, can you comment on the problems I've shown for 3.2 and 3.8 on precise -lts kernels? Do you want to push through those kernels and then have to deal with separate bugs? I know there's a lot of noise in this thread, but there were some relevant tests and errors.

kitten_geek (kitty-nice) wrote :

Here is a solution:

Run mkswap with "-c":
mkswap -c /dev/zram0

(You can fix this in the zram-config package, if you want :)
cat /etc/init/zram-config.conf | grep mkswap
    mkswap /dev/zram${DEVNUMBER}

you need to change this line to:
    mkswap -c /dev/zram${DEVNUMBER}

Explanation:

See:

modprobe zram zram_num_devices=1

( dmesg | grep zram ) :

[ 979.545213] zram: module is from the staging directory, the quality is unknown, you have been warned.
[ 979.550810] zram: Creating 1 devices ...

Next:

echo $((345*1024*1024)) > /sys/block/zram0/disksize

(dmesg still silent....)

Next:

mkswap /dev/zram0
Setting up swapspace version 1, size = 353276 KiB
no label, UUID=a95522e0-fb27-4dce-8006-3e3cbf9a6f75

(dmesg:)

[ 1115.120823] Buffer I/O error on device zram0, logical block 88319
[ 1115.120838] Buffer I/O error on device zram0, logical block 88319
[ 1115.121111] Buffer I/O error on device zram0, logical block 88319
[ 1115.121145] Buffer I/O error on device zram0, logical block 88319
[ 1115.121166] Buffer I/O error on device zram0, logical block 88319
[ 1115.121191] Buffer I/O error on device zram0, logical block 88319
[ 1115.121212] Buffer I/O error on device zram0, logical block 88319
[ 1115.121347] Buffer I/O error on device zram0, logical block 88319
[ 1115.121376] Buffer I/O error on device zram0, logical block 88319
[ 1115.121431] Buffer I/O error on device zram0, logical block 88319
[ 1127.081652] Buffer I/O error on device zram0, logical block 88319
[ 1127.081671] Buffer I/O error on device zram0, logical block 88319
[ 1127.081952] Buffer I/O error on device zram0, logical block 88319
[ 1127.081987] Buffer I/O error on device zram0, logical block 88319
[ 1127.082007] Buffer I/O error on device zram0, logical block 88319
[ 1127.082028] Buffer I/O error on device zram0, logical block 88319
[ 1127.082046] Buffer I/O error on device zram0, logical block 88319
[ 1127.082183] Buffer I/O error on device zram0, logical block 88319
[ 1127.082211] Buffer I/O error on device zram0, logical block 88319
[ 1127.082265] Buffer I/O error on device zram0, logical block 88319

See! "logical block 88319"! 88319 * 4096 = 361754624; the error is at byte offset 361754624 (one block = 4096 bytes).

I created a device of 361758720 bytes (echo $((345*1024*1024)) returns 361758720).

361758720 (memory for zram) - 361754624 (byte offset of the error) = 4096!

Zram creates the device with a damaged last sector!
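
The same arithmetic can be checked for any disksize with a couple of small shell helpers. This is a sketch only: the function names are invented here, and the 4096-byte block size is taken from the calculation above rather than queried from the device.

```shell
# last_block_index DISKSIZE_BYTES [BLOCK_SIZE_BYTES]
# Print the index of the last block of a device of the given size.
# Block size defaults to 4096 bytes (assumption taken from this thread).
last_block_index() {
    echo $(( $1 / ${2:-4096} - 1 ))
}

# block_offset BLOCK [BLOCK_SIZE_BYTES]
# Print the byte offset at which the given block starts.
block_offset() {
    echo $(( $1 * ${2:-4096} ))
}
```

For the 345 MiB device, "last_block_index $((345*1024*1024))" gives 88319, matching the block in the error messages, and for the 4096 * 4096 byte device used in an earlier comment it gives 4095.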

How to solve it?

mkswap --help:

....
 -c, --check check bad blocks before creating the swap area

mkswap -c /dev/zram0
one bad page
Setting up swapspace version 1, size = 353272 KiB
no label, UUID=bb7a0771-5f00-47b0-9238-723b8cd61b20

Yes, one bad page!

(You can see the difference in size: without the "-c" option, "size = 353276 KiB"; with it, mkswap successfully finds "one bad page" and the size is 4 KiB less: size = 353272 KiB.)

....
swapon /dev/zram0 -p 10

and, dmesg:

[ 1546.782347] Buffer I/O error on device zram0, logical block 88319
[ 1546.782369] Buffer I/O error on device zram0, logical block 88319
[ 1546.782500] Buffer I/O error on device zram0, logical block 88319

....

You can see the time (at the beginning of each "dmesg" line): it is not the time when you ran "swapon", but the time when "mkswap" checked the partition; so your "swap" is safe and stable!

Enjoy!

kitten_geek (kitty-nice) wrote :

And some explanation (using another bug report).
See comment #25 here: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1215513/comments/25

...
[ 1.659040] Adding 1025396k swap on /dev/zram0. Priority:100 extents:1 across:1025396k SS
[ 20.719906] Buffer I/O error on device zram0, logical block 256349
...

Let's try to calculate:

(what a pity, I cannot see bytes, only kilobytes! but...)

(block) 256349 * 4 (four kilobytes per page) = 1025396k (the position of the "bad" sector); as you can see, it is the last sector: "[1.659040] Adding 1025396k swap on /dev/zram0"

The time of this error can be random... When the OS tries to use this "last zram sector", it freezes or something else happens...
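
The comparison between the "Adding ...k swap" line and the failing block can be written as a small check. This is a hedged sketch: is_last_page is a made-up name, the 4 KiB page size is the one used in the calculation above, and it assumes the reported swap size excludes the one-page swap header (consistent with mkswap reporting 353276 KiB for a 345 MiB device elsewhere in this thread).

```shell
# is_last_page SWAP_KIB BLOCK [PAGE_KIB]
# Print "yes" if BLOCK starts exactly SWAP_KIB kilobytes into the device,
# i.e. it is the final page of a device whose usable swap area is SWAP_KIB
# (device size = swap area + one header page); otherwise print "no".
is_last_page() {
    page_kib=${3:-4}
    if [ $(( $2 * page_kib )) -eq "$1" ]; then
        echo yes
    else
        echo no
    fi
}
```

With the numbers from comment #25, "is_last_page 1025396 256349" prints yes; the Raring log earlier in this report (768064k swap, block 192016) gives the same answer.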

kitten_geek (kitty-nice) wrote :

here is solution:

run mkswap with "-c"
mkswap -c /dev/zram0

(you can fix this in package zram-config, if you want :)
cat /etc/init/zram-config.conf | grep mkswap
    mkswap /dev/zram${DEVNUMBER}

you need to change this line to:
    mkswap -c /dev/zram${DEVNUMBER}

Explanation:

See:

modprobe zram zram_num_devices=1

( dmesg | grep zram ) :

[ 979.545213] zram: module is from the staging directory, the quality is unknown, you have been warned.
[ 979.550810] zram: Creating 1 devices ...

Next:

echo $((345*1024*1024)) > /sys/block/zram0/disksize

(dmesg still silent....)

Next:

mkswap /dev/zram0
Setting up swapspace version 1, size = 353276 KiB
no label, UUID=a95522e0-fb27-4dce-8006-3e3cbf9a6f75

(dmesg:)

[ 1115.120823] Buffer I/O error on device zram0, logical block 88319
[ 1115.120838] Buffer I/O error on device zram0, logical block 88319
[ 1115.121111] Buffer I/O error on device zram0, logical block 88319
[ 1115.121145] Buffer I/O error on device zram0, logical block 88319
[ 1115.121166] Buffer I/O error on device zram0, logical block 88319
....

See! "logical block 88319"! 88319 * 4096 = 361754624, so the error is at byte offset 361754624 (one block = 4096 bytes).

I created a device of 361758720 bytes (echo $((345*1024*1024)) returns 361758720).

361758720 (size of the zram device) - 361754624 (offset of the failing block) = 4096!

zram creates the device with a damaged last sector!
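
The same arithmetic as a small shell sketch, assuming 4096-byte blocks as in the dmesg output above. Both function names are invented for this example.

```shell
#!/bin/sh
# Hypothetical helpers for the arithmetic in this comment.

# Index of the last block on a device of the given size in bytes.
last_block_index() {
    size_bytes=$1
    block_size=${2:-4096}
    echo $(( size_bytes / block_size - 1 ))
}

# Byte offset at which a given block starts.
block_byte_offset() {
    block=$1
    block_size=${2:-4096}
    echo $(( block * block_size ))
}

disksize=$(( 345 * 1024 * 1024 ))    # the value written to disksize above
last_block_index "$disksize"         # the block named in the I/O errors
block_byte_offset 88319              # where that block starts
```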

How to solve it?

mkswap --help:

....
 -c, --check check bad blocks before creating the swap area

mkswap -c /dev/zram0
one bad page
Setting up swapspace version 1, size = 353272 KiB
no label, UUID=bb7a0771-5f00-47b0-9238-723b8cd61b20

Yes, one bad page!

(You can see the difference in size: without the "-c" option, "size = 353276 KiB"; with it, mkswap successfully finds "one bad page" and the size is 4 KiB smaller:
size = 353272 KiB)

....
swapon /dev/zram0 -p 10

and, dmesg:

[ 1546.782347] Buffer I/O error on device zram0, logical block 88319
[ 1546.782369] Buffer I/O error on device zram0, logical block 88319
[ 1546.782500] Buffer I/O error on device zram0, logical block 88319

....

Look at the timestamps at the beginning of the dmesg lines: they are not from when you ran "swapon" but from when "mkswap" checked the device. So your swap is safe and stable!

Enjoy!


Kenneth Parker (sea7kenp) wrote :

I love cats and, especially Kittens, but (Kitten_Geek), could you PLEASE give an "executive summary" of your last three posts, in Command Line System Admin language (as opposed to Kernel Hacker language)? Thanks most Kindly! Kenneth Parker, Seattle, WA

Kenneth Parker (sea7kenp) wrote :

kitten_geek, would it help if I [figuratively] stroke your back? :-)

One line of a summary I recently published on # 1217189:

    > zram: There is little point creating a zram of greater than twice the size of
    > memory since we expect a 2:1 compression ratio. Note that zram uses about
    > 0.1% of the size of the disk when not in use so a huge zram is wasteful.

It seems that the issue might be in the initial "guess" at the size of the Ram Disk, right?

Thank you and best regards,

Kenneth Parker, an EXTREMELY overworked Linux 12.0.4 LTS Server Admin, here in Seattle, WA

Kenneth Parker (sea7kenp) wrote :

Sorry, but what is the line length, to avoid word wrap?

kitten_geek (kitty-nice) wrote :

I'm sorry, my last posts are duplicates; can somebody delete at least comment 45? (It would be a good idea to make it possible to edit or delete posts...)

Mr. Kenneth Parker,
      > It seems that the issue might be in the initial "guess" at the size of the Ram Disk, right?
I'm not sure; I am not a specialist like you....
      > "executive summary" of your last three posts, in Command Line System Admin language (as opposed to Kernel Hacker language)?

Sorry, I'm not a professional.... I will try to be brief:

The last 4096 bytes of /dev/zram0 seem to be broken, so you need to check the device after swapon, for example with:

# mkswap -c /dev/zram0

if you use zram-config, change in cat /etc/init/zram-config.conf line

mkswap /dev/zram${DEVNUMBER}

 to

mkswap -c /dev/zram${DEVNUMBER}

#####

But anyway, software can "eat" your memory faster than the kernel can swap it out from RAM to /dev/zram....

kitten_geek (kitty-nice) wrote :

Sorry again! some typo: "before swapon". And "change in /etc/init/zram-config.conf"....

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.2.0-54.82

---------------
linux (3.2.0-54.82) precise; urgency=low

  [Steve Conklin]

  * Release Tracking Bug
    - LP: #1223490

  [ Upstream Kernel Changes ]

  * Revert "zram: use zram->lock to protect zram_free_page() in swap free
    notify path"
    - LP: #1215513
  * x86 thermal: Delete power-limit-notification console messages
    - LP: #1215748
  * x86 thermal: Disable power limit notification interrupt by default
    - LP: #1215748
  * ARM: 7810/1: perf: Fix array out of bounds access in
    armpmu_map_hw_event()
    - LP: #1216442
    - CVE-2013-4254
  * ARM: 7809/1: perf: fix event validation for software group leaders
    - LP: #1216442
    - CVE-2013-4254
  * xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end
    - LP: #1151527
    - CVE-2013-1819
  * cifs: don't instantiate new dentries in readdir for inodes that need to
    be revalidated immediately
    - LP: #1222442
 -- Steve Conklin <email address hidden> Tue, 10 Sep 2013 12:54:53 -0500

Changed in linux (Ubuntu Precise):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.5.0-41.64

---------------
linux (3.5.0-41.64) quantal; urgency=low

  [Brad Figg]

  * Release Tracking Bug
    - LP: #1223451

  [ Upstream Kernel Changes ]

  * kernel-doc: bugfix - multi-line macros
    - LP: #1223920
  * Revert "zram: use zram->lock to protect zram_free_page() in swap free
    notify path"
    - LP: #1215513
  * x86 thermal: Delete power-limit-notification console messages
    - LP: #1215748
  * x86 thermal: Disable power limit notification interrupt by default
    - LP: #1215748
  * ARM: 7810/1: perf: Fix array out of bounds access in
    armpmu_map_hw_event()
    - LP: #1216442
    - CVE-2013-4254
  * ARM: 7809/1: perf: fix event validation for software group leaders
    - LP: #1216442
    - CVE-2013-4254
  * veth: reduce stat overhead
    - LP: #1201869
  * veth: extend device features
    - LP: #1201869
  * veth: avoid a NULL deref in veth_stats_one
    - LP: #1201869
  * veth: fix a NULL deref in netif_carrier_off
    - LP: #1201869
  * veth: fix NULL dereference in veth_dellink()
    - LP: #1201869
  * Bluetooth: Add support for Atheros [0cf3:3121]
    - LP: #1202477
  * efivars: explicitly calculate length of VariableName
    - LP: #1217745
  * xfs: fix _xfs_buf_find oops on blocks beyond the filesystem end
    - LP: #1151527
    - CVE-2013-1819
  * drm/i915/lvds: ditch ->prepare special case
    - LP: #1221791
  * serial: mxs: fix buffer overflow
    - LP: #1221791
  * fs/proc/task_mmu.c: fix buffer overflow in add_page_map()
    - LP: #1221791
  * af_key: initialize satype in key_notify_policy_flush()
    - LP: #1221791
  * vm: add no-mmu vm_iomap_memory() stub
    - LP: #1221791
  * iwl4965: set power mode early
    - LP: #1221791
  * iwl4965: reset firmware after rfkill off
    - LP: #1221791
  * ASoC: cs42l52: Reorder Min/Max and update to SX_TLV for Beep Volume
    - LP: #1221791
  * can: pcan_usb: fix wrong memcpy() bytes length
    - LP: #1221791
  * ALSA: 6fire: make buffers DMA-able (pcm)
    - LP: #1221791
  * ALSA: 6fire: make buffers DMA-able (midi)
    - LP: #1221791
  * jbd2: Fix use after free after error in jbd2_journal_dirty_metadata()
    - LP: #1221791
  * USB-Serial: Fix error handling of usb_wwan
    - LP: #1221791
  * USB: mos7840: fix big-endian probe
    - LP: #1221791
  * USB: adutux: fix big-endian device-type reporting
    - LP: #1221791
  * USB: ti_usb_3410_5052: fix big-endian firmware handling
    - LP: #1221791
  * m68k/atari: ARAnyM - Fix NatFeat module support
    - LP: #1221791
  * m68k: Truncate base in do_div()
    - LP: #1221791
  * usb: add two quirky touchscreen
    - LP: #1221791
  * USB: mos7720: fix broken control requests
    - LP: #1221791
  * USB: keyspan: fix null-deref at disconnect and release
    - LP: #1221791
  * MIPS: Expose missing pci_io{map,unmap} declarations
    - LP: #1221791
  * microblaze: Update microblaze defconfigs
    - LP: #1221791
  * sound: Fix make allmodconfig on MIPS
    - LP: #1221791
  * sound: Fix make allmodconfig on MIPS correctly
    - LP: #1221791
  * alpha: makefile: don't enforce small data model for kernel builds
    - LP: #1221791
  * MIPS: Rewrite pfn_valid to work in modules, too.
    - L...


Changed in linux (Ubuntu Quantal):
status: Fix Committed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.8.0-31.46

---------------
linux (3.8.0-31.46) raring; urgency=low

  [ Brad Figg ]

  * Release Tracking Bug
    - LP: #1223406

  * UBUNTU: [Config] KUSER_HELPERS=y for armhf

  [ Upstream Kernel Changes ]

  * Revert "cpuidle: Quickly notice prediction failure in general case"
    - LP: #1221794
  * Revert "cpuidle: Quickly notice prediction failure for repeat mode"
    - LP: #1221794
  * Revert "zram: use zram->lock to protect zram_free_page() in swap free
    notify path"
    - LP: #1215513
  * x86 thermal: Delete power-limit-notification console messages
    - LP: #1215748
  * x86 thermal: Disable power limit notification interrupt by default
    - LP: #1215748
  * mwifiex: do not create AP and P2P interfaces upon driver loading
    - LP: #1212720
  * ARM: 7810/1: perf: Fix array out of bounds access in
    armpmu_map_hw_event()
    - LP: #1216442
    - CVE-2013-4254
  * ARM: 7809/1: perf: fix event validation for software group leaders
    - LP: #1216442
    - CVE-2013-4254
  * veth: reduce stat overhead
    - LP: #1201869
  * veth: extend device features
    - LP: #1201869
  * veth: avoid a NULL deref in veth_stats_one
    - LP: #1201869
  * veth: fix a NULL deref in netif_carrier_off
    - LP: #1201869
  * veth: fix NULL dereference in veth_dellink()
    - LP: #1201869
  * Bluetooth: Add support for Atheros [0cf3:3121]
    - LP: #1202477
  * uvcvideo: quirk PROBE_DEF for Dell SP2008WFP monitor.
    - LP: #1217957
  * usb: dwc3: gadget: don't prevent gadget from being probed if we fail
    - LP: #1221794
  * usb: dwc3: fix wrong bit mask in dwc3_event_type
    - LP: #1221794
  * ASoC: max98088 - fix element type of the register cache.
    - LP: #1221794
  * ata: Fix DVD not dectected at some platform with Wellsburg PCH
    - LP: #1221794
  * Tools: hv: KVP: Fix a bug in IPV6 subnet enumeration
    - LP: #1221794
  * ALSA: usb-audio: 6fire: return correct XRUN indication
    - LP: #1221794
  * usb: serial: cp210x: Add USB ID for Netgear Switches embedded serial
    adapter
    - LP: #1221794
  * USB: storage: Add MicroVault Flash Drive to unusual_devs
    - LP: #1221794
  * USB: misc: Add Manhattan Hi-Speed USB DVI Converter to sisusbvga
    - LP: #1221794
  * USB: option: append Petatel NP10T device to GSM modems list
    - LP: #1221794
  * usb: cp210x support SEL C662 Vendor/Device
    - LP: #1221794
  * USB: cp210x: add MMB and PI ZigBee USB Device Support
    - LP: #1221794
  * USB: EHCI: Fix resume signalling on remote wakeup
    - LP: #1221794
  * drm/radeon: fix endian issues with DP handling (v3)
    - LP: #1221794
  * drm/radeon: Another card with wrong primary dac adj
    - LP: #1221794
  * drm/radeon: improve dac adjust heuristics for legacy pdac
    - LP: #1221794
  * drm/radeon: fix combios tables on older cards
    - LP: #1221794
  * ARM: footbridge: fix overlapping PCI mappings
    - LP: #1221794
  * [SCSI] isci: Fix a race condition in the SSP task management path
    - LP: #1221794
  * [SCSI] qla2xxx: Properly set the tagging for commands.
    - LP: #1221794
  * [SCSI] sd: fix crash when UA received on DIF enabled device
    - LP: #1221794
  * nfsd: nfsd_open: when dentry_ope...

Changed in linux (Ubuntu Raring):
status: Fix Committed → Fix Released
Gsus (jeguiluz) wrote :

I had the same problem, but after I installed 3.8.36.41 the computer no longer freezes. However, it now powers off instead: yes, it powers off when it is under heavy RAM usage. I disabled zram ("sudo service zram-config stop") and the computer does not power off in the same scenario.

I suppose there is still a bug in zram.

I am including the syslog for reference.

Regards

Oct 1 15:13:54 jeguiluz kernel: [ 2701.906285] Adding 6136824k swap on /dev/sdb2. Priority:-1 extents:1 across:6136824k
Oct 1 15:14:10 jeguiluz kernel: [ 2717.244280] zram: module is from the staging directory, the quality is unknown, you have been warned.
Oct 1 15:14:10 jeguiluz kernel: [ 2717.244898] zram: Creating 2 devices ...
Oct 1 15:14:10 jeguiluz kernel: [ 2717.332517] Adding 1205796k swap on /dev/zram0. Priority:5 extents:1 across:1205796k SS
Oct 1 15:14:10 jeguiluz kernel: [ 2717.332886] Buffer I/O error on device zram0, logical block 301449
Oct 1 15:14:10 jeguiluz kernel: [ 2717.332906] Buffer I/O error on device zram0, logical block 301449
Oct 1 15:14:10 jeguiluz kernel: [ 2717.333088] Buffer I/O error on device zram0, logical block 301449
Oct 1 15:14:10 jeguiluz kernel: [ 2717.333100] Buffer I/O error on device zram0, logical block 301449
Oct 1 15:14:10 jeguiluz kernel: [ 2717.333110] Buffer I/O error on device zram0, logical block 301449
Oct 1 15:14:10 jeguiluz kernel: [ 2717.333120] Buffer I/O error on device zram0, logical block 301449
Oct 1 15:14:10 jeguiluz kernel: [ 2717.333134] Buffer I/O error on device zram0, logical block 301449
Oct 1 15:14:10 jeguiluz kernel: [ 2717.333204] Buffer I/O error on device zram0, logical block 301449
Oct 1 15:14:10 jeguiluz kernel: [ 2717.333215] Buffer I/O error on device zram0, logical block 301449
Oct 1 15:14:10 jeguiluz kernel: [ 2717.333240] Buffer I/O error on device zram0, logical block 301449
Oct 1 15:14:10 jeguiluz kernel: [ 2717.349381] Adding 1205796k swap on /dev/zram1. Priority:5 extents:1 across:1205796k SS
Oct 1 15:14:39 jeguiluz kernel: [ 2746.194838] CPUM: APIC 01 at 00000000fee00000 (mapped at ffffc9001119c000) - ver 0x80050010, lint0=0x10700 lint1=0x10400 pc=0x00400 thmr=0x10000
Oct 1 15:14:39 jeguiluz kernel: [ 2746.194885] CPUM: APIC 00 at 00000000fee00000 (mapped at ffffc9001119a000) - ver 0x80050010, lint0=0x10700 lint1=0x00400 pc=0x00400 thmr=0x10000
Oct 1 15:14:40 jeguiluz kernel: [ 2747.680768] device eth0 entered promiscuous mode
Oct 1 15:17:05 jeguiluz CRON[4906]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 1 15:17:29 jeguiluz kernel: [ 2916.054517] Write-error on swap-device (251:0:2411592)
Oct 1 15:17:30 jeguiluz kernel: [ 2917.495329] Write-error on swap-device (251:0:2411592)
Oct 1 15:17:32 jeguiluz kernel: [ 2919.075216] Write-error on swap-device (251:0:2411592)
Oct 1 15:17:34 jeguiluz kernel: [ 2921.122104] Write-error on swap-device (251:0:2411592)
Oct 1 15:17:35 jeguiluz kernel: [ 2922.737142] Write-error on swap-device (251:0:2411592)
Oct 1 15:17:36 jeguiluz kernel: [ 2923.524709] Write-error on swap-device (251:0:2411592)
Oct 1 15:17:37 jeguiluz kernel: [ 2924.541654] Write-error on swap-device (251:1:2411592)
Oct 1 15:17:38 jegu...


Posts #51 and #53 are wrong: this bug is NOT FIXED in either 3.2.0-54.82 or 3.8.0-31.46 from the 12.04 (precise) repositories. There might be something fixed about zram in there (as suggested by the changelog), but the last sector of /dev/zram0 is still bad (as reported in #43), and this will lead to crashes if /dev/zram0 is used 'blindly' to create a swap partition.

NOTE1: Before all, if you're running one of these kernels ('uname -a', last number is shown later in the line) and if 'swapon -s' shows /dev/zram0, do immediately a 'sudo swapoff /dev/zram0'. Otherwise your machine can crash at any time.

NOTE2: Potentially all 12.04 & derivative installations that include the package lupin-casper (and have not altered the zram configuration) will experience the bug.

I performed a clean install of Mint 13, followed by a switch to the 12.04.3 stack (as explained here: https://wiki.ubuntu.com/Kernel/LTSEnablementStack). I also switched an Ubuntu 12.04.2 to 12.04.3. Here are my findings. If you only want a summary of workarounds, scroll to the bottom. Ref:
http://askubuntu.com/questions/346545/how-to-detect-which-init-script-starts-zram

FINDINGS:

1. Package versions: A Mint 13 clean install is based on 12.04.(0 or 1). Using updates only gets you up to 3.2.0-54.82. A Mint 13 switched to the 12.04.3 stack gets you up to 3.8.0-31.46. An Ubuntu 12.04.2 switched to 12.04.3 gets you up to 3.8.0-31.46 as well.

2. zram is broken in both 3.2.0-54.82 and 3.8.0-31.46, as explained in #43: the last sector of /dev/zram0 is bad. However, you might not notice this because zram is not always used by default (see 3 below). To check zram is broken, first make sure it's loaded (but not for swapping!! see NOTE1 above):

  lsmod | grep zram

If it doesn't show, do this to create a 1GB zram:

  sudo modprobe zram
  echo '1024^3' | bc | sudo tee /sys/block/zram0/disksize

With zram loaded, run:

  sudo dd if=/dev/zram0 of=/dev/null bs=4096

I get:
dd: reading `/dev/zram0': Input/output error
262143+0 records in
262143+0 records out
1073737728 bytes (1.1 GB) copied, 0.400378 s, 2.7 GB/s

Compare the 262143 above with:

  sudo fdisk -l /dev/zram0 | grep -o "total [0-9]* sectors"

I get:
total 262144 sectors
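
Instead of reading the whole device, a single block can be probed with dd's skip= and count= options. This is a sketch under the assumption that the bad block is the last one (index 262143 for the 1GB device above); the function name is hypothetical, and reading /dev/zram0 needs root.

```shell
#!/bin/sh
# Hypothetical helper: try to read exactly one block from a device or file.
# Returns dd's exit status, which is non-zero when the read fails.
probe_block() {
    dev=$1
    index=$2
    block_size=${3:-4096}
    dd if="$dev" of=/dev/null bs="$block_size" skip="$index" count=1 2>/dev/null
}

# Example (as root), for the 262144-sector device shown above:
#   probe_block /dev/zram0 262143 && echo "readable" || echo "read failed"
```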

3. zram usage: The zram-config package is NOT installed by default by either Mint 13 or Ubuntu 12.04. Regardless of that(!), in some configurations, zram is loaded by the initramfs image. By default, Mint 13 DOES use zram, whereas Ubuntu 12.04 DOES NOT use zram.
- check if module loaded: 'lsmod | grep zram'
- check if zram is used for swap: 'swapon -s | grep zram' (if it is, see NOTE1 above)
- check if zram-config package installed: 'dpkg -l zram-config'
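
The swap check in the list above can be scripted. A minimal sketch, assuming the usual /proc/swaps layout in which the first column is the device path; the function name is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print every active swap device that is a zram device.
# Reads /proc/swaps by default; another file in the same format can be
# passed as the first argument.
list_zram_swaps() {
    awk '$1 ~ /^\/dev\/zram[0-9]+$/ { print $1 }' "${1:-/proc/swaps}"
}

# Example, following NOTE1 (as root):
#   for dev in $(list_zram_swaps); do swapoff "$dev"; done
```

Matching on the first column avoids catching the header line or a disk swap whose path merely contains similar text.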

4. If zram is loaded by initramfs, the zram-config package is useless. By the time /etc/init/zram-config.conf runs, the zram module is already loaded, so 'modprobe' won't do anything, and setting the size by writing to /sys/block/zram0/disksize will not work. Thus, the solution in #43/#45/#49 won't work. If you really want zram-config to take over zram allocation, you need to add:

  # remove swaps
  for dev in $(swapon -s | grep /dev/zram | cut -d ' ' -f 1); do
    swapoff $dev || exit 1
  done
  # remove module
 ...


For 1a, also uninstall zram-config if present.

I stand corrected - maybe. I wrote a program that allocates, sets, and moves around more memory than my RAM. I added the zram0 as a swap, with the bad sector included ('mkswap' without '-c'), and with higher priority than the disk swap. I ran the program once only, because it takes half an hour when it starts moving pages around. The program itself did not crash. I saw several 'Write error on swap device' messages in /var/log/kern.log. I don't know if anything else (like, a background program) crashed.

So it's possible the kernel now detects bad pages on swap devices and does not crash when it encounters them. I don't know how to test this assertion, though. The /dev/zram0 device still contains a bad 4K block at the end, but maybe it no longer matters.
