iostat: Cannot open /proc/stat: Cannot allocate memory

Bug #1226172 reported by Ron
This bug affects 3 people

Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned

Bug Description

I intermittently receive an error when running "iostat -k 5": "Cannot open /proc/stat: Cannot allocate memory"

It can also be reproduced simply by running "cat /proc/stat" one to several (~5-10) times.
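
For anyone trying to reproduce this without retyping the command, here is a minimal Python sketch that repeats the open-and-read that "cat /proc/stat" performs and tallies how many attempts fail with ENOMEM. The helper name probe_open and the default of 10 attempts are illustrative assumptions, not part of the original report; the script only counts failures and does not diagnose their cause.

```python
import errno
import os


def probe_open(path="/proc/stat", attempts=10):
    """Open and fully read `path` `attempts` times, tallying the outcomes.

    Mirrors running "cat /proc/stat" repeatedly. A failed open() with
    ENOMEM surfaces in Python as an OSError whose errno is errno.ENOMEM.
    """
    tally = {"ok": 0, "enomem": 0, "other": 0}
    for _ in range(attempts):
        try:
            with open(path, "rb") as f:
                f.read()
            tally["ok"] += 1
        except OSError as exc:
            if exc.errno == errno.ENOMEM:
                tally["enomem"] += 1
            else:
                tally["other"] += 1
    return tally


if __name__ == "__main__":
    # Only meaningful on Linux, where /proc/stat exists.
    if os.path.exists("/proc/stat"):
        print(probe_open())
```

On an affected host one would expect a non-zero "enomem" count once the problem has started; on a healthy host every attempt should land in "ok".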

strace of the "cat /proc/stat" reproduction:
...typical start-up calls omitted...

fstat(3, {st_mode=S_IFREG|0644, st_size=1607664, ...}) = 0
mmap(NULL, 1607664, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fd755b77000
close(3) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 5), ...}) = 0
open("/proc/stat", O_RDONLY) = -1 ENOMEM (Cannot allocate memory)
write(2, "cat: ", 5) = 5
write(2, "/proc/stat", 10) = 10
open("/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2570, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fd755d09000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2570
read(3, "", 4096) = 0
close(3) = 0
munmap(0x7fd755d09000, 4096) = 0
open("/usr/share/locale/en_US.UTF-8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/en_US.utf8/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)

...remaining error-reporting calls (via LC_MESSAGES/libc.mo) omitted...
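
When a full strace log is long, the failing call is easy to miss among the locale lookups. A small Python sketch that pulls out only the syscalls that returned -1 with a given errno could help. It assumes plain strace output as shown above (no -f pid or -tt timestamp prefixes); the function name failed_calls is hypothetical.

```python
import re

# One plain strace line (no -f/-tt prefixes) whose syscall returned -1,
# e.g. open("/proc/stat", O_RDONLY) = -1 ENOMEM (Cannot allocate memory)
FAILED_CALL = re.compile(
    r"^(?P<call>\w+)\((?P<args>.*)\)\s+=\s+-1\s+(?P<errno>E[A-Z0-9]+)\b"
)


def failed_calls(trace_text, wanted="ENOMEM"):
    """Return (syscall, args) pairs for trace lines that failed with `wanted`."""
    hits = []
    for line in trace_text.splitlines():
        match = FAILED_CALL.match(line.strip())
        if match and match.group("errno") == wanted:
            hits.append((match.group("call"), match.group("args")))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python3 failed_calls.py trace.txt
    with open(sys.argv[1]) as fh:
        for call, args in failed_calls(fh.read()):
            print(f"{call}({args})")
```

Passing "ENOENT" instead of the default would list the expected libc.mo lookups, which makes it easy to confirm that the only ENOMEM in the trace is the open() of /proc/stat.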

/proc/meminfo detail:
# cat /proc/meminfo
MemTotal: 32947584 kB
MemFree: 302664 kB
Buffers: 154064 kB
Cached: 21284268 kB
SwapCached: 1728840 kB
Active: 12576792 kB
Inactive: 17279896 kB
Active(anon): 5097984 kB
Inactive(anon): 3320844 kB
Active(file): 7478808 kB
Inactive(file): 13959052 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 187617276 kB
SwapFree: 185330320 kB
Dirty: 148 kB
Writeback: 0 kB
AnonPages: 7048592 kB
Mapped: 104312 kB
Shmem: 604 kB
Slab: 1947068 kB
SReclaimable: 1523288 kB
SUnreclaim: 423780 kB
KernelStack: 8800 kB
PageTables: 38260 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 204091068 kB
Committed_AS: 15333712 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 163292 kB
VmallocChunk: 34359457992 kB
HardwareCorrupted: 0 kB
AnonHugePages: 38912 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 333824 kB
DirectMap2M: 14313472 kB
DirectMap1G: 18874368 kB
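
To put the dump above into perspective, a short Python sketch can parse /proc/meminfo-style text and compute two rough figures: MemFree plus Buffers plus Cached, and Committed_AS as a fraction of CommitLimit. The helper names are hypothetical, and the "free plus cache" figure is only an approximation; it is not the kernel's MemAvailable, which does not appear in this dump.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Fields with a "kB" unit keep their kB value; unitless fields such as
    HugePages_Total are plain page counts. Lines without a colon are skipped.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[key.strip()] = int(parts[0])
    return fields


def summarize(fields):
    """Two rough figures from the dump; not the kernel's own MemAvailable."""
    free_plus_cache_kb = fields["MemFree"] + fields["Buffers"] + fields["Cached"]
    commit_ratio = fields["Committed_AS"] / fields["CommitLimit"]
    return {"free_plus_cache_kB": free_plus_cache_kb, "commit_ratio": commit_ratio}


if __name__ == "__main__":
    with open("/proc/meminfo") as fh:
        print(summarize(parse_meminfo(fh.read())))
```

Applied to the values above, MemFree + Buffers + Cached comes to 21740996 kB (about 20.7 GiB), and Committed_AS is about 7.5% of CommitLimit.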

This only seems to happen on specific hardware: an Intel Server System P4000CP (S2600CP motherboard) booted with the kernel option "pci=conf1".

It does NOT happen on very similar hardware: an Intel Server R2312GZ (S2600GZ motherboard).

Is there a memory setting, or a kernel-memory sysctl, that I'm missing here?
---
AlsaDevices:
 total 0
 crw-rw---T 1 root audio 116, 1 Aug 23 07:12 seq
 crw-rw---T 1 root audio 116, 33 Aug 23 07:12 timer
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.9.2-0ubuntu8.3
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 13.04
HibernationDevice: RESUME=UUID=98374fad-7836-4390-b311-614da0ca473c
InstallationDate: Installed on 2013-02-17 (210 days ago)
InstallationMedia: Ubuntu-Server 12.10 "Quantal Quetzal" - Release amd64 (20121017.2)
Lsusb:
 Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 002 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 003: ID 046b:ff10 American Megatrends, Inc. Virtual Keyboard and Mouse
MachineType: Intel Corporation S2600CP ..........
MarkForUpload: True
Package: linux 3.8.0.30.48
PackageArchitecture: amd64
PciMultimedia:

ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB:

ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.8.0-30-generic root=/dev/mapper/h01vg-root ro pci=conf1 intel_iommu=on
ProcVersionSignature: Ubuntu 3.8.0-30.44-generic 3.8.13.6
RelatedPackageVersions:
 linux-restricted-modules-3.8.0-30-generic N/A
 linux-backports-modules-3.8.0-30-generic N/A
 linux-firmware 1.106
RfKill: Error: [Errno 2] No such file or directory
Tags: raring
Uname: Linux 3.8.0-30-generic x86_64
UpgradeStatus: Upgraded to raring on 2013-05-07 (132 days ago)
UserGroups:

dmi.bios.date: 02/26/2013
dmi.bios.vendor: Intel Corp.
dmi.bios.version: SE5C600.86B.01.08.0003.022620131521
dmi.board.asset.tag: ....................
dmi.board.name: S2600CP
dmi.board.vendor: Intel Corporation
dmi.board.version: E99552-507
dmi.chassis.asset.tag: ....................
dmi.chassis.type: 17
dmi.chassis.vendor: ..............................
dmi.chassis.version: ..................
dmi.modalias: dmi:bvnIntelCorp.:bvrSE5C600.86B.01.08.0003.022620131521:bd02/26/2013:svnIntelCorporation:pnS2600CP..........:pvr....................:rvnIntelCorporation:rnS2600CP:rvrE99552-507:cvn..............................:ct17:cvr..................:
dmi.product.name: S2600CP ..........
dmi.product.version: ....................
dmi.sys.vendor: Intel Corporation

Ron (ron-neversleep) wrote :

Wrong package assigned when the bug was created. I believe /proc lies within base-files.

affects: procps (Ubuntu) → base-files (Ubuntu)
Steve Langasek (vorlon)
affects: base-files (Ubuntu) → sysstat (Ubuntu)
affects: sysstat (Ubuntu) → linux (Ubuntu)
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1226172

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in linux (Ubuntu):
importance: Undecided → Medium
tags: added: kernel-da-key
Ron (ron-neversleep) wrote : BootDmesg.txt

apport information

tags: added: apport-collected raring
description: updated
Ron (ron-neversleep) wrote : CurrentDmesg.txt

apport information

Ron (ron-neversleep) wrote : Dependencies.txt

apport information

Ron (ron-neversleep) wrote : IwConfig.txt

apport information

Ron (ron-neversleep) wrote : Lspci.txt

apport information

Ron (ron-neversleep) wrote : ProcCpuinfo.txt

apport information

Ron (ron-neversleep) wrote : ProcInterrupts.txt

apport information

Ron (ron-neversleep) wrote : ProcModules.txt

apport information

Ron (ron-neversleep) wrote : UdevDb.txt

apport information

Ron (ron-neversleep) wrote : UdevLog.txt

apport information

Ron (ron-neversleep) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Ron (ron-neversleep) wrote :

Additional memory-allocation failures gathered from syslog are attached. I have trimmed this failure data, as the syslog contains 10MB of crash text.

These failures are being triggered by irqbalance, snmpd, vnstatd, etc.: seemingly any application that relies on /proc.

I have also included a similar crash from a second system with identical hardware and the same kernel.

Ron (ron-neversleep) wrote :
penalvch (penalvch)
tags: added: bios-outdated-02.01.0002 needs-upstream-testing regression-potential
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Ron (ron-neversleep) wrote :

Yes, I'm running the most recent BIOS. This new 02.01.0002 BIOS was released on 2013/09/20, after the initial problem report, and the /proc/stat issue persists on it as well.

My current dmidecode output is:
SE5C600.86B.02.01.0002.082220131453 \ 08/22/2013

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
penalvch (penalvch) wrote :

Ron, could you please test the latest upstream kernel available (not the daily folder, but the one all the way at the bottom) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested. For example:
kernel-fixed-upstream-v3.12-rc7

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description. As well, please remove the tag:
needs-upstream-testing

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

As well, please remove the tag:
needs-upstream-testing

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

tags: added: latest-bios-02.01.0002
removed: bios-outdated-02.01.0002
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Ron (ron-neversleep) wrote :

Christopher,

1 day 23 hours ago, I installed the 3.12.0-999 nightly kernel dated 201311110456.

At ~28h uptime, after programs and buffers/cache had used all 64GB of RAM, the problem recurred.

As usual, it is now very persistent, with backtraces against snmpd, and it can also be triggered quickly from the console by reading /proc/stat.

tags: added: kernel-bug-exists-upstream-3.12.0-999.201311110456
removed: needs-upstream-testing
Ron (ron-neversleep) wrote :

Correction: THIS server has 32GB, not 64GB as stated in my previous post.

penalvch (penalvch) wrote :

Ron, did this problem not occur in a release prior to Raring?

Also, for mainline testing, please make sure you are not testing the daily folder (what you called nightly) but the one at the bottom of the list, as outlined in https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1226172/comments/18 .

Ron (ron-neversleep) wrote :

I believe it did occur in Quantal, but I am not 100% certain. These servers had Quantal for only a very short time (~1 day), because we required kernel features present only in Raring and later, for use with Ceph/Rados. Raring was still 3-4 months from release, but we were pulling Raring kernels into Quantal. These servers have been fully Raring-based since its official release.

I have now installed kernel linux-headers-3.12.0-031200-generic (trusty), "the one from the bottom", to be thorough.

Your original instructions were confusing: the page you provided has a section titled "Mainline Kernel Mapping" with a single "current" link, which points directly to the daily kernel. The (fool-proof) link would be: http://kernel.ubuntu.com/~kernel-ppa/mainline

Joseph Salisbury (jsalisbury) wrote :

Thanks for the feedback, Ron. I updated that wiki page.

penalvch (penalvch) wrote :

Ron, during some downtime, or maintenance window, would you have a spare server to regression test for this in Lucid via http://releases.ubuntu.com/lucid/ ?

tags: added: kernel-bug-exists-upstream-v3.12
removed: kernel-bug-exists-upstream-3.12.0-999.201311110456
Ron (ron-neversleep) wrote :

Hi Christopher, I've been trying to find time all week. I may not have time until the first week of December. The upcoming holiday has stolen two days.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired